diff --git a/docs/aggregate-dp-functions.md b/docs/aggregate-dp-functions.md index a8671d6ce..30513d467 100644 --- a/docs/aggregate-dp-functions.md +++ b/docs/aggregate-dp-functions.md @@ -29,151 +29,165 @@ determine the optimal privacy parameters for your dataset and organization. - ANON_AVG + AVG + DIFFERENTIAL_PRIVACY-supported AVG.

Gets the differentially-private average of non-NULL, - non-NaN values in a query with an - ANONYMIZATION clause. + non-NaN values in a query with a + DIFFERENTIAL_PRIVACY clause. - ANON_COUNT + COUNT - Signature 1: Gets the differentially-private count of rows in a query - with an ANONYMIZATION clause. + DIFFERENTIAL_PRIVACY-supported COUNT.

+ Signature 1: Gets the differentially-private count of rows in a query with a + DIFFERENTIAL_PRIVACY clause.

Signature 2: Gets the differentially-private count of rows with a - non-NULL expression in a query with an - ANONYMIZATION clause. + non-NULL expression in a query with a + DIFFERENTIAL_PRIVACY clause. - ANON_PERCENTILE_CONT + PERCENTILE_CONT + DIFFERENTIAL_PRIVACY-supported PERCENTILE_CONT.

Computes a differentially-private percentile across privacy unit columns - in a query with an ANONYMIZATION clause. + in a query with a DIFFERENTIAL_PRIVACY clause. - ANON_QUANTILES + SUM - Produces an array of differentially-private quantile boundaries - in a query with an ANONYMIZATION clause. + DIFFERENTIAL_PRIVACY-supported SUM.

+ Gets the differentially-private sum of non-NULL, + non-NaN values in a query with a + DIFFERENTIAL_PRIVACY clause. - ANON_STDDEV_POP + VAR_POP - Computes a differentially-private population (biased) standard deviation of - values in a query with an ANONYMIZATION clause. + DIFFERENTIAL_PRIVACY-supported VAR_POP.

+ Computes the differentially-private population (biased) variance of values + in a query with a DIFFERENTIAL_PRIVACY clause. - ANON_SUM + ANON_AVG - Gets the differentially-private sum of non-NULL, + Deprecated. + Gets the differentially-private average of non-NULL, non-NaN values in a query with an ANONYMIZATION clause. - ANON_VAR_POP + ANON_COUNT - Computes a differentially-private population (biased) variance of values - in a query with an ANONYMIZATION clause. + Deprecated. +
+
+ Signature 1: Gets the differentially-private count of rows in a query + with an ANONYMIZATION clause. +
+
+ Signature 2: Gets the differentially-private count of rows with a + non-NULL expression in a query with an + ANONYMIZATION clause. - AVG (differential privacy) + ANON_PERCENTILE_CONT - Gets the differentially-private average of non-NULL, - non-NaN values in a query with a - DIFFERENTIAL_PRIVACY clause. + Deprecated. + Computes a differentially-private percentile across privacy unit columns + in a query with an ANONYMIZATION clause. - COUNT (differential privacy) + ANON_QUANTILES - Signature 1: Gets the differentially-private count of rows in a query with a - DIFFERENTIAL_PRIVACY clause. -
-
- Signature 2: Gets the differentially-private count of rows with a - non-NULL expression in a query with a - DIFFERENTIAL_PRIVACY clause. + Deprecated. + Produces an array of differentially-private quantile boundaries + in a query with an ANONYMIZATION clause. - PERCENTILE_CONT (differential privacy) + ANON_STDDEV_POP - Computes a differentially-private percentile across privacy unit columns - in a query with a DIFFERENTIAL_PRIVACY clause. + Deprecated. + Computes a differentially-private population (biased) standard deviation of + values in a query with an ANONYMIZATION clause. - SUM (differential privacy) + ANON_SUM + Deprecated. Gets the differentially-private sum of non-NULL, - non-NaN values in a query with a - DIFFERENTIAL_PRIVACY clause. + non-NaN values in a query with an + ANONYMIZATION clause. - VAR_POP (differential privacy) + ANON_VAR_POP - Computes the differentially-private population (biased) variance of values - in a query with a DIFFERENTIAL_PRIVACY clause. + Deprecated. + Computes a differentially-private population (biased) variance of values + in a query with an ANONYMIZATION clause. -### `ANON_AVG` - +### `AVG` (`DIFFERENTIAL_PRIVACY`) + ```sql -WITH ANONYMIZATION ... - ANON_AVG(expression [clamped_between_clause]) - -clamped_between_clause: - CLAMPED BETWEEN lower_bound AND upper_bound +WITH DIFFERENTIAL_PRIVACY ... + AVG( + expression, + [contribution_bounds_per_group => (lower_bound, upper_bound)] + ) ``` **Description** @@ -182,13 +196,15 @@ Returns the average of non-`NULL`, non-`NaN` values in the expression. This function first computes the average per privacy unit column, and then computes the final result by averaging these averages. -This function must be used with the `ANONYMIZATION` clause and -can support these arguments: +This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] +and can support the following arguments: + `expression`: The input expression. This can be any numeric input type, such as `INT64`. 
-+ `clamped_between_clause`: Perform [clamping][dp-clamp-between] per - privacy unit column averages. ++ `contribution_bounds_per_group`: The + [contribution bounds named argument][dp-clamped-named]. + Perform clamping per each group separately before performing intermediate + grouping on the privacy unit column. **Return type** @@ -196,6 +212,51 @@ can support these arguments: **Examples** +The following differentially private query gets the average number of each item +requested per professor. Smaller aggregations might not be included. This query +references a table called [`professors`][dp-example-tables]. + +```sql +-- With noise, using the epsilon parameter. +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + item, + AVG(quantity, contribution_bounds_per_group => (0,100)) average_quantity +FROM professors +GROUP BY item; + +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+------------------* + | item | average_quantity | + +----------+------------------+ + | pencil | 38.5038356810269 | + | pen | 13.4725028762032 | + *----------+------------------*/ +``` + +```sql +-- Without noise, using the epsilon parameter. +-- (this un-noised version is for demonstration only) +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + item, + AVG(quantity) average_quantity +FROM professors +GROUP BY item; + +-- These results will not change when you run the query. +/*----------+------------------* + | item | average_quantity | + +----------+------------------+ + | scissors | 8 | + | pencil | 40 | + | pen | 18.5 | + *----------+------------------*/ +``` + The following differentially private query gets the average number of each item requested per professor. Smaller aggregations might not be included. This query references a view called [`view_on_professors`][dp-example-views]. 
@@ -203,10 +264,10 @@ references a view called [`view_on_professors`][dp-example-views]. ```sql -- With noise, using the epsilon parameter. SELECT - WITH ANONYMIZATION + WITH DIFFERENTIAL_PRIVACY OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) item, - ANON_AVG(quantity CLAMPED BETWEEN 0 AND 100) average_quantity + AVG(quantity, contribution_bounds_per_group=>(0, 100)) average_quantity FROM {{USERNAME}}.view_on_professors GROUP BY item; @@ -224,10 +285,10 @@ GROUP BY item; -- Without noise, using the epsilon parameter. -- (this un-noised version is for demonstration only) SELECT - WITH ANONYMIZATION + WITH DIFFERENTIAL_PRIVACY OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) item, - ANON_AVG(quantity) average_quantity + AVG(quantity) average_quantity FROM {{USERNAME}}.view_on_professors GROUP BY item; @@ -241,38 +302,51 @@ GROUP BY item; *----------+------------------*/ ``` -Note: You can learn more about when and when not to use -noise [here][dp-noise]. +Note: For more information about when and when not to use +noise, see [Remove noise][dp-noise]. -[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views +[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables [dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise -[dp-clamp-between]: #dp_clamp_between +[dp-clamped-named]: #dp_clamped_named -### `ANON_COUNT` - +[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause -+ [Signature 1](#anon_count_signature1) -+ [Signature 2](#anon_count_signature2) +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views + +### `COUNT` (`DIFFERENTIAL_PRIVACY`) + + ++ [Signature 1](#dp_count_signature1): Returns the number of rows in a + differentially private `FROM` clause. 
++ [Signature 2](#dp_count_signature2): Returns the number of non-`NULL` + values in an expression. #### Signature 1 - + ```sql -WITH ANONYMIZATION ... - ANON_COUNT(*) +WITH DIFFERENTIAL_PRIVACY ... + COUNT( + *, + [contribution_bounds_per_group => (lower_bound, upper_bound)] + ) ``` **Description** Returns the number of rows in the [differentially private][dp-from-clause] `FROM` clause. The final result -is an aggregation across privacy unit columns. -[Input values are clamped implicitly][dp-clamp-implicit]. Clamping is -performed per privacy unit column. +is an aggregation across a privacy unit column. -This function must be used with the `ANONYMIZATION` clause. +This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] +and can support the following argument: + ++ `contribution_bounds_per_group`: The + [contribution bounds named argument][dp-clamped-named]. + Perform clamping per each group separately before performing intermediate + grouping on the privacy unit column. **Return type** @@ -281,17 +355,17 @@ This function must be used with the `ANONYMIZATION` clause. **Examples** The following differentially private query counts the number of requests for -each item. This query references a view called -[`view_on_professors`][dp-example-views]. +each item. This query references a table called +[`professors`][dp-example-tables]. ```sql -- With noise, using the epsilon parameter. SELECT - WITH ANONYMIZATION - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) item, - ANON_COUNT(*) times_requested -FROM {{USERNAME}}.view_on_professors + COUNT(*, contribution_bounds_per_group=>(0, 100)) times_requested +FROM professors GROUP BY item; -- These results will change each time you run the query. @@ -308,11 +382,11 @@ GROUP BY item; -- Without noise, using the epsilon parameter. 
-- (this un-noised version is for demonstration only) SELECT - WITH ANONYMIZATION - OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id) item, - ANON_COUNT(*) times_requested -FROM {{USERNAME}}.view_on_professors + COUNT(*, contribution_bounds_per_group=>(0, 100)) times_requested +FROM professors GROUP BY item; -- These results will not change when you run the query. @@ -325,47 +399,17 @@ GROUP BY item; *----------+-----------------*/ ``` -Note: You can learn more about when and when not to use -noise [here][dp-noise]. - -#### Signature 2 - - -```sql -WITH ANONYMIZATION ... - ANON_COUNT(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) -``` - -**Description** - -Returns the number of non-`NULL` expression values. The final result is an -aggregation across privacy unit columns. - -This function must be used with the `ANONYMIZATION` clause and -can support these arguments: - -+ `expression`: The input expression. This can be any numeric input type, - such as `INT64`. -+ `CLAMPED BETWEEN` clause: - Perform [clamping][dp-clamp-between] per privacy unit column. - -**Return type** - -`INT64` - -**Examples** - -The following differentially private query counts the number of requests made -for each type of item. This query references a view called +The following differentially private query counts the number of requests for +each item. This query references a view called [`view_on_professors`][dp-example-views]. ```sql --- With noise +-- With noise, using the epsilon parameter. 
SELECT - WITH ANONYMIZATION + WITH DIFFERENTIAL_PRIVACY OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) item, - ANON_COUNT(item CLAMPED BETWEEN 0 AND 100) times_requested + COUNT(*, contribution_bounds_per_group=>(0, 100)) times_requested FROM {{USERNAME}}.view_on_professors GROUP BY item; @@ -380,12 +424,13 @@ GROUP BY item; ``` ```sql ---Without noise (this un-noised version is for demonstration only) +-- Without noise, using the epsilon parameter. +-- (this un-noised version is for demonstration only) SELECT - WITH ANONYMIZATION + WITH DIFFERENTIAL_PRIVACY OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) item, - ANON_COUNT(item CLAMPED BETWEEN 0 AND 100) times_requested + COUNT(*, contribution_bounds_per_group=>(0, 100)) times_requested FROM {{USERNAME}}.view_on_professors GROUP BY item; @@ -399,165 +444,175 @@ GROUP BY item; *----------+-----------------*/ ``` -Note: You can learn more about when and when not to use -noise [here][dp-noise]. +Note: For more information about when and when not to use +noise, see [Remove noise][dp-noise]. -[dp-clamp-implicit]: #dp_implicit_clamping - -[dp-from-clause]: https://github.com/google/zetasql/blob/master/docs/differential-privacy.md#dp_from_rules - -[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views - -[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise - -[dp-clamp-between]: #dp_clamp_between - -### `ANON_PERCENTILE_CONT` - +#### Signature 2 + ```sql -WITH ANONYMIZATION ... - ANON_PERCENTILE_CONT(expression, percentile [CLAMPED BETWEEN lower_bound AND upper_bound]) +WITH DIFFERENTIAL_PRIVACY ... + COUNT( + expression, + [contribution_bounds_per_group => (lower_bound, upper_bound)] + ) ``` **Description** -Takes an expression and computes a percentile for it. The final result is an -aggregation across privacy unit columns. 
- -This function must be used with the `ANONYMIZATION` clause and -can support these arguments: +Returns the number of non-`NULL` expression values. The final result is an +aggregation across a privacy unit column. -+ `expression`: The input expression. This can be most numeric input types, - such as `INT64`. `NULL`s are always ignored. -+ `percentile`: The percentile to compute. The percentile must be a literal in - the range [0, 1] -+ `CLAMPED BETWEEN` clause: - Perform [clamping][dp-clamp-between] per privacy unit column. +This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] +and can support these arguments: -`NUMERIC` and `BIGNUMERIC` arguments are not allowed. - If you need them, cast them as the -`DOUBLE` data type first. ++ `expression`: The input expression. This expression can be any + numeric input type, such as `INT64`. ++ `contribution_bounds_per_group`: The + [contribution bounds named argument][dp-clamped-named]. + Perform clamping per each group separately before performing intermediate + grouping on the privacy unit column. **Return type** -`DOUBLE` +`INT64` **Examples** -The following differentially private query gets the percentile of items -requested. Smaller aggregations might not be included. This query references a -view called [`view_on_professors`][dp-example-views]. +The following differentially private query counts the number of requests made +for each type of item. This query references a table called +[`professors`][dp-example-tables]. ```sql -- With noise, using the epsilon parameter. 
SELECT - WITH ANONYMIZATION - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) item, - ANON_PERCENTILE_CONT(quantity, 0.5 CLAMPED BETWEEN 0 AND 100) percentile_requested -FROM {{USERNAME}}.view_on_professors + COUNT(item, contribution_bounds_per_group => (0,100)) times_requested +FROM professors GROUP BY item; -- These results will change each time you run the query. -- Smaller aggregations might be removed. -/*----------+----------------------* - | item | percentile_requested | - +----------+----------------------+ - | pencil | 72.00011444091797 | - | scissors | 8.000175476074219 | - | pen | 23.001075744628906 | - *----------+----------------------*/ +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | pencil | 5 | + | pen | 2 | + *----------+-----------------*/ ``` -[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views - -[dp-clamp-between]: #dp_clamp_between - -### `ANON_QUANTILES` - - ```sql -WITH ANONYMIZATION ... - ANON_QUANTILES(expression, number CLAMPED BETWEEN lower_bound AND upper_bound) -``` - -**Description** - -Returns an array of differentially private quantile boundaries for values in -`expression`. The first element in the return value is the -minimum quantile boundary and the last element is the maximum quantile boundary. -The returned results are aggregations across privacy unit columns. - -This function must be used with the `ANONYMIZATION` clause and -can support these arguments: - -+ `expression`: The input expression. This can be most numeric input types, - such as `INT64`. `NULL`s are always ignored. -+ `number`: The number of quantiles to create. This must be an `INT64`. -+ `CLAMPED BETWEEN` clause: - Perform [clamping][dp-clamp-between] per privacy unit column. - -`NUMERIC` and `BIGNUMERIC` arguments are not allowed. 
- If you need them, cast them as the -`DOUBLE` data type first. - -**Return type** - -`ARRAY`<`DOUBLE`> +-- Without noise, using the epsilon parameter. +-- (this un-noised version is for demonstration only) +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + item, + COUNT(item, contribution_bounds_per_group => (0,100)) times_requested +FROM professors +GROUP BY item; -**Examples** +-- These results will not change when you run the query. +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | scissors | 1 | + | pencil | 4 | + | pen | 3 | + *----------+-----------------*/ +``` -The following differentially private query gets the five quantile boundaries of -the four quartiles of the number of items requested. Smaller aggregations -might not be included. This query references a view called +The following differentially private query counts the number of requests made +for each type of item. This query references a view called [`view_on_professors`][dp-example-views]. ```sql --- With noise, using the epsilon parameter. +-- With noise SELECT - WITH ANONYMIZATION + WITH DIFFERENTIAL_PRIVACY OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) item, - ANON_QUANTILES(quantity, 4 CLAMPED BETWEEN 0 AND 100) quantiles_requested + COUNT(item, contribution_bounds_per_group=>(0, 100)) times_requested FROM {{USERNAME}}.view_on_professors GROUP BY item; -- These results will change each time you run the query. -- Smaller aggregations might be removed. 
-/*----------+----------------------------------------------------------------------* - | item | quantiles_requested | - +----------+----------------------------------------------------------------------+ - | pen | [6.409375,20.647684733072918,41.40625,67.30848524305556,99.80078125] | - | pencil | [6.849259,44.010416666666664,62.64204,65.83806818181819,98.59375] | - *----------+----------------------------------------------------------------------*/ +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | pencil | 5 | + | pen | 2 | + *----------+-----------------*/ ``` -[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views +```sql +--Without noise (this un-noised version is for demonstration only) +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) + item, + COUNT(item, contribution_bounds_per_group=>(0, 100)) times_requested +FROM {{USERNAME}}.view_on_professors +GROUP BY item; -[dp-clamp-between]: #dp_clamp_between +-- These results will not change when you run the query. +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | scissors | 1 | + | pencil | 4 | + | pen | 3 | + *----------+-----------------*/ +``` -### `ANON_STDDEV_POP` - +Note: For more information about when and when not to use +noise, see [Remove noise][dp-noise]. 
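The per-group contribution bounding that the `COUNT` examples above rely on can also be pictured outside of SQL. The following Python sketch is illustrative only and is not ZetaSQL's implementation; the helper names (`clamp`, `clamped_group_counts`) are made up for this example, and the noise-addition step that real differential privacy requires is omitted.

```python
# Illustrative sketch of per-group contribution clamping (not ZetaSQL's
# implementation). Each privacy unit's count within a group is clamped
# into [lower_bound, upper_bound] before the groups are aggregated; a real
# differentially private COUNT would also add noise to each total.

def clamp(value, lower_bound, upper_bound):
    """Clamp one per-group contribution into the given bounds."""
    return max(lower_bound, min(upper_bound, value))

def clamped_group_counts(rows, lower_bound, upper_bound):
    """rows is a list of (privacy_unit_id, item) pairs. Count rows per
    (privacy unit, item), clamp each per-unit count, then sum per item."""
    per_unit = {}
    for unit, item in rows:
        per_unit[(unit, item)] = per_unit.get((unit, item), 0) + 1
    totals = {}
    for (unit, item), count in per_unit.items():
        totals[item] = totals.get(item, 0) + clamp(count, lower_bound, upper_bound)
    return totals

# Unit 1 requested "pen" twice; with bounds (0, 1) it can only count once.
rows = [(1, "pen"), (1, "pen"), (2, "pen"), (2, "pencil")]
print(clamped_group_counts(rows, 0, 1))  # → {'pen': 2, 'pencil': 1}
```

With bounds of `(0, 100)` as in the queries above, a single professor can contribute at most 100 to any one item's count, which caps how much any individual can shift a published total.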
+ +[dp-clamp-implicit]: #dp_implicit_clamping + +[dp-from-clause]: https://github.com/google/zetasql/blob/master/docs/differential-privacy.md#dp_from + +[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables + +[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise + +[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause + +[dp-clamped-named]: #dp_clamped_named + +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views + +### `PERCENTILE_CONT` (`DIFFERENTIAL_PRIVACY`) + ```sql -WITH ANONYMIZATION ... - ANON_STDDEV_POP(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) +WITH DIFFERENTIAL_PRIVACY ... + PERCENTILE_CONT( + expression, + percentile, + contribution_bounds_per_row => (lower_bound, upper_bound) + ) ``` **Description** -Takes an expression and computes the population (biased) standard deviation of -the values in the expression. The final result is an aggregation across -privacy unit columns between `0` and `+Inf`. +Takes an expression and computes a percentile for it. The final result is an +aggregation across privacy unit columns. -This function must be used with the `ANONYMIZATION` clause and -can support these arguments: +This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] +and can support these arguments: + `expression`: The input expression. This can be most numeric input types, - such as `INT64`. `NULL`s are always ignored. -+ `CLAMPED BETWEEN` clause: - Perform [clamping][dp-clamp-between] per individual entity values. + such as `INT64`. `NULL` values are always ignored. ++ `percentile`: The percentile to compute. The percentile must be a literal in + the range `[0, 1]`. ++ `contribution_bounds_per_row`: The + [contribution bounds named argument][dp-clamped-named]. 
+ Perform clamping per each row separately before performing intermediate + grouping on the privacy unit column. `NUMERIC` and `BIGNUMERIC` arguments are not allowed. If you need them, cast them as the @@ -569,42 +624,73 @@ can support these arguments: **Examples** -The following differentially private query gets the -population (biased) standard deviation of items requested. Smaller aggregations -might not be included. This query references a view called -[`view_on_professors`][dp-example-views]. +The following differentially private query gets the percentile of items +requested. Smaller aggregations might not be included. This query references a +table called [`professors`][dp-example-tables]. ```sql -- With noise, using the epsilon parameter. SELECT - WITH ANONYMIZATION + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + item, + PERCENTILE_CONT(quantity, 0.5, contribution_bounds_per_row => (0,100)) percentile_requested +FROM professors +GROUP BY item; + +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. + /*----------+----------------------* + | item | percentile_requested | + +----------+----------------------+ + | pencil | 72.00011444091797 | + | scissors | 8.000175476074219 | + | pen | 23.001075744628906 | + *----------+----------------------*/ +``` + +The following differentially private query gets the percentile of items +requested. Smaller aggregations might not be included. This query references a +view called [`view_on_professors`][dp-example-views]. + +```sql +-- With noise, using the epsilon parameter.
+SELECT + WITH DIFFERENTIAL_PRIVACY OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) item, - ANON_STDDEV_POP(quantity CLAMPED BETWEEN 0 AND 100) pop_standard_deviation + PERCENTILE_CONT(quantity, 0.5, contribution_bounds_per_row=>(0, 100)) percentile_requested FROM {{USERNAME}}.view_on_professors GROUP BY item; -- These results will change each time you run the query. -- Smaller aggregations might be removed. -/*----------+------------------------* - | item | pop_standard_deviation | - +----------+------------------------+ - | pencil | 25.350871122442054 | - | scissors | 50 | - | pen | 2 | - *----------+------------------------*/ +/*----------+----------------------* + | item | percentile_requested | + +----------+----------------------+ + | pencil | 72.00011444091797 | + | scissors | 8.000175476074219 | + | pen | 23.001075744628906 | + *----------+----------------------*/ ``` -[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views +[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables -[dp-clamp-between]: #dp_clamp_between +[dp-clamped-named]: #dp_clamped_named -### `ANON_SUM` - +[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause + +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views + +### `SUM` (`DIFFERENTIAL_PRIVACY`) + ```sql -WITH ANONYMIZATION ... - ANON_SUM(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) +WITH DIFFERENTIAL_PRIVACY ... + SUM( + expression, + [contribution_bounds_per_group => (lower_bound, upper_bound)] + ) ``` **Description** @@ -612,13 +698,15 @@ WITH ANONYMIZATION ... Returns the sum of non-`NULL`, non-`NaN` values in the expression. The final result is an aggregation across privacy unit columns. 
-This function must be used with the `ANONYMIZATION` clause and -can support these arguments: +This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] +and can support these arguments: + `expression`: The input expression. This can be any numeric input type, - such as `INT64`. -+ `CLAMPED BETWEEN` clause: - Perform [clamping][dp-clamp-between] per privacy unit column. + such as `INT64`. `NULL` values are always ignored. ++ `contribution_bounds_per_group`: The + [contribution bounds named argument][dp-clamped-named]. + Perform clamping per each group separately before performing intermediate + grouping on the privacy unit column. **Return type** @@ -630,6 +718,51 @@ One of the following [supertypes][dp-supertype]: **Examples** +The following differentially private query gets the sum of items requested. +Smaller aggregations might not be included. This query references a table called +[`professors`][dp-example-tables]. + +```sql +-- With noise, using the epsilon parameter. +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + item, + SUM(quantity, contribution_bounds_per_group => (0,100)) quantity +FROM professors +GROUP BY item; + +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+-----------* + | item | quantity | + +----------+-----------+ + | pencil | 143 | + | pen | 59 | + *----------+-----------*/ +``` + +```sql +-- Without noise, using the epsilon parameter. +-- (this un-noised version is for demonstration only) +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + item, + SUM(quantity) quantity +FROM professors +GROUP BY item; + +-- These results will not change when you run the query.
+/*----------+----------* + | item | quantity | + +----------+----------+ + | scissors | 8 | + | pencil | 144 | + | pen | 58 | + *----------+----------*/ +``` + The following differentially private query gets the sum of items requested. Smaller aggregations might not be included. This query references a view called [`view_on_professors`][dp-example-views]. @@ -637,10 +770,10 @@ Smaller aggregations might not be included. This query references a view called ```sql -- With noise, using the epsilon parameter. SELECT - WITH ANONYMIZATION + WITH DIFFERENTIAL_PRIVACY OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) item, - ANON_SUM(quantity CLAMPED BETWEEN 0 AND 100) quantity + SUM(quantity, contribution_bounds_per_group=>(0, 100)) quantity FROM {{USERNAME}}.view_on_professors GROUP BY item; @@ -658,10 +791,10 @@ GROUP BY item; -- Without noise, using the epsilon parameter. -- (this un-noised version is for demonstration only) SELECT - WITH ANONYMIZATION + WITH DIFFERENTIAL_PRIVACY OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) item, - ANON_SUM(quantity) quantity + SUM(quantity) quantity FROM {{USERNAME}}.view_on_professors GROUP BY item; @@ -675,23 +808,30 @@ GROUP BY item; *----------+----------*/ ``` -Note: You can learn more about when and when not to use -noise [here][dp-noise]. +Note: For more information about when and when not to use +noise, see [Remove noise][dp-noise].
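The contrast between the `epsilon=10` and `epsilon=1e20` examples above comes down to the noise scale shrinking as epsilon grows. The following Python sketch assumes plain Laplace noise with scale `sensitivity / epsilon`, which is a simplification of the mechanism ZetaSQL actually uses; `laplace_sample` and `noisy_sum` are hypothetical helpers for this illustration.

```python
# Simplified sketch of a noisy, clamped sum (not ZetaSQL's actual
# mechanism): clamp each contribution, sum, then add Laplace noise whose
# scale is sensitivity / epsilon. Larger epsilon means less noise.
import math
import random

def laplace_sample(scale):
    """Draw one Laplace(0, scale) sample via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_sum(values, lower_bound, upper_bound, epsilon):
    """Clamp every value into the bounds, sum, and add calibrated noise."""
    clamped = [max(lower_bound, min(upper_bound, v)) for v in values]
    sensitivity = max(abs(lower_bound), abs(upper_bound))
    return sum(clamped) + laplace_sample(sensitivity / epsilon)

# With bounds (0, 100) and epsilon=1e20, the noise scale is 1e-18, so the
# result is effectively the exact clamped sum, as in the examples above.
print(round(noisy_sum([40, 38, 42], 0, 100, epsilon=1e20)))
```

This is also why the `epsilon=1e20` queries are labeled as demonstration only: at that scale the noise is negligible and no longer provides meaningful privacy protection.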
-[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views +[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables [dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise [dp-supertype]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#supertypes -[dp-clamp-between]: #dp_clamp_between +[dp-clamped-named]: #dp_clamped_named -### `ANON_VAR_POP` - +[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause + +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views + +### `VAR_POP` (`DIFFERENTIAL_PRIVACY`) + ```sql -WITH ANONYMIZATION ... - ANON_VAR_POP(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) +WITH DIFFERENTIAL_PRIVACY ... + VAR_POP( + expression, + [contribution_bounds_per_row => (lower_bound, upper_bound)] + ) ``` **Description** @@ -700,15 +840,17 @@ Takes an expression and computes the population (biased) variance of the values in the expression. The final result is an aggregation across privacy unit columns between `0` and `+Inf`. You can [clamp the input values][dp-clamp-explicit] explicitly, otherwise input values -are clamped implicitly. Clamping is performed per individual entity values. +are clamped implicitly. Clamping is performed per individual user values. -This function must be used with the `ANONYMIZATION` clause and +This function must be used with the `DIFFERENTIAL_PRIVACY` clause and can support these arguments: + `expression`: The input expression. This can be any numeric input type, such as `INT64`. `NULL`s are always ignored. -+ `CLAMPED BETWEEN` clause: - Perform [clamping][dp-clamp-between] per individual entity values. ++ `contribution_bounds_per_row`: The + [contribution bounds named argument][dp-clamped-named]. 
+ Perform clamping per each row separately before performing intermediate + grouping on individual user values. `NUMERIC` and `BIGNUMERIC` arguments are not allowed. If you need them, cast them as the @@ -720,18 +862,44 @@ can support these arguments: **Examples** +The following differentially private query gets the +population (biased) variance of items requested. Smaller aggregations might not +be included. This query references a table called +[`professors`][dp-example-tables]. + +```sql +-- With noise +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + item, + VAR_POP(quantity, contribution_bounds_per_row => (0,100)) pop_variance +FROM professors +GROUP BY item; + +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+-----------------* + | item | pop_variance | + +----------+-----------------+ + | pencil | 642 | + | pen | 2.6666666666665 | + | scissors | 2500 | + *----------+-----------------*/ +``` + The following differentially private query gets the population (biased) variance of items requested. Smaller aggregations might not be included. This query references a view called [`view_on_professors`][dp-example-views]. ```sql --- With noise, using the epsilon parameter.
+-- With noise SELECT - WITH ANONYMIZATION + WITH DIFFERENTIAL_PRIVACY OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) item, - ANON_VAR_POP(quantity CLAMPED BETWEEN 0 AND 100) pop_variance + VAR_POP(quantity, contribution_bounds_per_row=>(0, 100)) pop_variance FROM {{USERNAME}}.view_on_professors GROUP BY item; @@ -748,16 +916,24 @@ GROUP BY item; [dp-clamp-explicit]: #dp_explicit_clamping +[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables + +[dp-clamped-named]: #dp_clamped_named + [dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views -[dp-clamp-between]: #dp_clamp_between +### `ANON_AVG` (DEPRECATED) + -### `AVG` (differential privacy) - +Warning: This function has been deprecated. Use +`AVG` (differential privacy) instead. ```sql -WITH DIFFERENTIAL_PRIVACY ... - AVG(expression[, contribution_bounds_per_group => (lower_bound, upper_bound)]) +WITH ANONYMIZATION ... + ANON_AVG(expression [clamped_between_clause]) + +clamped_between_clause: + CLAMPED BETWEEN lower_bound AND upper_bound ``` **Description** @@ -766,15 +942,13 @@ Returns the average of non-`NULL`, non-`NaN` values in the expression. This function first computes the average per privacy unit column, and then computes the final result by averaging these averages. -This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] -and can support the following arguments: +This function must be used with the `ANONYMIZATION` clause and +can support these arguments: + `expression`: The input expression. This can be any numeric input type, such as `INT64`. -+ `contribution_bounds_per_group`: The - [contribution bounds named argument][dp-clamped-named]. - Perform clamping per each group separately before performing intermediate - grouping on the privacy unit column. ++ `clamped_between_clause`: Perform [clamping][dp-clamp-between] per + privacy unit column averages. 
**Return type** @@ -784,16 +958,16 @@ and can support the following arguments: The following differentially private query gets the average number of each item requested per professor. Smaller aggregations might not be included. This query -references a table called [`professors`][dp-example-tables]. +references a view called [`view_on_professors`][dp-example-views]. ```sql -- With noise, using the epsilon parameter. SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) item, - AVG(quantity, contribution_bounds_per_group => (0,100)) average_quantity -FROM professors + ANON_AVG(quantity CLAMPED BETWEEN 0 AND 100) average_quantity +FROM {{USERNAME}}.view_on_professors GROUP BY item; -- These results will change each time you run the query. @@ -810,11 +984,11 @@ GROUP BY item; -- Without noise, using the epsilon parameter. -- (this un-noised version is for demonstration only) SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + WITH ANONYMIZATION + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) item, - AVG(quantity) average_quantity -FROM professors + ANON_AVG(quantity) average_quantity +FROM {{USERNAME}}.view_on_professors GROUP BY item; -- These results will not change when you run the query. @@ -827,46 +1001,41 @@ GROUP BY item; *----------+------------------*/ ``` -Note: For more information about when and when not to use -noise, see [Remove noise][dp-noise]. +Note: You can learn more about when and when not to use +noise [here][dp-noise]. 
-[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views [dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise -[dp-clamped-named]: #dp_clamped_named +[dp-clamp-between]: #dp_clamp_between -[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause +### `ANON_COUNT` (DEPRECATED) + -### `COUNT` (differential privacy) - +Warning: This function has been deprecated. Use +`COUNT` (differential privacy) instead. -+ [Signature 1](#dp_count_signature1): Returns the number of rows in a - differentially private `FROM` clause. -+ [Signature 2](#dp_count_signature2): Returns the number of non-`NULL` - values in an expression. ++ [Signature 1](#anon_count_signature1) ++ [Signature 2](#anon_count_signature2) #### Signature 1 - + ```sql -WITH DIFFERENTIAL_PRIVACY ... - COUNT(* [, contribution_bounds_per_group => (lower_bound, upper_bound)])) +WITH ANONYMIZATION ... + ANON_COUNT(*) ``` **Description** Returns the number of rows in the [differentially private][dp-from-clause] `FROM` clause. The final result -is an aggregation across a privacy unit column. - -This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] -and can support the following argument: +is an aggregation across privacy unit columns. +[Input values are clamped implicitly][dp-clamp-implicit]. Clamping is +performed per privacy unit column. -+ `contribution_bounds_per_group`: The - [contribution bounds named argument][dp-clamped-named]. - Perform clamping per each group separately before performing intermediate - grouping on the privacy unit column. +This function must be used with the `ANONYMIZATION` clause. 
**Return type** @@ -875,17 +1044,17 @@ and can support the following argument: **Examples** The following differentially private query counts the number of requests for -each item. This query references a table called -[`professors`][dp-example-tables]. +each item. This query references a view called +[`view_on_professors`][dp-example-views]. ```sql -- With noise, using the epsilon parameter. SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) item, - COUNT(*) times_requested -FROM professors + ANON_COUNT(*) times_requested +FROM {{USERNAME}}.view_on_professors GROUP BY item; -- These results will change each time you run the query. @@ -902,11 +1071,11 @@ GROUP BY item; -- Without noise, using the epsilon parameter. -- (this un-noised version is for demonstration only) SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + WITH ANONYMIZATION + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) item, - COUNT(*) times_requested -FROM professors + ANON_COUNT(*) times_requested +FROM {{USERNAME}}.view_on_professors GROUP BY item; -- These results will not change when you run the query. @@ -919,31 +1088,29 @@ GROUP BY item; *----------+-----------------*/ ``` -Note: For more information about when and when not to use -noise, see [Remove noise][dp-noise]. +Note: You can learn more about when and when not to use +noise [here][dp-noise]. #### Signature 2 - + ```sql -WITH DIFFERENTIAL_PRIVACY ... - COUNT(expression[, contribution_bounds_per_group => (lower_bound, upper_bound)]) +WITH ANONYMIZATION ... + ANON_COUNT(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) ``` **Description** Returns the number of non-`NULL` expression values. The final result is an -aggregation across a privacy unit column. +aggregation across privacy unit columns. 
-This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] -and can support these arguments: +This function must be used with the `ANONYMIZATION` clause and +can support these arguments: -+ `expression`: The input expression. This expression can be any - numeric input type, such as `INT64`. -+ `contribution_bounds_per_group`: The - [contribution bounds named argument][dp-clamped-named]. - Perform clamping per each group separately before performing intermediate - grouping on the privacy unit column. ++ `expression`: The input expression. This can be any numeric input type, + such as `INT64`. ++ `CLAMPED BETWEEN` clause: + Perform [clamping][dp-clamp-between] per privacy unit column. **Return type** @@ -952,17 +1119,17 @@ and can support these arguments: **Examples** The following differentially private query counts the number of requests made -for each type of item. This query references a table called -[`professors`][dp-example-tables]. +for each type of item. This query references a view called +[`view_on_professors`][dp-example-views]. ```sql --- With noise, using the epsilon parameter. +-- With noise SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) item, - COUNT(item, contribution_bounds_per_group => (0,100)) times_requested -FROM professors + ANON_COUNT(item CLAMPED BETWEEN 0 AND 100) times_requested +FROM {{USERNAME}}.view_on_professors GROUP BY item; -- These results will change each time you run the query. @@ -976,14 +1143,13 @@ GROUP BY item; ``` ```sql --- Without noise, using the epsilon parameter. 
--- (this un-noised version is for demonstration only) +--Without noise (this un-noised version is for demonstration only) SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + WITH ANONYMIZATION + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) item, - COUNT(item, contribution_bounds_per_group => (0,100)) times_requested -FROM professors + ANON_COUNT(item CLAMPED BETWEEN 0 AND 100) times_requested +FROM {{USERNAME}}.view_on_professors GROUP BY item; -- These results will not change when you run the query. @@ -996,27 +1162,28 @@ GROUP BY item; *----------+-----------------*/ ``` -Note: For more information about when and when not to use -noise, see [Remove noise][dp-noise]. +Note: You can learn more about when and when not to use +noise [here][dp-noise]. [dp-clamp-implicit]: #dp_implicit_clamping -[dp-from-clause]: https://github.com/google/zetasql/blob/master/docs/differential-privacy.md#dp_from +[dp-from-clause]: https://github.com/google/zetasql/blob/master/docs/differential-privacy.md#dp_from_rules -[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views [dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise -[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause +[dp-clamp-between]: #dp_clamp_between -[dp-clamped-named]: #dp_clamped_named +### `ANON_PERCENTILE_CONT` (DEPRECATED) + -### `PERCENTILE_CONT` (differential privacy) - +Warning: This function has been deprecated. Use +`PERCENTILE_CONT` (differential privacy) instead. ```sql -WITH DIFFERENTIAL_PRIVACY ... - PERCENTILE_CONT(expression, percentile, contribution_bounds_per_row => (lower_bound, upper_bound)) +WITH ANONYMIZATION ... 
+  ANON_PERCENTILE_CONT(expression, percentile [CLAMPED BETWEEN lower_bound AND upper_bound])
```

**Description**

@@ -1024,17 +1191,15 @@ WITH DIFFERENTIAL_PRIVACY ...
Takes an expression and computes a percentile for it. The final result is an
aggregation across privacy unit columns.

-This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax]
-and can support these arguments:
+This function must be used with the `ANONYMIZATION` clause and
+can support these arguments:

+ `expression`: The input expression. This can be most numeric input types,
-  such as `INT64`. `NULL` values are always ignored.
+  such as `INT64`. `NULL`s are always ignored.
+ `percentile`: The percentile to compute. The percentile must be a literal in
-  the range `[0, 1]`.
-+ `contribution_bounds_per_row`: The
-  [contribution bounds named argument][dp-clamped-named].
-  Perform clamping per each row separately before performing intermediate
-  grouping on the privacy unit column.
+  the range `[0, 1]`.
++ `CLAMPED BETWEEN` clause:
+  Perform [clamping][dp-clamp-between] per privacy unit column.

`NUMERIC` and `BIGNUMERIC` arguments are not allowed.
 If you need them, cast them as the

@@ -1048,41 +1213,173 @@ and can support these arguments:

The following differentially private query gets the percentile of items
requested. Smaller aggregations might not be included. This query references a
-view called [`professors`][dp-example-tables].
+view called [`view_on_professors`][dp-example-views].

```sql
-- With noise, using the epsilon parameter.
SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) item, - PERCENTILE_CONT(quantity, 0.5, contribution_bounds_per_row => (0,100)) percentile_requested -FROM professors + ANON_PERCENTILE_CONT(quantity, 0.5 CLAMPED BETWEEN 0 AND 100) percentile_requested +FROM {{USERNAME}}.view_on_professors GROUP BY item; -- These results will change each time you run the query. -- Smaller aggregations might be removed. - /*----------+----------------------* - | item | percentile_requested | - +----------+----------------------+ - | pencil | 72.00011444091797 | - | scissors | 8.000175476074219 | - | pen | 23.001075744628906 | - *----------+----------------------*/ +/*----------+----------------------* + | item | percentile_requested | + +----------+----------------------+ + | pencil | 72.00011444091797 | + | scissors | 8.000175476074219 | + | pen | 23.001075744628906 | + *----------+----------------------*/ +``` + +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views + +[dp-clamp-between]: #dp_clamp_between + +### `ANON_QUANTILES` (DEPRECATED) + + +Warning: This function has been deprecated. Use +`QUANTILES` (differential privacy) instead. + +```sql +WITH ANONYMIZATION ... + ANON_QUANTILES(expression, number CLAMPED BETWEEN lower_bound AND upper_bound) +``` + +**Description** + +Returns an array of differentially private quantile boundaries for values in +`expression`. The first element in the return value is the +minimum quantile boundary and the last element is the maximum quantile boundary. +The returned results are aggregations across privacy unit columns. + +This function must be used with the `ANONYMIZATION` clause and +can support these arguments: + ++ `expression`: The input expression. This can be most numeric input types, + such as `INT64`. `NULL`s are always ignored. 
++ `number`: The number of quantiles to create. This must be an `INT64`. ++ `CLAMPED BETWEEN` clause: + Perform [clamping][dp-clamp-between] per privacy unit column. + +`NUMERIC` and `BIGNUMERIC` arguments are not allowed. + If you need them, cast them as the +`DOUBLE` data type first. + +**Return type** + +`ARRAY`<`DOUBLE`> + +**Examples** + +The following differentially private query gets the five quantile boundaries of +the four quartiles of the number of items requested. Smaller aggregations +might not be included. This query references a view called +[`view_on_professors`][dp-example-views]. + +```sql +-- With noise, using the epsilon parameter. +SELECT + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + ANON_QUANTILES(quantity, 4 CLAMPED BETWEEN 0 AND 100) quantiles_requested +FROM {{USERNAME}}.view_on_professors +GROUP BY item; + +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+----------------------------------------------------------------------* + | item | quantiles_requested | + +----------+----------------------------------------------------------------------+ + | pen | [6.409375,20.647684733072918,41.40625,67.30848524305556,99.80078125] | + | pencil | [6.849259,44.010416666666664,62.64204,65.83806818181819,98.59375] | + *----------+----------------------------------------------------------------------*/ +``` + +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views + +[dp-clamp-between]: #dp_clamp_between + +### `ANON_STDDEV_POP` (DEPRECATED) + + +Warning: This function has been deprecated. Use +`STDDEV_POP` (differential privacy) instead. + +```sql +WITH ANONYMIZATION ... + ANON_STDDEV_POP(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) +``` + +**Description** + +Takes an expression and computes the population (biased) standard deviation of +the values in the expression. 
The final result is an aggregation across +privacy unit columns between `0` and `+Inf`. + +This function must be used with the `ANONYMIZATION` clause and +can support these arguments: + ++ `expression`: The input expression. This can be most numeric input types, + such as `INT64`. `NULL`s are always ignored. ++ `CLAMPED BETWEEN` clause: + Perform [clamping][dp-clamp-between] per individual entity values. + +`NUMERIC` and `BIGNUMERIC` arguments are not allowed. + If you need them, cast them as the +`DOUBLE` data type first. + +**Return type** + +`DOUBLE` + +**Examples** + +The following differentially private query gets the +population (biased) standard deviation of items requested. Smaller aggregations +might not be included. This query references a view called +[`view_on_professors`][dp-example-views]. + +```sql +-- With noise, using the epsilon parameter. +SELECT + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + ANON_STDDEV_POP(quantity CLAMPED BETWEEN 0 AND 100) pop_standard_deviation +FROM {{USERNAME}}.view_on_professors +GROUP BY item; + +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+------------------------* + | item | pop_standard_deviation | + +----------+------------------------+ + | pencil | 25.350871122442054 | + | scissors | 50 | + | pen | 2 | + *----------+------------------------*/ ``` -[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views -[dp-clamped-named]: #dp_clamped_named +[dp-clamp-between]: #dp_clamp_between -[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause +### `ANON_SUM` (DEPRECATED) + -### `SUM` (differential privacy) - +Warning: This function has been deprecated. Use +`SUM` (differential privacy) instead. 
```sql -WITH DIFFERENTIAL_PRIVACY ... - SUM(expression[, contribution_bounds_per_group => (lower_bound, upper_bound)]) +WITH ANONYMIZATION ... + ANON_SUM(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) ``` **Description** @@ -1090,15 +1387,13 @@ WITH DIFFERENTIAL_PRIVACY ... Returns the sum of non-`NULL`, non-`NaN` values in the expression. The final result is an aggregation across privacy unit columns. -This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] -and can support these arguments: +This function must be used with the `ANONYMIZATION` clause and +can support these arguments: + `expression`: The input expression. This can be any numeric input type, - such as `INT64`. `NULL` values are always ignored. -+ `contribution_bounds_per_group`: The - [contribution bounds named argument][dp-clamped-named]. - Perform clamping per each group separately before performing intermediate - grouping on the privacy unit column. + such as `INT64`. ++ `CLAMPED BETWEEN` clause: + Perform [clamping][dp-clamp-between] per privacy unit column. **Return type** @@ -1112,16 +1407,16 @@ One of the following [supertypes][dp-supertype]: The following differentially private query gets the sum of items requested. Smaller aggregations might not be included. This query references a view called -[`professors`][dp-example-tables]. +[`view_on_professors`][dp-example-views]. ```sql -- With noise, using the epsilon parameter. SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) item, - SUM(quantity, contribution_bounds_per_group => (0,100)) quantity -FROM professors + ANON_SUM(quantity CLAMPED BETWEEN 0 AND 100) quantity +FROM {{USERNAME}}.view_on_professors GROUP BY item; -- These results will change each time you run the query. @@ -1138,11 +1433,11 @@ GROUP BY item; -- Without noise, using the epsilon parameter. 
-- (this un-noised version is for demonstration only) SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + WITH ANONYMIZATION + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) item, - SUM(quantity) quantity -FROM professors + ANON_SUM(quantity) quantity +FROM {{USERNAME}}.view_on_professors GROUP BY item; -- These results will not change when you run the query. @@ -1155,25 +1450,26 @@ GROUP BY item; *----------+----------*/ ``` -Note: For more information about when and when not to use -noise, see [Use differential privacy][dp-noise]. +Note: You can learn more about when and when not to use +noise [here][dp-noise]. -[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views [dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise [dp-supertype]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#supertypes -[dp-clamped-named]: #dp_clamped_named +[dp-clamp-between]: #dp_clamp_between -[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause +### `ANON_VAR_POP` (DEPRECATED) + -### `VAR_POP` (differential privacy) - +Warning: This function has been deprecated. Use +`VAR_POP` (differential privacy) instead. ```sql -WITH DIFFERENTIAL_PRIVACY ... - VAR_POP(expression[, contribution_bounds_per_row => (lower_bound, upper_bound)]) +WITH ANONYMIZATION ... + ANON_VAR_POP(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) ``` **Description** @@ -1182,17 +1478,15 @@ Takes an expression and computes the population (biased) variance of the values in the expression. The final result is an aggregation across privacy unit columns between `0` and `+Inf`. 
You can [clamp the input values][dp-clamp-explicit] explicitly, otherwise input values -are clamped implicitly. Clamping is performed per individual user values. +are clamped implicitly. Clamping is performed per individual entity values. -This function must be used with the `DIFFERENTIAL_PRIVACY` clause and +This function must be used with the `ANONYMIZATION` clause and can support these arguments: + `expression`: The input expression. This can be any numeric input type, such as `INT64`. `NULL`s are always ignored. -+ `contribution_bounds_per_row`: The - [contribution bounds named argument][dp-clamped-named]. - Perform clamping per each row separately before performing intermediate - grouping on individual user values. ++ `CLAMPED BETWEEN` clause: + Perform [clamping][dp-clamp-between] per individual entity values. `NUMERIC` and `BIGNUMERIC` arguments are not allowed. If you need them, cast them as the @@ -1205,22 +1499,22 @@ can support these arguments: **Examples** The following differentially private query gets the -population (biased) variance of items requested. Smaller aggregations may not +population (biased) variance of items requested. Smaller aggregations might not be included. This query references a view called -[`professors`][dp-example-tables]. +[`view_on_professors`][dp-example-views]. ```sql --- With noise +-- With noise, using the epsilon parameter. SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) item, - VAR_POP(quantity, contribution_bounds_per_row => (0,100)) pop_variance -FROM professors + ANON_VAR_POP(quantity CLAMPED BETWEEN 0 AND 100) pop_variance +FROM {{USERNAME}}.view_on_professors GROUP BY item; -- These results will change each time you run the query. --- Smaller aggregations may be removed. +-- Smaller aggregations might be removed. 
/*----------+-----------------* | item | pop_variance | +----------+-----------------+ @@ -1232,9 +1526,9 @@ GROUP BY item; [dp-clamp-explicit]: #dp_explicit_clamping -[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views -[dp-clamped-named]: #dp_clamped_named +[dp-clamp-between]: #dp_clamp_between ### Clamp values in a differentially private aggregate function @@ -1245,23 +1539,39 @@ clamp explicitly or implicitly as follows: + [Clamp explicitly in the `DIFFERENTIAL_PRIVACY` clause][dp-clamped-named]. + [Clamp implicitly in the `DIFFERENTIAL_PRIVACY` clause][dp-clamped-named-imp]. -+ [Clamp explicitly in the `ANONYMIZATION` clause][dp-clamp-between]. -+ [Clamp implicitly in the `ANONYMIZATION` clause][dp-clamp-between-imp]. - -To learn more about explicit and implicit clamping, see the following: - -+ [About implicit clamping][dp-imp-clamp]. -+ [About explicit clamping][dp-exp-clamp]. -#### Implicitly clamp values in the `DIFFERENTIAL_PRIVACY` clause +#### Implicitly clamp values If you don't include the contribution bounds named argument with the -`DIFFERENTIAL_PRIVACY` clause, clamping is [implicit][dp-imp-clamp], which +`DIFFERENTIAL_PRIVACY` clause, clamping is implicit, which means bounds are derived from the data itself in a differentially private way. Implicit bounding works best when computed using large datasets. For more -information, see [Implicit bounding limitations for small datasets][implicit-limits]. +information, see +[Implicit bounding limitations for small datasets][implicit-limits]. + +**Details** + +In differentially private aggregate functions, explicit clamping is optional. +If you don't include this clause, clamping is implicit, +which means bounds are derived from the data itself in a differentially +private way. 
The process is somewhat random, so aggregations with identical
+ranges can have different bounds.
+
+Implicit bounds are determined for each aggregation. So if some
+aggregations have a wide range of values, and others have a narrow range of
+values, implicit bounding can identify different bounds for different
+aggregations as appropriate. Implicit bounds might be an advantage or a
+disadvantage depending on your use case. Different bounds for different
+aggregations can result in lower error. Different bounds also mean that
+different aggregations have different levels of uncertainty, which might not be
+directly comparable. [Explicit bounds][dp-clamped-named], on the other hand,
+apply uniformly to all aggregations and should be derived from public
+information.
+
+When clamping is implicit, part of the total epsilon is spent picking bounds.
+This leaves less epsilon for aggregations, so these aggregations are noisier.

**Example**

@@ -1293,7 +1603,35 @@ GROUP BY item;
 *----------+------------------*/
 ```

-#### Explicitly clamp values in the `DIFFERENTIAL_PRIVACY` clause
+The following differentially private query clamps each aggregate contribution
+for each privacy unit column within a range derived from the data itself.
+As long as all or most values fall within this range, your results
+will be accurate. This query references a view called
+[`view_on_professors`][dp-example-views].
+ +```sql +--Without noise (this un-noised version is for demonstration only) +SELECT WITH DIFFERENTIAL_PRIVACY + OPTIONS ( + epsilon = 1e20, + delta = .01, + max_groups_contributed = 1 + ) + item, + AVG(quantity) AS average_quantity +FROM view_on_professors +GROUP BY item; + +/*----------+------------------* + | item | average_quantity | + +----------+------------------+ + | scissors | 8 | + | pencil | 72 | + | pen | 18.5 | + *----------+------------------*/ +``` + +#### Explicitly clamp values ```sql @@ -1304,7 +1642,7 @@ contribution_bounds_per_group => (lower_bound,upper_bound) contribution_bounds_per_row => (lower_bound,upper_bound) ``` -Use the contribution bounds named argument to [explicitly clamp][dp-exp-clamp] +Use the contribution bounds named argument to explicitly clamp values per group or per row between a lower and upper bound in a `DIFFERENTIAL_PRIVACY` clause. @@ -1328,6 +1666,26 @@ Input values: `NUMERIC` and `BIGNUMERIC` arguments are not allowed. +**Details** + +In differentially private aggregate functions, clamping explicitly clamps the +total contribution from each privacy unit column to within a specified +range. + +Explicit bounds are uniformly applied to all aggregations. So even if some +aggregations have a wide range of values, and others have a narrow range of +values, the same bounds are applied to all of them. On the other hand, when +[implicit bounds][dp-clamped-named-imp] are inferred from the data, the bounds +applied to each aggregation can be different. + +Explicit bounds should be chosen to reflect public information. +For example, bounding ages between 0 and 100 reflects public information +because the age of most people generally falls within this range. + +Important: The results of the query reveal the explicit bounds. Do not use +explicit bounds based on the entity data; explicit bounds should be based on +public information. 
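+For intuition, clamping a single value to `[lower_bound, upper_bound]` is
+equivalent to the ordinary, non-private expression
+`GREATEST(lower_bound, LEAST(upper_bound, value))`. The following sketch is
+illustrative only; the differentially private functions apply this clamping
+internally to each privacy unit's contributions.
+
+```sql
+-- Non-private sketch: clamp sample values to the range [0, 100].
+-- -5 clamps up to 0, 42 is unchanged, and 250 clamps down to 100.
+SELECT v, GREATEST(0, LEAST(100, v)) AS clamped_v
+FROM UNNEST([-5, 42, 250]) AS v;
+```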
+ **Examples** The following anonymized query clamps each aggregate contribution for each @@ -1385,73 +1743,6 @@ GROUP BY item; *----------+------------------*/ ``` -Note: For more information about when and when not to use -noise, see [Remove noise][dp-noise]. - -#### Implicitly clamp values in the `ANONYMIZATION` clause - - -If you don't include the `CLAMPED BETWEEN` clause with the -`ANONYMIZATION` clause, clamping is [implicit][dp-imp-clamp], which means bounds -are derived from the data itself in a differentially private way. - -Implicit bounding works best when computed using large datasets. For more -information, see [Implicit bounding limitations for small datasets][implicit-limits]. - -**Example** - -The following anonymized query clamps each aggregate contribution for each -differential privacy ID and within a derived range from the data itself. -As long as all or most values fall within this range, your results -will be accurate. This query references a view called -[`view_on_professors`][dp-example-views]. - -```sql ---Without noise (this un-noised version is for demonstration only) -SELECT WITH ANONYMIZATION - OPTIONS ( - epsilon = 1e20, - delta = .01, - max_groups_contributed = 1 - ) - item, - AVG(quantity) AS average_quantity -FROM view_on_professors -GROUP BY item; - -/*----------+------------------* - | item | average_quantity | - +----------+------------------+ - | scissors | 8 | - | pencil | 72 | - | pen | 18.5 | - *----------+------------------*/ -``` - -#### Explicitly clamp values in the `ANONYMIZATION` clause - - -```sql -CLAMPED BETWEEN lower_bound AND upper_bound -``` - -Use the `CLAMPED BETWEEN` clause to [explicitly clamp][dp-exp-clamp] values -between a lower and an upper bound in an `ANONYMIZATION` clause. - -Input values: - -+ `lower_bound`: Numeric literal that represents the smallest value to - include in an aggregation. -+ `upper_bound`: Numeric literal that represents the largest value to - include in an aggregation. 
- -`NUMERIC` and `BIGNUMERIC` arguments are not allowed. - -Note: This is a legacy feature. If possible, use the `contribution_bounds` -named argument instead. - -**Examples** - The following differentially private query clamps each aggregate contribution for each privacy unit column and within a specified range (`0` and `100`). As long as all or most values fall within this range, your results will be @@ -1460,14 +1751,14 @@ accurate. This query references a view called ```sql --Without noise (this un-noised version is for demonstration only) -SELECT WITH ANONYMIZATION +SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS ( epsilon = 1e20, delta = .01, max_groups_contributed = 1 ) item, - ANON_AVG(quantity CLAMPED BETWEEN 0 AND 100) AS average_quantity + AVG(quantity, contribution_bounds_per_group=>(0,100)) AS average_quantity FROM view_on_professors GROUP BY item; @@ -1487,14 +1778,14 @@ lower bound. ```sql {.bad} --Without noise (this un-noised version is for demonstration only) -SELECT WITH ANONYMIZATION +SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS ( epsilon = 1e20, delta = .01, max_groups_contributed = 1 ) item, - ANON_AVG(quantity CLAMPED BETWEEN 50 AND 100) AS average_quantity + AVG(quantity, contribution_bounds_per_group=>(50,100)) AS average_quantity FROM view_on_professors GROUP BY item; @@ -1510,70 +1801,18 @@ GROUP BY item; Note: For more information about when and when not to use noise, see [Remove noise][dp-noise]. -#### About explicit clamping - - -In differentially private aggregate functions, clamping explicitly clamps the -total contribution from each privacy unit column to within a specified -range. - -Explicit bounds are uniformly applied to all aggregations. So even if some -aggregations have a wide range of values, and others have a narrow range of -values, the same bounds are applied to all of them. On the other hand, when -[implicit bounds][dp-imp-clamp] are inferred from the data, the bounds applied -to each aggregation can be different. 
- -Explicit bounds should be chosen to reflect public information. -For example, bounding ages between 0 and 100 reflects public information -because the age of most people generally falls within this range. - -Important: The results of the query reveal the explicit bounds. Do not use -explicit bounds based on the entity data; explicit bounds should be based on -public information. - -#### About implicit clamping - - -In differentially private aggregate functions, explicit clamping is optional. -If you don't include this clause, clamping is implicit, -which means bounds are derived from the data itself in a differentially -private way. The process is somewhat random, so aggregations with identical -ranges can have different bounds. - -Implicit bounds are determined for each aggregation. So if some -aggregations have a wide range of values, and others have a narrow range of -values, implicit bounding can identify different bounds for different -aggregations as appropriate. Implicit bounds might be an advantage or a -disadvantage depending on your use case. Different bounds for different -aggregations can result in lower error. Different bounds also means that -different aggregations have different levels of uncertainty, which might not be -directly comparable. [Explicit bounds][dp-exp-clamp], on the other hand, -apply uniformly to all aggregations and should be derived from public -information. - -When clamping is implicit, part of the total epsilon is spent picking bounds. -This leaves less epsilon for aggregations, so these aggregations are noisier. 
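The clamping behavior discussed above can be sketched outside of SQL for intuition. The following Python sketch (not ZetaSQL, and not the engine's actual mechanism; the function names `clamp` and `dp_sum` are hypothetical) clamps each contribution to explicit bounds and then adds Laplace noise whose scale is derived from the clamp width, which is why tighter bounds yield less noise:

```python
import random

def clamp(value, lower, upper):
    # Clamp one contribution into the explicit [lower, upper] range.
    return max(lower, min(upper, value))

def dp_sum(values, lower, upper, epsilon):
    # Clamp every contribution, then add Laplace noise scaled to
    # sensitivity / epsilon. Values outside the bounds are distorted
    # by the clamping itself, which is the accuracy trade-off noted above.
    total = sum(clamp(v, lower, upper) for v in values)
    sensitivity = max(abs(lower), abs(upper))
    scale = sensitivity / epsilon
    # A Laplace sample is the difference of two i.i.d. exponential samples.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return total + noise
```

With a very large epsilon the noise becomes negligible, which mirrors the un-noised demonstration queries in this section.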
- [dp-guide]: https://github.com/google/zetasql/blob/master/docs/differential-privacy.md [dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause [agg-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md -[dp-exp-clamp]: #dp_explicit_clamping - -[dp-imp-clamp]: #dp_implicit_clamping - [dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views [dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise [implicit-limits]: https://github.com/google/zetasql/blob/master/docs/differential-privacy.md#implicit_limits -[dp-clamp-between]: #dp_clamp_between - -[dp-clamp-between-imp]: #dp_clamp_between_implicit - [dp-clamped-named]: #dp_clamped_named [dp-clamped-named-imp]: #dp_clamped_named_implicit diff --git a/docs/aggregate_functions.md b/docs/aggregate_functions.md index fbd06a1b7..7348a0fe5 100644 --- a/docs/aggregate_functions.md +++ b/docs/aggregate_functions.md @@ -1058,10 +1058,10 @@ window_specification: Returns the count of `TRUE` values for `expression`. Returns `0` if there are zero input rows, or if `expression` evaluates to `FALSE` or `NULL` for all rows. -Since `expression` must be a `BOOL`, the form -`COUNTIF(DISTINCT ...)` is generally not useful: there is only one distinct -value of `TRUE`. So `COUNTIF(DISTINCT ...)` will return 1 if `expression` -evaluates to `TRUE` for one or more input rows, or 0 otherwise. +Since `expression` must be a `BOOL`, the form `COUNTIF(DISTINCT ...)` is +generally not useful: there is only one distinct value of `TRUE`. So +`COUNTIF(DISTINCT ...)` will return 1 if `expression` evaluates to `TRUE` for +one or more input rows, or 0 otherwise. Usually when someone wants to combine `COUNTIF` and `DISTINCT`, they want to count the number of distinct values of an expression for which a certain condition is satisfied. 
One recipe to achieve this is the following: diff --git a/docs/array_functions.md b/docs/array_functions.md index df09430ae..0bfaee690 100644 --- a/docs/array_functions.md +++ b/docs/array_functions.md @@ -185,6 +185,15 @@ ZetaSQL supports the following array functions. + + ARRAY_ZIP + + + + Combines elements from two to four arrays into one array. + + + FLATTEN @@ -1305,6 +1314,174 @@ SELECT [lambda-definition]: https://github.com/google/zetasql/blob/master/docs/functions-reference.md#lambdas +### `ARRAY_ZIP` + +```sql +ARRAY_ZIP( + array_input [ AS alias ], + array_input [ AS alias ][, ... ] + [, transformation => lambda_expression ] + [, mode => { 'STRICT' | 'TRUNCATE' | 'PAD' } ] +) +``` + +**Description** + +Combines the elements from two to four arrays into one array. + +**Definitions** + ++ `array_input`: An input `ARRAY` value to be zipped with the other array + inputs. `ARRAY_ZIP` supports two to four input arrays. ++ `alias`: An alias optionally supplied for an `array_input`. In the results, + the alias is the name of the associated `STRUCT` field. ++ `transformation`: A optionally-named lambda argument. `lambda_expression` + specifies how elements are combined as they are zipped. This overrides + the default `STRUCT` creation behavior. ++ `mode`: A mandatory-named argument that determines how arrays of differing + lengths are zipped. If this optional argument is not supplied, the function + uses `STRICT` mode by default. This argument can be one of the following + values: + + + `STRICT` (default): If the length of any array is different from the + others, produce an error. + + + `TRUNCATE`: Truncate longer arrays to match the length of the shortest + array. + + + `PAD`: Pad shorter arrays with `NULL` values to match the length of the + longest array. + +**Details** + ++ If an `array_input` or `mode` is `NULL`, this function returns `NULL`, even when + `mode` is `STRICT`. ++ Argument aliases can't be used with the `transformation` argument. 
+
+**Return type**
+
++ If `transformation` is used and `lambda_expression` returns type `T`, the
+  return type is `ARRAY<T>`.
++ Otherwise, the return type is `ARRAY<STRUCT>`, with the `STRUCT` having a
+  number of fields equal to the number of input arrays. Each field's name is
+  either the user-provided `alias` for the corresponding `array_input`, or a
+  default alias assigned by the compiler, following the same logic used for
+  [naming columns in a SELECT list][implicit-aliases].
+
+**Examples**
+
+The following query zips two arrays into one:
+
+```sql
+SELECT ARRAY_ZIP([1, 2], ['a', 'b']) AS results
+
+/*----------------------*
+ | results              |
+ +----------------------+
+ | [(1, 'a'), (2, 'b')] |
+ *----------------------*/
+```
+
+You can give an array an alias. For example, in the following
+query, the returned array is of type
+`ARRAY<STRUCT<A1 INT64, alias_inferred STRING>>`, where:
+
++ `A1` is the alias provided for array `[1, 2]`.
++ `alias_inferred` is the inferred alias provided for array `['a', 'b']`.
+
+```sql
+WITH T AS (
+  SELECT ['a', 'b'] AS alias_inferred
+)
+SELECT ARRAY_ZIP([1, 2] AS A1, alias_inferred) AS results
+FROM T
+
+/*----------------------------------------------------------+
+ | results                                                  |
+ +----------------------------------------------------------+
+ | [{1 A1, 'a' alias_inferred}, {2 A1, 'b' alias_inferred}] |
+ +----------------------------------------------------------*/
+```
+
+To provide a custom transformation of the input arrays, use the `transformation`
+argument:
+
+```sql
+SELECT ARRAY_ZIP([1, 2], [3, 4], transformation => (e1, e2) -> (e1 + e2))
+
+/*---------+
+ | results |
+ +---------+
+ | [4, 6]  |
+ +---------*/
+```
+
+The argument name `transformation` is not required. For example:
+
+```sql
+SELECT ARRAY_ZIP([1, 2], [3, 4], (e1, e2) -> (e1 + e2))
+
+/*---------+
+ | results |
+ +---------+
+ | [4, 6]  |
+ +---------*/
+```
+
+When `transformation` is provided, the input arrays are not allowed to have
+aliases.
For example, the following query is invalid: + +```sql {.bad} +-- Error: ARRAY_ZIP function with lambda argument does not allow providing +-- argument aliases +SELECT ARRAY_ZIP([1, 2], [3, 4] AS alias_not_allowed, (e1, e2) -> (e1 + e2)) +``` + +To produce an error when arrays with different lengths are zipped, don't +add `mode`, or if you do, set it as `STRICT`. For example: + +```sql {.bad} +-- Error: Unequal array length +SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd']) AS results +``` + +```sql {.bad} +-- Error: Unequal array length +SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd'], mode => 'STRICT') AS results +``` + +Use the `PAD` mode to pad missing values with `NULL` when input arrays have +different lengths. For example: + +```sql +SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd'], [], mode => 'PAD') AS results + +/*------------------------------------------------------------------------+ + | results | + +------------------------------------------------------------------------+ + | [{1, 'a', NULL}, {2, 'b', NULL}, {NULL, 'c', NULL}, {NULL, 'd', NULL}] | + +------------------------------------------------------------------------*/ +``` + +Use the `TRUNCATE` mode to truncate all arrays that are longer than the shortest +array. For example: + +```sql +SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd'], mode => 'TRUNCATE') AS results + +/*----------------------* + | results | + +----------------------+ + | [(1, 'a'), (2, 'b')] | + *----------------------*/ +``` + + + +[implicit-aliases]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#implicit_aliases + + + ### `FLATTEN` ```sql diff --git a/docs/arrays.md b/docs/arrays.md index d47b22c14..c16402f65 100644 --- a/docs/arrays.md +++ b/docs/arrays.md @@ -1233,75 +1233,7 @@ consisting of pairs of elements from input arrays, taken from their corresponding positions. This operation is sometimes called [zipping][convolution]. -You can zip arrays with `UNNEST` and `WITH OFFSET`. 
In this example, each
-value pair is stored as a `STRUCT` in an array.
-
-```sql
-WITH
-  Combinations AS (
-    SELECT
-      ['a', 'b'] AS letters,
-      [1, 2, 3] AS numbers
-  )
-SELECT
-  ARRAY(
-    SELECT AS STRUCT
-      letters[SAFE_OFFSET(index)] AS letter,
-      numbers[SAFE_OFFSET(index)] AS number
-    FROM Combinations
-    CROSS JOIN
-      UNNEST(
-        GENERATE_ARRAY(
-          0,
-          LEAST(ARRAY_LENGTH(letters), ARRAY_LENGTH(numbers)) - 1)) AS index
-    ORDER BY index
-  );
-
-/*------------------------------*
- | pairs                        |
- +------------------------------+
- | [{ letter: "a", number: 1 }, |
- |  { letter: "b", number: 2 }] |
- *------------------------------*/
-```
-
-You can use input arrays of different lengths as long as the first array
-is equal to or less than the length of the second array. The zipped array
-will be the length of the shortest input array.
-
-To get a zipped array that includes all the elements even when the input arrays
-are different lengths, change `LEAST` to `GREATEST`. Elements of either array
-that have no associated element in the other array will be paired with `NULL`.
-
-```sql
-WITH
-  Combinations AS (
-    SELECT
-      ['a', 'b'] AS letters,
-      [1, 2, 3] AS numbers
-  )
-SELECT
-  ARRAY(
-    SELECT AS STRUCT
-      letters[SAFE_OFFSET(index)] AS letter,
-      numbers[SAFE_OFFSET(index)] AS number
-    FROM Combinations
-    CROSS JOIN
-      UNNEST(
-        GENERATE_ARRAY(
-          0,
-          GREATEST(ARRAY_LENGTH(letters), ARRAY_LENGTH(numbers)) - 1)) AS index
-    ORDER BY index
-  );
-
-/*-------------------------------*
- | pairs                         |
- +-------------------------------+
- | [{ letter: "a", number: 1 },  |
- |  { letter: "b", number: 2 },  |
- |  { letter: null, number: 3 }] |
- *-------------------------------*/
-```
+You can zip arrays with the function [`ARRAY_ZIP`][array-zip].
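For intuition, the `STRICT`, `TRUNCATE`, and `PAD` zipping behaviors described in the `ARRAY_ZIP` function reference can be sketched in Python (this is only an illustration of the semantics, not ZetaSQL itself; the helper name `array_zip` is hypothetical):

```python
from itertools import zip_longest

def array_zip(*arrays, mode="STRICT"):
    # Combine parallel arrays element-by-element, like ARRAY_ZIP.
    if mode == "STRICT":
        # Unequal input lengths produce an error.
        if len({len(a) for a in arrays}) > 1:
            raise ValueError("Unequal array length")
        return list(zip(*arrays))
    if mode == "TRUNCATE":
        # zip() stops at the shortest input.
        return list(zip(*arrays))
    if mode == "PAD":
        # Missing values become None (NULL in SQL).
        return list(zip_longest(*arrays))
    raise ValueError(f"unknown mode: {mode}")
```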
## Building arrays of arrays diff --git a/docs/conditional_expressions.md b/docs/conditional_expressions.md index 91cd33855..dc5ac1a91 100644 --- a/docs/conditional_expressions.md +++ b/docs/conditional_expressions.md @@ -340,7 +340,7 @@ SELECT NULLIFZERO(0) AS result ### `ZEROIFNULL` ```sql -NULLIFZERO(expr) +ZEROIFNULL(expr) ``` **Description** diff --git a/docs/conversion_rules.md b/docs/conversion_rules.md index 8fb627a78..0abc9ebfd 100644 --- a/docs/conversion_rules.md +++ b/docs/conversion_rules.md @@ -145,7 +145,7 @@ literals and parameters can also be coerced. See STRING -BOOL
INT32
INT64
UINT32
UINT64
NUMERIC
BIGNUMERIC
FLOAT
DOUBLE
STRING
BYTES
DATE
DATETIME
TIME
TIMESTAMP
ENUM
PROTO
+BOOL
INT32
INT64
UINT32
UINT64
NUMERIC
BIGNUMERIC
FLOAT
DOUBLE
STRING
BYTES
DATE
DATETIME
TIME
TIMESTAMP
ENUM
PROTO
RANGE
  @@ -247,6 +247,15 @@ literals and parameters can also be coerced. See PROTO (with the same PROTO name) + + RANGE + + +RANGE
STRING
+ +   + + @@ -581,6 +590,11 @@ or more supertypes, including itself, which defines its set of supertypes. + + RANGE + RANGE with the same subtype. + + diff --git a/docs/data-types.md b/docs/data-types.md index 16942cf6d..f29377437 100644 --- a/docs/data-types.md +++ b/docs/data-types.md @@ -103,26 +103,17 @@ SELECT a FROM t ORDER BY a ### Groupable data types Groupable data types can generally appear in an expression following `GROUP BY`, -`DISTINCT`, and `PARTITION BY`. However, `PARTITION BY` expressions cannot -include [floating point types][floating-point-types]. All data types are -supported except for: +`DISTINCT`, and `PARTITION BY`. All data types are supported except for: + `PROTO` + `GEOGRAPHY` + `JSON` -An array type is groupable if its element type is groupable. Two arrays are in -the same group if and only if one of the following statements is true: +#### Grouping with floating point types -+ The two arrays are both `NULL`. -+ The two arrays have the same number of elements and all corresponding - elements are in the same groups. - -A struct type is groupable if its field types are groupable. Two structs -are in the same group if and only if one of the following statements is true: - -+ The two structs are both `NULL`. -+ All corresponding field values between the structs are in the same groups. +Groupable floating point types can appear in an expression following `GROUP BY` +and `DISTINCT`. `PARTITION BY` expressions cannot +include [floating point types][floating-point-types]. Special floating point values are grouped in the following way, including both grouping done by a `GROUP BY` clause and grouping done by the @@ -134,6 +125,29 @@ both grouping done by a `GROUP BY` clause and grouping done by the + 0 or -0 — All zero values are considered equal when grouping. + `+inf` +#### Grouping with arrays + +An `ARRAY` type is groupable if its element type is +groupable. 
+ +Two arrays are in the same group if and only if one of the following statements +is true: + ++ The two arrays are both `NULL`. ++ The two arrays have the same number of elements and all corresponding + elements are in the same groups. + +#### Grouping with structs + +A `STRUCT` type is groupable if its field types are +groupable. + +Two structs are in the same group if and only if one of the following statements +is true: + ++ The two structs are both `NULL`. ++ All corresponding field values between the structs are in the same groups. + ### Comparable data types Values of the same comparable data type can be compared to each other. @@ -152,6 +166,13 @@ Notes: field order. Field names are ignored. Less than and greater than comparisons are not supported. + To compare geography values, use [ST_Equals][st-equals]. ++ When comparing ranges, the lower bounds are compared. If the lower bounds are + equal, the upper bounds are compared, instead. ++ When comparing ranges, `NULL` values are handled as follows: + + `NULL` lower bounds are sorted before non-`NULL` lower bounds. + + `NULL` upper bounds are sorted after non-`NULL` upper bounds. + + If two bounds that are being compared are `NULL`, the comparison is `TRUE`. + + An `UNBOUNDED` bound is treated as a `NULL` bound. + All types that support comparisons can be used in a `JOIN` condition. See [JOIN Types][join-types] for an explanation of join conditions. @@ -546,7 +567,8 @@ has: + An integer value. Integers are used for comparison and ordering enum values. There is no requirement that these integers start at zero or that they be contiguous. -+ A string value. Strings are case sensitive. ++ A string value for its name. Strings are case sensitive. In the case of +protocol buffer open enums, this name is optional. + Optional alias values. One or more additional string values that act as aliases. 
@@ -1000,22 +1022,27 @@ INTERVAL '8 -20 17' MONTH TO HOUR For additional examples, see [Interval literals][interval-literal-range]. -#### Interval-supported datetime parts +#### Interval-supported date and time parts -You can use the following datetime parts to construct -an interval: +You can use the following date parts to construct an interval: + `YEAR`: Number of years, `Y`. + `QUARTER`: Number of quarters; each quarter is converted to `3` months, `M`. + `MONTH`: Number of months, `M`. Each `12` months is converted to `1` year. + `WEEK`: Number of weeks; Each week is converted to `7` days, `D`. + `DAY`: Number of days, `D`. + +You can use the following time parts to construct an interval: + + `HOUR`: Number of hours, `H`. + `MINUTE`: Number of minutes, `M`. Each `60` minutes is converted to `1` hour. + `SECOND`: Number of seconds, `S`. Each `60` seconds is converted to `1` minute. Can include up to nine fractional digits (nanosecond precision). ++ `MILLISECOND`: Number of milliseconds. ++ `MICROSECOND`: Number of microseconds. ++ `NANOSECOND`: Number of nanoseconds. ## JSON type @@ -1717,6 +1744,95 @@ possible workarounds: + To get a simple approximation comparison, cast protocol buffer to string. This applies lexicographical ordering for numeric fields. +## Range type + + + + + + + + + + + + + + +
NameRange
RANGE + Contiguous range between two dates, datetimes, or timestamps. + The lower and upper bound for the range are optional. The lower bound + is inclusive and the upper bound is exclusive. +
+ +### Declare a range type + +A range type can be declared as follows: + + + + + + + + + + + + + + + + + + + + + + +
Type DeclarationMeaning
RANGE<DATE>Contiguous range between two dates.
RANGE<DATETIME>Contiguous range between two datetimes.
RANGE<TIMESTAMP>Contiguous range between two timestamps.
+ +### Construct a range + + +You can construct a range with the [`RANGE` constructor][range-with-constructor] +or a [range literal][range-with-literal]. + +#### Construct a range with a constructor + + +You can construct a range with the `RANGE` constructor. To learn more, +see [`RANGE`][range-constructor]. + +#### Construct a range with a literal + + +You can construct a range with a range literal. The canonical format for a +range literal has the following parts: + +```sql +RANGE '[lower_bound, upper_bound)' +``` + ++ `T`: The type of range. This can be `DATE`, `DATETIME`, or `TIMESTAMP`. ++ `lower_bound`: The range starts from this value. This can be a + [date][date-literals], [datetime][datetime-literals], or + [timestamp][timestamp-literals] literal. If this value is `UNBOUNDED` or + `NULL`, the range does not include a lower bound. ++ `upper_bound`: The range ends before this value. This can be a + [date][date-literals], [datetime][datetime-literals], or + [timestamp][timestamp-literals] literal. If this value is `UNBOUNDED` or + `NULL`, the range does not include an upper bound. + +`T`, `lower_bound`, and `upper_bound` must be of the same data type. + +To learn more about the literal representation of a range type, +see [Range literals][range-literals]. + +### Additional details + +The range type does not support arithmetic operators. + ## String type @@ -1864,7 +1980,7 @@ types and the output type of the addition operator.
This syntax can also be used with struct comparison for comparison expressions -using multi-part keys, e.g. in a `WHERE` clause: +using multi-part keys, e.g., in a `WHERE` clause: ``` WHERE (Key1,Key2) IN ( (12,34), (56,78) ) @@ -2013,7 +2129,7 @@ an absolute point in time, use a [timestamp][timestamp-type]. ``` -[H]H:[M]M:[S]S[.DDDDDD|.F] +[H]H:[M]M:[S]S[.F] ``` + [H]H: One or two digit hour (valid values from 00 to 23). @@ -2336,5 +2452,13 @@ when there is a leap second. [json-literals]: https://github.com/google/zetasql/blob/master/docs/lexical.md#json_literals +[range-literals]: https://github.com/google/zetasql/blob/master/docs/lexical.md#range_literals + +[range-with-constructor]: #range_with_constructor + +[range-constructor]: https://github.com/google/zetasql/blob/master/docs/range-functions.md#range + +[range-with-literal]: #range_with_literal + diff --git a/docs/date_functions.md b/docs/date_functions.md index 889970e07..35992a43e 100644 --- a/docs/date_functions.md +++ b/docs/date_functions.md @@ -351,7 +351,7 @@ the output is negative. `FRIDAY`, and `SATURDAY`. + `ISOWEEK`: Uses [ISO 8601 week][ISO-8601-week] boundaries. ISO weeks begin on Monday. -+ `MONTH`, except when the first two arguments are `TIMESTAMP` objects. ++ `MONTH` + `QUARTER` + `YEAR` + `ISOYEAR`: Uses the [ISO 8601][ISO-8601] diff --git a/docs/datetime_functions.md b/docs/datetime_functions.md index 2e01e45a3..33c2e0cf1 100644 --- a/docs/datetime_functions.md +++ b/docs/datetime_functions.md @@ -294,7 +294,7 @@ between the two `DATETIME` objects would overflow an `FRIDAY`, and `SATURDAY`. + `ISOWEEK`: Uses [ISO 8601 week][ISO-8601-week] boundaries. ISO weeks begin on Monday. -+ `MONTH`, except when the first two arguments are `TIMESTAMP` objects. 
++ `MONTH` + `QUARTER` + `YEAR` + `ISOYEAR`: Uses the [ISO 8601][ISO-8601] @@ -823,6 +823,7 @@ SELECT LAST_DAY(DATETIME '2008-11-10 15:30:00', WEEK(MONDAY)) AS last_day ```sql PARSE_DATETIME(format_string, datetime_string) ``` + **Description** Converts a [string representation of a datetime][datetime-format] to a diff --git a/docs/debugging_functions.md b/docs/debugging_functions.md index b22f0c30d..251170e13 100644 --- a/docs/debugging_functions.md +++ b/docs/debugging_functions.md @@ -67,9 +67,16 @@ ERROR(error_message) **Description** -Returns an error. The `error_message` argument is a `STRING`. +Returns an error. -ZetaSQL treats `ERROR` in the same way as any expression that may +**Definitions** + ++ `error_message`: A `STRING` value that represents the error message to + produce. + +**Details** + +`ERROR` is treated like any other expression that may result in an error: there is no special guarantee of evaluation order. **Return Data Type** @@ -367,6 +374,7 @@ SELECT ISERROR((SELECT e FROM UNNEST([1, 2]) AS e)) AS is_error ```sql NULLIFERROR(try_expression) ``` + **Description** Evaluates `try_expression`. diff --git a/docs/differential-privacy.md b/docs/differential-privacy.md index c7a3a875e..530a66554 100644 --- a/docs/differential-privacy.md +++ b/docs/differential-privacy.md @@ -155,17 +155,17 @@ table called `students`. ```sql SELECT WITH DIFFERENTIAL_PRIVACY - OPTIONS (epsilon=1.09, delta=1e-5, privacy_unit_column=id) + OPTIONS (epsilon=10, delta=.01, privacy_unit_column=id) item, - COUNT(*) + COUNT(*, contribution_bounds_per_group=>(0, 100)) FROM students; ``` ```sql SELECT WITH DIFFERENTIAL_PRIVACY - OPTIONS (epsilon=1.09, delta=1e-5, privacy_unit_column=members.id) + OPTIONS (epsilon=10, delta=.01, privacy_unit_column=members.id) item, - COUNT(*) + COUNT(*, contribution_bounds_per_group=>(0, 100)) FROM (SELECT * FROM students) AS members; ``` @@ -178,9 +178,10 @@ table expressions][dp-from-rules]. 
```sql {.bad} -- This produces an error -SELECT - WITH ANONYMIZATION OPTIONS(epsilon=10, delta=.01, max_groups_contributed=2) - item, ANON_AVG(quantity CLAMPED BETWEEN 0 AND 100) average_quantity +SELECT WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=2) + item, + AVG(quantity, contribution_bounds_per_group => (0,100)) average_quantity FROM {{USERNAME}}.view_on_professors, {{USERNAME}}.view_on_students GROUP BY item; ``` @@ -362,7 +363,7 @@ individual's ID, you could expose sensitive data. [qs-remove-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise -[qs-limit-groups]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#limit_groups +[qs-limit-groups]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#limit_groups_for_privacy_unit [data-types-groupable]: https://github.com/google/zetasql/blob/master/docs/data-types.md#groupable_data_types diff --git a/docs/functions-and-operators.md b/docs/functions-and-operators.md index 8a46418ad..3541c7c8d 100644 --- a/docs/functions-and-operators.md +++ b/docs/functions-and-operators.md @@ -33,7 +33,7 @@ Common conventions: ### Operator precedence The following table lists all ZetaSQL operators from highest to -lowest precedence, i.e. the order in which they will be evaluated within a +lowest precedence, i.e., the order in which they will be evaluated within a statement. @@ -1477,7 +1477,7 @@ This operator throws an error if Y is negative.X: Integer or BYTES
Y: INT64 - - +### `APPROX_COUNT_DISTINCT` - - - - +**Description** - - - - +This function is less accurate than `COUNT(DISTINCT expression)`, but performs +better on huge input. - - - - +Any data type **except**: - - - - +**Returned Data Types** - -
Shifts the first operand X to the right. This operator does not -do sign bit extension with a signed type (i.e. it fills vacant bits on the left +do sign bit extension with a signed type (i.e., it fills vacant bits on the left with 0). This operator returns 0 or a byte sequence of b'\x00' @@ -1576,187 +1576,190 @@ SELECT entry FROM entry_table WHERE entry IS NULL ### Comparison operators -Comparisons always return `BOOL`. Comparisons generally -require both operands to be of the same type. If operands are of different -types, and if ZetaSQL can convert the values of those types to a -common type without loss of precision, ZetaSQL will generally coerce -them to that common type for the comparison; ZetaSQL will generally -coerce literals to the type of non-literals, where -present. Comparable data types are defined in -[Data Types][operators-link-to-data-types]. -NOTE: ZetaSQL allows comparisons -between signed and unsigned integers. - -Structs support only these comparison operators: equal -(`=`), not equal (`!=` and `<>`), and `IN`. - -The comparison operators in this section cannot be used to compare -`JSON` ZetaSQL literals with other `JSON` ZetaSQL literals. -If you need to compare values inside of `JSON`, convert the values to -SQL values first. For more information, see [`JSON` functions][json-functions]. - -The following rules apply when comparing these data types: - -+ Floating point: - All comparisons with `NaN` return `FALSE`, - except for `!=` and `<>`, which return `TRUE`. -+ `BOOL`: `FALSE` is less than `TRUE`. -+ `STRING`: Strings are - compared codepoint-by-codepoint, which means that canonically equivalent - strings are only guaranteed to compare as equal if - they have been normalized first. -+ `NULL`: The convention holds here: any operation with a `NULL` input returns - `NULL`. +Compares operands and produces the results of the comparison as a `BOOL` +value. 
These comparison operators are available: - - - - - - - - - - - - + + + + + + + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - + for details. + + +
NameSyntaxDescription
Less ThanX < Y - Returns TRUE if X is less than Y. - +
NameSyntaxDescription
Less ThanX < Y + Returns TRUE if X is less than Y. + This operator supports specifying collation. -
Less Than or Equal ToX <= Y - Returns TRUE if X is less than or equal to - Y. - +
Less Than or Equal ToX <= Y + Returns TRUE if X is less than or equal to + Y. + This operator supports specifying collation. -
Greater ThanX > Y - Returns TRUE if X is greater than Y. - +
Greater ThanX > Y + Returns TRUE if X is greater than + Y. + This operator supports specifying collation. -
Greater Than or Equal ToX >= Y - Returns TRUE if X is greater than or equal to - Y. - +
Greater Than or Equal ToX >= Y + Returns TRUE if X is greater than or equal to + Y. + This operator supports specifying collation. -
EqualX = Y - Returns TRUE if X is equal to Y. - +
EqualX = Y + Returns TRUE if X is equal to Y. + This operator supports specifying collation. -
Not EqualX != Y
X <> Y
- Returns TRUE if X is not equal to Y. - +
Not EqualX != Y
X <> Y
+ Returns TRUE if X is not equal to + Y. + This operator supports specifying collation. -
BETWEENX [NOT] BETWEEN Y AND Z -

- Returns TRUE if X is [not] within the range - specified. The result of X BETWEEN Y AND Z is equivalent to - Y <= X AND X <= Z but X is evaluated only - once in the former. - +

BETWEENX [NOT] BETWEEN Y AND Z +

+ Returns TRUE if X is [not] within the range + specified. The result of X BETWEEN Y AND Z is equivalent + to Y <= X AND X <= Z but X is + evaluated only once in the former. + This operator supports specifying collation. -

-
LIKEX [NOT] LIKE Y - See the `LIKE` operator +

+
LIKEX [NOT] LIKE Y + See the `LIKE` operator - for details. -
INMultiple - See the `IN` operator + for details. +
INMultiple + See the `IN` operator - for details. -
-When testing values that have a struct data type for -equality, it's possible that one or more fields are `NULL`. In such cases: +The following rules apply to operands in a comparison operator: + ++ The operands must be [comparable][data-type-comparable]. ++ A comparison operator generally requires both operands to be of the + same type. ++ If the operands are of different types, and the values of those types can be + converted to a common type without loss of precision, + they are generally coerced to that common type for the comparison. ++ A literal operand is generally coerced to the same data type of a + non-literal operand that is part of the comparison. ++ Comparisons between operands that are signed and unsigned integers is + allowed. ++ Struct operands support only these comparison operators: equal + (`=`), not equal (`!=` and `<>`), and `IN`. -+ If all non-`NULL` field values are equal, the comparison returns `NULL`. -+ If any non-`NULL` field values are not equal, the comparison returns `FALSE`. - -The following table demonstrates how struct data -types are compared when they have fields that are `NULL` valued. +The following rules apply when comparing these data types: - - - - - - - - - - - - - - - - - - - - - - - - - -
Struct1Struct2Struct1 = Struct2
STRUCT(1, NULL)STRUCT(1, NULL)NULL
STRUCT(1, NULL)STRUCT(2, NULL)FALSE
STRUCT(1,2)STRUCT(1, NULL)NULL
++ Floating point: + All comparisons with `NaN` return `FALSE`, + except for `!=` and `<>`, which return `TRUE`. ++ `BOOL`: `FALSE` is less than `TRUE`. ++ `STRING`: Strings are compared codepoint-by-codepoint, which means that + canonically equivalent strings are only guaranteed to compare as equal if + they have been normalized first. ++ `JSON`: You can't compare JSON, but you can compare + the values inside of JSON if you convert the values to + SQL values first. For more information, see + [`JSON` functions][json-functions]. ++ `NULL`: Any operation with a `NULL` input returns `NULL`. ++ `STRUCT`: When testing a struct for equality, it's possible that one or more + fields are `NULL`. In such cases: + + + If all non-`NULL` field values are equal, the comparison returns `NULL`. + + If any non-`NULL` field values are not equal, the comparison returns + `FALSE`. + + The following table demonstrates how `STRUCT` data types are compared when + they have fields that are `NULL` valued. + + + + + + + + + + + + + + + + + + + + + + + + + + +
Struct1Struct2Struct1 = Struct2
STRUCT(1, NULL)STRUCT(1, NULL)NULL
STRUCT(1, NULL)STRUCT(2, NULL)FALSE
STRUCT(1,2)STRUCT(1, NULL)NULL
### `EXISTS` operator @@ -3015,6 +3018,8 @@ FROM UNNEST([ [operators-link-to-data-types]: https://github.com/google/zetasql/blob/master/docs/data-types.md +[data-type-comparable]: https://github.com/google/zetasql/blob/master/docs/data-types.md#comparable_data_types + [operators-link-to-struct-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#struct_type [operators-link-to-from-clause]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#from_clause @@ -3389,7 +3394,7 @@ SELECT NULLIFZERO(0) AS result ### `ZEROIFNULL` ```sql -NULLIFZERO(expr) +ZEROIFNULL(expr) ``` **Description** @@ -4482,10 +4487,10 @@ window_specification: Returns the count of `TRUE` values for `expression`. Returns `0` if there are zero input rows, or if `expression` evaluates to `FALSE` or `NULL` for all rows. -Since `expression` must be a `BOOL`, the form -`COUNTIF(DISTINCT ...)` is generally not useful: there is only one distinct -value of `TRUE`. So `COUNTIF(DISTINCT ...)` will return 1 if `expression` -evaluates to `TRUE` for one or more input rows, or 0 otherwise. +Since `expression` must be a `BOOL`, the form `COUNTIF(DISTINCT ...)` is +generally not useful: there is only one distinct value of `TRUE`. So +`COUNTIF(DISTINCT ...)` will return 1 if `expression` evaluates to `TRUE` for +one or more input rows, or 0 otherwise. Usually when someone wants to combine `COUNTIF` and `DISTINCT`, they want to count the number of distinct values of an expression for which a certain condition is satisfied. One recipe to achieve this is the following: @@ -5305,12 +5310,28 @@ FROM UNNEST([]) AS x; [agg-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md -## Statistical aggregate functions +## Approximate aggregate functions -ZetaSQL supports statistical aggregate functions. +ZetaSQL supports approximate aggregate functions. 
To learn about the syntax for aggregate function calls, see [Aggregate function calls][agg-function-calls]. +Approximate aggregate functions are scalable in terms of memory usage and time, +but produce approximate results instead of exact results. These functions +typically require less memory than [exact aggregation functions][aggregate-functions-reference] +like `COUNT(DISTINCT ...)`, but also introduce statistical uncertainty. +This makes approximate aggregation appropriate for large data streams for +which linear memory usage is impractical, as well as for data that is +already approximate. + +The approximate aggregate functions in this section work directly on the +input data, rather than an intermediate estimation of the data. These functions +_do not allow_ users to specify the precision for the estimation with +sketches. If you would like to specify precision with sketches, see: + ++ [HyperLogLog++ functions][hll-functions] to estimate cardinality. ++ [KLL functions][kll-functions] to estimate quantile values. + ### Function list @@ -5323,133 +5344,107 @@ To learn about the syntax for aggregate function calls, see - - - - - - +
CORR + APPROX_COUNT_DISTINCT - Computes the Pearson coefficient of correlation of a set of number pairs. + Gets the approximate result for COUNT(DISTINCT expression).
COVAR_POP + APPROX_QUANTILES - Computes the population covariance of a set of number pairs. + Gets the approximate quantile boundaries.
COVAR_SAMP + APPROX_TOP_COUNT - Computes the sample covariance of a set of number pairs. + Gets the approximate top elements and their approximate count.
STDDEV + APPROX_TOP_SUM - An alias of the STDDEV_SAMP function. + Gets the approximate top elements and sum, based on the approximate sum + of an assigned weight.
STDDEV_POP +
-
- Computes the population (biased) standard deviation of the values. -
+### `APPROX_COUNT_DISTINCT`
+
STDDEV_SAMP +```sql +APPROX_COUNT_DISTINCT( + expression +) +``` - - Computes the sample (unbiased) standard deviation of the values. -
+**Description**
+
VAR_POP +Returns the approximate result for `COUNT(DISTINCT expression)`. The value +returned is a statistical estimate, not necessarily the actual value. - - Computes the population (biased) variance of the values. -
VAR_SAMP +**Supported Argument Types** - - Computes the sample (unbiased) variance of the values. -
VARIANCE +Any data type **except**: + ++ `ARRAY` ++ `STRUCT` ++ `PROTO` - - An alias of VAR_SAMP. -
+**Returned Data Types**
+
+`INT64` -### `CORR` +**Examples** ```sql -CORR( - X1, X2 - [ HAVING { MAX | MIN } expression2 ] -) -[ OVER over_clause ] +SELECT APPROX_COUNT_DISTINCT(x) as approx_distinct +FROM UNNEST([0, 1, 1, 2, 3, 5]) as x; -over_clause: - { named_window | ( [ window_specification ] ) } +/*-----------------* + | approx_distinct | + +-----------------+ + | 5 | + *-----------------*/ +``` -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] - [ window_frame_clause ] +### `APPROX_QUANTILES` +```sql +APPROX_QUANTILES( + [ DISTINCT ] + expression, number + [ { IGNORE | RESPECT } NULLS ] + [ HAVING { MAX | MIN } expression2 ] +) ``` **Description** -Returns the [Pearson coefficient][stat-agg-link-to-pearson-coefficient] -of correlation of a set of number pairs. For each number pair, the first number -is the dependent variable and the second number is the independent variable. -The return result is between `-1` and `1`. A result of `0` indicates no -correlation. - -All numeric types are supported. If the -input is `NUMERIC` or `BIGNUMERIC` then the internal aggregation is -stable with the final output converted to a `DOUBLE`. -Otherwise the input is converted to a `DOUBLE` -before aggregation, resulting in a potentially unstable result. - -This function ignores any input pairs that contain one or more `NULL` values. If -there are fewer than two input pairs without `NULL` values, this function -returns `NULL`. - -`NaN` is produced if: +Returns the approximate boundaries for a group of `expression` values, where +`number` represents the number of quantiles to create. This function returns an +array of `number` + 1 elements, sorted in ascending order, where the +first element is the approximate minimum and the last element is the approximate +maximum. -+ Any input value is `NaN` -+ Any input value is positive infinity or negative infinity. -+ The variance of `X1` or `X2` is `0`. 
-+ The covariance of `X1` and `X2` is `0`. +Returns `NULL` if there are zero input rows or `expression` evaluates to +`NULL` for all rows. To learn more about the optional aggregate clauses that you can pass into this function, see @@ -5461,153 +5456,95 @@ into this function, see -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. - - +**Supported Argument Types** -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md ++ `expression`: Any supported data type **except**: - + + `ARRAY` + + `STRUCT` + + `PROTO` ++ `number`: `INT64` literal or query parameter. -**Return Data Type** +**Returned Data Types** -`DOUBLE` +`ARRAY` where `T` is the type specified by `expression`. **Examples** ```sql -SELECT CORR(y, x) AS results -FROM - UNNEST( - [ - STRUCT(1.0 AS y, 5.0 AS x), - (3.0, 9.0), - (4.0, 7.0)]); - -/*--------------------* - | results | - +--------------------+ - | 0.6546536707079772 | - *--------------------*/ -``` - -```sql -SELECT CORR(y, x) AS results -FROM - UNNEST( - [ - STRUCT(1.0 AS y, 5.0 AS x), - (3.0, 9.0), - (4.0, NULL)]); +SELECT APPROX_QUANTILES(x, 2) AS approx_quantiles +FROM UNNEST([1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x; -/*---------* - | results | - +---------+ - | 1 | - *---------*/ +/*------------------* + | approx_quantiles | + +------------------+ + | [1, 5, 10] | + *------------------*/ ``` ```sql -SELECT CORR(y, x) AS results -FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, 3.0)]) +SELECT APPROX_QUANTILES(x, 100)[OFFSET(90)] AS percentile_90 +FROM UNNEST([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) AS x; -/*---------* - | results | - +---------+ - | NULL | - *---------*/ +/*---------------* + | percentile_90 | + +---------------+ + | 9 | + *---------------*/ ``` ```sql -SELECT CORR(y, x) AS results -FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, NULL)]) +SELECT APPROX_QUANTILES(DISTINCT x, 2) AS approx_quantiles +FROM UNNEST([1, 1, 1, 4, 5, 6, 7, 8, 9, 
10]) AS x; -/*---------* - | results | - +---------+ - | NULL | - *---------*/ +/*------------------* + | approx_quantiles | + +------------------+ + | [1, 6, 10] | + *------------------*/ ``` ```sql -SELECT CORR(y, x) AS results -FROM - UNNEST( - [ - STRUCT(1.0 AS y, 5.0 AS x), - (3.0, 9.0), - (4.0, 7.0), - (5.0, 1.0), - (7.0, CAST('Infinity' as DOUBLE))]) +SELECT APPROX_QUANTILES(x, 2 RESPECT NULLS) AS approx_quantiles +FROM UNNEST([NULL, NULL, 1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x; -/*---------* - | results | - +---------+ - | NaN | - *---------*/ +/*------------------* + | approx_quantiles | + +------------------+ + | [NULL, 4, 10] | + *------------------*/ ``` ```sql -SELECT CORR(x, y) AS results -FROM - ( - SELECT 0 AS x, 0 AS y - UNION ALL - SELECT 0 AS x, 0 AS y - ) +SELECT APPROX_QUANTILES(DISTINCT x, 2 RESPECT NULLS) AS approx_quantiles +FROM UNNEST([NULL, NULL, 1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x; -/*---------* - | results | - +---------+ - | NaN | - *---------*/ +/*------------------* + | approx_quantiles | + +------------------+ + | [NULL, 6, 10] | + *------------------*/ ``` -[stat-agg-link-to-pearson-coefficient]: https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient - -### `COVAR_POP` +### `APPROX_TOP_COUNT` ```sql -COVAR_POP( - X1, X2 +APPROX_TOP_COUNT( + expression, number [ HAVING { MAX | MIN } expression2 ] ) -[ OVER over_clause ] - -over_clause: - { named_window | ( [ window_specification ] ) } - -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] - [ window_frame_clause ] - ``` **Description** -Returns the population [covariance][stat-agg-link-to-covariance] of -a set of number pairs. The first number is the dependent variable; the second -number is the independent variable. The return result is between `-Inf` and -`+Inf`. - -All numeric types are supported. 
If the -input is `NUMERIC` or `BIGNUMERIC` then the internal aggregation is -stable with the final output converted to a `DOUBLE`. -Otherwise the input is converted to a `DOUBLE` -before aggregation, resulting in a potentially unstable result. - -This function ignores any input pairs that contain one or more `NULL` values. If -there is no input pair without `NULL` values, this function returns `NULL`. -If there is exactly one input pair without `NULL` values, this function returns -`0`. +Returns the approximate top elements of `expression` as an array of `STRUCT`s. +The `number` parameter specifies the number of elements returned. -`NaN` is produced if: +Each `STRUCT` contains two fields. The first field (named `value`) contains an +input value. The second field (named `count`) contains an `INT64` specifying the +number of times the value was returned. -+ Any input value is `NaN` -+ Any input value is positive infinity or negative infinity. +Returns `NULL` if there are zero input rows. To learn more about the optional aggregate clauses that you can pass into this function, see @@ -5619,140 +5556,67 @@ into this function, see -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. - - - -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md +**Supported Argument Types** - ++ `expression`: Any data type that the `GROUP BY` clause supports. ++ `number`: `INT64` literal or query parameter. 
-**Return Data Type** +**Returned Data Types** -`DOUBLE` +`ARRAY` **Examples** ```sql -SELECT COVAR_POP(y, x) AS results -FROM - UNNEST( - [ - STRUCT(1.0 AS y, 1.0 AS x), - (2.0, 6.0), - (9.0, 3.0), - (2.0, 6.0), - (9.0, 3.0)]) +SELECT APPROX_TOP_COUNT(x, 2) as approx_top_count +FROM UNNEST(["apple", "apple", "pear", "pear", "pear", "banana"]) as x; -/*---------------------* - | results | - +---------------------+ - | -1.6800000000000002 | - *---------------------*/ +/*-------------------------* + | approx_top_count | + +-------------------------+ + | [{pear, 3}, {apple, 2}] | + *-------------------------*/ ``` -```sql -SELECT COVAR_POP(y, x) AS results -FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, 3.0)]) +**NULL handling** -/*---------* - | results | - +---------+ - | 0 | - *---------*/ -``` +`APPROX_TOP_COUNT` does not ignore `NULL`s in the input. For example: ```sql -SELECT COVAR_POP(y, x) AS results -FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, NULL)]) +SELECT APPROX_TOP_COUNT(x, 2) as approx_top_count +FROM UNNEST([NULL, "pear", "pear", "pear", "apple", NULL]) as x; -/*---------* - | results | - +---------+ - | NULL | - *---------*/ +/*------------------------* + | approx_top_count | + +------------------------+ + | [{pear, 3}, {NULL, 2}] | + *------------------------*/ ``` -```sql -SELECT COVAR_POP(y, x) AS results -FROM - UNNEST( - [ - STRUCT(1.0 AS y, 1.0 AS x), - (2.0, 6.0), - (9.0, 3.0), - (2.0, 6.0), - (NULL, 3.0)]) - -/*---------* - | results | - +---------+ - | -1 | - *---------*/ -``` +### `APPROX_TOP_SUM` ```sql -SELECT COVAR_POP(y, x) AS results -FROM - UNNEST( - [ - STRUCT(1.0 AS y, 1.0 AS x), - (2.0, 6.0), - (9.0, 3.0), - (2.0, 6.0), - (CAST('Infinity' as DOUBLE), 3.0)]) - -/*---------* - | results | - +---------+ - | NaN | - *---------*/ -``` - -[stat-agg-link-to-covariance]: https://en.wikipedia.org/wiki/Covariance - -### `COVAR_SAMP` - -```sql -COVAR_SAMP( - X1, X2 +APPROX_TOP_SUM( + expression, weight, number [ HAVING { MAX | MIN } 
expression2 ] ) -[ OVER over_clause ] - -over_clause: - { named_window | ( [ window_specification ] ) } - -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] - [ window_frame_clause ] - ``` **Description** -Returns the sample [covariance][stat-agg-link-to-covariance] of a -set of number pairs. The first number is the dependent variable; the second -number is the independent variable. The return result is between `-Inf` and -`+Inf`. - -All numeric types are supported. If the -input is `NUMERIC` or `BIGNUMERIC` then the internal aggregation is -stable with the final output converted to a `DOUBLE`. -Otherwise the input is converted to a `DOUBLE` -before aggregation, resulting in a potentially unstable result. +Returns the approximate top elements of `expression`, based on the sum of an +assigned `weight`. The `number` parameter specifies the number of elements +returned. -This function ignores any input pairs that contain one or more `NULL` values. If -there are fewer than two input pairs without `NULL` values, this function -returns `NULL`. +If the `weight` input is negative or `NaN`, this function returns an error. -`NaN` is produced if: +The elements are returned as an array of `STRUCT`s. +Each `STRUCT` contains two fields: `value` and `sum`. +The `value` field contains the value of the input expression. The `sum` field is +the same type as `weight`, and is the approximate sum of the input weight +associated with the `value` field. -+ Any input value is `NaN` -+ Any input value is positive infinity or negative infinity. +Returns `NULL` if there are zero input rows. To learn more about the optional aggregate clauses that you can pass into this function, see @@ -5764,963 +5628,851 @@ into this function, see -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. 
- - +**Supported Argument Types** -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md ++ `expression`: Any data type that the `GROUP BY` clause supports. ++ `weight`: One of the following: - + + `INT64` + + `UINT64` + + `NUMERIC` + + `BIGNUMERIC` + + `DOUBLE` ++ `number`: `INT64` literal or query parameter. -**Return Data Type** +**Returned Data Types** -`DOUBLE` +`ARRAY` **Examples** ```sql -SELECT COVAR_SAMP(y, x) AS results -FROM - UNNEST( - [ - STRUCT(1.0 AS y, 1.0 AS x), - (2.0, 6.0), - (9.0, 3.0), - (2.0, 6.0), - (9.0, 3.0)]) +SELECT APPROX_TOP_SUM(x, weight, 2) AS approx_top_sum FROM +UNNEST([ + STRUCT("apple" AS x, 3 AS weight), + ("pear", 2), + ("apple", 0), + ("banana", 5), + ("pear", 4) +]); -/*---------* - | results | - +---------+ - | -2.1 | - *---------*/ +/*--------------------------* + | approx_top_sum | + +--------------------------+ + | [{pear, 6}, {banana, 5}] | + *--------------------------*/ ``` -```sql -SELECT COVAR_SAMP(y, x) AS results -FROM - UNNEST( - [ - STRUCT(1.0 AS y, 1.0 AS x), - (2.0, 6.0), - (9.0, 3.0), - (2.0, 6.0), - (NULL, 3.0)]) +**NULL handling** -/*----------------------* - | results | - +----------------------+ - | --1.3333333333333333 | - *----------------------*/ -``` +`APPROX_TOP_SUM` does not ignore `NULL` values for the `expression` and `weight` +parameters. 
```sql -SELECT COVAR_SAMP(y, x) AS results -FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, 3.0)]) +SELECT APPROX_TOP_SUM(x, weight, 2) AS approx_top_sum FROM +UNNEST([STRUCT("apple" AS x, NULL AS weight), ("pear", 0), ("pear", NULL)]); -/*---------* - | results | - +---------+ - | NULL | - *---------*/ +/*----------------------------* + | approx_top_sum | + +----------------------------+ + | [{pear, 0}, {apple, NULL}] | + *----------------------------*/ ``` ```sql -SELECT COVAR_SAMP(y, x) AS results -FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, NULL)]) +SELECT APPROX_TOP_SUM(x, weight, 2) AS approx_top_sum FROM +UNNEST([STRUCT("apple" AS x, 0 AS weight), (NULL, 2)]); -/*---------* - | results | - +---------+ - | NULL | - *---------*/ +/*-------------------------* + | approx_top_sum | + +-------------------------+ + | [{NULL, 2}, {apple, 0}] | + *-------------------------*/ ``` ```sql -SELECT COVAR_SAMP(y, x) AS results -FROM - UNNEST( - [ - STRUCT(1.0 AS y, 1.0 AS x), - (2.0, 6.0), - (9.0, 3.0), - (2.0, 6.0), - (CAST('Infinity' as DOUBLE), 3.0)]) +SELECT APPROX_TOP_SUM(x, weight, 2) AS approx_top_sum FROM +UNNEST([STRUCT("apple" AS x, 0 AS weight), (NULL, NULL)]); -/*---------* - | results | - +---------+ - | NaN | - *---------*/ +/*----------------------------* + | approx_top_sum | + +----------------------------+ + | [{apple, 0}, {NULL, NULL}] | + *----------------------------*/ ``` -[stat-agg-link-to-covariance]: https://en.wikipedia.org/wiki/Covariance - -### `STDDEV` - -```sql -STDDEV( - [ DISTINCT ] - expression - [ HAVING { MAX | MIN } expression2 ] -) -[ OVER over_clause ] +[hll-functions]: #hyperloglog_functions -over_clause: - { named_window | ( [ window_specification ] ) } +[kll-functions]: #kll_quantile_functions -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - [ ORDER BY expression [ { ASC | DESC } ] [, ...] 
] - [ window_frame_clause ] +[aggregate-functions-reference]: #aggregate_functions -``` +[agg-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md -**Description** +## Array functions -An alias of [STDDEV_SAMP][stat-agg-link-to-stddev-samp]. +ZetaSQL supports the following array functions. -[stat-agg-link-to-stddev-samp]: #stddev_samp +### Function list -### `STDDEV_POP` + + + + + + + + -```sql -STDDEV_POP( - [ DISTINCT ] - expression - [ HAVING { MAX | MIN } expression2 ] -) -[ OVER over_clause ] + + + + -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] - [ window_frame_clause ] + + + + -**Description** + + + + -All numeric types are supported. If the -input is `NUMERIC` or `BIGNUMERIC` then the internal aggregation is -stable with the final output converted to a `DOUBLE`. -Otherwise the input is converted to a `DOUBLE` -before aggregation, resulting in a potentially unstable result. + + + + -`NaN` is produced if: + + + + -To learn more about the optional aggregate clauses that you can pass -into this function, see -[Aggregate function calls][aggregate-function-calls]. + + + + -[aggregate-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md + + + + -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. + + + + -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md + + + + -`STDDEV_POP` can be used with differential privacy. To learn more, see -[Differentially private aggregate functions][dp-functions]. 
+ + + + -`DOUBLE` + + + + -```sql -SELECT STDDEV_POP(x) AS results FROM UNNEST([10, 14, 18]) AS x + + + + -```sql -SELECT STDDEV_POP(x) AS results FROM UNNEST([10, 14, NULL]) AS x + + + + -```sql -SELECT STDDEV_POP(x) AS results FROM UNNEST([10, NULL]) AS x + + + + -```sql -SELECT STDDEV_POP(x) AS results FROM UNNEST([NULL]) AS x + + + + -```sql -SELECT STDDEV_POP(x) AS results FROM UNNEST([10, 14, CAST('Infinity' as DOUBLE)]) AS x + + + + -[dp-functions]: #aggregate-dp-functions + + + + -```sql -STDDEV_SAMP( - [ DISTINCT ] - expression - [ HAVING { MAX | MIN } expression2 ] -) -[ OVER over_clause ] + + + + -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] - [ window_frame_clause ] + + + + -**Description** + + + + -All numeric types are supported. If the -input is `NUMERIC` or `BIGNUMERIC` then the internal aggregation is -stable with the final output converted to a `DOUBLE`. -Otherwise the input is converted to a `DOUBLE` -before aggregation, resulting in a potentially unstable result. + + + + -`NaN` is produced if: + + + + -To learn more about the optional aggregate clauses that you can pass -into this function, see -[Aggregate function calls][aggregate-function-calls]. + + + + -[aggregate-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md + +
NameSummary
ARRAY -over_clause: - { named_window | ( [ window_specification ] ) } + + Produces an array with one element for each row in a subquery. +
ARRAY_AVG -``` + + Gets the average of non-NULL values in an array. +
ARRAY_CONCAT -Returns the population (biased) standard deviation of the values. The return -result is between `0` and `+Inf`. + + Concatenates one or more arrays with the same element type into a + single array. +
ARRAY_FILTER -This function ignores any `NULL` inputs. If all inputs are ignored, this -function returns `NULL`. If this function receives a single non-`NULL` input, -it returns `0`. + + Takes an array, filters out unwanted elements, and returns the results + in a new array. +
ARRAY_FIRST -+ Any input value is `NaN` -+ Any input value is positive infinity or negative infinity. + + Gets the first element in an array. +
ARRAY_INCLUDES - + + Checks if there is an element in the array that is + equal to a search value. +
ARRAY_INCLUDES_ALL - + + Checks if all search values are in an array. +
ARRAY_INCLUDES_ANY - + + Checks if any search values are in an array. +
ARRAY_IS_DISTINCT - + + Checks if an array contains no repeated elements. +
ARRAY_LAST -**Return Data Type** + + Gets the last element in an array. +
ARRAY_LENGTH -**Examples** + + Gets the number of elements in an array. +
ARRAY_MAX -/*-------------------* - | results | - +-------------------+ - | 3.265986323710904 | - *-------------------*/ -``` + + Gets the maximum non-NULL value in an array. +
ARRAY_MIN -/*---------* - | results | - +---------+ - | 2 | - *---------*/ -``` + + Gets the minimum non-NULL value in an array. +
ARRAY_REVERSE -/*---------* - | results | - +---------+ - | 0 | - *---------*/ -``` + + Reverses the order of elements in an array. +
ARRAY_SLICE -/*---------* - | results | - +---------+ - | NULL | - *---------*/ -``` + + Produces an array containing zero or more consecutive elements from an + input array. +
ARRAY_SUM -/*---------* - | results | - +---------+ - | NaN | - *---------*/ -``` + + Gets the sum of non-NULL values in an array. +
ARRAY_TO_STRING -### `STDDEV_SAMP` + + Produces a concatenation of the elements in an array as a + STRING value. +
ARRAY_TRANSFORM -over_clause: - { named_window | ( [ window_specification ] ) } + + Transforms the elements of an array, and returns the results in a new + array. +
ARRAY_ZIP -``` + + Combines elements from two to four arrays into one array. +
FLATTEN -Returns the sample (unbiased) standard deviation of the values. The return -result is between `0` and `+Inf`. + + Flattens arrays of nested data to create a single flat array. +
GENERATE_ARRAY -This function ignores any `NULL` inputs. If there are fewer than two non-`NULL` -inputs, this function returns `NULL`. + + Generates an array of values in a range. +
GENERATE_DATE_ARRAY -+ Any input value is `NaN` -+ Any input value is positive infinity or negative infinity. + + Generates an array of dates in a range. +
GENERATE_TIMESTAMP_ARRAY - + + Generates an array of timestamps in a range. +
- +### `ARRAY` -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. +```sql +ARRAY(subquery) +``` - +**Description** -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md +The `ARRAY` function returns an `ARRAY` with one element for each row in a +[subquery][subqueries]. - - -**Return Data Type** +If `subquery` produces a +[SQL table][datamodel-sql-tables], +the table must have exactly one column. Each element in the output `ARRAY` is +the value of the single column of a row in the table. -`DOUBLE` +If `subquery` produces a +[value table][datamodel-value-tables], +then each element in the output `ARRAY` is the entire corresponding row of the +value table. -**Examples** +**Constraints** -```sql -SELECT STDDEV_SAMP(x) AS results FROM UNNEST([10, 14, 18]) AS x ++ Subqueries are unordered, so the elements of the output `ARRAY` are not +guaranteed to preserve any order in the source table for the subquery. However, +if the subquery includes an `ORDER BY` clause, the `ARRAY` function will return +an `ARRAY` that honors that clause. ++ If the subquery returns more than one column, the `ARRAY` function returns an +error. ++ If the subquery returns an `ARRAY` typed column or `ARRAY` typed rows, the + `ARRAY` function returns an error that ZetaSQL does not support + `ARRAY`s with elements of type + [`ARRAY`][array-data-type]. ++ If the subquery returns zero rows, the `ARRAY` function returns an empty +`ARRAY`. It never returns a `NULL` `ARRAY`. 
-/*---------* - | results | - +---------+ - | 4 | - *---------*/ -``` +**Return type** -```sql -SELECT STDDEV_SAMP(x) AS results FROM UNNEST([10, 14, NULL]) AS x +`ARRAY` -/*--------------------* - | results | - +--------------------+ - | 2.8284271247461903 | - *--------------------*/ -``` +**Examples** ```sql -SELECT STDDEV_SAMP(x) AS results FROM UNNEST([10, NULL]) AS x +SELECT ARRAY + (SELECT 1 UNION ALL + SELECT 2 UNION ALL + SELECT 3) AS new_array; -/*---------* - | results | - +---------+ - | NULL | - *---------*/ +/*-----------* + | new_array | + +-----------+ + | [1, 2, 3] | + *-----------*/ ``` +To construct an `ARRAY` from a subquery that contains multiple +columns, change the subquery to use `SELECT AS STRUCT`. Now +the `ARRAY` function will return an `ARRAY` of `STRUCT`s. The `ARRAY` will +contain one `STRUCT` for each row in the subquery, and each of these `STRUCT`s +will contain a field for each column in that row. + ```sql -SELECT STDDEV_SAMP(x) AS results FROM UNNEST([NULL]) AS x +SELECT + ARRAY + (SELECT AS STRUCT 1, 2, 3 + UNION ALL SELECT AS STRUCT 4, 5, 6) AS new_array; -/*---------* - | results | - +---------+ - | NULL | - *---------*/ +/*------------------------* + | new_array | + +------------------------+ + | [{1, 2, 3}, {4, 5, 6}] | + *------------------------*/ ``` +Similarly, to construct an `ARRAY` from a subquery that contains +one or more `ARRAY`s, change the subquery to use `SELECT AS STRUCT`. 
+ ```sql -SELECT STDDEV_SAMP(x) AS results FROM UNNEST([10, 14, CAST('Infinity' as DOUBLE)]) AS x +SELECT ARRAY + (SELECT AS STRUCT [1, 2, 3] UNION ALL + SELECT AS STRUCT [4, 5, 6]) AS new_array; -/*---------* - | results | - +---------+ - | NaN | - *---------*/ +/*----------------------------* + | new_array | + +----------------------------+ + | [{[1, 2, 3]}, {[4, 5, 6]}] | + *----------------------------*/ ``` -### `VAR_POP` +[subqueries]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#subqueries -```sql -VAR_POP( - [ DISTINCT ] - expression - [ HAVING { MAX | MIN } expression2 ] -) -[ OVER over_clause ] +[datamodel-sql-tables]: https://github.com/google/zetasql/blob/master/docs/data-model.md#standard_sql_tables -over_clause: - { named_window | ( [ window_specification ] ) } +[datamodel-value-tables]: https://github.com/google/zetasql/blob/master/docs/data-model.md#value_tables -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] - [ window_frame_clause ] +[array-data-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#array_type + +### `ARRAY_AVG` +```sql +ARRAY_AVG(input_array) ``` **Description** -Returns the population (biased) variance of the values. The return result is -between `0` and `+Inf`. +Returns the average of non-`NULL` values in an array. -All numeric types are supported. If the -input is `NUMERIC` or `BIGNUMERIC` then the internal aggregation is -stable with the final output converted to a `DOUBLE`. -Otherwise the input is converted to a `DOUBLE` -before aggregation, resulting in a potentially unstable result. +Caveats: -This function ignores any `NULL` inputs. If all inputs are ignored, this -function returns `NULL`. If this function receives a single non-`NULL` input, -it returns `0`. ++ If the array is `NULL`, empty, or contains only `NULL`s, returns + `NULL`. ++ If the array contains `NaN`, returns `NaN`. 
++ If the array contains `[+|-]Infinity`, returns either `[+|-]Infinity` + or `NaN`. ++ If there is numeric overflow, produces an error. ++ If a [floating-point type][floating-point-types] is returned, the result is + [non-deterministic][non-deterministic], which means you might receive a + different result each time you use this function. -`NaN` is produced if: +[floating-point-types]: https://github.com/google/zetasql/blob/master/docs/data-types.md#floating_point_types -+ Any input value is `NaN` -+ Any input value is positive infinity or negative infinity. +[non-deterministic]: https://github.com/google/zetasql/blob/master/docs/data-types.md#floating-point-semantics -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. +**Supported Argument Types** - +In the input array, `ARRAY`, `T` can represent one of the following +data types: -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md ++ Any numeric input type ++ `INTERVAL` - +**Return type** -`VAR_POP` can be used with differential privacy. To learn more, see -[Differentially private aggregate functions][dp-functions]. +The return type depends upon `T` in the input array: -**Return Data Type** + -`DOUBLE` + + + + + + + + + +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLEINTERVAL
OUTPUTDOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLEINTERVAL
**Examples** ```sql -SELECT VAR_POP(x) AS results FROM UNNEST([10, 14, 18]) AS x +SELECT ARRAY_AVG([0, 2, NULL, 4, 4, 5]) as avg -/*--------------------* - | results | - +--------------------+ - | 10.666666666666666 | - *--------------------*/ +/*-----* + | avg | + +-----+ + | 3 | + *-----*/ ``` -```sql -SELECT VAR_POP(x) AS results FROM UNNEST([10, 14, NULL]) AS x +### `ARRAY_CONCAT` -/*----------* - | results | - +---------+ - | 4 | - *---------*/ +```sql +ARRAY_CONCAT(array_expression[, ...]) ``` -```sql -SELECT VAR_POP(x) AS results FROM UNNEST([10, NULL]) AS x +**Description** -/*----------* - | results | - +---------+ - | 0 | - *---------*/ -``` +Concatenates one or more arrays with the same element type into a single array. + +The function returns `NULL` if any input argument is `NULL`. + +Note: You can also use the [|| concatenation operator][array-link-to-operators] +to concatenate arrays. + +**Return type** + +`ARRAY` + +**Examples** ```sql -SELECT VAR_POP(x) AS results FROM UNNEST([NULL]) AS x +SELECT ARRAY_CONCAT([1, 2], [3, 4], [5, 6]) as count_to_six; -/*---------* - | results | - +---------+ - | NULL | - *---------*/ +/*--------------------------------------------------* + | count_to_six | + +--------------------------------------------------+ + | [1, 2, 3, 4, 5, 6] | + *--------------------------------------------------*/ ``` +[array-link-to-operators]: #operators + +### `ARRAY_FILTER` + ```sql -SELECT VAR_POP(x) AS results FROM UNNEST([10, 14, CAST('Infinity' as DOUBLE)]) AS x +ARRAY_FILTER(array_expression, lambda_expression) -/*---------* - | results | - +---------+ - | NaN | - *---------*/ +lambda_expression: + { + element_alias -> boolean_expression + | (element_alias, index_alias) -> boolean_expression + } ``` -[dp-functions]: #aggregate-dp-functions +**Description** -### `VAR_SAMP` +Takes an array, filters out unwanted elements, and returns the results in a new +array. + ++ `array_expression`: The array to filter. 
++ `lambda_expression`: Each element in `array_expression` is evaluated against + the [lambda expression][lambda-definition]. If the expression evaluates to + `FALSE` or `NULL`, the element is removed from the resulting array. ++ `element_alias`: An alias that represents an array element. ++ `index_alias`: An alias that represents the zero-based offset of the array + element. ++ `boolean_expression`: The predicate used to filter the array elements. + +Returns `NULL` if the `array_expression` is `NULL`. + +**Return type** + +ARRAY + +**Example** ```sql -VAR_SAMP( - [ DISTINCT ] - expression - [ HAVING { MAX | MIN } expression2 ] -) -[ OVER over_clause ] +SELECT + ARRAY_FILTER([1 ,2, 3], e -> e > 1) AS a1, + ARRAY_FILTER([0, 2, 3], (e, i) -> e > i) AS a2; -over_clause: - { named_window | ( [ window_specification ] ) } +/*-------+-------* + | a1 | a2 | + +-------+-------+ + | [2,3] | [2,3] | + *-------+-------*/ +``` -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] - [ window_frame_clause ] +[lambda-definition]: https://github.com/google/zetasql/blob/master/docs/functions-reference.md#lambdas + +### `ARRAY_FIRST` +```sql +ARRAY_FIRST(array_expression) ``` **Description** -Returns the sample (unbiased) variance of the values. The return result is -between `0` and `+Inf`. +Takes an array and returns the first element in the array. -All numeric types are supported. If the -input is `NUMERIC` or `BIGNUMERIC` then the internal aggregation is -stable with the final output converted to a `DOUBLE`. -Otherwise the input is converted to a `DOUBLE` -before aggregation, resulting in a potentially unstable result. +Produces an error if the array is empty. -This function ignores any `NULL` inputs. If there are fewer than two non-`NULL` -inputs, this function returns `NULL`. +Returns `NULL` if `array_expression` is `NULL`. 
-`NaN` is produced if: +Note: To get the last element in an array, see [`ARRAY_LAST`][array-last]. -+ Any input value is `NaN` -+ Any input value is positive infinity or negative infinity. +**Return type** -To learn more about the optional aggregate clauses that you can pass -into this function, see -[Aggregate function calls][aggregate-function-calls]. +Matches the data type of elements in `array_expression`. - +**Example** -[aggregate-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md +```sql +SELECT ARRAY_FIRST(['a','b','c','d']) as first_element - +/*---------------* + | first_element | + +---------------+ + | a | + *---------------*/ +``` -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. +[array-last]: #array_last - +### `ARRAY_INCLUDES` -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md ++ [Signature 1](#array_includes_signature1): + `ARRAY_INCLUDES(array_to_search, search_value)` ++ [Signature 2](#array_includes_signature2): + `ARRAY_INCLUDES(array_to_search, lambda_expression)` - +#### Signature 1 + -**Return Data Type** +```sql +ARRAY_INCLUDES(array_to_search, search_value) +``` -`DOUBLE` +**Description** -**Examples** +Takes an array and returns `TRUE` if there is an element in the array that is +equal to the search_value. -```sql -SELECT VAR_SAMP(x) AS results FROM UNNEST([10, 14, 18]) AS x ++ `array_to_search`: The array to search. ++ `search_value`: The element to search for in the array. -/*---------* - | results | - +---------+ - | 16 | - *---------*/ -``` +Returns `NULL` if `array_to_search` or `search_value` is `NULL`. 
-```sql -SELECT VAR_SAMP(x) AS results FROM UNNEST([10, 14, NULL]) AS x +**Return type** -/*---------* - | results | - +---------+ - | 8 | - *---------*/ -``` +`BOOL` -```sql -SELECT VAR_SAMP(x) AS results FROM UNNEST([10, NULL]) AS x +**Example** -/*---------* - | results | - +---------+ - | NULL | - *---------*/ -``` +In the following example, the query first checks to see if `0` exists in an +array. Then the query checks to see if `1` exists in an array. ```sql -SELECT VAR_SAMP(x) AS results FROM UNNEST([NULL]) AS x +SELECT + ARRAY_INCLUDES([1, 2, 3], 0) AS a1, + ARRAY_INCLUDES([1, 2, 3], 1) AS a2; -/*---------* - | results | - +---------+ - | NULL | - *---------*/ +/*-------+------* + | a1 | a2 | + +-------+------+ + | false | true | + *-------+------*/ ``` +#### Signature 2 + + ```sql -SELECT VAR_SAMP(x) AS results FROM UNNEST([10, 14, CAST('Infinity' as DOUBLE)]) AS x +ARRAY_INCLUDES(array_to_search, lambda_expression) -/*---------* - | results | - +---------+ - | NaN | - *---------*/ +lambda_expression: element_alias -> boolean_expression ``` -### `VARIANCE` +**Description** -```sql -VARIANCE( - [ DISTINCT ] - expression - [ HAVING { MAX | MIN } expression2 ] -) -[ OVER over_clause ] +Takes an array and returns `TRUE` if the lambda expression evaluates to `TRUE` +for any element in the array. -over_clause: - { named_window | ( [ window_specification ] ) } ++ `array_to_search`: The array to search. ++ `lambda_expression`: Each element in `array_to_search` is evaluated against + the [lambda expression][lambda-definition]. ++ `element_alias`: An alias that represents an array element. ++ `boolean_expression`: The predicate used to evaluate the array elements. -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] - [ window_frame_clause ] +Returns `NULL` if `array_to_search` is `NULL`. 
-```
+**Return type**

-**Description**
+`BOOL`

-An alias of [VAR_SAMP][stat-agg-link-to-var-samp].
+**Example**

-[stat-agg-link-to-var-samp]: #var_samp
+In the following example, the query first checks to see if any elements that are
+greater than 3 exist in an array (`e > 3`). Then the query checks to see if
+any elements that are greater than 0 exist in an array (`e > 0`).

-[agg-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md
+```sql
+SELECT
+  ARRAY_INCLUDES([1, 2, 3], e -> e > 3) AS a1,
+  ARRAY_INCLUDES([1, 2, 3], e -> e > 0) AS a2;

-## Differentially private aggregate functions
-
+/*-------+------*
+ | a1    | a2   |
+ +-------+------+
+ | false | true |
+ *-------+------*/
+```

-ZetaSQL supports differentially private aggregate functions.
-For an explanation of how aggregate functions work, see
-[Aggregate function calls][agg-function-calls].
+[lambda-definition]: https://github.com/google/zetasql/blob/master/docs/functions-reference.md#lambdas

-You can only use differentially private aggregate functions with
-[differentially private queries][dp-guide] in a
-[differential privacy clause][dp-syntax].
+### `ARRAY_INCLUDES_ALL`

-Note: In this topic, the privacy parameters in the examples are not
-recommendations. You should work with your privacy or security officer to
-determine the optimal privacy parameters for your dataset and organization.
+```sql
+ARRAY_INCLUDES_ALL(array_to_search, search_values)
+```

-### Function list
+**Description**

-
-
-
-
+Takes an array to search and an array of search values. Returns `TRUE` if all
+search values are in the array to search, otherwise returns `FALSE`.

-
-
-
-
+Returns `NULL` if `array_to_search` or `search_values` is
+`NULL`.
-      Name
-      Summary
-      ANON_AVG
-      Gets the differentially-private average of non-NULL,
-      non-NaN values in a query with an
-      ANONYMIZATION clause.
-      ANON_COUNT
-      Signature 1: Gets the differentially-private count of rows in a query
-      with an ANONYMIZATION clause.
-      Signature 2: Gets the differentially-private count of rows with a
-      non-NULL expression in a query with an
-      ANONYMIZATION clause.
-      ANON_PERCENTILE_CONT
-      Computes a differentially-private percentile across privacy unit columns
-      in a query with an ANONYMIZATION clause.
-      ANON_QUANTILES
-      Produces an array of differentially-private quantile boundaries
-      in a query with an ANONYMIZATION clause.
-      ANON_STDDEV_POP
-      Computes a differentially-private population (biased) standard deviation of
-      values in a query with an ANONYMIZATION clause.
-      ANON_SUM
-      Gets the differentially-private sum of non-NULL,
-      non-NaN values in a query with an
-      ANONYMIZATION clause.
-      ANON_VAR_POP
-      Computes a differentially-private population (biased) variance of values
-      in a query with an ANONYMIZATION clause.
-      AVG (differential privacy)
-      Gets the differentially-private average of non-NULL,
-      non-NaN values in a query with a
-      DIFFERENTIAL_PRIVACY clause.
-      COUNT (differential privacy)
-      Signature 1: Gets the differentially-private count of rows in a query with a
-      DIFFERENTIAL_PRIVACY clause.
-      Signature 2: Gets the differentially-private count of rows with a
-      non-NULL expression in a query with a
-      DIFFERENTIAL_PRIVACY clause.
-      PERCENTILE_CONT (differential privacy)
-      Computes a differentially-private percentile across privacy unit columns
-      in a query with a DIFFERENTIAL_PRIVACY clause.
-      SUM (differential privacy)
-      Gets the differentially-private sum of non-NULL,
-      non-NaN values in a query with a
-      DIFFERENTIAL_PRIVACY clause.
-      VAR_POP (differential privacy)
-      Computes the differentially-private population (biased) variance of values
-      in a query with a DIFFERENTIAL_PRIVACY clause.

+**Return type**
+
+`BOOL`
+
+**Example**
+
+In the following example, the query first checks to see if `3`, `4`, and `5`
+exist in an array. Then the query checks to see if `4`, `5`, and `6` exist in
+an array.
+
+```sql
+SELECT
+  ARRAY_INCLUDES_ALL([1,2,3,4,5], [3,4,5]) AS a1,
+  ARRAY_INCLUDES_ALL([1,2,3,4,5], [4,5,6]) AS a2;
+
+/*------+-------*
+ | a1   | a2    |
+ +------+-------+
+ | true | false |
+ *------+-------*/
+```
+
+### `ARRAY_INCLUDES_ANY`
+
+```sql
+ARRAY_INCLUDES_ANY(array_to_search, search_values)
+```
+
+**Description**
+
+Takes an array to search and an array of search values. Returns `TRUE` if any
+search values are in the array to search, otherwise returns `FALSE`.
+
++ `array_to_search`: The array to search.
++ `search_values`: The array that contains the elements to search for.
+
+Returns `NULL` if `array_to_search` or `search_values` is
+`NULL`.
+
+**Return type**
+
+`BOOL`
+
+**Example**
+
+In the following example, the query first checks to see if `3`, `4`, or `5`
+exists in an array. Then the query checks to see if `4`, `5`, or `6` exists in
+an array.
+
+```sql
+SELECT
+  ARRAY_INCLUDES_ANY([1,2,3], [3,4,5]) AS a1,
+  ARRAY_INCLUDES_ANY([1,2,3], [4,5,6]) AS a2;
+
+/*------+-------*
+ | a1   | a2    |
+ +------+-------+
+ | true | false |
+ *------+-------*/
+```

-### `ANON_AVG`
-

+### `ARRAY_IS_DISTINCT`

```sql
-WITH ANONYMIZATION ...
-  ANON_AVG(expression [clamped_between_clause])
-
-clamped_between_clause:
-  CLAMPED BETWEEN lower_bound AND upper_bound
+ARRAY_IS_DISTINCT(value)
```

**Description**

-Returns the average of non-`NULL`, non-`NaN` values in the expression.
-This function first computes the average per privacy unit column, and then
-computes the final result by averaging these averages.
-
-This function must be used with the `ANONYMIZATION` clause and
-can support these arguments:
-
-+ `expression`: The input expression. This can be any numeric input type,
-  such as `INT64`.
-+ `clamped_between_clause`: Perform [clamping][dp-clamp-between] per
-  privacy unit column averages.
+Returns `TRUE` if the array contains no repeated elements, using the same
+equality comparison logic as `SELECT DISTINCT`.

**Return type**

-`DOUBLE`
+`BOOL`

**Examples**

-The following differentially private query gets the average number of each item
-requested per professor. Smaller aggregations might not be included. This query
-references a view called [`view_on_professors`][dp-example-views].
-
-```sql
--- With noise, using the epsilon parameter.
-SELECT
-  WITH ANONYMIZATION
-    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
-    item,
-    ANON_AVG(quantity CLAMPED BETWEEN 0 AND 100) average_quantity
-FROM {{USERNAME}}.view_on_professors
-GROUP BY item;
-
--- These results will change each time you run the query.
--- Smaller aggregations might be removed.
-/*----------+------------------*
- | item     | average_quantity |
- +----------+------------------+
- | pencil   | 38.5038356810269 |
- | pen      | 13.4725028762032 |
- *----------+------------------*/
-```

```sql
--- Without noise, using the epsilon parameter.
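+-- ARRAY_IS_DISTINCT uses SELECT DISTINCT equality, so two NULL elements
+-- count as repeats of each other; compare the [1, 2, NULL] and
+-- [1, NULL, NULL] rows in the result below.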
--- (this un-noised version is for demonstration only)
+WITH example AS (
+  SELECT [1, 2, 3] AS arr UNION ALL
+  SELECT [1, 1, 1] AS arr UNION ALL
+  SELECT [1, 2, NULL] AS arr UNION ALL
+  SELECT [1, 1, NULL] AS arr UNION ALL
+  SELECT [1, NULL, NULL] AS arr UNION ALL
+  SELECT [] AS arr UNION ALL
+  SELECT CAST(NULL AS ARRAY<INT64>) AS arr
+)
SELECT
-  WITH ANONYMIZATION
-    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1)
-    item,
-    ANON_AVG(quantity) average_quantity
-FROM {{USERNAME}}.view_on_professors
-GROUP BY item;
+  arr,
+  ARRAY_IS_DISTINCT(arr) as is_distinct
+FROM example;

--- These results will not change when you run the query.
-/*----------+------------------*
- | item     | average_quantity |
- +----------+------------------+
- | scissors | 8                |
- | pencil   | 40               |
- | pen      | 18.5             |
- *----------+------------------*/
+/*-----------------+-------------*
+ | arr             | is_distinct |
+ +-----------------+-------------+
+ | [1, 2, 3]       | TRUE        |
+ | [1, 1, 1]       | FALSE       |
+ | [1, 2, NULL]    | TRUE        |
+ | [1, 1, NULL]    | FALSE       |
+ | [1, NULL, NULL] | FALSE       |
+ | []              | TRUE        |
+ | NULL            | NULL        |
+ *-----------------+-------------*/
```

-Note: You can learn more about when and when not to use
-noise [here][dp-noise].
-
-[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views
-
-[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise
-
-[dp-clamp-between]: #dp_clamp_between
-
-### `ANON_COUNT`
-
-
-+ [Signature 1](#anon_count_signature1)
-+ [Signature 2](#anon_count_signature2)
-
-#### Signature 1
-
+### `ARRAY_LAST`

```sql
-WITH ANONYMIZATION ...
-  ANON_COUNT(*)
+ARRAY_LAST(array_expression)
```

**Description**

-Returns the number of rows in the
-[differentially private][dp-from-clause] `FROM` clause. The final result
-is an aggregation across privacy unit columns.
-[Input values are clamped implicitly][dp-clamp-implicit]. Clamping is
-performed per privacy unit column.
- -This function must be used with the `ANONYMIZATION` clause. +Takes an array and returns the last element in the array. -**Return type** +Produces an error if the array is empty. -`INT64` +Returns `NULL` if `array_expression` is `NULL`. -**Examples** +Note: To get the first element in an array, see [`ARRAY_FIRST`][array-first]. -The following differentially private query counts the number of requests for -each item. This query references a view called -[`view_on_professors`][dp-example-views]. +**Return type** -```sql --- With noise, using the epsilon parameter. -SELECT - WITH ANONYMIZATION - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) - item, - ANON_COUNT(*) times_requested -FROM {{USERNAME}}.view_on_professors -GROUP BY item; +Matches the data type of elements in `array_expression`. --- These results will change each time you run the query. --- Smaller aggregations might be removed. -/*----------+-----------------* - | item | times_requested | - +----------+-----------------+ - | pencil | 5 | - | pen | 2 | - *----------+-----------------*/ -``` +**Example** ```sql --- Without noise, using the epsilon parameter. --- (this un-noised version is for demonstration only) -SELECT - WITH ANONYMIZATION - OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) - item, - ANON_COUNT(*) times_requested -FROM {{USERNAME}}.view_on_professors -GROUP BY item; +SELECT ARRAY_LAST(['a','b','c','d']) as last_element --- These results will not change when you run the query. -/*----------+-----------------* - | item | times_requested | - +----------+-----------------+ - | scissors | 1 | - | pencil | 4 | - | pen | 3 | - *----------+-----------------*/ +/*---------------* + | last_element | + +---------------+ + | d | + *---------------*/ ``` -Note: You can learn more about when and when not to use -noise [here][dp-noise]. +[array-first]: #array_first -#### Signature 2 - +### `ARRAY_LENGTH` ```sql -WITH ANONYMIZATION ... 
- ANON_COUNT(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) +ARRAY_LENGTH(array_expression) ``` **Description** -Returns the number of non-`NULL` expression values. The final result is an -aggregation across privacy unit columns. - -This function must be used with the `ANONYMIZATION` clause and -can support these arguments: - -+ `expression`: The input expression. This can be any numeric input type, - such as `INT64`. -+ `CLAMPED BETWEEN` clause: - Perform [clamping][dp-clamp-between] per privacy unit column. +Returns the size of the array. Returns 0 for an empty array. Returns `NULL` if +the `array_expression` is `NULL`. **Return type** @@ -6728,1250 +6480,1235 @@ can support these arguments: **Examples** -The following differentially private query counts the number of requests made -for each type of item. This query references a view called -[`view_on_professors`][dp-example-views]. - -```sql --- With noise -SELECT - WITH ANONYMIZATION - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) - item, - ANON_COUNT(item CLAMPED BETWEEN 0 AND 100) times_requested -FROM {{USERNAME}}.view_on_professors -GROUP BY item; - --- These results will change each time you run the query. --- Smaller aggregations might be removed. -/*----------+-----------------* - | item | times_requested | - +----------+-----------------+ - | pencil | 5 | - | pen | 2 | - *----------+-----------------*/ -``` - ```sql ---Without noise (this un-noised version is for demonstration only) -SELECT - WITH ANONYMIZATION - OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) - item, - ANON_COUNT(item CLAMPED BETWEEN 0 AND 100) times_requested -FROM {{USERNAME}}.view_on_professors -GROUP BY item; +WITH items AS + (SELECT ["coffee", NULL, "milk" ] as list + UNION ALL + SELECT ["cake", "pie"] as list) +SELECT ARRAY_TO_STRING(list, ', ', 'NULL'), ARRAY_LENGTH(list) AS size +FROM items +ORDER BY size DESC; --- These results will not change when you run the query. 
-/*----------+-----------------* - | item | times_requested | - +----------+-----------------+ - | scissors | 1 | - | pencil | 4 | - | pen | 3 | - *----------+-----------------*/ +/*--------------------+------* + | list | size | + +--------------------+------+ + | coffee, NULL, milk | 3 | + | cake, pie | 2 | + *--------------------+------*/ ``` -Note: You can learn more about when and when not to use -noise [here][dp-noise]. - -[dp-clamp-implicit]: #dp_implicit_clamping - -[dp-from-clause]: https://github.com/google/zetasql/blob/master/docs/differential-privacy.md#dp_from_rules - -[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views - -[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise - -[dp-clamp-between]: #dp_clamp_between - -### `ANON_PERCENTILE_CONT` - +### `ARRAY_MAX` ```sql -WITH ANONYMIZATION ... - ANON_PERCENTILE_CONT(expression, percentile [CLAMPED BETWEEN lower_bound AND upper_bound]) +ARRAY_MAX(input_array) ``` **Description** -Takes an expression and computes a percentile for it. The final result is an -aggregation across privacy unit columns. +Returns the maximum non-`NULL` value in an array. -This function must be used with the `ANONYMIZATION` clause and -can support these arguments: +Caveats: -+ `expression`: The input expression. This can be most numeric input types, - such as `INT64`. `NULL`s are always ignored. -+ `percentile`: The percentile to compute. The percentile must be a literal in - the range [0, 1] -+ `CLAMPED BETWEEN` clause: - Perform [clamping][dp-clamp-between] per privacy unit column. ++ If the array is `NULL`, empty, or contains only `NULL`s, returns + `NULL`. ++ If the array contains `NaN`, returns `NaN`. -`NUMERIC` and `BIGNUMERIC` arguments are not allowed. - If you need them, cast them as the -`DOUBLE` data type first. 
+**Supported Argument Types**
+
+In the input array, `ARRAY<T>`, `T` can be an
+[orderable data type][data-type-properties].

**Return type**

-`DOUBLE`
+The same data type as `T` in the input array.

**Examples**

-The following differentially private query gets the percentile of items
-requested. Smaller aggregations might not be included. This query references a
-view called [`view_on_professors`][dp-example-views].
-
```sql
--- With noise, using the epsilon parameter.
-SELECT
-  WITH ANONYMIZATION
-    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
-    item,
-    ANON_PERCENTILE_CONT(quantity, 0.5 CLAMPED BETWEEN 0 AND 100) percentile_requested
-FROM {{USERNAME}}.view_on_professors
-GROUP BY item;
+SELECT ARRAY_MAX([8, 37, NULL, 55, 4]) as max

--- These results will change each time you run the query.
--- Smaller aggregations might be removed.
-/*----------+----------------------*
- | item     | percentile_requested |
- +----------+----------------------+
- | pencil   | 72.00011444091797    |
- | scissors | 8.000175476074219    |
- | pen      | 23.001075744628906   |
- *----------+----------------------*/
+/*-----*
+ | max |
+ +-----+
+ | 55  |
+ *-----*/
```

-[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views
-
-[dp-clamp-between]: #dp_clamp_between
+[data-type-properties]: https://github.com/google/zetasql/blob/master/docs/data-types.md#data_type_properties

-### `ANON_QUANTILES`
-

+### `ARRAY_MIN`

```sql
-WITH ANONYMIZATION ...
-  ANON_QUANTILES(expression, number CLAMPED BETWEEN lower_bound AND upper_bound)
+ARRAY_MIN(input_array)
```

**Description**

-Returns an array of differentially private quantile boundaries for values in
-`expression`. The first element in the return value is the
-minimum quantile boundary and the last element is the maximum quantile boundary.
-The returned results are aggregations across privacy unit columns.
+Returns the minimum non-`NULL` value in an array.
-This function must be used with the `ANONYMIZATION` clause and
-can support these arguments:
+Caveats:

-+ `expression`: The input expression. This can be most numeric input types,
-  such as `INT64`. `NULL`s are always ignored.
-+ `number`: The number of quantiles to create. This must be an `INT64`.
-+ `CLAMPED BETWEEN` clause:
-  Perform [clamping][dp-clamp-between] per privacy unit column.
++ If the array is `NULL`, empty, or contains only `NULL`s, returns
+  `NULL`.
++ If the array contains `NaN`, returns `NaN`.

-`NUMERIC` and `BIGNUMERIC` arguments are not allowed.
- If you need them, cast them as the
-`DOUBLE` data type first.
+**Supported Argument Types**
+
+In the input array, `ARRAY<T>`, `T` can be an
+[orderable data type][data-type-properties].

**Return type**

-`ARRAY`<`DOUBLE`>
+The same data type as `T` in the input array.

**Examples**

-The following differentially private query gets the five quantile boundaries of
-the four quartiles of the number of items requested. Smaller aggregations
-might not be included. This query references a view called
-[`view_on_professors`][dp-example-views].
-
```sql
--- With noise, using the epsilon parameter.
-SELECT
-  WITH ANONYMIZATION
-    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
-    item,
-    ANON_QUANTILES(quantity, 4 CLAMPED BETWEEN 0 AND 100) quantiles_requested
-FROM {{USERNAME}}.view_on_professors
-GROUP BY item;
+SELECT ARRAY_MIN([8, 37, NULL, 4, 55]) as min

--- These results will change each time you run the query.
--- Smaller aggregations might be removed.
-/*----------+----------------------------------------------------------------------* - | item | quantiles_requested | - +----------+----------------------------------------------------------------------+ - | pen | [6.409375,20.647684733072918,41.40625,67.30848524305556,99.80078125] | - | pencil | [6.849259,44.010416666666664,62.64204,65.83806818181819,98.59375] | - *----------+----------------------------------------------------------------------*/ +/*-----* + | min | + +-----+ + | 4 | + *-----*/ ``` -[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views - -[dp-clamp-between]: #dp_clamp_between +[data-type-properties]: https://github.com/google/zetasql/blob/master/docs/data-types.md#data_type_properties -### `ANON_STDDEV_POP` - +### `ARRAY_REVERSE` ```sql -WITH ANONYMIZATION ... - ANON_STDDEV_POP(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) +ARRAY_REVERSE(value) ``` **Description** -Takes an expression and computes the population (biased) standard deviation of -the values in the expression. The final result is an aggregation across -privacy unit columns between `0` and `+Inf`. - -This function must be used with the `ANONYMIZATION` clause and -can support these arguments: - -+ `expression`: The input expression. This can be most numeric input types, - such as `INT64`. `NULL`s are always ignored. -+ `CLAMPED BETWEEN` clause: - Perform [clamping][dp-clamp-between] per individual entity values. - -`NUMERIC` and `BIGNUMERIC` arguments are not allowed. - If you need them, cast them as the -`DOUBLE` data type first. +Returns the input `ARRAY` with elements in reverse order. **Return type** -`DOUBLE` +`ARRAY` **Examples** -The following differentially private query gets the -population (biased) standard deviation of items requested. Smaller aggregations -might not be included. This query references a view called -[`view_on_professors`][dp-example-views]. - ```sql --- With noise, using the epsilon parameter. 
+WITH example AS ( + SELECT [1, 2, 3] AS arr UNION ALL + SELECT [4, 5] AS arr UNION ALL + SELECT [] AS arr +) SELECT - WITH ANONYMIZATION - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) - item, - ANON_STDDEV_POP(quantity CLAMPED BETWEEN 0 AND 100) pop_standard_deviation -FROM {{USERNAME}}.view_on_professors -GROUP BY item; + arr, + ARRAY_REVERSE(arr) AS reverse_arr +FROM example; --- These results will change each time you run the query. --- Smaller aggregations might be removed. -/*----------+------------------------* - | item | pop_standard_deviation | - +----------+------------------------+ - | pencil | 25.350871122442054 | - | scissors | 50 | - | pen | 2 | - *----------+------------------------*/ +/*-----------+-------------* + | arr | reverse_arr | + +-----------+-------------+ + | [1, 2, 3] | [3, 2, 1] | + | [4, 5] | [5, 4] | + | [] | [] | + *-----------+-------------*/ ``` -[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views - -[dp-clamp-between]: #dp_clamp_between - -### `ANON_SUM` - +### `ARRAY_SLICE` ```sql -WITH ANONYMIZATION ... - ANON_SUM(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) +ARRAY_SLICE(array_to_slice, start_offset, end_offset) ``` **Description** -Returns the sum of non-`NULL`, non-`NaN` values in the expression. The final -result is an aggregation across privacy unit columns. +Returns an array containing zero or more consecutive elements from the +input array. -This function must be used with the `ANONYMIZATION` clause and -can support these arguments: ++ `array_to_slice`: The array that contains the elements you want to slice. ++ `start_offset`: The inclusive starting offset. ++ `end_offset`: The inclusive ending offset. -+ `expression`: The input expression. This can be any numeric input type, - such as `INT64`. -+ `CLAMPED BETWEEN` clause: - Perform [clamping][dp-clamp-between] per privacy unit column. +An offset can be positive or negative. 
A positive offset starts from the +beginning of the input array and is 0-based. A negative offset starts from +the end of the input array. Out-of-bounds offsets are supported. Here are some +examples: -**Return type** + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+<table>
+  <tr>
+    <th>Input offset</th>
+    <th>Final offset in array</th>
+    <th>Notes</th>
+  </tr>
+  <tr>
+    <td><code>0</code></td>
+    <td><code>['a', 'b', 'c', 'd']</code></td>
+    <td>The final offset is <code>0</code>.</td>
+  </tr>
+  <tr>
+    <td><code>3</code></td>
+    <td><code>['a', 'b', 'c', 'd']</code></td>
+    <td>The final offset is <code>3</code>.</td>
+  </tr>
+  <tr>
+    <td><code>5</code></td>
+    <td><code>['a', 'b', 'c', 'd']</code></td>
+    <td>
+      Because the input offset is out of bounds,
+      the final offset is <code>3</code> (<code>array length - 1</code>).
+    </td>
+  </tr>
+  <tr>
+    <td><code>-1</code></td>
+    <td><code>['a', 'b', 'c', 'd']</code></td>
+    <td>
+      Because a negative offset is used, the offset starts at the end of the
+      array. The final offset is <code>3</code>
+      (<code>array length - 1</code>).
+    </td>
+  </tr>
+  <tr>
+    <td><code>-2</code></td>
+    <td><code>['a', 'b', 'c', 'd']</code></td>
+    <td>
+      Because a negative offset is used, the offset starts at the end of the
+      array. The final offset is <code>2</code>
+      (<code>array length - 2</code>).
+    </td>
+  </tr>
+  <tr>
+    <td><code>-4</code></td>
+    <td><code>['a', 'b', 'c', 'd']</code></td>
+    <td>
+      Because a negative offset is used, the offset starts at the end of the
+      array. The final offset is <code>0</code>
+      (<code>array length - 4</code>).
+    </td>
+  </tr>
+  <tr>
+    <td><code>-5</code></td>
+    <td><code>['a', 'b', 'c', 'd']</code></td>
+    <td>
+      Because the offset is negative and out of bounds, the final offset is
+      <code>0</code> (<code>array length - array length</code>).
+    </td>
+  </tr>
+</table>
-One of the following [supertypes][dp-supertype]: +Additional details: -+ `INT64` -+ `UINT64` -+ `DOUBLE` ++ The input array can contain `NULL` elements. `NULL` elements are included + in the resulting array. ++ Returns `NULL` if `array_to_slice`, `start_offset`, or `end_offset` is + `NULL`. ++ Returns an empty array if `array_to_slice` is empty. ++ Returns an empty array if the position of the `start_offset` in the array is + after the position of the `end_offset`. -**Examples** +**Return type** -The following differentially private query gets the sum of items requested. -Smaller aggregations might not be included. This query references a view called -[`view_on_professors`][dp-example-views]. +`ARRAY` + +**Examples** ```sql --- With noise, using the epsilon parameter. -SELECT - WITH ANONYMIZATION - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) - item, - ANON_SUM(quantity CLAMPED BETWEEN 0 AND 100) quantity -FROM {{USERNAME}}.view_on_professors -GROUP BY item; +SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], 1, 3) AS result --- These results will change each time you run the query. --- Smaller aggregations might be removed. -/*----------+-----------* - | item | quantity | - +----------+-----------+ - | pencil | 143 | - | pen | 59 | - *----------+-----------*/ +/*-----------* + | result | + +-----------+ + | [b, c, d] | + *-----------*/ ``` ```sql --- Without noise, using the epsilon parameter. --- (this un-noised version is for demonstration only) -SELECT - WITH ANONYMIZATION - OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) - item, - ANON_SUM(quantity) quantity -FROM {{USERNAME}}.view_on_professors -GROUP BY item; +SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], -1, 3) AS result --- These results will not change when you run the query. 
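+-- Per the offset rules above, start_offset -1 resolves to offset 4
+-- (array length - 1), which is after end_offset 3, so the result is an
+-- empty array.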
-/*----------+----------* - | item | quantity | - +----------+----------+ - | scissors | 8 | - | pencil | 144 | - | pen | 58 | - *----------+----------*/ +/*-----------* + | result | + +-----------+ + | [] | + *-----------*/ ``` -Note: You can learn more about when and when not to use -noise [here][dp-noise]. +```sql +SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], 1, -3) AS result -[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views +/*--------* + | result | + +--------+ + | [b, c] | + *--------*/ +``` -[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise +```sql +SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], -1, -3) AS result -[dp-supertype]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#supertypes +/*-----------* + | result | + +-----------+ + | [] | + *-----------*/ +``` -[dp-clamp-between]: #dp_clamp_between +```sql +SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], -3, -1) AS result -### `ANON_VAR_POP` - +/*-----------* + | result | + +-----------+ + | [c, d, e] | + *-----------*/ +``` ```sql -WITH ANONYMIZATION ... - ANON_VAR_POP(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) -``` +SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], 3, 3) AS result -**Description** +/*--------* + | result | + +--------+ + | [d] | + *--------*/ +``` -Takes an expression and computes the population (biased) variance of the values -in the expression. The final result is an aggregation across -privacy unit columns between `0` and `+Inf`. You can -[clamp the input values][dp-clamp-explicit] explicitly, otherwise input values -are clamped implicitly. Clamping is performed per individual entity values. 
+```sql +SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], -3, -3) AS result -This function must be used with the `ANONYMIZATION` clause and -can support these arguments: +/*--------* + | result | + +--------+ + | [c] | + *--------*/ +``` -+ `expression`: The input expression. This can be any numeric input type, - such as `INT64`. `NULL`s are always ignored. -+ `CLAMPED BETWEEN` clause: - Perform [clamping][dp-clamp-between] per individual entity values. +```sql +SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], 1, 30) AS result -`NUMERIC` and `BIGNUMERIC` arguments are not allowed. - If you need them, cast them as the -`DOUBLE` data type first. +/*--------------* + | result | + +--------------+ + | [b, c, d, e] | + *--------------*/ +``` -**Return type** +```sql +SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], 1, -30) AS result -`DOUBLE` +/*-----------* + | result | + +-----------+ + | [] | + *-----------*/ +``` -**Examples** +```sql +SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], -30, 30) AS result -The following differentially private query gets the -population (biased) variance of items requested. Smaller aggregations might not -be included. This query references a view called -[`view_on_professors`][dp-example-views]. +/*-----------------* + | result | + +-----------------+ + | [a, b, c, d, e] | + *-----------------*/ +``` ```sql --- With noise, using the epsilon parameter. -SELECT - WITH ANONYMIZATION - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) - item, - ANON_VAR_POP(quantity CLAMPED BETWEEN 0 AND 100) pop_variance -FROM {{USERNAME}}.view_on_professors -GROUP BY item; +SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], -30, -5) AS result --- These results will change each time you run the query. --- Smaller aggregations might be removed. 
-/*----------+-----------------* - | item | pop_variance | - +----------+-----------------+ - | pencil | 642 | - | pen | 2.6666666666665 | - | scissors | 2500 | - *----------+-----------------*/ +/*--------* + | result | + +--------+ + | [a] | + *--------*/ ``` -[dp-clamp-explicit]: #dp_explicit_clamping +```sql +SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], 5, 30) AS result -[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views +/*--------* + | result | + +--------+ + | [] | + *--------*/ +``` -[dp-clamp-between]: #dp_clamp_between +```sql +SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], 1, NULL) AS result -### `AVG` (differential privacy) - +/*-----------* + | result | + +-----------+ + | NULL | + *-----------*/ +``` ```sql -WITH DIFFERENTIAL_PRIVACY ... - AVG(expression[, contribution_bounds_per_group => (lower_bound, upper_bound)]) +SELECT ARRAY_SLICE(['a', 'b', NULL, 'd', 'e'], 1, 3) AS result + +/*--------------* + | result | + +--------------+ + | [b, NULL, d] | + *--------------*/ ``` -**Description** +### `ARRAY_SUM` -Returns the average of non-`NULL`, non-`NaN` values in the expression. -This function first computes the average per privacy unit column, and then -computes the final result by averaging these averages. +```sql +ARRAY_SUM(input_array) +``` -This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] -and can support the following arguments: +**Description** -+ `expression`: The input expression. This can be any numeric input type, - such as `INT64`. -+ `contribution_bounds_per_group`: The - [contribution bounds named argument][dp-clamped-named]. - Perform clamping per each group separately before performing intermediate - grouping on the privacy unit column. +Returns the sum of non-`NULL` values in an array. -**Return type** +Caveats: -`DOUBLE` ++ If the array is `NULL`, empty, or contains only `NULL`s, returns + `NULL`. ++ If the array contains `NaN`, returns `NaN`. 
++ If the array contains `[+|-]Infinity`, returns either `[+|-]Infinity`
+  or `NaN`.
++ If there is numeric overflow, produces an error.
++ If a [floating-point type][floating-point-types] is returned, the result is
+  [non-deterministic][non-deterministic], which means you might receive a
+  different result each time you use this function.
-**Examples**
+[floating-point-types]: https://github.com/google/zetasql/blob/master/docs/data-types.md#floating_point_types
-The following differentially private query gets the average number of each item
-requested per professor. Smaller aggregations might not be included. This query
-references a table called [`professors`][dp-example-tables].
+[non-deterministic]: https://github.com/google/zetasql/blob/master/docs/data-types.md#floating-point-semantics
-```sql
--- With noise, using the epsilon parameter.
-SELECT
-  WITH DIFFERENTIAL_PRIVACY
-    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
-  item,
-  AVG(quantity, contribution_bounds_per_group => (0,100)) average_quantity
-FROM professors
-GROUP BY item;
+**Supported Argument Types**
--- These results will change each time you run the query.
--- Smaller aggregations might be removed.
-/*----------+------------------*
- | item     | average_quantity |
- +----------+------------------+
- | pencil   | 38.5038356810269 |
- | pen      | 13.4725028762032 |
- *----------+------------------*/
-```
+In the input array, `ARRAY<T>`, `T` can represent:
-```sql
--- Without noise, using the epsilon parameter.
--- (this un-noised version is for demonstration only)
-SELECT
-  WITH DIFFERENTIAL_PRIVACY
-    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
-  item,
-  AVG(quantity) average_quantity
-FROM professors
-GROUP BY item;
++ Any supported numeric data type
++ `INTERVAL`
--- These results will not change when you run the query.
-/*----------+------------------*
- | item     | average_quantity |
- +----------+------------------+
- | scissors | 8                |
- | pencil   | 40               |
- | pen      | 18.5             |
- *----------+------------------*/
-```
+**Return type**
-Note: For more information about when and when not to use
-noise, see [Remove noise][dp-noise].
+The return type depends upon `T` in the input array:
-[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables
-[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise
-[dp-clamped-named]: #dp_clamped_named
+
+| INPUT  | `INT32` | `INT64` | `UINT32` | `UINT64` | `NUMERIC` | `BIGNUMERIC` | `FLOAT`  | `DOUBLE` | `INTERVAL` |
+|--------|---------|---------|----------|----------|-----------|--------------|----------|----------|------------|
+| OUTPUT | `INT64` | `INT64` | `UINT64` | `UINT64` | `NUMERIC` | `BIGNUMERIC` | `DOUBLE` | `DOUBLE` | `INTERVAL` |
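The `NULL`/`NaN` caveats above can be modeled in a few lines of Python (a sketch of the documented rules only; `array_sum` is a hypothetical helper, Python numbers stand in for the SQL numeric types, `None` stands in for `NULL`, and SQL's overflow error is not modeled):

```python
import math

def array_sum(values):
    """Model of the documented ARRAY_SUM NULL/NaN rules (illustrative only)."""
    # A NULL array, an empty array, or an array of only NULLs yields NULL.
    if values is None:
        return None
    present = [v for v in values if v is not None]
    if not present:
        return None
    # Python's sum() propagates NaN and [+|-]Infinity, matching the caveats.
    # (The SQL overflow error is not modeled: Python ints do not overflow.)
    return sum(present)
```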
-[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause +**Examples** -### `COUNT` (differential privacy) - +```sql +SELECT ARRAY_SUM([1, 2, 3, 4, 5, NULL, 4, 3, 2, 1]) as sum -+ [Signature 1](#dp_count_signature1): Returns the number of rows in a - differentially private `FROM` clause. -+ [Signature 2](#dp_count_signature2): Returns the number of non-`NULL` - values in an expression. +/*-----* + | sum | + +-----+ + | 25 | + *-----*/ +``` -#### Signature 1 - +### `ARRAY_TO_STRING` ```sql -WITH DIFFERENTIAL_PRIVACY ... - COUNT(* [, contribution_bounds_per_group => (lower_bound, upper_bound)])) +ARRAY_TO_STRING(array_expression, delimiter[, null_text]) ``` **Description** -Returns the number of rows in the -[differentially private][dp-from-clause] `FROM` clause. The final result -is an aggregation across a privacy unit column. +Returns a concatenation of the elements in `array_expression` +as a `STRING`. The value for `array_expression` +can either be an array of `STRING` or +`BYTES` data types. -This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] -and can support the following argument: +If the `null_text` parameter is used, the function replaces any `NULL` values in +the array with the value of `null_text`. -+ `contribution_bounds_per_group`: The - [contribution bounds named argument][dp-clamped-named]. - Perform clamping per each group separately before performing intermediate - grouping on the privacy unit column. +If the `null_text` parameter is not used, the function omits the `NULL` value +and its preceding delimiter. **Return type** -`INT64` +`STRING` **Examples** -The following differentially private query counts the number of requests for -each item. This query references a table called -[`professors`][dp-example-tables]. - ```sql --- With noise, using the epsilon parameter. 
-SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) - item, - COUNT(*) times_requested -FROM professors -GROUP BY item; +WITH items AS + (SELECT ['coffee', 'tea', 'milk' ] as list + UNION ALL + SELECT ['cake', 'pie', NULL] as list) --- These results will change each time you run the query. --- Smaller aggregations might be removed. -/*----------+-----------------* - | item | times_requested | - +----------+-----------------+ - | pencil | 5 | - | pen | 2 | - *----------+-----------------*/ -``` +SELECT ARRAY_TO_STRING(list, '--') AS text +FROM items; -```sql --- Without noise, using the epsilon parameter. --- (this un-noised version is for demonstration only) -SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id) - item, - COUNT(*) times_requested -FROM professors -GROUP BY item; - --- These results will not change when you run the query. -/*----------+-----------------* - | item | times_requested | - +----------+-----------------+ - | scissors | 1 | - | pencil | 4 | - | pen | 3 | - *----------+-----------------*/ +/*--------------------------------* + | text | + +--------------------------------+ + | coffee--tea--milk | + | cake--pie | + *--------------------------------*/ ``` -Note: For more information about when and when not to use -noise, see [Remove noise][dp-noise]. +```sql +WITH items AS + (SELECT ['coffee', 'tea', 'milk' ] as list + UNION ALL + SELECT ['cake', 'pie', NULL] as list) -#### Signature 2 - +SELECT ARRAY_TO_STRING(list, '--', 'MISSING') AS text +FROM items; + +/*--------------------------------* + | text | + +--------------------------------+ + | coffee--tea--milk | + | cake--pie--MISSING | + *--------------------------------*/ +``` + +### `ARRAY_TRANSFORM` ```sql -WITH DIFFERENTIAL_PRIVACY ... 
- COUNT(expression[, contribution_bounds_per_group => (lower_bound, upper_bound)]) +ARRAY_TRANSFORM(array_expression, lambda_expression) + +lambda_expression: + { + element_alias -> transform_expression + | (element_alias, index_alias) -> transform_expression + } ``` **Description** -Returns the number of non-`NULL` expression values. The final result is an -aggregation across a privacy unit column. +Takes an array, transforms the elements, and returns the results in a new array. +The output array always has the same length as the input array. -This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] -and can support these arguments: ++ `array_expression`: The array to transform. ++ `lambda_expression`: Each element in `array_expression` is evaluated against + the [lambda expression][lambda-definition]. The evaluation results are + returned in a new array. ++ `element_alias`: An alias that represents an array element. ++ `index_alias`: An alias that represents the zero-based offset of the array + element. ++ `transform_expression`: The expression used to transform the array elements. -+ `expression`: The input expression. This expression can be any - numeric input type, such as `INT64`. -+ `contribution_bounds_per_group`: The - [contribution bounds named argument][dp-clamped-named]. - Perform clamping per each group separately before performing intermediate - grouping on the privacy unit column. +Returns `NULL` if the `array_expression` is `NULL`. **Return type** -`INT64` - -**Examples** +`ARRAY` -The following differentially private query counts the number of requests made -for each type of item. This query references a table called -[`professors`][dp-example-tables]. +**Example** ```sql --- With noise, using the epsilon parameter. 
SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) - item, - COUNT(item, contribution_bounds_per_group => (0,100)) times_requested -FROM professors -GROUP BY item; + ARRAY_TRANSFORM([1, 2, 3], e -> e + 1) AS a1, + ARRAY_TRANSFORM([1, 2, 3], (e, i) -> e + i) AS a2; --- These results will change each time you run the query. --- Smaller aggregations might be removed. -/*----------+-----------------* - | item | times_requested | - +----------+-----------------+ - | pencil | 5 | - | pen | 2 | - *----------+-----------------*/ +/*---------+---------* + | a1 | a2 | + +---------+---------+ + | [2,3,4] | [1,3,5] | + *---------+---------*/ ``` +[lambda-definition]: https://github.com/google/zetasql/blob/master/docs/functions-reference.md#lambdas + +### `ARRAY_ZIP` + ```sql --- Without noise, using the epsilon parameter. --- (this un-noised version is for demonstration only) -SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id) - item, - COUNT(item, contribution_bounds_per_group => (0,100)) times_requested -FROM professors -GROUP BY item; +ARRAY_ZIP( + array_input [ AS alias ], + array_input [ AS alias ][, ... ] + [, transformation => lambda_expression ] + [, mode => { 'STRICT' | 'TRUNCATE' | 'PAD' } ] +) +``` --- These results will not change when you run the query. -/*----------+-----------------* - | item | times_requested | - +----------+-----------------+ - | scissors | 1 | - | pencil | 4 | - | pen | 3 | - *----------+-----------------*/ +**Description** + +Combines the elements from two to four arrays into one array. + +**Definitions** + ++ `array_input`: An input `ARRAY` value to be zipped with the other array + inputs. `ARRAY_ZIP` supports two to four input arrays. ++ `alias`: An alias optionally supplied for an `array_input`. In the results, + the alias is the name of the associated `STRUCT` field. 
++ `transformation`: An optionally-named lambda argument. `lambda_expression`
+  specifies how elements are combined as they are zipped. This overrides
+  the default `STRUCT` creation behavior.
++ `mode`: A mandatory-named argument that determines how arrays of differing
+  lengths are zipped. If this optional argument is not supplied, the function
+  uses `STRICT` mode by default. This argument can be one of the following
+  values:
+
+  + `STRICT` (default): If the length of any array is different from the
+    others, produce an error.
+
+  + `TRUNCATE`: Truncate longer arrays to match the length of the shortest
+    array.
+
+  + `PAD`: Pad shorter arrays with `NULL` values to match the length of the
+    longest array.
+
+**Details**
+
++ If an `array_input` or `mode` is `NULL`, this function returns `NULL`, even when
+  `mode` is `STRICT`.
++ Argument aliases can't be used with the `transformation` argument.
+
+**Return type**
+
++ If `transformation` is used and `lambda_expression` returns type `T`, the
+  return type is `ARRAY<T>`.
++ Otherwise, the return type is `ARRAY<STRUCT>`, with the `STRUCT` having a
+  number of fields equal to the number of input arrays. Each field's name is
+  either the user-provided `alias` for the corresponding `array_input`, or a
+  default alias assigned by the compiler, following the same logic used for
+  [naming columns in a SELECT list][implicit-aliases].
+
+**Examples**
+
+The following query zips two arrays into one:
+
+```sql
+SELECT ARRAY_ZIP([1, 2], ['a', 'b']) AS results
+
+/*----------------------*
+ | results              |
+ +----------------------+
+ | [(1, 'a'), (2, 'b')] |
+ *----------------------*/
+```
+
+You can give an array an alias. For example, in the following
+query, the returned array is of type `ARRAY<STRUCT<A1 INT64, alias_inferred STRING>>`,
+where:
+
++ `A1` is the alias provided for array `[1, 2]`.
++ `alias_inferred` is the inferred alias provided for array `['a', 'b']`. -[dp-from-clause]: https://github.com/google/zetasql/blob/master/docs/differential-privacy.md#dp_from +```sql +WITH T AS ( + SELECT ['a', 'b'] AS alias_inferred +) +SELECT ARRAY_ZIP([1, 2] AS A1, alias_inferred) AS results +FROM T -[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables +/*----------------------------------------------------------+ + | results | + +----------------------------------------------------------+ + | [{1 A1, 'a' alias_inferred}, {2 A1, 'b' alias_inferred}] | + +----------------------------------------------------------*/ +``` -[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise +To provide a custom transformation of the input arrays, use the `transformation` +argument: -[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause +```sql +SELECT ARRAY_ZIP([1, 2], [3, 4], transformation => (e1, e2) -> (e1 + e2)) -[dp-clamped-named]: #dp_clamped_named +/*---------+ + | results | + +---------+ + | [4, 6] | + +---------*/ +``` -### `PERCENTILE_CONT` (differential privacy) - +The argument name `transformation` is not required. For example: ```sql -WITH DIFFERENTIAL_PRIVACY ... - PERCENTILE_CONT(expression, percentile, contribution_bounds_per_row => (lower_bound, upper_bound)) +SELECT ARRAY_ZIP([1, 2], [3, 4], (e1, e2) -> (e1 + e2)) + +/*---------+ + | results | + +---------+ + | [4, 6] | + +---------*/ ``` -**Description** +When `transformation` is provided, the input arrays are not allowed to have +aliases. For example, the following query is invalid: -Takes an expression and computes a percentile for it. The final result is an -aggregation across privacy unit columns. 
+```sql {.bad} +-- Error: ARRAY_ZIP function with lambda argument does not allow providing +-- argument aliases +SELECT ARRAY_ZIP([1, 2], [3, 4] AS alias_not_allowed, (e1, e2) -> (e1 + e2)) +``` -This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] -and can support these arguments: +To produce an error when arrays with different lengths are zipped, don't +add `mode`, or if you do, set it as `STRICT`. For example: -+ `expression`: The input expression. This can be most numeric input types, - such as `INT64`. `NULL` values are always ignored. -+ `percentile`: The percentile to compute. The percentile must be a literal in - the range `[0, 1]`. -+ `contribution_bounds_per_row`: The - [contribution bounds named argument][dp-clamped-named]. - Perform clamping per each row separately before performing intermediate - grouping on the privacy unit column. +```sql {.bad} +-- Error: Unequal array length +SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd']) AS results +``` -`NUMERIC` and `BIGNUMERIC` arguments are not allowed. - If you need them, cast them as the -`DOUBLE` data type first. +```sql {.bad} +-- Error: Unequal array length +SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd'], mode => 'STRICT') AS results +``` -**Return type** +Use the `PAD` mode to pad missing values with `NULL` when input arrays have +different lengths. For example: -`DOUBLE` +```sql +SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd'], [], mode => 'PAD') AS results -**Examples** +/*------------------------------------------------------------------------+ + | results | + +------------------------------------------------------------------------+ + | [{1, 'a', NULL}, {2, 'b', NULL}, {NULL, 'c', NULL}, {NULL, 'd', NULL}] | + +------------------------------------------------------------------------*/ +``` -The following differentially private query gets the percentile of items -requested. Smaller aggregations might not be included. 
This query references a -view called [`professors`][dp-example-tables]. +Use the `TRUNCATE` mode to truncate all arrays that are longer than the shortest +array. For example: ```sql --- With noise, using the epsilon parameter. -SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) - item, - PERCENTILE_CONT(quantity, 0.5, contribution_bounds_per_row => (0,100)) percentile_requested -FROM professors -GROUP BY item; +SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd'], mode => 'TRUNCATE') AS results --- These results will change each time you run the query. --- Smaller aggregations might be removed. - /*----------+----------------------* - | item | percentile_requested | - +----------+----------------------+ - | pencil | 72.00011444091797 | - | scissors | 8.000175476074219 | - | pen | 23.001075744628906 | - *----------+----------------------*/ +/*----------------------* + | results | + +----------------------+ + | [(1, 'a'), (2, 'b')] | + *----------------------*/ ``` -[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables + -[dp-clamped-named]: #dp_clamped_named +[implicit-aliases]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#implicit_aliases -[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause + -### `SUM` (differential privacy) - +### `FLATTEN` ```sql -WITH DIFFERENTIAL_PRIVACY ... - SUM(expression[, contribution_bounds_per_group => (lower_bound, upper_bound)]) +FLATTEN(array_elements_field_access_expression) ``` **Description** -Returns the sum of non-`NULL`, non-`NaN` values in the expression. The final -result is an aggregation across privacy unit columns. 
- -This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] -and can support these arguments: +Takes a nested array and flattens a specific part of it into a single, flat +array with the +[array elements field access operator][array-el-field-operator]. +Returns `NULL` if the input value is `NULL`. +If `NULL` array elements are +encountered, they are added to the resulting array. -+ `expression`: The input expression. This can be any numeric input type, - such as `INT64`. `NULL` values are always ignored. -+ `contribution_bounds_per_group`: The - [contribution bounds named argument][dp-clamped-named]. - Perform clamping per each group separately before performing intermediate - grouping on the privacy unit column. +There are several ways to flatten nested data into arrays. To learn more, see +[Flattening nested data into an array][flatten-tree-to-array]. **Return type** -One of the following [supertypes][dp-supertype]: - -+ `INT64` -+ `UINT64` -+ `DOUBLE` +`ARRAY` **Examples** -The following differentially private query gets the sum of items requested. -Smaller aggregations might not be included. This query references a view called -[`professors`][dp-example-tables]. +In the following example, all of the arrays for `v.sales.quantity` are +concatenated in a flattened array. ```sql --- With noise, using the epsilon parameter. -SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) - item, - SUM(quantity, contribution_bounds_per_group => (0,100)) quantity -FROM professors -GROUP BY item; +WITH t AS ( + SELECT + [ + STRUCT([STRUCT([1,2,3] AS quantity), STRUCT([4,5,6] AS quantity)] AS sales), + STRUCT([STRUCT([7,8] AS quantity), STRUCT([] AS quantity)] AS sales) + ] AS v +) +SELECT FLATTEN(v.sales.quantity) AS all_values +FROM t; --- These results will change each time you run the query. --- Smaller aggregations might be removed. 
-/*----------+-----------* - | item | quantity | - +----------+-----------+ - | pencil | 143 | - | pen | 59 | - *----------+-----------*/ +/*--------------------------* + | all_values | + +--------------------------+ + | [1, 2, 3, 4, 5, 6, 7, 8] | + *--------------------------*/ ``` +In the following example, `OFFSET` gets the second value in each array and +concatenates them. + ```sql --- Without noise, using the epsilon parameter. --- (this un-noised version is for demonstration only) -SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id) - item, - SUM(quantity) quantity -FROM professors -GROUP BY item; +WITH t AS ( + SELECT + [ + STRUCT([STRUCT([1,2,3] AS quantity), STRUCT([4,5,6] AS quantity)] AS sales), + STRUCT([STRUCT([7,8,9] AS quantity), STRUCT([10,11,12] AS quantity)] AS sales) + ] AS v +) +SELECT FLATTEN(v.sales.quantity[OFFSET(1)]) AS second_values +FROM t; --- These results will not change when you run the query. -/*----------+----------* - | item | quantity | - +----------+----------+ - | scissors | 8 | - | pencil | 144 | - | pen | 58 | - *----------+----------*/ +/*---------------* + | second_values | + +---------------+ + | [2, 5, 8, 11] | + *---------------*/ ``` -Note: For more information about when and when not to use -noise, see [Use differential privacy][dp-noise]. +In the following example, all values for `v.price` are returned in a +flattened array. 
-[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables +```sql +WITH t AS ( + SELECT + [ + STRUCT(1 AS price, 2 AS quantity), + STRUCT(10 AS price, 20 AS quantity) + ] AS v +) +SELECT FLATTEN(v.price) AS all_prices +FROM t; -[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise +/*------------* + | all_prices | + +------------+ + | [1, 10] | + *------------*/ +``` -[dp-supertype]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#supertypes +For more examples, including how to use protocol buffers with `FLATTEN`, see the +[array elements field access operator][array-el-field-operator]. -[dp-clamped-named]: #dp_clamped_named +[flatten-tree-to-array]: https://github.com/google/zetasql/blob/master/docs/arrays.md#flattening_nested_data_into_arrays -[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause +[array-el-field-operator]: #array_el_field_operator -### `VAR_POP` (differential privacy) - +### `GENERATE_ARRAY` ```sql -WITH DIFFERENTIAL_PRIVACY ... - VAR_POP(expression[, contribution_bounds_per_row => (lower_bound, upper_bound)]) +GENERATE_ARRAY(start_expression, end_expression[, step_expression]) ``` **Description** -Takes an expression and computes the population (biased) variance of the values -in the expression. The final result is an aggregation across -privacy unit columns between `0` and `+Inf`. You can -[clamp the input values][dp-clamp-explicit] explicitly, otherwise input values -are clamped implicitly. Clamping is performed per individual user values. +Returns an array of values. The `start_expression` and `end_expression` +parameters determine the inclusive start and end of the array. -This function must be used with the `DIFFERENTIAL_PRIVACY` clause and -can support these arguments: +The `GENERATE_ARRAY` function accepts the following data types as inputs: -+ `expression`: The input expression. 
This can be any numeric input type, - such as `INT64`. `NULL`s are always ignored. -+ `contribution_bounds_per_row`: The - [contribution bounds named argument][dp-clamped-named]. - Perform clamping per each row separately before performing intermediate - grouping on individual user values. ++ `INT64` ++ `UINT64` ++ `NUMERIC` ++ `BIGNUMERIC` ++ `DOUBLE` -`NUMERIC` and `BIGNUMERIC` arguments are not allowed. - If you need them, cast them as the -`DOUBLE` data type first. +The `step_expression` parameter determines the increment used to +generate array values. The default value for this parameter is `1`. -**Return type** +This function returns an error if `step_expression` is set to 0, or if any +input is `NaN`. -`DOUBLE` +If any argument is `NULL`, the function will return a `NULL` array. + +**Return Data Type** + +`ARRAY` **Examples** -The following differentially private query gets the -population (biased) variance of items requested. Smaller aggregations may not -be included. This query references a view called -[`professors`][dp-example-tables]. +The following returns an array of integers, with a default step of 1. ```sql --- With noise -SELECT - WITH DIFFERENTIAL_PRIVACY - OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) - item, - VAR_POP(quantity, contribution_bounds_per_row => (0,100)) pop_variance -FROM professors -GROUP BY item; +SELECT GENERATE_ARRAY(1, 5) AS example_array; --- These results will change each time you run the query. --- Smaller aggregations may be removed. 
-/*----------+-----------------* - | item | pop_variance | - +----------+-----------------+ - | pencil | 642 | - | pen | 2.6666666666665 | - | scissors | 2500 | - *----------+-----------------*/ +/*-----------------* + | example_array | + +-----------------+ + | [1, 2, 3, 4, 5] | + *-----------------*/ ``` -[dp-clamp-explicit]: #dp_explicit_clamping - -[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables - -[dp-clamped-named]: #dp_clamped_named - -### Clamp values in a differentially private aggregate function - +The following returns an array using a user-specified step size. -In [differentially private queries][dp-syntax], -aggregation clamping is used to limit the contribution of outliers. You can -clamp explicitly or implicitly as follows: +```sql +SELECT GENERATE_ARRAY(0, 10, 3) AS example_array; -+ [Clamp explicitly in the `DIFFERENTIAL_PRIVACY` clause][dp-clamped-named]. -+ [Clamp implicitly in the `DIFFERENTIAL_PRIVACY` clause][dp-clamped-named-imp]. -+ [Clamp explicitly in the `ANONYMIZATION` clause][dp-clamp-between]. -+ [Clamp implicitly in the `ANONYMIZATION` clause][dp-clamp-between-imp]. +/*---------------* + | example_array | + +---------------+ + | [0, 3, 6, 9] | + *---------------*/ +``` -To learn more about explicit and implicit clamping, see the following: +The following returns an array using a negative value, `-3` for its step size. -+ [About implicit clamping][dp-imp-clamp]. -+ [About explicit clamping][dp-exp-clamp]. +```sql +SELECT GENERATE_ARRAY(10, 0, -3) AS example_array; -#### Implicitly clamp values in the `DIFFERENTIAL_PRIVACY` clause - +/*---------------* + | example_array | + +---------------+ + | [10, 7, 4, 1] | + *---------------*/ +``` -If you don't include the contribution bounds named argument with the -`DIFFERENTIAL_PRIVACY` clause, clamping is [implicit][dp-imp-clamp], which -means bounds are derived from the data itself in a differentially private way. 
+The following returns an array using the same value for the `start_expression` +and `end_expression`. -Implicit bounding works best when computed using large datasets. For more -information, see [Implicit bounding limitations for small datasets][implicit-limits]. +```sql +SELECT GENERATE_ARRAY(4, 4, 10) AS example_array; -**Example** +/*---------------* + | example_array | + +---------------+ + | [4] | + *---------------*/ +``` -The following anonymized query clamps each aggregate contribution for each -differential privacy ID and within a derived range from the data itself. -As long as all or most values fall within this range, your results -will be accurate. This query references a view called -[`view_on_professors`][dp-example-views]. +The following returns an empty array, because the `start_expression` is greater +than the `end_expression`, and the `step_expression` value is positive. ```sql ---Without noise (this un-noised version is for demonstration only) -SELECT WITH DIFFERENTIAL_PRIVACY - OPTIONS ( - epsilon = 1e20, - delta = .01, - privacy_unit_column=id - ) - item, - AVG(quantity) average_quantity -FROM view_on_professors -GROUP BY item; +SELECT GENERATE_ARRAY(10, 0, 3) AS example_array; -/*----------+------------------* - | item | average_quantity | - +----------+------------------+ - | scissors | 8 | - | pencil | 72 | - | pen | 18.5 | - *----------+------------------*/ +/*---------------* + | example_array | + +---------------+ + | [] | + *---------------*/ ``` -#### Explicitly clamp values in the `DIFFERENTIAL_PRIVACY` clause - +The following returns a `NULL` array because `end_expression` is `NULL`. 
```sql -contribution_bounds_per_group => (lower_bound,upper_bound) -``` +SELECT GENERATE_ARRAY(5, NULL, 1) AS example_array; -```sql -contribution_bounds_per_row => (lower_bound,upper_bound) +/*---------------* + | example_array | + +---------------+ + | NULL | + *---------------*/ ``` -Use the contribution bounds named argument to [explicitly clamp][dp-exp-clamp] -values per group or per row between a lower and upper bound in a -`DIFFERENTIAL_PRIVACY` clause. - -Input values: +The following returns multiple arrays. -+ `contribution_bounds_per_row`: Contributions per privacy unit are clamped - on a per-row (per-record) basis. This means the following: - + Upper and lower bounds are applied to column values in individual - rows produced by the input subquery independently. - + The maximum possible contribution per privacy unit (and per grouping set) - is the product of the per-row contribution limit and `max_groups_contributed` - differential privacy parameter. -+ `contribution_bounds_per_group`: Contributions per privacy unit are clamped - on a unique set of entity-specified `GROUP BY` keys. The upper and lower - bounds are applied to values per group after the values are aggregated per - privacy unit. -+ `lower_bound`: Numeric literal that represents the smallest value to - include in an aggregation. -+ `upper_bound`: Numeric literal that represents the largest value to - include in an aggregation. +```sql +SELECT GENERATE_ARRAY(start, 5) AS example_array +FROM UNNEST([3, 4, 5]) AS start; -`NUMERIC` and `BIGNUMERIC` arguments are not allowed. +/*---------------* + | example_array | + +---------------+ + | [3, 4, 5] | + | [4, 5] | + | [5] | + +---------------*/ +``` + +### `GENERATE_DATE_ARRAY` + +```sql +GENERATE_DATE_ARRAY(start_date, end_date[, INTERVAL INT64_expr date_part]) +``` + +**Description** + +Returns an array of dates. The `start_date` and `end_date` +parameters determine the inclusive start and end of the array. 
+ +The `GENERATE_DATE_ARRAY` function accepts the following data types as inputs: + ++ `start_date` must be a `DATE`. ++ `end_date` must be a `DATE`. ++ `INT64_expr` must be an `INT64`. ++ `date_part` must be either DAY, WEEK, MONTH, QUARTER, or YEAR. + +The `INT64_expr` parameter determines the increment used to generate dates. The +default value for this parameter is 1 day. + +This function returns an error if `INT64_expr` is set to 0. + +**Return Data Type** + +`ARRAY` containing 0 or more `DATE` values. **Examples** -The following anonymized query clamps each aggregate contribution for each -differential privacy ID and within a specified range (`0` and `100`). -As long as all or most values fall within this range, your results -will be accurate. This query references a view called -[`view_on_professors`][dp-example-views]. +The following returns an array of dates, with a default step of 1. ```sql ---Without noise (this un-noised version is for demonstration only) -SELECT WITH DIFFERENTIAL_PRIVACY - OPTIONS ( - epsilon = 1e20, - delta = .01, - privacy_unit_column=id - ) - item, - AVG(quantity, contribution_bounds_per_group=>(0,100)) AS average_quantity -FROM view_on_professors -GROUP BY item; +SELECT GENERATE_DATE_ARRAY('2016-10-05', '2016-10-08') AS example; -/*----------+------------------* - | item | average_quantity | - +----------+------------------+ - | scissors | 8 | - | pencil | 40 | - | pen | 18.5 | - *----------+------------------*/ +/*--------------------------------------------------* + | example | + +--------------------------------------------------+ + | [2016-10-05, 2016-10-06, 2016-10-07, 2016-10-08] | + *--------------------------------------------------*/ ``` -Notice what happens when most or all values fall outside of the clamped range. -To get accurate results, ensure that the difference between the upper and lower -bound is as small as possible, and that most inputs are between the upper and -lower bound. 
+The following returns an array using a user-specified step size.
-```sql {.bad}
---Without noise (this un-noised version is for demonstration only)
-SELECT WITH DIFFERENTIAL_PRIVACY
- OPTIONS (
- epsilon = 1e20,
- delta = .01,
- privacy_unit_column=id
- )
- item,
- AVG(quantity, contribution_bounds_per_group=>(50,100)) AS average_quantity
-FROM view_on_professors
-GROUP BY item;
+```sql
+SELECT GENERATE_DATE_ARRAY(
+ '2016-10-05', '2016-10-09', INTERVAL 2 DAY) AS example;
-/*----------+------------------*
- | item | average_quantity |
- +----------+------------------+
- | scissors | 54 |
- | pencil | 58 |
- | pen | 51 |
 *----------+------------------*/
+/*--------------------------------------*
+ | example |
+ +--------------------------------------+
+ | [2016-10-05, 2016-10-07, 2016-10-09] |
+ *--------------------------------------*/
```
-Note: For more information about when and when not to use
-noise, see [Remove noise][dp-noise].
+The following returns an array using a negative value, `-3`, for its step size.
-#### Implicitly clamp values in the `ANONYMIZATION` clause
-
+```sql
+SELECT GENERATE_DATE_ARRAY('2016-10-05',
+ '2016-10-01', INTERVAL -3 DAY) AS example;
-If you don't include the `CLAMPED BETWEEN` clause with the
-`ANONYMIZATION` clause, clamping is [implicit][dp-imp-clamp], which means bounds
-are derived from the data itself in a differentially private way.
+/*--------------------------*
+ | example |
+--------------------------+
+ | [2016-10-05, 2016-10-02] |
+ *--------------------------*/
+```
-Implicit bounding works best when computed using large datasets. For more
-information, see [Implicit bounding limitations for small datasets][implicit-limits].
+The following returns an array using the same value for the `start_date` and
+`end_date`.
-**Example** +```sql +SELECT GENERATE_DATE_ARRAY('2016-10-05', + '2016-10-05', INTERVAL 8 DAY) AS example; -The following anonymized query clamps each aggregate contribution for each -differential privacy ID and within a derived range from the data itself. -As long as all or most values fall within this range, your results -will be accurate. This query references a view called -[`view_on_professors`][dp-example-views]. +/*--------------* + | example | + +--------------+ + | [2016-10-05] | + *--------------*/ +``` + +The following returns an empty array, because the `start_date` is greater +than the `end_date`, and the `step` value is positive. ```sql ---Without noise (this un-noised version is for demonstration only) -SELECT WITH ANONYMIZATION - OPTIONS ( - epsilon = 1e20, - delta = .01, - max_groups_contributed = 1 - ) - item, - AVG(quantity) AS average_quantity -FROM view_on_professors -GROUP BY item; +SELECT GENERATE_DATE_ARRAY('2016-10-05', + '2016-10-01', INTERVAL 1 DAY) AS example; -/*----------+------------------* - | item | average_quantity | - +----------+------------------+ - | scissors | 8 | - | pencil | 72 | - | pen | 18.5 | - *----------+------------------*/ +/*---------* + | example | + +---------+ + | [] | + *---------*/ ``` -#### Explicitly clamp values in the `ANONYMIZATION` clause - +The following returns a `NULL` array, because one of its inputs is +`NULL`. ```sql -CLAMPED BETWEEN lower_bound AND upper_bound +SELECT GENERATE_DATE_ARRAY('2016-10-05', NULL) AS example; + +/*---------* + | example | + +---------+ + | NULL | + *---------*/ ``` -Use the `CLAMPED BETWEEN` clause to [explicitly clamp][dp-exp-clamp] values -between a lower and an upper bound in an `ANONYMIZATION` clause. 
+The following returns an array of dates, using MONTH as the `date_part` +interval: -Input values: +```sql +SELECT GENERATE_DATE_ARRAY('2016-01-01', + '2016-12-31', INTERVAL 2 MONTH) AS example; -+ `lower_bound`: Numeric literal that represents the smallest value to - include in an aggregation. -+ `upper_bound`: Numeric literal that represents the largest value to - include in an aggregation. +/*--------------------------------------------------------------------------* + | example | + +--------------------------------------------------------------------------+ + | [2016-01-01, 2016-03-01, 2016-05-01, 2016-07-01, 2016-09-01, 2016-11-01] | + *--------------------------------------------------------------------------*/ +``` -`NUMERIC` and `BIGNUMERIC` arguments are not allowed. +The following uses non-constant dates to generate an array. -Note: This is a legacy feature. If possible, use the `contribution_bounds` -named argument instead. +```sql +SELECT GENERATE_DATE_ARRAY(date_start, date_end, INTERVAL 1 WEEK) AS date_range +FROM ( + SELECT DATE '2016-01-01' AS date_start, DATE '2016-01-31' AS date_end + UNION ALL SELECT DATE "2016-04-01", DATE "2016-04-30" + UNION ALL SELECT DATE "2016-07-01", DATE "2016-07-31" + UNION ALL SELECT DATE "2016-10-01", DATE "2016-10-31" +) AS items; -**Examples** +/*--------------------------------------------------------------* + | date_range | + +--------------------------------------------------------------+ + | [2016-01-01, 2016-01-08, 2016-01-15, 2016-01-22, 2016-01-29] | + | [2016-04-01, 2016-04-08, 2016-04-15, 2016-04-22, 2016-04-29] | + | [2016-07-01, 2016-07-08, 2016-07-15, 2016-07-22, 2016-07-29] | + | [2016-10-01, 2016-10-08, 2016-10-15, 2016-10-22, 2016-10-29] | + *--------------------------------------------------------------*/ +``` -The following differentially private query clamps each aggregate contribution -for each privacy unit column and within a specified range (`0` and `100`). 
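The `GENERATE_DATE_ARRAY` behavior documented above (inclusive bounds, a default step of one day, an empty result when the step moves away from `end_date`, and `NULL` propagation) can be summarized with a short executable model. The sketch below is illustrative Python, not part of ZetaSQL, and the function name is our own; it only models the `DAY` form of `date_part`.

```python
from datetime import date, timedelta

def generate_date_array(start_date, end_date, step_days=1):
    """Illustrative model of GENERATE_DATE_ARRAY with a DAY date_part.

    Bounds are inclusive, the default step is one day, a zero step is an
    error, and a step that moves away from end_date yields an empty list,
    mirroring the documented examples.
    """
    if start_date is None or end_date is None:
        return None  # A NULL input produces a NULL array.
    if step_days == 0:
        raise ValueError("step cannot be 0")
    step = timedelta(days=step_days)
    result = []
    current = start_date
    # Walk toward end_date, honoring the sign of the step.
    while (current <= end_date) if step_days > 0 else (current >= end_date):
        result.append(current)
        current += step
    return result
```

For instance, `generate_date_array(date(2016, 10, 5), date(2016, 10, 1), -3)` reproduces the two-element result of the negative-step SQL example above.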
-As long as all or most values fall within this range, your results will be -accurate. This query references a view called -[`view_on_professors`][dp-example-views]. +### `GENERATE_TIMESTAMP_ARRAY` ```sql ---Without noise (this un-noised version is for demonstration only) -SELECT WITH ANONYMIZATION - OPTIONS ( - epsilon = 1e20, - delta = .01, - max_groups_contributed = 1 - ) - item, - ANON_AVG(quantity CLAMPED BETWEEN 0 AND 100) AS average_quantity -FROM view_on_professors -GROUP BY item; - -/*----------+------------------* - | item | average_quantity | - +----------+------------------+ - | scissors | 8 | - | pencil | 40 | - | pen | 18.5 | - *----------+------------------*/ +GENERATE_TIMESTAMP_ARRAY(start_timestamp, end_timestamp, + INTERVAL step_expression date_part) ``` -Notice what happens when most or all values fall outside of the clamped range. -To get accurate results, ensure that the difference between the upper and lower -bound is as small as possible, and that most inputs are between the upper and -lower bound. +**Description** -```sql {.bad} ---Without noise (this un-noised version is for demonstration only) -SELECT WITH ANONYMIZATION - OPTIONS ( - epsilon = 1e20, - delta = .01, - max_groups_contributed = 1 - ) - item, - ANON_AVG(quantity CLAMPED BETWEEN 50 AND 100) AS average_quantity -FROM view_on_professors -GROUP BY item; +Returns an `ARRAY` of `TIMESTAMPS` separated by a given interval. The +`start_timestamp` and `end_timestamp` parameters determine the inclusive +lower and upper bounds of the `ARRAY`. -/*----------+------------------* - | item | average_quantity | - +----------+------------------+ - | scissors | 54 | - | pencil | 58 | - | pen | 51 | - *----------+------------------*/ -``` +The `GENERATE_TIMESTAMP_ARRAY` function accepts the following data types as +inputs: -Note: For more information about when and when not to use -noise, see [Remove noise][dp-noise]. 
++ `start_timestamp`: `TIMESTAMP` ++ `end_timestamp`: `TIMESTAMP` ++ `step_expression`: `INT64` ++ Allowed `date_part` values are: + `NANOSECOND` + (if the SQL engine supports it), + `MICROSECOND`, `MILLISECOND`, `SECOND`, `MINUTE`, `HOUR`, or `DAY`. -#### About explicit clamping - +The `step_expression` parameter determines the increment used to generate +timestamps. -In differentially private aggregate functions, clamping explicitly clamps the -total contribution from each privacy unit column to within a specified -range. +**Return Data Type** -Explicit bounds are uniformly applied to all aggregations. So even if some -aggregations have a wide range of values, and others have a narrow range of -values, the same bounds are applied to all of them. On the other hand, when -[implicit bounds][dp-imp-clamp] are inferred from the data, the bounds applied -to each aggregation can be different. +An `ARRAY` containing 0 or more `TIMESTAMP` values. -Explicit bounds should be chosen to reflect public information. -For example, bounding ages between 0 and 100 reflects public information -because the age of most people generally falls within this range. +**Examples** -Important: The results of the query reveal the explicit bounds. Do not use -explicit bounds based on the entity data; explicit bounds should be based on -public information. +The following example returns an `ARRAY` of `TIMESTAMP`s at intervals of 1 day. -#### About implicit clamping - +```sql +SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-05 00:00:00', '2016-10-07 00:00:00', + INTERVAL 1 DAY) AS timestamp_array; -In differentially private aggregate functions, explicit clamping is optional. -If you don't include this clause, clamping is implicit, -which means bounds are derived from the data itself in a differentially -private way. The process is somewhat random, so aggregations with identical -ranges can have different bounds. 
+/*--------------------------------------------------------------------------* + | timestamp_array | + +--------------------------------------------------------------------------+ + | [2016-10-05 00:00:00+00, 2016-10-06 00:00:00+00, 2016-10-07 00:00:00+00] | + *--------------------------------------------------------------------------*/ +``` -Implicit bounds are determined for each aggregation. So if some -aggregations have a wide range of values, and others have a narrow range of -values, implicit bounding can identify different bounds for different -aggregations as appropriate. Implicit bounds might be an advantage or a -disadvantage depending on your use case. Different bounds for different -aggregations can result in lower error. Different bounds also means that -different aggregations have different levels of uncertainty, which might not be -directly comparable. [Explicit bounds][dp-exp-clamp], on the other hand, -apply uniformly to all aggregations and should be derived from public -information. +The following example returns an `ARRAY` of `TIMESTAMP`s at intervals of 1 +second. -When clamping is implicit, part of the total epsilon is spent picking bounds. -This leaves less epsilon for aggregations, so these aggregations are noisier. +```sql +SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-05 00:00:00', '2016-10-05 00:00:02', + INTERVAL 1 SECOND) AS timestamp_array; -[dp-guide]: https://github.com/google/zetasql/blob/master/docs/differential-privacy.md +/*--------------------------------------------------------------------------* + | timestamp_array | + +--------------------------------------------------------------------------+ + | [2016-10-05 00:00:00+00, 2016-10-05 00:00:01+00, 2016-10-05 00:00:02+00] | + *--------------------------------------------------------------------------*/ +``` -[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause +The following example returns an `ARRAY` of `TIMESTAMPS` with a negative +interval. 
-[agg-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md +```sql +SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-06 00:00:00', '2016-10-01 00:00:00', + INTERVAL -2 DAY) AS timestamp_array; -[dp-exp-clamp]: #dp_explicit_clamping +/*--------------------------------------------------------------------------* + | timestamp_array | + +--------------------------------------------------------------------------+ + | [2016-10-06 00:00:00+00, 2016-10-04 00:00:00+00, 2016-10-02 00:00:00+00] | + *--------------------------------------------------------------------------*/ +``` -[dp-imp-clamp]: #dp_implicit_clamping +The following example returns an `ARRAY` with a single element, because +`start_timestamp` and `end_timestamp` have the same value. -[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views +```sql +SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-05 00:00:00', '2016-10-05 00:00:00', + INTERVAL 1 HOUR) AS timestamp_array; -[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise +/*--------------------------* + | timestamp_array | + +--------------------------+ + | [2016-10-05 00:00:00+00] | + *--------------------------*/ +``` -[implicit-limits]: https://github.com/google/zetasql/blob/master/docs/differential-privacy.md#implicit_limits +The following example returns an empty `ARRAY`, because `start_timestamp` is +later than `end_timestamp`. -[dp-clamp-between]: #dp_clamp_between +```sql +SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-06 00:00:00', '2016-10-05 00:00:00', + INTERVAL 1 HOUR) AS timestamp_array; -[dp-clamp-between-imp]: #dp_clamp_between_implicit +/*-----------------* + | timestamp_array | + +-----------------+ + | [] | + *-----------------*/ +``` -[dp-clamped-named]: #dp_clamped_named +The following example returns a null `ARRAY`, because one of the inputs is +`NULL`. 
-[dp-clamped-named-imp]: #dp_clamped_named_implicit +```sql +SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-05 00:00:00', NULL, INTERVAL 1 HOUR) + AS timestamp_array; -## Approximate aggregate functions +/*-----------------* + | timestamp_array | + +-----------------+ + | NULL | + *-----------------*/ +``` -ZetaSQL supports approximate aggregate functions. -To learn about the syntax for aggregate function calls, see -[Aggregate function calls][agg-function-calls]. +The following example generates `ARRAY`s of `TIMESTAMP`s from columns containing +values for `start_timestamp` and `end_timestamp`. -Approximate aggregate functions are scalable in terms of memory usage and time, -but produce approximate results instead of exact results. These functions -typically require less memory than [exact aggregation functions][aggregate-functions-reference] -like `COUNT(DISTINCT ...)`, but also introduce statistical uncertainty. -This makes approximate aggregation appropriate for large data streams for -which linear memory usage is impractical, as well as for data that is -already approximate. +```sql +SELECT GENERATE_TIMESTAMP_ARRAY(start_timestamp, end_timestamp, INTERVAL 1 HOUR) + AS timestamp_array +FROM + (SELECT + TIMESTAMP '2016-10-05 00:00:00' AS start_timestamp, + TIMESTAMP '2016-10-05 02:00:00' AS end_timestamp + UNION ALL + SELECT + TIMESTAMP '2016-10-05 12:00:00' AS start_timestamp, + TIMESTAMP '2016-10-05 14:00:00' AS end_timestamp + UNION ALL + SELECT + TIMESTAMP '2016-10-05 23:59:00' AS start_timestamp, + TIMESTAMP '2016-10-06 01:59:00' AS end_timestamp); -The approximate aggregate functions in this section work directly on the -input data, rather than an intermediate estimation of the data. These functions -_do not allow_ users to specify the precision for the estimation with -sketches. 
If you would like to specify precision with sketches, see: +/*--------------------------------------------------------------------------* + | timestamp_array | + +--------------------------------------------------------------------------+ + | [2016-10-05 00:00:00+00, 2016-10-05 01:00:00+00, 2016-10-05 02:00:00+00] | + | [2016-10-05 12:00:00+00, 2016-10-05 13:00:00+00, 2016-10-05 14:00:00+00] | + | [2016-10-05 23:59:00+00, 2016-10-06 00:59:00+00, 2016-10-06 01:59:00+00] | + *--------------------------------------------------------------------------*/ +``` -+ [HyperLogLog++ functions][hll-functions] to estimate cardinality. -+ [KLL functions][kll-functions] to estimate quantile values. +### OFFSET and ORDINAL + +For information about using `OFFSET` and `ORDINAL` with arrays, see +[Array subscript operator][array-subscript-operator] and [Accessing array +elements][accessing-array-elements]. + + + +[array-subscript-operator]: #array_subscript_operator + +[accessing-array-elements]: https://github.com/google/zetasql/blob/master/docs/arrays.md#accessing_array_elements + + + +## Bit functions + +ZetaSQL supports the following bit functions. ### Function list @@ -7985,392 +7722,233 @@ sketches. If you would like to specify precision with sketches, see: - APPROX_COUNT_DISTINCT + BIT_CAST_TO_INT32 - Gets the approximate result for COUNT(DISTINCT expression). + Cast bits to an INT32 value. - APPROX_QUANTILES + BIT_CAST_TO_INT64 - Gets the approximate quantile boundaries. + Cast bits to an INT64 value. - APPROX_TOP_COUNT + BIT_CAST_TO_UINT32 - Gets the approximate top elements and their approximate count. + Cast bits to an UINT32 value. - APPROX_TOP_SUM + BIT_CAST_TO_UINT64 - Gets the approximate top elements and sum, based on the approximate sum - of an assigned weight. + Cast bits to an UINT64 value. + + + + + BIT_COUNT + + + + Gets the number of bits that are set in an input expression. 
-### `APPROX_COUNT_DISTINCT` +### `BIT_CAST_TO_INT32` ```sql -APPROX_COUNT_DISTINCT( - expression -) +BIT_CAST_TO_INT32(value) ``` **Description** -Returns the approximate result for `COUNT(DISTINCT expression)`. The value -returned is a statistical estimate, not necessarily the actual value. - -This function is less accurate than `COUNT(DISTINCT expression)`, but performs -better on huge input. - -**Supported Argument Types** +ZetaSQL supports bit casting to `INT32`. A bit +cast is a cast in which the order of bits is preserved instead of the value +those bytes represent. -Any data type **except**: +The `value` parameter can represent: -+ `ARRAY` -+ `STRUCT` -+ `PROTO` ++ `INT32` ++ `UINT32` -**Returned Data Types** +**Return Data Type** -`INT64` +`INT32` **Examples** ```sql -SELECT APPROX_COUNT_DISTINCT(x) as approx_distinct -FROM UNNEST([0, 1, 1, 2, 3, 5]) as x; +SELECT BIT_CAST_TO_UINT32(-1) as UINT32_value, BIT_CAST_TO_INT32(BIT_CAST_TO_UINT32(-1)) as bit_cast_value; -/*-----------------* - | approx_distinct | - +-----------------+ - | 5 | - *-----------------*/ +/*---------------+----------------------* + | UINT32_value | bit_cast_value | + +---------------+----------------------+ + | 4294967295 | -1 | + *---------------+----------------------*/ ``` -### `APPROX_QUANTILES` +### `BIT_CAST_TO_INT64` ```sql -APPROX_QUANTILES( - [ DISTINCT ] - expression, number - [ { IGNORE | RESPECT } NULLS ] - [ HAVING { MAX | MIN } expression2 ] -) +BIT_CAST_TO_INT64(value) ``` **Description** -Returns the approximate boundaries for a group of `expression` values, where -`number` represents the number of quantiles to create. This function returns an -array of `number` + 1 elements, sorted in ascending order, where the -first element is the approximate minimum and the last element is the approximate -maximum. - -Returns `NULL` if there are zero input rows or `expression` evaluates to -`NULL` for all rows. 
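The bit-preserving reinterpretation described above can also be modeled outside SQL. The sketch below is illustrative Python (the helper name is our own, not a ZetaSQL API): pack the low 32 bits as an unsigned integer, then unpack the same four bytes as signed.

```python
import struct

def bit_cast_to_int32(value):
    """Illustrative model of BIT_CAST_TO_INT32.

    The 32-bit pattern is preserved; only its interpretation changes,
    so 4294967295 (all 32 bits set) comes back as -1.
    """
    # Mask to the low 32 bits so negative Python ints pack cleanly.
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]
```

Here `bit_cast_to_int32(4294967295)` returns `-1`, matching the `BIT_CAST_TO_INT32(BIT_CAST_TO_UINT32(-1))` example above.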
- -To learn more about the optional aggregate clauses that you can pass -into this function, see -[Aggregate function calls][aggregate-function-calls]. - - - -[aggregate-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md - - - -**Supported Argument Types** - -+ `expression`: Any supported data type **except**: - - + `ARRAY` - + `STRUCT` - + `PROTO` -+ `number`: `INT64` literal or query parameter. - -**Returned Data Types** - -`ARRAY` where `T` is the type specified by `expression`. - -**Examples** - -```sql -SELECT APPROX_QUANTILES(x, 2) AS approx_quantiles -FROM UNNEST([1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x; - -/*------------------* - | approx_quantiles | - +------------------+ - | [1, 5, 10] | - *------------------*/ -``` - -```sql -SELECT APPROX_QUANTILES(x, 100)[OFFSET(90)] AS percentile_90 -FROM UNNEST([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) AS x; +ZetaSQL supports bit casting to `INT64`. A bit +cast is a cast in which the order of bits is preserved instead of the value +those bytes represent. 
-/*---------------* - | percentile_90 | - +---------------+ - | 9 | - *---------------*/ -``` +The `value` parameter can represent: -```sql -SELECT APPROX_QUANTILES(DISTINCT x, 2) AS approx_quantiles -FROM UNNEST([1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x; ++ `INT64` ++ `UINT64` -/*------------------* - | approx_quantiles | - +------------------+ - | [1, 6, 10] | - *------------------*/ -``` +**Return Data Type** -```sql -SELECT APPROX_QUANTILES(x, 2 RESPECT NULLS) AS approx_quantiles -FROM UNNEST([NULL, NULL, 1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x; +`INT64` -/*------------------* - | approx_quantiles | - +------------------+ - | [NULL, 4, 10] | - *------------------*/ -``` +**Example** ```sql -SELECT APPROX_QUANTILES(DISTINCT x, 2 RESPECT NULLS) AS approx_quantiles -FROM UNNEST([NULL, NULL, 1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x; +SELECT BIT_CAST_TO_UINT64(-1) as UINT64_value, BIT_CAST_TO_INT64(BIT_CAST_TO_UINT64(-1)) as bit_cast_value; -/*------------------* - | approx_quantiles | - +------------------+ - | [NULL, 6, 10] | - *------------------*/ +/*-----------------------+----------------------* + | UINT64_value | bit_cast_value | + +-----------------------+----------------------+ + | 18446744073709551615 | -1 | + *-----------------------+----------------------*/ ``` -### `APPROX_TOP_COUNT` +### `BIT_CAST_TO_UINT32` ```sql -APPROX_TOP_COUNT( - expression, number - [ HAVING { MAX | MIN } expression2 ] -) +BIT_CAST_TO_UINT32(value) ``` **Description** -Returns the approximate top elements of `expression` as an array of `STRUCT`s. -The `number` parameter specifies the number of elements returned. - -Each `STRUCT` contains two fields. The first field (named `value`) contains an -input value. The second field (named `count`) contains an `INT64` specifying the -number of times the value was returned. - -Returns `NULL` if there are zero input rows. 
- -To learn more about the optional aggregate clauses that you can pass -into this function, see -[Aggregate function calls][aggregate-function-calls]. - - - -[aggregate-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md - - +ZetaSQL supports bit casting to `UINT32`. A bit +cast is a cast in which the order of bits is preserved instead of the value +those bytes represent. -**Supported Argument Types** +The `value` parameter can represent: -+ `expression`: Any data type that the `GROUP BY` clause supports. -+ `number`: `INT64` literal or query parameter. ++ `INT32` ++ `UINT32` -**Returned Data Types** +**Return Data Type** -`ARRAY` +`UINT32` **Examples** ```sql -SELECT APPROX_TOP_COUNT(x, 2) as approx_top_count -FROM UNNEST(["apple", "apple", "pear", "pear", "pear", "banana"]) as x; - -/*-------------------------* - | approx_top_count | - +-------------------------+ - | [{pear, 3}, {apple, 2}] | - *-------------------------*/ -``` - -**NULL handling** - -`APPROX_TOP_COUNT` does not ignore `NULL`s in the input. For example: - -```sql -SELECT APPROX_TOP_COUNT(x, 2) as approx_top_count -FROM UNNEST([NULL, "pear", "pear", "pear", "apple", NULL]) as x; +SELECT -1 as UINT32_value, BIT_CAST_TO_UINT32(-1) as bit_cast_value; -/*------------------------* - | approx_top_count | - +------------------------+ - | [{pear, 3}, {NULL, 2}] | - *------------------------*/ +/*--------------+----------------------* + | UINT32_value | bit_cast_value | + +--------------+----------------------+ + | -1 | 4294967295 | + *--------------+----------------------*/ ``` -### `APPROX_TOP_SUM` +### `BIT_CAST_TO_UINT64` ```sql -APPROX_TOP_SUM( - expression, weight, number - [ HAVING { MAX | MIN } expression2 ] -) +BIT_CAST_TO_UINT64(value) ``` **Description** -Returns the approximate top elements of `expression`, based on the sum of an -assigned `weight`. The `number` parameter specifies the number of elements -returned. 
- -If the `weight` input is negative or `NaN`, this function returns an error. - -The elements are returned as an array of `STRUCT`s. -Each `STRUCT` contains two fields: `value` and `sum`. -The `value` field contains the value of the input expression. The `sum` field is -the same type as `weight`, and is the approximate sum of the input weight -associated with the `value` field. - -Returns `NULL` if there are zero input rows. - -To learn more about the optional aggregate clauses that you can pass -into this function, see -[Aggregate function calls][aggregate-function-calls]. - - - -[aggregate-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md - - - -**Supported Argument Types** +ZetaSQL supports bit casting to `UINT64`. A bit +cast is a cast in which the order of bits is preserved instead of the value +those bytes represent. -+ `expression`: Any data type that the `GROUP BY` clause supports. -+ `weight`: One of the following: +The `value` parameter can represent: - + `INT64` - + `UINT64` - + `NUMERIC` - + `BIGNUMERIC` - + `DOUBLE` -+ `number`: `INT64` literal or query parameter. ++ `INT64` ++ `UINT64` -**Returned Data Types** +**Return Data Type** -`ARRAY` +`UINT64` -**Examples** +**Example** ```sql -SELECT APPROX_TOP_SUM(x, weight, 2) AS approx_top_sum FROM -UNNEST([ - STRUCT("apple" AS x, 3 AS weight), - ("pear", 2), - ("apple", 0), - ("banana", 5), - ("pear", 4) -]); +SELECT -1 as INT64_value, BIT_CAST_TO_UINT64(-1) as bit_cast_value; -/*--------------------------* - | approx_top_sum | - +--------------------------+ - | [{pear, 6}, {banana, 5}] | - *--------------------------*/ +/*--------------+----------------------* + | INT64_value | bit_cast_value | + +--------------+----------------------+ + | -1 | 18446744073709551615 | + *--------------+----------------------*/ ``` -**NULL handling** - -`APPROX_TOP_SUM` does not ignore `NULL` values for the `expression` and `weight` -parameters. 
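The unsigned direction of the bit cast has an equally small model. Again this is illustrative Python, not part of ZetaSQL: because Python integers are arbitrary-precision, masking with 2^64 − 1 extracts the low 64 bits of the two's-complement representation, which is exactly the bit-preserving cast.

```python
def bit_cast_to_uint64(value):
    """Illustrative model of BIT_CAST_TO_UINT64.

    Keeps the low 64 bits of the two's-complement representation and
    reads them as an unsigned value.
    """
    return value & 0xFFFFFFFFFFFFFFFF
```

Here `bit_cast_to_uint64(-1)` yields `18446744073709551615`, matching the `BIT_CAST_TO_UINT64(-1)` example above.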
- -```sql -SELECT APPROX_TOP_SUM(x, weight, 2) AS approx_top_sum FROM -UNNEST([STRUCT("apple" AS x, NULL AS weight), ("pear", 0), ("pear", NULL)]); - -/*----------------------------* - | approx_top_sum | - +----------------------------+ - | [{pear, 0}, {apple, NULL}] | - *----------------------------*/ -``` +### `BIT_COUNT` ```sql -SELECT APPROX_TOP_SUM(x, weight, 2) AS approx_top_sum FROM -UNNEST([STRUCT("apple" AS x, 0 AS weight), (NULL, 2)]); - -/*-------------------------* - | approx_top_sum | - +-------------------------+ - | [{NULL, 2}, {apple, 0}] | - *-------------------------*/ +BIT_COUNT(expression) ``` -```sql -SELECT APPROX_TOP_SUM(x, weight, 2) AS approx_top_sum FROM -UNNEST([STRUCT("apple" AS x, 0 AS weight), (NULL, NULL)]); - -/*----------------------------* - | approx_top_sum | - +----------------------------+ - | [{apple, 0}, {NULL, NULL}] | - *----------------------------*/ -``` +**Description** -[hll-functions]: #hyperloglog_functions +The input, `expression`, must be an +integer or `BYTES`. -[kll-functions]: #kll_quantile_functions +Returns the number of bits that are set in the input `expression`. +For signed integers, this is the number of bits in two's complement form. -[aggregate-functions-reference]: #aggregate_functions +**Return Data Type** -[agg-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md +`INT64` -## HyperLogLog++ functions - +**Example** -The [HyperLogLog++ algorithm (HLL++)][hll-sketches] estimates -[cardinality][cardinality] from [sketches][hll-sketches]. +```sql +SELECT a, BIT_COUNT(a) AS a_bits, FORMAT("%T", b) as b, BIT_COUNT(b) AS b_bits +FROM UNNEST([ + STRUCT(0 AS a, b'' AS b), (0, b'\x00'), (5, b'\x05'), (8, b'\x00\x08'), + (0xFFFF, b'\xFF\xFF'), (-2, b'\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFE'), + (-1, b'\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF'), + (NULL, b'\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF') +]) AS x; -HLL++ functions are approximate aggregate functions. 
-Approximate aggregation typically requires less -memory than exact aggregation functions, -like [`COUNT(DISTINCT)`][count-distinct], but also introduces statistical error. -This makes HLL++ functions appropriate for large data streams for -which linear memory usage is impractical, as well as for data that is -already approximate. +/*-------+--------+---------------------------------------------+--------* + | a | a_bits | b | b_bits | + +-------+--------+---------------------------------------------+--------+ + | 0 | 0 | b"" | 0 | + | 0 | 0 | b"\x00" | 0 | + | 5 | 2 | b"\x05" | 2 | + | 8 | 1 | b"\x00\x08" | 1 | + | 65535 | 16 | b"\xff\xff" | 16 | + | -2 | 63 | b"\xff\xff\xff\xff\xff\xff\xff\xfe" | 63 | + | -1 | 64 | b"\xff\xff\xff\xff\xff\xff\xff\xff" | 64 | + | NULL | NULL | b"\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff" | 80 | + *-------+--------+---------------------------------------------+--------*/ +``` -If you do not need materialized sketches, you can alternatively use an -[approximate aggregate function with system-defined precision][approx-functions-reference], -such as [`APPROX_COUNT_DISTINCT`][approx-count-distinct]. However, -`APPROX_COUNT_DISTINCT` does not allow partial aggregations, re-aggregations, -and custom precision. +## Conversion functions -ZetaSQL supports the following HLL++ functions: +ZetaSQL supports conversion functions. These data type +conversions are explicit, but some conversions can happen implicitly. You can +learn more about implicit and explicit conversion [here][conversion-rules]. ### Function list @@ -8384,1145 +7962,1539 @@ ZetaSQL supports the following HLL++ functions: - HLL_COUNT.EXTRACT - - - - Extracts a cardinality estimate of an HLL++ sketch. - - - - - HLL_COUNT.INIT - - - - Aggregates values of the same underlying type into a new HLL++ sketch. - - - - - HLL_COUNT.MERGE + CAST - Merges HLL++ sketches of the same underlying type into a new sketch, and - then gets the cardinality of the new sketch. 
+ Convert the results of an expression to the given type. - HLL_COUNT.MERGE_PARTIAL + SAFE_CAST - Merges HLL++ sketches of the same underlying type into a new sketch. + Similar to the CAST function, but returns NULL + when a runtime error is produced. -### `HLL_COUNT.EXTRACT` +### `CAST` + -``` -HLL_COUNT.EXTRACT(sketch) +```sql +CAST(expression AS typename [format_clause]) ``` **Description** -A scalar function that extracts a cardinality estimate of a single -[HLL++][hll-link-to-research-whitepaper] sketch. +Cast syntax is used in a query to indicate that the result type of an +expression should be converted to some other type. -If `sketch` is `NULL`, this function returns a cardinality estimate of `0`. +When using `CAST`, a query can fail if ZetaSQL is unable to perform +the cast. If you want to protect your queries from these types of errors, you +can use [SAFE_CAST][con-func-safecast]. -**Supported input types** +Casts between supported types that do not successfully map from the original +value to the target domain produce runtime errors. For example, casting +`BYTES` to `STRING` where the byte sequence is not valid UTF-8 results in a +runtime error. -`BYTES` +Other examples include: -**Return type** ++ Casting `INT64` to `INT32` where the value overflows `INT32`. ++ Casting `STRING` to `INT32` where the `STRING` contains non-digit characters. -`INT64` +Some casts can include a [format clause][formatting-syntax], which provides +instructions for how to conduct the +cast. For example, you could +instruct a cast to convert a sequence of bytes to a BASE64-encoded string +instead of a UTF-8-encoded string. -**Example** +The structure of the format clause is unique to each type of cast and more +information is available in the section for that cast. -The following query returns the number of distinct users for each country who -have at least one invoice. 
+**Examples** -```sql -SELECT - country, - HLL_COUNT.EXTRACT(HLL_sketch) AS distinct_customers_with_open_invoice -FROM - ( - SELECT - country, - HLL_COUNT.INIT(customer_id) AS hll_sketch - FROM - UNNEST( - ARRAY>[ - ('UA', 'customer_id_1', 'invoice_id_11'), - ('BR', 'customer_id_3', 'invoice_id_31'), - ('CZ', 'customer_id_2', 'invoice_id_22'), - ('CZ', 'customer_id_2', 'invoice_id_23'), - ('BR', 'customer_id_3', 'invoice_id_31'), - ('UA', 'customer_id_2', 'invoice_id_24')]) - GROUP BY country - ); +The following query results in `"true"` if `x` is `1`, `"false"` for any other +non-`NULL` value, and `NULL` if `x` is `NULL`. -/*---------+--------------------------------------* - | country | distinct_customers_with_open_invoice | - +---------+--------------------------------------+ - | UA | 2 | - | BR | 1 | - | CZ | 1 | - *---------+--------------------------------------*/ +```sql +CAST(x=1 AS STRING) ``` -[hll-link-to-research-whitepaper]: https://research.google.com/pubs/pub40671.html - -### `HLL_COUNT.INIT` +### CAST AS ARRAY -``` -HLL_COUNT.INIT(input [, precision]) +```sql +CAST(expression AS ARRAY) ``` **Description** -An aggregate function that takes one or more `input` values and aggregates them -into a [HLL++][hll-link-to-research-whitepaper] sketch. Each sketch -is represented using the `BYTES` data type. You can then merge sketches using -`HLL_COUNT.MERGE` or `HLL_COUNT.MERGE_PARTIAL`. If no merging is needed, -you can extract the final count of distinct values from the sketch using -`HLL_COUNT.EXTRACT`. +ZetaSQL supports [casting][con-func-cast] to `ARRAY`. The +`expression` parameter can represent an expression for these data types: -This function supports an optional parameter, `precision`. This parameter -defines the accuracy of the estimate at the cost of additional memory required -to process the sketches or store them on disk. The range for this value is -`10` to `24`. The default value is `15`. 
For more information about precision, -see [Precision for sketches][precision_hll]. ++ `ARRAY` -If the input is `NULL`, this function returns `NULL`. +**Conversion rules** -For more information, see [HyperLogLog in Practice: Algorithmic Engineering of -a State of The Art Cardinality Estimation Algorithm][hll-link-to-research-whitepaper]. + + + + + + + + + + + +
FromToRule(s) when casting x
ARRAYARRAY + + The element types of the input + array must be castable to the + element types of the target array. + For example, casting from type + ARRAY<INT64> to + ARRAY<DOUBLE> or + ARRAY<STRING> is valid; + casting from type ARRAY<INT64> + to ARRAY<BYTES> is not valid. + +
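**Example**

The following query is an illustrative sketch of an array-to-array cast, assuming the engine accepts a cast applied directly to an `ARRAY<INT64>` literal; the alias is invented for this example:

```sql
SELECT CAST([1, 2, 3] AS ARRAY<DOUBLE>) AS int64s_to_doubles
```

Each `INT64` element is cast to `DOUBLE` individually. By the rule above, a cast such as `CAST([1, 2, 3] AS ARRAY<BYTES>)` produces an error instead, because `INT64` is not castable to `BYTES`.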
-**Supported input types** +### CAST AS BIGNUMERIC + + +```sql +CAST(expression AS BIGNUMERIC) +``` + +**Description** + +ZetaSQL supports [casting][con-func-cast] to `BIGNUMERIC`. The +`expression` parameter can represent an expression for these data types: ++ `INT32` ++ `UINT32` + `INT64` + `UINT64` ++ `FLOAT` ++ `DOUBLE` + `NUMERIC` + `BIGNUMERIC` + `STRING` -+ `BYTES` - -**Return type** - -`BYTES` -**Example** +**Conversion rules** -The following query creates HLL++ sketches that count the number of distinct -users with at least one invoice per country. + + + + + + + + + + + + + + + + +
FromToRule(s) when casting x
Floating PointBIGNUMERIC + The floating point number will round + half away from zero. -```sql -SELECT - country, - HLL_COUNT.INIT(customer_id, 10) - AS hll_sketch -FROM - UNNEST( - ARRAY>[ - ('UA', 'customer_id_1', 'invoice_id_11'), - ('CZ', 'customer_id_2', 'invoice_id_22'), - ('CZ', 'customer_id_2', 'invoice_id_23'), - ('BR', 'customer_id_3', 'invoice_id_31'), - ('UA', 'customer_id_2', 'invoice_id_24')]) -GROUP BY country; - -/*---------+------------------------------------------------------------------------------------* - | country | hll_sketch | - +---------+------------------------------------------------------------------------------------+ - | UA | "\010p\020\002\030\002 \013\202\007\r\020\002\030\n \0172\005\371\344\001\315\010" | - | CZ | "\010p\020\002\030\002 \013\202\007\013\020\001\030\n \0172\003\371\344\001" | - | BR | "\010p\020\001\030\002 \013\202\007\013\020\001\030\n \0172\003\202\341\001" | - *---------+------------------------------------------------------------------------------------*/ -``` - -[hll-link-to-research-whitepaper]: https://research.google.com/pubs/pub40671.html + Casting a NaN, +inf or + -inf will return an error. Casting a value outside the range + of BIGNUMERIC returns an overflow error. +
STRINGBIGNUMERIC + The numeric literal contained in the string must not exceed + the maximum precision or range of the + BIGNUMERIC type, or an error will occur. If the number of + digits after the decimal point exceeds 38, then the resulting + BIGNUMERIC value will round + half away from zero -[precision_hll]: https://github.com/google/zetasql/blob/master/docs/sketches.md#precision_hll + to have 38 digits after the decimal point. +
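**Example**

The following query sketches the string rule above; the column aliases are invented for illustration:

```sql
SELECT
  CAST('12345.123456789' AS BIGNUMERIC) AS str_to_bignumeric,
  SAFE_CAST('apple' AS BIGNUMERIC) AS bad_str_to_bignumeric
```

The first cast parses the numeric literal exactly, since it is well within `BIGNUMERIC` precision. The second string is not a valid numeric literal, so `CAST` would produce an error; `SAFE_CAST` returns `NULL` instead.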
-### `HLL_COUNT.MERGE` +### CAST AS BOOL -``` -HLL_COUNT.MERGE(sketch) +```sql +CAST(expression AS BOOL) ``` **Description** -An aggregate function that returns the cardinality of several -[HLL++][hll-link-to-research-whitepaper] sketches by computing their union. - -Each `sketch` must be initialized on the same type. Attempts to merge sketches -for different types results in an error. For example, you cannot merge a sketch -initialized from `INT64` data with one initialized from `STRING` data. - -If the merged sketches were initialized with different precisions, the precision -will be downgraded to the lowest precision involved in the merge. - -This function ignores `NULL` values when merging sketches. If the merge happens -over zero rows or only over `NULL` values, the function returns `0`. - -**Supported input types** - -`BYTES` +ZetaSQL supports [casting][con-func-cast] to `BOOL`. The +`expression` parameter can represent an expression for these data types: -**Return type** ++ `INT32` ++ `UINT32` ++ `INT64` ++ `UINT64` ++ `BOOL` ++ `STRING` -`INT64` +**Conversion rules** -**Example** + + + + + + + + + + + + + + + + +
FromToRule(s) when casting x
IntegerBOOL + Returns FALSE if x is 0, + TRUE otherwise. +
STRINGBOOL + Returns TRUE if x is "true" and + FALSE if x is "false"
+ All other values of x are invalid and throw an error instead + of casting to a boolean.
+ A string is case-insensitive when converting + to a boolean. +
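**Example**

The following query sketches the rules above (aliases invented for illustration). The string comparison is case-insensitive, and any non-zero integer casts to `TRUE`:

```sql
SELECT
  CAST('TRUE' AS BOOL) AS str_to_bool,  -- TRUE
  CAST(0 AS BOOL) AS int_to_bool        -- FALSE
```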
- The following query counts the number of distinct users across all countries - who have at least one invoice. +### CAST AS BYTES ```sql -SELECT HLL_COUNT.MERGE(hll_sketch) AS distinct_customers_with_open_invoice -FROM - ( - SELECT - country, - HLL_COUNT.INIT(customer_id) AS hll_sketch - FROM - UNNEST( - ARRAY>[ - ('UA', 'customer_id_1', 'invoice_id_11'), - ('BR', 'customer_id_3', 'invoice_id_31'), - ('CZ', 'customer_id_2', 'invoice_id_22'), - ('CZ', 'customer_id_2', 'invoice_id_23'), - ('BR', 'customer_id_3', 'invoice_id_31'), - ('UA', 'customer_id_2', 'invoice_id_24')]) - GROUP BY country - ); - -/*--------------------------------------* - | distinct_customers_with_open_invoice | - +--------------------------------------+ - | 3 | - *--------------------------------------*/ -``` - -[hll-link-to-research-whitepaper]: https://research.google.com/pubs/pub40671.html - -### `HLL_COUNT.MERGE_PARTIAL` - -``` -HLL_COUNT.MERGE_PARTIAL(sketch) +CAST(expression AS BYTES [format_clause]) ``` **Description** -An aggregate function that takes one or more -[HLL++][hll-link-to-research-whitepaper] `sketch` -inputs and merges them into a new sketch. - -Each `sketch` must be initialized on the same type. Attempts to merge sketches -for different types results in an error. For example, you cannot merge a sketch -initialized from `INT64` data with one initialized from `STRING` data. - -If the merged sketches were initialized with different precisions, the precision -will be downgraded to the lowest precision involved in the merge. For example, -if `MERGE_PARTIAL` encounters sketches of precision 14 and 15, the returned new -sketch will have precision 14. +ZetaSQL supports [casting][con-func-cast] to `BYTES`. The +`expression` parameter can represent an expression for these data types: -This function returns `NULL` if there is no input or all inputs are `NULL`. 
++ `BYTES` ++ `STRING` ++ `PROTO` -**Supported input types** +**Format clause** -`BYTES` +When an expression of one type is cast to another type, you can use the +[format clause][formatting-syntax] to provide instructions for how to conduct +the cast. You can use the format clause in this section if `expression` is a +`STRING`. -**Return type** ++ [Format string as bytes][format-string-as-bytes] -`BYTES` +**Conversion rules** -**Example** + + + + + + + + + + + + + + + + + + +
FromToRule(s) when casting x
STRINGBYTES + Strings are cast to bytes using UTF-8 encoding. For example, + the string "©", when cast to + bytes, would become a 2-byte sequence with the + hex values C2 and A9. +
PROTOBYTES + Returns the proto2 wire format bytes + of x. +
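**Example**

The following query applies the `STRING`-to-`BYTES` rule above to the "©" character mentioned in the table; the alias is invented for illustration:

```sql
SELECT CAST('©' AS BYTES) AS utf8_bytes  -- b"\xc2\xa9"
```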
-The following query returns an HLL++ sketch that counts the number of distinct -users who have at least one invoice across all countries. +### CAST AS DATE ```sql -SELECT HLL_COUNT.MERGE_PARTIAL(HLL_sketch) AS distinct_customers_with_open_invoice -FROM - ( - SELECT - country, - HLL_COUNT.INIT(customer_id) AS hll_sketch - FROM - UNNEST( - ARRAY>[ - ('UA', 'customer_id_1', 'invoice_id_11'), - ('BR', 'customer_id_3', 'invoice_id_31'), - ('CZ', 'customer_id_2', 'invoice_id_22'), - ('CZ', 'customer_id_2', 'invoice_id_23'), - ('BR', 'customer_id_3', 'invoice_id_31'), - ('UA', 'customer_id_2', 'invoice_id_24')]) - GROUP BY country - ); - -/*----------------------------------------------------------------------------------------------* - | distinct_customers_with_open_invoice | - +----------------------------------------------------------------------------------------------+ - | "\010p\020\006\030\002 \013\202\007\020\020\003\030\017 \0242\010\320\2408\352}\244\223\002" | - *----------------------------------------------------------------------------------------------*/ +CAST(expression AS DATE [format_clause]) ``` -[hll-link-to-research-whitepaper]: https://research.google.com/pubs/pub40671.html - -[hll-sketches]: https://github.com/google/zetasql/blob/master/docs/sketches.md#sketches_hll - -[cardinality]: https://en.wikipedia.org/wiki/Cardinality - -[count-distinct]: #count - -[approx-count-distinct]: #approx-count-distinct - -[approx-functions-reference]: #approximate_aggregate_functions +**Description** -## KLL quantile functions +ZetaSQL supports [casting][con-func-cast] to `DATE`. The `expression` +parameter can represent an expression for these data types: -ZetaSQL supports KLL functions. ++ `STRING` ++ `DATETIME` ++ `TIMESTAMP` -The [KLL16 algorithm][kll-sketches] estimates -quantiles from [sketches][kll-sketches]. 
If you do not want -to work with sketches and do not need customized precision, consider -using [approximate aggregate functions][approx-functions-reference] -with system-defined precision. +**Format clause** -KLL functions are approximate aggregate functions. -Approximate aggregation requires significantly less memory than an exact -quantiles computation, but also introduces statistical error. -This makes approximate aggregation appropriate for large data streams for -which linear memory usage is impractical, as well as for data that is -already approximate. +When an expression of one type is cast to another type, you can use the +[format clause][formatting-syntax] to provide instructions for how to conduct +the cast. You can use the format clause in this section if `expression` is a +`STRING`. -Note: While `APPROX_QUANTILES` is also returning approximate quantile results, -the functions from this section allow for partial aggregations and -re-aggregations. ++ [Format string as date and time][format-string-as-date-time] -### Function list +**Conversion rules** - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + +
NameSummary
KLL_QUANTILES.EXTRACT_INT64 - - - Gets a selected number of quantiles from an - INT64-initialized KLL sketch. -
KLL_QUANTILES.EXTRACT_UINT64 - - - Gets a selected number of quantiles from an - UINT64-initialized KLL sketch. -
KLL_QUANTILES.EXTRACT_DOUBLE - - - Gets a selected number of quantiles from a - DOUBLE-initialized KLL sketch. -
KLL_QUANTILES.EXTRACT_POINT_INT64 - - - Gets a specific quantile from an - INT64-initialized KLL sketch. -
KLL_QUANTILES.EXTRACT_POINT_UINT64 +
FromToRule(s) when casting x
STRINGDATE + When casting from string to date, the string must conform to + the supported date literal format, and is independent of time zone. If the + string expression is invalid or represents a date that is outside of the + supported min/max range, then an error is produced. +
TIMESTAMPDATE + Casting from a timestamp to date effectively truncates the timestamp as + of the default time zone. +
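**Example**

The following query sketches both rules above; the aliases are invented for illustration:

```sql
SELECT
  CAST('2023-06-01' AS DATE) AS str_to_date,
  CAST(TIMESTAMP '2023-06-01 23:59:59+00' AS DATE) AS ts_to_date
```

`str_to_date` is `DATE '2023-06-01'`. `ts_to_date` truncates the timestamp as of the default time zone, so it can be `2023-06-01` or `2023-06-02` depending on which zone is in effect when the query runs.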
- - - Gets a specific quantile from an - UINT64-initialized KLL sketch. - - - - KLL_QUANTILES.EXTRACT_POINT_DOUBLE +### CAST AS DATETIME - - - Gets a specific quantile from a - DOUBLE-initialized KLL sketch. - - +```sql +CAST(expression AS DATETIME [format_clause]) +``` - - KLL_QUANTILES.INIT_INT64 +**Description** - - - Aggregates values into an - INT64-initialized KLL sketch. - - - - KLL_QUANTILES.INIT_UINT64 +ZetaSQL supports [casting][con-func-cast] to `DATETIME`. The +`expression` parameter can represent an expression for these data types: - - - Aggregates values into an - UINT64-initialized KLL sketch. - - - - KLL_QUANTILES.INIT_DOUBLE ++ `STRING` ++ `DATETIME` ++ `TIMESTAMP` - - - Aggregates values into a - DOUBLE-initialized KLL sketch. - - +**Format clause** - - KLL_QUANTILES.MERGE_INT64 +When an expression of one type is cast to another type, you can use the +[format clause][formatting-syntax] to provide instructions for how to conduct +the cast. You can use the format clause in this section if `expression` is a +`STRING`. - - - Merges INT64-initialized KLL sketches into a new sketch, and - then gets the quantiles from the new sketch. - - - - KLL_QUANTILES.MERGE_UINT64 ++ [Format string as date and time][format-string-as-date-time] - - - Merges UINT64-initialized KLL sketches into a new sketch, and - then gets the quantiles from the new sketch. - - - - KLL_QUANTILES.MERGE_DOUBLE +**Conversion rules** - - - Merges DOUBLE-initialized KLL sketches - into a new sketch, and then gets the quantiles from the new sketch. - - + + + + + + + + + + + + + + + + + + +
FromToRule(s) when casting x
STRINGDATETIME + When casting from string to datetime, the string must conform to the + supported datetime literal format, and is independent of time zone. If + the string expression is invalid or represents a datetime that is outside + of the supported min/max range, then an error is produced. +
TIMESTAMPDATETIME + Casting from a timestamp to datetime effectively truncates the timestamp + as of the default time zone. +
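**Example**

The following query sketches the `STRING`-to-`DATETIME` rule above; the alias is invented for illustration:

```sql
SELECT CAST('2023-06-01 12:30:00' AS DATETIME) AS str_to_datetime
```

The string must use a supported datetime literal format; here the result is `DATETIME '2023-06-01 12:30:00'`, independent of time zone.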
- - KLL_QUANTILES.MERGE_PARTIAL +### CAST AS ENUM - - - Merges KLL sketches of the same underlying type into a new sketch. - - +```sql +CAST(expression AS ENUM) +``` - - KLL_QUANTILES.MERGE_POINT_INT64 +**Description** - - - Merges INT64-initialized KLL sketches into a new sketch, and - then gets a specific quantile from the new sketch. - - - - KLL_QUANTILES.MERGE_POINT_UINT64 +ZetaSQL supports [casting][con-func-cast] to `ENUM`. The `expression` +parameter can represent an expression for these data types: - - - Merges UINT64-initialized KLL sketches into a new sketch, and - then gets a specific quantile from the new sketch. - - - - KLL_QUANTILES.MERGE_POINT_DOUBLE ++ `INT32` ++ `UINT32` ++ `INT64` ++ `UINT64` ++ `STRING` ++ `ENUM` - - - Merges DOUBLE-initialized KLL sketches - into a new sketch, and then gets a specific quantile from the new sketch. - - +**Conversion rules** - + + + + + + + + + + +
FromToRule(s) when casting x
ENUMENUMMust have the same enum name.
-### `KLL_QUANTILES.EXTRACT_INT64` +### CAST AS Floating Point + ```sql -KLL_QUANTILES.EXTRACT_INT64(sketch, number) +CAST(expression AS DOUBLE) +``` + +```sql +CAST(expression AS FLOAT) ``` **Description** -Takes a single KLL sketch as `BYTES` and returns a selected `number` of -quantiles. The output is an `ARRAY` containing the exact minimum value from the -input data that you used to initialize the sketch, each approximate quantile, -and the exact maximum value from the initial input data. The values are sorted -in ascending order. This is a scalar function, similar to -`KLL_QUANTILES.MERGE_INT64`, but a scalar function rather than an -aggregate function. +ZetaSQL supports [casting][con-func-cast] to floating point types. +The `expression` parameter can represent an expression for these data types: -Returns an error if the underlying type of the input sketch is not compatible -with type `INT64`. - -Returns an error if the input is not a valid KLL quantiles sketch. - -**Supported Argument Types** - -+ `sketch`: `BYTES` KLL sketch initialized on `INT64` data type -+ `number`: `INT64` - -**Return Type** ++ `INT32` ++ `UINT32` ++ `INT64` ++ `UINT64` ++ `FLOAT` ++ `DOUBLE` ++ `NUMERIC` ++ `BIGNUMERIC` ++ `STRING` -`ARRAY` +**Conversion rules** -**Example** + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
FromToRule(s) when casting x
IntegerFloating Point + Returns a close but potentially not exact floating point value. +
NUMERICFloating Point + NUMERIC will convert to the closest floating point number + with a possible loss of precision. +
BIGNUMERICFloating Point + BIGNUMERIC will convert to the closest floating point number + with a possible loss of precision. +
STRINGFloating Point + Returns x as a floating point value, interpreting it as + having the same form as a valid floating point literal. + Also supports casts from "[+,-]inf" to + [,-]Infinity, + "[+,-]infinity" to [,-]Infinity, and + "[+,-]nan" to NaN. + Conversions are case-insensitive. +
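**Example**

The following query sketches the string rules above (aliases invented for illustration). The special values parse case-insensitively:

```sql
SELECT
  CAST('+inf' AS DOUBLE) AS positive_infinity,
  CAST('nan' AS DOUBLE) AS not_a_number,
  CAST('2.5e2' AS DOUBLE) AS scientific_notation
```

The first two casts return `+Infinity` and `NaN`; the last parses the literal in scientific notation and yields 250.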
-The following query initializes a KLL sketch from five rows of data. Then -it returns an `ARRAY` containing the minimum, median, and maximum values in the -input sketch. +### CAST AS Integer + ```sql -SELECT KLL_QUANTILES.EXTRACT_INT64(kll_sketch, 2) AS median -FROM (SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch - FROM (SELECT 1 AS x UNION ALL - SELECT 2 AS x UNION ALL - SELECT 3 AS x UNION ALL - SELECT 4 AS x UNION ALL - SELECT 5 AS x)); +CAST(expression AS INT32) +``` -/*---------* - | median | - +---------+ - | [1,3,5] | - *---------*/ +```sql +CAST(expression AS UINT32) ``` -### `KLL_QUANTILES.EXTRACT_UINT64` +```sql +CAST(expression AS INT64) +``` ```sql -KLL_QUANTILES.EXTRACT_UINT64(sketch, number) +CAST(expression AS UINT64) ``` **Description** -Like [`KLL_QUANTILES.EXTRACT_INT64`](#kll-quantilesextract-int64), -but accepts KLL sketches initialized on data of type `UINT64`. +ZetaSQL supports [casting][con-func-cast] to integer types. +The `expression` parameter can represent an expression for these data types: -**Supported Argument Types** ++ `INT32` ++ `UINT32` ++ `INT64` ++ `UINT64` ++ `FLOAT` ++ `DOUBLE` ++ `NUMERIC` ++ `BIGNUMERIC` ++ `ENUM` ++ `BOOL` ++ `STRING` -+ `sketch`: `BYTES` KLL sketch initialized on `UINT64` data type -+ `number`: `INT64` +**Conversion rules** -**Return Type** + + + + + + + + + + + + + + + + + + + + + + + + +
FromToRule(s) when casting x
+ Floating Point + + Integer + + Returns the closest integer value.
+ Halfway cases such as 1.5 or -0.5 round away from zero. +
BOOLInteger + Returns 1 if x is TRUE, + 0 otherwise. +
STRINGInteger + A hex string can be cast to an integer. For example, + 0x123 to 291 or -0x123 to + -291. +
-`ARRAY` +**Examples** -### `KLL_QUANTILES.EXTRACT_DOUBLE` +If you are working with hex strings (`0x123`), you can cast those strings as +integers: ```sql -KLL_QUANTILES.EXTRACT_DOUBLE(sketch, number) -``` - -**Description** - -Like [`KLL_QUANTILES.EXTRACT_INT64`](#kll-quantilesextract-int64), -but accepts KLL sketches initialized on data of type -`DOUBLE`. - -**Supported Argument Types** +SELECT '0x123' as hex_value, CAST('0x123' as INT64) as hex_to_int; -+ `sketch`: `BYTES` KLL sketch initialized on - `DOUBLE` data type -+ `number`: `INT64` +/*-----------+------------* + | hex_value | hex_to_int | + +-----------+------------+ + | 0x123 | 291 | + *-----------+------------*/ +``` -**Return Type** +```sql +SELECT '-0x123' as hex_value, CAST('-0x123' as INT64) as hex_to_int; -`ARRAY` +/*-----------+------------* + | hex_value | hex_to_int | + +-----------+------------+ + | -0x123 | -291 | + *-----------+------------*/ +``` -### `KLL_QUANTILES.EXTRACT_POINT_INT64` +### CAST AS INTERVAL ```sql -KLL_QUANTILES.EXTRACT_POINT_INT64(sketch, phi) +CAST(expression AS INTERVAL) ``` **Description** -Takes a single KLL sketch as `BYTES` and returns a single quantile. -The `phi` argument specifies the quantile to return as a fraction of the total -number of rows in the input, normalized between 0 and 1. This means that the -function will return a value *v* such that approximately Φ * *n* inputs are less -than or equal to *v*, and a (1-Φ) * *n* inputs are greater than or equal to *v*. -This is a scalar function. - -Returns an error if the underlying type of the input sketch is not compatible -with type `INT64`. - -Returns an error if the input is not a valid KLL quantiles sketch. - -**Supported Argument Types** +ZetaSQL supports [casting][con-func-cast] to `INTERVAL`. 
The +`expression` parameter can represent an expression for these data types: -+ `sketch`: `BYTES` KLL sketch initialized on `INT64` data type -+ `phi`: `DOUBLE` between 0 and 1 ++ `STRING` -**Return Type** +**Conversion rules** -`INT64` + + + + + + + + + + + +
FromToRule(s) when casting x
STRINGINTERVAL + When casting from string to interval, the string must conform to either + ISO 8601 Duration -**Example** + standard or to interval literal + format 'Y-M D H:M:S.F'. Partial interval literal formats are also accepted + when they are not ambiguous, for example 'H:M:S'. + If the string expression is invalid or represents an interval that is + outside of the supported min/max range, then an error is produced. +
-The following query initializes a KLL sketch from five rows of data. Then -it returns the value of the eighth decile or 80th percentile of the sketch. +**Examples** ```sql -SELECT KLL_QUANTILES.EXTRACT_POINT_INT64(kll_sketch, .8) AS quintile -FROM (SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch - FROM (SELECT 1 AS x UNION ALL - SELECT 2 AS x UNION ALL - SELECT 3 AS x UNION ALL - SELECT 4 AS x UNION ALL - SELECT 5 AS x)); +SELECT input, CAST(input AS INTERVAL) AS output +FROM UNNEST([ + '1-2 3 10:20:30.456', + '1-2', + '10:20:30', + 'P1Y2M3D', + 'PT10H20M30,456S' +]) input -/*----------* - | quintile | - +----------+ - | 4 | - *----------*/ +/*--------------------+--------------------* + | input | output | + +--------------------+--------------------+ + | 1-2 3 10:20:30.456 | 1-2 3 10:20:30.456 | + | 1-2 | 1-2 0 0:0:0 | + | 10:20:30 | 0-0 0 10:20:30 | + | P1Y2M3D | 1-2 3 0:0:0 | + | PT10H20M30,456S | 0-0 0 10:20:30.456 | + *--------------------+--------------------*/ ``` -### `KLL_QUANTILES.EXTRACT_POINT_UINT64` +### CAST AS NUMERIC + ```sql -KLL_QUANTILES.EXTRACT_POINT_UINT64(sketch, phi) +CAST(expression AS NUMERIC) ``` **Description** -Like [`KLL_QUANTILES.EXTRACT_POINT_INT64`](#kll-quantilesextract-point-int64), -but accepts KLL sketches initialized on data of type `UINT64`. +ZetaSQL supports [casting][con-func-cast] to `NUMERIC`. The +`expression` parameter can represent an expression for these data types: -**Supported Argument Types** ++ `INT32` ++ `UINT32` ++ `INT64` ++ `UINT64` ++ `FLOAT` ++ `DOUBLE` ++ `NUMERIC` ++ `BIGNUMERIC` ++ `STRING` -+ `sketch`: `BYTES` KLL sketch initialized on `UINT64` data type -+ `phi`: `DOUBLE` between 0 and 1 +**Conversion rules** -**Return Type** + + + + + + + + + + + + + + + + +
FromToRule(s) when casting x
Floating PointNUMERIC + The floating point number will round + half away from zero. -`UINT64` + Casting a NaN, +inf or + -inf will return an error. Casting a value outside the range + of NUMERIC returns an overflow error. +
STRINGNUMERIC + The numeric literal contained in the string must not exceed + the maximum precision or range of the NUMERIC + type, or an error will occur. If the number of digits + after the decimal point exceeds nine, then the resulting + NUMERIC value will round + half away from zero. -### `KLL_QUANTILES.EXTRACT_POINT_DOUBLE` + to have nine digits after the decimal point. +
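**Example**

The following query sketches the rounding rule above; the alias is invented for illustration. The input has ten digits after the decimal point, so the result rounds half away from zero to nine digits:

```sql
SELECT CAST('1.0123456789' AS NUMERIC) AS rounded  -- 1.012345679
```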
+ +### CAST AS PROTO ```sql -KLL_QUANTILES.EXTRACT_POINT_DOUBLE(sketch, phi) +CAST(expression AS PROTO) ``` **Description** -Like [`KLL_QUANTILES.EXTRACT_POINT_INT64`](#kll-quantilesextract-point-int64), -but accepts KLL sketches initialized on data of type -`DOUBLE`. +ZetaSQL supports [casting][con-func-cast] to `PROTO`. The +`expression` parameter can represent an expression for these data types: -**Supported Argument Types** ++ `STRING` ++ `BYTES` ++ `PROTO` -+ `sketch`: `BYTES` KLL sketch initialized on - `DOUBLE` data type -+ `phi`: `DOUBLE` between 0 and 1 +**Conversion rules** -**Return Type** + + + + + + + + + + + + + + + + + + + + + +
FromToRule(s) when casting x
STRINGPROTO + Returns the protocol buffer that results from parsing + from proto2 text format.
+ Throws an error if parsing fails, e.g., if not all required fields are + set. +
BYTESPROTO + Returns the protocol buffer that results from parsing + x from the proto2 wire format.
+ Throws an error if parsing fails, e.g., if not all required fields are + set. +
PROTOPROTOMust have the same protocol buffer name.
-`DOUBLE` +**Example** -### `KLL_QUANTILES.INIT_INT64` +The example in this section references a protocol buffer called `Award`. -```sql -KLL_QUANTILES.INIT_INT64(input[, precision[, weight => input_weight]]) +```proto +message Award { + required int32 year = 1; + optional int32 month = 2; + repeated Type type = 3; + + message Type { + optional string award_name = 1; + optional string category = 2; + } +} ``` -**Description** +```sql +SELECT + CAST( + ''' + year: 2001 + month: 9 + type { award_name: 'Best Artist' category: 'Artist' } + type { award_name: 'Best Album' category: 'Album' } + ''' + AS zetasql.examples.music.Award) + AS award_col -Takes one or more `input` values and aggregates them into a -[KLL][kll-sketches] sketch. This function represents the output sketch -using the `BYTES` data type. This is an -aggregate function. +/*---------------------------------------------------------* + | award_col | + +---------------------------------------------------------+ + | { | + | year: 2001 | + | month: 9 | + | type { award_name: "Best Artist" category: "Artist" } | + | type { award_name: "Best Album" category: "Album" } | + | } | + *---------------------------------------------------------*/ +``` -The `precision` argument defines the exactness of the returned approximate -quantile *q*. By default, the rank of the approximate quantile in the input can -be at most ±1/1000 * *n* off from ⌈Φ * *n*⌉, where *n* is the number of rows in -the input and ⌈Φ * *n*⌉ is the rank of the exact quantile. If you provide a -value for `precision`, the rank of the approximate quantile in the input can be -at most ±1/`precision` * *n* off from the rank of the exact quantile. The error -is within this error bound in 99.999% of cases. This error guarantee only -applies to the difference between exact and approximate ranks: the numerical -difference between the exact and approximated value for a quantile can be -arbitrarily large. 
+### CAST AS RANGE -By default, values in an initialized KLL sketch are weighted equally as `1`. -If you would you like to weight values differently, use the -mandatory-named argument, `weight`, which assigns weight to each input in the -resulting KLL sketch. `weight` is a multiplier. For example, if you assign a -weight of `3` to an input value, it's as if three instances of the input value -are included in the generation of the KLL sketch. +```sql +CAST(expression AS RANGE) +``` -**Supported Argument Types** +**Description** -+ `input`: `INT64` -+ `precision`: `INT64` -+ `input_weight`: `INT64` +ZetaSQL supports [casting][con-func-cast] to `RANGE`. The +`expression` parameter can represent an expression for these data types: -**Return Type** ++ `STRING` -KLL sketch as `BYTES` +**Conversion rules** -**Examples** + + + + + + + + + + + +
FromToRule(s) when casting x
STRINGRANGE + When casting from string to range, the string must conform to the + supported range literal format. If the string expression is invalid or + represents a range that is outside of the supported subtype min/max range, + then an error is produced. +
-The following query takes a column of type `INT64` and outputs a sketch as -`BYTES` that allows you to retrieve values whose ranks are within -±1/1000 * 5 = ±1/200 ≈ 0 ranks of their exact quantile. +**Examples** ```sql -SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch -FROM (SELECT 1 AS x UNION ALL - SELECT 2 AS x UNION ALL - SELECT 3 AS x UNION ALL - SELECT 4 AS x UNION ALL - SELECT 5 AS x); +SELECT CAST( + '[2020-01-01, 2020-01-02)' + AS RANGE) AS string_to_range -/*----------------------------------------------------------------------* - | kll_sketch | - +----------------------------------------------------------------------+ - | "\010q\020\005 \004\212\007\025\010\200 | - | \020\350\007\032\001\001\"\001\005*\007\n\005\001\002\003\004\005" | - *----------------------------------------------------------------------*/ +/*----------------------------------------* + | string_to_range | + +----------------------------------------+ + | [DATE '2020-01-01', DATE '2020-01-02') | + *----------------------------------------*/ ``` -The following examples illustrate how weight works when you initialize a -KLL sketch. The results are converted to quantiles. 
- ```sql -WITH points AS ( - SELECT 1 AS x, 1 AS y UNION ALL - SELECT 2 AS x, 1 AS y UNION ALL - SELECT 3 AS x, 1 AS y UNION ALL - SELECT 4 AS x, 1 AS y UNION ALL - SELECT 5 AS x, 1 AS y) -SELECT KLL_QUANTILES.EXTRACT_INT64(kll_sketch, 2) AS median -FROM - ( - SELECT KLL_QUANTILES.INIT_INT64(x, 1000, weight=>y) AS kll_sketch - FROM points - ); +SELECT CAST( + '[2014-09-27 12:30:00.45, 2016-10-17 11:15:00.33)' + AS RANGE) AS string_to_range -/*---------* - | median | - +---------+ - | [1,3,5] | - *---------*/ +/*------------------------------------------------------------------------* + | string_to_range | + +------------------------------------------------------------------------+ + | [DATETIME '2014-09-27 12:30:00.45', DATETIME '2016-10-17 11:15:00.33') | + *------------------------------------------------------------------------*/ ``` ```sql -WITH points AS ( - SELECT 1 AS x, 1 AS y UNION ALL - SELECT 2 AS x, 3 AS y UNION ALL - SELECT 3 AS x, 1 AS y UNION ALL - SELECT 4 AS x, 1 AS y UNION ALL - SELECT 5 AS x, 1 AS y) -SELECT KLL_QUANTILES.EXTRACT_INT64(kll_sketch, 2) AS median -FROM - ( - SELECT KLL_QUANTILES.INIT_INT64(x, 1000, weight=>y) AS kll_sketch - FROM points - ); +SELECT CAST( + '[2014-09-27 12:30:00+08, 2016-10-17 11:15:00+08)' + AS RANGE) AS string_to_range -/*---------* - | median | - +---------+ - | [1,2,5] | - *---------*/ +-- Results depend upon where this query was executed. 
+/*--------------------------------------------------------------------------* + | string_to_range | + +--------------------------------------------------------------------------+ + | [TIMESTAMP '2014-09-27 12:30:00+08', TIMESTAMP '2016-10-17 11:15:00+08') | + *--------------------------------------------------------------------------*/ ``` -### `KLL_QUANTILES.INIT_UINT64` - ```sql -KLL_QUANTILES.INIT_UINT64(input[, precision[, weight => input_weight]]) -``` - -**Description** - -Like [`KLL_QUANTILES.INIT_INT64`](#kll-quantilesinit-int64), -but accepts `input` of type `UINT64`. - -**Supported Argument Types** +SELECT CAST( + '[UNBOUNDED, 2020-01-02)' + AS RANGE) AS string_to_range -+ `input`: `UINT64` -+ `precision`: `INT64` -+ `input_weight`: `INT64` +/*--------------------------------* + | string_to_range | + +--------------------------------+ + | [UNBOUNDED, DATE '2020-01-02') | + *--------------------------------*/ +``` -**Return Type** +```sql +SELECT CAST( + '[2020-01-01, NULL)' + AS RANGE) AS string_to_range -KLL sketch as `BYTES` +/*--------------------------------* + | string_to_range | + +--------------------------------+ + | [DATE '2020-01-01', UNBOUNDED) | + *--------------------------------*/ +``` -### `KLL_QUANTILES.INIT_DOUBLE` +### CAST AS STRING + ```sql -KLL_QUANTILES.INIT_DOUBLE(input[, precision[, weight => input_weight]]) +CAST(expression AS STRING [format_clause [AT TIME ZONE timezone_expr]]) ``` **Description** -Like [`KLL_QUANTILES.INIT_INT64`](#kll-quantilesinit-int64), -but accepts `input` of type `DOUBLE`. +ZetaSQL supports [casting][con-func-cast] to `STRING`. The +`expression` parameter can represent an expression for these data types: -`KLL_QUANTILES.INIT_DOUBLE` orders values according to the ZetaSQL -[floating point sort order][sort-order]. For example, `NaN` orders before -‑inf. 
++ `INT32` ++ `UINT32` ++ `INT64` ++ `UINT64` ++ `FLOAT` ++ `DOUBLE` ++ `NUMERIC` ++ `BIGNUMERIC` ++ `ENUM` ++ `BOOL` ++ `BYTES` ++ `PROTO` ++ `TIME` ++ `DATE` ++ `DATETIME` ++ `TIMESTAMP` ++ `RANGE` ++ `INTERVAL` ++ `STRING` -**Supported Argument Types** +**Format clause** -+ `input`: `DOUBLE` -+ `precision`: `INT64` -+ `input_weight`: `INT64` +When an expression of one type is cast to another type, you can use the +[format clause][formatting-syntax] to provide instructions for how to conduct +the cast. You can use the format clause in this section if `expression` is one +of these data types: -**Return Type** ++ `INT32` ++ `UINT32` ++ `INT64` ++ `UINT64` ++ `FLOAT` ++ `DOUBLE` ++ `NUMERIC` ++ `BIGNUMERIC` ++ `BYTES` ++ `TIME` ++ `DATE` ++ `DATETIME` ++ `TIMESTAMP` -KLL sketch as `BYTES` +The format clause for `STRING` has an additional optional clause called +`AT TIME ZONE timezone_expr`, which you can use to specify a specific time zone +to use during formatting of a `TIMESTAMP`. If this optional clause is not +included when formatting a `TIMESTAMP`, your current time zone is used. -[kll-sketches]: https://github.com/google/zetasql/blob/master/docs/sketches.md#sketches_kll +For more information, see the following topics: -[sort-order]: https://github.com/google/zetasql/blob/master/docs/data-types.md#comparison_operator_examples ++ [Format bytes as string][format-bytes-as-string] ++ [Format date and time as string][format-date-time-as-string] ++ [Format numeric type as string][format-numeric-type-as-string] -### `KLL_QUANTILES.MERGE_INT64` +**Conversion rules** -```sql -KLL_QUANTILES.MERGE_INT64(sketch, number) -``` + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
FromToRule(s) when casting x
Floating PointSTRINGReturns an approximate string representation. A returned + NaN or 0 will not be signed.
+
BOOLSTRING + Returns "true" if x is TRUE, + "false" otherwise.
BYTESSTRING + Returns x interpreted as a UTF-8 string.
+ For example, the bytes literal + b'\xc2\xa9', when cast to a string, + is interpreted as UTF-8 and becomes the unicode character "©".
+ An error occurs if x is not valid UTF-8.
ENUMSTRING + Returns the canonical enum value name of + x.
+ If an enum value has multiple names (aliases), + the canonical name/alias for that value is used.
PROTOSTRINGReturns the proto2 text format representation of x.
TIMESTRING + Casting from a time type to a string is independent of time zone and + is of the form HH:MM:SS. +
DATESTRING + Casting from a date type to a string is independent of time zone and is + of the form YYYY-MM-DD. +
DATETIMESTRING + Casting from a datetime type to a string is independent of time zone and + is of the form YYYY-MM-DD HH:MM:SS. +
TIMESTAMPSTRING + When casting from timestamp types to string, the timestamp is interpreted + using the default time zone, which is implementation defined. The number of + subsecond digits produced depends on the number of trailing zeroes in the + subsecond part: the CAST function will truncate zero, three, or six + digits. +
INTERVALSTRING + Casting from an interval to a string is of the form + Y-M D H:M:S. +
-**Description** +**Examples** -Takes KLL sketches as `BYTES` and merges them into -a new sketch, then returns the quantiles that divide the input into -`number` equal-sized groups, along with the minimum and maximum values of the -input. The output is an `ARRAY` containing the exact minimum value from -the input data that you used to initialize the sketches, each -approximate quantile, and the exact maximum value from the initial input data. -This is an aggregate function. +```sql +SELECT CAST(CURRENT_DATE() AS STRING) AS current_date -If the merged sketches were initialized with different precisions, the precision -is downgraded to the lowest precision involved in the merge — except if the -aggregations are small enough to still capture the input exactly — then the -mergee's precision is maintained. +/*---------------* + | current_date | + +---------------+ + | 2021-03-09 | + *---------------*/ +``` -Returns an error if the underlying type of one or more input sketches is not -compatible with type `INT64`. +```sql +SELECT CAST(CURRENT_DATE() AS STRING FORMAT 'DAY') AS current_day -Returns an error if the input is not a valid KLL quantiles sketch. +/*-------------* + | current_day | + +-------------+ + | MONDAY | + *-------------*/ +``` -**Supported Argument Types** +```sql +SELECT CAST( + TIMESTAMP '2008-12-25 00:00:00+00:00' + AS STRING FORMAT 'YYYY-MM-DD HH24:MI:SS TZH:TZM') AS date_time_to_string -+ `sketch`: `BYTES` KLL sketch initialized on `INT64` data type -+ `number`: `INT64` +-- Results depend upon where this query was executed. +/*------------------------------* + | date_time_to_string | + +------------------------------+ + | 2008-12-24 16:00:00 -08:00 | + *------------------------------*/ +``` -**Return Type** +```sql +SELECT CAST( + TIMESTAMP '2008-12-25 00:00:00+00:00' + AS STRING FORMAT 'YYYY-MM-DD HH24:MI:SS TZH:TZM' + AT TIME ZONE 'Asia/Kolkata') AS date_time_to_string -`ARRAY` +-- Because the time zone is specified, the result is always the same. 
+/*------------------------------* + | date_time_to_string | + +------------------------------+ + | 2008-12-25 05:30:00 +05:30 | + *------------------------------*/ +``` -**Example** +```sql +SELECT CAST(INTERVAL 3 DAY AS STRING) AS interval_to_string -The following query initializes two KLL sketches from five rows of data each. -Then it merges these two sketches and returns an `ARRAY` containing the minimum, -median, and maximum values in the input sketches. +/*--------------------* + | interval_to_string | + +--------------------+ + | 0-0 3 0:0:0 | + *--------------------*/ +``` ```sql -SELECT KLL_QUANTILES.MERGE_INT64(kll_sketch, 2) AS merged_sketch -FROM (SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch - FROM (SELECT 1 AS x UNION ALL - SELECT 2 AS x UNION ALL - SELECT 3 AS x UNION ALL - SELECT 4 AS x UNION ALL - SELECT 5) - UNION ALL - SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch - FROM (SELECT 6 AS x UNION ALL - SELECT 7 AS x UNION ALL - SELECT 8 AS x UNION ALL - SELECT 9 AS x UNION ALL - SELECT 10 AS x)); +SELECT CAST( + INTERVAL "1-2 3 4:5:6.789" YEAR TO SECOND + AS STRING) AS interval_to_string -/*---------------* - | merged_sketch | - +---------------+ - | [1,5,10] | - *---------------*/ +/*--------------------* + | interval_to_string | + +--------------------+ + | 1-2 3 4:5:6.789 | + *--------------------*/ ``` -### `KLL_QUANTILES.MERGE_UINT64` +### CAST AS STRUCT ```sql -KLL_QUANTILES.MERGE_UINT64(sketch, number) +CAST(expression AS STRUCT) ``` **Description** -Like [`KLL_QUANTILES.MERGE_INT64`](#kll-quantilesmerge-int64), -but accepts KLL sketches initialized on data of type `UINT64`. - -**Supported Argument Types** +ZetaSQL supports [casting][con-func-cast] to `STRUCT`. The +`expression` parameter can represent an expression for these data types: -+ `sketch`: `BYTES` KLL sketch initialized on `UINT64` data type -+ `number`: `INT64` ++ `STRUCT` -**Return Type** +**Conversion rules** -`ARRAY` + + + + + + + + + + + +
FromToRule(s) when casting x
STRUCTSTRUCT + Allowed if the following conditions are met:
+
    +
  1. + The two structs have the same number of + fields. +
  2. +
  3. + The original struct field types can be + explicitly cast to the corresponding target + struct field types (as defined by field + order, not field name). +
  4. +
+
-### `KLL_QUANTILES.MERGE_DOUBLE` +### CAST AS TIME ```sql -KLL_QUANTILES.MERGE_DOUBLE(sketch, number) +CAST(expression AS TIME [format_clause]) ``` **Description** -Like [`KLL_QUANTILES.MERGE_INT64`](#kll-quantilesmerge-int64), -but accepts KLL sketches initialized on data of type -`DOUBLE`. +ZetaSQL supports [casting][con-func-cast] to TIME. The `expression` +parameter can represent an expression for these data types: -`KLL_QUANTILES.MERGE_DOUBLE` orders values according to the ZetaSQL -[floating point sort order][sort-order]. For example, `NaN` orders before -‑inf. ++ `STRING` ++ `TIME` ++ `DATETIME` ++ `TIMESTAMP` -**Supported Argument Types** +**Format clause** -+ `sketch`: `BYTES` KLL sketch initialized on - `DOUBLE` data type -+ `number`: `INT64` +When an expression of one type is cast to another type, you can use the +[format clause][formatting-syntax] to provide instructions for how to conduct +the cast. You can use the format clause in this section if `expression` is a +`STRING`. -**Return Type** ++ [Format string as date and time][format-string-as-date-time] -`ARRAY` +**Conversion rules** -[sort-order]: https://github.com/google/zetasql/blob/master/docs/data-types.md#comparison_operator_examples + + + + + + + + + + + +
FromToRule(s) when casting x
STRINGTIME + When casting from string to time, the string must conform to + the supported time literal format, and is independent of time zone. If the + string expression is invalid or represents a time that is outside of the + supported min/max range, then an error is produced. +
-### `KLL_QUANTILES.MERGE_PARTIAL` +### CAST AS TIMESTAMP ```sql -KLL_QUANTILES.MERGE_PARTIAL(sketch) +CAST(expression AS TIMESTAMP [format_clause [AT TIME ZONE timezone_expr]]) ``` **Description** -Takes KLL sketches of the same underlying type and merges them to return a new -sketch of the same underlying type. This is an aggregate function. +ZetaSQL supports [casting][con-func-cast] to `TIMESTAMP`. The +`expression` parameter can represent an expression for these data types: -If the merged sketches were initialized with different precisions, the precision -is downgraded to the lowest precision involved in the merge — except if the -aggregations are small enough to still capture the input exactly — then the -mergee's precision is maintained. ++ `STRING` ++ `DATETIME` ++ `TIMESTAMP` -Returns an error if two or more sketches don't have compatible underlying types, -such as one sketch of `INT64` values and another of -`DOUBLE` values. +**Format clause** -Returns an error if one or more inputs are not a valid KLL quantiles sketch. +When an expression of one type is cast to another type, you can use the +[format clause][formatting-syntax] to provide instructions for how to conduct +the cast. You can use the format clause in this section if `expression` is a +`STRING`. -Ignores `NULL` sketches. If the input contains zero rows or only `NULL` -sketches, the function returns `NULL`. ++ [Format string as date and time][format-string-as-date-time] -You can initialize sketches with different optional clauses and merge them. For -example, you can initialize a sketch with the `DISTINCT` clause and another -sketch without any optional clauses, and then merge these two sketches. -However, if you initialize sketches with the `DISTINCT` clause and merge them, -the resulting sketch may still contain duplicates. 
+The format clause for `TIMESTAMP` has an additional optional clause called +`AT TIME ZONE timezone_expr`, which you can use to specify a specific time zone +to use during formatting. If this optional clause is not included, your +current time zone is used. -**Supported Argument Types** +**Conversion rules** -+ `sketch`: `BYTES` KLL sketch + + + + + + + + + + + + + + + + + + + + + + + + + +
FromToRule(s) when casting x
STRINGTIMESTAMP + When casting from string to a timestamp, string_expression + must conform to the supported timestamp literal formats, or else a runtime + error occurs. The string_expression may itself contain a + time zone. +

+ If there is a time zone in the string_expression, that + time zone is used for conversion, otherwise the default time zone, + which is implementation defined, is used. If the string has fewer than six digits, + then it is implicitly widened. +

+ An error is produced if the string_expression is invalid, + has more than six subsecond digits (i.e., precision greater than + microseconds), or represents a time outside of the supported timestamp + range. +
DATETIMESTAMP + Casting from a date to a timestamp interprets date_expression + as of midnight (start of the day) in the default time zone, + which is implementation defined. +
DATETIMETIMESTAMP + Casting from a datetime to a timestamp interprets + datetime_expression in the default time zone, + which is implementation defined. +

+ Most valid datetime values have exactly one corresponding timestamp + in each time zone. However, there are certain combinations of valid + datetime values and time zones that have zero or two corresponding + timestamp values. This happens in a time zone when clocks are set forward + or set back, such as for Daylight Savings Time. + When there are two valid timestamps, the earlier one is used. + When there is no valid timestamp, the length of the gap in time + (typically one hour) is added to the datetime. +
-**Return Type** +**Examples** -KLL sketch as `BYTES` +The following example casts a string-formatted timestamp as a timestamp: -**Example** +```sql +SELECT CAST("2020-06-02 17:00:53.110+00:00" AS TIMESTAMP) AS as_timestamp -The following query initializes two KLL sketches from five rows of data each. -Then it merges these two sketches into a new sketch, also as `BYTES`. Both -input sketches have the same underlying data type and precision. +-- Results depend upon where this query was executed. +/*----------------------------* + | as_timestamp | + +----------------------------+ + | 2020-06-03 00:00:53.110+00 | + *----------------------------*/ +``` + +The following examples cast a string-formatted date and time as a timestamp. +These examples return the same output as the previous example. ```sql -SELECT KLL_QUANTILES.MERGE_PARTIAL(kll_sketch) AS merged_sketch -FROM (SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch - FROM (SELECT 1 AS x UNION ALL - SELECT 2 AS x UNION ALL - SELECT 3 AS x UNION ALL - SELECT 4 AS x UNION ALL - SELECT 5) - UNION ALL - SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch - FROM (SELECT 6 AS x UNION ALL - SELECT 7 AS x UNION ALL - SELECT 8 AS x UNION ALL - SELECT 9 AS x UNION ALL - SELECT 10 AS x)); - -/*-----------------------------------------------------------------------------* - | merged_sketch | - +-----------------------------------------------------------------------------+ - | "\010q\020\n \004\212\007\032\010\200 \020\350\007\032\001\001\"\001\n* | - | \014\n\n\001\002\003\004\005\006\007\010\t\n" | - *-----------------------------------------------------------------------------*/ +SELECT CAST('06/02/2020 17:00:53.110' AS TIMESTAMP FORMAT 'MM/DD/YYYY HH24:MI:SS.FF3' AT TIME ZONE 'UTC') AS as_timestamp ``` -### `KLL_QUANTILES.MERGE_POINT_INT64` +```sql +SELECT CAST('06/02/2020 17:00:53.110' AS TIMESTAMP FORMAT 'MM/DD/YYYY HH24:MI:SS.FF3' AT TIME ZONE '+00') AS as_timestamp +``` ```sql 
-KLL_QUANTILES.MERGE_POINT_INT64(sketch, phi) +SELECT CAST('06/02/2020 17:00:53.110 +00' AS TIMESTAMP FORMAT 'MM/DD/YYYY HH24:MI:SS.FF3 TZH') AS as_timestamp ``` -**Description** +[formatting-syntax]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#formatting_syntax -Takes KLL sketches as `BYTES` and merges them, then extracts a single -quantile from the merged sketch. The `phi` argument specifies the quantile -to return as a fraction of the total number of rows in the input, normalized -between 0 and 1. This means that the function will return a value *v* such that -approximately Φ * *n* inputs are less than or equal to *v*, and a (1-Φ) / *n* -inputs are greater than or equal to *v*. This is an aggregate function. +[format-string-as-bytes]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_string_as_bytes -If the merged sketches were initialized with different precisions, the precision -is downgraded to the lowest precision involved in the merge — except if the -aggregations are small enough to still capture the input exactly — then the -mergee's precision is maintained. +[format-bytes-as-string]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_bytes_as_string -Returns an error if the underlying type of one or more input sketches is not -compatible with type `INT64`. +[format-date-time-as-string]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_date_time_as_string -Returns an error if the input is not a valid KLL quantiles sketch. 
+[format-string-as-date-time]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_string_as_datetime -**Supported Argument Types** +[format-numeric-type-as-string]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_numeric_type_as_string -+ `sketch`: `BYTES` KLL sketch initialized on `INT64` data type -+ `phi`: `DOUBLE` between 0 and 1 +[con-func-cast]: #cast -**Return Type** +[con-func-safecast]: #safe_casting -`INT64` +### `SAFE_CAST` + -**Example** +
+SAFE_CAST(expression AS typename [format_clause])
+
-The following query initializes two KLL sketches from five rows of data each. -Then it merges these two sketches and returns the value of the ninth decile or -90th percentile of the merged sketch. +**Description** -```sql -SELECT KLL_QUANTILES.MERGE_POINT_INT64(kll_sketch, .9) AS merged_sketch -FROM (SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch - FROM (SELECT 1 AS x UNION ALL - SELECT 2 AS x UNION ALL - SELECT 3 AS x UNION ALL - SELECT 4 AS x UNION ALL - SELECT 5) - UNION ALL - SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch - FROM (SELECT 6 AS x UNION ALL - SELECT 7 AS x UNION ALL - SELECT 8 AS x UNION ALL - SELECT 9 AS x UNION ALL - SELECT 10 AS x)); +When using `CAST`, a query can fail if ZetaSQL is unable to perform +the cast. For example, the following query generates an error: -/*---------------* - | merged_sketch | - +---------------+ - | 9 | - *---------------*/ +```sql +SELECT CAST("apple" AS INT64) AS not_a_number; ``` -### `KLL_QUANTILES.MERGE_POINT_UINT64` +If you want to protect your queries from these types of errors, you can use +`SAFE_CAST`. `SAFE_CAST` replaces runtime errors with `NULL`s. However, during +static analysis, impossible casts between two non-castable types still produce +an error because the query is invalid. ```sql -KLL_QUANTILES.MERGE_POINT_UINT64(sketch, phi) +SELECT SAFE_CAST("apple" AS INT64) AS not_a_number; + +/*--------------* + | not_a_number | + +--------------+ + | NULL | + *--------------*/ ``` -**Description** +Some casts can include a [format clause][formatting-syntax], which provides +instructions for how to conduct the +cast. For example, you could +instruct a cast to convert a sequence of bytes to a BASE64-encoded string +instead of a UTF-8-encoded string. -Like [`KLL_QUANTILES.MERGE_POINT_INT64`](#kll-quantilesmerge-point-int64), -but accepts KLL sketches initialized on data of type `UINT64`. 
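+
+A minimal sketch of the bytes-to-BASE64 case mentioned above, assuming the
+`'BASE64'` format element documented in
+[Format bytes as string][format-bytes-as-string]:
+
+```sql
+-- Render the bytes b'abc' as base64 text instead of
+-- interpreting them as a UTF-8 string.
+SELECT SAFE_CAST(b'abc' AS STRING FORMAT 'BASE64') AS base64_string;
+
+/*---------------*
+ | base64_string |
+ +---------------+
+ | YWJj          |
+ *---------------*/
+```
+
+`SAFE_CAST` behaves exactly like `CAST` here; the `SAFE_` prefix only matters
+when the cast itself would fail at runtime.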
+The structure of the format clause is unique to each type of cast and more +information is available in the section for that cast. -**Supported Argument Types** +If you are casting from bytes to strings, you can also use the +function, [`SAFE_CONVERT_BYTES_TO_STRING`][SC_BTS]. Any invalid UTF-8 characters +are replaced with the unicode replacement character, `U+FFFD`. -+ `sketch`: `BYTES` KLL sketch initialized on `UINT64` data type -+ `phi`: `DOUBLE` between 0 and 1 +[SC_BTS]: #safe_convert_bytes_to_string -**Return Type** +[formatting-syntax]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#formatting_syntax -`UINT64` +### Other conversion functions + -### `KLL_QUANTILES.MERGE_POINT_DOUBLE` +You can learn more about these conversion functions elsewhere in the +documentation: -```sql -KLL_QUANTILES.MERGE_POINT_DOUBLE(sketch, phi) -``` + -**Description** +Conversion function | From | To +------- | -------- | ------- +[ARRAY_TO_STRING][ARRAY_STRING] | ARRAY | STRING +[BIT_CAST_TO_INT32][BIT_I32] | UINT32 | INT32 +[BIT_CAST_TO_INT64][BIT_I64] | UINT64 | INT64 +[BIT_CAST_TO_UINT32][BIT_U32] | INT32 | UINT32 +[BIT_CAST_TO_UINT64][BIT_U64] | INT64 | UINT64 +[BOOL][JSON_TO_BOOL] | JSON | BOOL +[DATE][T_DATE] | Various data types | DATE +[DATE_FROM_UNIX_DATE][T_DATE_FROM_UNIX_DATE] | INT64 | DATE +[DATETIME][T_DATETIME] | Various data types | DATETIME +[FLOAT64][JSON_TO_DOUBLE] | JSON | DOUBLE +[FROM_BASE32][F_B32] | STRING | BYTEs +[FROM_BASE64][F_B64] | STRING | BYTES +[FROM_HEX][F_HEX] | STRING | BYTES +[FROM_PROTO][F_PROTO] | PROTO value | Most data types +[INT64][JSON_TO_INT64] | JSON | INT64 +[PARSE_DATE][P_DATE] | STRING | DATE +[PARSE_DATETIME][P_DATETIME] | STRING | DATETIME +[PARSE_JSON][P_JSON] | STRING | JSON +[PARSE_TIME][P_TIME] | STRING | TIME +[PARSE_TIMESTAMP][P_TIMESTAMP] | STRING | TIMESTAMP +[SAFE_CONVERT_BYTES_TO_STRING][SC_BTS] | BYTES | STRING +[STRING][STRING_TIMESTAMP] | TIMESTAMP | STRING +[STRING][JSON_TO_STRING] | JSON | 
STRING +[TIME][T_TIME] | Various data types | TIME +[TIMESTAMP][T_TIMESTAMP] | Various data types | TIMESTAMP +[TIMESTAMP_FROM_UNIX_MICROS][T_TIMESTAMP_FROM_UNIX_MICROS] | INT64 | TIMESTAMP +[TIMESTAMP_FROM_UNIX_MILLIS][T_TIMESTAMP_FROM_UNIX_MILLIS] | INT64 | TIMESTAMP +[TIMESTAMP_FROM_UNIX_SECONDS][T_TIMESTAMP_FROM_UNIX_SECONDS] | INT64 | TIMESTAMP +[TIMESTAMP_MICROS][T_TIMESTAMP_MICROS] | INT64 | TIMESTAMP +[TIMESTAMP_MILLIS][T_TIMESTAMP_MILLIS] | INT64 | TIMESTAMP +[TIMESTAMP_SECONDS][T_TIMESTAMP_SECONDS] | INT64 | TIMESTAMP +[TO_BASE32][T_B32] | BYTES | STRING +[TO_BASE64][T_B64] | BYTES | STRING +[TO_HEX][T_HEX] | BYTES | STRING +[TO_JSON][T_JSON] | All data types | JSON +[TO_JSON_STRING][T_JSON_STRING] | All data types | STRING +[TO_PROTO][T_PROTO] | Most data types | PROTO value -Like [`KLL_QUANTILES.MERGE_POINT_INT64`](#kll-quantilesmerge-point-int64), -but accepts KLL sketches initialized on data of type -`DOUBLE`. + -`KLL_QUANTILES.MERGE_POINT_DOUBLE` orders values according to the -ZetaSQL [floating point sort order][sort-order]. For example, `NaN` -orders before ‑inf. + -**Supported Argument Types** +[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md -+ `sketch`: `BYTES` KLL sketch initialized on - `DOUBLE` data type -+ `phi`: `DOUBLE` between 0 and 1 +[ARRAY_STRING]: #array_to_string -**Return Type** +[BIT_I32]: #bit_cast_to_int32 -`DOUBLE` +[BIT_U32]: #bit_cast_to_uint32 -[sort-order]: https://github.com/google/zetasql/blob/master/docs/data-types.md#comparison_operator_examples +[BIT_I64]: #bit_cast_to_int64 -[kll-sketches]: https://github.com/google/zetasql/blob/master/docs/sketches.md#sketches_kll +[BIT_U64]: #bit_cast_to_uint64 -[approx-functions-reference]: #approximate_aggregate_functions +[F_B32]: #from_base32 -## Numbering functions +[F_B64]: #from_base64 -ZetaSQL supports numbering functions. -Numbering functions are a subset of window functions. 
To create a -window function call and learn about the syntax for window functions, -see [Window function calls][window-function-calls]. +[F_HEX]: #from_hex -Numbering functions assign integer values to each row based on their position -within the specified window. The `OVER` clause syntax varies across -numbering functions. +[F_PROTO]: #from_proto + +[P_DATE]: #parse_date + +[P_DATETIME]: #parse_datetime + +[P_JSON]: #parse_json + +[P_TIME]: #parse_time + +[P_TIMESTAMP]: #parse_timestamp + +[SC_BTS]: #safe_convert_bytes_to_string + +[STRING_TIMESTAMP]: #string + +[T_B32]: #to_base32 + +[T_B64]: #to_base64 + +[T_HEX]: #to_hex + +[T_JSON]: #to_json + +[T_JSON_STRING]: #to_json_string + +[T_PROTO]: #to_proto + +[T_DATE]: #date + +[T_DATETIME]: #datetime + +[T_TIMESTAMP]: #timestamp + +[T_TIME]: #time + +[JSON_TO_BOOL]: #bool_for_json + +[JSON_TO_STRING]: #string_for_json + +[JSON_TO_INT64]: #int64_for_json + +[JSON_TO_DOUBLE]: #double_for_json + +[T_DATE_FROM_UNIX_DATE]: #date_from_unix_date + +[T_TIMESTAMP_FROM_UNIX_MICROS]: #timestamp_from_unix_micros + +[T_TIMESTAMP_FROM_UNIX_MILLIS]: #timestamp_from_unix_millis + +[T_TIMESTAMP_FROM_UNIX_SECONDS]: #timestamp_from_unix_seconds + +[T_TIMESTAMP_MICROS]: #timestamp_micros + +[T_TIMESTAMP_MILLIS]: #timestamp_millis + +[T_TIMESTAMP_SECONDS]: #timestamp_seconds + + + +## Date functions + +ZetaSQL supports the following date functions. ### Function list @@ -9536,2264 +9508,1866 @@ numbering functions. - CUME_DIST + CURRENT_DATE - Gets the cumulative distribution (relative position (0,1]) of each row - within a window. + Returns the current date as a DATE value. - DENSE_RANK + DATE - Gets the dense rank (1-based, no gaps) of each row within a window. + Constructs a DATE value. - NTILE + DATE_ADD - Gets the quantile bucket number (1-based) of each row within a window. + Adds a specified time interval to a DATE value. - PERCENT_RANK + DATE_DIFF - Gets the percentile rank (from 0 to 1) of each row within a window. 
+ Gets the number of intervals between two DATE values. - RANK + DATE_FROM_UNIX_DATE - Gets the rank (1-based) of each row within a window. + Interprets an INT64 expression as the number of days + since 1970-01-01. - ROW_NUMBER + DATE_SUB - Gets the sequential row number (1-based) of each row within a window. + Subtracts a specified time interval from a DATE value. - - - -### `CUME_DIST` + + DATE_TRUNC -```sql -CUME_DIST() -OVER over_clause + + + Truncates a DATE value. + + -over_clause: - { named_window | ( [ window_specification ] ) } + + EXTRACT -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - ORDER BY expression [ { ASC | DESC } ] [, ...] + + + Extracts part of a date from a DATE value. + + -``` + + FORMAT_DATE -**Description** + + + Formats a DATE value according to a specified format string. + + -Return the relative rank of a row defined as NP/NR. NP is defined to be the -number of rows that either precede or are peers with the current row. NR is the -number of rows in the partition. + + LAST_DAY -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. + + + Gets the last day in a specified time period that contains a + DATE value. + + - + + PARSE_DATE -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md + + + Converts a STRING value to a DATE value. + + - + + UNIX_DATE -**Return Type** + + + Converts a DATE value to the number of days since 1970-01-01. 
+ + -`DOUBLE` + + -**Example** +### `CURRENT_DATE` ```sql -WITH finishers AS - (SELECT 'Sophia Liu' as name, - TIMESTAMP '2016-10-18 2:51:45' as finish_time, - 'F30-34' as division - UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' - UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' - UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' - UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' - UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' - UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' - UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34') -SELECT name, - finish_time, - division, - CUME_DIST() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank -FROM finishers; - -/*-----------------+------------------------+----------+-------------* - | name | finish_time | division | finish_rank | - +-----------------+------------------------+----------+-------------+ - | Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 0.25 | - | Meghan Lederer | 2016-10-18 09:59:01+00 | F30-34 | 0.75 | - | Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 0.75 | - | Jen Edwards | 2016-10-18 10:06:36+00 | F30-34 | 1 | - | Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 0.25 | - | Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 0.5 | - | Desiree Berry | 2016-10-18 10:05:42+00 | F35-39 | 0.75 | - | Suzy Slane | 2016-10-18 10:06:24+00 | F35-39 | 1 | - *-----------------+------------------------+----------+-------------*/ +CURRENT_DATE() ``` -### `DENSE_RANK` - ```sql -DENSE_RANK() -OVER over_clause - -over_clause: - { named_window | ( [ window_specification ] ) } - -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - ORDER BY expression [ { ASC | DESC } ] [, ...] 
+CURRENT_DATE(time_zone_expression) +``` +```sql +CURRENT_DATE ``` **Description** -Returns the ordinal (1-based) rank of each row within the window partition. -All peer rows receive the same rank value, and the subsequent rank value is -incremented by one. - -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. +Returns the current date as a `DATE` object. Parentheses are optional when +called with no arguments. - +This function supports the following arguments: -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md ++ `time_zone_expression`: A `STRING` expression that represents a + [time zone][date-timezone-definitions]. If no time zone is specified, the + default time zone, which is implementation defined, is used. If this expression is + used and it evaluates to `NULL`, this function returns `NULL`. - +The current date is recorded at the start of the query +statement which contains this function, not when this specific function is +evaluated. 
-**Return Type** +**Return Data Type** -`INT64` +`DATE` **Examples** -```sql -WITH Numbers AS - (SELECT 1 as x - UNION ALL SELECT 2 - UNION ALL SELECT 2 - UNION ALL SELECT 5 - UNION ALL SELECT 8 - UNION ALL SELECT 10 - UNION ALL SELECT 10 -) -SELECT x, - DENSE_RANK() OVER (ORDER BY x ASC) AS dense_rank -FROM Numbers - -/*-------------------------* - | x | dense_rank | - +-------------------------+ - | 1 | 1 | - | 2 | 2 | - | 2 | 2 | - | 5 | 3 | - | 8 | 4 | - | 10 | 5 | - | 10 | 5 | - *-------------------------*/ -``` +The following query produces the current date in the default time zone: ```sql -WITH finishers AS - (SELECT 'Sophia Liu' as name, - TIMESTAMP '2016-10-18 2:51:45' as finish_time, - 'F30-34' as division - UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' - UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' - UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' - UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' - UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' - UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' - UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34') -SELECT name, - finish_time, - division, - DENSE_RANK() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank -FROM finishers; +SELECT CURRENT_DATE() AS the_date; -/*-----------------+------------------------+----------+-------------* - | name | finish_time | division | finish_rank | - +-----------------+------------------------+----------+-------------+ - | Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 1 | - | Meghan Lederer | 2016-10-18 09:59:01+00 | F30-34 | 2 | - | Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 2 | - | Jen Edwards | 2016-10-18 10:06:36+00 | F30-34 | 3 | - | Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 1 | - | Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 2 | - | Desiree Berry | 
2016-10-18 10:05:42+00 | F35-39 | 3 | - | Suzy Slane | 2016-10-18 10:06:24+00 | F35-39 | 4 | - *-----------------+------------------------+----------+-------------*/ +/*--------------* + | the_date | + +--------------+ + | 2016-12-25 | + *--------------*/ ``` -### `NTILE` +The following queries produce the current date in a specified time zone: ```sql -NTILE(constant_integer_expression) -OVER over_clause +SELECT CURRENT_DATE('America/Los_Angeles') AS the_date; -over_clause: - { named_window | ( [ window_specification ] ) } +/*--------------* + | the_date | + +--------------+ + | 2016-12-25 | + *--------------*/ +``` -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - ORDER BY expression [ { ASC | DESC } ] [, ...] +```sql +SELECT CURRENT_DATE('-08') AS the_date; +/*--------------* + | the_date | + +--------------+ + | 2016-12-25 | + *--------------*/ ``` -**Description** +The following query produces the current date in the default time zone. +Parentheses are not needed if the function has no arguments. -This function divides the rows into `constant_integer_expression` -buckets based on row ordering and returns the 1-based bucket number that is -assigned to each row. The number of rows in the buckets can differ by at most 1. -The remainder values (the remainder of number of rows divided by buckets) are -distributed one for each bucket, starting with bucket 1. If -`constant_integer_expression` evaluates to NULL, 0 or negative, an -error is provided. +```sql +SELECT CURRENT_DATE AS the_date; -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. +/*--------------* + | the_date | + +--------------+ + | 2016-12-25 | + *--------------*/ +``` - +When a column named `current_date` is present, the column name and the function +call without parentheses are ambiguous. 
To ensure the function call, add +parentheses; to ensure the column name, qualify it with its +[range variable][date-range-variables]. For example, the +following query will select the function in the `the_date` column and the table +column in the `current_date` column. -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md +```sql +WITH t AS (SELECT 'column value' AS `current_date`) +SELECT current_date() AS the_date, t.current_date FROM t; - +/*------------+--------------* + | the_date | current_date | + +------------+--------------+ + | 2016-12-25 | column value | + *------------+--------------*/ +``` -**Return Type** +[date-range-variables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#range_variables -`INT64` +[date-timezone-definitions]: https://github.com/google/zetasql/blob/master/docs/data-types.md#time_zones -**Example** +### `DATE` ```sql -WITH finishers AS - (SELECT 'Sophia Liu' as name, - TIMESTAMP '2016-10-18 2:51:45' as finish_time, - 'F30-34' as division - UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' - UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' - UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' - UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' - UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' - UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' - UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34') -SELECT name, - finish_time, - division, - NTILE(3) OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank -FROM finishers; - -/*-----------------+------------------------+----------+-------------* - | name | finish_time | division | finish_rank | - +-----------------+------------------------+----------+-------------+ - | Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 1 | - | Meghan Lederer | 
2016-10-18 09:59:01+00 | F30-34 | 1 | - | Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 2 | - | Jen Edwards | 2016-10-18 10:06:36+00 | F30-34 | 3 | - | Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 1 | - | Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 1 | - | Desiree Berry | 2016-10-18 10:05:42+00 | F35-39 | 2 | - | Suzy Slane | 2016-10-18 10:06:24+00 | F35-39 | 3 | - *-----------------+------------------------+----------+-------------*/ +DATE(year, month, day) ``` -### `PERCENT_RANK` - ```sql -PERCENT_RANK() -OVER over_clause - -over_clause: - { named_window | ( [ window_specification ] ) } +DATE(timestamp_expression) +``` -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - ORDER BY expression [ { ASC | DESC } ] [, ...] +```sql +DATE(timestamp_expression, time_zone_expression) +``` +``` +DATE(datetime_expression) ``` **Description** -Return the percentile rank of a row defined as (RK-1)/(NR-1), where RK is -the `RANK` of the row and NR is the number of rows in the partition. -Returns 0 if NR=1. - -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. - - +Constructs or extracts a date. -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md +This function supports the following arguments: - ++ `year`: The `INT64` value for year. ++ `month`: The `INT64` value for month. ++ `day`: The `INT64` value for day. ++ `timestamp_expression`: A `TIMESTAMP` expression that contains the date. ++ `time_zone_expression`: A `STRING` expression that represents a + [time zone][date-timezone-definitions]. If no time zone is specified with + `timestamp_expression`, the default time zone, which is implementation defined, is + used. ++ `datetime_expression`: A `DATETIME` expression that contains the date. 
-**Return Type** +**Return Data Type** -`DOUBLE` +`DATE` **Example** ```sql -WITH finishers AS - (SELECT 'Sophia Liu' as name, - TIMESTAMP '2016-10-18 2:51:45' as finish_time, - 'F30-34' as division - UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' - UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' - UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' - UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' - UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' - UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' - UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34') -SELECT name, - finish_time, - division, - PERCENT_RANK() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank -FROM finishers; +SELECT + DATE(2016, 12, 25) AS date_ymd, + DATE(DATETIME '2016-12-25 23:59:59') AS date_dt, + DATE(TIMESTAMP '2016-12-25 05:30:00+07', 'America/Los_Angeles') AS date_tstz; -/*-----------------+------------------------+----------+---------------------* - | name | finish_time | division | finish_rank | - +-----------------+------------------------+----------+---------------------+ - | Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 0 | - | Meghan Lederer | 2016-10-18 09:59:01+00 | F30-34 | 0.33333333333333331 | - | Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 0.33333333333333331 | - | Jen Edwards | 2016-10-18 10:06:36+00 | F30-34 | 1 | - | Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 0 | - | Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 0.33333333333333331 | - | Desiree Berry | 2016-10-18 10:05:42+00 | F35-39 | 0.66666666666666663 | - | Suzy Slane | 2016-10-18 10:06:24+00 | F35-39 | 1 | - *-----------------+------------------------+----------+---------------------*/ +/*------------+------------+------------* + | date_ymd | date_dt | date_tstz | + +------------+------------+------------+ + | 
2016-12-25 | 2016-12-25 | 2016-12-24 | + *------------+------------+------------*/ ``` -### `RANK` - -```sql -RANK() -OVER over_clause - -over_clause: - { named_window | ( [ window_specification ] ) } +[date-timezone-definitions]: #timezone_definitions -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - ORDER BY expression [ { ASC | DESC } ] [, ...] +### `DATE_ADD` +```sql +DATE_ADD(date_expression, INTERVAL int64_expression date_part) ``` **Description** -Returns the ordinal (1-based) rank of each row within the ordered partition. -All peer rows receive the same rank value. The next row or set of peer rows -receives a rank value which increments by the number of peers with the previous -rank value, instead of `DENSE_RANK`, which always increments by 1. - -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. - - - -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md +Adds a specified time interval to a DATE. - +`DATE_ADD` supports the following `date_part` values: -**Return Type** ++ `DAY` ++ `WEEK`. Equivalent to 7 `DAY`s. ++ `MONTH` ++ `QUARTER` ++ `YEAR` -`INT64` +Special handling is required for MONTH, QUARTER, and YEAR parts when +the date is at (or near) the last day of the month. If the resulting +month has fewer days than the original date's day, then the resulting +date is the last date of that month. 
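+
+For example, under the clamping rule above, adding one month to the last day
+of January lands on the last day of February (an illustrative example, not
+from the original reference; output assumes a non-leap year):
+
+```sql
+SELECT DATE_ADD(DATE '2019-01-31', INTERVAL 1 MONTH) AS one_month_later;
+
+/*-----------------*
+ | one_month_later |
+ +-----------------+
+ | 2019-02-28 |
+ *-----------------*/
+```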
-**Examples** +**Return Data Type** -```sql -WITH Numbers AS - (SELECT 1 as x - UNION ALL SELECT 2 - UNION ALL SELECT 2 - UNION ALL SELECT 5 - UNION ALL SELECT 8 - UNION ALL SELECT 10 - UNION ALL SELECT 10 -) -SELECT x, - RANK() OVER (ORDER BY x ASC) AS rank -FROM Numbers +DATE -/*-------------------------* - | x | rank | - +-------------------------+ - | 1 | 1 | - | 2 | 2 | - | 2 | 2 | - | 5 | 4 | - | 8 | 5 | - | 10 | 6 | - | 10 | 6 | - *-------------------------*/ -``` +**Example** ```sql -WITH finishers AS - (SELECT 'Sophia Liu' as name, - TIMESTAMP '2016-10-18 2:51:45' as finish_time, - 'F30-34' as division - UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' - UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' - UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' - UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' - UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' - UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' - UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34') -SELECT name, - finish_time, - division, - RANK() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank -FROM finishers; +SELECT DATE_ADD(DATE '2008-12-25', INTERVAL 5 DAY) AS five_days_later; -/*-----------------+------------------------+----------+-------------* - | name | finish_time | division | finish_rank | - +-----------------+------------------------+----------+-------------+ - | Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 1 | - | Meghan Lederer | 2016-10-18 09:59:01+00 | F30-34 | 2 | - | Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 2 | - | Jen Edwards | 2016-10-18 10:06:36+00 | F30-34 | 4 | - | Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 1 | - | Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 2 | - | Desiree Berry | 2016-10-18 10:05:42+00 | F35-39 | 3 | - | Suzy Slane | 2016-10-18 10:06:24+00 
| F35-39 | 4 |
- *-----------------+------------------------+----------+-------------*/

+/*--------------------*
+ | five_days_later |
+ +--------------------+
+ | 2008-12-30 |
+ *--------------------*/
+```

-### `ROW_NUMBER`
-
-```sql
-ROW_NUMBER()
-OVER over_clause
-
-over_clause:
-  { named_window | ( [ window_specification ] ) }
-
-window_specification:
-  [ named_window ]
-  [ PARTITION BY partition_expression [, ...] ]
-  [ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
-

+### `DATE_DIFF`
+
+```sql
+DATE_DIFF(date_expression_a, date_expression_b, date_part)
+```

**Description**

-Does not require the `ORDER BY` clause. Returns the sequential
-row ordinal (1-based) of each row for each ordered partition. If the
-`ORDER BY` clause is unspecified then the result is
-non-deterministic.
-
-To learn more about the `OVER` clause and how to use it, see
-[Window function calls][window-function-calls].
-
-[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md

+Returns the whole number of specified `date_part` intervals between two
+`DATE` objects (`date_expression_a` - `date_expression_b`).
+If the first `DATE` is earlier than the second one,
+the output is negative.
+
+`DATE_DIFF` supports the following `date_part` values:
+
++ `DAY`
++ `WEEK`: This date part begins on Sunday.
++ `WEEK(WEEKDAY)`: This date part begins on `WEEKDAY`. Valid values for
+  `WEEKDAY` are `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`,
+  `FRIDAY`, and `SATURDAY`.
++ `ISOWEEK`: Uses [ISO 8601 week][ISO-8601-week]
+  boundaries. ISO weeks begin on Monday.
++ `MONTH`
++ `QUARTER`
++ `YEAR`
++ `ISOYEAR`: Uses the [ISO 8601][ISO-8601]
+  week-numbering year boundary. The ISO year boundary is the Monday of the
+  first week whose Thursday belongs to the corresponding Gregorian calendar
+  year.
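+
+As a quick illustration of the sign convention above (an added example, not
+from the original reference), swapping the argument order negates the result:
+
+```sql
+SELECT DATE_DIFF(DATE '2008-12-25', DATE '2010-07-07', DAY) AS days_diff;
+
+/*-----------*
+ | days_diff |
+ +-----------+
+ | -559 |
+ *-----------*/
+```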
-**Return Type** +**Return Data Type** -`INT64` +INT64 -**Examples** +**Example** ```sql -WITH Numbers AS - (SELECT 1 as x - UNION ALL SELECT 2 - UNION ALL SELECT 2 - UNION ALL SELECT 5 - UNION ALL SELECT 8 - UNION ALL SELECT 10 - UNION ALL SELECT 10 -) -SELECT x, - ROW_NUMBER() OVER (ORDER BY x) AS row_num -FROM Numbers +SELECT DATE_DIFF(DATE '2010-07-07', DATE '2008-12-25', DAY) AS days_diff; -/*-------------------------* - | x | row_num | - +-------------------------+ - | 1 | 1 | - | 2 | 2 | - | 2 | 3 | - | 5 | 4 | - | 8 | 5 | - | 10 | 6 | - | 10 | 7 | - *-------------------------*/ +/*-----------* + | days_diff | + +-----------+ + | 559 | + *-----------*/ ``` ```sql -WITH finishers AS - (SELECT 'Sophia Liu' as name, - TIMESTAMP '2016-10-18 2:51:45' as finish_time, - 'F30-34' as division - UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' - UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' - UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' - UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' - UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' - UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' - UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34') -SELECT name, - finish_time, - division, - ROW_NUMBER() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank -FROM finishers; +SELECT + DATE_DIFF(DATE '2017-10-15', DATE '2017-10-14', DAY) AS days_diff, + DATE_DIFF(DATE '2017-10-15', DATE '2017-10-14', WEEK) AS weeks_diff; -/*-----------------+------------------------+----------+-------------* - | name | finish_time | division | finish_rank | - +-----------------+------------------------+----------+-------------+ - | Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 1 | - | Meghan Lederer | 2016-10-18 09:59:01+00 | F30-34 | 2 | - | Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 3 | - | Jen 
Edwards | 2016-10-18 10:06:36+00 | F30-34 | 4 | - | Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 1 | - | Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 2 | - | Desiree Berry | 2016-10-18 10:05:42+00 | F35-39 | 3 | - | Suzy Slane | 2016-10-18 10:06:24+00 | F35-39 | 4 | - *-----------------+------------------------+----------+-------------*/ +/*-----------+------------* + | days_diff | weeks_diff | + +-----------+------------+ + | 1 | 1 | + *-----------+------------*/ ``` - +The example above shows the result of `DATE_DIFF` for two days in succession. +`DATE_DIFF` with the date part `WEEK` returns 1 because `DATE_DIFF` counts the +number of date part boundaries in this range of dates. Each `WEEK` begins on +Sunday, so there is one date part boundary between Saturday, 2017-10-14 +and Sunday, 2017-10-15. -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md +The following example shows the result of `DATE_DIFF` for two dates in different +years. `DATE_DIFF` with the date part `YEAR` returns 3 because it counts the +number of Gregorian calendar year boundaries between the two dates. `DATE_DIFF` +with the date part `ISOYEAR` returns 2 because the second date belongs to the +ISO year 2015. The first Thursday of the 2015 calendar year was 2015-01-01, so +the ISO year 2015 begins on the preceding Monday, 2014-12-29. - +```sql +SELECT + DATE_DIFF('2017-12-30', '2014-12-30', YEAR) AS year_diff, + DATE_DIFF('2017-12-30', '2014-12-30', ISOYEAR) AS isoyear_diff; -## Bit functions +/*-----------+--------------* + | year_diff | isoyear_diff | + +-----------+--------------+ + | 3 | 2 | + *-----------+--------------*/ +``` -ZetaSQL supports the following bit functions. +The following example shows the result of `DATE_DIFF` for two days in +succession. The first date falls on a Monday and the second date falls on a +Sunday. `DATE_DIFF` with the date part `WEEK` returns 0 because this date part +uses weeks that begin on Sunday. 
`DATE_DIFF` with the date part `WEEK(MONDAY)`
+returns 1. `DATE_DIFF` with the date part `ISOWEEK` also returns 1 because
+ISO weeks begin on Monday.

+```sql
+SELECT
+  DATE_DIFF('2017-12-18', '2017-12-17', WEEK) AS week_diff,
+  DATE_DIFF('2017-12-18', '2017-12-17', WEEK(MONDAY)) AS week_weekday_diff,
+  DATE_DIFF('2017-12-18', '2017-12-17', ISOWEEK) AS isoweek_diff;
+
+/*-----------+-------------------+--------------*
+ | week_diff | week_weekday_diff | isoweek_diff |
+ +-----------+-------------------+--------------+
+ | 0 | 1 | 1 |
+ *-----------+-------------------+--------------*/
+```
+
+[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601
+
+[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date

-### Function list
- NameSummary
- BIT_CAST_TO_INT32 - Cast bits to an INT32 value.
- BIT_CAST_TO_INT64 - Cast bits to an INT64 value.
- BIT_CAST_TO_UINT32 - Cast bits to an UINT32 value.
- BIT_CAST_TO_UINT64 - Cast bits to an UINT64 value.
- BIT_COUNT - Gets the number of bits that are set in an input expression.

+### `DATE_FROM_UNIX_DATE`
+
+```sql
+DATE_FROM_UNIX_DATE(int64_expression)
+```
+
+**Description**
+
+Interprets `int64_expression` as the number of days since 1970-01-01.
+
+**Return Data Type**
+
+DATE
+
+**Example**
+
+```sql
+SELECT DATE_FROM_UNIX_DATE(14238) AS date_from_epoch;
+
+/*-----------------*
+ | date_from_epoch |
+ +-----------------+
+ | 2008-12-25 |
+ *-----------------*/
+```

-### `BIT_CAST_TO_INT32`
-
-```sql
-BIT_CAST_TO_INT32(value)
-```
-
-**Description**
-
-ZetaSQL supports bit casting to `INT32`. A bit
-cast is a cast in which the order of bits is preserved instead of the value
-those bytes represent.
-
-The `value` parameter can represent:
-
-+ `INT32`
-+ `UINT32`
-
-**Return Data Type**
-
-`INT32`
-
-**Examples**
-
-```sql
-SELECT BIT_CAST_TO_UINT32(-1) as UINT32_value, BIT_CAST_TO_INT32(BIT_CAST_TO_UINT32(-1)) as bit_cast_value;
-
-/*---------------+----------------------*
- | UINT32_value | bit_cast_value |
- +---------------+----------------------+
- | 4294967295 | -1 |
- *---------------+----------------------*/
-```
-
-### `BIT_CAST_TO_INT64`
-
-```sql
-BIT_CAST_TO_INT64(value)
-```
-
-**Description**
-
-ZetaSQL supports bit casting to `INT64`. A bit
-cast is a cast in which the order of bits is preserved instead of the value
-those bytes represent.

+### `DATE_SUB`
+
+```sql
+DATE_SUB(date_expression, INTERVAL int64_expression date_part)
+```
+
+**Description**
+
+Subtracts a specified time interval from a DATE.
+
+`DATE_SUB` supports the following `date_part` values:
+
++ `DAY`
++ `WEEK`. Equivalent to 7 `DAY`s.
++ `MONTH`
++ `QUARTER`
++ `YEAR`
+
+Special handling is required for MONTH, QUARTER, and YEAR parts when
+the date is at (or near) the last day of the month. If the resulting
+month has fewer days than the original date's day, then the resulting
+date is the last date of that month.
+
+**Return Data Type**
+
+DATE
+
+**Example**
+
+```sql
+SELECT DATE_SUB(DATE '2008-12-25', INTERVAL 5 DAY) AS five_days_ago;
+
+/*---------------*
+ | five_days_ago |
+ +---------------+
+ | 2008-12-20 |
+ *---------------*/
+```
+
+### `DATE_TRUNC`
+
+```sql
+DATE_TRUNC(date_expression, date_part)
+```
+
+**Description**
+
+Truncates a `DATE` value to the granularity of `date_part`.
The `DATE` value +is always rounded to the beginning of `date_part`, which can be one of the +following: -The `value` parameter can represent: ++ `DAY`: The day in the Gregorian calendar year that contains the + `DATE` value. ++ `WEEK`: The first day of the week in the week that contains the + `DATE` value. Weeks begin on Sundays. `WEEK` is equivalent to + `WEEK(SUNDAY)`. ++ `WEEK(WEEKDAY)`: The first day of the week in the week that contains the + `DATE` value. Weeks begin on `WEEKDAY`. `WEEKDAY` must be one of the + following: `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, `FRIDAY`, + or `SATURDAY`. ++ `ISOWEEK`: The first day of the [ISO 8601 week][ISO-8601-week] in the + ISO week that contains the `DATE` value. The ISO week begins on + Monday. The first ISO week of each ISO year contains the first Thursday of the + corresponding Gregorian calendar year. ++ `MONTH`: The first day of the month in the month that contains the + `DATE` value. ++ `QUARTER`: The first day of the quarter in the quarter that contains the + `DATE` value. ++ `YEAR`: The first day of the year in the year that contains the + `DATE` value. ++ `ISOYEAR`: The first day of the [ISO 8601][ISO-8601] week-numbering year + in the ISO year that contains the `DATE` value. The ISO year is the + Monday of the first week whose Thursday belongs to the corresponding + Gregorian calendar year. 
-+ `INT64` -+ `UINT64` + + +[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 + +[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date + + **Return Data Type** -`INT64` +DATE -**Example** +**Examples** ```sql -SELECT BIT_CAST_TO_UINT64(-1) as UINT64_value, BIT_CAST_TO_INT64(BIT_CAST_TO_UINT64(-1)) as bit_cast_value; +SELECT DATE_TRUNC(DATE '2008-12-25', MONTH) AS month; -/*-----------------------+----------------------* - | UINT64_value | bit_cast_value | - +-----------------------+----------------------+ - | 18446744073709551615 | -1 | - *-----------------------+----------------------*/ +/*------------* + | month | + +------------+ + | 2008-12-01 | + *------------*/ ``` -### `BIT_CAST_TO_UINT32` +In the following example, the original date falls on a Sunday. Because +the `date_part` is `WEEK(MONDAY)`, `DATE_TRUNC` returns the `DATE` for the +preceding Monday. ```sql -BIT_CAST_TO_UINT32(value) +SELECT date AS original, DATE_TRUNC(date, WEEK(MONDAY)) AS truncated +FROM (SELECT DATE('2017-11-05') AS date); + +/*------------+------------* + | original | truncated | + +------------+------------+ + | 2017-11-05 | 2017-10-30 | + *------------+------------*/ ``` -**Description** +In the following example, the original `date_expression` is in the Gregorian +calendar year 2015. However, `DATE_TRUNC` with the `ISOYEAR` date part +truncates the `date_expression` to the beginning of the ISO year, not the +Gregorian calendar year. The first Thursday of the 2015 calendar year was +2015-01-01, so the ISO year 2015 begins on the preceding Monday, 2014-12-29. +Therefore the ISO year boundary preceding the `date_expression` 2015-06-15 is +2014-12-29. -ZetaSQL supports bit casting to `UINT32`. A bit -cast is a cast in which the order of bits is preserved instead of the value -those bytes represent. 
+```sql
+SELECT
+  DATE_TRUNC('2015-06-15', ISOYEAR) AS isoyear_boundary,
+  EXTRACT(ISOYEAR FROM DATE '2015-06-15') AS isoyear_number;
+
+/*------------------+----------------*
+ | isoyear_boundary | isoyear_number |
+ +------------------+----------------+
+ | 2014-12-29 | 2015 |
+ *------------------+----------------*/
+```

-The `value` parameter can represent:
-
-+ `INT32`
-+ `UINT32`
-
-**Return Data Type**
-
-`UINT32`
-
-**Examples**

+### `EXTRACT`
+
+```sql
+EXTRACT(part FROM date_expression)
+```
+
+**Description**
+
+Returns the value corresponding to the specified date part. The `part` must
+be one of:
+
++ `DAYOFWEEK`: Returns values in the range [1,7] with Sunday as the first day
+  of the week.
++ `DAY`
++ `DAYOFYEAR`
++ `WEEK`: Returns the week number of the date in the range [0, 53]. Weeks begin
+  with Sunday, and dates prior to the first Sunday of the year are in week 0.
++ `WEEK(WEEKDAY)`: Returns the week number of the date in the range [0, 53].
+  Weeks begin on `WEEKDAY`. Dates prior to
+  the first `WEEKDAY` of the year are in week 0. Valid values for `WEEKDAY` are
+  `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, `FRIDAY`, and
+  `SATURDAY`.
++ `ISOWEEK`: Returns the [ISO 8601 week][ISO-8601-week]
+  number of the `date_expression`. `ISOWEEK`s begin on Monday. Return values
+  are in the range [1, 53]. The first `ISOWEEK` of each ISO year begins on the
+  Monday before the first Thursday of the Gregorian calendar year.
++ `MONTH`
++ `QUARTER`: Returns values in the range [1,4].
++ `YEAR`
++ `ISOYEAR`: Returns the [ISO 8601][ISO-8601]
+  week-numbering year, which is the Gregorian calendar year containing the
+  Thursday of the week to which `date_expression` belongs.
+
+**Return Data Type**
+
+INT64
+
+**Examples**
+
+In the following example, `EXTRACT` returns a value corresponding to the `DAY`
+date part.
+ ```sql -SELECT -1 as UINT32_value, BIT_CAST_TO_UINT32(-1) as bit_cast_value; +SELECT EXTRACT(DAY FROM DATE '2013-12-25') AS the_day; -/*--------------+----------------------* - | UINT32_value | bit_cast_value | - +--------------+----------------------+ - | -1 | 4294967295 | - *--------------+----------------------*/ +/*---------* + | the_day | + +---------+ + | 25 | + *---------*/ ``` -### `BIT_CAST_TO_UINT64` +In the following example, `EXTRACT` returns values corresponding to different +date parts from a column of dates near the end of the year. ```sql -BIT_CAST_TO_UINT64(value) +SELECT + date, + EXTRACT(ISOYEAR FROM date) AS isoyear, + EXTRACT(ISOWEEK FROM date) AS isoweek, + EXTRACT(YEAR FROM date) AS year, + EXTRACT(WEEK FROM date) AS week +FROM UNNEST(GENERATE_DATE_ARRAY('2015-12-23', '2016-01-09')) AS date +ORDER BY date; + +/*------------+---------+---------+------+------* + | date | isoyear | isoweek | year | week | + +------------+---------+---------+------+------+ + | 2015-12-23 | 2015 | 52 | 2015 | 51 | + | 2015-12-24 | 2015 | 52 | 2015 | 51 | + | 2015-12-25 | 2015 | 52 | 2015 | 51 | + | 2015-12-26 | 2015 | 52 | 2015 | 51 | + | 2015-12-27 | 2015 | 52 | 2015 | 52 | + | 2015-12-28 | 2015 | 53 | 2015 | 52 | + | 2015-12-29 | 2015 | 53 | 2015 | 52 | + | 2015-12-30 | 2015 | 53 | 2015 | 52 | + | 2015-12-31 | 2015 | 53 | 2015 | 52 | + | 2016-01-01 | 2015 | 53 | 2016 | 0 | + | 2016-01-02 | 2015 | 53 | 2016 | 0 | + | 2016-01-03 | 2015 | 53 | 2016 | 1 | + | 2016-01-04 | 2016 | 1 | 2016 | 1 | + | 2016-01-05 | 2016 | 1 | 2016 | 1 | + | 2016-01-06 | 2016 | 1 | 2016 | 1 | + | 2016-01-07 | 2016 | 1 | 2016 | 1 | + | 2016-01-08 | 2016 | 1 | 2016 | 1 | + | 2016-01-09 | 2016 | 1 | 2016 | 1 | + *------------+---------+---------+------+------*/ ``` -**Description** +In the following example, `date_expression` falls on a Sunday. `EXTRACT` +calculates the first column using weeks that begin on Sunday, and it calculates +the second column using weeks that begin on Monday. 
-ZetaSQL supports bit casting to `UINT64`. A bit -cast is a cast in which the order of bits is preserved instead of the value -those bytes represent. +```sql +WITH table AS (SELECT DATE('2017-11-05') AS date) +SELECT + date, + EXTRACT(WEEK(SUNDAY) FROM date) AS week_sunday, + EXTRACT(WEEK(MONDAY) FROM date) AS week_monday FROM table; -The `value` parameter can represent: +/*------------+-------------+-------------* + | date | week_sunday | week_monday | + +------------+-------------+-------------+ + | 2017-11-05 | 45 | 44 | + *------------+-------------+-------------*/ +``` -+ `INT64` -+ `UINT64` +[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 + +[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date + +### `FORMAT_DATE` + +```sql +FORMAT_DATE(format_string, date_expr) +``` + +**Description** + +Formats the `date_expr` according to the specified `format_string`. + +See [Supported Format Elements For DATE][date-format-elements] +for a list of format elements that this function supports. 
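+
+Format elements can be freely combined; for instance (an illustrative example,
+not from the original reference, using `%A` for the full weekday name and `%B`
+for the full month name):
+
+```sql
+SELECT FORMAT_DATE('%A, %B %d, %Y', DATE '2008-12-25') AS formatted;
+
+/*------------------------------*
+ | formatted |
+ +------------------------------+
+ | Thursday, December 25, 2008 |
+ *------------------------------*/
+```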
**Return Data Type**

-`UINT64`
+STRING

-**Example**
+**Examples**

```sql
-SELECT -1 as INT64_value, BIT_CAST_TO_UINT64(-1) as bit_cast_value;
+SELECT FORMAT_DATE('%x', DATE '2008-12-25') AS US_format;

-/*--------------+----------------------*
- | INT64_value | bit_cast_value |
- +--------------+----------------------+
- | -1 | 18446744073709551615 |
- *--------------+----------------------*/
+/*------------*
+ | US_format |
+ +------------+
+ | 12/25/08 |
+ *------------*/
```

-### `BIT_COUNT`

+```sql
+SELECT FORMAT_DATE('%b-%d-%Y', DATE '2008-12-25') AS formatted;
+
+/*-------------*
+ | formatted |
+ +-------------+
+ | Dec-25-2008 |
+ *-------------*/
+```

```sql
-BIT_COUNT(expression)
+SELECT FORMAT_DATE('%b %Y', DATE '2008-12-25') AS formatted;
+
+/*-------------*
+ | formatted |
+ +-------------+
+ | Dec 2008 |
+ *-------------*/
+```
+
+[date-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time
+
+### `LAST_DAY`
+
+```sql
+LAST_DAY(date_expression[, date_part])
```

**Description**

-The input, `expression`, must be an
-integer or `BYTES`.
+Returns the last day from a date expression. This is commonly used to return
+the last day of the month.

-Returns the number of bits that are set in the input `expression`.
-For signed integers, this is the number of bits in two's complement form.

+You can optionally specify the date part for which the last day is returned.
+If this parameter is not used, the default value is `MONTH`.
+`LAST_DAY` supports the following values for `date_part`:
+
++ `YEAR`
++ `QUARTER`
++ `MONTH`
++ `WEEK`. Equivalent to 7 `DAY`s.
++ `WEEK(WEEKDAY)`. `WEEKDAY` represents the starting day of the week.
+  Valid values are `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`,
+  `FRIDAY`, and `SATURDAY`.
++ `ISOWEEK`. Uses [ISO 8601][ISO-8601-week] week boundaries. ISO weeks begin
+  on Monday.
++ `ISOYEAR`. Uses the [ISO 8601][ISO-8601] week-numbering year boundary.
+ The ISO year boundary is the Monday of the first week whose Thursday belongs + to the corresponding Gregorian calendar year. **Return Data Type** -`INT64` +`DATE` **Example** +These both return the last day of the month: + ```sql -SELECT a, BIT_COUNT(a) AS a_bits, FORMAT("%T", b) as b, BIT_COUNT(b) AS b_bits -FROM UNNEST([ - STRUCT(0 AS a, b'' AS b), (0, b'\x00'), (5, b'\x05'), (8, b'\x00\x08'), - (0xFFFF, b'\xFF\xFF'), (-2, b'\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFE'), - (-1, b'\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF'), - (NULL, b'\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF') -]) AS x; +SELECT LAST_DAY(DATE '2008-11-25', MONTH) AS last_day -/*-------+--------+---------------------------------------------+--------* - | a | a_bits | b | b_bits | - +-------+--------+---------------------------------------------+--------+ - | 0 | 0 | b"" | 0 | - | 0 | 0 | b"\x00" | 0 | - | 5 | 2 | b"\x05" | 2 | - | 8 | 1 | b"\x00\x08" | 1 | - | 65535 | 16 | b"\xff\xff" | 16 | - | -2 | 63 | b"\xff\xff\xff\xff\xff\xff\xff\xfe" | 63 | - | -1 | 64 | b"\xff\xff\xff\xff\xff\xff\xff\xff" | 64 | - | NULL | NULL | b"\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff" | 80 | - *-------+--------+---------------------------------------------+--------*/ +/*------------* + | last_day | + +------------+ + | 2008-11-30 | + *------------*/ ``` -## Conversion functions +```sql +SELECT LAST_DAY(DATE '2008-11-25') AS last_day -ZetaSQL supports conversion functions. These data type -conversions are explicit, but some conversions can happen implicitly. You can -learn more about implicit and explicit conversion [here][conversion-rules]. 
+/*------------*
+ | last_day |
+ +------------+
+ | 2008-11-30 |
+ *------------*/
+```
+
+This returns the last day of the year:
+
+```sql
+SELECT LAST_DAY(DATE '2008-11-25', YEAR) AS last_day
+
+/*------------*
+ | last_day |
+ +------------+
+ | 2008-12-31 |
+ *------------*/
+```
+
+This returns the last day of the week for a week that starts on a Sunday:
+
+```sql
+SELECT LAST_DAY(DATE '2008-11-10', WEEK(SUNDAY)) AS last_day
+
+/*------------*
+ | last_day |
+ +------------+
+ | 2008-11-15 |
+ *------------*/
+```
+
+This returns the last day of the week for a week that starts on a Monday:
+
+```sql
+SELECT LAST_DAY(DATE '2008-11-10', WEEK(MONDAY)) AS last_day
+
+/*------------*
+ | last_day |
+ +------------+
+ | 2008-11-16 |
+ *------------*/
+```
+
+[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601
+
+[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date

-### Function list
- NameSummary
- CAST - Convert the results of an expression to the given type.
- SAFE_CAST - Similar to the CAST function, but returns NULL
-   when a runtime error is produced.
-
-### `CAST`
-
-```sql
-CAST(expression AS typename [format_clause])
-```
-
-**Description**
-
-Cast syntax is used in a query to indicate that the result type of an
-expression should be converted to some other type.
-
-When using `CAST`, a query can fail if ZetaSQL is unable to perform
-the cast. If you want to protect your queries from these types of errors, you
-can use [SAFE_CAST][con-func-safecast].
-
-Casts between supported types that do not successfully map from the original
-value to the target domain produce runtime errors. For example, casting
-`BYTES` to `STRING` where the byte sequence is not valid UTF-8 results in a
-runtime error.
-
-Other examples include:
-
-+ Casting `INT64` to `INT32` where the value overflows `INT32`.
-+ Casting `STRING` to `INT32` where the `STRING` contains non-digit characters.

+### `PARSE_DATE`
+
+```sql
+PARSE_DATE(format_string, date_string)
+```
+
+**Description**
+
+Converts a [string representation of date][date-format] to a
+`DATE` object.
+
+`format_string` contains the [format elements][date-format-elements]
+that define how `date_string` is formatted. Each element in
+`date_string` must have a corresponding element in `format_string`. The
+location of each element in `format_string` must match the location of
+each element in `date_string`.
+
+```sql
+-- This works because elements on both sides match.
+SELECT PARSE_DATE('%A %b %e %Y', 'Thursday Dec 25 2008')
+
+-- This produces an error because the year element is in different locations.
+SELECT PARSE_DATE('%Y %A %b %e', 'Thursday Dec 25 2008')
+
+-- This produces an error because one of the year elements is missing.
+SELECT PARSE_DATE('%A %b %e', 'Thursday Dec 25 2008') -Some casts can include a [format clause][formatting-syntax], which provides -instructions for how to conduct the -cast. For example, you could -instruct a cast to convert a sequence of bytes to a BASE64-encoded string -instead of a UTF-8-encoded string. +-- This works because %F can find all matching elements in date_string. +SELECT PARSE_DATE('%F', '2000-12-30') +``` -The structure of the format clause is unique to each type of cast and more -information is available in the section for that cast. +When using `PARSE_DATE`, keep the following in mind: -**Examples** ++ **Unspecified fields.** Any unspecified field is initialized from `1970-01-01`. ++ **Case insensitivity.** Names, such as `Monday`, `February`, and so on, are + case insensitive. ++ **Whitespace.** One or more consecutive white spaces in the format string + matches zero or more consecutive white spaces in the date string. In + addition, leading and trailing white spaces in the date string are always + allowed -- even if they are not in the format string. ++ **Format precedence.** When two (or more) format elements have overlapping + information (for example both `%F` and `%Y` affect the year), the last one + generally overrides any earlier ones. -The following query results in `"true"` if `x` is `1`, `"false"` for any other -non-`NULL` value, and `NULL` if `x` is `NULL`. +**Return Data Type** -```sql -CAST(x=1 AS STRING) -``` +DATE -### CAST AS ARRAY +**Examples** + +This example converts a `MM/DD/YY` formatted string to a `DATE` object: ```sql -CAST(expression AS ARRAY) +SELECT PARSE_DATE('%x', '12/25/08') AS parsed; + +/*------------* + | parsed | + +------------+ + | 2008-12-25 | + *------------*/ ``` -**Description** +This example converts a `YYYYMMDD` formatted string to a `DATE` object: -ZetaSQL supports [casting][con-func-cast] to `ARRAY`. 
The -`expression` parameter can represent an expression for these data types: +```sql +SELECT PARSE_DATE('%Y%m%d', '20081225') AS parsed; -+ `ARRAY` +/*------------* + | parsed | + +------------+ + | 2008-12-25 | + *------------*/ +``` -**Conversion rules** +[date-format]: #format_date - - - - - - - - - - - -
FromToRule(s) when casting x
ARRAYARRAY - - The element types of the input - array must be castable to the - element types of the target array. - For example, casting from type - ARRAY<INT64> to - ARRAY<DOUBLE> or - ARRAY<STRING> is valid; - casting from type ARRAY<INT64> - to ARRAY<BYTES> is not valid. - -
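+The case insensitivity noted above applies to names such as weekdays and
+months. A quick sketch, reusing the `%A %b %e %Y` elements from the earlier
+example with the case changed:
+
+```sql
+SELECT PARSE_DATE('%A %b %e %Y', 'THURSDAY dec 25 2008') AS parsed;
+
+/*------------*
+ | parsed     |
+ +------------+
+ | 2008-12-25 |
+ *------------*/
+```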
+[date-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time -### CAST AS BIGNUMERIC - +### `UNIX_DATE` ```sql -CAST(expression AS BIGNUMERIC) +UNIX_DATE(date_expression) ``` **Description** -ZetaSQL supports [casting][con-func-cast] to `BIGNUMERIC`. The -`expression` parameter can represent an expression for these data types: +Returns the number of days since `1970-01-01`. -+ `INT32` -+ `UINT32` -+ `INT64` -+ `UINT64` -+ `FLOAT` -+ `DOUBLE` -+ `NUMERIC` -+ `BIGNUMERIC` -+ `STRING` - -**Conversion rules** - - - - - - - - - - - - - - - - - -
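+Because the count is relative to `1970-01-01`, dates before the epoch come out
+negative. A quick sketch:
+
+```sql
+SELECT UNIX_DATE(DATE '1969-12-31') AS days_from_epoch;
+
+/*-----------------*
+ | days_from_epoch |
+ +-----------------+
+ | -1              |
+ *-----------------*/
+```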
FromToRule(s) when casting x
Floating PointBIGNUMERIC - The floating point number will round - half away from zero. - - Casting a NaN, +inf or - -inf will return an error. Casting a value outside the range - of BIGNUMERIC returns an overflow error. -
STRINGBIGNUMERIC - The numeric literal contained in the string must not exceed - the maximum precision or range of the - BIGNUMERIC type, or an error will occur. If the number of - digits after the decimal point exceeds 38, then the resulting - BIGNUMERIC value will round - half away from zero +**Return Data Type** - to have 38 digits after the decimal point. -
+INT64 -### CAST AS BOOL +**Example** ```sql -CAST(expression AS BOOL) -``` +SELECT UNIX_DATE(DATE '2008-12-25') AS days_from_epoch; -**Description** +/*-----------------* + | days_from_epoch | + +-----------------+ + | 14238 | + *-----------------*/ +``` -ZetaSQL supports [casting][con-func-cast] to `BOOL`. The -`expression` parameter can represent an expression for these data types: +## Datetime functions -+ `INT32` -+ `UINT32` -+ `INT64` -+ `UINT64` -+ `BOOL` -+ `STRING` +ZetaSQL supports the following datetime functions. -**Conversion rules** +### Function list - - - - - - - - - - - - - - - -
FromToRule(s) when casting x
IntegerBOOL - Returns FALSE if x is 0, - TRUE otherwise. -
STRINGBOOL - Returns TRUE if x is "true" and - FALSE if x is "false"
- All other values of x are invalid and throw an error instead - of casting to a boolean.
- A string is case-insensitive when converting - to a boolean. -
+ + + Name + Summary + + + -### CAST AS BYTES + + CURRENT_DATETIME -```sql -CAST(expression AS BYTES [format_clause]) -``` + + + Returns the current date and time as a DATETIME value. + + -**Description** + + DATETIME -ZetaSQL supports [casting][con-func-cast] to `BYTES`. The -`expression` parameter can represent an expression for these data types: + + + Constructs a DATETIME value. + + -+ `BYTES` -+ `STRING` -+ `PROTO` + + DATETIME_ADD -**Format clause** + + + Adds a specified time interval to a DATETIME value. + + -When an expression of one type is cast to another type, you can use the -[format clause][formatting-syntax] to provide instructions for how to conduct -the cast. You can use the format clause in this section if `expression` is a -`STRING`. + + DATETIME_DIFF -+ [Format string as bytes][format-string-as-bytes] + + + Gets the number of intervals between two DATETIME values. + + -**Conversion rules** + + DATETIME_SUB - - - - - - - - - - - - - - - - - - -
FromToRule(s) when casting x
STRINGBYTES - Strings are cast to bytes using UTF-8 encoding. For example, - the string "©", when cast to - bytes, would become a 2-byte sequence with the - hex values C2 and A9. -
PROTOBYTES - Returns the proto2 wire format bytes - of x. -
+ + + Subtracts a specified time interval from a DATETIME value. + + -### CAST AS DATE + + DATETIME_TRUNC -```sql -CAST(expression AS DATE [format_clause]) -``` + + + Truncates a DATETIME value. + + -**Description** + + EXTRACT -ZetaSQL supports [casting][con-func-cast] to `DATE`. The `expression` -parameter can represent an expression for these data types: + + + Extracts part of a date and time from a DATETIME value. + + -+ `STRING` -+ `DATETIME` -+ `TIMESTAMP` + + FORMAT_DATETIME -**Format clause** + + + Formats a DATETIME value according to a specified + format string. + + -When an expression of one type is cast to another type, you can use the -[format clause][formatting-syntax] to provide instructions for how to conduct -the cast. You can use the format clause in this section if `expression` is a -`STRING`. + + LAST_DAY -+ [Format string as date and time][format-string-as-date-time] + + + Gets the last day in a specified time period that contains a + DATETIME value. + + -**Conversion rules** + + PARSE_DATETIME - - - - - - - - - - - - - - - - - - + + + + +
FromToRule(s) when casting x
STRINGDATE - When casting from string to date, the string must conform to - the supported date literal format, and is independent of time zone. If the - string expression is invalid or represents a date that is outside of the - supported min/max range, then an error is produced. -
TIMESTAMPDATE - Casting from a timestamp to date effectively truncates the timestamp as - of the default time zone. -
+ Converts a STRING value to a DATETIME value. +
-### CAST AS DATETIME +### `CURRENT_DATETIME` ```sql -CAST(expression AS DATETIME [format_clause]) +CURRENT_DATETIME([time_zone]) +``` + +```sql +CURRENT_DATETIME ``` **Description** -ZetaSQL supports [casting][con-func-cast] to `DATETIME`. The -`expression` parameter can represent an expression for these data types: +Returns the current time as a `DATETIME` object. Parentheses are optional when +called with no arguments. -+ `STRING` -+ `DATETIME` -+ `TIMESTAMP` +This function supports an optional `time_zone` parameter. +See [Time zone definitions][datetime-timezone-definitions] for +information on how to specify a time zone. -**Format clause** +The current date and time is recorded at the start of the query +statement which contains this function, not when this specific function is +evaluated. -When an expression of one type is cast to another type, you can use the -[format clause][formatting-syntax] to provide instructions for how to conduct -the cast. You can use the format clause in this section if `expression` is a -`STRING`. +**Return Data Type** -+ [Format string as date and time][format-string-as-date-time] +`DATETIME` -**Conversion rules** +**Example** - - - - - - - - - - - - - - - - - - -
FromToRule(s) when casting x
STRINGDATETIME - When casting from string to datetime, the string must conform to the - supported datetime literal format, and is independent of time zone. If - the string expression is invalid or represents a datetime that is outside - of the supported min/max range, then an error is produced. -
TIMESTAMPDATETIME - Casting from a timestamp to datetime effectively truncates the timestamp - as of the default time zone. -
+```sql +SELECT CURRENT_DATETIME() as now; -### CAST AS ENUM +/*----------------------------* + | now | + +----------------------------+ + | 2016-05-19 10:38:47.046465 | + *----------------------------*/ +``` + +When a column named `current_datetime` is present, the column name and the +function call without parentheses are ambiguous. To ensure the function call, +add parentheses; to ensure the column name, qualify it with its +[range variable][datetime-range-variables]. For example, the +following query will select the function in the `now` column and the table +column in the `current_datetime` column. ```sql -CAST(expression AS ENUM) +WITH t AS (SELECT 'column value' AS `current_datetime`) +SELECT current_datetime() as now, t.current_datetime FROM t; + +/*----------------------------+------------------* + | now | current_datetime | + +----------------------------+------------------+ + | 2016-05-19 10:38:47.046465 | column value | + *----------------------------+------------------*/ ``` -**Description** +[datetime-range-variables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#range_variables -ZetaSQL supports [casting][con-func-cast] to `ENUM`. The `expression` -parameter can represent an expression for these data types: +[datetime-timezone-definitions]: #timezone_definitions -+ `INT32` -+ `UINT32` -+ `INT64` -+ `UINT64` -+ `STRING` -+ `ENUM` +### `DATETIME` -**Conversion rules** +```sql +1. DATETIME(year, month, day, hour, minute, second) +2. DATETIME(date_expression[, time_expression]) +3. DATETIME(timestamp_expression [, time_zone]) +``` - - - - - - - - - - - -
FromToRule(s) when casting x
ENUMENUMMust have the same enum name.
+**Description** -### CAST AS Floating Point - +1. Constructs a `DATETIME` object using `INT64` values + representing the year, month, day, hour, minute, and second. +2. Constructs a `DATETIME` object using a DATE object and an optional `TIME` + object. +3. Constructs a `DATETIME` object using a `TIMESTAMP` object. It supports an + optional parameter to + [specify a time zone][datetime-timezone-definitions]. + If no time zone is specified, the default time zone, which is implementation defined, + is used. + +**Return Data Type** + +`DATETIME` + +**Example** ```sql -CAST(expression AS DOUBLE) +SELECT + DATETIME(2008, 12, 25, 05, 30, 00) as datetime_ymdhms, + DATETIME(TIMESTAMP "2008-12-25 05:30:00+00", "America/Los_Angeles") as datetime_tstz; + +/*---------------------+---------------------* + | datetime_ymdhms | datetime_tstz | + +---------------------+---------------------+ + | 2008-12-25 05:30:00 | 2008-12-24 21:30:00 | + *---------------------+---------------------*/ ``` +[datetime-timezone-definitions]: #timezone_definitions + +### `DATETIME_ADD` + ```sql -CAST(expression AS FLOAT) +DATETIME_ADD(datetime_expression, INTERVAL int64_expression part) ``` **Description** -ZetaSQL supports [casting][con-func-cast] to floating point types. -The `expression` parameter can represent an expression for these data types: +Adds `int64_expression` units of `part` to the `DATETIME` object. -+ `INT32` -+ `UINT32` -+ `INT64` -+ `UINT64` -+ `FLOAT` -+ `DOUBLE` -+ `NUMERIC` -+ `BIGNUMERIC` -+ `STRING` +`DATETIME_ADD` supports the following values for `part`: -**Conversion rules** ++ `NANOSECOND` + (if the SQL engine supports it) ++ `MICROSECOND` ++ `MILLISECOND` ++ `SECOND` ++ `MINUTE` ++ `HOUR` ++ `DAY` ++ `WEEK`. Equivalent to 7 `DAY`s. ++ `MONTH` ++ `QUARTER` ++ `YEAR` - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
FromToRule(s) when casting x
IntegerFloating Point - Returns a close but potentially not exact floating point value. -
NUMERICFloating Point - NUMERIC will convert to the closest floating point number - with a possible loss of precision. -
BIGNUMERICFloating Point - BIGNUMERIC will convert to the closest floating point number - with a possible loss of precision. -
STRINGFloating Point - Returns x as a floating point value, interpreting it as - having the same form as a valid floating point literal. - Also supports casts from "[+,-]inf" to - [,-]Infinity, - "[+,-]infinity" to [,-]Infinity, and - "[+,-]nan" to NaN. - Conversions are case-insensitive. -
+Special handling is required for MONTH, QUARTER, and YEAR parts when the +date is at (or near) the last day of the month. If the resulting month has fewer +days than the original DATETIME's day, then the result day is the last day of +the new month. -### CAST AS Integer - +**Return Data Type** -```sql -CAST(expression AS INT32) -``` +`DATETIME` -```sql -CAST(expression AS UINT32) -``` +**Example** ```sql -CAST(expression AS INT64) +SELECT + DATETIME "2008-12-25 15:30:00" as original_date, + DATETIME_ADD(DATETIME "2008-12-25 15:30:00", INTERVAL 10 MINUTE) as later; + +/*-----------------------------+------------------------* + | original_date | later | + +-----------------------------+------------------------+ + | 2008-12-25 15:30:00 | 2008-12-25 15:40:00 | + *-----------------------------+------------------------*/ ``` +### `DATETIME_DIFF` + ```sql -CAST(expression AS UINT64) +DATETIME_DIFF(datetime_expression_a, datetime_expression_b, part) ``` **Description** -ZetaSQL supports [casting][con-func-cast] to integer types. -The `expression` parameter can represent an expression for these data types: - -+ `INT32` -+ `UINT32` -+ `INT64` -+ `UINT64` -+ `FLOAT` -+ `DOUBLE` -+ `NUMERIC` -+ `BIGNUMERIC` -+ `ENUM` -+ `BOOL` -+ `STRING` +Returns the whole number of specified `part` intervals between two +`DATETIME` objects (`datetime_expression_a` - `datetime_expression_b`). +If the first `DATETIME` is earlier than the second one, +the output is negative. Throws an error if the computation overflows the +result type, such as if the difference in +nanoseconds +between the two `DATETIME` objects would overflow an +`INT64` value. -**Conversion rules** +`DATETIME_DIFF` supports the following values for `part`: - - - - - - - - - - - - - - - - - - - - - - - - -
FromToRule(s) when casting x
- Floating Point - - Integer - - Returns the closest integer value.
- Halfway cases such as 1.5 or -0.5 round away from zero. -
BOOLInteger - Returns 1 if x is TRUE, - 0 otherwise. -
STRINGInteger - A hex string can be cast to an integer. For example, - 0x123 to 291 or -0x123 to - -291. -
++ `NANOSECOND`
+  (if the SQL engine supports it)
++ `MICROSECOND`
++ `MILLISECOND`
++ `SECOND`
++ `MINUTE`
++ `HOUR`
++ `DAY`
++ `WEEK`: This date part begins on Sunday.
++ `WEEK(WEEKDAY)`: This date part begins on `WEEKDAY`. Valid values for
+  `WEEKDAY` are `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`,
+  `FRIDAY`, and `SATURDAY`.
++ `ISOWEEK`: Uses [ISO 8601 week][ISO-8601-week]
+  boundaries. ISO weeks begin on Monday.
++ `MONTH`
++ `QUARTER`
++ `YEAR`
++ `ISOYEAR`: Uses the [ISO 8601][ISO-8601]
+  week-numbering year boundary. The ISO year boundary is the Monday of the
+  first week whose Thursday belongs to the corresponding Gregorian calendar
+  year.

-**Examples**
+**Return Data Type**

-If you are working with hex strings (`0x123`), you can cast those strings as
-integers:
+`INT64`

+**Example**

```sql
-SELECT '0x123' as hex_value, CAST('0x123' as INT64) as hex_to_int;
+SELECT
+  DATETIME "2010-07-07 10:20:00" as first_datetime,
+  DATETIME "2008-12-25 15:30:00" as second_datetime,
+  DATETIME_DIFF(DATETIME "2010-07-07 10:20:00",
+    DATETIME "2008-12-25 15:30:00", DAY) as difference;

-/*-----------+------------*
- | hex_value | hex_to_int |
- +-----------+------------+
- | 0x123     | 291        |
- *-----------+------------*/
+/*----------------------------+------------------------+------------------------*
+ | first_datetime             | second_datetime        | difference             |
+ +----------------------------+------------------------+------------------------+
+ | 2010-07-07 10:20:00        | 2008-12-25 15:30:00    | 559                    |
+ *----------------------------+------------------------+------------------------*/
```

```sql
-SELECT '-0x123' as hex_value, CAST('-0x123' as INT64) as hex_to_int;
+SELECT
+  DATETIME_DIFF(DATETIME '2017-10-15 00:00:00',
+    DATETIME '2017-10-14 00:00:00', DAY) as days_diff,
+  DATETIME_DIFF(DATETIME '2017-10-15 00:00:00',
+    DATETIME '2017-10-14 00:00:00', WEEK) as weeks_diff;

 /*-----------+------------*
- | hex_value | hex_to_int |
+ | days_diff | weeks_diff |
+-----------+------------+ - | -0x123 | -291 | + | 1 | 1 | *-----------+------------*/ ``` -### CAST AS INTERVAL +The example above shows the result of `DATETIME_DIFF` for two `DATETIME`s that +are 24 hours apart. `DATETIME_DIFF` with the part `WEEK` returns 1 because +`DATETIME_DIFF` counts the number of part boundaries in this range of +`DATETIME`s. Each `WEEK` begins on Sunday, so there is one part boundary between +Saturday, `2017-10-14 00:00:00` and Sunday, `2017-10-15 00:00:00`. + +The following example shows the result of `DATETIME_DIFF` for two dates in +different years. `DATETIME_DIFF` with the date part `YEAR` returns 3 because it +counts the number of Gregorian calendar year boundaries between the two +`DATETIME`s. `DATETIME_DIFF` with the date part `ISOYEAR` returns 2 because the +second `DATETIME` belongs to the ISO year 2015. The first Thursday of the 2015 +calendar year was 2015-01-01, so the ISO year 2015 begins on the preceding +Monday, 2014-12-29. ```sql -CAST(expression AS INTERVAL) -``` +SELECT + DATETIME_DIFF('2017-12-30 00:00:00', + '2014-12-30 00:00:00', YEAR) AS year_diff, + DATETIME_DIFF('2017-12-30 00:00:00', + '2014-12-30 00:00:00', ISOYEAR) AS isoyear_diff; -**Description** +/*-----------+--------------* + | year_diff | isoyear_diff | + +-----------+--------------+ + | 3 | 2 | + *-----------+--------------*/ +``` -ZetaSQL supports [casting][con-func-cast] to `INTERVAL`. The -`expression` parameter can represent an expression for these data types: +The following example shows the result of `DATETIME_DIFF` for two days in +succession. The first date falls on a Monday and the second date falls on a +Sunday. `DATETIME_DIFF` with the date part `WEEK` returns 0 because this time +part uses weeks that begin on Sunday. `DATETIME_DIFF` with the date part +`WEEK(MONDAY)` returns 1. `DATETIME_DIFF` with the date part +`ISOWEEK` also returns 1 because ISO weeks begin on Monday. 
-+ `STRING` +```sql +SELECT + DATETIME_DIFF('2017-12-18', '2017-12-17', WEEK) AS week_diff, + DATETIME_DIFF('2017-12-18', '2017-12-17', WEEK(MONDAY)) AS week_weekday_diff, + DATETIME_DIFF('2017-12-18', '2017-12-17', ISOWEEK) AS isoweek_diff; -**Conversion rules** +/*-----------+-------------------+--------------* + | week_diff | week_weekday_diff | isoweek_diff | + +-----------+-------------------+--------------+ + | 0 | 1 | 1 | + *-----------+-------------------+--------------*/ +``` - - - - - - - - - - - -
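+As the description above notes, the result is `datetime_expression_a` minus
+`datetime_expression_b`, so swapping the arguments flips the sign. A quick
+sketch using the dates from the earlier 24-hour example:
+
+```sql
+SELECT
+  DATETIME_DIFF(DATETIME '2017-10-14 00:00:00',
+    DATETIME '2017-10-15 00:00:00', DAY) as negative_diff;
+
+/*---------------*
+ | negative_diff |
+ +---------------+
+ | -1            |
+ *---------------*/
+```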
FromToRule(s) when casting x
STRINGINTERVAL - When casting from string to interval, the string must conform to either - ISO 8601 Duration +[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 - standard or to interval literal - format 'Y-M D H:M:S.F'. Partial interval literal formats are also accepted - when they are not ambiguous, for example 'H:M:S'. - If the string expression is invalid or represents an interval that is - outside of the supported min/max range, then an error is produced. -
+[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date -**Examples** +### `DATETIME_SUB` ```sql -SELECT input, CAST(input AS INTERVAL) AS output -FROM UNNEST([ - '1-2 3 10:20:30.456', - '1-2', - '10:20:30', - 'P1Y2M3D', - 'PT10H20M30,456S' -]) input - -/*--------------------+--------------------* - | input | output | - +--------------------+--------------------+ - | 1-2 3 10:20:30.456 | 1-2 3 10:20:30.456 | - | 1-2 | 1-2 0 0:0:0 | - | 10:20:30 | 0-0 0 10:20:30 | - | P1Y2M3D | 1-2 3 0:0:0 | - | PT10H20M30,456S | 0-0 0 10:20:30.456 | - *--------------------+--------------------*/ +DATETIME_SUB(datetime_expression, INTERVAL int64_expression part) ``` -### CAST AS NUMERIC - +**Description** -```sql -CAST(expression AS NUMERIC) -``` +Subtracts `int64_expression` units of `part` from the `DATETIME`. -**Description** +`DATETIME_SUB` supports the following values for `part`: -ZetaSQL supports [casting][con-func-cast] to `NUMERIC`. The -`expression` parameter can represent an expression for these data types: ++ `NANOSECOND` + (if the SQL engine supports it) ++ `MICROSECOND` ++ `MILLISECOND` ++ `SECOND` ++ `MINUTE` ++ `HOUR` ++ `DAY` ++ `WEEK`. Equivalent to 7 `DAY`s. ++ `MONTH` ++ `QUARTER` ++ `YEAR` -+ `INT32` -+ `UINT32` -+ `INT64` -+ `UINT64` -+ `FLOAT` -+ `DOUBLE` -+ `NUMERIC` -+ `BIGNUMERIC` -+ `STRING` +Special handling is required for `MONTH`, `QUARTER`, and `YEAR` parts when the +date is at (or near) the last day of the month. If the resulting month has fewer +days than the original `DATETIME`'s day, then the result day is the last day of +the new month. -**Conversion rules** +**Return Data Type** - - - - - - - - - - - - - - - - -
FromToRule(s) when casting x
Floating PointNUMERIC - The floating point number will round - half away from zero. +`DATETIME` - Casting a NaN, +inf or - -inf will return an error. Casting a value outside the range - of NUMERIC returns an overflow error. -
STRINGNUMERIC - The numeric literal contained in the string must not exceed - the maximum precision or range of the NUMERIC - type, or an error will occur. If the number of digits - after the decimal point exceeds nine, then the resulting - NUMERIC value will round - half away from zero. +**Example** - to have nine digits after the decimal point. -
+```sql
+SELECT
+  DATETIME "2008-12-25 15:30:00" as original_date,
+  DATETIME_SUB(DATETIME "2008-12-25 15:30:00", INTERVAL 10 MINUTE) as earlier;

-### CAST AS PROTO

+/*-----------------------------+------------------------*
+ | original_date               | earlier                |
+ +-----------------------------+------------------------+
+ | 2008-12-25 15:30:00         | 2008-12-25 15:20:00    |
+ *-----------------------------+------------------------*/
+```
+
+### `DATETIME_TRUNC`

```sql
-CAST(expression AS PROTO)
+DATETIME_TRUNC(datetime_expression, date_time_part)
```

**Description**

-ZetaSQL supports [casting][con-func-cast] to `PROTO`. The
-`expression` parameter can represent an expression for these data types:
+Truncates a `DATETIME` value to the granularity of `date_time_part`.
+The `DATETIME` value is always rounded to the beginning of `date_time_part`,
+which can be one of the following:

-+ `STRING`
-+ `BYTES`
-+ `PROTO`
++ `NANOSECOND`: If used, nothing is truncated from the value.
++ `MICROSECOND`: The nearest lesser or equal microsecond.
++ `MILLISECOND`: The nearest lesser or equal millisecond.
++ `SECOND`: The nearest lesser or equal second.
++ `MINUTE`: The nearest lesser or equal minute.
++ `HOUR`: The nearest lesser or equal hour.
++ `DAY`: The day in the Gregorian calendar year that contains the
+  `DATETIME` value.
++ `WEEK`: The first day of the week in the week that contains the
+  `DATETIME` value. Weeks begin on Sundays. `WEEK` is equivalent to
+  `WEEK(SUNDAY)`.
++ `WEEK(WEEKDAY)`: The first day of the week in the week that contains the
+  `DATETIME` value. Weeks begin on `WEEKDAY`. `WEEKDAY` must be one of the
+  following: `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, `FRIDAY`,
+  or `SATURDAY`.
++ `ISOWEEK`: The first day of the [ISO 8601 week][ISO-8601-week] in the
+  ISO week that contains the `DATETIME` value. The ISO week begins on
+  Monday. The first ISO week of each ISO year contains the first Thursday of the
+  corresponding Gregorian calendar year. 
++ `MONTH`: The first day of the month in the month that contains the + `DATETIME` value. ++ `QUARTER`: The first day of the quarter in the quarter that contains the + `DATETIME` value. ++ `YEAR`: The first day of the year in the year that contains the + `DATETIME` value. ++ `ISOYEAR`: The first day of the [ISO 8601][ISO-8601] week-numbering year + in the ISO year that contains the `DATETIME` value. The ISO year is the + Monday of the first week whose Thursday belongs to the corresponding + Gregorian calendar year. -**Conversion rules** + - - - - - - - - - - - - - - - - - - - - - -
FromToRule(s) when casting x
STRINGPROTO - Returns the protocol buffer that results from parsing - from proto2 text format.
- Throws an error if parsing fails, e.g. if not all required fields are set. -
BYTESPROTO - Returns the protocol buffer that results from parsing - x from the proto2 wire format.
- Throws an error if parsing fails, e.g. if not all required fields are set. -
PROTOPROTOMust have the same protocol buffer name.
+[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 -**Example** +[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date -The example in this section references a protocol buffer called `Award`. + -```proto -message Award { - required int32 year = 1; - optional int32 month = 2; - repeated Type type = 3; +**Return Data Type** - message Type { - optional string award_name = 1; - optional string category = 2; - } -} -``` +`DATETIME` + +**Examples** ```sql SELECT - CAST( - ''' - year: 2001 - month: 9 - type { award_name: 'Best Artist' category: 'Artist' } - type { award_name: 'Best Album' category: 'Album' } - ''' - AS zetasql.examples.music.Award) - AS award_col + DATETIME "2008-12-25 15:30:00" as original, + DATETIME_TRUNC(DATETIME "2008-12-25 15:30:00", DAY) as truncated; -/*---------------------------------------------------------* - | award_col | - +---------------------------------------------------------+ - | { | - | year: 2001 | - | month: 9 | - | type { award_name: "Best Artist" category: "Artist" } | - | type { award_name: "Best Album" category: "Album" } | - | } | - *---------------------------------------------------------*/ +/*----------------------------+------------------------* + | original | truncated | + +----------------------------+------------------------+ + | 2008-12-25 15:30:00 | 2008-12-25 00:00:00 | + *----------------------------+------------------------*/ ``` -### CAST AS STRING - +In the following example, the original `DATETIME` falls on a Sunday. Because the +`part` is `WEEK(MONDAY)`, `DATE_TRUNC` returns the `DATETIME` for the +preceding Monday. ```sql -CAST(expression AS STRING [format_clause [AT TIME ZONE timezone_expr]]) -``` - -**Description** +SELECT + datetime AS original, + DATETIME_TRUNC(datetime, WEEK(MONDAY)) AS truncated +FROM (SELECT DATETIME(TIMESTAMP "2017-11-05 00:00:00+00", "UTC") AS datetime); -ZetaSQL supports [casting][con-func-cast] to `STRING`. 
The -`expression` parameter can represent an expression for these data types: +/*---------------------+---------------------* + | original | truncated | + +---------------------+---------------------+ + | 2017-11-05 00:00:00 | 2017-10-30 00:00:00 | + *---------------------+---------------------*/ +``` -+ `INT32` -+ `UINT32` -+ `INT64` -+ `UINT64` -+ `FLOAT` -+ `DOUBLE` -+ `NUMERIC` -+ `BIGNUMERIC` -+ `ENUM` -+ `BOOL` -+ `BYTES` -+ `PROTO` -+ `TIME` -+ `DATE` -+ `DATETIME` -+ `TIMESTAMP` -+ `INTERVAL` -+ `STRING` +In the following example, the original `datetime_expression` is in the Gregorian +calendar year 2015. However, `DATETIME_TRUNC` with the `ISOYEAR` date part +truncates the `datetime_expression` to the beginning of the ISO year, not the +Gregorian calendar year. The first Thursday of the 2015 calendar year was +2015-01-01, so the ISO year 2015 begins on the preceding Monday, 2014-12-29. +Therefore the ISO year boundary preceding the `datetime_expression` +2015-06-15 00:00:00 is 2014-12-29. -**Format clause** +```sql +SELECT + DATETIME_TRUNC('2015-06-15 00:00:00', ISOYEAR) AS isoyear_boundary, + EXTRACT(ISOYEAR FROM DATETIME '2015-06-15 00:00:00') AS isoyear_number; -When an expression of one type is cast to another type, you can use the -[format clause][formatting-syntax] to provide instructions for how to conduct -the cast. 
You can use the format clause in this section if `expression` is one -of these data types: +/*---------------------+----------------* + | isoyear_boundary | isoyear_number | + +---------------------+----------------+ + | 2014-12-29 00:00:00 | 2015 | + *---------------------+----------------*/ +``` -+ `INT32` -+ `UINT32` -+ `INT64` -+ `UINT64` -+ `FLOAT` -+ `DOUBLE` -+ `NUMERIC` -+ `BIGNUMERIC` -+ `BYTES` -+ `TIME` -+ `DATE` -+ `DATETIME` -+ `TIMESTAMP` +### `EXTRACT` -The format clause for `STRING` has an additional optional clause called -`AT TIME ZONE timezone_expr`, which you can use to specify a specific time zone -to use during formatting of a `TIMESTAMP`. If this optional clause is not -included when formatting a `TIMESTAMP`, your current time zone is used. +```sql +EXTRACT(part FROM datetime_expression) +``` -For more information, see the following topics: +**Description** -+ [Format bytes as string][format-bytes-as-string] -+ [Format date and time as string][format-date-time-as-string] -+ [Format numeric type as string][format-numeric-type-as-string] +Returns a value that corresponds to the +specified `part` from a supplied `datetime_expression`. -**Conversion rules** +Allowed `part` values are: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
FromToRule(s) when casting x
Floating PointSTRINGReturns an approximate string representation. A returned - NaN or 0 will not be signed.
-
BOOLSTRING - Returns "true" if x is TRUE, - "false" otherwise.
BYTESSTRING - Returns x interpreted as a UTF-8 string.
- For example, the bytes literal - b'\xc2\xa9', when cast to a string, - is interpreted as UTF-8 and becomes the unicode character "©".
- An error occurs if x is not valid UTF-8.
ENUMSTRING - Returns the canonical enum value name of - x.
- If an enum value has multiple names (aliases), - the canonical name/alias for that value is used.
PROTOSTRINGReturns the proto2 text format representation of x.
TIMESTRING - Casting from a time type to a string is independent of time zone and - is of the form HH:MM:SS. -
DATESTRING - Casting from a date type to a string is independent of time zone and is - of the form YYYY-MM-DD. -
DATETIMESTRING - Casting from a datetime type to a string is independent of time zone and - is of the form YYYY-MM-DD HH:MM:SS. -
TIMESTAMPSTRING - When casting from timestamp types to string, the timestamp is interpreted - using the default time zone, which is implementation defined. The number of - subsecond digits produced depends on the number of trailing zeroes in the - subsecond part: the CAST function will truncate zero, three, or six - digits. -
INTERVALSTRING - Casting from an interval to a string is of the form - Y-M D H:M:S. -
++ `NANOSECOND`
+  (if the SQL engine supports it)
++ `MICROSECOND`
++ `MILLISECOND`
++ `SECOND`
++ `MINUTE`
++ `HOUR`
++ `DAYOFWEEK`: Returns values in the range [1,7] with Sunday as the first day
+  of the week.
++ `DAY`
++ `DAYOFYEAR`
++ `WEEK`: Returns the week number of the date in the range [0, 53]. Weeks begin
+  with Sunday, and dates prior to the first Sunday of the year are in week 0.
++ `WEEK(WEEKDAY)`: Returns the week number of `datetime_expression` in the
+  range [0, 53]. Weeks begin on `WEEKDAY`.
+  `datetime`s prior to the first `WEEKDAY` of the year are in week 0. Valid
+  values for `WEEKDAY` are `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`,
+  `THURSDAY`, `FRIDAY`, and `SATURDAY`.
++ `ISOWEEK`: Returns the [ISO 8601 week][ISO-8601-week]
+  number of the `datetime_expression`. `ISOWEEK`s begin on Monday. Return values
+  are in the range [1, 53]. The first `ISOWEEK` of each ISO year begins on the
+  Monday before the first Thursday of the Gregorian calendar year.
++ `MONTH`
++ `QUARTER`
++ `YEAR`
++ `ISOYEAR`: Returns the [ISO 8601][ISO-8601]
+  week-numbering year, which is the Gregorian calendar year containing the
+  Thursday of the week to which `datetime_expression` belongs.
++ `DATE`
++ `TIME`

-**Examples**
+Returned values truncate lower order time periods. For example, when extracting
+seconds, `EXTRACT` truncates the millisecond and microsecond values.

-```sql
-SELECT CAST(CURRENT_DATE() AS STRING) AS current_date
+**Return Data Type**

-/*---------------*
- | current_date  |
- +---------------+
- | 2021-03-09    |
- *---------------*/
-```
+`INT64`, except in the following cases:

-```sql
-SELECT CAST(CURRENT_DATE() AS STRING FORMAT 'DAY') AS current_day
++ If `part` is `DATE`, returns a `DATE` object.
++ If `part` is `TIME`, returns a `TIME` object.

-/*-------------*
- | current_day |
- +-------------+
- | MONDAY      |
- *-------------*/
-```
+**Examples**
+
+In the following example, `EXTRACT` returns a value corresponding to the `HOUR`
+time part. 
```sql -SELECT CAST( - TIMESTAMP '2008-12-25 00:00:00+00:00' - AS STRING FORMAT 'YYYY-MM-DD HH24:MI:SS TZH:TZM') AS date_time_to_string +SELECT EXTRACT(HOUR FROM DATETIME(2008, 12, 25, 15, 30, 00)) as hour; --- Results depend upon where this query was executed. -/*------------------------------* - | date_time_to_string | - +------------------------------+ - | 2008-12-24 16:00:00 -08:00 | - *------------------------------*/ +/*------------------* + | hour | + +------------------+ + | 15 | + *------------------*/ ``` +In the following example, `EXTRACT` returns values corresponding to different +time parts from a column of datetimes. + ```sql -SELECT CAST( - TIMESTAMP '2008-12-25 00:00:00+00:00' - AS STRING FORMAT 'YYYY-MM-DD HH24:MI:SS TZH:TZM' - AT TIME ZONE 'Asia/Kolkata') AS date_time_to_string +WITH Datetimes AS ( + SELECT DATETIME '2005-01-03 12:34:56' AS datetime UNION ALL + SELECT DATETIME '2007-12-31' UNION ALL + SELECT DATETIME '2009-01-01' UNION ALL + SELECT DATETIME '2009-12-31' UNION ALL + SELECT DATETIME '2017-01-02' UNION ALL + SELECT DATETIME '2017-05-26' +) +SELECT + datetime, + EXTRACT(ISOYEAR FROM datetime) AS isoyear, + EXTRACT(ISOWEEK FROM datetime) AS isoweek, + EXTRACT(YEAR FROM datetime) AS year, + EXTRACT(WEEK FROM datetime) AS week +FROM Datetimes +ORDER BY datetime; --- Because the time zone is specified, the result is always the same. 
-/*------------------------------* - | date_time_to_string | - +------------------------------+ - | 2008-12-25 05:30:00 +05:30 | - *------------------------------*/ +/*---------------------+---------+---------+------+------* + | datetime | isoyear | isoweek | year | week | + +---------------------+---------+---------+------+------+ + | 2005-01-03 12:34:56 | 2005 | 1 | 2005 | 1 | + | 2007-12-31 00:00:00 | 2008 | 1 | 2007 | 52 | + | 2009-01-01 00:00:00 | 2009 | 1 | 2009 | 0 | + | 2009-12-31 00:00:00 | 2009 | 53 | 2009 | 52 | + | 2017-01-02 00:00:00 | 2017 | 1 | 2017 | 1 | + | 2017-05-26 00:00:00 | 2017 | 21 | 2017 | 21 | + *---------------------+---------+---------+------+------*/ ``` +In the following example, `datetime_expression` falls on a Sunday. `EXTRACT` +calculates the first column using weeks that begin on Sunday, and it calculates +the second column using weeks that begin on Monday. + ```sql -SELECT CAST(INTERVAL 3 DAY AS STRING) AS interval_to_string +WITH table AS (SELECT DATETIME(TIMESTAMP "2017-11-05 00:00:00+00", "UTC") AS datetime) +SELECT + datetime, + EXTRACT(WEEK(SUNDAY) FROM datetime) AS week_sunday, + EXTRACT(WEEK(MONDAY) FROM datetime) AS week_monday +FROM table; -/*--------------------* - | interval_to_string | - +--------------------+ - | 0-0 3 0:0:0 | - *--------------------*/ +/*---------------------+-------------+---------------* + | datetime | week_sunday | week_monday | + +---------------------+-------------+---------------+ + | 2017-11-05 00:00:00 | 45 | 44 | + *---------------------+-------------+---------------*/ ``` -```sql -SELECT CAST( - INTERVAL "1-2 3 4:5:6.789" YEAR TO SECOND - AS STRING) AS interval_to_string +[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 -/*--------------------* - | interval_to_string | - +--------------------+ - | 1-2 3 4:5:6.789 | - *--------------------*/ -``` +[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date -### CAST AS STRUCT +### `FORMAT_DATETIME` ```sql -CAST(expression AS STRUCT) 
+FORMAT_DATETIME(format_string, datetime_expression) ``` **Description** -ZetaSQL supports [casting][con-func-cast] to `STRUCT`. The -`expression` parameter can represent an expression for these data types: - -+ `STRUCT` +Formats a `DATETIME` object according to the specified `format_string`. See +[Supported Format Elements For DATETIME][datetime-format-elements] +for a list of format elements that this function supports. -**Conversion rules** +**Return Data Type** - - - - - - - - - - - -
FromToRule(s) when casting x
STRUCTSTRUCT - Allowed if the following conditions are met:
-
    -
  1. - The two structs have the same number of - fields. -
  2. -
  3. - The original struct field types can be - explicitly cast to the corresponding target - struct field types (as defined by field - order, not field name). -
  4. -
-
+`STRING` -### CAST AS TIME +**Examples** ```sql -CAST(expression AS TIME [format_clause]) -``` - -**Description** - -ZetaSQL supports [casting][con-func-cast] to TIME. The `expression` -parameter can represent an expression for these data types: +SELECT + FORMAT_DATETIME("%c", DATETIME "2008-12-25 15:30:00") + AS formatted; -+ `STRING` -+ `TIME` -+ `DATETIME` -+ `TIMESTAMP` +/*--------------------------* + | formatted | + +--------------------------+ + | Thu Dec 25 15:30:00 2008 | + *--------------------------*/ +``` -**Format clause** +```sql +SELECT + FORMAT_DATETIME("%b-%d-%Y", DATETIME "2008-12-25 15:30:00") + AS formatted; -When an expression of one type is cast to another type, you can use the -[format clause][formatting-syntax] to provide instructions for how to conduct -the cast. You can use the format clause in this section if `expression` is a -`STRING`. +/*-------------* + | formatted | + +-------------+ + | Dec-25-2008 | + *-------------*/ +``` -+ [Format string as date and time][format-string-as-date-time] +```sql +SELECT + FORMAT_DATETIME("%b %Y", DATETIME "2008-12-25 15:30:00") + AS formatted; -**Conversion rules** +/*-------------* + | formatted | + +-------------+ + | Dec 2008 | + *-------------*/ +``` - - - - - - - - - - - -
FromToRule(s) when casting x
STRINGTIME - When casting from string to time, the string must conform to - the supported time literal format, and is independent of time zone. If the - string expression is invalid or represents a time that is outside of the - supported min/max range, then an error is produced. -
+[datetime-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time

-### CAST AS TIMESTAMP
+### `LAST_DAY`

```sql
-CAST(expression AS TIMESTAMP [format_clause [AT TIME ZONE timezone_expr]])
+LAST_DAY(datetime_expression[, date_part])
```

**Description**

-ZetaSQL supports [casting][con-func-cast] to `TIMESTAMP`. The
-`expression` parameter can represent an expression for these data types:
-
-+ `STRING`
-+ `DATETIME`
-+ `TIMESTAMP`
-
-**Format clause**
-
-When an expression of one type is cast to another type, you can use the
-[format clause][formatting-syntax] to provide instructions for how to conduct
-the cast. You can use the format clause in this section if `expression` is a
-`STRING`.
+Returns the last day from a datetime expression that contains the date.
+This is commonly used to return the last day of the month.

-+ [Format string as date and time][format-string-as-date-time]
+You can optionally specify the date part for which the last day is returned.
+If this parameter is not used, the default value is `MONTH`.
+`LAST_DAY` supports the following values for `date_part`:

-The format clause for `TIMESTAMP` has an additional optional clause called
-`AT TIME ZONE timezone_expr`, which you can use to specify a specific time zone
-to use during formatting. If this optional clause is not included, your
-current time zone is used.
++ `YEAR`
++ `QUARTER`
++ `MONTH`
++ `WEEK`. Equivalent to 7 `DAY`s.
++ `WEEK(<WEEKDAY>)`. `<WEEKDAY>` represents the starting day of the week.
+  Valid values are `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`,
+  `FRIDAY`, and `SATURDAY`.
++ `ISOWEEK`. Uses [ISO 8601][ISO-8601-week] week boundaries. ISO weeks begin
+  on Monday.
++ `ISOYEAR`. Uses the [ISO 8601][ISO-8601] week-numbering year boundary.
+  The ISO year boundary is the Monday of the first week whose Thursday belongs
+  to the corresponding Gregorian calendar year.
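+As an illustration of the ISO-based date parts (a sketch, assuming the engine
+supports `ISOWEEK` and `ISOYEAR` as `date_part` arguments to `LAST_DAY`;
+results follow standard ISO 8601 week numbering):
+
+```sql
+-- 2008-11-10 falls on a Monday, so its ISO week runs through the
+-- following Sunday.
+SELECT LAST_DAY(DATETIME '2008-11-10 15:30:00', ISOWEEK) AS last_day
+
+/*------------*
+ | last_day   |
+ +------------+
+ | 2008-11-16 |
+ *------------*/
+```
+
+```sql
+-- ISO year 2009 begins on Monday 2008-12-29 (the Monday of the week
+-- containing the first Thursday of 2009), so ISO year 2008 ends on the
+-- preceding Sunday.
+SELECT LAST_DAY(DATETIME '2008-11-10 15:30:00', ISOYEAR) AS last_day
+
+/*------------*
+ | last_day   |
+ +------------+
+ | 2008-12-28 |
+ *------------*/
+```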
-**Conversion rules** +**Return Data Type** - - - - - - - - - - - - - - - - - - - - - - - - - -
FromToRule(s) when casting x
STRINGTIMESTAMP - When casting from string to a timestamp, string_expression - must conform to the supported timestamp literal formats, or else a runtime - error occurs. The string_expression may itself contain a - time zone. -

- If there is a time zone in the string_expression, that - time zone is used for conversion, otherwise the default time zone, - which is implementation defined, is used. If the string has fewer than six digits, - then it is implicitly widened. -

- An error is produced if the string_expression is invalid, - has more than six subsecond digits (i.e. precision greater than - microseconds), or represents a time outside of the supported timestamp - range. -
DATETIMESTAMP - Casting from a date to a timestamp interprets date_expression - as of midnight (start of the day) in the default time zone, - which is implementation defined. -
DATETIMETIMESTAMP - Casting from a datetime to a timestamp interprets - datetime_expression in the default time zone, - which is implementation defined. -

- Most valid datetime values have exactly one corresponding timestamp - in each time zone. However, there are certain combinations of valid - datetime values and time zones that have zero or two corresponding - timestamp values. This happens in a time zone when clocks are set forward - or set back, such as for Daylight Savings Time. - When there are two valid timestamps, the earlier one is used. - When there is no valid timestamp, the length of the gap in time - (typically one hour) is added to the datetime. -
+`DATE` -**Examples** +**Example** -The following example casts a string-formatted timestamp as a timestamp: +These both return the last day of the month: ```sql -SELECT CAST("2020-06-02 17:00:53.110+00:00" AS TIMESTAMP) AS as_timestamp +SELECT LAST_DAY(DATETIME '2008-11-25', MONTH) AS last_day --- Results depend upon where this query was executed. -/*----------------------------* - | as_timestamp | - +----------------------------+ - | 2020-06-03 00:00:53.110+00 | - *----------------------------*/ +/*------------* + | last_day | + +------------+ + | 2008-11-30 | + *------------*/ ``` -The following examples cast a string-formatted date and time as a timestamp. -These examples return the same output as the previous example. - ```sql -SELECT CAST('06/02/2020 17:00:53.110' AS TIMESTAMP FORMAT 'MM/DD/YYYY HH24:MI:SS.FF3' AT TIME ZONE 'UTC') AS as_timestamp -``` +SELECT LAST_DAY(DATETIME '2008-11-25') AS last_day -```sql -SELECT CAST('06/02/2020 17:00:53.110' AS TIMESTAMP FORMAT 'MM/DD/YYYY HH24:MI:SS.FF3' AT TIME ZONE '+00') AS as_timestamp +/*------------* + | last_day | + +------------+ + | 2008-11-30 | + *------------*/ ``` +This returns the last day of the year: + ```sql -SELECT CAST('06/02/2020 17:00:53.110 +00' AS TIMESTAMP FORMAT 'MM/DD/YYYY HH24:MI:SS.FF3 TZH') AS as_timestamp +SELECT LAST_DAY(DATETIME '2008-11-25 15:30:00', YEAR) AS last_day + +/*------------* + | last_day | + +------------+ + | 2008-12-31 | + *------------*/ ``` -[formatting-syntax]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#formatting_syntax +This returns the last day of the week for a week that starts on a Sunday: -[format-string-as-bytes]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_string_as_bytes +```sql +SELECT LAST_DAY(DATETIME '2008-11-10 15:30:00', WEEK(SUNDAY)) AS last_day -[format-bytes-as-string]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_bytes_as_string +/*------------* + | last_day | 
+ +------------+ + | 2008-11-15 | + *------------*/ +``` -[format-date-time-as-string]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_date_time_as_string +This returns the last day of the week for a week that starts on a Monday: -[format-string-as-date-time]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_string_as_datetime +```sql +SELECT LAST_DAY(DATETIME '2008-11-10 15:30:00', WEEK(MONDAY)) AS last_day -[format-numeric-type-as-string]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_numeric_type_as_string +/*------------* + | last_day | + +------------+ + | 2008-11-16 | + *------------*/ +``` -[con-func-cast]: #cast +[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 -[con-func-safecast]: #safe_casting +[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date -### `SAFE_CAST` - +### `PARSE_DATETIME` -
-SAFE_CAST(expression AS typename [format_clause])
-
+```sql +PARSE_DATETIME(format_string, datetime_string) +``` **Description** -When using `CAST`, a query can fail if ZetaSQL is unable to perform -the cast. For example, the following query generates an error: - -```sql -SELECT CAST("apple" AS INT64) AS not_a_number; -``` +Converts a [string representation of a datetime][datetime-format] to a +`DATETIME` object. -If you want to protect your queries from these types of errors, you can use -`SAFE_CAST`. `SAFE_CAST` replaces runtime errors with `NULL`s. However, during -static analysis, impossible casts between two non-castable types still produce -an error because the query is invalid. +`format_string` contains the [format elements][datetime-format-elements] +that define how `datetime_string` is formatted. Each element in +`datetime_string` must have a corresponding element in `format_string`. The +location of each element in `format_string` must match the location of +each element in `datetime_string`. ```sql -SELECT SAFE_CAST("apple" AS INT64) AS not_a_number; - -/*--------------* - | not_a_number | - +--------------+ - | NULL | - *--------------*/ -``` - -Some casts can include a [format clause][formatting-syntax], which provides -instructions for how to conduct the -cast. For example, you could -instruct a cast to convert a sequence of bytes to a BASE64-encoded string -instead of a UTF-8-encoded string. - -The structure of the format clause is unique to each type of cast and more -information is available in the section for that cast. - -If you are casting from bytes to strings, you can also use the -function, [`SAFE_CONVERT_BYTES_TO_STRING`][SC_BTS]. Any invalid UTF-8 characters -are replaced with the unicode replacement character, `U+FFFD`. 
- -[SC_BTS]: #safe_convert_bytes_to_string - -[formatting-syntax]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#formatting_syntax - -### Other conversion functions - - -You can learn more about these conversion functions elsewhere in the -documentation: - - - -Conversion function | From | To -------- | -------- | ------- -[ARRAY_TO_STRING][ARRAY_STRING] | ARRAY | STRING -[BIT_CAST_TO_INT32][BIT_I32] | UINT32 | INT32 -[BIT_CAST_TO_INT64][BIT_I64] | UINT64 | INT64 -[BIT_CAST_TO_UINT32][BIT_U32] | INT32 | UINT32 -[BIT_CAST_TO_UINT64][BIT_U64] | INT64 | UINT64 -[BOOL][JSON_TO_BOOL] | JSON | BOOL -[DATE][T_DATE] | Various data types | DATE -[DATE_FROM_UNIX_DATE][T_DATE_FROM_UNIX_DATE] | INT64 | DATE -[DATETIME][T_DATETIME] | Various data types | DATETIME -[FLOAT64][JSON_TO_DOUBLE] | JSON | DOUBLE -[FROM_BASE32][F_B32] | STRING | BYTEs -[FROM_BASE64][F_B64] | STRING | BYTES -[FROM_HEX][F_HEX] | STRING | BYTES -[FROM_PROTO][F_PROTO] | PROTO value | Most data types -[INT64][JSON_TO_INT64] | JSON | INT64 -[PARSE_DATE][P_DATE] | STRING | DATE -[PARSE_DATETIME][P_DATETIME] | STRING | DATETIME -[PARSE_JSON][P_JSON] | STRING | JSON -[PARSE_TIME][P_TIME] | STRING | TIME -[PARSE_TIMESTAMP][P_TIMESTAMP] | STRING | TIMESTAMP -[SAFE_CONVERT_BYTES_TO_STRING][SC_BTS] | BYTES | STRING -[STRING][STRING_TIMESTAMP] | TIMESTAMP | STRING -[STRING][JSON_TO_STRING] | JSON | STRING -[TIME][T_TIME] | Various data types | TIME -[TIMESTAMP][T_TIMESTAMP] | Various data types | TIMESTAMP -[TIMESTAMP_FROM_UNIX_MICROS][T_TIMESTAMP_FROM_UNIX_MICROS] | INT64 | TIMESTAMP -[TIMESTAMP_FROM_UNIX_MILLIS][T_TIMESTAMP_FROM_UNIX_MILLIS] | INT64 | TIMESTAMP -[TIMESTAMP_FROM_UNIX_SECONDS][T_TIMESTAMP_FROM_UNIX_SECONDS] | INT64 | TIMESTAMP -[TIMESTAMP_MICROS][T_TIMESTAMP_MICROS] | INT64 | TIMESTAMP -[TIMESTAMP_MILLIS][T_TIMESTAMP_MILLIS] | INT64 | TIMESTAMP -[TIMESTAMP_SECONDS][T_TIMESTAMP_SECONDS] | INT64 | TIMESTAMP -[TO_BASE32][T_B32] | BYTES | STRING -[TO_BASE64][T_B64] | BYTES | 
STRING -[TO_HEX][T_HEX] | BYTES | STRING -[TO_JSON][T_JSON] | All data types | JSON -[TO_JSON_STRING][T_JSON_STRING] | All data types | STRING -[TO_PROTO][T_PROTO] | Most data types | PROTO value - - - - - -[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md - -[ARRAY_STRING]: #array_to_string - -[BIT_I32]: #bit_cast_to_int32 - -[BIT_U32]: #bit_cast_to_uint32 - -[BIT_I64]: #bit_cast_to_int64 - -[BIT_U64]: #bit_cast_to_uint64 - -[F_B32]: #from_base32 - -[F_B64]: #from_base64 - -[F_HEX]: #from_hex - -[F_PROTO]: #from_proto - -[P_DATE]: #parse_date - -[P_DATETIME]: #parse_datetime - -[P_JSON]: #parse_json - -[P_TIME]: #parse_time - -[P_TIMESTAMP]: #parse_timestamp - -[SC_BTS]: #safe_convert_bytes_to_string - -[STRING_TIMESTAMP]: #string - -[T_B32]: #to_base32 - -[T_B64]: #to_base64 - -[T_HEX]: #to_hex - -[T_JSON]: #to_json +-- This works because elements on both sides match. +SELECT PARSE_DATETIME("%a %b %e %I:%M:%S %Y", "Thu Dec 25 07:30:00 2008") -[T_JSON_STRING]: #to_json_string +-- This produces an error because the year element is in different locations. +SELECT PARSE_DATETIME("%a %b %e %Y %I:%M:%S", "Thu Dec 25 07:30:00 2008") -[T_PROTO]: #to_proto +-- This produces an error because one of the year elements is missing. +SELECT PARSE_DATETIME("%a %b %e %I:%M:%S", "Thu Dec 25 07:30:00 2008") -[T_DATE]: #date +-- This works because %c can find all matching elements in datetime_string. +SELECT PARSE_DATETIME("%c", "Thu Dec 25 07:30:00 2008") +``` -[T_DATETIME]: #datetime +`PARSE_DATETIME` parses `string` according to the following rules: -[T_TIMESTAMP]: #timestamp ++ **Unspecified fields.** Any unspecified field is initialized from + `1970-01-01 00:00:00.0`. For example, if the year is unspecified then it + defaults to `1970`. ++ **Case insensitivity.** Names, such as `Monday` and `February`, + are case insensitive. 
++ **Whitespace.** One or more consecutive white spaces in the format string
+  match zero or more consecutive white spaces in the
+  `DATETIME` string. Leading and trailing
+  white spaces in the `DATETIME` string are always
+  allowed, even if they are not in the format string.
++ **Format precedence.** When two or more format elements have overlapping
+  information, the last one generally overrides any earlier ones, with some
+  exceptions. For example, both `%F` and `%Y` affect the year, so the later
+  element overrides the earlier. See the descriptions
+  of `%s`, `%C`, and `%y` in
+  [Supported Format Elements For DATETIME][datetime-format-elements].
++ **Format divergence.** `%p` can be used with `am`, `AM`, `pm`, and `PM`.

-[T_TIME]: #time
+**Return Data Type**

-[JSON_TO_BOOL]: #bool_for_json
+`DATETIME`

-[JSON_TO_STRING]: #string_for_json
+**Examples**

-[JSON_TO_INT64]: #int64_for_json
+The following examples parse a `STRING` literal as a
+`DATETIME`.

-[JSON_TO_DOUBLE]: #double_for_json
+```sql
+SELECT PARSE_DATETIME('%Y-%m-%d %H:%M:%S', '1998-10-18 13:45:55') AS datetime;

-[T_DATE_FROM_UNIX_DATE]: #date_from_unix_date
+/*---------------------*
+ | datetime            |
+ +---------------------+
+ | 1998-10-18 13:45:55 |
+ *---------------------*/
+```

-[T_TIMESTAMP_FROM_UNIX_MICROS]: #timestamp_from_unix_micros
+```sql
+SELECT PARSE_DATETIME('%m/%d/%Y %I:%M:%S %p', '8/30/2018 2:23:38 pm') AS datetime

-[T_TIMESTAMP_FROM_UNIX_MILLIS]: #timestamp_from_unix_millis
+/*---------------------*
+ | datetime            |
+ +---------------------+
+ | 2018-08-30 14:23:38 |
+ *---------------------*/
+```

-[T_TIMESTAMP_FROM_UNIX_SECONDS]: #timestamp_from_unix_seconds
+The following example parses a `STRING` literal
+containing a date in a natural language format as a
+`DATETIME`.
-[T_TIMESTAMP_MICROS]: #timestamp_micros +```sql +SELECT PARSE_DATETIME('%A, %B %e, %Y','Wednesday, December 19, 2018') + AS datetime; -[T_TIMESTAMP_MILLIS]: #timestamp_millis +/*---------------------* + | datetime | + +---------------------+ + | 2018-12-19 00:00:00 | + *---------------------*/ +``` -[T_TIMESTAMP_SECONDS]: #timestamp_seconds +[datetime-format]: #format_datetime - +[datetime-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time -## Mathematical functions +[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 -ZetaSQL supports mathematical functions. -All mathematical functions have the following behaviors: +## Debugging functions -+ They return `NULL` if any of the input parameters is `NULL`. -+ They return `NaN` if any of the arguments is `NaN`. +ZetaSQL supports the following debugging functions. ### Function list @@ -11807,1715 +11381,1646 @@ All mathematical functions have the following behaviors: - ABS + ERROR - Computes the absolute value of X. + Produces an error with a custom error message. - ACOS + IFERROR - Computes the inverse cosine of X. + Evaluates a try expression, and if an evaluation error is produced, returns + the result of a catch expression. - ACOSH + ISERROR - Computes the inverse hyperbolic cosine of X. + Evaluates a try expression, and if an evaluation error is produced, returns + TRUE. - ASIN + NULLIFERROR - Computes the inverse sine of X. + Evaluates a try expression, and if an evaluation error is produced, returns + NULL. - - ASINH + + - - - Computes the inverse hyperbolic sine of X. - - +### `ERROR` - - ATAN +```sql +ERROR(error_message) +``` - - - Computes the inverse tangent of X. - - +**Description** - - ATAN2 +Returns an error. - - - Computes the inverse tangent of X/Y, using the signs of - X and Y to determine the quadrant. - - +**Definitions** - - ATANH ++ `error_message`: A `STRING` value that represents the error message to + produce. 
- - - Computes the inverse hyperbolic tangent of X. - - +**Details** - - CBRT +`ERROR` is treated like any other expression that may +result in an error: there is no special guarantee of evaluation order. - - - Computes the cube root of X. - - +**Return Data Type** - - CEIL +ZetaSQL infers the return type in context. - - - Gets the smallest integral value that is not less than X. - - +**Examples** - - CEILING +In the following example, the query returns an error message if the value of the +row does not match one of two defined values. - - - Synonym of CEIL. - - +```sql +SELECT + CASE + WHEN value = 'foo' THEN 'Value is foo.' + WHEN value = 'bar' THEN 'Value is bar.' + ELSE ERROR(CONCAT('Found unexpected value: ', value)) + END AS new_value +FROM ( + SELECT 'foo' AS value UNION ALL + SELECT 'bar' AS value UNION ALL + SELECT 'baz' AS value); - - COS +-- Found unexpected value: baz +``` - - - Computes the cosine of X. - - +In the following example, ZetaSQL may evaluate the `ERROR` function +before or after the `x > 0` condition, because ZetaSQL +generally provides no ordering guarantees between `WHERE` clause conditions and +there are no special guarantees for the `ERROR` function. - - COSH +```sql +SELECT * +FROM (SELECT -1 AS x) +WHERE x > 0 AND ERROR('Example error'); +``` - - - Computes the hyperbolic cosine of X. - - +In the next example, the `WHERE` clause evaluates an `IF` condition, which +ensures that ZetaSQL only evaluates the `ERROR` function if the +condition fails. - - COSINE_DISTANCE +```sql +SELECT * +FROM (SELECT -1 AS x) +WHERE IF(x > 0, true, ERROR(FORMAT('Error: x must be positive but is %t', x))); - - Computes the cosine distance between two vectors. - +-- Error: x must be positive but is -1 +``` - - COT +### `IFERROR` - - - Computes the cotangent of X. - - +```sql +IFERROR(try_expression, catch_expression) +``` - - COTH +**Description** - - - Computes the hyperbolic cotangent of X. - - +Evaluates `try_expression`. 
- - CSC +When `try_expression` is evaluated: - - - Computes the cosecant of X. - - ++ If the evaluation of `try_expression` does not produce an error, then + `IFERROR` returns the result of `try_expression` without evaluating + `catch_expression`. ++ If the evaluation of `try_expression` produces a system error, then `IFERROR` + produces that system error. ++ If the evaluation of `try_expression` produces an evaluation error, then + `IFERROR` suppresses that evaluation error and evaluates `catch_expression`. - - CSCH +If `catch_expression` is evaluated: - - - Computes the hyperbolic cosecant of X. - - ++ If the evaluation of `catch_expression` does not produce an error, then + `IFERROR` returns the result of `catch_expression`. ++ If the evaluation of `catch_expression` produces any error, then `IFERROR` + produces that error. - - DIV +**Arguments** - - - Divides integer X by integer Y. - - ++ `try_expression`: An expression that returns a scalar value. ++ `catch_expression`: An expression that returns a scalar value. - - EXP +The results of `try_expression` and `catch_expression` must share a +[supertype][supertype]. - - - Computes e to the power of X. - - +**Return Data Type** - - EUCLIDEAN_DISTANCE +The [supertype][supertype] for `try_expression` and +`catch_expression`. - - Computes the Euclidean distance between two vectors. - +**Example** - - FLOOR +In the following examples, the query successfully evaluates `try_expression`. - - - Gets the largest integral value that is not greater than X. - - +```sql +SELECT IFERROR('a', 'b') AS result - - GREATEST +/*--------* + | result | + +--------+ + | a | + *--------*/ +``` - - - Gets the greatest value among X1,...,XN. - - +```sql +SELECT IFERROR((SELECT [1,2,3][OFFSET(0)]), -1) AS result - - IEEE_DIVIDE +/*--------* + | result | + +--------+ + | 1 | + *--------*/ +``` - - - Divides X by Y, but does not generate errors for - division by zero or overflow. 
- - +In the following examples, `IFERROR` catches an evaluation error in the +`try_expression` and successfully evaluates `catch_expression`. - - IS_INF +```sql +SELECT IFERROR(ERROR('a'), 'b') AS result - - - Checks if X is positive or negative infinity. - - +/*--------* + | result | + +--------+ + | b | + *--------*/ +``` - - IS_NAN +```sql +SELECT IFERROR((SELECT [1,2,3][OFFSET(9)]), -1) AS result - - - Checks if X is a NaN value. - - +/*--------* + | result | + +--------+ + | -1 | + *--------*/ +``` - - LEAST +In the following query, the error is handled by the innermost `IFERROR` +operation, `IFERROR(ERROR('a'), 'b')`. - - - Gets the least value among X1,...,XN. - - +```sql +SELECT IFERROR(IFERROR(ERROR('a'), 'b'), 'c') AS result - - LN +/*--------* + | result | + +--------+ + | b | + *--------*/ +``` - - - Computes the natural logarithm of X. - - +In the following query, the error is handled by the outermost `IFERROR` +operation, `IFERROR(..., 'c')`. - - LOG +```sql +SELECT IFERROR(IFERROR(ERROR('a'), ERROR('b')), 'c') AS result - - - Computes the natural logarithm of X or the logarithm of - X to base Y. - - +/*--------* + | result | + +--------+ + | c | + *--------*/ +``` - - LOG10 +In the following example, an evaluation error is produced because the subquery +passed in as the `try_expression` evaluates to a table, not a scalar value. - - - Computes the natural logarithm of X to base 10. - - +```sql +SELECT IFERROR((SELECT e FROM UNNEST([1, 2]) AS e), 3) AS result - - MOD +/*--------* + | result | + +--------+ + | 3 | + *--------*/ +``` - - - Gets the remainder of the division of X by Y. - - +In the following example, `IFERROR` catches an evaluation error in `ERROR('a')` +and then evaluates `ERROR('b')`. Because there is also an evaluation error in +`ERROR('b')`, `IFERROR` produces an evaluation error for `ERROR('b')`. - - PI +```sql +SELECT IFERROR(ERROR('a'), ERROR('b')) AS result - - - Produces the mathematical constant π as a - DOUBLE value. 
- - +--ERROR: OUT_OF_RANGE 'b' +``` - - PI_BIGNUMERIC +[supertype]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#supertypes - - - Produces the mathematical constant π as a BIGNUMERIC value. - - +### `ISERROR` - - PI_NUMERIC +```sql +ISERROR(try_expression) +``` - - - Produces the mathematical constant π as a NUMERIC value. - - +**Description** - - POW +Evaluates `try_expression`. - - - Produces the value of X raised to the power of Y. - - ++ If the evaluation of `try_expression` does not produce an error, then + `ISERROR` returns `FALSE`. ++ If the evaluation of `try_expression` produces a system error, then `ISERROR` + produces that system error. ++ If the evaluation of `try_expression` produces an evaluation error, then + `ISERROR` returns `TRUE`. - - POWER +**Arguments** - - - Synonym of POW. - - ++ `try_expression`: An expression that returns a scalar value. - - RAND +**Return Data Type** - - - Generates a pseudo-random value of type - DOUBLE in the range of - [0, 1). - - +`BOOL` - - ROUND +**Example** - - - Rounds X to the nearest integer or rounds X - to N decimal places after the decimal point. - - +In the following examples, `ISERROR` successfully evaluates `try_expression`. - - SAFE_ADD +```sql +SELECT ISERROR('a') AS is_error - - - Equivalent to the addition operator (X + Y), but returns - NULL if overflow occurs. - - +/*----------* + | is_error | + +----------+ + | false | + *----------*/ +``` - - SAFE_DIVIDE +```sql +SELECT ISERROR(2/1) AS is_error - - - Equivalent to the division operator (X / Y), but returns - NULL if an error occurs. - - +/*----------* + | is_error | + +----------+ + | false | + *----------*/ +``` + +```sql +SELECT ISERROR((SELECT [1,2,3][OFFSET(0)])) AS is_error + +/*----------* + | is_error | + +----------+ + | false | + *----------*/ +``` + +In the following examples, `ISERROR` catches an evaluation error in +`try_expression`. 
+ +```sql +SELECT ISERROR(ERROR('a')) AS is_error + +/*----------* + | is_error | + +----------+ + | true | + *----------*/ +``` + +```sql +SELECT ISERROR(2/0) AS is_error + +/*----------* + | is_error | + +----------+ + | true | + *----------*/ +``` + +```sql +SELECT ISERROR((SELECT [1,2,3][OFFSET(9)])) AS is_error + +/*----------* + | is_error | + +----------+ + | true | + *----------*/ +``` + +In the following example, an evaluation error is produced because the subquery +passed in as `try_expression` evaluates to a table, not a scalar value. + +```sql +SELECT ISERROR((SELECT e FROM UNNEST([1, 2]) AS e)) AS is_error + +/*----------* + | is_error | + +----------+ + | true | + *----------*/ +``` + +### `NULLIFERROR` + +```sql +NULLIFERROR(try_expression) +``` + +**Description** + +Evaluates `try_expression`. + ++ If the evaluation of `try_expression` does not produce an error, then + `NULLIFERROR` returns the result of `try_expression`. ++ If the evaluation of `try_expression` produces a system error, then + `NULLIFERROR` produces that system error. + ++ If the evaluation of `try_expression` produces an evaluation error, then + `NULLIFERROR` returns `NULL`. + +**Arguments** + ++ `try_expression`: An expression that returns a scalar value. + +**Return Data Type** + +The data type for `try_expression` or `NULL` + +**Example** + +In the following examples, `NULLIFERROR` successfully evaluates +`try_expression`. + +```sql +SELECT NULLIFERROR('a') AS result + +/*--------* + | result | + +--------+ + | a | + *--------*/ +``` + +```sql +SELECT NULLIFERROR((SELECT [1,2,3][OFFSET(0)])) AS result + +/*--------* + | result | + +--------+ + | 1 | + *--------*/ +``` + +In the following examples, `NULLIFERROR` catches an evaluation error in +`try_expression`. 
+ +```sql +SELECT NULLIFERROR(ERROR('a')) AS result + +/*--------* + | result | + +--------+ + | NULL | + *--------*/ +``` + +```sql +SELECT NULLIFERROR((SELECT [1,2,3][OFFSET(9)])) AS result + +/*--------* + | result | + +--------+ + | NULL | + *--------*/ +``` + +In the following example, an evaluation error is produced because the subquery +passed in as `try_expression` evaluates to a table, not a scalar value. + +```sql +SELECT NULLIFERROR((SELECT e FROM UNNEST([1, 2]) AS e)) AS result + +/*--------* + | result | + +--------+ + | NULL | + *--------*/ +``` + +## Differentially private aggregate functions + + +ZetaSQL supports differentially private aggregate functions. +For an explanation of how aggregate functions work, see +[Aggregate function calls][agg-function-calls]. + +You can only use differentially private aggregate functions with +[differentially private queries][dp-guide] in a +[differential privacy clause][dp-syntax]. + +Note: In this topic, the privacy parameters in the examples are not +recommendations. You should work with your privacy or security officer to +determine the optimal privacy parameters for your dataset and organization. + +### Function list + + + + + + + + + - - - - - - - - - - - -
NameSummary
SAFE_MULTIPLY + AVG - Equivalent to the multiplication operator (X * Y), - but returns NULL if overflow occurs. + DIFFERENTIAL_PRIVACY-supported AVG.

+ Gets the differentially-private average of non-NULL, + non-NaN values in a query with a + DIFFERENTIAL_PRIVACY clause.
SAFE_NEGATE + COUNT - Equivalent to the unary minus operator (-X), but returns - NULL if overflow occurs. + DIFFERENTIAL_PRIVACY-supported COUNT.

+ Signature 1: Gets the differentially-private count of rows in a query with a + DIFFERENTIAL_PRIVACY clause. +
+
+ Signature 2: Gets the differentially-private count of rows with a + non-NULL expression in a query with a + DIFFERENTIAL_PRIVACY clause.
SAFE_SUBTRACT + PERCENTILE_CONT - Equivalent to the subtraction operator (X - Y), but - returns NULL if overflow occurs. + DIFFERENTIAL_PRIVACY-supported PERCENTILE_CONT.

+ Computes a differentially-private percentile across privacy unit columns + in a query with a DIFFERENTIAL_PRIVACY clause.
SEC + SUM - Computes the secant of X. + DIFFERENTIAL_PRIVACY-supported SUM.

+ Gets the differentially-private sum of non-NULL, + non-NaN values in a query with a + DIFFERENTIAL_PRIVACY clause.
SECH + VAR_POP - Computes the hyperbolic secant of X. + DIFFERENTIAL_PRIVACY-supported VAR_POP.

+ Computes the differentially-private population (biased) variance of values + in a query with a DIFFERENTIAL_PRIVACY clause.
SIGN + ANON_AVG - Produces -1 , 0, or +1 for negative, zero, and positive arguments - respectively. + Deprecated. + Gets the differentially-private average of non-NULL, + non-NaN values in a query with an + ANONYMIZATION clause.
SIN + ANON_COUNT - Computes the sine of X. + Deprecated. +
+
+ Signature 1: Gets the differentially-private count of rows in a query + with an ANONYMIZATION clause. +
+
+ Signature 2: Gets the differentially-private count of rows with a + non-NULL expression in a query with an + ANONYMIZATION clause.
SINH + ANON_PERCENTILE_CONT - Computes the hyperbolic sine of X. + Deprecated. + Computes a differentially-private percentile across privacy unit columns + in a query with an ANONYMIZATION clause.
SQRT + ANON_QUANTILES - Computes the square root of X. + Deprecated. + Produces an array of differentially-private quantile boundaries + in a query with an ANONYMIZATION clause.
TAN + ANON_STDDEV_POP - Computes the tangent of X. + Deprecated. + Computes a differentially-private population (biased) standard deviation of + values in a query with an ANONYMIZATION clause.
TANH + ANON_SUM - Computes the hyperbolic tangent of X. + Deprecated. + Gets the differentially-private sum of non-NULL, + non-NaN values in a query with an + ANONYMIZATION clause.
TRUNC + ANON_VAR_POP - Rounds a number like ROUND(X) or ROUND(X, N), - but always rounds towards zero and never overflows. + Deprecated. + Computes a differentially-private population (biased) variance of values + in a query with an ANONYMIZATION clause.
-### `ABS` +### `AVG` (`DIFFERENTIAL_PRIVACY`) + -``` -ABS(X) +```sql +WITH DIFFERENTIAL_PRIVACY ... + AVG( + expression, + [contribution_bounds_per_group => (lower_bound, upper_bound)] + ) ``` **Description** -Computes absolute value. Returns an error if the argument is an integer and the -output value cannot be represented as the same type; this happens only for the -largest negative input value, which has no positive representation. +Returns the average of non-`NULL`, non-`NaN` values in the expression. +This function first computes the average per privacy unit column, and then +computes the final result by averaging these averages. - - - - - - - - - - - - - - - - - - - - - - - - - -
XABS(X)
2525
-2525
+inf+inf
-inf+inf
+This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] +and can support the following arguments: -**Return Data Type** ++ `expression`: The input expression. This can be any numeric input type, + such as `INT64`. ++ `contribution_bounds_per_group`: The + [contribution bounds named argument][dp-clamped-named]. + Perform clamping per each group separately before performing intermediate + grouping on the privacy unit column. - +**Return type** - - - - - - - - +`DOUBLE` -
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
OUTPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
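Ignoring noise, the two-stage computation described above (an average per privacy unit, then an average of those per-unit averages, with the contribution bounds applied as clamps) can be sketched outside ZetaSQL. The following Python is a conceptual illustration only — the function name and data are hypothetical, and the engine additionally injects calibrated noise, which is omitted here.

```python
# Conceptual model of differentially private AVG (noise omitted).
# Each privacy unit's values are averaged first; each per-unit average
# is clamped to the contribution bounds; the final result averages the
# clamped per-unit averages.

def dp_avg_sketch(rows, lower_bound, upper_bound):
    """rows: (privacy_unit_id, value) pairs for a single group."""
    per_unit = {}
    for unit, value in rows:
        per_unit.setdefault(unit, []).append(value)
    clamped_avgs = [
        min(max(sum(vals) / len(vals), lower_bound), upper_bound)
        for vals in per_unit.values()
    ]
    return sum(clamped_avgs) / len(clamped_avgs)

# Professor 1 averages 40 per request, professor 2 averages 10,
# so the un-noised result is (40 + 10) / 2 = 25.
print(dp_avg_sketch([(1, 30), (1, 50), (2, 10)], 0, 100))  # 25.0
```

Note how a single privacy unit with extreme values cannot move the result beyond the clamp: its per-unit average is bounded before the final averaging.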
+**Examples** -### `ACOS` +The following differentially private query gets the average number of each item +requested per professor. Smaller aggregations might not be included. This query +references a table called [`professors`][dp-example-tables]. -``` -ACOS(X) +```sql +-- With noise, using the epsilon parameter. +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + item, + AVG(quantity, contribution_bounds_per_group => (0,100)) average_quantity +FROM professors +GROUP BY item; + +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+------------------* + | item | average_quantity | + +----------+------------------+ + | pencil | 38.5038356810269 | + | pen | 13.4725028762032 | + *----------+------------------*/ ``` -**Description** +```sql +-- Without noise, using the epsilon parameter. +-- (this un-noised version is for demonstration only) +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + item, + AVG(quantity) average_quantity +FROM professors +GROUP BY item; -Computes the principal value of the inverse cosine of X. The return value is in -the range [0,π]. Generates an error if X is a value outside of the -range [-1, 1]. +-- These results will not change when you run the query. +/*----------+------------------* + | item | average_quantity | + +----------+------------------+ + | scissors | 8 | + | pencil | 40 | + | pen | 18.5 | + *----------+------------------*/ +``` - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
XACOS(X)
+infNaN
-infNaN
NaNNaN
X < -1Error
X > 1Error
+The following differentially private query gets the average number of each item +requested per professor. Smaller aggregations might not be included. This query +references a view called [`view_on_professors`][dp-example-views]. -### `ACOSH` +```sql +-- With noise, using the epsilon parameter. +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + AVG(quantity, contribution_bounds_per_group=>(0, 100)) average_quantity +FROM {{USERNAME}}.view_on_professors +GROUP BY item; +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+------------------* + | item | average_quantity | + +----------+------------------+ + | pencil | 38.5038356810269 | + | pen | 13.4725028762032 | + *----------+------------------*/ ``` -ACOSH(X) + +```sql +-- Without noise, using the epsilon parameter. +-- (this un-noised version is for demonstration only) +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) + item, + AVG(quantity) average_quantity +FROM {{USERNAME}}.view_on_professors +GROUP BY item; + +-- These results will not change when you run the query. +/*----------+------------------* + | item | average_quantity | + +----------+------------------+ + | scissors | 8 | + | pencil | 40 | + | pen | 18.5 | + *----------+------------------*/ ``` -**Description** +Note: For more information about when and when not to use +noise, see [Remove noise][dp-noise]. -Computes the inverse hyperbolic cosine of X. Generates an error if X is a value -less than 1. +[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables - - - - - - - - - - - - - - - - - - - - - - - - - -
XACOSH(X)
+inf+inf
-infNaN
NaNNaN
X < 1Error
+[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise -### `ASIN` +[dp-clamped-named]: #dp_clamped_named -``` -ASIN(X) -``` +[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause -**Description** +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views -Computes the principal value of the inverse sine of X. The return value is in -the range [-π/2,π/2]. Generates an error if X is outside of -the range [-1, 1]. +### `COUNT` (`DIFFERENTIAL_PRIVACY`) + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
XASIN(X)
+infNaN
-infNaN
NaNNaN
X < -1Error
X > 1Error
++ [Signature 1](#dp_count_signature1): Returns the number of rows in a + differentially private `FROM` clause. ++ [Signature 2](#dp_count_signature2): Returns the number of non-`NULL` + values in an expression. -### `ASINH` +#### Signature 1 + -``` -ASINH(X) +```sql +WITH DIFFERENTIAL_PRIVACY ... + COUNT( + *, + [contribution_bounds_per_group => (lower_bound, upper_bound)] + ) ``` **Description** -Computes the inverse hyperbolic sine of X. Does not fail. +Returns the number of rows in the +[differentially private][dp-from-clause] `FROM` clause. The final result +is an aggregation across a privacy unit column. - - - - - - - - - - - - - - - - - - - - - -
XASINH(X)
+inf+inf
-inf-inf
NaNNaN
+This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] +and can support the following argument: -### `ATAN` ++ `contribution_bounds_per_group`: The + [contribution bounds named argument][dp-clamped-named]. + Perform clamping per each group separately before performing intermediate + grouping on the privacy unit column. -``` -ATAN(X) -``` +**Return type** -**Description** +`INT64` -Computes the principal value of the inverse tangent of X. The return value is -in the range [-π/2,π/2]. Does not fail. +**Examples** - - - - - - - - - - - - - - - - - - - - - -
XATAN(X)
+infπ/2
-inf-π/2
NaNNaN
+The following differentially private query counts the number of requests for +each item. This query references a table called +[`professors`][dp-example-tables]. -### `ATAN2` +```sql +-- With noise, using the epsilon parameter. +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + item, + COUNT(*, contribution_bounds_per_group=>(0, 100)) times_requested +FROM professors +GROUP BY item; -``` -ATAN2(X, Y) +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | pencil | 5 | + | pen | 2 | + *----------+-----------------*/ ``` -**Description** +```sql +-- Without noise, using the epsilon parameter. +-- (this un-noised version is for demonstration only) +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + item, + COUNT(*, contribution_bounds_per_group=>(0, 100)) times_requested +FROM professors +GROUP BY item; -Calculates the principal value of the inverse tangent of X/Y using the signs of -the two arguments to determine the quadrant. The return value is in the range -[-π,π]. +-- These results will not change when you run the query. +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | scissors | 1 | + | pencil | 4 | + | pen | 3 | + *----------+-----------------*/ +``` - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
XYATAN2(X, Y)
NaNAny valueNaN
Any valueNaNNaN
0.00.00.0
Positive Finite value-infπ
Negative Finite value-inf
Finite value+inf0.0
+infFinite valueπ/2
-infFinite value-π/2
+inf-inf¾π
-inf-inf-¾π
+inf+infπ/4
-inf+inf-π/4
+The following differentially private query counts the number of requests for +each item. This query references a view called +[`view_on_professors`][dp-example-views]. -### `ATANH` +```sql +-- With noise, using the epsilon parameter. +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + COUNT(*, contribution_bounds_per_group=>(0, 100)) times_requested +FROM {{USERNAME}}.view_on_professors +GROUP BY item; -``` -ATANH(X) +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | pencil | 5 | + | pen | 2 | + *----------+-----------------*/ ``` -**Description** +```sql +-- Without noise, using the epsilon parameter. +-- (this un-noised version is for demonstration only) +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) + item, + COUNT(*, contribution_bounds_per_group=>(0, 100)) times_requested +FROM {{USERNAME}}.view_on_professors +GROUP BY item; -Computes the inverse hyperbolic tangent of X. Generates an error if X is outside -of the range (-1, 1). +-- These results will not change when you run the query. +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | scissors | 1 | + | pencil | 4 | + | pen | 3 | + *----------+-----------------*/ +``` - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
XATANH(X)
+infNaN
-infNaN
NaNNaN
X < -1Error
X > 1Error
+Note: For more information about when and when not to use +noise, see [Remove noise][dp-noise]. -### `CBRT` +#### Signature 2 + -``` -CBRT(X) +```sql +WITH DIFFERENTIAL_PRIVACY ... + COUNT( + expression, + [contribution_bounds_per_group => (lower_bound, upper_bound)] + ) ``` **Description** -Computes the cube root of `X`. `X` can be any data type -that [coerces to `DOUBLE`][conversion-rules]. -Supports the `SAFE.` prefix. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
XCBRT(X)
+infinf
-inf-inf
NaNNaN
00
NULLNULL
+Returns the number of non-`NULL` expression values. The final result is an +aggregation across a privacy unit column. -**Return Data Type** +This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] +and can support these arguments: -`DOUBLE` ++ `expression`: The input expression. This expression can be any + numeric input type, such as `INT64`. ++ `contribution_bounds_per_group`: The + [contribution bounds named argument][dp-clamped-named]. + Perform clamping per each group separately before performing intermediate + grouping on the privacy unit column. -**Example** +**Return type** -```sql -SELECT CBRT(27) AS cube_root; +`INT64` -/*--------------------* - | cube_root | - +--------------------+ - | 3.0000000000000004 | - *--------------------*/ -``` +**Examples** -[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#conversion_rules +The following differentially private query counts the number of requests made +for each type of item. This query references a table called +[`professors`][dp-example-tables]. -### `CEIL` +```sql +-- With noise, using the epsilon parameter. +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + item, + COUNT(item, contribution_bounds_per_group => (0,100)) times_requested +FROM professors +GROUP BY item; -``` -CEIL(X) +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | pencil | 5 | + | pen | 2 | + *----------+-----------------*/ ``` -**Description** - -Returns the smallest integral value that is not less than X. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
XCEIL(X)
2.02.0
2.33.0
2.83.0
2.53.0
-2.3-2.0
-2.8-2.0
-2.5-2.0
00
+inf+inf
-inf-inf
NaNNaN
- -**Return Data Type** - - +```sql +-- Without noise, using the epsilon parameter. +-- (this un-noised version is for demonstration only) +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + item, + COUNT(item, contribution_bounds_per_group => (0,100)) times_requested +FROM professors +GROUP BY item; - - - - - - - - +-- These results will not change when you run the query. +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | scissors | 1 | + | pencil | 4 | + | pen | 3 | + *----------+-----------------*/ +``` -
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
OUTPUTDOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
+The following differentially private query counts the number of requests made +for each type of item. This query references a view called +[`view_on_professors`][dp-example-views]. -### `CEILING` +```sql +-- With noise +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + COUNT(item, contribution_bounds_per_group=>(0, 100)) times_requested +FROM {{USERNAME}}.view_on_professors +GROUP BY item; -``` -CEILING(X) +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | pencil | 5 | + | pen | 2 | + *----------+-----------------*/ ``` -**Description** - -Synonym of CEIL(X) - -### `COS` +```sql +--Without noise (this un-noised version is for demonstration only) +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) + item, + COUNT(item, contribution_bounds_per_group=>(0, 100)) times_requested +FROM {{USERNAME}}.view_on_professors +GROUP BY item; +-- These results will not change when you run the query. +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | scissors | 1 | + | pencil | 4 | + | pen | 3 | + *----------+-----------------*/ ``` -COS(X) -``` - -**Description** -Computes the cosine of X where X is specified in radians. Never fails. +Note: For more information about when and when not to use +noise, see [Remove noise][dp-noise]. - - - - - - - - - - - - - - - - - - - - - -
XCOS(X)
+infNaN
-infNaN
NaNNaN
+[dp-clamp-implicit]: #dp_implicit_clamping -### `COSH` +[dp-from-clause]: https://github.com/google/zetasql/blob/master/docs/differential-privacy.md#dp_from -``` -COSH(X) -``` +[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables -**Description** +[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise -Computes the hyperbolic cosine of X where X is specified in radians. -Generates an error if overflow occurs. +[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause - - - - - - - - - - - - - - - - - - - - - -
XCOSH(X)
+inf+inf
-inf+inf
NaNNaN
+[dp-clamped-named]: #dp_clamped_named -### `COSINE_DISTANCE` +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views - +### `PERCENTILE_CONT` (`DIFFERENTIAL_PRIVACY`) + ```sql -COSINE_DISTANCE(vector1, vector2) +WITH DIFFERENTIAL_PRIVACY ... + PERCENTILE_CONT( + expression, + percentile, + contribution_bounds_per_row => (lower_bound, upper_bound) + ) ``` **Description** -Computes the [cosine distance][cosine-distance] between two vectors. - -**Definitions** +Takes an expression and computes a percentile for it. The final result is an +aggregation across privacy unit columns. -+ `vector1`: The first vector. -+ `vector2`: The second vector. +This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax] +and can support these arguments: -**Details** ++ `expression`: The input expression. This can be most numeric input types, + such as `INT64`. `NULL` values are always ignored. ++ `percentile`: The percentile to compute. The percentile must be a literal in + the range `[0, 1]`. ++ `contribution_bounds_per_row`: The + [contribution bounds named argument][dp-clamped-named]. + Perform clamping per each row separately before performing intermediate + grouping on the privacy unit column. -Each vector represents a quantity that includes magnitude and direction. -The following vector types are supported: +`NUMERIC` and `BIGNUMERIC` arguments are not allowed. + If you need them, cast them as the +`DOUBLE` data type first. -+ Dense vector: `ARRAY` that represents - the vector and its numerical values. `value` is of type - `DOUBLE`. +**Return type** - This is an example of a dense vector: +`DOUBLE` - ``` - [1.0, 0.0, 3.0] - ``` -+ Sparse vector: `ARRAY>`, where - `STRUCT` contains a dimension-value pair for each numerical value in the - vector. This information is used to generate a dense vector. 
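Setting the privacy machinery aside, a continuous percentile interpolates linearly between the two nearest ranked values after per-row clamping. The Python below is a conceptual sketch only — a hypothetical helper with no noise and no per-privacy-unit aggregation, shown just to illustrate the clamping and interpolation.

```python
# Conceptual model of a continuous percentile (noise and privacy-unit
# aggregation omitted). Rows are clamped to the contribution bounds,
# sorted, and the percentile is linearly interpolated between ranks.

def percentile_cont_sketch(values, percentile, lower_bound, upper_bound):
    clamped = sorted(
        min(max(v, lower_bound), upper_bound)
        for v in values
        if v is not None  # NULL values are always ignored
    )
    index = percentile * (len(clamped) - 1)
    lo = int(index)
    hi = min(lo + 1, len(clamped) - 1)
    fraction = index - lo
    return clamped[lo] * (1 - fraction) + clamped[hi] * fraction

# Median of [10, 20, 300] with bounds (0, 100): 300 clamps to 100,
# and the middle ranked value is 20.
print(percentile_cont_sketch([10, 20, 300], 0.5, 0, 100))  # 20.0
```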
+**Examples**

- + `dimension`: A `STRING` or `INT64` value that represents the
-   specific dimension for `value` in a vector.

+The following differentially private query gets the percentile of items
+requested. Smaller aggregations might not be included. This query references a
+table called [`professors`][dp-example-tables].

- + `value`: A `DOUBLE` value that represents the
-   numerical value for `dimension`.

```sql
-- With noise, using the epsilon parameter.
+SELECT
+  WITH DIFFERENTIAL_PRIVACY
+    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
+  item,
+  PERCENTILE_CONT(quantity, 0.5, contribution_bounds_per_row => (0,100)) percentile_requested
+FROM professors
+GROUP BY item;

- A sparse vector contains mostly zeros, with only a few non-zero elements.
- It's a useful data structure for representing data that is mostly empty or
- has a lot of zeros. For example, if you have a vector of length 10,000 and
- only 10 elements are non-zero, then it is a sparse vector. As a result,
- it's more efficient to describe a sparse vector by only mentioning its
- non-zero elements. If an element isn't present in the
- sparse representation, its value can be implicitly understood to be zero.

+-- These results will change each time you run the query.
+-- Smaller aggregations might be removed.
+ /*----------+----------------------*
+ | item     | percentile_requested |
+ +----------+----------------------+
+ | pencil   | 72.00011444091797    |
+ | scissors | 8.000175476074219    |
+ | pen      | 23.001075744628906   |
+ *----------+----------------------*/
+```

- The following `INT64` sparse vector

+The following differentially private query gets the percentile of items
+requested. Smaller aggregations might not be included. This query references a
+view called [`view_on_professors`][dp-example-views].

- ```
- [(0, 1.0), (2, 3.0)]
- ```

```sql
-- With noise, using the epsilon parameter.
+SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + PERCENTILE_CONT(quantity, 0.5, contribution_bounds_per_row=>(0, 100)) percentile_requested +FROM {{USERNAME}}.view_on_professors +GROUP BY item; - is converted to this dense vector: +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+----------------------* + | item | percentile_requested | + +----------+----------------------+ + | pencil | 72.00011444091797 | + | scissors | 8.000175476074219 | + | pen | 23.001075744628906 | + *----------+----------------------*/ +``` - ``` - [1.0, 0.0, 3.0] - ``` +[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables - The following `STRING` sparse vector +[dp-clamped-named]: #dp_clamped_named - ``` - [('d': 4.0), ('a', 1.0), ('b': 3.0)] - ``` +[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause - is converted to this dense vector: +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views - ``` - [1.0, 3.0, 0.0, 4.0] - ``` +### `SUM` (`DIFFERENTIAL_PRIVACY`) + -The ordering of numeric values in a vector doesn't impact the results -produced by this function if the dimensions of the vectors are aligned. +```sql +WITH DIFFERENTIAL_PRIVACY ... + SUM( + expression, + [contribution_bounds_per_group => (lower_bound, upper_bound)] + ) +``` -A vector can have one or more dimensions. Both vectors in this function must -share these same dimensions, and if they don't, an error is produced. +**Description** -A vector can't be a zero vector. A vector is a zero vector if all elements in -the vector are `0`. For example, `[0.0, 0.0]`. If a zero vector is encountered, -an error is produced. +Returns the sum of non-`NULL`, non-`NaN` values in the expression. The final +result is an aggregation across privacy unit columns. 
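One way to picture this aggregation (noise omitted): each privacy unit's contributions within a group are totaled, each per-unit total is clamped to the contribution bounds, and the clamped totals are summed. The Python below is an illustrative sketch under that reading of the clamping behavior, not ZetaSQL's implementation.

```python
# Conceptual model of differentially private SUM (noise omitted).
# Per-unit totals are clamped to the contribution bounds before the
# final sum, limiting how much any single privacy unit can contribute.

def dp_sum_sketch(rows, lower_bound, upper_bound):
    """rows: (privacy_unit_id, value) pairs for a single group."""
    per_unit_totals = {}
    for unit, value in rows:
        per_unit_totals[unit] = per_unit_totals.get(unit, 0) + value
    return sum(
        min(max(total, lower_bound), upper_bound)
        for total in per_unit_totals.values()
    )

# Unit 1 contributes 120 in total (clamped to 100); unit 2 contributes 30.
print(dp_sum_sketch([(1, 70), (1, 50), (2, 30)], 0, 100))  # 130
```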
-An error is produced if an element or field in a vector is `NULL`.
+This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax]
+and can support these arguments:

-If `vector1` or `vector2` is `NULL`, `NULL` is returned.
++ `expression`: The input expression. This can be any numeric input type,
+  such as `INT64`. `NULL` values are always ignored.
++ `contribution_bounds_per_group`: The
+  [contribution bounds named argument][dp-clamped-named].
+  Perform clamping per each group separately before performing intermediate
+  grouping on the privacy unit column.

**Return type**

-`DOUBLE`
+One of the following [supertypes][dp-supertype]:
+
++ `INT64`
++ `UINT64`
++ `DOUBLE`

**Examples**

-In the following example, dense vectors are used to compute the
-cosine distance:
+The following differentially private query gets the sum of items requested.
+Smaller aggregations might not be included. This query references a table
+called [`professors`][dp-example-tables].

```sql
-SELECT COSINE_DISTANCE([1.0, 2.0], [3.0, 4.0]) AS results;
+-- With noise, using the epsilon parameter.
+SELECT
+  WITH DIFFERENTIAL_PRIVACY
+    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
+  item,
+  SUM(quantity, contribution_bounds_per_group => (0,100)) quantity
+FROM professors
+GROUP BY item;

-/*----------*
- | results  |
- +----------+
- | 0.016130 |
- *----------*/
+-- These results will change each time you run the query.
+-- Smaller aggregations might be removed.
+/*----------+-----------*
+ | item     | quantity  |
+ +----------+-----------+
+ | pencil   | 143       |
+ | pen      | 59        |
+ *----------+-----------*/
```

-In the following example, sparse vectors are used to compute the
-cosine distance:

```sql
-SELECT COSINE_DISTANCE(
-    [(1, 1.0), (2, 2.0)],
-    [(2, 4.0), (1, 3.0)]) AS results;
+-- Without noise, using the epsilon parameter.
+-- (this un-noised version is for demonstration only) +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id) + item, + SUM(quantity) quantity +FROM professors +GROUP BY item; - /*----------* - | results | - +----------+ - | 0.016130 | - *----------*/ +-- These results will not change when you run the query. +/*----------+----------* + | item | quantity | + +----------+----------+ + | scissors | 8 | + | pencil | 144 | + | pen | 58 | + *----------+----------*/ ``` -The ordering of numeric values in a vector doesn't impact the results -produced by this function. For example these queries produce the same results -even though the numeric values in each vector is in a different order: +The following differentially private query gets the sum of items requested. +Smaller aggregations might not be included. This query references a view called +[`view_on_professors`][dp-example-views]. ```sql -SELECT COSINE_DISTANCE([1.0, 2.0], [3.0, 4.0]) AS results; +-- With noise, using the epsilon parameter. +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + SUM(quantity, contribution_bounds_per_group=>(0, 100)) quantity +FROM {{USERNAME}}.view_on_professors +GROUP BY item; -SELECT COSINE_DISTANCE([2.0, 1.0], [4.0, 3.0]) AS results; +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+-----------* + | item | quantity | + +----------+-----------+ + | pencil | 143 | + | pen | 59 | + *----------+-----------*/ +``` -SELECT COSINE_DISTANCE([(1, 1.0), (2, 2.0)], [(1, 3.0), (2, 4.0)]) AS results; +```sql +-- Without noise, using the epsilon parameter. 
+-- (this un-noised version is for demonstration only)
+SELECT
+  WITH DIFFERENTIAL_PRIVACY
+    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1)
+  item,
+  SUM(quantity) quantity
+FROM {{USERNAME}}.view_on_professors
+GROUP BY item;

- /*----------*
- | results  |
- +----------+
- | 0.016130 |
- *----------*/
+-- These results will not change when you run the query.
+/*----------+----------*
+ | item     | quantity |
+ +----------+----------+
+ | scissors | 8        |
+ | pencil   | 144      |
+ | pen      | 58       |
+ *----------+----------*/
```

-In the following example, the function can't compute cosine distance against
-the first vector, which is a zero vector:
+Note: For more information about when and when not to use
+noise, see [Remove noise][dp-noise].

-```sql
--- ERROR
-SELECT COSINE_DISTANCE([0.0, 0.0], [3.0, 4.0]) AS results;
-```
+[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables

-Both dense vectors must have the same dimensions. If not, an error is produced.
-In the following examples, the first vector has two dimensions and the second -vector has three: +[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise -```sql --- ERROR -SELECT COSINE_DISTANCE([9.0, 7.0], [8.0, 4.0, 5.0]) AS results; -``` +[dp-supertype]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#supertypes -If you use sparse vectors and you repeat a dimension, an error is -produced: +[dp-clamped-named]: #dp_clamped_named -```sql --- ERROR -SELECT COSINE_DISTANCE( - [(1, 9.0), (2, 7.0), (2, 8.0)], [(1, 8.0), (2, 4.0), (3, 5.0)]) AS results; -``` +[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause -[cosine-distance]: https://en.wikipedia.org/wiki/Cosine_similarity#Cosine_distance +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views -### `COT` +### `VAR_POP` (`DIFFERENTIAL_PRIVACY`) + -``` -COT(X) +```sql +WITH DIFFERENTIAL_PRIVACY ... + VAR_POP( + expression, + [contribution_bounds_per_row => (lower_bound, upper_bound)] + ) ``` **Description** -Computes the cotangent for the angle of `X`, where `X` is specified in radians. -`X` can be any data type -that [coerces to `DOUBLE`][conversion-rules]. -Supports the `SAFE.` prefix. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
XCOT(X)
+infNaN
-infNaN
NaNNaN
0Error
NULLNULL
+Takes an expression and computes the population (biased) variance of the values
+in the expression. The final result is an aggregation across
+privacy unit columns between `0` and `+Inf`. You can
+[clamp the input values][dp-clamp-explicit] explicitly, otherwise input values
+are clamped implicitly. Clamping is performed per individual user values.

-**Return Data Type**
+This function must be used with the `DIFFERENTIAL_PRIVACY` clause and
+can support these arguments:

-`DOUBLE`
++ `expression`: The input expression. This can be any numeric input type,
+  such as `INT64`. `NULL` values are always ignored.
++ `contribution_bounds_per_row`: The
+  [contribution bounds named argument][dp-clamped-named].
+  Perform clamping per each row separately before performing intermediate
+  grouping on individual user values.

-**Example**
+`NUMERIC` and `BIGNUMERIC` arguments are not allowed.
+ If you need them, cast them as the
+`DOUBLE` data type first.

-```sql
-SELECT COT(1) AS a, SAFE.COT(0) AS b;
+**Return type**

-/*---------------------+------*
- | a                   | b    |
- +---------------------+------+
- | 0.64209261593433065 | NULL |
- *---------------------+------*/
-```
+`DOUBLE`

-[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#conversion_rules
+**Examples**

-### `COTH`
+The following differentially private query gets the
+population (biased) variance of items requested. Smaller aggregations might not
+be included. This query references a table called
+[`professors`][dp-example-tables].

-```
-COTH(X)
+```sql
+-- With noise
+SELECT
+  WITH DIFFERENTIAL_PRIVACY
+    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
+  item,
+  VAR_POP(quantity, contribution_bounds_per_row => (0,100)) pop_variance
+FROM professors
+GROUP BY item;
+
+-- These results will change each time you run the query.
+-- Smaller aggregations might be removed.
+/*----------+-----------------* + | item | pop_variance | + +----------+-----------------+ + | pencil | 642 | + | pen | 2.6666666666665 | + | scissors | 2500 | + *----------+-----------------*/ ``` -**Description** +The following differentially private query gets the +population (biased) variance of items requested. Smaller aggregations might not +be included. This query references a view called +[`view_on_professors`][dp-example-views]. -Computes the hyperbolic cotangent for the angle of `X`, where `X` is specified -in radians. `X` can be any data type -that [coerces to `DOUBLE`][conversion-rules]. -Supports the `SAFE.` prefix. +```sql +-- With noise +SELECT + WITH DIFFERENTIAL_PRIVACY + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + VAR_POP(quantity, contribution_bounds_per_row=>(0, 100)) pop_variance +FROM {{USERNAME}}.view_on_professors +GROUP BY item; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| X | COTH(X) |
|------|---------|
| +inf | 1 |
| -inf | -1 |
| NaN | NaN |
| 0 | Error |
| NULL | NULL |
+-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+-----------------* + | item | pop_variance | + +----------+-----------------+ + | pencil | 642 | + | pen | 2.6666666666665 | + | scissors | 2500 | + *----------+-----------------*/ +``` -**Return Data Type** +[dp-clamp-explicit]: #dp_explicit_clamping -`DOUBLE` +[dp-example-tables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_tables -**Example** +[dp-clamped-named]: #dp_clamped_named -```sql -SELECT COTH(1) AS a, SAFE.COTH(0) AS b; +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views -/*----------------+------* - | a | b | - +----------------+------+ - | 1.313035285499 | NULL | - *----------------+------*/ -``` +### `ANON_AVG` (DEPRECATED) + -[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#conversion_rules +Warning: This function has been deprecated. Use +`AVG` (differential privacy) instead. -### `CSC` +```sql +WITH ANONYMIZATION ... + ANON_AVG(expression [clamped_between_clause]) -``` -CSC(X) +clamped_between_clause: + CLAMPED BETWEEN lower_bound AND upper_bound ``` **Description** -Computes the cosecant of the input angle, which is in radians. -`X` can be any data type -that [coerces to `DOUBLE`][conversion-rules]. -Supports the `SAFE.` prefix. +Returns the average of non-`NULL`, non-`NaN` values in the expression. +This function first computes the average per privacy unit column, and then +computes the final result by averaging these averages. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| X | CSC(X) |
|------|--------|
| +inf | NaN |
| -inf | NaN |
| NaN | NaN |
| 0 | Error |
| NULL | NULL |
+This function must be used with the `ANONYMIZATION` clause and +can support these arguments: -**Return Data Type** ++ `expression`: The input expression. This can be any numeric input type, + such as `INT64`. ++ `clamped_between_clause`: Perform [clamping][dp-clamp-between] per + privacy unit column averages. + +**Return type** `DOUBLE` -**Example** +**Examples** + +The following differentially private query gets the average number of each item +requested per professor. Smaller aggregations might not be included. This query +references a view called [`view_on_professors`][dp-example-views]. ```sql -SELECT CSC(100) AS a, CSC(-1) AS b, SAFE.CSC(0) AS c; +-- With noise, using the epsilon parameter. +SELECT + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + ANON_AVG(quantity CLAMPED BETWEEN 0 AND 100) average_quantity +FROM {{USERNAME}}.view_on_professors +GROUP BY item; -/*----------------+-----------------+------* - | a | b | c | - +----------------+-----------------+------+ - | -1.97485753142 | -1.188395105778 | NULL | - *----------------+-----------------+------*/ +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+------------------* + | item | average_quantity | + +----------+------------------+ + | pencil | 38.5038356810269 | + | pen | 13.4725028762032 | + *----------+------------------*/ ``` -[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#conversion_rules - -### `CSCH` +```sql +-- Without noise, using the epsilon parameter. +-- (this un-noised version is for demonstration only) +SELECT + WITH ANONYMIZATION + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) + item, + ANON_AVG(quantity) average_quantity +FROM {{USERNAME}}.view_on_professors +GROUP BY item; +-- These results will not change when you run the query. 
+/*----------+------------------* + | item | average_quantity | + +----------+------------------+ + | scissors | 8 | + | pencil | 40 | + | pen | 18.5 | + *----------+------------------*/ ``` -CSCH(X) -``` - -**Description** -Computes the hyperbolic cosecant of the input angle, which is in radians. -`X` can be any data type -that [coerces to `DOUBLE`][conversion-rules]. -Supports the `SAFE.` prefix. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| X | CSCH(X) |
|------|---------|
| +inf | 0 |
| -inf | 0 |
| NaN | NaN |
| 0 | Error |
| NULL | NULL |
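The `ANON_AVG` description above notes that the function first computes an average per privacy unit column, then averages those per-unit averages. The following Python snippet is only an illustration of that two-step aggregation; the quantities and unit IDs are hypothetical, and clamping and noise are ignored:

```python
from statistics import mean

# Hypothetical quantities grouped by privacy unit (e.g., professor ID).
# Clamping and noise are omitted; this shows only the two-step averaging.
per_unit_values = {
    1: [10, 20],  # unit 1 contributes an average of 15
    2: [30],      # unit 2 contributes an average of 30
}

per_unit_averages = [mean(values) for values in per_unit_values.values()]
result = mean(per_unit_averages)

print(result)              # 22.5: the average of the per-unit averages
print(mean([10, 20, 30]))  # 20: a plain average over all rows, for contrast
```

Averaging per unit first weights each privacy unit equally, regardless of how many rows it contributes, which is why the result differs from a plain average over all rows.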
+Note: You can learn more about when and when not to use +noise [here][dp-noise]. -**Return Data Type** +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views -`DOUBLE` +[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise -**Example** +[dp-clamp-between]: #dp_clamp_between -```sql -SELECT CSCH(0.5) AS a, CSCH(-2) AS b, SAFE.CSCH(0) AS c; +### `ANON_COUNT` (DEPRECATED) + -/*----------------+----------------+------* - | a | b | c | - +----------------+----------------+------+ - | 1.919034751334 | -0.27572056477 | NULL | - *----------------+----------------+------*/ -``` +Warning: This function has been deprecated. Use +`COUNT` (differential privacy) instead. -[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#conversion_rules ++ [Signature 1](#anon_count_signature1) ++ [Signature 2](#anon_count_signature2) -### `DIV` +#### Signature 1 + -``` -DIV(X, Y) +```sql +WITH ANONYMIZATION ... + ANON_COUNT(*) ``` **Description** -Returns the result of integer division of X by Y. Division by zero returns -an error. Division by -1 may overflow. +Returns the number of rows in the +[differentially private][dp-from-clause] `FROM` clause. The final result +is an aggregation across privacy unit columns. +[Input values are clamped implicitly][dp-clamp-implicit]. Clamping is +performed per privacy unit column. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| X | Y | DIV(X, Y) |
|----|----|-----------|
| 20 | 4 | 5 |
| 12 | -7 | -1 |
| 20 | 3 | 6 |
| 0 | 20 | 0 |
| 20 | 0 | Error |
+This function must be used with the `ANONYMIZATION` clause. -**Return Data Type** +**Return type** -The return data type is determined by the argument types with the following -table. - +`INT64` - - - - - - - - - - - - - +**Examples** -
| INPUT | INT32 | INT64 | UINT32 | UINT64 | NUMERIC | BIGNUMERIC |
|-------|-------|-------|--------|--------|---------|------------|
| INT32 | INT64 | INT64 | INT64 | ERROR | NUMERIC | BIGNUMERIC |
| INT64 | INT64 | INT64 | INT64 | ERROR | NUMERIC | BIGNUMERIC |
| UINT32 | INT64 | INT64 | UINT64 | UINT64 | NUMERIC | BIGNUMERIC |
| UINT64 | ERROR | ERROR | UINT64 | UINT64 | NUMERIC | BIGNUMERIC |
| NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC |
| BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC |
+The following differentially private query counts the number of requests for +each item. This query references a view called +[`view_on_professors`][dp-example-views]. -### `EXP` +```sql +-- With noise, using the epsilon parameter. +SELECT + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + ANON_COUNT(*) times_requested +FROM {{USERNAME}}.view_on_professors +GROUP BY item; +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | pencil | 5 | + | pen | 2 | + *----------+-----------------*/ ``` -EXP(X) -``` - -**Description** - -Computes *e* to the power of X, also called the natural exponential function. If -the result underflows, this function returns a zero. Generates an error if the -result overflows. - - - - - - - - - - - - - - - - - - - - - - -
| X | EXP(X) |
|------|--------|
| 0.0 | 1.0 |
| +inf | +inf |
| -inf | 0.0 |
- -**Return Data Type** - - - - - - - - - - +```sql +-- Without noise, using the epsilon parameter. +-- (this un-noised version is for demonstration only) +SELECT + WITH ANONYMIZATION + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) + item, + ANON_COUNT(*) times_requested +FROM {{USERNAME}}.view_on_professors +GROUP BY item; -
| INPUT | INT32 | INT64 | UINT32 | UINT64 | NUMERIC | BIGNUMERIC | FLOAT | DOUBLE |
|--------|--------|--------|--------|--------|---------|------------|--------|--------|
| OUTPUT | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
+-- These results will not change when you run the query. +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | scissors | 1 | + | pencil | 4 | + | pen | 3 | + *----------+-----------------*/ +``` -### `EUCLIDEAN_DISTANCE` +Note: You can learn more about when and when not to use +noise [here][dp-noise]. - +#### Signature 2 + ```sql -EUCLIDEAN_DISTANCE(vector1, vector2) +WITH ANONYMIZATION ... + ANON_COUNT(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) ``` **Description** -Computes the [Euclidean distance][euclidean-distance] between two vectors. +Returns the number of non-`NULL` expression values. The final result is an +aggregation across privacy unit columns. -**Definitions** +This function must be used with the `ANONYMIZATION` clause and +can support these arguments: -+ `vector1`: The first vector. -+ `vector2`: The second vector. ++ `expression`: The input expression. This can be any numeric input type, + such as `INT64`. ++ `CLAMPED BETWEEN` clause: + Perform [clamping][dp-clamp-between] per privacy unit column. -**Details** +**Return type** -Each vector represents a quantity that includes magnitude and direction. -The following vector types are supported: +`INT64` -+ Dense vector: `ARRAY` that represents - the vector and its numerical values. `value` is of type - `DOUBLE`. +**Examples** - This is an example of a dense vector: +The following differentially private query counts the number of requests made +for each type of item. This query references a view called +[`view_on_professors`][dp-example-views]. - ``` - [1.0, 0.0, 3.0] - ``` -+ Sparse vector: `ARRAY>`, where - `STRUCT` contains a dimension-value pair for each numerical value in the - vector. This information is used to generate a dense vector. 
+```sql +-- With noise +SELECT + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + ANON_COUNT(item CLAMPED BETWEEN 0 AND 100) times_requested +FROM {{USERNAME}}.view_on_professors +GROUP BY item; - + `dimension`: A `STRING` or `INT64` value that represents the - specific dimension for `value` in a vector. +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | pencil | 5 | + | pen | 2 | + *----------+-----------------*/ +``` - + `value`: A `DOUBLE` value that represents a - numerical value for `dimension`. +```sql +--Without noise (this un-noised version is for demonstration only) +SELECT + WITH ANONYMIZATION + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) + item, + ANON_COUNT(item CLAMPED BETWEEN 0 AND 100) times_requested +FROM {{USERNAME}}.view_on_professors +GROUP BY item; - A sparse vector contains mostly zeros, with only a few non-zero elements. - It's a useful data structure for representing data that is mostly empty or - has a lot of zeros. For example, if you have a vector of length 10,000 and - only 10 elements are non-zero, then it is a sparse vector. As a result, - it's more efficient to describe a sparse vector by only mentioning its - non-zero elements. If an element isn't present in the - sparse representation, its value can be implicitly understood to be zero. +-- These results will not change when you run the query. +/*----------+-----------------* + | item | times_requested | + +----------+-----------------+ + | scissors | 1 | + | pencil | 4 | + | pen | 3 | + *----------+-----------------*/ +``` - The following `INT64` sparse vector +Note: You can learn more about when and when not to use +noise [here][dp-noise]. 
- ``` - [(0, 1.0), (2, 3.0)] - ``` +[dp-clamp-implicit]: #dp_implicit_clamping - is converted to this dense vector: +[dp-from-clause]: https://github.com/google/zetasql/blob/master/docs/differential-privacy.md#dp_from_rules - ``` - [1.0, 0.0, 3.0] - ``` +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views - The following `STRING` sparse vector +[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise - ``` - [('d': 4.0), ('a', 1.0), ('b': 3.0)] - ``` +[dp-clamp-between]: #dp_clamp_between - is converted to this dense vector: +### `ANON_PERCENTILE_CONT` (DEPRECATED) + - ``` - [1.0, 3.0, 0.0, 4.0] - ``` +Warning: This function has been deprecated. Use +`PERCENTILE_CONT` (differential privacy) instead. -The ordering of numeric values in a vector doesn't impact the results -produced by this function if the dimensions of the vectors are aligned. +```sql +WITH ANONYMIZATION ... + ANON_PERCENTILE_CONT(expression, percentile [CLAMPED BETWEEN lower_bound AND upper_bound]) +``` -A vector can have one or more dimensions. Both vectors in this function must -share these same dimensions, and if they don't, an error is produced. +**Description** + +Takes an expression and computes a percentile for it. The final result is an +aggregation across privacy unit columns. -A vector can be a zero vector. A vector is a zero vector if all elements in -the vector are `0`. For example, `[0.0, 0.0]`. +This function must be used with the `ANONYMIZATION` clause and +can support these arguments: -An error is produced if an element or field in a vector is `NULL`. ++ `expression`: The input expression. This can be most numeric input types, + such as `INT64`. `NULL`s are always ignored. ++ `percentile`: The percentile to compute. The percentile must be a literal in + the range [0, 1] ++ `CLAMPED BETWEEN` clause: + Perform [clamping][dp-clamp-between] per privacy unit column. 
-If `vector1` or `vector2` is `NULL`, `NULL` is returned. +`NUMERIC` and `BIGNUMERIC` arguments are not allowed. + If you need them, cast them as the +`DOUBLE` data type first. **Return type** @@ -13523,2429 +13028,2035 @@ If `vector1` or `vector2` is `NULL`, `NULL` is returned. **Examples** -In the following example, dense vectors are used to compute the -Euclidean distance: +The following differentially private query gets the percentile of items +requested. Smaller aggregations might not be included. This query references a +view called [`view_on_professors`][dp-example-views]. ```sql -SELECT EUCLIDEAN_DISTANCE([1.0, 2.0], [3.0, 4.0]) AS results; +-- With noise, using the epsilon parameter. +SELECT + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + ANON_PERCENTILE_CONT(quantity, 0.5 CLAMPED BETWEEN 0 AND 100) percentile_requested +FROM {{USERNAME}}.view_on_professors +GROUP BY item; -/*----------* - | results | - +----------+ - | 2.828 | - *----------*/ +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+----------------------* + | item | percentile_requested | + +----------+----------------------+ + | pencil | 72.00011444091797 | + | scissors | 8.000175476074219 | + | pen | 23.001075744628906 | + *----------+----------------------*/ ``` -In the following example, sparse vectors are used to compute the -Euclidean distance: +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views -```sql -SELECT EUCLIDEAN_DISTANCE( - [(1, 1.0), (2, 2.0)], - [(2, 4.0), (1, 3.0)]) AS results; +[dp-clamp-between]: #dp_clamp_between - /*----------* - | results | - +----------+ - | 2.828 | - *----------*/ -``` +### `ANON_QUANTILES` (DEPRECATED) + -The ordering of numeric values in a vector doesn't impact the results -produced by this function. 
For example these queries produce the same results -even though the numeric values in each vector is in a different order: +Warning: This function has been deprecated. Use +`QUANTILES` (differential privacy) instead. ```sql -SELECT EUCLIDEAN_DISTANCE([1.0, 2.0], [3.0, 4.0]); - -SELECT EUCLIDEAN_DISTANCE([2.0, 1.0], [4.0, 3.0]); - -SELECT EUCLIDEAN_DISTANCE([(1, 1.0), (2, 2.0)], [(1, 3.0), (2, 4.0)]) AS results; - - /*----------* - | results | - +----------+ - | 2.828 | - *----------*/ +WITH ANONYMIZATION ... + ANON_QUANTILES(expression, number CLAMPED BETWEEN lower_bound AND upper_bound) ``` -Both dense vectors must have the same dimensions. If not, an error is produced. -In the following examples, the first vector has two dimensions and the second -vector has three: +**Description** -```sql --- ERROR -SELECT EUCLIDEAN_DISTANCE([9.0, 7.0], [8.0, 4.0, 5.0]) AS results; -``` +Returns an array of differentially private quantile boundaries for values in +`expression`. The first element in the return value is the +minimum quantile boundary and the last element is the maximum quantile boundary. +The returned results are aggregations across privacy unit columns. -If you use sparse vectors and you repeat a dimension, an error is -produced: +This function must be used with the `ANONYMIZATION` clause and +can support these arguments: -```sql --- ERROR -SELECT EUCLIDEAN_DISTANCE( - [(1, 9.0), (2, 7.0), (2, 8.0)], [(1, 8.0), (2, 4.0), (3, 5.0)]) AS results; -``` ++ `expression`: The input expression. This can be most numeric input types, + such as `INT64`. `NULL`s are always ignored. ++ `number`: The number of quantiles to create. This must be an `INT64`. ++ `CLAMPED BETWEEN` clause: + Perform [clamping][dp-clamp-between] per privacy unit column. -[euclidean-distance]: https://en.wikipedia.org/wiki/Euclidean_distance +`NUMERIC` and `BIGNUMERIC` arguments are not allowed. + If you need them, cast them as the +`DOUBLE` data type first. 
-### `FLOOR` +**Return type** -``` -FLOOR(X) -``` +`ARRAY`<`DOUBLE`> -**Description** +**Examples** -Returns the largest integral value that is not greater than X. +The following differentially private query gets the five quantile boundaries of +the four quartiles of the number of items requested. Smaller aggregations +might not be included. This query references a view called +[`view_on_professors`][dp-example-views]. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| X | FLOOR(X) |
|------|----------|
| 2.0 | 2.0 |
| 2.3 | 2.0 |
| 2.8 | 2.0 |
| 2.5 | 2.0 |
| -2.3 | -3.0 |
| -2.8 | -3.0 |
| -2.5 | -3.0 |
| 0 | 0 |
| +inf | +inf |
| -inf | -inf |
| NaN | NaN |
+```sql +-- With noise, using the epsilon parameter. +SELECT + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + ANON_QUANTILES(quantity, 4 CLAMPED BETWEEN 0 AND 100) quantiles_requested +FROM {{USERNAME}}.view_on_professors +GROUP BY item; -**Return Data Type** +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+----------------------------------------------------------------------* + | item | quantiles_requested | + +----------+----------------------------------------------------------------------+ + | pen | [6.409375,20.647684733072918,41.40625,67.30848524305556,99.80078125] | + | pencil | [6.849259,44.010416666666664,62.64204,65.83806818181819,98.59375] | + *----------+----------------------------------------------------------------------*/ +``` - +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views - - - - - - - - +[dp-clamp-between]: #dp_clamp_between -
| INPUT | INT32 | INT64 | UINT32 | UINT64 | NUMERIC | BIGNUMERIC | FLOAT | DOUBLE |
|--------|--------|--------|--------|--------|---------|------------|--------|--------|
| OUTPUT | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
+### `ANON_STDDEV_POP` (DEPRECATED) + -### `GREATEST` +Warning: This function has been deprecated. Use +`STDDEV_POP` (differential privacy) instead. -``` -GREATEST(X1,...,XN) +```sql +WITH ANONYMIZATION ... + ANON_STDDEV_POP(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) ``` **Description** -Returns the greatest value among `X1,...,XN`. If any argument is `NULL`, returns -`NULL`. Otherwise, in the case of floating-point arguments, if any argument is -`NaN`, returns `NaN`. In all other cases, returns the value among `X1,...,XN` -that has the greatest value according to the ordering used by the `ORDER BY` -clause. The arguments `X1, ..., XN` must be coercible to a common supertype, and -the supertype must support ordering. - - - - - - - - - - - - - - -
| X1,...,XN | GREATEST(X1,...,XN) |
|-----------|---------------------|
| 3,5,1 | 5 |
- -This function supports specifying [collation][collation]. - -[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about +Takes an expression and computes the population (biased) standard deviation of +the values in the expression. The final result is an aggregation across +privacy unit columns between `0` and `+Inf`. -**Return Data Types** +This function must be used with the `ANONYMIZATION` clause and +can support these arguments: -Data type of the input values. ++ `expression`: The input expression. This can be most numeric input types, + such as `INT64`. `NULL`s are always ignored. ++ `CLAMPED BETWEEN` clause: + Perform [clamping][dp-clamp-between] per individual entity values. -### `IEEE_DIVIDE` +`NUMERIC` and `BIGNUMERIC` arguments are not allowed. + If you need them, cast them as the +`DOUBLE` data type first. -``` -IEEE_DIVIDE(X, Y) -``` +**Return type** -**Description** +`DOUBLE` -Divides X by Y; this function never fails. Returns -`DOUBLE` unless -both X and Y are `FLOAT`, in which case it returns -`FLOAT`. Unlike the division operator (/), -this function does not generate errors for division by zero or overflow.

+**Examples** - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| X | Y | IEEE_DIVIDE(X, Y) |
|-------|------|-------------------|
| 20.0 | 4.0 | 5.0 |
| 0.0 | 25.0 | 0.0 |
| 25.0 | 0.0 | +inf |
| -25.0 | 0.0 | -inf |
| 0.0 | 0.0 | NaN |
| 0.0 | NaN | NaN |
| NaN | 0.0 | NaN |
| +inf | +inf | NaN |
| -inf | -inf | NaN |
+The following differentially private query gets the +population (biased) standard deviation of items requested. Smaller aggregations +might not be included. This query references a view called +[`view_on_professors`][dp-example-views]. -### `IS_INF` +```sql +-- With noise, using the epsilon parameter. +SELECT + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + ANON_STDDEV_POP(quantity CLAMPED BETWEEN 0 AND 100) pop_standard_deviation +FROM {{USERNAME}}.view_on_professors +GROUP BY item; -``` -IS_INF(X) +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+------------------------* + | item | pop_standard_deviation | + +----------+------------------------+ + | pencil | 25.350871122442054 | + | scissors | 50 | + | pen | 2 | + *----------+------------------------*/ ``` -**Description** +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views -Returns `TRUE` if the value is positive or negative infinity. +[dp-clamp-between]: #dp_clamp_between - - - - - - - - - - - - - - - - - - - - - -
| X | IS_INF(X) |
|------|-----------|
| +inf | TRUE |
| -inf | TRUE |
| 25 | FALSE |
+### `ANON_SUM` (DEPRECATED) + -### `IS_NAN` +Warning: This function has been deprecated. Use +`SUM` (differential privacy) instead. -``` -IS_NAN(X) +```sql +WITH ANONYMIZATION ... + ANON_SUM(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) ``` **Description** -Returns `TRUE` if the value is a `NaN` value. - - - - - - - - - - - - - - - - - - -
| X | IS_NAN(X) |
|-----|-----------|
| NaN | TRUE |
| 25 | FALSE |
+Returns the sum of non-`NULL`, non-`NaN` values in the expression. The final +result is an aggregation across privacy unit columns. -### `LEAST` +This function must be used with the `ANONYMIZATION` clause and +can support these arguments: -``` -LEAST(X1,...,XN) -``` ++ `expression`: The input expression. This can be any numeric input type, + such as `INT64`. ++ `CLAMPED BETWEEN` clause: + Perform [clamping][dp-clamp-between] per privacy unit column. -**Description** +**Return type** -Returns the least value among `X1,...,XN`. If any argument is `NULL`, returns -`NULL`. Otherwise, in the case of floating-point arguments, if any argument is -`NaN`, returns `NaN`. In all other cases, returns the value among `X1,...,XN` -that has the least value according to the ordering used by the `ORDER BY` -clause. The arguments `X1, ..., XN` must be coercible to a common supertype, and -the supertype must support ordering. +One of the following [supertypes][dp-supertype]: - - - - - - - - - - - - - -
| X1,...,XN | LEAST(X1,...,XN) |
|-----------|------------------|
| 3,5,1 | 1 |
++ `INT64` ++ `UINT64` ++ `DOUBLE` -This function supports specifying [collation][collation]. +**Examples** -[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about +The following differentially private query gets the sum of items requested. +Smaller aggregations might not be included. This query references a view called +[`view_on_professors`][dp-example-views]. -**Return Data Types** +```sql +-- With noise, using the epsilon parameter. +SELECT + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + ANON_SUM(quantity CLAMPED BETWEEN 0 AND 100) quantity +FROM {{USERNAME}}.view_on_professors +GROUP BY item; -Data type of the input values. +-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+-----------* + | item | quantity | + +----------+-----------+ + | pencil | 143 | + | pen | 59 | + *----------+-----------*/ +``` -### `LN` +```sql +-- Without noise, using the epsilon parameter. +-- (this un-noised version is for demonstration only) +SELECT + WITH ANONYMIZATION + OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1) + item, + ANON_SUM(quantity) quantity +FROM {{USERNAME}}.view_on_professors +GROUP BY item; +-- These results will not change when you run the query. +/*----------+----------* + | item | quantity | + +----------+----------+ + | scissors | 8 | + | pencil | 144 | + | pen | 58 | + *----------+----------*/ ``` -LN(X) -``` - -**Description** -Computes the natural logarithm of X. Generates an error if X is less than or -equal to zero. +Note: You can learn more about when and when not to use +noise [here][dp-noise]. - - - - - - - - - - - - - - - - - - - - - -
| X | LN(X) |
|-------|-------|
| 1.0 | 0.0 |
| +inf | +inf |
| X < 0 | Error |
+[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views -**Return Data Type** +[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise - +[dp-supertype]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#supertypes - - - - - - - - +[dp-clamp-between]: #dp_clamp_between -
| INPUT | INT32 | INT64 | UINT32 | UINT64 | NUMERIC | BIGNUMERIC | FLOAT | DOUBLE |
|--------|--------|--------|--------|--------|---------|------------|--------|--------|
| OUTPUT | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
+### `ANON_VAR_POP` (DEPRECATED) + -### `LOG` +Warning: This function has been deprecated. Use +`VAR_POP` (differential privacy) instead. -``` -LOG(X [, Y]) +```sql +WITH ANONYMIZATION ... + ANON_VAR_POP(expression [CLAMPED BETWEEN lower_bound AND upper_bound]) ``` **Description** -If only X is present, `LOG` is a synonym of `LN`. If Y is also present, -`LOG` computes the logarithm of X to base Y. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| X | Y | LOG(X, Y) |
|-----------|---------------|-----------|
| 100.0 | 10.0 | 2.0 |
| -inf | Any value | NaN |
| Any value | +inf | NaN |
| +inf | 0.0 < Y < 1.0 | -inf |
| +inf | Y > 1.0 | +inf |
| X <= 0 | Any value | Error |
| Any value | Y <= 0 | Error |
| Any value | 1.0 | Error |
+Takes an expression and computes the population (biased) variance of the values +in the expression. The final result is an aggregation across +privacy unit columns between `0` and `+Inf`. You can +[clamp the input values][dp-clamp-explicit] explicitly, otherwise input values +are clamped implicitly. Clamping is performed per individual entity values. -**Return Data Type** +This function must be used with the `ANONYMIZATION` clause and +can support these arguments: - ++ `expression`: The input expression. This can be any numeric input type, + such as `INT64`. `NULL`s are always ignored. ++ `CLAMPED BETWEEN` clause: + Perform [clamping][dp-clamp-between] per individual entity values. - - - - - - - - - - - - - - - +`NUMERIC` and `BIGNUMERIC` arguments are not allowed. + If you need them, cast them as the +`DOUBLE` data type first. -
| INPUT | INT32 | INT64 | UINT32 | UINT64 | NUMERIC | BIGNUMERIC | FLOAT | DOUBLE |
|-------|-------|-------|--------|--------|---------|------------|-------|--------|
| INT32 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| INT64 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| UINT32 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| UINT64 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| FLOAT | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
| DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
+**Return type** -### `LOG10` +`DOUBLE` -``` -LOG10(X) -``` +**Examples** -**Description** +The following differentially private query gets the +population (biased) variance of items requested. Smaller aggregations might not +be included. This query references a view called +[`view_on_professors`][dp-example-views]. -Similar to `LOG`, but computes logarithm to base 10. +```sql +-- With noise, using the epsilon parameter. +SELECT + WITH ANONYMIZATION + OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1) + item, + ANON_VAR_POP(quantity CLAMPED BETWEEN 0 AND 100) pop_variance +FROM {{USERNAME}}.view_on_professors +GROUP BY item; - - - - - - - - - - - - - - - - - - - - - - - - - -
| X | LOG10(X) |
|--------|----------|
| 100.0 | 2.0 |
| -inf | NaN |
| +inf | +inf |
| X <= 0 | Error |
+-- These results will change each time you run the query. +-- Smaller aggregations might be removed. +/*----------+-----------------* + | item | pop_variance | + +----------+-----------------+ + | pencil | 642 | + | pen | 2.6666666666665 | + | scissors | 2500 | + *----------+-----------------*/ +``` -**Return Data Type** +[dp-clamp-explicit]: #dp_explicit_clamping - +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views - - - - - - - - +[dp-clamp-between]: #dp_clamp_between -
| INPUT | INT32 | INT64 | UINT32 | UINT64 | NUMERIC | BIGNUMERIC | FLOAT | DOUBLE |
|--------|--------|--------|--------|--------|---------|------------|--------|--------|
| OUTPUT | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
+### Clamp values in a differentially private aggregate function + -### `MOD` +In [differentially private queries][dp-syntax], +aggregation clamping is used to limit the contribution of outliers. You can +clamp explicitly or implicitly as follows: -``` -MOD(X, Y) -``` ++ [Clamp explicitly in the `DIFFERENTIAL_PRIVACY` clause][dp-clamped-named]. ++ [Clamp implicitly in the `DIFFERENTIAL_PRIVACY` clause][dp-clamped-named-imp]. -**Description** +#### Implicitly clamp values + -Modulo function: returns the remainder of the division of X by Y. Returned -value has the same sign as X. An error is generated if Y is 0. +If you don't include the contribution bounds named argument with the +`DIFFERENTIAL_PRIVACY` clause, clamping is implicit, which +means bounds are derived from the data itself in a differentially private way. - - - - - - - - - - - - - - - - - - - -
| X | Y | MOD(X, Y) |
|----|----|-----------|
| 25 | 12 | 1 |
| 25 | 0 | Error |
+Implicit bounding works best when computed using large datasets. For more +information, see +[Implicit bounding limitations for small datasets][implicit-limits]. -**Return Data Type** +**Details** -The return data type is determined by the argument types with the following -table. - +In differentially private aggregate functions, explicit clamping is optional. +If you don't include this clause, clamping is implicit, +which means bounds are derived from the data itself in a differentially +private way. The process is somewhat random, so aggregations with identical +ranges can have different bounds. - - - - - - - - - - - - - +Implicit bounds are determined for each aggregation. So if some +aggregations have a wide range of values, and others have a narrow range of +values, implicit bounding can identify different bounds for different +aggregations as appropriate. Implicit bounds might be an advantage or a +disadvantage depending on your use case. Different bounds for different +aggregations can result in lower error. Different bounds also means that +different aggregations have different levels of uncertainty, which might not be +directly comparable. [Explicit bounds][dp-clamped-named], on the other hand, +apply uniformly to all aggregations and should be derived from public +information. -
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERIC
INT32INT64INT64INT64ERRORNUMERICBIGNUMERIC
INT64INT64INT64INT64ERRORNUMERICBIGNUMERIC
UINT32INT64INT64UINT64UINT64NUMERICBIGNUMERIC
UINT64ERRORERRORUINT64UINT64NUMERICBIGNUMERIC
NUMERICNUMERICNUMERICNUMERICNUMERICNUMERICBIGNUMERIC
BIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERIC
+When clamping is implicit, part of the total epsilon is spent picking bounds.
+This leaves less epsilon for aggregations, so these aggregations are noisier.

-### `PI`

+**Example**
+
+The following differentially private query clamps each aggregate contribution
+for each privacy unit column to within a range derived from the data itself.
+As long as all or most values fall within this range, your results
+will be accurate. This query references a view called
+[`view_on_professors`][dp-example-views].

```sql
-PI()
-```
+--Without noise (this un-noised version is for demonstration only)
+SELECT WITH DIFFERENTIAL_PRIVACY
+  OPTIONS (
+    epsilon = 1e20,
+    delta = .01,
+    privacy_unit_column=id
+  )
+  item,
+  AVG(quantity) AS average_quantity
+FROM view_on_professors
+GROUP BY item;

-**Description**

+/*----------+------------------*
+ | item     | average_quantity |
+ +----------+------------------+
+ | scissors | 8                |
+ | pencil   | 72               |
+ | pen      | 18.5             |
+ *----------+------------------*/
+```

-Returns the mathematical constant `π` as a `DOUBLE`
-value.

+The following differentially private query clamps each aggregate contribution
+for each privacy unit column to within a range derived from the data itself.
+As long as all or most values fall within this range, your results
+will be accurate. This query references a view called
+[`view_on_professors`][dp-example-views].
-**Return type**

+```sql
+--Without noise (this un-noised version is for demonstration only)
+SELECT WITH DIFFERENTIAL_PRIVACY
+  OPTIONS (
+    epsilon = 1e20,
+    delta = .01,
+    max_groups_contributed = 1
+  )
+  item,
+  AVG(quantity) AS average_quantity
+FROM view_on_professors
+GROUP BY item;

-`DOUBLE`

+/*----------+------------------*
+ | item     | average_quantity |
+ +----------+------------------+
+ | scissors | 8                |
+ | pencil   | 72               |
+ | pen      | 18.5             |
+ *----------+------------------*/
+```

-**Example**

+#### Explicitly clamp values
+

```sql
-SELECT PI() AS pi
-
-/*--------------------*
- | pi                 |
- +--------------------+
- | 3.1415926535897931 |
- *--------------------*/
+contribution_bounds_per_group => (lower_bound,upper_bound)
```

-### `PI_BIGNUMERIC`
-
```sql
-PI_BIGNUMERIC()
+contribution_bounds_per_row => (lower_bound,upper_bound)
```

-**Description**

+Use the contribution bounds named argument to explicitly clamp
+values per group or per row between a lower and upper bound in a
+`DIFFERENTIAL_PRIVACY` clause.

-Returns the mathematical constant `π` as a `BIGNUMERIC` value.

+Input values:

-**Return type**

++ `contribution_bounds_per_row`: Contributions per privacy unit are clamped
+  on a per-row (per-record) basis. This means the following:
+  + Upper and lower bounds are applied to column values in individual
+    rows produced by the input subquery independently.
+  + The maximum possible contribution per privacy unit (and per grouping set)
+    is the product of the per-row contribution limit and the
+    `max_groups_contributed` differential privacy parameter.
++ `contribution_bounds_per_group`: Contributions per privacy unit are clamped
+  on a unique set of entity-specified `GROUP BY` keys. The upper and lower
+  bounds are applied to values per group after the values are aggregated per
+  privacy unit.
++ `lower_bound`: Numeric literal that represents the smallest value to
+  include in an aggregation.
++ `upper_bound`: Numeric literal that represents the largest value to
+  include in an aggregation.

-`BIGNUMERIC`

+`NUMERIC` and `BIGNUMERIC` arguments are not allowed.

-**Example**

+**Details**

-```sql
-SELECT PI_BIGNUMERIC() AS pi

+In differentially private aggregate functions, explicit clamping limits the
+total contribution from each privacy unit column to within a specified
+range.

-/*-----------------------------------------*
- | pi                                      |
- +-----------------------------------------+
- | 3.1415926535897932384626433832795028842 |
- *-----------------------------------------*/
-```

+Explicit bounds are uniformly applied to all aggregations. So even if some
+aggregations have a wide range of values, and others have a narrow range of
+values, the same bounds are applied to all of them. On the other hand, when
+[implicit bounds][dp-clamped-named-imp] are inferred from the data, the bounds
+applied to each aggregation can be different.

-### `PI_NUMERIC`

+Explicit bounds should be chosen to reflect public information.
+For example, bounding ages between 0 and 100 reflects public information
+because the age of most people generally falls within this range.
+
+Important: The results of the query reveal the explicit bounds. Do not use
+explicit bounds based on the entity data; explicit bounds should be based on
+public information.
+
+**Examples**
+
+The following differentially private query clamps each aggregate contribution
+for each privacy unit column to within a specified range (`0` and `100`).
+As long as all or most values fall within this range, your results
+will be accurate. This query references a view called
+[`view_on_professors`][dp-example-views].
```sql
-PI_NUMERIC()
-```
+--Without noise (this un-noised version is for demonstration only)
+SELECT WITH DIFFERENTIAL_PRIVACY
+  OPTIONS (
+    epsilon = 1e20,
+    delta = .01,
+    privacy_unit_column=id
+  )
+  item,
+  AVG(quantity, contribution_bounds_per_group=>(0,100)) AS average_quantity
+FROM view_on_professors
+GROUP BY item;

-**Description**

+/*----------+------------------*
+ | item     | average_quantity |
+ +----------+------------------+
+ | scissors | 8                |
+ | pencil   | 40               |
+ | pen      | 18.5             |
+ *----------+------------------*/
+```

-Returns the mathematical constant `π` as a `NUMERIC` value.

+Notice what happens when most or all values fall outside of the clamped range.
+To get accurate results, ensure that the difference between the upper and lower
+bound is as small as possible, and that most inputs are between the upper and
+lower bound.

-**Return type**

+```sql {.bad}
+--Without noise (this un-noised version is for demonstration only)
+SELECT WITH DIFFERENTIAL_PRIVACY
+  OPTIONS (
+    epsilon = 1e20,
+    delta = .01,
+    privacy_unit_column=id
+  )
+  item,
+  AVG(quantity, contribution_bounds_per_group=>(50,100)) AS average_quantity
+FROM view_on_professors
+GROUP BY item;

-`NUMERIC`

+/*----------+------------------*
+ | item     | average_quantity |
+ +----------+------------------+
+ | scissors | 54               |
+ | pencil   | 58               |
+ | pen      | 51               |
+ *----------+------------------*/
+```

-**Example**

+The following differentially private query clamps each aggregate contribution
+for each privacy unit column to within a specified range (`0` and `100`).
+As long as all or most values fall within this range, your results will be
+accurate. This query references a view called
+[`view_on_professors`][dp-example-views].
```sql -SELECT PI_NUMERIC() AS pi +--Without noise (this un-noised version is for demonstration only) +SELECT WITH DIFFERENTIAL_PRIVACY + OPTIONS ( + epsilon = 1e20, + delta = .01, + max_groups_contributed = 1 + ) + item, + AVG(quantity, contribution_bounds_per_group=>(0,100)) AS average_quantity +FROM view_on_professors +GROUP BY item; -/*-------------* - | pi | - +-------------+ - | 3.141592654 | - *-------------*/ +/*----------+------------------* + | item | average_quantity | + +----------+------------------+ + | scissors | 8 | + | pencil | 40 | + | pen | 18.5 | + *----------+------------------*/ ``` -### `POW` +Notice what happens when most or all values fall outside of the clamped range. +To get accurate results, ensure that the difference between the upper and lower +bound is as small as possible, and that most inputs are between the upper and +lower bound. -``` -POW(X, Y) +```sql {.bad} +--Without noise (this un-noised version is for demonstration only) +SELECT WITH DIFFERENTIAL_PRIVACY + OPTIONS ( + epsilon = 1e20, + delta = .01, + max_groups_contributed = 1 + ) + item, + AVG(quantity, contribution_bounds_per_group=>(50,100)) AS average_quantity +FROM view_on_professors +GROUP BY item; + +/*----------+------------------* + | item | average_quantity | + +----------+------------------+ + | scissors | 54 | + | pencil | 58 | + | pen | 51 | + *----------+------------------*/ ``` -**Description** +Note: For more information about when and when not to use +noise, see [Remove noise][dp-noise]. -Returns the value of X raised to the power of Y. If the result underflows and is -not representable, then the function returns a value of zero. 
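+The signatures above also name a per-row variant,
+`contribution_bounds_per_row`. The following is a hypothetical sketch only
+(whether a given aggregate function accepts this named argument depends on the
+engine): it clamps each individual row's `quantity` to `[0, 100]` before
+aggregation, so with `max_groups_contributed = 1` the largest possible
+contribution per privacy unit is `1 * 100 = 100`.
+
+```sql
+--Hypothetical sketch; assumes SUM accepts contribution_bounds_per_row
+SELECT WITH DIFFERENTIAL_PRIVACY
+  OPTIONS (
+    epsilon = 1e20,
+    delta = .01,
+    max_groups_contributed = 1
+  )
+  item,
+  SUM(quantity, contribution_bounds_per_row => (0, 100)) AS total_quantity
+FROM view_on_professors
+GROUP BY item;
+```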
+[dp-guide]: https://github.com/google/zetasql/blob/master/docs/differential-privacy.md + +[dp-syntax]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_clause + +[agg-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md + +[dp-example-views]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#dp_example_views + +[dp-noise]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#eliminate_noise + +[implicit-limits]: https://github.com/google/zetasql/blob/master/docs/differential-privacy.md#implicit_limits + +[dp-clamped-named]: #dp_clamped_named + +[dp-clamped-named-imp]: #dp_clamped_named_implicit + +## Geography functions + +ZetaSQL supports geography functions. +Geography functions operate on or generate ZetaSQL +`GEOGRAPHY` values. The signature of most geography +functions starts with `ST_`. ZetaSQL supports the following functions +that can be used to analyze geographical data, determine spatial relationships +between geographical features, and construct or manipulate +`GEOGRAPHY`s. + +All ZetaSQL geography functions return `NULL` if any input argument +is `NULL`. + +### Categories + +The geography functions are grouped into the following categories based on their +behavior: - - - + + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + - - - + + + - - - + + + - - - + + + - - - + + + - - - + + + - - - + + + + - - - + + + + +
XYPOW(X, Y)CategoryFunctionsDescription
2.03.08.0
1.0Any value including NaN1.0
Any value including NaN01.0
-1.0+inf1.0
-1.0-inf1.0
ABS(X) < 1-inf+inf
ABS(X) > 1-inf0.0
ABS(X) < 1+inf0.0Constructors + ST_GEOGPOINT
+ ST_MAKELINE
+ ST_MAKEPOLYGON
+ ST_MAKEPOLYGONORIENTED +
+ Functions that build new + geography values from coordinates + or existing geographies. +
ABS(X) > 1+inf+infParsers + ST_GEOGFROM
+ ST_GEOGFROMGEOJSON
+ ST_GEOGFROMKML
+ ST_GEOGFROMTEXT
+ ST_GEOGFROMWKB
+ ST_GEOGPOINTFROMGEOHASH
+
+ Functions that create geographies + from an external format such as + WKT and + GeoJSON. +
-infY < 00.0Formatters + ST_ASBINARY
+ ST_ASGEOJSON
+ ST_ASKML
+ ST_ASTEXT
+ ST_GEOHASH +
+ Functions that export geographies + to an external format such as WKT. +
-infY > 0-inf if Y is an odd integer, +inf otherwiseTransformations + ST_ACCUM (Aggregate)
+ ST_BOUNDARY
+ ST_BUFFER
+ ST_BUFFERWITHTOLERANCE
+ ST_CENTROID
+ ST_CENTROID_AGG (Aggregate)
+ ST_CLOSESTPOINT
+ ST_CONVEXHULL
+ ST_DIFFERENCE
+ ST_EXTERIORRING
+ ST_INTERIORRINGS
+ ST_INTERSECTION
+ ST_LINEINTERPOLATEPOINT
+ ST_LINESUBSTRING
+ ST_SIMPLIFY
+ ST_SNAPTOGRID
+ ST_UNION
+ ST_UNION_AGG (Aggregate)
+
+ Functions that generate a new + geography based on input. +
+infY < 00Accessors + ST_DIMENSION
+ ST_DUMP
+ ST_DUMPPOINTS
+ ST_ENDPOINT
+ ST_GEOMETRYTYPE
+ ST_ISCLOSED
+ ST_ISCOLLECTION
+ ST_ISEMPTY
+ ST_ISRING
+ ST_NPOINTS
+ ST_NUMGEOMETRIES
+ ST_NUMPOINTS
+ ST_POINTN
+ ST_STARTPOINT
+ ST_X
+ ST_Y
+
+ Functions that provide access to + properties of a geography without + side-effects. +
+infY > 0+infPredicates + ST_CONTAINS
+ ST_COVEREDBY
+ ST_COVERS
+ ST_DISJOINT
+ ST_DWITHIN
+ ST_EQUALS
+ ST_INTERSECTS
+ ST_INTERSECTSBOX
+ ST_TOUCHES
+ ST_WITHIN
+
+ Functions that return TRUE or + FALSE for some spatial + relationship between two + geographies or some property of + a geography. These functions + are commonly used in filter + clauses. +
Finite value < 0Non-integerErrorMeasures + ST_ANGLE
+ ST_AREA
+ ST_AZIMUTH
+ ST_BOUNDINGBOX
+ ST_DISTANCE
+ ST_EXTENT (Aggregate)
+ ST_HAUSDORFFDISTANCE
+ ST_LINELOCATEPOINT
+ ST_LENGTH
+ ST_MAXDISTANCE
+ ST_PERIMETER
+
+ Functions that compute measurements + of one or more geographies. +
0Finite value < 0ErrorClustering + ST_CLUSTERDBSCAN + + Functions that perform clustering on geographies. +
-**Return Data Type** +### Function list -The return data type is determined by the argument types with the following -table. + + + + + + + + -
NameSummary
+ + + + - - + + - - - - - - - - - - - -
ST_ACCUM + + + Aggregates GEOGRAPHY values into an array of + GEOGRAPHY elements. +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLEST_ANGLE + + + Takes three point GEOGRAPHY values, which represent two + intersecting lines, and returns the angle between these lines. +
INT32DOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
INT64DOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
UINT32DOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
UINT64DOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
NUMERICNUMERICNUMERICNUMERICNUMERICNUMERICBIGNUMERICDOUBLEDOUBLE
BIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICDOUBLEDOUBLE
FLOATDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLE
DOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLE
+ + ST_AREA -### `POWER` + + + Gets the area covered by the polygons in a GEOGRAPHY value. + + -``` -POWER(X, Y) -``` + + ST_ASBINARY -**Description** + + + Converts a GEOGRAPHY value to a + BYTES WKB geography value. + + -Synonym of [`POW(X, Y)`][pow]. + + ST_ASGEOJSON -[pow]: #pow + + + Converts a GEOGRAPHY value to a STRING + GeoJSON geography value. + + -### `RAND` + + ST_ASKML -``` -RAND() -``` + + + Converts a GEOGRAPHY value to a STRING + KML geometry value. + + -**Description** + + ST_ASTEXT -Generates a pseudo-random value of type `DOUBLE` in -the range of [0, 1), inclusive of 0 and exclusive of 1. + + + Converts a GEOGRAPHY value to a + STRING WKT geography value. + + -### `ROUND` + + ST_AZIMUTH -``` -ROUND(X [, N]) -``` + + + Gets the azimuth of a line segment formed by two + point GEOGRAPHY values. + + -**Description** + + ST_BOUNDARY -If only X is present, rounds X to the nearest integer. If N is present, -rounds X to N decimal places after the decimal point. If N is negative, -rounds off digits to the left of the decimal point. Rounds halfway cases -away from zero. Generates an error if overflow occurs. + + + Gets the union of component boundaries in a + GEOGRAPHY value. + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ExpressionReturn Value
ROUND(2.0)2.0
ROUND(2.3)2.0
ROUND(2.8)3.0
ROUND(2.5)3.0
ROUND(-2.3)-2.0
ROUND(-2.8)-3.0
ROUND(-2.5)-3.0
ROUND(0)0
ROUND(+inf)+inf
ROUND(-inf)-inf
ROUND(NaN)NaN
ROUND(123.7, -1)120.0
ROUND(1.235, 2)1.24
+ + ST_BOUNDINGBOX -**Return Data Type** + + + Gets the bounding box for a GEOGRAPHY value. + + - + + + + - - + + - - - - -
ST_BUFFER + + + Gets the buffer around a GEOGRAPHY value, using a specific + number of segments. +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLEST_BUFFERWITHTOLERANCE + + + Gets the buffer around a GEOGRAPHY value, using tolerance. +
OUTPUTDOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
+ + ST_CENTROID -### `SAFE_ADD` + + + Gets the centroid of a GEOGRAPHY value. + + -``` -SAFE_ADD(X, Y) -``` + + ST_CLOSESTPOINT -**Description** + + + Gets the point on a GEOGRAPHY value which is closest to any + point in a second GEOGRAPHY value. + + -Equivalent to the addition operator (`+`), but returns -`NULL` if overflow occurs. + + ST_CLUSTERDBSCAN - - - - - - - - - - - - - - - -
XYSAFE_ADD(X, Y)
549
+ + + Performs DBSCAN clustering on a group of GEOGRAPHY values and + produces a 0-based cluster number for this row. + + -**Return Data Type** + + ST_CONTAINS - + + + - - + + - - - - - - - - - - - -
+ Checks if one GEOGRAPHY value contains another + GEOGRAPHY value. +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLEST_CONVEXHULL + + + Returns the convex hull for a GEOGRAPHY value. +
INT32INT64INT64INT64ERRORNUMERICBIGNUMERICDOUBLEDOUBLE
INT64INT64INT64INT64ERRORNUMERICBIGNUMERICDOUBLEDOUBLE
UINT32INT64INT64UINT64UINT64NUMERICBIGNUMERICDOUBLEDOUBLE
UINT64ERRORERRORUINT64UINT64NUMERICBIGNUMERICDOUBLEDOUBLE
NUMERICNUMERICNUMERICNUMERICNUMERICNUMERICBIGNUMERICDOUBLEDOUBLE
BIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICDOUBLEDOUBLE
FLOATDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLE
DOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLE
+ + ST_COVEREDBY -### `SAFE_DIVIDE` + + + Checks if all points of a GEOGRAPHY value are on the boundary + or interior of another GEOGRAPHY value. + + -``` -SAFE_DIVIDE(X, Y) -``` + + ST_COVERS -**Description** + + + Checks if all points of a GEOGRAPHY value are on the boundary + or interior of another GEOGRAPHY value. + + -Equivalent to the division operator (`X / Y`), but returns -`NULL` if an error occurs, such as a division by zero error. + + ST_DIFFERENCE - - - - - - - - - - - - - - - - - - - - - - - - - -
XYSAFE_DIVIDE(X, Y)
2045
0200
200NULL
- -**Return Data Type** - - + + + - - + + - - - - - - - - - - - -
+ Gets the point set difference between two GEOGRAPHY values. +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLEST_DIMENSION + + + Gets the dimension of the highest-dimensional element in a + GEOGRAPHY value. +
INT32DOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
INT64DOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
UINT32DOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
UINT64DOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
NUMERICNUMERICNUMERICNUMERICNUMERICNUMERICBIGNUMERICDOUBLEDOUBLE
BIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICDOUBLEDOUBLE
FLOATDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLE
DOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLE
+ + ST_DISJOINT -### `SAFE_MULTIPLY` + + + Checks if two GEOGRAPHY values are disjoint (do not intersect). + + -``` -SAFE_MULTIPLY(X, Y) -``` + + ST_DISTANCE -**Description** + + + Gets the shortest distance in meters between two GEOGRAPHY + values. + + -Equivalent to the multiplication operator (`*`), but returns -`NULL` if overflow occurs. + + ST_DUMP - - - - - - - - - - - - - - - -
XYSAFE_MULTIPLY(X, Y)
20480
+ + + Returns an array of simple GEOGRAPHY components in a + GEOGRAPHY value. + + -**Return Data Type** + + ST_DUMPPOINTS - + + + - - + + - - - - - - - - - - - -
+ Produces an array of GEOGRAPHY points with all points, line + vertices, and polygon vertices in a GEOGRAPHY value. +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLEST_DWITHIN + + + Checks if any points in two GEOGRAPHY values are within a given + distance. +
INT32INT64INT64INT64ERRORNUMERICBIGNUMERICDOUBLEDOUBLE
INT64INT64INT64INT64ERRORNUMERICBIGNUMERICDOUBLEDOUBLE
UINT32INT64INT64UINT64UINT64NUMERICBIGNUMERICDOUBLEDOUBLE
UINT64ERRORERRORUINT64UINT64NUMERICBIGNUMERICDOUBLEDOUBLE
NUMERICNUMERICNUMERICNUMERICNUMERICNUMERICBIGNUMERICDOUBLEDOUBLE
BIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICDOUBLEDOUBLE
FLOATDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLE
DOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLE
+ + ST_ENDPOINT -### `SAFE_NEGATE` + + + Gets the last point of a linestring GEOGRAPHY value. + + -``` -SAFE_NEGATE(X) -``` + + ST_EQUALS -**Description** + + + Checks if two GEOGRAPHY values represent the same + GEOGRAPHY value. + + -Equivalent to the unary minus operator (`-`), but returns -`NULL` if overflow occurs. + + ST_EXTENT - - - - - - - - - - - - - - - - - - - - - -
XSAFE_NEGATE(X)
+1-1
-1+1
00
+ + + Gets the bounding box for a group of GEOGRAPHY values. + + -**Return Data Type** + + ST_EXTERIORRING - + + + - - + + - - - - -
+ Returns a linestring GEOGRAPHY value that corresponds to the + outermost ring of a polygon GEOGRAPHY value. +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLEST_GEOGFROM + + + Converts a STRING or BYTES value + into a GEOGRAPHY value. +
OUTPUTINT32INT64ERRORERRORNUMERICBIGNUMERICFLOATDOUBLE
+ + ST_GEOGFROMGEOJSON -### `SAFE_SUBTRACT` + + + Converts a STRING GeoJSON geometry value into a + GEOGRAPHY value. + + -``` -SAFE_SUBTRACT(X, Y) -``` + + ST_GEOGFROMKML -**Description** + + + Converts a STRING KML geometry value into a + GEOGRAPHY value. + + -Returns the result of Y subtracted from X. -Equivalent to the subtraction operator (`-`), but returns -`NULL` if overflow occurs. + + ST_GEOGFROMTEXT - - - - - - - - - - - - - - - -
XYSAFE_SUBTRACT(X, Y)
541
+ + + Converts a STRING WKT geometry value into a + GEOGRAPHY value. + + -**Return Data Type** + + ST_GEOGFROMWKB - + + + - - + + - - - - - - - - - - - -
+ Converts a BYTES or hexadecimal-text STRING WKT + geometry value into a GEOGRAPHY value. +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLEST_GEOGPOINT + + + Creates a point GEOGRAPHY value for a given longitude and + latitude. +
INT32INT64INT64INT64ERRORNUMERICBIGNUMERICDOUBLEDOUBLE
INT64INT64INT64INT64ERRORNUMERICBIGNUMERICDOUBLEDOUBLE
UINT32INT64INT64INT64INT64NUMERICBIGNUMERICDOUBLEDOUBLE
UINT64ERRORERRORINT64INT64NUMERICBIGNUMERICDOUBLEDOUBLE
NUMERICNUMERICNUMERICNUMERICNUMERICNUMERICBIGNUMERICDOUBLEDOUBLE
BIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICDOUBLEDOUBLE
FLOATDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLE
DOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLE
+ + ST_GEOGPOINTFROMGEOHASH -### `SEC` + + + Gets a point GEOGRAPHY value that is in the middle of a + bounding box defined in a STRING GeoHash value. + + -``` -SEC(X) -``` + + ST_GEOHASH -**Description** + + + Converts a point GEOGRAPHY value to a STRING + GeoHash value. + + -Computes the secant for the angle of `X`, where `X` is specified in radians. -`X` can be any data type -that [coerces to `DOUBLE`][conversion-rules]. + + ST_GEOMETRYTYPE - - - - - - - - - - - - - - - - - - - - - - - - - -
XSEC(X)
+infNaN
-infNaN
NaNNaN
NULLNULL
+ + + Gets the Open Geospatial Consortium (OGC) geometry type for a + GEOGRAPHY value. + + -**Return Data Type** + + ST_HAUSDORFFDISTANCE -`DOUBLE` + + Gets the discrete Hausdorff distance between two geometries. + -**Example** + + ST_INTERIORRINGS -```sql -SELECT SEC(100) AS a, SEC(-1) AS b; + + + Gets the interior rings of a polygon GEOGRAPHY value. + + -/*----------------+---------------* - | a | b | - +----------------+---------------+ - | 1.159663822905 | 1.85081571768 | - *----------------+---------------*/ -``` + + ST_INTERSECTION -[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#conversion_rules + + + Gets the point set intersection of two GEOGRAPHY values. + + -### `SECH` + + ST_INTERSECTS -``` -SECH(X) -``` + + + Checks if at least one point appears in two GEOGRAPHY + values. + + -**Description** + + ST_INTERSECTSBOX -Computes the hyperbolic secant for the angle of `X`, where `X` is specified -in radians. `X` can be any data type -that [coerces to `DOUBLE`][conversion-rules]. -Never produces an error. + + + Checks if a GEOGRAPHY value intersects a rectangle. + + - - - - - - - - - - - - - - - - - - - - - - - - - -
XSECH(X)
+inf0
-inf0
NaNNaN
NULLNULL
+ + ST_ISCLOSED -**Return Data Type** + + + Checks if all components in a GEOGRAPHY value are closed. + + -`DOUBLE` + + ST_ISCOLLECTION -**Example** + + + Checks if the total number of points, linestrings, and polygons is + greater than one in a GEOGRAPHY value. + + -```sql -SELECT SECH(0.5) AS a, SECH(-2) AS b, SECH(100) AS c; + + ST_ISEMPTY -/*----------------+----------------+---------------------* - | a | b | c | - +----------------+----------------+---------------------+ - | 0.88681888397 | 0.265802228834 | 7.4401519520417E-44 | - *----------------+----------------+---------------------*/ -``` + + + Checks if a GEOGRAPHY value is empty. + + -[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#conversion_rules + + ST_ISRING -### `SIGN` + + + Checks if a GEOGRAPHY value is a closed, simple + linestring. + + -``` -SIGN(X) -``` + + ST_LENGTH -**Description** + + + Gets the total length of lines in a GEOGRAPHY value. + + -Returns `-1`, `0`, or `+1` for negative, zero and positive arguments -respectively. For floating point arguments, this function does not distinguish -between positive and negative zero. + + ST_LINEINTERPOLATEPOINT - - - - - - - - - - - - - - - - - - - - - - - - - -
XSIGN(X)
25+1
00
-25-1
NaNNaN
+ + + Gets a point at a specific fraction in a linestring GEOGRAPHY + value. + + -**Return Data Type** + + ST_LINELOCATEPOINT - - - - - + + - - - - - -
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE + Gets a section of a linestring GEOGRAPHY value between the + start point and a point GEOGRAPHY value. +
OUTPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
- -### `SIN` - -``` -SIN(X) -``` - -**Description** - -Computes the sine of X where X is specified in radians. Never fails. - - - - - - - - - - - - - - - - - - - - - - -
XSIN(X)
+infNaN
-infNaN
NaNNaN
- -### `SINH` - -``` -SINH(X) -``` - -**Description** - -Computes the hyperbolic sine of X where X is specified in radians. Generates -an error if overflow occurs. - - - - - - - - - - - - - - - - - - - - - - -
XSINH(X)
+inf+inf
-inf-inf
NaNNaN
- -### `SQRT` - -``` -SQRT(X) -``` - -**Description** - -Computes the square root of X. Generates an error if X is less than 0. - - - - - - - - - - - - - - - - - - - - - - -
XSQRT(X)
25.05.0
+inf+inf
X < 0Error
- -**Return Data Type** - - - - - - - - - - -
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
OUTPUTDOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
- -### `TAN` - -``` -TAN(X) -``` + ST_LINESUBSTRING -**Description** + + + Gets a segment of a single linestring at a specific starting and + ending fraction. + + -Computes the tangent of X where X is specified in radians. Generates an error if -overflow occurs. + + ST_MAKELINE - - - - - - - - - - - - - - - - - - - - - -
XTAN(X)
+infNaN
-infNaN
NaNNaN
+ + + Creates a linestring GEOGRAPHY value by concatenating the point + and linestring vertices of GEOGRAPHY values. + + -### `TANH` + + ST_MAKEPOLYGON -``` -TANH(X) -``` + + + Constructs a polygon GEOGRAPHY value by combining + a polygon shell with polygon holes. + + -**Description** + + ST_MAKEPOLYGONORIENTED -Computes the hyperbolic tangent of X where X is specified in radians. Does not -fail. + + + Constructs a polygon GEOGRAPHY value, using an array of + linestring GEOGRAPHY values. The vertex ordering of each + linestring determines the orientation of each polygon ring. + + - - - - - - - - - - - - - - - - - - - - - -
XTANH(X)
+inf1.0
-inf-1.0
NaNNaN
+ + ST_MAXDISTANCE -### `TRUNC` + + + Gets the longest distance between two non-empty + GEOGRAPHY values. + + -``` -TRUNC(X [, N]) -``` + + ST_NPOINTS -**Description** + + + An alias of ST_NUMPOINTS. + + -If only X is present, `TRUNC` rounds X to the nearest integer whose absolute -value is not greater than the absolute value of X. If N is also present, `TRUNC` -behaves like `ROUND(X, N)`, but always rounds towards zero and never overflows. + + ST_NUMGEOMETRIES - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
XTRUNC(X)
2.02.0
2.32.0
2.82.0
2.52.0
-2.3-2.0
-2.8-2.0
-2.5-2.0
00
+inf+inf
-inf-inf
NaNNaN
+ + + Gets the number of geometries in a GEOGRAPHY value. + + -**Return Data Type** + + ST_NUMPOINTS - + + + - - - - - - - +
+ Gets the number of vertices in a GEOGRAPHY value. +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
OUTPUTDOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
ST_PERIMETER -
+ + + Gets the length of the boundary of the polygons in a + GEOGRAPHY value. + + -## Navigation functions + + ST_POINTN -ZetaSQL supports navigation functions. -Navigation functions are a subset window functions. To create a -window function call and learn about the syntax for window functions, -see [Window function_calls][window-function-calls]. + + + Gets the point at a specific index of a linestring GEOGRAPHY + value. + + -Navigation functions generally compute some -`value_expression` over a different row in the window frame from the -current row. The `OVER` clause syntax varies across navigation functions. + + ST_SIMPLIFY -For all navigation functions, the result data type is the same type as -`value_expression`. + + + Converts a GEOGRAPHY value into a simplified + GEOGRAPHY value, using tolerance. + + -### Function list + + ST_SNAPTOGRID - - - - - - - - + + + - - - - - - -
NameSummary
+ Produces a GEOGRAPHY value, where each vertex has + been snapped to a longitude/latitude grid. +
FIRST_VALUE + ST_STARTPOINT - Gets a value for the first row in the current window frame. + Gets the first point of a linestring GEOGRAPHY value.
LAG + ST_TOUCHES - Gets a value for a preceding row. + Checks if two GEOGRAPHY values intersect and their interiors + have no elements in common.
LAST_VALUE + ST_UNION - Gets a value for the last row in the current window frame. + Gets the point set union of multiple GEOGRAPHY values.
LEAD + ST_UNION_AGG - Gets a value for a subsequent row. + Aggregates over GEOGRAPHY values and gets their + point set union.
NTH_VALUE + ST_WITHIN - Gets a value for the Nth row of the current window frame. + Checks if one GEOGRAPHY value contains another + GEOGRAPHY value.
PERCENTILE_CONT + ST_X - Computes the specified percentile for a value, using - linear interpolation. + Gets the longitude from a point GEOGRAPHY value.
PERCENTILE_DISC + ST_Y - Computes the specified percentile for a discrete value. + Gets the latitude from a point GEOGRAPHY value.
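+The categories above compose naturally in a single query. As a brief
+illustrative sketch (the coordinates are arbitrary), the following combines a
+constructor (`ST_GEOGPOINT`), a parser (`ST_GEOGFROMTEXT`), a measure
+(`ST_DISTANCE`), and a predicate (`ST_CONTAINS`):
+
+```sql
+-- Construct two points, measure the distance between them in meters,
+-- and test whether the first point falls inside a WKT polygon.
+SELECT
+  ST_DISTANCE(ST_GEOGPOINT(-122.35, 47.62),
+              ST_GEOGPOINT(-122.33, 47.61)) AS distance_meters,
+  ST_CONTAINS(
+    ST_GEOGFROMTEXT('POLYGON((-123 46, -121 46, -121 48, -123 48, -123 46))'),
+    ST_GEOGPOINT(-122.35, 47.62)) AS is_inside;
+```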
-### `FIRST_VALUE`

+### `ST_ACCUM`

```sql
-FIRST_VALUE (value_expression [{RESPECT | IGNORE} NULLS])
-OVER over_clause
+ST_ACCUM(geography)
+```

-over_clause:
-  { named_window | ( [ window_specification ] ) }

+**Description**

-window_specification:
-  [ named_window ]
-  [ PARTITION BY partition_expression [, ...] ]
-  ORDER BY expression [ { ASC | DESC } ] [, ...]
-  [ window_frame_clause ]

+Takes a `GEOGRAPHY` and returns an array of
+`GEOGRAPHY` elements.
+This function is identical to [ARRAY_AGG][geography-link-array-agg],
+but only applies to `GEOGRAPHY` objects.
+
+**Return type**
+
+`ARRAY<GEOGRAPHY>`
+
+[geography-link-array-agg]: #array_agg
+
+### `ST_ANGLE`
+
+```sql
+ST_ANGLE(point_geography_1, point_geography_2, point_geography_3)
```

**Description**

-Returns the value of the `value_expression` for the first row in the current
-window frame.

+Takes three point `GEOGRAPHY` values, which represent two intersecting lines.
+Returns the angle between these lines. Point 2 and point 1 represent the first
+line and point 2 and point 3 represent the second line. The angle between
+these lines is in radians, in the range `[0, 2pi)`. The angle is measured
+clockwise from the first line to the second line.

-This function includes `NULL` values in the calculation unless `IGNORE NULLS` is
-present. If `IGNORE NULLS` is present, the function excludes `NULL` values from
-the calculation.

+`ST_ANGLE` has the following edge cases:

-To learn more about the `OVER` clause and how to use it, see
-[Window function calls][window-function-calls].

++ If points 2 and 3 are the same, returns `NULL`.
++ If points 2 and 1 are the same, returns `NULL`.
++ If points 2 and 3 are exactly antipodal, returns `NULL`.
++ If points 2 and 1 are exactly antipodal, returns `NULL`.
++ If any of the input geographies are not single points or are the empty
+  geography, then throws an error.
- +**Return type** -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md +`DOUBLE` - +**Example** -**Supported Argument Types** +```sql +WITH geos AS ( + SELECT 1 id, ST_GEOGPOINT(1, 0) geo1, ST_GEOGPOINT(0, 0) geo2, ST_GEOGPOINT(0, 1) geo3 UNION ALL + SELECT 2 id, ST_GEOGPOINT(0, 0), ST_GEOGPOINT(1, 0), ST_GEOGPOINT(0, 1) UNION ALL + SELECT 3 id, ST_GEOGPOINT(1, 0), ST_GEOGPOINT(0, 0), ST_GEOGPOINT(1, 0) UNION ALL + SELECT 4 id, ST_GEOGPOINT(1, 0) geo1, ST_GEOGPOINT(0, 0) geo2, ST_GEOGPOINT(0, 0) geo3 UNION ALL + SELECT 5 id, ST_GEOGPOINT(0, 0), ST_GEOGPOINT(-30, 0), ST_GEOGPOINT(150, 0) UNION ALL + SELECT 6 id, ST_GEOGPOINT(0, 0), NULL, NULL UNION ALL + SELECT 7 id, NULL, ST_GEOGPOINT(0, 0), NULL UNION ALL + SELECT 8 id, NULL, NULL, ST_GEOGPOINT(0, 0)) +SELECT ST_ANGLE(geo1,geo2,geo3) AS angle FROM geos ORDER BY id; -`value_expression` can be any data type that an expression can return. +/*---------------------* + | angle | + +---------------------+ + | 4.71238898038469 | + | 0.78547432161873854 | + | 0 | + | NULL | + | NULL | + | NULL | + | NULL | + | NULL | + *---------------------*/ +``` -**Return Data Type** +### `ST_AREA` -Same type as `value_expression`. +```sql +ST_AREA(geography_expression[, use_spheroid]) +``` -**Examples** +**Description** -The following example computes the fastest time for each division. +Returns the area in square meters covered by the polygons in the input +`GEOGRAPHY`. + +If `geography_expression` is a point or a line, returns zero. If +`geography_expression` is a collection, returns the area of the polygons in the +collection; if the collection does not contain polygons, returns zero. + +The optional `use_spheroid` parameter determines how this function measures +distance. If `use_spheroid` is `FALSE`, the function measures distance on the +surface of a perfect sphere. + +The `use_spheroid` parameter currently only supports +the value `FALSE`. 
The default value of `use_spheroid` is `FALSE`. + +**Return type** + +`DOUBLE` + +[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System + +### `ST_ASBINARY` ```sql -WITH finishers AS - (SELECT 'Sophia Liu' as name, - TIMESTAMP '2016-10-18 2:51:45' as finish_time, - 'F30-34' as division - UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' - UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' - UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' - UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' - UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' - UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' - UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34' - UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29' - UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34') -SELECT name, - FORMAT_TIMESTAMP('%X', finish_time) AS finish_time, - division, - FORMAT_TIMESTAMP('%X', fastest_time) AS fastest_time, - TIMESTAMP_DIFF(finish_time, fastest_time, SECOND) AS delta_in_seconds -FROM ( - SELECT name, - finish_time, - division, - FIRST_VALUE(finish_time) - OVER (PARTITION BY division ORDER BY finish_time ASC - ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS fastest_time - FROM finishers); - -/*-----------------+-------------+----------+--------------+------------------* - | name | finish_time | division | fastest_time | delta_in_seconds | - +-----------------+-------------+----------+--------------+------------------+ - | Carly Forte | 03:08:58 | F25-29 | 03:08:58 | 0 | - | Sophia Liu | 02:51:45 | F30-34 | 02:51:45 | 0 | - | Nikki Leith | 02:59:01 | F30-34 | 02:51:45 | 436 | - | Jen Edwards | 03:06:36 | F30-34 | 02:51:45 | 891 | - | Meghan Lederer | 03:07:41 | F30-34 | 02:51:45 | 956 | - | Lauren Reasoner | 03:10:14 | F30-34 | 02:51:45 | 1109 | - | Lisa 
Stelzner | 02:54:11 | F35-39 | 02:54:11 | 0 | - | Lauren Matthews | 03:01:17 | F35-39 | 02:54:11 | 426 | - | Desiree Berry | 03:05:42 | F35-39 | 02:54:11 | 691 | - | Suzy Slane | 03:06:24 | F35-39 | 02:54:11 | 733 | - *-----------------+-------------+----------+--------------+------------------*/ +ST_ASBINARY(geography_expression) ``` -### `LAG` +**Description** -```sql -LAG (value_expression[, offset [, default_expression]]) -OVER over_clause +Returns the [WKB][wkb-link] representation of an input +`GEOGRAPHY`. -over_clause: - { named_window | ( [ window_specification ] ) } +See [`ST_GEOGFROMWKB`][st-geogfromwkb] to construct a +`GEOGRAPHY` from WKB. -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - ORDER BY expression [ { ASC | DESC } ] [, ...] +**Return type** -``` +`BYTES` -**Description** +[wkb-link]: https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary -Returns the value of the `value_expression` on a preceding row. Changing the -`offset` value changes which preceding row is returned; the default value is -`1`, indicating the previous row in the window frame. An error occurs if -`offset` is NULL or a negative value. +[st-geogfromwkb]: #st_geogfromwkb -The optional `default_expression` is used if there isn't a row in the window -frame at the specified offset. This expression must be a constant expression and -its type must be implicitly coercible to the type of `value_expression`. If left -unspecified, `default_expression` defaults to NULL. +### `ST_ASGEOJSON` -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. +```sql +ST_ASGEOJSON(geography_expression) +``` - +**Description** -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md +Returns the [RFC 7946][GeoJSON-spec-link] compliant [GeoJSON][geojson-link] +representation of the input `GEOGRAPHY`. 
- +A ZetaSQL `GEOGRAPHY` has spherical +geodesic edges, whereas a GeoJSON `Geometry` object explicitly has planar edges. +To convert between these two types of edges, ZetaSQL adds additional +points to the line where necessary so that the resulting sequence of edges +remains within 10 meters of the original edge. -**Supported Argument Types** +See [`ST_GEOGFROMGEOJSON`][st-geogfromgeojson] to construct a +`GEOGRAPHY` from GeoJSON. -+ `value_expression` can be any data type that can be returned from an - expression. -+ `offset` must be a non-negative integer literal or parameter. -+ `default_expression` must be compatible with the value expression type. +**Return type** -**Return Data Type** +`STRING` -Same type as `value_expression`. +[geojson-spec-link]: https://tools.ietf.org/html/rfc7946 -**Examples** +[geojson-link]: https://en.wikipedia.org/wiki/GeoJSON -The following example illustrates a basic use of the `LAG` function. +[st-geogfromgeojson]: #st_geogfromgeojson -```sql -WITH finishers AS - (SELECT 'Sophia Liu' as name, - TIMESTAMP '2016-10-18 2:51:45' as finish_time, - 'F30-34' as division - UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' - UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' - UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' - UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' - UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' - UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' - UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34' - UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29' - UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34') -SELECT name, - finish_time, - division, - LAG(name) - OVER (PARTITION BY division ORDER BY finish_time ASC) AS preceding_runner -FROM finishers; +### `ST_ASKML` 
-/*-----------------+-------------+----------+------------------* - | name | finish_time | division | preceding_runner | - +-----------------+-------------+----------+------------------+ - | Carly Forte | 03:08:58 | F25-29 | NULL | - | Sophia Liu | 02:51:45 | F30-34 | NULL | - | Nikki Leith | 02:59:01 | F30-34 | Sophia Liu | - | Jen Edwards | 03:06:36 | F30-34 | Nikki Leith | - | Meghan Lederer | 03:07:41 | F30-34 | Jen Edwards | - | Lauren Reasoner | 03:10:14 | F30-34 | Meghan Lederer | - | Lisa Stelzner | 02:54:11 | F35-39 | NULL | - | Lauren Matthews | 03:01:17 | F35-39 | Lisa Stelzner | - | Desiree Berry | 03:05:42 | F35-39 | Lauren Matthews | - | Suzy Slane | 03:06:24 | F35-39 | Desiree Berry | - *-----------------+-------------+----------+------------------*/ +```sql +ST_ASKML(geography) ``` -This next example uses the optional `offset` parameter. +**Description** -```sql -WITH finishers AS - (SELECT 'Sophia Liu' as name, - TIMESTAMP '2016-10-18 2:51:45' as finish_time, - 'F30-34' as division - UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' - UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' - UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' - UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' - UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' - UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' - UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34' - UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29' - UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34') -SELECT name, - finish_time, - division, - LAG(name, 2) - OVER (PARTITION BY division ORDER BY finish_time ASC) AS two_runners_ahead -FROM finishers; +Takes a `GEOGRAPHY` and returns a `STRING` [KML geometry][kml-geometry-link]. 
+Coordinates are formatted with as few digits as possible without loss +of precision. -/*-----------------+-------------+----------+-------------------* - | name | finish_time | division | two_runners_ahead | - +-----------------+-------------+----------+-------------------+ - | Carly Forte | 03:08:58 | F25-29 | NULL | - | Sophia Liu | 02:51:45 | F30-34 | NULL | - | Nikki Leith | 02:59:01 | F30-34 | NULL | - | Jen Edwards | 03:06:36 | F30-34 | Sophia Liu | - | Meghan Lederer | 03:07:41 | F30-34 | Nikki Leith | - | Lauren Reasoner | 03:10:14 | F30-34 | Jen Edwards | - | Lisa Stelzner | 02:54:11 | F35-39 | NULL | - | Lauren Matthews | 03:01:17 | F35-39 | NULL | - | Desiree Berry | 03:05:42 | F35-39 | Lisa Stelzner | - | Suzy Slane | 03:06:24 | F35-39 | Lauren Matthews | - *-----------------+-------------+----------+-------------------*/ -``` +**Return type** -The following example replaces NULL values with a default value. +`STRING` -```sql -WITH finishers AS - (SELECT 'Sophia Liu' as name, - TIMESTAMP '2016-10-18 2:51:45' as finish_time, - 'F30-34' as division - UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' - UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' - UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' - UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' - UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' - UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' - UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34' - UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29' - UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34') -SELECT name, - finish_time, - division, - LAG(name, 2, 'Nobody') - OVER (PARTITION BY division ORDER BY finish_time ASC) AS two_runners_ahead -FROM finishers; +[kml-geometry-link]: 
https://developers.google.com/kml/documentation/kmlreference#geometry -/*-----------------+-------------+----------+-------------------* - | name | finish_time | division | two_runners_ahead | - +-----------------+-------------+----------+-------------------+ - | Carly Forte | 03:08:58 | F25-29 | Nobody | - | Sophia Liu | 02:51:45 | F30-34 | Nobody | - | Nikki Leith | 02:59:01 | F30-34 | Nobody | - | Jen Edwards | 03:06:36 | F30-34 | Sophia Liu | - | Meghan Lederer | 03:07:41 | F30-34 | Nikki Leith | - | Lauren Reasoner | 03:10:14 | F30-34 | Jen Edwards | - | Lisa Stelzner | 02:54:11 | F35-39 | Nobody | - | Lauren Matthews | 03:01:17 | F35-39 | Nobody | - | Desiree Berry | 03:05:42 | F35-39 | Lisa Stelzner | - | Suzy Slane | 03:06:24 | F35-39 | Lauren Matthews | - *-----------------+-------------+----------+-------------------*/ +### `ST_ASTEXT` + +```sql +ST_ASTEXT(geography_expression) ``` -### `LAST_VALUE` +**Description** -```sql -LAST_VALUE (value_expression [{RESPECT | IGNORE} NULLS]) -OVER over_clause +Returns the [WKT][wkt-link] representation of an input +`GEOGRAPHY`. -over_clause: - { named_window | ( [ window_specification ] ) } +See [`ST_GEOGFROMTEXT`][st-geogfromtext] to construct a +`GEOGRAPHY` from WKT. -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - ORDER BY expression [ { ASC | DESC } ] [, ...] - [ window_frame_clause ] +**Return type** -``` +`STRING` -**Description** +[wkt-link]: https://en.wikipedia.org/wiki/Well-known_text -Returns the value of the `value_expression` for the last row in the current -window frame. +[st-geogfromtext]: #st_geogfromtext -This function includes `NULL` values in the calculation unless `IGNORE NULLS` is -present. If `IGNORE NULLS` is present, the function excludes `NULL` values from -the calculation. +### `ST_AZIMUTH` -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. 
+```sql +ST_AZIMUTH(point_geography_1, point_geography_2) +``` - +**Description** -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md +Takes two point `GEOGRAPHY` values, and returns the azimuth of the line segment +formed by points 1 and 2. The azimuth is the angle in radians measured between +the line from point 1 facing true North to the line segment from point 1 to +point 2. - +The positive angle is measured clockwise on the surface of a sphere. For +example, the azimuth for a line segment: -**Supported Argument Types** ++ Pointing North is `0` ++ Pointing East is `PI/2` ++ Pointing South is `PI` ++ Pointing West is `3PI/2` -`value_expression` can be any data type that an expression can return. +`ST_AZIMUTH` has the following edge cases: -**Return Data Type** ++ If the two input points are the same, returns `NULL`. ++ If the two input points are exactly antipodal, returns `NULL`. ++ If either of the input geographies are not single points or are the empty + geography, throws an error. -Same type as `value_expression`. +**Return type** -**Examples** +`DOUBLE` -The following example computes the slowest time for each division. 
+**Example** ```sql -WITH finishers AS - (SELECT 'Sophia Liu' as name, - TIMESTAMP '2016-10-18 2:51:45' as finish_time, - 'F30-34' as division - UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' - UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' - UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' - UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' - UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' - UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' - UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34' - UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29' - UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34') -SELECT name, - FORMAT_TIMESTAMP('%X', finish_time) AS finish_time, - division, - FORMAT_TIMESTAMP('%X', slowest_time) AS slowest_time, - TIMESTAMP_DIFF(slowest_time, finish_time, SECOND) AS delta_in_seconds -FROM ( - SELECT name, - finish_time, - division, - LAST_VALUE(finish_time) - OVER (PARTITION BY division ORDER BY finish_time ASC - ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS slowest_time - FROM finishers); +WITH geos AS ( + SELECT 1 id, ST_GEOGPOINT(1, 0) AS geo1, ST_GEOGPOINT(0, 0) AS geo2 UNION ALL + SELECT 2, ST_GEOGPOINT(0, 0), ST_GEOGPOINT(1, 0) UNION ALL + SELECT 3, ST_GEOGPOINT(0, 0), ST_GEOGPOINT(0, 1) UNION ALL + -- identical + SELECT 4, ST_GEOGPOINT(0, 0), ST_GEOGPOINT(0, 0) UNION ALL + -- antipode + SELECT 5, ST_GEOGPOINT(-30, 0), ST_GEOGPOINT(150, 0) UNION ALL + -- nulls + SELECT 6, ST_GEOGPOINT(0, 0), NULL UNION ALL + SELECT 7, NULL, ST_GEOGPOINT(0, 0)) +SELECT ST_AZIMUTH(geo1, geo2) AS azimuth FROM geos ORDER BY id; -/*-----------------+-------------+----------+--------------+------------------* - | name | finish_time | division | slowest_time | delta_in_seconds | - 
+-----------------+-------------+----------+--------------+------------------+ - | Carly Forte | 03:08:58 | F25-29 | 03:08:58 | 0 | - | Sophia Liu | 02:51:45 | F30-34 | 03:10:14 | 1109 | - | Nikki Leith | 02:59:01 | F30-34 | 03:10:14 | 673 | - | Jen Edwards | 03:06:36 | F30-34 | 03:10:14 | 218 | - | Meghan Lederer | 03:07:41 | F30-34 | 03:10:14 | 153 | - | Lauren Reasoner | 03:10:14 | F30-34 | 03:10:14 | 0 | - | Lisa Stelzner | 02:54:11 | F35-39 | 03:06:24 | 733 | - | Lauren Matthews | 03:01:17 | F35-39 | 03:06:24 | 307 | - | Desiree Berry | 03:05:42 | F35-39 | 03:06:24 | 42 | - | Suzy Slane | 03:06:24 | F35-39 | 03:06:24 | 0 | - *-----------------+-------------+----------+--------------+------------------*/ +/*--------------------* + | azimuth | + +--------------------+ + | 4.71238898038469 | + | 1.5707963267948966 | + | 0 | + | NULL | + | NULL | + | NULL | + | NULL | + *--------------------*/ ``` -### `LEAD` +### `ST_BOUNDARY` ```sql -LEAD (value_expression[, offset [, default_expression]]) -OVER over_clause - -over_clause: - { named_window | ( [ window_specification ] ) } - -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - ORDER BY expression [ { ASC | DESC } ] [, ...] - +ST_BOUNDARY(geography_expression) ``` **Description** -Returns the value of the `value_expression` on a subsequent row. Changing the -`offset` value changes which subsequent row is returned; the default value is -`1`, indicating the next row in the window frame. An error occurs if `offset` is -NULL or a negative value. +Returns a single `GEOGRAPHY` that contains the union +of the boundaries of each component in the given input +`GEOGRAPHY`. -The optional `default_expression` is used if there isn't a row in the window -frame at the specified offset. This expression must be a constant expression and -its type must be implicitly coercible to the type of `value_expression`. If left -unspecified, `default_expression` defaults to NULL. 
+The boundary of each component of a `GEOGRAPHY` is +defined as follows: -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. ++ The boundary of a point is empty. ++ The boundary of a linestring consists of the endpoints of the linestring. ++ The boundary of a polygon consists of the linestrings that form the polygon + shell and each of the polygon's holes. - +**Return type** -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md +`GEOGRAPHY` - +### `ST_BOUNDINGBOX` -**Supported Argument Types** +```sql +ST_BOUNDINGBOX(geography_expression) +``` -+ `value_expression` can be any data type that can be returned from an - expression. -+ `offset` must be a non-negative integer literal or parameter. -+ `default_expression` must be compatible with the value expression type. +**Description** -**Return Data Type** +Returns a `STRUCT` that represents the bounding box for the specified geography. +The bounding box is the minimal rectangle that encloses the geography. The edges +of the rectangle follow constant lines of longitude and latitude. -Same type as `value_expression`. +Caveats: -**Examples** ++ Returns `NULL` if the input is `NULL` or an empty geography. ++ The bounding box might cross the antimeridian if this allows for a smaller + rectangle. In this case, the bounding box has one of its longitudinal bounds + outside of the [-180, 180] range, so that `xmin` is smaller than the eastmost + value `xmax`. -The following example illustrates a basic use of the `LEAD` function. 
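+The antimeridian caveat can be illustrated directly (a minimal sketch; the
+polygon literal is a hypothetical shape crossing 180 degrees longitude, and no
+exact output is claimed here):
+
+```sql
+-- This polygon crosses the antimeridian, so the box is reported with a
+-- longitudinal bound outside of [-180, 180] to keep xmin smaller than xmax.
+SELECT ST_BOUNDINGBOX(
+  ST_GEOGFROMTEXT('POLYGON((170 10, -170 10, -170 20, 170 20, 170 10))')) AS box;
+```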
+**Return type**

-```sql
-WITH finishers AS
-  (SELECT 'Sophia Liu' as name,
-   TIMESTAMP '2016-10-18 2:51:45' as finish_time,
-   'F30-34' as division
-   UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
-   UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
-   UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
-   UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
-   UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
-   UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
-   UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
-   UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
-   UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
-SELECT name,
-  finish_time,
-  division,
-  LEAD(name)
-    OVER (PARTITION BY division ORDER BY finish_time ASC) AS followed_by
-FROM finishers;
+`STRUCT<xmin DOUBLE, ymin DOUBLE, xmax DOUBLE, ymax DOUBLE>`.

-/*-----------------+-------------+----------+-----------------*
- | name            | finish_time | division | followed_by     |
- +-----------------+-------------+----------+-----------------+
- | Carly Forte     | 03:08:58    | F25-29   | NULL            |
- | Sophia Liu      | 02:51:45    | F30-34   | Nikki Leith     |
- | Nikki Leith     | 02:59:01    | F30-34   | Jen Edwards     |
- | Jen Edwards     | 03:06:36    | F30-34   | Meghan Lederer  |
- | Meghan Lederer  | 03:07:41    | F30-34   | Lauren Reasoner |
- | Lauren Reasoner | 03:10:14    | F30-34   | NULL            |
- | Lisa Stelzner   | 02:54:11    | F35-39   | Lauren Matthews |
- | Lauren Matthews | 03:01:17    | F35-39   | Desiree Berry   |
- | Desiree Berry   | 03:05:42    | F35-39   | Suzy Slane      |
- | Suzy Slane      | 03:06:24    | F35-39   | NULL            |
- *-----------------+-------------+----------+-----------------*/
-```

+Bounding box parts:

-This next example uses the optional `offset` parameter.
++ `xmin`: The westmost constant longitude line that bounds the rectangle.
++ `xmax`: The eastmost constant longitude line that bounds the rectangle.
++ `ymin`: The minimum constant latitude line that bounds the rectangle. ++ `ymax`: The maximum constant latitude line that bounds the rectangle. + +**Example** ```sql -WITH finishers AS - (SELECT 'Sophia Liu' as name, - TIMESTAMP '2016-10-18 2:51:45' as finish_time, - 'F30-34' as division - UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' - UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' - UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' - UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' - UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' - UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' - UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34' - UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29' - UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34') -SELECT name, - finish_time, - division, - LEAD(name, 2) - OVER (PARTITION BY division ORDER BY finish_time ASC) AS two_runners_back -FROM finishers; +WITH data AS ( + SELECT 1 id, ST_GEOGFROMTEXT('POLYGON((-125 48, -124 46, -117 46, -117 49, -125 48))') g + UNION ALL + SELECT 2 id, ST_GEOGFROMTEXT('POLYGON((172 53, -130 55, -141 70, 172 53))') g + UNION ALL + SELECT 3 id, ST_GEOGFROMTEXT('POINT EMPTY') g + UNION ALL + SELECT 4 id, ST_GEOGFROMTEXT('POLYGON((172 53, -141 70, -130 55, 172 53))', oriented => TRUE) +) +SELECT id, ST_BOUNDINGBOX(g) AS box +FROM data -/*-----------------+-------------+----------+------------------* - | name | finish_time | division | two_runners_back | - +-----------------+-------------+----------+------------------+ - | Carly Forte | 03:08:58 | F25-29 | NULL | - | Sophia Liu | 02:51:45 | F30-34 | Jen Edwards | - | Nikki Leith | 02:59:01 | F30-34 | Meghan Lederer | - | Jen Edwards | 03:06:36 | F30-34 | Lauren Reasoner | - | Meghan Lederer | 03:07:41 | F30-34 | NULL | - | 
Lauren Reasoner | 03:10:14 | F30-34 | NULL | - | Lisa Stelzner | 02:54:11 | F35-39 | Desiree Berry | - | Lauren Matthews | 03:01:17 | F35-39 | Suzy Slane | - | Desiree Berry | 03:05:42 | F35-39 | NULL | - | Suzy Slane | 03:06:24 | F35-39 | NULL | - *-----------------+-------------+----------+------------------*/ +/*----+------------------------------------------* + | id | box | + +----+------------------------------------------+ + | 1 | {xmin:-125, ymin:46, xmax:-117, ymax:49} | + | 2 | {xmin:172, ymin:53, xmax:230, ymax:70} | + | 3 | NULL | + | 4 | {xmin:-180, ymin:-90, xmax:180, ymax:90} | + *----+------------------------------------------*/ ``` -The following example replaces NULL values with a default value. +See [`ST_EXTENT`][st-extent] for the aggregate version of `ST_BOUNDINGBOX`. + +[st-extent]: #st_extent + +### `ST_BUFFER` ```sql -WITH finishers AS - (SELECT 'Sophia Liu' as name, - TIMESTAMP '2016-10-18 2:51:45' as finish_time, - 'F30-34' as division - UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' - UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' - UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' - UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' - UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' - UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' - UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34' - UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29' - UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34') -SELECT name, - finish_time, - division, - LEAD(name, 2, 'Nobody') - OVER (PARTITION BY division ORDER BY finish_time ASC) AS two_runners_back -FROM finishers; - -/*-----------------+-------------+----------+------------------* - | name | finish_time | division | two_runners_back | - 
+-----------------+-------------+----------+------------------+ - | Carly Forte | 03:08:58 | F25-29 | Nobody | - | Sophia Liu | 02:51:45 | F30-34 | Jen Edwards | - | Nikki Leith | 02:59:01 | F30-34 | Meghan Lederer | - | Jen Edwards | 03:06:36 | F30-34 | Lauren Reasoner | - | Meghan Lederer | 03:07:41 | F30-34 | Nobody | - | Lauren Reasoner | 03:10:14 | F30-34 | Nobody | - | Lisa Stelzner | 02:54:11 | F35-39 | Desiree Berry | - | Lauren Matthews | 03:01:17 | F35-39 | Suzy Slane | - | Desiree Berry | 03:05:42 | F35-39 | Nobody | - | Suzy Slane | 03:06:24 | F35-39 | Nobody | - *-----------------+-------------+----------+------------------*/ +ST_BUFFER( + geography, + buffer_radius + [, num_seg_quarter_circle => num_segments] + [, use_spheroid => boolean_expression] + [, endcap => endcap_style] + [, side => line_side]) ``` -### `NTH_VALUE` +**Description** -```sql -NTH_VALUE (value_expression, constant_integer_expression [{RESPECT | IGNORE} NULLS]) -OVER over_clause +Returns a `GEOGRAPHY` that represents the buffer around the input `GEOGRAPHY`. +This function is similar to [`ST_BUFFERWITHTOLERANCE`][st-bufferwithtolerance], +but you specify the number of segments instead of providing tolerance to +determine how much the resulting geography can deviate from the ideal +buffer radius. -over_clause: - { named_window | ( [ window_specification ] ) } ++ `geography`: The input `GEOGRAPHY` to encircle with the buffer radius. ++ `buffer_radius`: `DOUBLE` that represents the radius of the + buffer around the input geography. The radius is in meters. Note that + polygons contract when buffered with a negative `buffer_radius`. Polygon + shells and holes that are contracted to a point are discarded. ++ `num_seg_quarter_circle`: (Optional) `DOUBLE` specifies the + number of segments that are used to approximate a quarter circle. The + default value is `8.0`. Naming this argument is optional. 
++ `endcap`: (Optional) `STRING` allows you to specify one of two endcap + styles: `ROUND` and `FLAT`. The default value is `ROUND`. This option only + affects the endcaps of buffered linestrings. ++ `side`: (Optional) `STRING` allows you to specify one of three possibilities + for lines: `BOTH`, `LEFT`, and `RIGHT`. The default is `BOTH`. This option + only affects how linestrings are buffered. ++ `use_spheroid`: (Optional) `BOOL` determines how this function measures + distance. If `use_spheroid` is `FALSE`, the function measures distance on + the surface of a perfect sphere. The `use_spheroid` parameter + currently only supports the value `FALSE`. The default value of + `use_spheroid` is `FALSE`. -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - ORDER BY expression [ { ASC | DESC } ] [, ...] - [ window_frame_clause ] +**Return type** -``` +Polygon `GEOGRAPHY` -**Description** +**Example** -Returns the value of `value_expression` at the Nth row of the current window -frame, where Nth is defined by `constant_integer_expression`. Returns NULL if -there is no such row. +The following example shows the result of `ST_BUFFER` on a point. A buffered +point is an approximated circle. When `num_seg_quarter_circle = 2`, there are +two line segments in a quarter circle, and therefore the buffered circle has +eight sides and [`ST_NUMPOINTS`][st-numpoints] returns nine vertices. When +`num_seg_quarter_circle = 8`, there are eight line segments in a quarter circle, +and therefore the buffered circle has thirty-two sides and +[`ST_NUMPOINTS`][st-numpoints] returns thirty-three vertices. -This function includes `NULL` values in the calculation unless `IGNORE NULLS` is -present. If `IGNORE NULLS` is present, the function excludes `NULL` values from -the calculation. 
+```sql +SELECT + -- num_seg_quarter_circle=2 + ST_NUMPOINTS(ST_BUFFER(ST_GEOGFROMTEXT('POINT(1 2)'), 50, 2)) AS eight_sides, + -- num_seg_quarter_circle=8, since 8 is the default + ST_NUMPOINTS(ST_BUFFER(ST_GEOGFROMTEXT('POINT(100 2)'), 50)) AS thirty_two_sides; -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. +/*-------------+------------------* + | eight_sides | thirty_two_sides | + +-------------+------------------+ + | 9 | 33 | + *-------------+------------------*/ +``` - +[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md +[st-bufferwithtolerance]: #st_bufferwithtolerance - +[st-numpoints]: #st_numpoints -**Supported Argument Types** +### `ST_BUFFERWITHTOLERANCE` -+ `value_expression` can be any data type that can be returned from an - expression. -+ `constant_integer_expression` can be any constant expression that returns an - integer. +```sql +ST_BUFFERWITHTOLERANCE( + geography, + buffer_radius, + tolerance_meters => tolerance + [, use_spheroid => boolean_expression] + [, endcap => endcap_style] + [, side => line_side]) +``` -**Return Data Type** +Returns a `GEOGRAPHY` that represents the buffer around the input `GEOGRAPHY`. +This function is similar to [`ST_BUFFER`][st-buffer], +but you provide tolerance instead of segments to determine how much the +resulting geography can deviate from the ideal buffer radius. -Same type as `value_expression`. ++ `geography`: The input `GEOGRAPHY` to encircle with the buffer radius. ++ `buffer_radius`: `DOUBLE` that represents the radius of the + buffer around the input geography. The radius is in meters. Note that + polygons contract when buffered with a negative `buffer_radius`. Polygon + shells and holes that are contracted to a point are discarded. 
++ `tolerance_meters`: `DOUBLE` specifies a tolerance in
+  meters with which the shape is approximated. Tolerance determines how much a
+  polygon can deviate from the ideal radius. Naming this argument is optional.
++ `endcap`: (Optional) `STRING` allows you to specify one of two endcap
+  styles: `ROUND` and `FLAT`. The default value is `ROUND`. This option only
+  affects the endcaps of buffered linestrings.
++ `side`: (Optional) `STRING` allows you to specify one of three possible line
+  styles: `BOTH`, `LEFT`, and `RIGHT`. The default is `BOTH`. This option only
+  affects how linestrings are buffered.
++ `use_spheroid`: (Optional) `BOOL` determines how this function measures
+  distance. If `use_spheroid` is `FALSE`, the function measures distance on
+  the surface of a perfect sphere. The `use_spheroid` parameter
+  currently only supports the value `FALSE`. The default value of
+  `use_spheroid` is `FALSE`.

-**Examples**
+**Return type**

-```sql
-WITH finishers AS
-  (SELECT 'Sophia Liu' as name,
-   TIMESTAMP '2016-10-18 2:51:45' as finish_time,
-   'F30-34' as division
-   UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
-   UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
-   UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
-   UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
-   UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
-   UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
-   UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
-   UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
-   UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
-SELECT name,
-  FORMAT_TIMESTAMP('%X', finish_time) AS finish_time,
-  division,
-  FORMAT_TIMESTAMP('%X', fastest_time) AS fastest_time,
-  FORMAT_TIMESTAMP('%X', second_fastest) AS second_fastest
-FROM (
-  SELECT name,
-  finish_time,
-    division,finishers,
-    FIRST_VALUE(finish_time)
-      OVER w1 AS fastest_time,
-    NTH_VALUE(finish_time, 2)
-      OVER w1 as second_fastest
-  FROM finishers
-  WINDOW w1 AS (
-    PARTITION BY division ORDER BY finish_time ASC
-    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING));

+Polygon `GEOGRAPHY`

-/*-----------------+-------------+----------+--------------+----------------*
- | name            | finish_time | division | fastest_time | second_fastest |
- +-----------------+-------------+----------+--------------+----------------+
- | Carly Forte     | 03:08:58    | F25-29   | 03:08:58     | NULL           |
- | Sophia Liu      | 02:51:45    | F30-34   | 02:51:45     | 02:59:01       |
- | Nikki Leith     | 02:59:01    | F30-34   | 02:51:45     | 02:59:01       |
- | Jen Edwards     | 03:06:36    | F30-34   | 02:51:45     | 02:59:01       |
- | Meghan Lederer  | 03:07:41    | F30-34   | 02:51:45     | 02:59:01       |
- | Lauren Reasoner | 03:10:14    | F30-34   | 02:51:45     | 02:59:01       |
- | Lisa Stelzner   | 02:54:11    | F35-39   | 02:54:11     | 03:01:17       |
- | Lauren Matthews | 03:01:17    | F35-39   | 02:54:11     | 03:01:17       |
- | Desiree Berry   | 03:05:42    | F35-39   | 02:54:11     | 03:01:17       |
- | Suzy Slane      | 03:06:24    | F35-39   | 02:54:11     | 03:01:17       |
- *-----------------+-------------+----------+--------------+----------------*/
-```

+**Example**

-### `PERCENTILE_CONT`
+The following example shows the results of `ST_BUFFERWITHTOLERANCE` on a point,
+given two different values for tolerance but with the same buffer radius of
+`100`. A buffered point is an approximated circle. When `tolerance_meters=25`,
+the tolerance is a large percentage of the buffer radius, and therefore only
+five segments are used to approximate a circle around the input point. When
+`tolerance_meters=1`, the tolerance is a much smaller percentage of the buffer
+radius, and therefore twenty-four edges are used to approximate a circle around
+the input point.
```sql -PERCENTILE_CONT (value_expression, percentile [{RESPECT | IGNORE} NULLS]) -OVER over_clause - -over_clause: - { named_window | ( [ window_specification ] ) } - -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] +SELECT + -- tolerance_meters=25, or 25% of the buffer radius. + ST_NumPoints(ST_BUFFERWITHTOLERANCE(ST_GEOGFROMTEXT('POINT(1 2)'), 100, 25)) AS five_sides, + -- tolerance_meters=1, or 1% of the buffer radius. + st_NumPoints(ST_BUFFERWITHTOLERANCE(ST_GEOGFROMTEXT('POINT(100 2)'), 100, 1)) AS twenty_four_sides; +/*------------+-------------------* + | five_sides | twenty_four_sides | + +------------+-------------------+ + | 6 | 24 | + *------------+-------------------*/ ``` -**Description** +[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System -Computes the specified percentile value for the value_expression, with linear -interpolation. +[st-buffer]: #st_buffer -This function ignores NULL -values if -`RESPECT NULLS` is absent. If `RESPECT NULLS` is present: +### `ST_CENTROID` -+ Interpolation between two `NULL` values returns `NULL`. -+ Interpolation between a `NULL` value and a non-`NULL` value returns the - non-`NULL` value. +```sql +ST_CENTROID(geography_expression) +``` -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. +**Description** - +Returns the _centroid_ of the input `GEOGRAPHY` as a single point `GEOGRAPHY`. -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md +The _centroid_ of a `GEOGRAPHY` is the weighted average of the centroids of the +highest-dimensional components in the `GEOGRAPHY`. The centroid for components +in each dimension is defined as follows: - ++ The centroid of points is the arithmetic mean of the input coordinates. ++ The centroid of linestrings is the centroid of all the edges weighted by + length. 
The centroid of each edge is the geodesic midpoint of the edge. ++ The centroid of a polygon is its center of mass. -`PERCENTILE_CONT` can be used with differential privacy. To learn more, see -[Differentially private aggregate functions][dp-functions]. +If the input `GEOGRAPHY` is empty, an empty `GEOGRAPHY` is returned. -**Supported Argument Types** +**Constraints** -+ `value_expression` and `percentile` must have one of the following types: - + `NUMERIC` - + `BIGNUMERIC` - + `DOUBLE` -+ `percentile` must be a literal in the range `[0, 1]`. +In the unlikely event that the centroid of a `GEOGRAPHY` cannot be defined by a +single point on the surface of the Earth, a deterministic but otherwise +arbitrary point is returned. This can only happen if the centroid is exactly at +the center of the Earth, such as the centroid for a pair of antipodal points, +and the likelihood of this happening is vanishingly small. -**Return Data Type** +**Return type** -The return data type is determined by the argument types with the following -table. - +Point `GEOGRAPHY` - - - - - - - - - - +### `ST_CLOSESTPOINT` -
INPUTNUMERICBIGNUMERICDOUBLE
NUMERICNUMERICBIGNUMERICDOUBLE
BIGNUMERICBIGNUMERICBIGNUMERICDOUBLE
DOUBLEDOUBLEDOUBLEDOUBLE
+```sql +ST_CLOSESTPOINT(geography_1, geography_2[, use_spheroid]) +``` -**Examples** +**Description** -The following example computes the value for some percentiles from a column of -values while ignoring nulls. +Returns a `GEOGRAPHY` containing a point on +`geography_1` with the smallest possible distance to `geography_2`. This implies +that the distance between the point returned by `ST_CLOSESTPOINT` and +`geography_2` is less than or equal to the distance between any other point on +`geography_1` and `geography_2`. -```sql -SELECT - PERCENTILE_CONT(x, 0) OVER() AS min, - PERCENTILE_CONT(x, 0.01) OVER() AS percentile1, - PERCENTILE_CONT(x, 0.5) OVER() AS median, - PERCENTILE_CONT(x, 0.9) OVER() AS percentile90, - PERCENTILE_CONT(x, 1) OVER() AS max -FROM UNNEST([0, 3, NULL, 1, 2]) AS x LIMIT 1; +If either of the input `GEOGRAPHY`s is empty, `ST_CLOSESTPOINT` returns `NULL`. - /*-----+-------------+--------+--------------+-----* - | min | percentile1 | median | percentile90 | max | - +-----+-------------+--------+--------------+-----+ - | 0 | 0.03 | 1.5 | 2.7 | 3 | - *-----+-------------+--------+--------------+-----+ -``` +The optional `use_spheroid` parameter determines how this function measures +distance. If `use_spheroid` is `FALSE`, the function measures distance on the +surface of a perfect sphere. -The following example computes the value for some percentiles from a column of -values while respecting nulls. +The `use_spheroid` parameter currently only supports +the value `FALSE`. The default value of `use_spheroid` is `FALSE`. 
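+
+**Example**
+
+As a minimal illustration (this query is a sketch, not from the original
+reference), the following finds the point on the linestring
+`LINESTRING(0 0, 10 0)` that is closest to `POINT(5 5)`. The result is a point
+on the linestring near `POINT(5 0)`; the exact coordinates returned depend on
+the geodesic computation, so no output is shown here.
+
+```sql
+SELECT ST_CLOSESTPOINT(
+  ST_GEOGFROMTEXT('LINESTRING(0 0, 10 0)'),
+  ST_GEOGFROMTEXT('POINT(5 5)')) AS closest;
+```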
-```sql -SELECT - PERCENTILE_CONT(x, 0 RESPECT NULLS) OVER() AS min, - PERCENTILE_CONT(x, 0.01 RESPECT NULLS) OVER() AS percentile1, - PERCENTILE_CONT(x, 0.5 RESPECT NULLS) OVER() AS median, - PERCENTILE_CONT(x, 0.9 RESPECT NULLS) OVER() AS percentile90, - PERCENTILE_CONT(x, 1 RESPECT NULLS) OVER() AS max -FROM UNNEST([0, 3, NULL, 1, 2]) AS x LIMIT 1; +**Return type** -/*------+-------------+--------+--------------+-----* - | min | percentile1 | median | percentile90 | max | - +------+-------------+--------+--------------+-----+ - | NULL | 0 | 1 | 2.6 | 3 | - *------+-------------+--------+--------------+-----+ -``` +Point `GEOGRAPHY` -[dp-functions]: #aggregate-dp-functions +[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System -### `PERCENTILE_DISC` +### `ST_CLUSTERDBSCAN` ```sql -PERCENTILE_DISC (value_expression, percentile [{RESPECT | IGNORE} NULLS]) +ST_CLUSTERDBSCAN(geography_column, epsilon, minimum_geographies) OVER over_clause over_clause: @@ -15954,18 +15065,12 @@ over_clause: window_specification: [ named_window ] [ PARTITION BY partition_expression [, ...] ] + [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] ``` -**Description** - -Computes the specified percentile value for a discrete `value_expression`. The -returned value is the first sorted value of `value_expression` with cumulative -distribution greater than or equal to the given `percentile` value. - -This function ignores `NULL` -values unless -`RESPECT NULLS` is present. +Performs [DBSCAN clustering][dbscan-link] on a column of geographies. Returns a +0-based cluster number. To learn more about the `OVER` clause and how to use it, see [Window function calls][window-function-calls]. @@ -15976,5393 +15081,4694 @@ To learn more about the `OVER` clause and how to use it, see -**Supported Argument Types** +**Input parameters** -+ `value_expression` can be any orderable type. 
-+ `percentile` must be a literal in the range `[0, 1]`, with one of the
-  following types:
-  + `NUMERIC`
-  + `BIGNUMERIC`
-  + `DOUBLE`

++ `geography_column`: A column of `GEOGRAPHY`s that
+  is clustered.
++ `epsilon`: The epsilon that specifies the radius, measured in meters, around
+  a core value. Non-negative `DOUBLE` value.
++ `minimum_geographies`: Specifies the minimum number of geographies in a
+  single cluster. Only dense input forms a cluster; otherwise, it is classified
+  as noise. Non-negative `INT64` value.

-**Return Data Type**

+**Geography types and the DBSCAN algorithm**

-Same type as `value_expression`.

+The DBSCAN algorithm identifies high-density clusters of data and marks outliers
+in low-density areas of noise. Geographies passed in through `geography_column`
+are classified in one of three ways by the DBSCAN algorithm:
+
++ Core value: A geography is a core value if it is within `epsilon` distance
+  of `minimum_geographies` geographies, including itself. The core value
+  starts a new cluster, or is added to the same cluster as a core value within
+  `epsilon` distance. Core values are grouped in a cluster together with all
+  other core and border values that are within `epsilon` distance.
++ Border value: A geography is a border value if it is within `epsilon`
+  distance of a core value. It is added to the same cluster as a core value
+  within `epsilon` distance. A border value may be within `epsilon` distance of
+  more than one cluster. In this case, it may be arbitrarily assigned to either
+  cluster and the function will produce the same result in subsequent calls.
++ Noise: A geography is noise if it is neither a core nor a border value.
+  Noise values are assigned to a `NULL` cluster. An empty
+  `GEOGRAPHY` is always classified as noise.
+
+**Constraints**
+
++ The argument `minimum_geographies` is a non-negative
+  `INT64` and `epsilon` is a non-negative
+  `DOUBLE`.
++ An empty geography cannot join any cluster.
++ Multiple clustering assignments could be possible for a border value. If a + geography is a border value, `ST_CLUSTERDBSCAN` will assign it to an + arbitrary valid cluster. + +**Return type** + +`INT64` for each geography in the geography column. **Examples** -The following example computes the value for some percentiles from a column of -values while ignoring nulls. +This example performs DBSCAN clustering with a radius of 100,000 meters with a +`minimum_geographies` argument of 1. The geographies being analyzed are a +mixture of points, lines, and polygons. ```sql -SELECT - x, - PERCENTILE_DISC(x, 0) OVER() AS min, - PERCENTILE_DISC(x, 0.5) OVER() AS median, - PERCENTILE_DISC(x, 1) OVER() AS max -FROM UNNEST(['c', NULL, 'b', 'a']) AS x; +WITH Geos as + (SELECT 1 as row_id, ST_GEOGFROMTEXT('POINT EMPTY') as geo UNION ALL + SELECT 2, ST_GEOGFROMTEXT('MULTIPOINT(1 1, 2 2, 4 4, 5 2)') UNION ALL + SELECT 3, ST_GEOGFROMTEXT('POINT(14 15)') UNION ALL + SELECT 4, ST_GEOGFROMTEXT('LINESTRING(40 1, 42 34, 44 39)') UNION ALL + SELECT 5, ST_GEOGFROMTEXT('POLYGON((40 2, 40 1, 41 2, 40 2))')) +SELECT row_id, geo, ST_CLUSTERDBSCAN(geo, 1e5, 1) OVER () AS cluster_num FROM +Geos ORDER BY row_id -/*------+-----+--------+-----* - | x | min | median | max | - +------+-----+--------+-----+ - | c | a | b | c | - | NULL | a | b | c | - | b | a | b | c | - | a | a | b | c | - *------+-----+--------+-----*/ +/*--------+-----------------------------------+-------------* + | row_id | geo | cluster_num | + +--------+-----------------------------------+-------------+ + | 1 | GEOMETRYCOLLECTION EMPTY | NULL | + | 2 | MULTIPOINT(1 1, 2 2, 5 2, 4 4) | 0 | + | 3 | POINT(14 15) | 1 | + | 4 | LINESTRING(40 1, 42 34, 44 39) | 2 | + | 5 | POLYGON((40 2, 40 1, 41 2, 40 2)) | 2 | + *--------+-----------------------------------+-------------*/ ``` -The following example computes the value for some percentiles from a column of -values while respecting nulls. 
- -```sql -SELECT - x, - PERCENTILE_DISC(x, 0 RESPECT NULLS) OVER() AS min, - PERCENTILE_DISC(x, 0.5 RESPECT NULLS) OVER() AS median, - PERCENTILE_DISC(x, 1 RESPECT NULLS) OVER() AS max -FROM UNNEST(['c', NULL, 'b', 'a']) AS x; +[dbscan-link]: https://en.wikipedia.org/wiki/DBSCAN -/*------+------+--------+-----* - | x | min | median | max | - +------+------+--------+-----+ - | c | NULL | a | c | - | NULL | NULL | a | c | - | b | NULL | a | c | - | a | NULL | a | c | - *------+------+--------+-----*/ +### `ST_CONTAINS` +```sql +ST_CONTAINS(geography_1, geography_2) ``` -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md +**Description** -## Hash functions +Returns `TRUE` if no point of `geography_2` is outside `geography_1`, and +the interiors intersect; returns `FALSE` otherwise. -ZetaSQL supports the following hash functions. +NOTE: A `GEOGRAPHY` *does not* contain its own +boundary. Compare with [`ST_COVERS`][st_covers]. -### Function list +**Return type** - - - - - - - - +`BOOL` - - - - +The following query tests whether the polygon `POLYGON((1 1, 20 1, 10 20, 1 1))` +contains each of the three points `(0, 0)`, `(1, 1)`, and `(10, 10)`, which lie +on the exterior, the boundary, and the interior of the polygon respectively. - - - - +/*--------------+----------* + | p | contains | + +--------------+----------+ + | POINT(0 0) | FALSE | + | POINT(1 1) | FALSE | + | POINT(10 10) | TRUE | + *--------------+----------*/ +``` - - - - - - - - - - - - - - - - -
NameSummary
FARM_FINGERPRINT +**Example** - - Computes the fingerprint of a STRING or - BYTES value, using the FarmHash Fingerprint64 algorithm. -
MD5 +```sql +SELECT + ST_GEOGPOINT(i, i) AS p, + ST_CONTAINS(ST_GEOGFROMTEXT('POLYGON((1 1, 20 1, 10 20, 1 1))'), + ST_GEOGPOINT(i, i)) AS `contains` +FROM UNNEST([0, 1, 10]) AS i; - - Computes the hash of a STRING or - BYTES value, using the MD5 algorithm. -
SHA1 +[st_covers]: #st_covers - - Computes the hash of a STRING or - BYTES value, using the SHA-1 algorithm. -
SHA256 - - - Computes the hash of a STRING or - BYTES value, using the SHA-256 algorithm. -
SHA512 - - - Computes the hash of a STRING or - BYTES value, using the SHA-512 algorithm. -
- -### `FARM_FINGERPRINT` +### `ST_CONVEXHULL` -``` -FARM_FINGERPRINT(value) +```sql +ST_CONVEXHULL(geography_expression) ``` **Description** -Computes the fingerprint of the `STRING` or `BYTES` input using the -`Fingerprint64` function from the -[open-source FarmHash library][hash-link-to-farmhash-github]. The output -of this function for a particular input will never change. +Returns the convex hull for the input `GEOGRAPHY`. The convex hull is the +smallest convex `GEOGRAPHY` that covers the input. A `GEOGRAPHY` is convex if +for every pair of points in the `GEOGRAPHY`, the geodesic edge connecting the +points are also contained in the same `GEOGRAPHY`. + +In most cases, the convex hull consists of a single polygon. Notable edge cases +include the following: + +* The convex hull of a single point is also a point. +* The convex hull of two or more collinear points is a linestring as long as + that linestring is convex. +* If the input `GEOGRAPHY` spans more than a + hemisphere, the convex hull is the full globe. This includes any input that + contains a pair of antipodal points. +* `ST_CONVEXHULL` returns `NULL` if the input is either `NULL` or the empty + `GEOGRAPHY`. **Return type** -INT64 +`GEOGRAPHY` **Examples** +The convex hull returned by `ST_CONVEXHULL` can be a point, linestring, or a +polygon, depending on the input. 
+ ```sql -WITH example AS ( - SELECT 1 AS x, "foo" AS y, true AS z UNION ALL - SELECT 2 AS x, "apple" AS y, false AS z UNION ALL - SELECT 3 AS x, "" AS y, true AS z -) +WITH Geographies AS + (SELECT ST_GEOGFROMTEXT('POINT(1 1)') AS g UNION ALL + SELECT ST_GEOGFROMTEXT('LINESTRING(1 1, 2 2)') AS g UNION ALL + SELECT ST_GEOGFROMTEXT('MULTIPOINT(2 11, 4 12, 0 15, 1 9, 1 12)') AS g) SELECT - *, - FARM_FINGERPRINT(CONCAT(CAST(x AS STRING), y, CAST(z AS STRING))) - AS row_fingerprint -FROM example; -/*---+-------+-------+----------------------* - | x | y | z | row_fingerprint | - +---+-------+-------+----------------------+ - | 1 | foo | true | -1541654101129638711 | - | 2 | apple | false | 2794438866806483259 | - | 3 | | true | -4880158226897771312 | - *---+-------+-------+----------------------*/ -``` + g AS input_geography, + ST_CONVEXHULL(g) AS convex_hull +FROM Geographies; -[hash-link-to-farmhash-github]: https://github.com/google/farmhash +/*-----------------------------------------+--------------------------------------------------------* + | input_geography | convex_hull | + +-----------------------------------------+--------------------------------------------------------+ + | POINT(1 1) | POINT(0.999999999999943 1) | + | LINESTRING(1 1, 2 2) | LINESTRING(2 2, 1.49988573656168 1.5000570914792, 1 1) | + | MULTIPOINT(1 9, 4 12, 2 11, 1 12, 0 15) | POLYGON((1 9, 4 12, 0 15, 1 9)) | + *-----------------------------------------+--------------------------------------------------------*/ +``` -### `MD5` +### `ST_COVEREDBY` -``` -MD5(input) +```sql +ST_COVEREDBY(geography_1, geography_2) ``` **Description** -Computes the hash of the input using the -[MD5 algorithm][hash-link-to-md5-wikipedia]. The input can either be -`STRING` or `BYTES`. The string version treats the input as an array of bytes. +Returns `FALSE` if `geography_1` or `geography_2` is empty. Returns `TRUE` if no +points of `geography_1` lie in the exterior of `geography_2`. 
-This function returns 16 bytes. +Given two `GEOGRAPHY`s `a` and `b`, +`ST_COVEREDBY(a, b)` returns the same result as +[`ST_COVERS`][st-covers]`(b, a)`. Note the opposite order of arguments. -Warning: MD5 is no longer considered secure. -For increased security use another hashing function. +**Return type** + +`BOOL` + +[st-covers]: #st_covers + +### `ST_COVERS` + +```sql +ST_COVERS(geography_1, geography_2) +``` + +**Description** + +Returns `FALSE` if `geography_1` or `geography_2` is empty. +Returns `TRUE` if no points of `geography_2` lie in the exterior of +`geography_1`. **Return type** -`BYTES` +`BOOL` **Example** +The following query tests whether the polygon `POLYGON((1 1, 20 1, 10 20, 1 1))` +covers each of the three points `(0, 0)`, `(1, 1)`, and `(10, 10)`, which lie +on the exterior, the boundary, and the interior of the polygon respectively. + ```sql -SELECT MD5("Hello World") as md5; +SELECT + ST_GEOGPOINT(i, i) AS p, + ST_COVERS(ST_GEOGFROMTEXT('POLYGON((1 1, 20 1, 10 20, 1 1))'), + ST_GEOGPOINT(i, i)) AS `covers` +FROM UNNEST([0, 1, 10]) AS i; -/*-------------------------------------------------* - | md5 | - +-------------------------------------------------+ - | \xb1\n\x8d\xb1d\xe0uA\x05\xb7\xa9\x9b\xe7.?\xe5 | - *-------------------------------------------------*/ +/*--------------+--------* + | p | covers | + +--------------+--------+ + | POINT(0 0) | FALSE | + | POINT(1 1) | TRUE | + | POINT(10 10) | TRUE | + *--------------+--------*/ ``` -[hash-link-to-md5-wikipedia]: https://en.wikipedia.org/wiki/MD5 - -### `SHA1` +### `ST_DIFFERENCE` -``` -SHA1(input) +```sql +ST_DIFFERENCE(geography_1, geography_2) ``` **Description** -Computes the hash of the input using the -[SHA-1 algorithm][hash-link-to-sha-1-wikipedia]. The input can either be -`STRING` or `BYTES`. The string version treats the input as an array of bytes. +Returns a `GEOGRAPHY` that represents the point set +difference of `geography_1` and `geography_2`. 
Therefore, the result consists of
+the part of `geography_1` that does not intersect with `geography_2`.

-This function returns 20 bytes.

+If `geography_1` is completely contained in `geography_2`, then `ST_DIFFERENCE`
+returns an empty `GEOGRAPHY`.

-Warning: SHA1 is no longer considered secure.
-For increased security, use another hashing function.

+**Constraints**
+
+The underlying geometric objects that a ZetaSQL
+`GEOGRAPHY` represents correspond to a *closed* point
+set. Therefore, `ST_DIFFERENCE` is the closure of the point set difference of
+`geography_1` and `geography_2`. This implies that if `geography_1` and
+`geography_2` intersect, then a portion of the boundary of `geography_2` could
+be in the difference.

-**Return type**

+**Return type**

-`BYTES`

+`GEOGRAPHY`

-**Example**

+**Example**

+The following query illustrates the difference between `geog1`, a larger polygon
+`POLYGON((0 0, 10 0, 10 10, 0 0))`, and `geog2`, a smaller polygon
+`POLYGON((4 2, 6 2, 8 6, 4 2))` that intersects with `geog1`. The result is
+`geog1` with a hole where `geog2` intersects with it.
+```sql
-SELECT SHA1("Hello World") as sha1;
+SELECT
+  ST_DIFFERENCE(
+      ST_GEOGFROMTEXT('POLYGON((0 0, 10 0, 10 10, 0 0))'),
+      ST_GEOGFROMTEXT('POLYGON((4 2, 6 2, 8 6, 4 2))')
+  ) AS difference_of_geog1_and_geog2;

-/*-----------------------------------------------------------*
- | sha1                                                      |
- +-----------------------------------------------------------+
- | \nMU\xa8\xd7x\xe5\x02/\xabp\x19w\xc5\xd8@\xbb\xc4\x86\xd0 |
- *-----------------------------------------------------------*/

+/*--------------------------------------------------------*
+ | difference_of_geog1_and_geog2                          |
+ +--------------------------------------------------------+
+ | POLYGON((0 0, 10 0, 10 10, 0 0), (8 6, 6 2, 4 2, 8 6)) |
+ *--------------------------------------------------------*/
 ```

-[hash-link-to-sha-1-wikipedia]: https://en.wikipedia.org/wiki/SHA-1

-### `SHA256`
+### `ST_DIMENSION`

-```
-SHA256(input)
+```sql
+ST_DIMENSION(geography_expression)
 ```

**Description**

-Computes the hash of the input using the
-[SHA-256 algorithm][hash-link-to-sha-2-wikipedia]. The input can either be
-`STRING` or `BYTES`. The string version treats the input as an array of bytes.
+Returns the dimension of the highest-dimensional element in the input
+`GEOGRAPHY`.

-This function returns 32 bytes.
+The dimension of each possible element is as follows:
+
++ The dimension of a point is `0`.
++ The dimension of a linestring is `1`.
++ The dimension of a polygon is `2`.
+
+If the input `GEOGRAPHY` is empty, `ST_DIMENSION`
+returns `-1`.

**Return type**

-`BYTES`
+`INT64`

-**Example**
+### `ST_DISJOINT`

 ```sql
-SELECT SHA256("Hello World") as sha256;
+ST_DISJOINT(geography_1, geography_2)
 ```

-[hash-link-to-sha-2-wikipedia]: https://en.wikipedia.org/wiki/SHA-2

**Description**

-### `SHA512`
+Returns `TRUE` if the intersection of `geography_1` and `geography_2` is empty,
+that is, no point in `geography_1` also appears in `geography_2`.
+
+`ST_DISJOINT` is the logical negation of [`ST_INTERSECTS`][st-intersects].
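+
+**Example**
+
+For example (an illustrative sketch, not from the original reference), two
+distinct points do not intersect, so per the definition above the following
+query returns `TRUE`:
+
+```sql
+SELECT ST_DISJOINT(
+  ST_GEOGFROMTEXT('POINT(0 0)'),
+  ST_GEOGFROMTEXT('POINT(1 1)')) AS disjoint;
+```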
+ +**Return type** + +`BOOL` + +[st-intersects]: #st_intersects + +### `ST_DISTANCE` ``` -SHA512(input) +ST_DISTANCE(geography_1, geography_2[, use_spheroid]) ``` **Description** -Computes the hash of the input using the -[SHA-512 algorithm][hash-link-to-sha-2-wikipedia]. The input can either be -`STRING` or `BYTES`. The string version treats the input as an array of bytes. +Returns the shortest distance in meters between two non-empty +`GEOGRAPHY`s. -This function returns 64 bytes. +If either of the input `GEOGRAPHY`s is empty, +`ST_DISTANCE` returns `NULL`. + +The optional `use_spheroid` parameter determines how this function measures +distance. If `use_spheroid` is `FALSE`, the function measures distance on the +surface of a perfect sphere. If `use_spheroid` is `TRUE`, the function measures +distance on the surface of the [WGS84][wgs84-link] spheroid. The default value +of `use_spheroid` is `FALSE`. **Return type** -`BYTES` +`DOUBLE` -**Example** +[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System + +### `ST_DUMP` ```sql -SELECT SHA512("Hello World") as sha512; +ST_DUMP(geography[, dimension]) ``` -[hash-link-to-sha-2-wikipedia]: https://en.wikipedia.org/wiki/SHA-2 +**Description** -## String functions +Returns an `ARRAY` of simple +`GEOGRAPHY`s where each element is a component of +the input `GEOGRAPHY`. A simple +`GEOGRAPHY` consists of a single point, linestring, +or polygon. If the input `GEOGRAPHY` is simple, the +result is a single element. When the input +`GEOGRAPHY` is a collection, `ST_DUMP` returns an +`ARRAY` with one simple +`GEOGRAPHY` for each component in the collection. -ZetaSQL supports string functions. -These string functions work on two different values: -`STRING` and `BYTES` data types. `STRING` values must be well-formed UTF-8. +If `dimension` is provided, the function only returns +`GEOGRAPHY`s of the corresponding dimension. A +dimension of -1 is equivalent to omitting `dimension`. 
-Functions that return position values, such as [STRPOS][string-link-to-strpos], -encode those positions as `INT64`. The value `1` -refers to the first character (or byte), `2` refers to the second, and so on. -The value `0` indicates an invalid position. When working on `STRING` types, the -returned positions refer to character positions. +**Return Type** -All string comparisons are done byte-by-byte, without regard to Unicode -canonical equivalence. +`ARRAY` -### Function list +**Examples** - - - - - - - - +The following example shows how `ST_DUMP` returns the simple geographies within +a complex geography. - - - - +/*-------------------------------------+------------------------------------* + | original_geographies | dumped_geographies | + +-------------------------------------+------------------------------------+ + | POINT(0 0) | [POINT(0 0)] | + | MULTIPOINT(0 0, 1 1) | [POINT(0 0), POINT(1 1)] | + | GEOMETRYCOLLECTION(POINT(0 0), | [POINT(0 0), LINESTRING(1 2, 2 1)] | + | LINESTRING(1 2, 2 1)) | | + *-------------------------------------+------------------------------------*/ +``` - - - - +```sql +WITH example AS ( + SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1))') AS geography) +SELECT + geography AS original_geography, + ST_DUMP(geography, 1) AS dumped_geographies +FROM example - - - - +### `ST_DUMPPOINTS` - - - - +**Description** - - - - +**Return Type** - - - - +**Examples** - - - - +/*-------------------------------------+------------------------------------* + | original_geographies | dumped_points_geographies | + +-------------------------------------+------------------------------------+ + | POINT(0 0) | [POINT(0 0)] | + | MULTIPOINT(0 0, 1 1) | [POINT(0 0),POINT(1 1)] | + | GEOMETRYCOLLECTION(POINT(0 0), | [POINT(0 0),POINT(1 2),POINT(2 1)] | + | LINESTRING(1 2, 2 1)) | | + *-------------------------------------+------------------------------------*/ +``` - - - - +```sql +ST_DWITHIN(geography_1, geography_2, distance[, 
use_spheroid]) +``` - - - - +Returns `TRUE` if the distance between at least one point in `geography_1` and +one point in `geography_2` is less than or equal to the distance given by the +`distance` argument; otherwise, returns `FALSE`. If either input +`GEOGRAPHY` is empty, `ST_DWithin` returns `FALSE`. The +given `distance` is in meters on the surface of the Earth. - - - - +The `use_spheroid` parameter currently only supports +the value `FALSE`. The default value of `use_spheroid` is `FALSE`. - - - - +`BOOL` - - - - +### `ST_ENDPOINT` - - - - +**Description** - - - - +**Return Type** - - - - +**Example** - - - - +/*--------------* + | last | + +--------------+ + | POINT(3 3) | + *--------------*/ +``` - - - - +```sql +ST_EQUALS(geography_1, geography_2) +``` - - - - +Returns `TRUE` if `geography_1` and `geography_2` represent the same - - - - +Therefore, two `GEOGRAPHY`s may be equal even if the +ordering of points or vertices differ, as long as they still represent the same +geometric structure. - - - - +`ST_EQUALS` is not guaranteed to be a transitive function. - - - - +`BOOL` - - - - +```sql +ST_EXTENT(geography_expression) +``` - - - - +Returns a `STRUCT` that represents the bounding box for the set of input +`GEOGRAPHY` values. The bounding box is the minimal rectangle that encloses the +geography. The edges of the rectangle follow constant lines of longitude and +latitude. - - - - ++ Returns `NULL` if all the inputs are `NULL` or empty geographies. ++ The bounding box might cross the antimeridian if this allows for a smaller + rectangle. In this case, the bounding box has one of its longitudinal bounds + outside of the [-180, 180] range, so that `xmin` is smaller than the eastmost + value `xmax`. ++ If the longitude span of the bounding box is larger than or equal to 180 + degrees, the function returns the bounding box with the longitude range of + [-180, 180]. - - - - +`STRUCT`. 
- - - - ++ `xmin`: The westmost constant longitude line that bounds the rectangle. ++ `xmax`: The eastmost constant longitude line that bounds the rectangle. ++ `ymin`: The minimum constant latitude line that bounds the rectangle. ++ `ymax`: The maximum constant latitude line that bounds the rectangle. - - - - +```sql +WITH data AS ( + SELECT 1 id, ST_GEOGFROMTEXT('POLYGON((-125 48, -124 46, -117 46, -117 49, -125 48))') g + UNION ALL + SELECT 2 id, ST_GEOGFROMTEXT('POLYGON((172 53, -130 55, -141 70, 172 53))') g + UNION ALL + SELECT 3 id, ST_GEOGFROMTEXT('POINT EMPTY') g +) +SELECT ST_EXTENT(g) AS box +FROM data - - - - +[`ST_BOUNDINGBOX`][st-boundingbox] for the non-aggregate version of `ST_EXTENT`. - - - - +### `ST_EXTERIORRING` - - - - +**Description** - - - - ++ If the input geography is a polygon, gets the outermost ring of the polygon + geography and returns the corresponding linestring. ++ If the input is the full `GEOGRAPHY`, returns an empty geography. ++ Returns an error if the input is not a single polygon. - - - - +**Return type** - - - - +**Examples** - - - - +/*---------------------------------------* + | ring | + +---------------------------------------+ + | LINESTRING(2 2, 1 4, 0 0, 2 2) | + | LINESTRING(5 1, 5 10, 1 10, 1 1, 5 1) | + *---------------------------------------*/ +``` - - - - +```sql +ST_GEOGFROM(expression) +``` - - - - +Converts an expression for a `STRING` or `BYTES` value into a +`GEOGRAPHY` value. - - - - ++ WKT format. To learn more about this format and the requirements to use it, + see [ST_GEOGFROMTEXT][st-geogfromtext]. ++ WKB in hexadecimal text format. To learn more about this format and the + requirements to use it, see [ST_GEOGFROMWKB][st-geogfromwkb]. ++ GeoJSON format. To learn more about this format and the + requirements to use it, see [ST_GEOGFROMGEOJSON][st-geogfromgeojson]. - - - - +If `expression` is `NULL`, the output is `NULL`. 
- - - - +`GEOGRAPHY` - - - - +This takes a WKT-formatted string and returns a `GEOGRAPHY` polygon: - - - - +/*------------------------------------* + | WKT_format | + +------------------------------------+ + | POLYGON((2 0, 2 2, 0 2, 0 0, 2 0)) | + *------------------------------------*/ +``` - - - - +```sql +SELECT ST_GEOGFROM(FROM_HEX('010100000000000000000000400000000000001040')) AS WKB_format - - - - +This takes WKB-formatted bytes and returns a `GEOGRAPHY` point: - - - - +/*----------------* + | WKB_format | + +----------------+ + | POINT(2 4) | + *----------------*/ +``` - - - - +```sql +SELECT ST_GEOGFROM( + '{ "type": "Polygon", "coordinates": [ [ [2, 0], [2, 2], [1, 2], [0, 2], [0, 0], [2, 0] ] ] }' +) AS GEOJSON_format - - - - +[st-geogfromtext]: #st_geogfromtext - - - - +[st-geogfromgeojson]: #st_geogfromgeojson - - - - +```sql +ST_GEOGFROMGEOJSON(geojson_string [, make_valid => constant_expression]) +``` - - - - +Returns a `GEOGRAPHY` value that corresponds to the +input [GeoJSON][geojson-link] representation. - - - - +If the parameter `make_valid` is set to `TRUE`, the function attempts to repair +polygons that don't conform to [Open Geospatial Consortium][ogc-link] semantics. +This parameter uses named argument syntax, and should be specified using +`make_valid => argument_value` syntax. - - - - +See [`ST_ASGEOJSON`][st-asgeojson] to format a +`GEOGRAPHY` as GeoJSON. - - - - +The JSON input is subject to the following constraints: - -
NameSummary
ASCII +```sql +WITH example AS ( + SELECT ST_GEOGFROMTEXT('POINT(0 0)') AS geography + UNION ALL + SELECT ST_GEOGFROMTEXT('MULTIPOINT(0 0, 1 1)') AS geography + UNION ALL + SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1))')) +SELECT + geography AS original_geography, + ST_DUMP(geography) AS dumped_geographies +FROM example - - Gets the ASCII code for the first character or byte in a STRING - or BYTES value. -
BYTE_LENGTH +The following example shows how `ST_DUMP` with the dimension argument only +returns simple geographies of the given dimension. - - Gets the number of BYTES in a STRING or - BYTES value. -
CHAR_LENGTH +/*-------------------------------------+------------------------------* + | original_geographies | dumped_geographies | + +-------------------------------------+------------------------------+ + | GEOMETRYCOLLECTION(POINT(0 0), | [LINESTRING(1 2, 2 1)] | + | LINESTRING(1 2, 2 1)) | | + *-------------------------------------+------------------------------*/ +``` - - Gets the number of characters in a STRING value. -
CHARACTER_LENGTH +```sql +ST_DUMPPOINTS(geography) +``` - - Synonym for CHAR_LENGTH. -
CHR +Takes an input geography and returns all of its points, line vertices, and +polygon vertices as an array of point geographies. - - Converts a Unicode code point to a character. -
CODE_POINTS_TO_BYTES +`ARRAY` - - Converts an array of extended ASCII code points to a - BYTES value. -
CODE_POINTS_TO_STRING +```sql +WITH example AS ( + SELECT ST_GEOGFROMTEXT('POINT(0 0)') AS geography + UNION ALL + SELECT ST_GEOGFROMTEXT('MULTIPOINT(0 0, 1 1)') AS geography + UNION ALL + SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1))')) +SELECT + geography AS original_geography, + ST_DUMPPOINTS(geography) AS dumped_points_geographies +FROM example - - Converts an array of extended ASCII code points to a - STRING value. -
COLLATE +### `ST_DWITHIN` - - Combines a STRING value and a collation specification into a - collation specification-supported STRING value. -
CONCAT +**Description** - - Concatenates one or more STRING or BYTES - values into a single result. -
EDIT_DISTANCE +The optional `use_spheroid` parameter determines how this function measures +distance. If `use_spheroid` is `FALSE`, the function measures distance on the +surface of a perfect sphere. - - Computes the Levenshtein distance between two STRING - or BYTES values. -
ENDS_WITH +**Return type** - - Checks if a STRING or BYTES value is the suffix - of another value. -
FORMAT +[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System - - Formats data and produces the results as a STRING value. -
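This chunk of the `ST_DWITHIN` section carries no example, so here is a minimal sketch; the coordinates and the 200-meter threshold are illustrative assumptions, not values from the original documentation:

```sql
-- Hypothetical inputs: two points about 111 meters apart at the equator
-- (0.001 degrees of longitude). Checks whether they lie within 200 meters.
SELECT ST_DWITHIN(
  ST_GEOGPOINT(0, 0),
  ST_GEOGPOINT(0.001, 0),
  200) AS within_200m;
```

Since the great-circle distance between these points is roughly 111 meters, this query should return `TRUE`.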
FROM_BASE32 +```sql +ST_ENDPOINT(linestring_geography) +``` - - Converts a base32-encoded STRING value into a - BYTES value. -
FROM_BASE64 +Returns the last point of a linestring geography as a point geography. Returns +an error if the input is not a linestring or if the input is empty. Use the +`SAFE` prefix to obtain `NULL` for invalid input instead of an error. - - Converts a base64-encoded STRING value into a - BYTES value. -
FROM_HEX +Point `GEOGRAPHY` - - Converts a hexadecimal-encoded STRING value into a - BYTES value. -
INITCAP +```sql +SELECT ST_ENDPOINT(ST_GEOGFROMTEXT('LINESTRING(1 1, 2 1, 3 2, 3 3)')) last - - Formats a STRING as proper case, which means that the first - character in each word is uppercase and all other characters are lowercase. -
INSTR +### `ST_EQUALS` - - Finds the position of a subvalue inside another value, optionally starting - the search at a given offset or occurrence. -
LEFT +**Description** - - Gets the specified leftmost portion from a STRING or - BYTES value. -
LENGTH +`GEOGRAPHY` value. More precisely, this means that +one of the following conditions holds: ++ `ST_COVERS(geography_1, geography_2) = TRUE` and `ST_COVERS(geography_2, + geography_1) = TRUE` ++ Both `geography_1` and `geography_2` are empty. - - Gets the length of a STRING or BYTES value. -
LOWER +**Constraints** - - Formats alphabetic characters in a STRING value as - lowercase. -

- Formats ASCII characters in a BYTES value as - lowercase. -
LPAD +**Return type** - - Prepends a STRING or BYTES value with a pattern. -
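The `ST_EQUALS` section in this chunk has no example; the following sketch, with inputs invented for illustration, shows that two linestrings tracing the same path in opposite directions cover each other and therefore compare equal:

```sql
-- Hypothetical inputs: the same segment with reversed vertex order.
SELECT ST_EQUALS(
  ST_GEOGFROMTEXT('LINESTRING(0 0, 2 2)'),
  ST_GEOGFROMTEXT('LINESTRING(2 2, 0 0)')) AS equal_reversed;
```

Because `ST_COVERS` holds in both directions for these inputs, the result should be `TRUE`.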
LTRIM +### `ST_EXTENT` - - Identical to the TRIM function, but only removes leading - characters. -
NORMALIZE +**Description** - - Case-sensitively normalizes the characters in a STRING value. -
NORMALIZE_AND_CASEFOLD +Caveats: - - Case-insensitively normalizes the characters in a STRING value. -
OCTET_LENGTH +**Return type** - - Alias for BYTE_LENGTH. -
REGEXP_CONTAINS +Bounding box parts: - - Checks if a value is a partial match for a regular expression. -
REGEXP_EXTRACT +**Example** - - Produces a substring that matches a regular expression. -
REGEXP_EXTRACT_ALL +/*----------------------------------------------* + | box | + +----------------------------------------------+ + | {xmin:172, ymin:46, xmax:243, ymax:70} | + *----------------------------------------------*/ +``` - - Produces an array of all substrings that match a - regular expression. -
REGEXP_INSTR +[st-boundingbox]: #st_boundingbox - - Finds the position of a regular expression match in a value, optionally - starting the search at a given offset or occurrence. -
REGEXP_MATCH +```sql +ST_EXTERIORRING(polygon_geography) +``` - - (Deprecated) Checks if a value is a full match for a regular expression. -
REGEXP_REPLACE +Returns a linestring geography that corresponds to the outermost ring of a +polygon geography. - - Produces a STRING value where all substrings that match a - regular expression are replaced with a specified value. -
REPEAT +Use the `SAFE` prefix to return `NULL` for invalid input instead of an error. - - Produces a STRING or BYTES value that consists of - an original value, repeated. -
REPLACE ++ Linestring `GEOGRAPHY` ++ Empty `GEOGRAPHY` - - Replaces all occurrences of a pattern with another pattern in a - STRING or BYTES value. -
REVERSE +```sql +WITH geo as + (SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 1 4, 2 2, 0 0))') AS g UNION ALL + SELECT ST_GEOGFROMTEXT('''POLYGON((1 1, 1 10, 5 10, 5 1, 1 1), + (2 2, 3 4, 2 4, 2 2))''') as g) +SELECT ST_EXTERIORRING(g) AS ring FROM geo; - - Reverses a STRING or BYTES value. -
RIGHT +### `ST_GEOGFROM` - - Gets the specified rightmost portion from a STRING or - BYTES value. -
RPAD +**Description** - - Appends a STRING or BYTES value with a pattern. -
RTRIM +If `expression` represents a `STRING` value, it must be a valid +`GEOGRAPHY` representation in one of the following formats: - - Identical to the TRIM function, but only removes trailing - characters. -
SAFE_CONVERT_BYTES_TO_STRING +If `expression` represents a `BYTES` value, it must be a valid `GEOGRAPHY` +binary expression in WKB format. To learn more about this format and the +requirements to use it, see [ST_GEOGFROMWKB][st-geogfromwkb]. - - Converts a BYTES value to a STRING value and - replace any invalid UTF-8 characters with the Unicode replacement character, - U+FFFD. -
SOUNDEX +**Return type** - - Gets the Soundex codes for words in a STRING value. -
SPLIT +**Examples** - - Splits a STRING or BYTES value, using a delimiter. -
STARTS_WITH +```sql +SELECT ST_GEOGFROM('POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))') AS WKT_format - - Checks if a STRING or BYTES value is a - prefix of another value. -
STRPOS +This takes a WKB-formatted hexadecimal-encoded string and returns a +`GEOGRAPHY` point: - - Finds the position of the first occurrence of a subvalue inside another - value. -
SUBSTR +/*----------------* + | WKB_format | + +----------------+ + | POINT(2 4) | + *----------------*/ +``` - - Gets a portion of a STRING or BYTES value. -
SUBSTRING +```sql +SELECT ST_GEOGFROM('010100000000000000000000400000000000001040') AS WKB_format -Alias for SUBSTR
TO_BASE32 +This takes a GeoJSON-formatted string and returns a `GEOGRAPHY` polygon: - - Converts a BYTES value to a - base32-encoded STRING value. -
TO_BASE64 +/*-----------------------------------------* + | GEOJSON_format | + +-----------------------------------------+ + | POLYGON((2 0, 2 2, 1 2, 0 2, 0 0, 2 0)) | + *-----------------------------------------*/ +``` - - Converts a BYTES value to a - base64-encoded STRING value. -
TO_CODE_POINTS +[st-geogfromwkb]: #st_geogfromwkb - - Converts a STRING or BYTES value into an array of - extended ASCII code points. -
TO_HEX +### `ST_GEOGFROMGEOJSON` - - Converts a BYTES value to a - hexadecimal STRING value. -
TRANSLATE +**Description** - - Within a value, replaces each source character with the corresponding - target character. -
TRIM +`ST_GEOGFROMGEOJSON` accepts input that is [RFC 7946][geojson-spec-link] +compliant. - - Removes the specified leading and trailing Unicode code points or bytes - from a STRING or BYTES value. -
UNICODE +A ZetaSQL `GEOGRAPHY` has spherical +geodesic edges, whereas a GeoJSON `Geometry` object explicitly has planar edges. +To convert between these two types of edges, ZetaSQL adds additional +points to the line where necessary so that the resulting sequence of edges +remains within 10 meters of the original edge. - - Gets the Unicode code point for the first character in a value. -
UPPER +**Constraints** - - Formats alphabetic characters in a STRING value as - uppercase. -

- Formats ASCII characters in a BYTES value as - uppercase. -
++ `ST_GEOGFROMGEOJSON` only accepts JSON geometry fragments and cannot be used + to ingest a whole JSON document. ++ The input JSON fragment must consist of a GeoJSON geometry type, which + includes `Point`, `MultiPoint`, `LineString`, `MultiLineString`, `Polygon`, + `MultiPolygon`, and `GeometryCollection`. Any other GeoJSON type such as + `Feature` or `FeatureCollection` will result in an error. ++ A position in the `coordinates` member of a GeoJSON geometry type must + consist of exactly two elements. The first is the longitude and the second + is the latitude. Therefore, `ST_GEOGFROMGEOJSON` does not support the + optional third element for a position in the `coordinates` member. -### `ASCII` +**Return type** -```sql -ASCII(value) -``` +`GEOGRAPHY` -**Description** +[geojson-link]: https://en.wikipedia.org/wiki/GeoJSON -Returns the ASCII code for the first character or byte in `value`. Returns -`0` if `value` is empty or the ASCII code is `0` for the first character -or byte. +[geojson-spec-link]: https://tools.ietf.org/html/rfc7946 -**Return type** +[ogc-link]: https://www.ogc.org/standards/sfa -`INT64` +[st-asgeojson]: #st_asgeojson -**Examples** +### `ST_GEOGFROMKML` ```sql -SELECT ASCII('abcd') as A, ASCII('a') as B, ASCII('') as C, ASCII(NULL) as D; +ST_GEOGFROMKML(kml_geometry) +``` -/*-------+-------+-------+-------* - | A | B | C | D | - +-------+-------+-------+-------+ - | 97 | 97 | 0 | NULL | - *-------+-------+-------+-------*/ -``` - -### `BYTE_LENGTH` - -```sql -BYTE_LENGTH(value) -``` +Takes a `STRING` [KML geometry][kml-geometry-link] and returns a +`GEOGRAPHY`. The KML geometry can include: -**Description** ++ Point with coordinates element only ++ Linestring with coordinates element only ++ Polygon with boundary elements only ++ Multigeometry -Gets the number of `BYTES` in a `STRING` or `BYTES` value, -regardless of whether the value is a `STRING` or `BYTES` type.
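No example accompanies `ST_GEOGFROMKML` in this chunk, so a short hedged sketch may help; the KML fragment below is an assumed illustration (KML lists coordinates as longitude,latitude):

```sql
-- Hypothetical input: a KML Point with a coordinates element only.
SELECT ST_GEOGFROMKML(
  '<Point><coordinates>-122.35,47.62</coordinates></Point>') AS kml_point;
```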
+[kml-geometry-link]: https://developers.google.com/kml/documentation/kmlreference#geometry -**Return type** +### `ST_GEOGFROMTEXT` -`INT64` ++ [Signature 1](#st_geogfromtext_signature1) ++ [Signature 2](#st_geogfromtext_signature2) -**Examples** +#### Signature 1 + ```sql -WITH example AS - (SELECT 'абвгд' AS characters, b'абвгд' AS bytes) +ST_GEOGFROMTEXT(wkt_string[, oriented]) +``` -SELECT - characters, - BYTE_LENGTH(characters) AS string_example, - bytes, - BYTE_LENGTH(bytes) AS bytes_example -FROM example; +**Description** -/*------------+----------------+-------+---------------* - | characters | string_example | bytes | bytes_example | - +------------+----------------+-------+---------------+ - | абвгд | 10 | абвгд | 10 | - *------------+----------------+-------+---------------*/ -``` +Returns a `GEOGRAPHY` value that corresponds to the +input [WKT][wkt-link] representation. -### `CHAR_LENGTH` +This function supports an optional parameter of type +`BOOL`, `oriented`. If this parameter is set to +`TRUE`, any polygons in the input are assumed to be oriented as follows: +if someone walks along the boundary of the polygon in the order of +the input vertices, the interior of the polygon is on the left. This allows +WKT to represent polygons larger than a hemisphere. If `oriented` is `FALSE` or +omitted, this function returns the polygon with the smaller area. +See also [`ST_MAKEPOLYGONORIENTED`][st-makepolygonoriented] which is similar +to `ST_GEOGFROMTEXT` with `oriented=TRUE`. -```sql -CHAR_LENGTH(value) -``` +To format `GEOGRAPHY` as WKT, use +[`ST_ASTEXT`][st-astext]. -**Description** +**Constraints** -Gets the number of characters in a `STRING` value. +* All input edges are assumed to be spherical geodesics, and *not* planar + straight lines. For reading data in a planar projection, consider using + [`ST_GEOGFROMGEOJSON`][st-geogfromgeojson]. 
+* The function does not support three-dimensional geometries that have a `Z` + suffix, nor does it support linear referencing system geometries with an `M` + suffix. +* The function only supports geometry primitives and multipart geometries. In + particular it supports only point, multipoint, linestring, multilinestring, + polygon, multipolygon, and geometry collection. **Return type** -`INT64` +`GEOGRAPHY` -**Examples** +**Example** -```sql -WITH example AS - (SELECT 'абвгд' AS characters) +The following query reads the WKT string `POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))` +both as a non-oriented polygon and as an oriented polygon, and checks whether +each result contains the point `(1, 1)`. +```sql +WITH polygon AS (SELECT 'POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))' AS p) SELECT - characters, - CHAR_LENGTH(characters) AS char_length_example -FROM example; + ST_CONTAINS(ST_GEOGFROMTEXT(p), ST_GEOGPOINT(1, 1)) AS fromtext_default, + ST_CONTAINS(ST_GEOGFROMTEXT(p, FALSE), ST_GEOGPOINT(1, 1)) AS non_oriented, + ST_CONTAINS(ST_GEOGFROMTEXT(p, TRUE), ST_GEOGPOINT(1, 1)) AS oriented +FROM polygon; -/*------------+---------------------* - | characters | char_length_example | - +------------+---------------------+ - | абвгд | 5 | - *------------+---------------------*/ +/*-------------------+---------------+-----------* + | fromtext_default | non_oriented | oriented | + +-------------------+---------------+-----------+ + | TRUE | TRUE | FALSE | + *-------------------+---------------+-----------*/ ``` -### `CHARACTER_LENGTH` +#### Signature 2 + ```sql -CHARACTER_LENGTH(value) +ST_GEOGFROMTEXT(wkt_string[, oriented => boolean_constant_1] + [, planar => boolean_constant_2] [, make_valid => boolean_constant_3]) ``` **Description** -Synonym for [CHAR_LENGTH][string-link-to-char-length]. +Returns a `GEOGRAPHY` value that corresponds to the +input [WKT][wkt-link] representation. 
-**Return type** +This function supports three optional parameters of type +`BOOL`: `oriented`, `planar`, and `make_valid`. +This signature uses named arguments syntax, and the parameters should be +specified using `parameter_name => parameter_value` syntax, in any order. -`INT64` +If the `oriented` parameter is set to +`TRUE`, any polygons in the input are assumed to be oriented as follows: +if someone walks along the boundary of the polygon in the order of +the input vertices, the interior of the polygon is on the left. This allows +WKT to represent polygons larger than a hemisphere. If `oriented` is `FALSE` or +omitted, this function returns the polygon with the smaller area. +See also [`ST_MAKEPOLYGONORIENTED`][st-makepolygonoriented] which is similar +to `ST_GEOGFROMTEXT` with `oriented=TRUE`. -**Examples** +If the parameter `planar` is set to `TRUE`, the edges of the line strings and +polygons are assumed to use planar map semantics, rather than ZetaSQL +default spherical geodesics semantics. -```sql -WITH example AS - (SELECT 'абвгд' AS characters) +If the parameter `make_valid` is set to `TRUE`, the function attempts to repair +polygons that don't conform to [Open Geospatial Consortium][ogc-link] semantics. + +To format `GEOGRAPHY` as WKT, use +[`ST_ASTEXT`][st-astext]. + +**Constraints** + +* All input edges are assumed to be spherical geodesics by default, and *not* + planar straight lines. For reading data in a planar projection, + pass `planar => TRUE` argument, or consider using + [`ST_GEOGFROMGEOJSON`][st-geogfromgeojson]. +* The function does not support three-dimensional geometries that have a `Z` + suffix, nor does it support linear referencing system geometries with an `M` + suffix. +* The function only supports geometry primitives and multipart geometries. In + particular it supports only point, multipoint, linestring, multilinestring, + polygon, multipolygon, and geometry collection. 
+* `oriented` and `planar` cannot be equal to `TRUE` at the same time. +* `oriented` and `make_valid` cannot be equal to `TRUE` at the same time. +**Example** + +The following query reads the WKT string `POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))` +both as a non-oriented polygon and as an oriented polygon, and checks whether +each result contains the point `(1, 1)`. + +```sql +WITH polygon AS (SELECT 'POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))' AS p) SELECT - characters, - CHARACTER_LENGTH(characters) AS char_length_example -FROM example; + ST_CONTAINS(ST_GEOGFROMTEXT(p), ST_GEOGPOINT(1, 1)) AS fromtext_default, + ST_CONTAINS(ST_GEOGFROMTEXT(p, oriented => FALSE), ST_GEOGPOINT(1, 1)) AS non_oriented, + ST_CONTAINS(ST_GEOGFROMTEXT(p, oriented => TRUE), ST_GEOGPOINT(1, 1)) AS oriented +FROM polygon; -/*------------+---------------------* - | characters | char_length_example | - +------------+---------------------+ - | абвгд | 5 | - *------------+---------------------*/ +/*-------------------+---------------+-----------* + | fromtext_default | non_oriented | oriented | + +-------------------+---------------+-----------+ + | TRUE | TRUE | FALSE | + *-------------------+---------------+-----------*/ ``` -[string-link-to-char-length]: #char_length - -### `CHR` +The following query converts a WKT string with an invalid polygon to +`GEOGRAPHY`. The WKT string violates two properties +of a valid polygon - the loop describing the polygon is not closed, and it +contains self-intersection. With the `make_valid` option, `ST_GEOGFROMTEXT` +successfully converts it to a multipolygon shape.
```sql -CHR(value) +WITH data AS ( + SELECT 'POLYGON((0 -1, 2 1, 2 -1, 0 1))' wkt) +SELECT + SAFE.ST_GEOGFROMTEXT(wkt) as geom, + SAFE.ST_GEOGFROMTEXT(wkt, make_valid => TRUE) as valid_geom +FROM data + +/*------+-----------------------------------------------------------------* + | geom | valid_geom | + +------+-----------------------------------------------------------------+ + | NULL | MULTIPOLYGON(((0 -1, 1 0, 0 1, 0 -1)), ((1 0, 2 -1, 2 1, 1 0))) | + *------+-----------------------------------------------------------------*/ ``` -**Description** +[ogc-link]: https://www.ogc.org/standards/sfa -Takes a Unicode [code point][string-link-to-code-points-wikipedia] and returns -the character that matches the code point. Each valid code point should fall -within the range of [0, 0xD7FF] and [0xE000, 0x10FFFF]. Returns an empty string -if the code point is `0`. If an invalid Unicode code point is specified, an -error is returned. +[wkt-link]: https://en.wikipedia.org/wiki/Well-known_text -To work with an array of Unicode code points, see -[`CODE_POINTS_TO_STRING`][string-link-to-codepoints-to-string] +[st-makepolygonoriented]: #st_makepolygonoriented -**Return type** +[st-astext]: #st_astext -`STRING` +[st-geogfromgeojson]: #st_geogfromgeojson -**Examples** +### `ST_GEOGFROMWKB` ```sql -SELECT CHR(65) AS A, CHR(255) AS B, CHR(513) AS C, CHR(1024) AS D; - -/*-------+-------+-------+-------* - | A | B | C | D | - +-------+-------+-------+-------+ - | A | ÿ | È | Ѐ | - *-------+-------+-------+-------*/ +ST_GEOGFROMWKB(wkb_bytes_expression) ``` ```sql -SELECT CHR(97) AS A, CHR(0xF9B5) AS B, CHR(0) AS C, CHR(NULL) AS D; - -/*-------+-------+-------+-------* - | A | B | C | D | - +-------+-------+-------+-------+ - | a | 例 | | NULL | - *-------+-------+-------+-------*/ +ST_GEOGFROMWKB(wkb_hex_string_expression) ``` -[string-link-to-code-points-wikipedia]: https://en.wikipedia.org/wiki/Code_point +**Description** -[string-link-to-codepoints-to-string]: 
#code_points_to_string +Converts an expression for a hexadecimal-text `STRING` or `BYTES` +value into a `GEOGRAPHY` value. The expression must be in +[WKB][wkb-link] format. -### `CODE_POINTS_TO_BYTES` +To format `GEOGRAPHY` as WKB, use +[`ST_ASBINARY`][st-asbinary]. -```sql -CODE_POINTS_TO_BYTES(ascii_code_points) -``` +**Constraints** -**Description** +All input edges are assumed to be spherical geodesics, and *not* planar straight +lines. For reading data in a planar projection, consider using +[`ST_GEOGFROMGEOJSON`][st-geogfromgeojson]. -Takes an array of extended ASCII -[code points][string-link-to-code-points-wikipedia] -as `ARRAY` and returns `BYTES`. +**Return type** -To convert from `BYTES` to an array of code points, see -[TO_CODE_POINTS][string-link-to-code-points]. +`GEOGRAPHY` -**Return type** +[wkb-link]: https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary -`BYTES` +[st-asbinary]: #st_asbinary -**Examples** +[st-geogfromgeojson]: #st_geogfromgeojson -The following is a basic example using `CODE_POINTS_TO_BYTES`. +### `ST_GEOGPOINT` ```sql -SELECT CODE_POINTS_TO_BYTES([65, 98, 67, 100]) AS bytes; - -/*----------* - | bytes | - +----------+ - | AbCd | - *----------*/ +ST_GEOGPOINT(longitude, latitude) ``` -The following example uses a rotate-by-13 places (ROT13) algorithm to encode a -string. +**Description** -```sql -SELECT CODE_POINTS_TO_BYTES(ARRAY_AGG( - (SELECT - CASE - WHEN chr BETWEEN b'a' and b'z' - THEN TO_CODE_POINTS(b'a')[offset(0)] + - MOD(code+13-TO_CODE_POINTS(b'a')[offset(0)],26) - WHEN chr BETWEEN b'A' and b'Z' - THEN TO_CODE_POINTS(b'A')[offset(0)] + - MOD(code+13-TO_CODE_POINTS(b'A')[offset(0)],26) - ELSE code - END - FROM - (SELECT code, CODE_POINTS_TO_BYTES([code]) chr) - ) ORDER BY OFFSET)) AS encoded_string -FROM UNNEST(TO_CODE_POINTS(b'Test String!')) code WITH OFFSET; +Creates a `GEOGRAPHY` with a single point. 
`ST_GEOGPOINT` creates a point from +the specified `DOUBLE` longitude (in degrees, +negative west of the Prime Meridian, positive east) and latitude (in degrees, +positive north of the Equator, negative south) parameters and returns that point +in a `GEOGRAPHY` value. -/*------------------* - | encoded_string | - +------------------+ - | Grfg Fgevat! | - *------------------*/ -``` +NOTE: Some systems present latitude first; take care with argument order. -[string-link-to-code-points-wikipedia]: https://en.wikipedia.org/wiki/Code_point +**Constraints** -[string-link-to-code-points]: #to_code_points ++ Longitudes outside the range \[-180, 180\] are allowed; `ST_GEOGPOINT` uses + the input longitude modulo 360 to obtain a longitude within \[-180, 180\]. ++ Latitudes must be in the range \[-90, 90\]. Latitudes outside this range + will result in an error. -### `CODE_POINTS_TO_STRING` +**Return type** + +Point `GEOGRAPHY` + +### `ST_GEOGPOINTFROMGEOHASH` ```sql -CODE_POINTS_TO_STRING(unicode_code_points) +ST_GEOGPOINTFROMGEOHASH(geohash) ``` **Description** -Takes an array of Unicode [code points][string-link-to-code-points-wikipedia] -as `ARRAY` and returns a `STRING`. - -To convert from a string to an array of code points, see -[TO_CODE_POINTS][string-link-to-code-points]. +Returns a `GEOGRAPHY` value that corresponds to a +point in the middle of a bounding box defined in the [GeoHash][geohash-link]. **Return type** -`STRING` +Point `GEOGRAPHY` -**Examples** +[geohash-link]: https://en.wikipedia.org/wiki/Geohash -The following are basic examples using `CODE_POINTS_TO_STRING`. 
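Neither `ST_GEOGPOINT` nor `ST_GEOGPOINTFROMGEOHASH` carries an example in this chunk; this sketch pairs the two, reusing the Seattle Center coordinates that appear in the existing `ST_GEOHASH` example (the truncated GeoHash prefix is an illustrative assumption):

```sql
-- Build a point from longitude/latitude, then recover the center of the
-- bounding box described by a 6-character GeoHash of the same area.
SELECT
  ST_GEOGPOINT(-122.35, 47.62) AS point,
  ST_GEOGPOINTFROMGEOHASH('c22yzu') AS geohash_center;
```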
+### `ST_GEOHASH` ```sql -SELECT CODE_POINTS_TO_STRING([65, 255, 513, 1024]) AS string; - -/*--------* - | string | - +--------+ - | AÿÈЀ | - *--------*/ +ST_GEOHASH(geography_expression[, maxchars]) ``` -```sql -SELECT CODE_POINTS_TO_STRING([97, 0, 0xF9B5]) AS string; +**Description** -/*--------* - | string | - +--------+ - | a例 | - *--------*/ -``` +Takes a single-point `GEOGRAPHY` and returns a [GeoHash][geohash-link] +representation of that `GEOGRAPHY` object. -```sql -SELECT CODE_POINTS_TO_STRING([65, 255, NULL, 1024]) AS string; ++ `geography_expression`: Represents a `GEOGRAPHY` object. Only a `GEOGRAPHY` + object that represents a single point is supported. If `ST_GEOHASH` is used + over an empty `GEOGRAPHY` object, returns `NULL`. ++ `maxchars`: This optional `INT64` parameter specifies the maximum number of + characters the hash will contain. Fewer characters correspond to lower + precision (or, described differently, to a bigger bounding box). `maxchars` + defaults to 20 if not explicitly specified. A valid `maxchars` value is 1 + to 20. Any value below or above is considered unspecified and the default of + 20 is used. -/*--------* - | string | - +--------+ - | NULL | - *--------*/ -``` +**Return type** -The following example computes the frequency of letters in a set of words. +`STRING` + +**Example** + +Returns a GeoHash of the Seattle Center with 10 characters of precision.
```sql -WITH Words AS ( - SELECT word - FROM UNNEST(['foo', 'bar', 'baz', 'giraffe', 'llama']) AS word -) -SELECT - CODE_POINTS_TO_STRING([code_point]) AS letter, - COUNT(*) AS letter_count -FROM Words, - UNNEST(TO_CODE_POINTS(word)) AS code_point -GROUP BY 1 -ORDER BY 2 DESC; +SELECT ST_GEOHASH(ST_GEOGPOINT(-122.35, 47.62), 10) geohash -/*--------+--------------* - | letter | letter_count | - +--------+--------------+ - | a | 5 | - | f | 3 | - | r | 2 | - | b | 2 | - | l | 2 | - | o | 2 | - | g | 1 | - | z | 1 | - | e | 1 | - | m | 1 | - | i | 1 | - *--------+--------------*/ +/*--------------* + | geohash | + +--------------+ + | c22yzugqw7 | + *--------------*/ ``` -[string-link-to-code-points-wikipedia]: https://en.wikipedia.org/wiki/Code_point - -[string-link-to-code-points]: #to_code_points +[geohash-link]: https://en.wikipedia.org/wiki/Geohash -### `COLLATE` +### `ST_GEOMETRYTYPE` ```sql -COLLATE(value, collate_specification) +ST_GEOMETRYTYPE(geography_expression) ``` -Takes a `STRING` and a [collation specification][link-collation-spec]. Returns -a `STRING` with a collation specification. If `collate_specification` is empty, -returns a value with collation removed from the `STRING`. +**Description** -The collation specification defines how the resulting `STRING` can be compared -and sorted. To learn more, see -[Working with collation][link-collation-concepts]. +Returns the [Open Geospatial Consortium][ogc-link] (OGC) geometry type that +describes the input `GEOGRAPHY`. The OGC geometry type matches the +types that are used in [WKT][wkt-link] and [GeoJSON][geojson-link] formats and +printed for [ST_ASTEXT][st-astext] and [ST_ASGEOJSON][st-asgeojson]. +`ST_GEOMETRYTYPE` returns the OGC geometry type with the "ST_" prefix. -+ `collation_specification` must be a string literal, otherwise an error is - thrown. -+ Returns `NULL` if `value` is `NULL`. 
+`ST_GEOMETRYTYPE` returns the following given the type on the input: + ++ Single point geography: Returns `ST_Point`. ++ Collection of only points: Returns `ST_MultiPoint`. ++ Single linestring geography: Returns `ST_LineString`. ++ Collection of only linestrings: Returns `ST_MultiLineString`. ++ Single polygon geography: Returns `ST_Polygon`. ++ Collection of only polygons: Returns `ST_MultiPolygon`. ++ Collection with elements of different dimensions, or the input is the empty + geography: Returns `ST_GeometryCollection`. **Return type** `STRING` -**Examples** +**Example** -In this example, the weight of `a` is less than the weight of `Z`. This -is because the collate specification, `und:ci` assigns more weight to `Z`. +The following example shows how `ST_GEOMETRYTYPE` takes geographies and returns +the names of their OGC geometry types. ```sql -WITH Words AS ( - SELECT - COLLATE('a', 'und:ci') AS char1, - COLLATE('Z', 'und:ci') AS char2 -) -SELECT ( Words.char1 < Words.char2 ) AS a_less_than_Z -FROM Words; +WITH example AS( + SELECT ST_GEOGFROMTEXT('POINT(0 1)') AS geography + UNION ALL + SELECT ST_GEOGFROMTEXT('MULTILINESTRING((2 2, 3 4), (5 6, 7 7))') + UNION ALL + SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(MULTIPOINT(-1 2, 0 12), LINESTRING(-2 4, 0 6))') + UNION ALL + SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION EMPTY')) +SELECT + geography AS WKT, + ST_GEOMETRYTYPE(geography) AS geometry_type_name +FROM example; -/*----------------* - | a_less_than_Z | - +----------------+ - | TRUE | - *----------------*/ +/*-------------------------------------------------------------------+-----------------------* + | WKT | geometry_type_name | + +-------------------------------------------------------------------+-----------------------+ + | POINT(0 1) | ST_Point | + | MULTILINESTRING((2 2, 3 4), (5 6, 7 7)) | ST_MultiLineString | + | GEOMETRYCOLLECTION(MULTIPOINT(-1 2, 0 12), LINESTRING(-2 4, 0 6)) | ST_GeometryCollection | + | GEOMETRYCOLLECTION EMPTY | 
ST_GeometryCollection | + *-------------------------------------------------------------------+-----------------------*/ ``` -In this example, the weight of `a` is greater than the weight of `Z`. This -is because the default collate specification assigns more weight to `a`. +[ogc-link]: https://www.ogc.org/standards/sfa -```sql -WITH Words AS ( - SELECT - 'a' AS char1, - 'Z' AS char2 -) -SELECT ( Words.char1 < Words.char2 ) AS a_less_than_Z -FROM Words; +[wkt-link]: https://en.wikipedia.org/wiki/Well-known_text -/*----------------* - | a_less_than_Z | - +----------------+ - | FALSE | - *----------------*/ -``` +[geojson-link]: https://en.wikipedia.org/wiki/GeoJSON -[link-collation-spec]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_spec_details +[st-astext]: #st_astext -[link-collation-concepts]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#working_with_collation +[st-asgeojson]: #st_asgeojson -### `CONCAT` +### `ST_HAUSDORFFDISTANCE` ```sql -CONCAT(value1[, ...]) +ST_HAUSDORFFDISTANCE(geography_1, geography_2) +``` + +```sql +ST_HAUSDORFFDISTANCE(geography_1, geography_2, directed=>{ TRUE | FALSE }) ``` **Description** -Concatenates one or more values into a single result. All values must be -`BYTES` or data types that can be cast to `STRING`. +Gets the discrete [Hausdorff distance][h-distance], which is the greatest of all +the distances from a discrete point in one geography to the closest +discrete point in another geography. -The function returns `NULL` if any input argument is `NULL`. +**Definitions** -Note: You can also use the -[|| concatenation operator][string-link-to-operators] to concatenate -values into a string. ++ `geography_1`: A `GEOGRAPHY` value that represents the first geography. ++ `geography_2`: A `GEOGRAPHY` value that represents the second geography. ++ `directed`: Optional, required named argument that represents the type of + computation to use on the input geographies. 
If this argument is not + specified, `directed=>FALSE` is used by default. + + + `FALSE` (default): The largest Hausdorff distance found in + (`geography_1`, `geography_2`) and + (`geography_2`, `geography_1`). + + + `TRUE`: The Hausdorff distance for + (`geography_1`, `geography_2`). + +**Details** + +If an input geography is `NULL`, the function returns `NULL`. **Return type** -`STRING` or `BYTES` +`DOUBLE` -**Examples** +**Example** + +The following query gets the Hausdorff distance between `geo1` and `geo2`: ```sql -SELECT CONCAT('T.P.', ' ', 'Bar') as author; +WITH data AS ( + SELECT + ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1, + ST_GEOGFROMTEXT('LINESTRING(20 90, 30 90, 60 10, 90 10)') AS geo2 +) +SELECT ST_HAUSDORFFDISTANCE(geo1, geo2, directed=>TRUE) AS distance +FROM data; -/*---------------------* - | author | - +---------------------+ - | T.P. Bar | - *---------------------*/ +/*--------------------+ + | distance | + +--------------------+ + | 1688933.9832041925 | + +--------------------*/ ``` +The following query gets the Hausdorff distance between `geo2` and `geo1`: + ```sql -SELECT CONCAT('Summer', ' ', 1923) as release_date; +WITH data AS ( + SELECT + ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1, + ST_GEOGFROMTEXT('LINESTRING(20 90, 30 90, 60 10, 90 10)') AS geo2 +) +SELECT ST_HAUSDORFFDISTANCE(geo2, geo1, directed=>TRUE) AS distance +FROM data; -/*---------------------* - | release_date | - +---------------------+ - | Summer 1923 | - *---------------------*/ +/*--------------------+ + | distance | + +--------------------+ + | 5802892.745488612 | + +--------------------*/ ``` -```sql +The following query gets the largest Hausdorff distance between +(`geo1` and `geo2`) and (`geo2` and `geo1`): -With Employees AS - (SELECT - 'John' AS first_name, - 'Doe' AS last_name - UNION ALL +```sql +WITH data AS ( SELECT - 'Jane' AS first_name, - 'Smith' AS last_name - UNION ALL
60, 10 70, 70 70)') AS geo1, + ST_GEOGFROMTEXT('LINESTRING(20 90, 30 90, 60 10, 90 10)') AS geo2 +) +SELECT ST_HAUSDORFFDISTANCE(geo1, geo2, directed=>FALSE) AS distance +FROM data; + +/*--------------------+ + | distance | + +--------------------+ + | 5802892.745488612 | + +--------------------*/ +``` + +The following query produces the same results as the previous query because +`ST_HAUSDORFFDISTANCE` uses `directed=>FALSE` by default. + +```sql +WITH data AS ( SELECT - 'Joe' AS first_name, - 'Jackson' AS last_name) + ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1, + ST_GEOGFROMTEXT('LINESTRING(20 90, 30 90, 60 10, 90 10)') AS geo2 +) +SELECT ST_HAUSDORFFDISTANCE(geo1, geo2) AS distance +FROM data; +``` -SELECT - CONCAT(first_name, ' ', last_name) - AS full_name -FROM Employees; +[h-distance]: http://en.wikipedia.org/wiki/Hausdorff_distance -/*---------------------* - | full_name | - +---------------------+ - | John Doe | - | Jane Smith | - | Joe Jackson | - *---------------------*/ +### `ST_INTERIORRINGS` + +```sql +ST_INTERIORRINGS(polygon_geography) ``` -[string-link-to-operators]: #operators +**Description** -### `EDIT_DISTANCE` +Returns an array of linestring geographies that corresponds to the interior +rings of a polygon geography. Each interior ring is the border of a hole within +the input polygon. + ++ If the input geography is a polygon, excludes the outermost ring of the + polygon geography and returns the linestrings corresponding to the interior + rings. ++ If the input is the full `GEOGRAPHY`, returns an empty array. ++ If the input polygon has no holes, returns an empty array. ++ Returns an error if the input is not a single polygon. + +Use the `SAFE` prefix to return `NULL` for invalid input instead of an error. 
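For illustration, a minimal sketch of the `SAFE` behavior (the input value here is illustrative): a linestring is not a single polygon, so the unprefixed call raises an error, while the `SAFE`-prefixed call returns `NULL`:

```sql
-- ST_INTERIORRINGS(geo) would raise an error for this input because a
-- linestring is not a single polygon; SAFE.ST_INTERIORRINGS returns NULL.
SELECT SAFE.ST_INTERIORRINGS(ST_GEOGFROMTEXT('LINESTRING(0 0, 1 1)')) AS rings;
```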
+
+**Return type**
+
+`ARRAY<LINESTRING GEOGRAPHY>`
-
+**Examples**

```sql
-EDIT_DISTANCE(value1, value2, [max_distance => max_distance_value])
+WITH geo AS (
+  SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 1 1, 1 2, 0 0))') AS g UNION ALL
+  SELECT ST_GEOGFROMTEXT('POLYGON((1 1, 1 10, 5 10, 5 1, 1 1), (2 2, 3 4, 2 4, 2 2))') UNION ALL
+  SELECT ST_GEOGFROMTEXT('POLYGON((1 1, 1 10, 5 10, 5 1, 1 1), (2 2.5, 3.5 3, 2.5 2, 2 2.5), (3.5 7, 4 6, 3 3, 3.5 7))') UNION ALL
+  SELECT ST_GEOGFROMTEXT('fullglobe') UNION ALL
+  SELECT NULL)
+SELECT ST_INTERIORRINGS(g) AS rings FROM geo;
+
+/*----------------------------------------------------------------------------*
+ | rings |
+ +----------------------------------------------------------------------------+
+ | [] |
+ | [LINESTRING(2 2, 3 4, 2 4, 2 2)] |
+ | [LINESTRING(2.5 2, 3.5 3, 2 2.5, 2.5 2), LINESTRING(3 3, 4 6, 3.5 7, 3 3)] |
+ | [] |
+ | NULL |
+ *----------------------------------------------------------------------------*/
+```
+
+### `ST_INTERSECTION`
+
+```sql
+ST_INTERSECTION(geography_1, geography_2)
```

**Description**

-Computes the [Levenshtein distance][l-distance] between two `STRING` or
-`BYTES` values.
+Returns a `GEOGRAPHY` that represents the point set
+intersection of the two input `GEOGRAPHY`s. Thus,
+every point in the intersection appears in both `geography_1` and `geography_2`.

-**Definitions**
+If the two input `GEOGRAPHY`s are disjoint, that is,
+there are no points that appear in both input `geography_1` and `geography_2`,
+then an empty `GEOGRAPHY` is returned.

-+ `value1`: The first `STRING` or `BYTES` value to compare.
-+ `value2`: The second `STRING` or `BYTES` value to compare.
-+ `max_distance`: Optional mandatory-named argument. Takes a non-negative
-  `INT64` value that represents the maximum distance between the two values
-  to compute.
+See [ST_INTERSECTS][st-intersects] and [ST_DISJOINT][st-disjoint] for related
+predicate functions.

-  If this distance is exceeded, the function returns this value. 
- The default value for this argument is the maximum size of - `value1` and `value2`. +**Return type** -**Details** +`GEOGRAPHY` -If `value1` or `value2` is `NULL`, `NULL` is returned. +[st-intersects]: #st_intersects -You can only compare values of the same type. Otherwise, an error is produced. +[st-disjoint]: #st_disjoint + +### `ST_INTERSECTS` + +```sql +ST_INTERSECTS(geography_1, geography_2) +``` + +**Description** + +Returns `TRUE` if the point set intersection of `geography_1` and `geography_2` +is non-empty. Thus, this function returns `TRUE` if there is at least one point +that appears in both input `GEOGRAPHY`s. + +If `ST_INTERSECTS` returns `TRUE`, it implies that [`ST_DISJOINT`][st-disjoint] +returns `FALSE`. **Return type** -`INT64` +`BOOL` -**Examples** +[st-disjoint]: #st_disjoint -In the following example, the first character in both strings is different: +### `ST_INTERSECTSBOX` ```sql -SELECT EDIT_DISTANCE('a', 'b') AS results; - -/*---------* - | results | - +---------+ - | 1 | - *---------*/ +ST_INTERSECTSBOX(geography, lng1, lat1, lng2, lat2) ``` -In the following example, the first and second characters in both strings are -different: +**Description** -```sql -SELECT EDIT_DISTANCE('aa', 'b') AS results; +Returns `TRUE` if `geography` intersects the rectangle between `[lng1, lng2]` +and `[lat1, lat2]`. The edges of the rectangle follow constant lines of +longitude and latitude. `lng1` and `lng2` specify the westmost and eastmost +constant longitude lines that bound the rectangle, and `lat1` and `lat2` specify +the minimum and maximum constant latitude lines that bound the rectangle. -/*---------* - | results | - +---------+ - | 2 | - *---------*/ -``` +Specify all longitude and latitude arguments in degrees. 
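Because `lng1` and `lng2` are the westmost and eastmost bounds rather than a numeric minimum and maximum, a rectangle can cross the antimeridian. A quick sketch of this behavior (the point and box values are illustrative):

```sql
-- The box wraps eastward from longitude 170 through 180 to -170, so a
-- point on the antimeridian lies inside it.
SELECT ST_INTERSECTSBOX(ST_GEOGPOINT(180, 10), 170, 0, -170, 20) AS crosses;
```

Swapping `lng1` and `lng2` selects the complementary band of longitudes, as the `box1`/`box2` example below shows.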
-In the following example, only the first character in both strings is -different: +**Constraints** -```sql -SELECT EDIT_DISTANCE('aa', 'ba') AS results; +The input arguments are subject to the following constraints: -/*---------* - | results | - +---------+ - | 1 | - *---------*/ -``` ++ Latitudes should be in the `[-90, 90]` degree range. ++ Longitudes should follow either of the following rules: + + Both longitudes are in the `[-180, 180]` degree range. + + One of the longitudes is in the `[-180, 180]` degree range, and + `lng2 - lng1` is in the `[0, 360]` interval. -In the following example, the last six characters are different, but because -the maximum distance is `2`, this function exits early and returns `2`, the -maximum distance: +**Return type** + +`BOOL` + +**Example** ```sql -SELECT EDIT_DISTANCE('abcdefg', 'a', max_distance => 2) AS results; +SELECT p, ST_INTERSECTSBOX(p, -90, 0, 90, 20) AS box1, + ST_INTERSECTSBOX(p, 90, 0, -90, 20) AS box2 +FROM UNNEST([ST_GEOGPOINT(10, 10), ST_GEOGPOINT(170, 10), + ST_GEOGPOINT(30, 30)]) p -/*---------* - | results | - +---------+ - | 2 | - *---------*/ +/*----------------+--------------+--------------* + | p | box1 | box2 | + +----------------+--------------+--------------+ + | POINT(10 10) | TRUE | FALSE | + | POINT(170 10) | FALSE | TRUE | + | POINT(30 30) | FALSE | FALSE | + *----------------+--------------+--------------*/ ``` -[l-distance]: https://en.wikipedia.org/wiki/Levenshtein_distance - -### `ENDS_WITH` +### `ST_ISCLOSED` ```sql -ENDS_WITH(value, suffix) +ST_ISCLOSED(geography_expression) ``` **Description** -Takes two `STRING` or `BYTES` values. Returns `TRUE` if `suffix` -is a suffix of `value`. +Returns `TRUE` for a non-empty Geography, where each element in the Geography +has an empty boundary. The boundary for each element can be defined with +[`ST_BOUNDARY`][st-boundary]. -This function supports specifying [collation][collation]. ++ A point is closed. 
++ A linestring is closed if the start and end points of the linestring are + the same. ++ A polygon is closed only if it is a full polygon. ++ A collection is closed if and only if every element in the collection is + closed. -[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about +An empty `GEOGRAPHY` is not closed. **Return type** `BOOL` -**Examples** +**Example** ```sql -WITH items AS - (SELECT 'apple' as item +WITH example AS( + SELECT ST_GEOGFROMTEXT('POINT(5 0)') AS geography UNION ALL - SELECT 'banana' as item + SELECT ST_GEOGFROMTEXT('LINESTRING(0 1, 4 3, 2 6, 0 1)') AS geography UNION ALL - SELECT 'orange' as item) - + SELECT ST_GEOGFROMTEXT('LINESTRING(2 6, 1 3, 3 9)') AS geography + UNION ALL + SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1))') AS geography + UNION ALL + SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION EMPTY')) SELECT - ENDS_WITH(item, 'e') as example -FROM items; + geography, + ST_ISCLOSED(geography) AS is_closed, +FROM example; -/*---------* - | example | - +---------+ - | True | - | False | - | True | - *---------*/ +/*------------------------------------------------------+-----------* + | geography | is_closed | + +------------------------------------------------------+-----------+ + | POINT(5 0) | TRUE | + | LINESTRING(0 1, 4 3, 2 6, 0 1) | TRUE | + | LINESTRING(2 6, 1 3, 3 9) | FALSE | + | GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1)) | FALSE | + | GEOMETRYCOLLECTION EMPTY | FALSE | + *------------------------------------------------------+-----------*/ ``` -### `FORMAT` - +[st-boundary]: #st_boundary + +### `ST_ISCOLLECTION` ```sql -FORMAT(format_string_expression, data_type_expression[, ...]) +ST_ISCOLLECTION(geography_expression) ``` **Description** -`FORMAT` formats a data type expression as a string. +Returns `TRUE` if the total number of points, linestrings, and polygons is +greater than one. 
-+ `format_string_expression`: Can contain zero or more - [format specifiers][format-specifiers]. Each format specifier is introduced - by the `%` symbol, and must map to one or more of the remaining arguments. - In general, this is a one-to-one mapping, except when the `*` specifier is - present. For example, `%.*i` maps to two arguments—a length argument - and a signed integer argument. If the number of arguments related to the - format specifiers is not the same as the number of arguments, an error occurs. -+ `data_type_expression`: The value to format as a string. This can be any - ZetaSQL data type. +An empty `GEOGRAPHY` is not a collection. **Return type** -`STRING` +`BOOL` -**Examples** +### `ST_ISEMPTY` - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +```sql +ST_ISEMPTY(geography_expression) +``` - - - - - +**Description** - - - - - -
DescriptionStatementResult
Simple integerFORMAT('%d', 10)10
Integer with left blank paddingFORMAT('|%10d|', 11)|           11|
Integer with left zero paddingFORMAT('+%010d+', 12)+0000000012+
Integer with commasFORMAT("%'d", 123456789)123,456,789
STRINGFORMAT('-%s-', 'abcd efg')-abcd efg-
DOUBLEFORMAT('%f %E', 1.1, 2.2)1.100000 2.200000E+00
DATEFORMAT('%t', date '2015-09-01')2015-09-01
TIMESTAMPFORMAT('%t', timestamp '2015-09-01 12:34:56 -America/Los_Angeles')2015‑09‑01 19:34:56+00
+Returns `TRUE` if the given `GEOGRAPHY` is empty; that is, the `GEOGRAPHY` does +not contain any points, lines, or polygons. -The `FORMAT()` function does not provide fully customizable formatting for all -types and values, nor formatting that is sensitive to locale. +NOTE: An empty `GEOGRAPHY` is not associated with a particular geometry shape. +For example, the results of expressions `ST_GEOGFROMTEXT('POINT EMPTY')` and +`ST_GEOGFROMTEXT('GEOMETRYCOLLECTION EMPTY')` are identical. -If custom formatting is necessary for a type, you must first format it using -type-specific format functions, such as `FORMAT_DATE()` or `FORMAT_TIMESTAMP()`. -For example: +**Return type** + +`BOOL` + +### `ST_ISRING` ```sql -SELECT FORMAT('date: %s!', FORMAT_DATE('%B %d, %Y', date '2015-01-02')); +ST_ISRING(geography_expression) ``` -Returns +**Description** -``` -date: January 02, 2015! -``` +Returns `TRUE` if the input `GEOGRAPHY` is a linestring and if the +linestring is both [`ST_ISCLOSED`][st-isclosed] and +simple. A linestring is considered simple if it does not pass through the +same point twice (with the exception of the start and endpoint, which may +overlap to form a ring). -#### Supported format specifiers - +An empty `GEOGRAPHY` is not a ring. -``` -%[flags][width][.precision]specifier -``` +**Return type** -A [format specifier][format-specifier-list] adds formatting when casting a -value to a string. 
It can optionally contain these sub-specifiers: +`BOOL` -+ [Flags][flags] -+ [Width][width] -+ [Precision][precision] +[st-isclosed]: #st_isclosed -Additional information about format specifiers: +### `ST_LENGTH` -+ [%g and %G behavior][g-and-g-behavior] -+ [%p and %P behavior][p-and-p-behavior] -+ [%t and %T behavior][t-and-t-behavior] -+ [Error conditions][error-format-specifiers] -+ [NULL argument handling][null-format-specifiers] -+ [Additional semantic rules][rules-format-specifiers] +```sql +ST_LENGTH(geography_expression[, use_spheroid]) +``` -##### Format specifiers - +**Description** - - - - - - - - - - - - - +If `geography_expression` is a point or a polygon, returns zero. If +`geography_expression` is a collection, returns the length of the lines in the +collection; if the collection does not contain lines, returns zero. - - - - - - +The `use_spheroid` parameter currently only supports +the value `FALSE`. The default value of `use_spheroid` is `FALSE`. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ++ Returns `NULL` if any input argument is `NULL`. ++ Returns an empty geography if `linestring_geography` is an empty geography. ++ Returns an error if `linestring_geography` is not a linestring or an empty + geography, or if `fraction` is outside the `[0, 1]` range. - - - - - - +`GEOGRAPHY` - - - - - - +The following query returns a few points on a linestring. Notice that the + midpoint of the linestring `LINESTRING(1 1, 5 5)` is slightly different from + `POINT(3 3)` because the `GEOGRAPHY` type uses geodesic line segments. - - - - - - - - - - - - - - - - - - - - - - - - -
SpecifierDescriptionExamplesTypes
d or iDecimal integer392 +Returns the total length in meters of the lines in the input +`GEOGRAPHY`. -INT32
INT64
UINT32
UINT64
-
uUnsigned integer7235 +The optional `use_spheroid` parameter determines how this function measures +distance. If `use_spheroid` is `FALSE`, the function measures distance on the +surface of a perfect sphere. -UINT32
UINT64
-
o - Octal -

- Note: If an INT64 value is negative, an error is produced. -
610 +**Return type** -INT32
INT64
UINT32
UINT64
-
x - Hexadecimal integer -

- Note: If an INT64 value is negative, an error is produced. -
7fa +`DOUBLE` -INT32
INT64
UINT32
UINT64
-
X - Hexadecimal integer (uppercase) -

- Note: If an INT64 value is negative, an error is produced. -
7FA +[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System -INT32
INT64
UINT32
UINT64
+### `ST_LINEINTERPOLATEPOINT` -
fDecimal notation, in [-](integer part).(fractional part) for finite - values, and in lowercase for non-finite values392.650000
- inf
- nan
+```sql +ST_LINEINTERPOLATEPOINT(linestring_geography, fraction) +``` -NUMERIC
BIGNUMERIC
FLOAT
DOUBLE
-
FDecimal notation, in [-](integer part).(fractional part) for finite - values, and in uppercase for non-finite values392.650000
- INF
- NAN
+**Description** -NUMERIC
BIGNUMERIC
FLOAT
DOUBLE
-
eScientific notation (mantissa/exponent), lowercase3.926500e+02
- inf
- nan
+Gets a point at a specific fraction in a linestring GEOGRAPHY +value. -NUMERIC
BIGNUMERIC
FLOAT
DOUBLE
-
EScientific notation (mantissa/exponent), uppercase3.926500E+02
- INF
- NAN
+**Definitions** -NUMERIC
BIGNUMERIC
FLOAT
DOUBLE
-
gEither decimal notation or scientific notation, depending on the input - value's exponent and the specified precision. Lowercase. - See %g and %G behavior for details.392.65
- 3.9265e+07
- inf
- nan
++ `linestring_geography`: A linestring `GEOGRAPHY` on which the target point + is located. ++ `fraction`: A `DOUBLE` value that represents a fraction + along the linestring `GEOGRAPHY` where the target point is located. + This should be an inclusive value between `0` (start of the + linestring) and `1` (end of the linestring). -NUMERIC
BIGNUMERIC
FLOAT
DOUBLE
-
G - Either decimal notation or scientific notation, depending on the input - value's exponent and the specified precision. Uppercase. - See %g and %G behavior for details. - - 392.65
- 3.9265E+07
- INF
- NAN -
+**Details** -NUMERIC
BIGNUMERIC
FLOAT
DOUBLE
-
p - - Produces a one-line printable string representing a protocol buffer - or JSON. - - See %p and %P behavior. - - -
year: 2019 month: 10
- - -
{"month":10,"year":2019}
- -
+**Return Type** -JSON
PROTO
-
P - - Produces a multi-line printable string representing a protocol buffer - or JSON. - - See %p and %P behavior. - - -
-year: 2019
-month: 10
-
- - -
-{
-  "month": 10,
-  "year": 2019
-}
-
- -
+**Example** -JSON
PROTO
-
sString of characterssample +```sql +WITH fractions AS ( + SELECT 0 AS fraction UNION ALL + SELECT 0.5 UNION ALL + SELECT 1 UNION ALL + SELECT NULL + ) +SELECT + fraction, + ST_LINEINTERPOLATEPOINT(ST_GEOGFROMTEXT('LINESTRING(1 1, 5 5)'), fraction) + AS point +FROM fractions -STRING
-
t - Returns a printable string representing the value. Often looks - similar to casting the argument to STRING. - See %t and %T behavior. - - sample
- 2014‑01‑01 -
Any type
T - Produces a string that is a valid ZetaSQL constant with a - similar type to the value's type (maybe wider, or maybe string). - See %t and %T behavior. - - 'sample'
- b'bytes sample'
- 1234
- 2.3
- date '2014‑01‑01' -
Any type
%'%%' produces a single '%'%n/a
+/*-------------+-------------------------------------------* + | fraction | point | + +-------------+-------------------------------------------+ + | 0 | POINT(1 1) | + | 0.5 | POINT(2.99633827268976 3.00182528336078) | + | 1 | POINT(5 5) | + | NULL | NULL | + *-------------+-------------------------------------------*/ +``` -The format specifier can optionally contain the sub-specifiers identified above -in the specifier prototype. +### `ST_LINELOCATEPOINT` -These sub-specifiers must comply with the following specifications. +```sql +ST_LINELOCATEPOINT(linestring_geography, point_geography) +``` -##### Flags - +**Description** - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
FlagsDescription
-Left-justify within the given field width; Right justification is the -default (see width sub-specifier)
+Forces to precede the result with a plus or minus sign (+ -or -) even for positive numbers. By default, only negative numbers -are preceded with a - sign
<space>If no sign is going to be written, a blank space is inserted before the -value
#
    -
  • For `%o`, `%x`, and `%X`, this flag means to precede the - value with 0, 0x or 0X respectively for values different than zero.
  • -
  • For `%f`, `%F`, `%e`, and `%E`, this flag means to add the decimal - point even when there is no fractional part, unless the value - is non-finite.
  • -
  • For `%g` and `%G`, this flag means to add the decimal point even - when there is no fractional part unless the value is non-finite, and - never remove the trailing zeros after the decimal point.
  • -
-
0 - Left-pads the number with zeroes (0) instead of spaces when padding is - specified (see width sub-specifier)
' -

Formats integers using the appropriating grouping character. - For example:

-
    -
  • FORMAT("%'d", 12345678) returns 12,345,678
  • -
  • FORMAT("%'x", 12345678) returns bc:614e
  • -
  • FORMAT("%'o", 55555) returns 15,4403
  • -

    This flag is only relevant for decimal, hex, and octal values.

    -
-
+Gets a section of a linestring between the start point and a selected point (a +point on the linestring closest to the `point_geography` argument). Returns the +percentage that this section represents in the linestring. -Flags may be specified in any order. Duplicate flags are not an error. When -flags are not relevant for some element type, they are ignored. +Details: -##### Width - ++ To select a point on the linestring `GEOGRAPHY` (`linestring_geography`), + this function takes a point `GEOGRAPHY` (`point_geography`) and finds the + [closest point][st-closestpoint] to it on the linestring. ++ If two points on `linestring_geography` are an equal distance away from + `point_geography`, it is not guaranteed which one will be selected. ++ The return value is an inclusive value between 0 and 1 (0-100%). ++ If the selected point is the start point on the linestring, function returns + 0 (0%). ++ If the selected point is the end point on the linestring, function returns 1 + (100%). - - - - - - - - - - - - - -
WidthDescription
<number> - Minimum number of characters to be printed. If the value to be printed - is shorter than this number, the result is padded with blank spaces. - The value is not truncated even if the result is larger -
* - The width is not specified in the format string, but as an additional - integer value argument preceding the argument that has to be formatted -
+`NULL` and error handling: -##### Precision - ++ Returns `NULL` if any input argument is `NULL`. ++ Returns an error if `linestring_geography` is not a linestring or if + `point_geography` is not a point. Use the `SAFE` prefix + to obtain `NULL` for invalid input instead of an error. - - - - - - - - - - - - - -
PrecisionDescription
.<number> -
    -
  • For integer specifiers `%d`, `%i`, `%o`, `%u`, `%x`, and `%X`: - precision specifies the - minimum number of digits to be written. If the value to be written is - shorter than this number, the result is padded with trailing zeros. - The value is not truncated even if the result is longer. A precision - of 0 means that no character is written for the value 0.
  • -
  • For specifiers `%a`, `%A`, `%e`, `%E`, `%f`, and `%F`: this is the - number of digits to be printed after the decimal point. The default - value is 6.
  • -
  • For specifiers `%g` and `%G`: this is the number of significant digits - to be printed, before the removal of the trailing zeros after the - decimal point. The default value is 6.
  • -
-
.* - The precision is not specified in the format string, but as an - additional integer value argument preceding the argument that has to be - formatted -
+**Return Type** -##### %g and %G behavior - -The `%g` and `%G` format specifiers choose either the decimal notation (like -the `%f` and `%F` specifiers) or the scientific notation (like the `%e` and `%E` -specifiers), depending on the input value's exponent and the specified -[precision](#precision). +`DOUBLE` -Let p stand for the specified [precision](#precision) (defaults to 6; 1 if the -specified precision is less than 1). The input value is first converted to -scientific notation with precision = (p - 1). If the resulting exponent part x -is less than -4 or no less than p, the scientific notation with precision = -(p - 1) is used; otherwise the decimal notation with precision = (p - 1 - x) is -used. +**Examples** -Unless [`#` flag](#flags) is present, the trailing zeros after the decimal point -are removed, and the decimal point is also removed if there is no digit after -it. +```sql +WITH geos AS ( + SELECT ST_GEOGPOINT(0, 0) AS point UNION ALL + SELECT ST_GEOGPOINT(1, 0) UNION ALL + SELECT ST_GEOGPOINT(1, 1) UNION ALL + SELECT ST_GEOGPOINT(2, 2) UNION ALL + SELECT ST_GEOGPOINT(3, 3) UNION ALL + SELECT ST_GEOGPOINT(4, 4) UNION ALL + SELECT ST_GEOGPOINT(5, 5) UNION ALL + SELECT ST_GEOGPOINT(6, 5) UNION ALL + SELECT NULL + ) +SELECT + point AS input_point, + ST_LINELOCATEPOINT(ST_GEOGFROMTEXT('LINESTRING(1 1, 5 5)'), point) + AS percentage_from_beginning +FROM geos -##### %p and %P behavior - +/*-------------+---------------------------* + | input_point | percentage_from_beginning | + +-------------+---------------------------+ + | POINT(0 0) | 0 | + | POINT(1 0) | 0 | + | POINT(1 1) | 0 | + | POINT(2 2) | 0.25015214685147907 | + | POINT(3 3) | 0.5002284283637185 | + | POINT(4 4) | 0.7501905913884388 | + | POINT(5 5) | 1 | + | POINT(6 5) | 1 | + | NULL | NULL | + *-------------+---------------------------*/ +``` -The `%p` format specifier produces a one-line printable string. The `%P` -format specifier produces a multi-line printable string. 
You can use these -format specifiers with the following data types: +[st-closestpoint]: #st_closestpoint - - - - - - - - - - - - - - - - - - - - -
Type%p%P
PROTO -

PROTO input:

-
-message ReleaseDate {
- required int32 year = 1 [default=2019];
- required int32 month = 2 [default=10];
-}
-

Produces a one-line printable string representing a protocol buffer:

-
year: 2019 month: 10
-
-

PROTO input:

-
-message ReleaseDate {
- required int32 year = 1 [default=2019];
- required int32 month = 2 [default=10];
-}
-

Produces a multi-line printable string representing a protocol buffer:

-
-year: 2019
-month: 10
-
-
JSON -

JSON input:

-
-JSON '
-{
-  "month": 10,
-  "year": 2019
-}
-'
-

Produces a one-line printable string representing JSON:

-
{"month":10,"year":2019}
-
-

JSON input:

-
-JSON '
-{
-  "month": 10,
-  "year": 2019
-}
-'
-

Produces a multi-line printable string representing JSON:

-
-{
-  "month": 10,
-  "year": 2019
-}
-
-
+### `ST_LINESUBSTRING` -##### %t and %T behavior - +```sql +ST_LINESUBSTRING(linestring_geography, start_fraction, end_fraction); +``` -The `%t` and `%T` format specifiers are defined for all types. The -[width](#width), [precision](#precision), and [flags](#flags) act as they do -for `%s`: the [width](#width) is the minimum width and the `STRING` will be -padded to that size, and [precision](#precision) is the maximum width -of content to show and the `STRING` will be truncated to that size, prior to -padding to width. +**Description** -The `%t` specifier is always meant to be a readable form of the value. +Gets a segment of a linestring at a specific starting and ending fraction. -The `%T` specifier is always a valid SQL literal of a similar type, such as a -wider numeric type. -The literal will not include casts or a type name, except for the special case -of non-finite floating point values. +**Definitions** -The `STRING` is formatted as follows: ++ `linestring_geography`: The LineString `GEOGRAPHY` value that represents the + linestring from which to extract a segment. ++ `start_fraction`: `DOUBLE` value that represents + the starting fraction of the total length of `linestring_geography`. + This must be an inclusive value between 0 and 1 (0-100%). ++ `end_fraction`: `DOUBLE` value that represents + the ending fraction of the total length of `linestring_geography`. + This must be an inclusive value between 0 and 1 (0-100%). - - - - - - - - - - - - - - - - - - - - - - +**Details** - +`end_fraction` must be greater than or equal to `start_fraction`. - - - - - - - - - - - - - - - - - - +If `start_fraction` and `end_fraction` are equal, a linestring with only +one point is produced. - - - - - - - - - - - - - - - - - - - +**Return type** - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Type%t%T
NULL of any typeNULLNULL
- -INT32
INT64
UINT32
UINT64
-
123123
NUMERIC123.0 (always with .0) - NUMERIC "123.0"
FLOAT, DOUBLE - 123.0 (always with .0)
- 123e+10
inf
-inf
NaN -
- 123.0 (always with .0)
- 123e+10
- CAST("inf" AS <type>)
- CAST("-inf" AS <type>)
- CAST("nan" AS <type>) -
STRINGunquoted string valuequoted string literal
BYTES - unquoted escaped bytes
- e.g. abc\x01\x02 -
- quoted bytes literal
- e.g. b"abc\x01\x02" -
BOOLboolean valueboolean value
ENUMEnumName"EnumName"
DATE2011-02-03DATE "2011-02-03"
TIMESTAMP2011-02-03 04:05:06+00TIMESTAMP "2011-02-03 04:05:06+00"
INTERVAL1-2 3 4:5:6.789INTERVAL "1-2 3 4:5:6.789" YEAR TO SECOND
PROTO - one-line printable string representing a protocol buffer. - - quoted string literal with one-line printable string representing a - protocol buffer. -
ARRAY[value, value, ...]
- where values are formatted with %t
[value, value, ...]
- where values are formatted with %T
STRUCT(value, value, ...)
- where fields are formatted with %t
(value, value, ...)
- where fields are formatted with %T
-
- Special cases:
- Zero fields: STRUCT()
- One field: STRUCT(value)
JSON - one-line printable string representing JSON.
-
{"name":"apple","stock":3}
-
- one-line printable string representing a JSON literal.
-
JSON '{"name":"apple","stock":3}'
-
++ LineString `GEOGRAPHY` if the resulting geography has more than one point. ++ Point `GEOGRAPHY` if the resulting geography has only one point. -##### Error conditions - +**Example** -If a format specifier is invalid, or is not compatible with the related -argument type, or the wrong number or arguments are provided, then an error is -produced. For example, the following `` expressions are invalid: +The following query returns the second half of the linestring: ```sql -FORMAT('%s', 1) -``` +WITH data AS ( + SELECT ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1 +) +SELECT ST_LINESUBSTRING(geo1, 0.5, 1) AS segment +FROM data; -```sql -FORMAT('%') +/*-------------------------------------------------------------+ + | segment | + +-------------------------------------------------------------+ + | LINESTRING(49.4760661523471 67.2419539103851, 10 70, 70 70) | + +-------------------------------------------------------------*/ ``` -##### NULL argument handling - +The following query returns a linestring that only contains one point: -A `NULL` format string results in a `NULL` output `STRING`. Any other arguments -are ignored in this case. +```sql +WITH data AS ( + SELECT ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1 +) +SELECT ST_LINESUBSTRING(geo1, 0.5, 0.5) AS segment +FROM data; -The function generally produces a `NULL` value if a `NULL` argument is present. -For example, `FORMAT('%i', NULL_expression)` produces a `NULL STRING` as -output. +/*------------------------------------------+ + | segment | + +------------------------------------------+ + | POINT(49.4760661523471 67.2419539103851) | + +------------------------------------------*/ +``` -However, there are some exceptions: if the format specifier is %t or %T -(both of which produce `STRING`s that effectively match CAST and literal value -semantics), a `NULL` value produces 'NULL' (without the quotes) in the result -`STRING`. 
For example, the function: +### `ST_MAKELINE` ```sql -FORMAT('00-%t-00', NULL_expression); +ST_MAKELINE(geography_1, geography_2) ``` -Returns - ```sql -00-NULL-00 +ST_MAKELINE(array_of_geography) ``` -##### Additional semantic rules - - -`DOUBLE` and -`FLOAT` values can be `+/-inf` or `NaN`. -When an argument has one of those values, the result of the format specifiers -`%f`, `%F`, `%e`, `%E`, `%g`, `%G`, and `%t` are `inf`, `-inf`, or `nan` -(or the same in uppercase) as appropriate. This is consistent with how -ZetaSQL casts these values to `STRING`. For `%T`, -ZetaSQL returns quoted strings for -`DOUBLE` values that don't have non-string literal -representations. - -[format-specifiers]: #format_specifiers - -[format-specifier-list]: #format_specifier_list +**Description** -[flags]: #flags +Creates a `GEOGRAPHY` with a single linestring by +concatenating the point or line vertices of each of the input +`GEOGRAPHY`s in the order they are given. -[width]: #width +`ST_MAKELINE` comes in two variants. For the first variant, input must be two +`GEOGRAPHY`s. For the second, input must be an `ARRAY` of type `GEOGRAPHY`. In +either variant, each input `GEOGRAPHY` must consist of one of the following +values: -[precision]: #precision ++ Exactly one point. ++ Exactly one linestring. -[g-and-g-behavior]: #g_and_g_behavior +For the first variant of `ST_MAKELINE`, if either input `GEOGRAPHY` is `NULL`, +`ST_MAKELINE` returns `NULL`. For the second variant, if input `ARRAY` or any +element in the input `ARRAY` is `NULL`, `ST_MAKELINE` returns `NULL`. -[p-and-p-behavior]: #p_and_p_behavior +**Constraints** -[t-and-t-behavior]: #t_and_t_behavior +Every edge must span strictly less than 180 degrees. -[error-format-specifiers]: #error_format_specifiers +NOTE: The ZetaSQL snapping process may discard sufficiently short +edges and snap the two endpoints together. 
For instance, if two input +`GEOGRAPHY`s each contain a point and the two points are separated by a distance +less than the snap radius, the points will be snapped together. In such a case +the result will be a `GEOGRAPHY` with exactly one point. -[null-format-specifiers]: #null_format_specifiers +**Return type** -[rules-format-specifiers]: #rules_format_specifiers +LineString `GEOGRAPHY` -### `FROM_BASE32` +### `ST_MAKEPOLYGON` ```sql -FROM_BASE32(string_expr) +ST_MAKEPOLYGON(polygon_shell[, array_of_polygon_holes]) ``` **Description** -Converts the base32-encoded input `string_expr` into `BYTES` format. To convert -`BYTES` to a base32-encoded `STRING`, use [TO_BASE32][string-link-to-base32]. +Creates a `GEOGRAPHY` containing a single polygon +from linestring inputs, where each input linestring is used to construct a +polygon ring. -**Return type** +`ST_MAKEPOLYGON` comes in two variants. For the first variant, the input +linestring is provided by a single `GEOGRAPHY` containing exactly one +linestring. For the second variant, the input consists of a single `GEOGRAPHY` +and an array of `GEOGRAPHY`s, each containing exactly one linestring. -`BYTES` +The first `GEOGRAPHY` in either variant is used to construct the polygon shell. +Additional `GEOGRAPHY`s provided in the input `ARRAY` specify a polygon hole. +For every input `GEOGRAPHY` containing exactly one linestring, the following +must be true: -**Example** ++ The linestring must consist of at least three distinct vertices. ++ The linestring must be closed: that is, the first and last vertex have to be + the same. If the first and last vertex differ, the function constructs a + final edge from the first vertex to the last. -```sql -SELECT FROM_BASE32('MFRGGZDF74======') AS byte_data; +For the first variant of `ST_MAKEPOLYGON`, if either input `GEOGRAPHY` is +`NULL`, `ST_MAKEPOLYGON` returns `NULL`. For the second variant, if +input `ARRAY` or any element in the `ARRAY` is `NULL`, `ST_MAKEPOLYGON` returns +`NULL`. 
-/*-----------*
- | byte_data |
- +-----------+
- | abcde\xff |
- *-----------*/
-```

+NOTE: `ST_MAKEPOLYGON` accepts an empty `GEOGRAPHY` as input. `ST_MAKEPOLYGON`
+interprets an empty `GEOGRAPHY` as having an empty linestring, which will
+create a full loop: that is, a polygon that covers the entire Earth.

-[string-link-to-base32]: #to_base32

+**Constraints**

-### `FROM_BASE64`

+Together, the input rings must form a valid polygon:

-```sql
-FROM_BASE64(string_expr)
-```

++ The polygon shell must cover each of the polygon holes.
++ There can be only one polygon shell (which has to be the first input ring).
+  This implies that polygon holes cannot be nested.
++ Polygon rings may only intersect in a vertex on the boundary of both rings.

-**Description**

+Every edge must span strictly less than 180 degrees.

-Converts the base64-encoded input `string_expr` into
-`BYTES` format. To convert
-`BYTES` to a base64-encoded `STRING`,
-use [TO_BASE64][string-link-to-base64].

+Each polygon ring divides the sphere into two regions. The first input
+linestring to `ST_MAKEPOLYGON` forms the polygon shell, and the interior is
+chosen to be the smaller of the two regions. Each subsequent input linestring
+specifies a polygon hole, so the interior of the polygon is already
+well-defined. To define a polygon shell such that the interior of the polygon
+is the larger of the two regions, see
+[`ST_MAKEPOLYGONORIENTED`][st-makepolygonoriented].

-There are several base64 encodings in common use that vary in exactly which
-alphabet of 65 ASCII characters are used to encode the 64 digits and padding.
-See [RFC 4648][RFC-4648] for details. This
-function expects the alphabet `[A-Za-z0-9+/=]`.

+NOTE: The ZetaSQL snapping process may discard sufficiently
+short edges and snap the two endpoints together. Hence, when vertices are
+snapped together, it is possible that a polygon hole that is sufficiently small
+may disappear, or the output `GEOGRAPHY` may contain only a line or a
+point.
**Return type** -`BYTES` +`GEOGRAPHY` -**Example** +[st-makepolygonoriented]: #st_makepolygonoriented -```sql -SELECT FROM_BASE64('/+A=') AS byte_data; +### `ST_MAKEPOLYGONORIENTED` -/*-----------* - | byte_data | - +-----------+ - | \377\340 | - *-----------*/ +```sql +ST_MAKEPOLYGONORIENTED(array_of_geography) ``` -To work with an encoding using a different base64 alphabet, you might need to -compose `FROM_BASE64` with the `REPLACE` function. For instance, the -`base64url` url-safe and filename-safe encoding commonly used in web programming -uses `-_=` as the last characters rather than `+/=`. To decode a -`base64url`-encoded string, replace `-` and `_` with `+` and `/` respectively. - -```sql -SELECT FROM_BASE64(REPLACE(REPLACE('_-A=', '-', '+'), '_', '/')) AS binary; +**Description** -/*-----------* - | binary | - +-----------+ - | \377\340 | - *-----------*/ -``` +Like `ST_MAKEPOLYGON`, but the vertex ordering of each input linestring +determines the orientation of each polygon ring. The orientation of a polygon +ring defines the interior of the polygon as follows: if someone walks along the +boundary of the polygon in the order of the input vertices, the interior of the +polygon is on the left. This applies for each polygon ring provided. -[RFC-4648]: https://tools.ietf.org/html/rfc4648#section-4 +This variant of the polygon constructor is more flexible since +`ST_MAKEPOLYGONORIENTED` can construct a polygon such that the interior is on +either side of the polygon ring. However, proper orientation of polygon rings is +critical in order to construct the desired polygon. -[string-link-to-from-base64]: #from_base64 +If the input `ARRAY` or any element in the `ARRAY` is `NULL`, +`ST_MAKEPOLYGONORIENTED` returns `NULL`. -### `FROM_HEX` +NOTE: The input argument for `ST_MAKEPOLYGONORIENTED` may contain an empty +`GEOGRAPHY`. 
`ST_MAKEPOLYGONORIENTED` interprets an empty `GEOGRAPHY` as having
+an empty linestring, which will create a full loop: that is, a polygon that
+covers the entire Earth.

-```sql
-FROM_HEX(string)
-```

+**Constraints**

-**Description**

+Together, the input rings must form a valid polygon:

-Converts a hexadecimal-encoded `STRING` into `BYTES` format. Returns an error
-if the input `STRING` contains characters outside the range
-`(0..9, A..F, a..f)`. The lettercase of the characters does not matter. If the
-input `STRING` has an odd number of characters, the function acts as if the
-input has an additional leading `0`. To convert `BYTES` to a hexadecimal-encoded
-`STRING`, use [TO_HEX][string-link-to-to-hex].

++ The polygon shell must cover each of the polygon holes.
++ There must be only one polygon shell, which must be the first input ring.
+  This implies that polygon holes cannot be nested.
++ Polygon rings may only intersect in a vertex on the boundary of both rings.

-**Return type**

+Every edge must span strictly less than 180 degrees.

-`BYTES`

+`ST_MAKEPOLYGONORIENTED` relies on the ordering of the input vertices of each
+linestring to determine the orientation of the polygon. This applies to the
+polygon shell and any polygon holes. `ST_MAKEPOLYGONORIENTED` expects all
+polygon holes to have the opposite orientation of the shell. See
+[`ST_MAKEPOLYGON`][st-makepolygon] for an alternate polygon constructor, and
+other constraints on building a valid polygon.

-**Example**

+NOTE: Due to the ZetaSQL snapping process, edges with a sufficiently
+short length will be discarded and the two endpoints will be snapped to a single
+point. Therefore, it is possible that vertices in a linestring may be snapped
+together such that one or more edges disappear. Hence, it is possible that a
+polygon hole that is sufficiently small may disappear, or the resulting
+`GEOGRAPHY` may contain only a line or a
+point.
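+**Example**
+
+The following query is an illustrative sketch (not one of the reference
+examples): listing the shell vertices counterclockwise keeps the polygon
+interior in the smaller of the two regions. The printed WKT may differ
+slightly depending on canonicalization.
+
+```sql
+SELECT ST_MAKEPOLYGONORIENTED(
+  [ST_GEOGFROMTEXT('LINESTRING(0 0, 1 0, 1 1, 0 0)')]) AS polygon;
+
+/*-------------------------------*
+ | polygon                       |
+ +-------------------------------+
+ | POLYGON((0 0, 1 0, 1 1, 0 0)) |
+ *-------------------------------*/
+```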
-```sql
-WITH Input AS (
-  SELECT '00010203aaeeefff' AS hex_str UNION ALL
-  SELECT '0AF' UNION ALL
-  SELECT '666f6f626172'
-)
-SELECT hex_str, FROM_HEX(hex_str) AS bytes_str
-FROM Input;
+**Return type**

-/*------------------+----------------------------------*
- | hex_str          | bytes_str                        |
- +------------------+----------------------------------+
- | 0AF              | \x00\xaf                         |
- | 00010203aaeeefff | \x00\x01\x02\x03\xaa\xee\xef\xff |
- | 666f6f626172     | foobar                           |
- *------------------+----------------------------------*/
-```
+`GEOGRAPHY`

-[string-link-to-to-hex]: #to_hex

+[st-makepolygonoriented]: #st_makepolygonoriented

-### `INITCAP`

+### `ST_MAXDISTANCE`

```sql
-INITCAP(value[, delimiters])
+ST_MAXDISTANCE(geography_1, geography_2[, use_spheroid])
```

**Description**

+Returns the longest distance in meters between two non-empty
+`GEOGRAPHY`s; that is, the distance between two
+vertices where the first vertex is in the first
+`GEOGRAPHY`, and the second vertex is in the second
+`GEOGRAPHY`. If `geography_1` and `geography_2` are the
+same `GEOGRAPHY`, the function returns the distance
+between the two most distant vertices in that
+`GEOGRAPHY`.

-Takes a `STRING` and returns it with the first character in each word in
-uppercase and all other characters in lowercase. Non-alphabetic characters
-remain the same.

+If either of the input `GEOGRAPHY`s is empty,
+`ST_MAXDISTANCE` returns `NULL`.

-`delimiters` is an optional string argument that is used to override the default
-set of characters used to separate words. If `delimiters` is not specified, it
-defaults to the following characters: \
-` [ ] ( ) { } / | \ < > ! ? @ " ^ # $ & ~ _ , . : ; * % + -`

+The optional `use_spheroid` parameter determines how this function measures
+distance. If `use_spheroid` is `FALSE`, the function measures distance on the
+surface of a perfect sphere.

-If `value` or `delimiters` is `NULL`, the function returns `NULL`.

+The `use_spheroid` parameter currently only supports
+the value `FALSE`. 
The default value of `use_spheroid` is `FALSE`. **Return type** -`STRING` - -**Examples** - -```sql -WITH example AS -( - SELECT 'Hello World-everyone!' AS value UNION ALL - SELECT 'tHe dog BARKS loudly+friendly' AS value UNION ALL - SELECT 'apples&oranges;&pears' AS value UNION ALL - SELECT 'καθίσματα ταινιών' AS value -) -SELECT value, INITCAP(value) AS initcap_value FROM example - -/*-------------------------------+-------------------------------* - | value | initcap_value | - +-------------------------------+-------------------------------+ - | Hello World-everyone! | Hello World-Everyone! | - | tHe dog BARKS loudly+friendly | The Dog Barks Loudly+Friendly | - | apples&oranges;&pears | Apples&Oranges;&Pears | - | καθίσματα ταινιών | Καθίσματα Ταινιών | - *-------------------------------+-------------------------------*/ - -WITH example AS -( - SELECT 'hello WORLD!' AS value, '' AS delimiters UNION ALL - SELECT 'καθίσματα ταιντιώ@ν' AS value, 'Ï„@' AS delimiters UNION ALL - SELECT 'Apples1oranges2pears' AS value, '12' AS delimiters UNION ALL - SELECT 'tHisEisEaESentence' AS value, 'E' AS delimiters -) -SELECT value, delimiters, INITCAP(value, delimiters) AS initcap_value FROM example; +`DOUBLE` -/*----------------------+------------+----------------------* - | value | delimiters | initcap_value | - +----------------------+------------+----------------------+ - | hello WORLD! | | Hello world! | - | καθίσματα ταιντιώ@ν | Ï„@ | ΚαθίσματΑ τΑιντΙώ@Î | - | Apples1oranges2pears | 12 | Apples1Oranges2Pears | - | tHisEisEaESentence | E | ThisEIsEAESentence | - *----------------------+------------+----------------------*/ -``` +[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System -### `INSTR` +### `ST_NPOINTS` ```sql -INSTR(value, subvalue[, position[, occurrence]]) +ST_NPOINTS(geography_expression) ``` **Description** -Returns the lowest 1-based position of `subvalue` in `value`. -`value` and `subvalue` must be the same type, either -`STRING` or `BYTES`. 
- -If `position` is specified, the search starts at this position in -`value`, otherwise it starts at `1`, which is the beginning of -`value`. If `position` is negative, the function searches backwards -from the end of `value`, with `-1` indicating the last character. -`position` is of type `INT64` and cannot be `0`. - -If `occurrence` is specified, the search returns the position of a specific -instance of `subvalue` in `value`. If not specified, `occurrence` -defaults to `1` and returns the position of the first occurrence. -For `occurrence` > `1`, the function includes overlapping occurrences. -`occurrence` is of type `INT64` and must be positive. - -This function supports specifying [collation][collation]. - -[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about - -Returns `0` if: +An alias of [ST_NUMPOINTS][st-numpoints]. -+ No match is found. -+ If `occurrence` is greater than the number of matches found. -+ If `position` is greater than the length of `value`. +[st-numpoints]: #st_numpoints -Returns `NULL` if: +### `ST_NUMGEOMETRIES` -+ Any input argument is `NULL`. +``` +ST_NUMGEOMETRIES(geography_expression) +``` -Returns an error if: +**Description** -+ `position` is `0`. -+ `occurrence` is `0` or negative. +Returns the number of geometries in the input `GEOGRAPHY`. For a single point, +linestring, or polygon, `ST_NUMGEOMETRIES` returns `1`. For any collection of +geometries, `ST_NUMGEOMETRIES` returns the number of geometries making up the +collection. `ST_NUMGEOMETRIES` returns `0` if the input is the empty +`GEOGRAPHY`. **Return type** `INT64` -**Examples** +**Example** + +The following example computes `ST_NUMGEOMETRIES` for a single point geography, +two collections, and an empty geography. 
```sql -WITH example AS -(SELECT 'banana' as value, 'an' as subvalue, 1 as position, 1 as -occurrence UNION ALL -SELECT 'banana' as value, 'an' as subvalue, 1 as position, 2 as -occurrence UNION ALL -SELECT 'banana' as value, 'an' as subvalue, 1 as position, 3 as -occurrence UNION ALL -SELECT 'banana' as value, 'an' as subvalue, 3 as position, 1 as -occurrence UNION ALL -SELECT 'banana' as value, 'an' as subvalue, -1 as position, 1 as -occurrence UNION ALL -SELECT 'banana' as value, 'an' as subvalue, -3 as position, 1 as -occurrence UNION ALL -SELECT 'banana' as value, 'ann' as subvalue, 1 as position, 1 as -occurrence UNION ALL -SELECT 'helloooo' as value, 'oo' as subvalue, 1 as position, 1 as -occurrence UNION ALL -SELECT 'helloooo' as value, 'oo' as subvalue, 1 as position, 2 as -occurrence -) -SELECT value, subvalue, position, occurrence, INSTR(value, -subvalue, position, occurrence) AS instr +WITH example AS( + SELECT ST_GEOGFROMTEXT('POINT(5 0)') AS geography + UNION ALL + SELECT ST_GEOGFROMTEXT('MULTIPOINT(0 1, 4 3, 2 6)') AS geography + UNION ALL + SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1))') AS geography + UNION ALL + SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION EMPTY')) +SELECT + geography, + ST_NUMGEOMETRIES(geography) AS num_geometries, FROM example; -/*--------------+--------------+----------+------------+-------* - | value | subvalue | position | occurrence | instr | - +--------------+--------------+----------+------------+-------+ - | banana | an | 1 | 1 | 2 | - | banana | an | 1 | 2 | 4 | - | banana | an | 1 | 3 | 0 | - | banana | an | 3 | 1 | 4 | - | banana | an | -1 | 1 | 4 | - | banana | an | -3 | 1 | 4 | - | banana | ann | 1 | 1 | 0 | - | helloooo | oo | 1 | 1 | 5 | - | helloooo | oo | 1 | 2 | 6 | - *--------------+--------------+----------+------------+-------*/ +/*------------------------------------------------------+----------------* + | geography | num_geometries | + 
+------------------------------------------------------+----------------+ + | POINT(5 0) | 1 | + | MULTIPOINT(0 1, 4 3, 2 6) | 3 | + | GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1)) | 2 | + | GEOMETRYCOLLECTION EMPTY | 0 | + *------------------------------------------------------+----------------*/ ``` -### `LEFT` +### `ST_NUMPOINTS` ```sql -LEFT(value, length) +ST_NUMPOINTS(geography_expression) ``` **Description** -Returns a `STRING` or `BYTES` value that consists of the specified -number of leftmost characters or bytes from `value`. The `length` is an -`INT64` that specifies the length of the returned -value. If `value` is of type `BYTES`, `length` is the number of leftmost bytes -to return. If `value` is `STRING`, `length` is the number of leftmost characters -to return. +Returns the number of vertices in the input +`GEOGRAPHY`. This includes the number of points, the +number of linestring vertices, and the number of polygon vertices. -If `length` is 0, an empty `STRING` or `BYTES` value will be -returned. If `length` is negative, an error will be returned. If `length` -exceeds the number of characters or bytes from `value`, the original `value` -will be returned. +NOTE: The first and last vertex of a polygon ring are counted as distinct +vertices. 
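+For example, this illustrative query (not one of the reference examples)
+counts the three vertices of a linestring:
+
+```sql
+SELECT ST_NUMPOINTS(ST_GEOGFROMTEXT('LINESTRING(0 0, 1 1, 2 2)')) AS num_points;
+
+/*------------*
+ | num_points |
+ +------------+
+ | 3          |
+ *------------*/
+```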
**Return type** -`STRING` or `BYTES` - -**Examples** - -```sql -WITH examples AS -(SELECT 'apple' as example -UNION ALL -SELECT 'banana' as example -UNION ALL -SELECT 'абвгд' as example -) -SELECT example, LEFT(example, 3) AS left_example -FROM examples; - -/*---------+--------------* - | example | left_example | - +---------+--------------+ - | apple | app | - | banana | ban | - | абвгд | абв | - *---------+--------------*/ -``` - -```sql -WITH examples AS -(SELECT b'apple' as example -UNION ALL -SELECT b'banana' as example -UNION ALL -SELECT b'\xab\xcd\xef\xaa\xbb' as example -) -SELECT example, LEFT(example, 3) AS left_example -FROM examples; - -/*----------------------+--------------* - | example | left_example | - +----------------------+--------------+ - | apple | app | - | banana | ban | - | \xab\xcd\xef\xaa\xbb | \xab\xcd\xef | - *----------------------+--------------*/ -``` +`INT64` -### `LENGTH` +### `ST_PERIMETER` ```sql -LENGTH(value) +ST_PERIMETER(geography_expression[, use_spheroid]) ``` **Description** -Returns the length of the `STRING` or `BYTES` value. The returned -value is in characters for `STRING` arguments and in bytes for the `BYTES` -argument. - -**Return type** +Returns the length in meters of the boundary of the polygons in the input +`GEOGRAPHY`. -`INT64` +If `geography_expression` is a point or a line, returns zero. If +`geography_expression` is a collection, returns the perimeter of the polygons +in the collection; if the collection does not contain polygons, returns zero. -**Examples** +The optional `use_spheroid` parameter determines how this function measures +distance. If `use_spheroid` is `FALSE`, the function measures distance on the +surface of a perfect sphere. -```sql +The `use_spheroid` parameter currently only supports +the value `FALSE`. The default value of `use_spheroid` is `FALSE`. 
-WITH example AS - (SELECT 'абвгд' AS characters) +**Return type** -SELECT - characters, - LENGTH(characters) AS string_example, - LENGTH(CAST(characters AS BYTES)) AS bytes_example -FROM example; +`DOUBLE` -/*------------+----------------+---------------* - | characters | string_example | bytes_example | - +------------+----------------+---------------+ - | абвгд | 5 | 10 | - *------------+----------------+---------------*/ -``` +[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System -### `LOWER` +### `ST_POINTN` ```sql -LOWER(value) +ST_POINTN(linestring_geography, index) ``` **Description** -For `STRING` arguments, returns the original string with all alphabetic -characters in lowercase. Mapping between lowercase and uppercase is done -according to the -[Unicode Character Database][string-link-to-unicode-character-definitions] -without taking into account language-specific mappings. +Returns the Nth point of a linestring geography as a point geography, where N is +the index. The index is 1-based. Negative values are counted backwards from the +end of the linestring, so that -1 is the last point. Returns an error if the +input is not a linestring, if the input is empty, or if there is no vertex at +the given index. Use the `SAFE` prefix to obtain `NULL` for invalid input +instead of an error. -For `BYTES` arguments, the argument is treated as ASCII text, with all bytes -greater than 127 left intact. +**Return Type** -**Return type** +Point `GEOGRAPHY` -`STRING` or `BYTES` +**Example** -**Examples** +The following example uses `ST_POINTN`, [`ST_STARTPOINT`][st-startpoint] and +[`ST_ENDPOINT`][st-endpoint] to extract points from a linestring. 
```sql +WITH linestring AS ( + SELECT ST_GEOGFROMTEXT('LINESTRING(1 1, 2 1, 3 2, 3 3)') g +) +SELECT ST_POINTN(g, 1) AS first, ST_POINTN(g, -1) AS last, + ST_POINTN(g, 2) AS second, ST_POINTN(g, -2) AS second_to_last +FROM linestring; -WITH items AS - (SELECT - 'FOO' as item - UNION ALL - SELECT - 'BAR' as item - UNION ALL - SELECT - 'BAZ' as item) - -SELECT - LOWER(item) AS example -FROM items; - -/*---------* - | example | - +---------+ - | foo | - | bar | - | baz | - *---------*/ +/*--------------+--------------+--------------+----------------* + | first | last | second | second_to_last | + +--------------+--------------+--------------+----------------+ + | POINT(1 1) | POINT(3 3) | POINT(2 1) | POINT(3 2) | + *--------------+--------------+--------------+----------------*/ ``` -[string-link-to-unicode-character-definitions]: http://unicode.org/ucd/ +[st-startpoint]: #st_startpoint -### `LPAD` +[st-endpoint]: #st_endpoint + +### `ST_SIMPLIFY` ```sql -LPAD(original_value, return_length[, pattern]) +ST_SIMPLIFY(geography, tolerance_meters) ``` **Description** -Returns a `STRING` or `BYTES` value that consists of `original_value` prepended -with `pattern`. The `return_length` is an `INT64` that -specifies the length of the returned value. If `original_value` is of type -`BYTES`, `return_length` is the number of bytes. If `original_value` is -of type `STRING`, `return_length` is the number of characters. - -The default value of `pattern` is a blank space. +Returns a simplified version of `geography`, the given input +`GEOGRAPHY`. The input `GEOGRAPHY` is simplified by replacing nearly straight +chains of short edges with a single long edge. The input `geography` will not +change by more than the tolerance specified by `tolerance_meters`. Thus, +simplified edges are guaranteed to pass within `tolerance_meters` of the +*original* positions of all vertices that were removed from that edge. The given +`tolerance_meters` is in meters on the surface of the Earth. 
-Both `original_value` and `pattern` must be the same data type. +Note that `ST_SIMPLIFY` preserves topological relationships, which means that +no new crossing edges will be created and the output will be valid. For a large +enough tolerance, adjacent shapes may collapse into a single object, or a shape +could be simplified to a shape with a smaller dimension. -If `return_length` is less than or equal to the `original_value` length, this -function returns the `original_value` value, truncated to the value of -`return_length`. For example, `LPAD('hello world', 7);` returns `'hello w'`. +**Constraints** -If `original_value`, `return_length`, or `pattern` is `NULL`, this function -returns `NULL`. +For `ST_SIMPLIFY` to have any effect, `tolerance_meters` must be non-zero. -This function returns an error if: +`ST_SIMPLIFY` returns an error if the tolerance specified by `tolerance_meters` +is one of the following: -+ `return_length` is negative -+ `pattern` is empty ++ A negative tolerance. ++ Greater than ~7800 kilometers. **Return type** -`STRING` or `BYTES` +`GEOGRAPHY` **Examples** -```sql -SELECT t, len, FORMAT('%T', LPAD(t, len)) AS LPAD FROM UNNEST([ - STRUCT('abc' AS t, 5 AS len), - ('abc', 2), - ('例å­', 4) -]); - -/*------+-----+----------* - | t | len | LPAD | - |------|-----|----------| - | abc | 5 | " abc" | - | abc | 2 | "ab" | - | ä¾‹å­ | 4 | " 例å­" | - *------+-----+----------*/ -``` +The following example shows how `ST_SIMPLIFY` simplifies the input line +`GEOGRAPHY` by removing intermediate vertices. 
```sql -SELECT t, len, pattern, FORMAT('%T', LPAD(t, len, pattern)) AS LPAD FROM UNNEST([ - STRUCT('abc' AS t, 8 AS len, 'def' AS pattern), - ('abc', 5, '-'), - ('例å­', 5, '中文') -]); +WITH example AS + (SELECT ST_GEOGFROMTEXT('LINESTRING(0 0, 0.05 0, 0.1 0, 0.15 0, 2 0)') AS line) +SELECT + line AS original_line, + ST_SIMPLIFY(line, 1) AS simplified_line +FROM example; -/*------+-----+---------+--------------* - | t | len | pattern | LPAD | - |------|-----|---------|--------------| - | abc | 8 | def | "defdeabc" | - | abc | 5 | - | "--abc" | - | ä¾‹å­ | 5 | 中文 | "中文中例å­" | - *------+-----+---------+--------------*/ +/*---------------------------------------------+----------------------* + | original_line | simplified_line | + +---------------------------------------------+----------------------+ + | LINESTRING(0 0, 0.05 0, 0.1 0, 0.15 0, 2 0) | LINESTRING(0 0, 2 0) | + *---------------------------------------------+----------------------*/ ``` -```sql -SELECT FORMAT('%T', t) AS t, len, FORMAT('%T', LPAD(t, len)) AS LPAD FROM UNNEST([ - STRUCT(b'abc' AS t, 5 AS len), - (b'abc', 2), - (b'\xab\xcd\xef', 4) -]); - -/*-----------------+-----+------------------* - | t | len | LPAD | - |-----------------|-----|------------------| - | b"abc" | 5 | b" abc" | - | b"abc" | 2 | b"ab" | - | b"\xab\xcd\xef" | 4 | b" \xab\xcd\xef" | - *-----------------+-----+------------------*/ -``` +The following example illustrates how the result of `ST_SIMPLIFY` can have a +lower dimension than the original shape. 
```sql +WITH example AS + (SELECT + ST_GEOGFROMTEXT('POLYGON((0 0, 0.1 0, 0.1 0.1, 0 0))') AS polygon, + t AS tolerance + FROM UNNEST([1000, 10000, 100000]) AS t) SELECT - FORMAT('%T', t) AS t, - len, - FORMAT('%T', pattern) AS pattern, - FORMAT('%T', LPAD(t, len, pattern)) AS LPAD -FROM UNNEST([ - STRUCT(b'abc' AS t, 8 AS len, b'def' AS pattern), - (b'abc', 5, b'-'), - (b'\xab\xcd\xef', 5, b'\x00') -]); + polygon AS original_triangle, + tolerance AS tolerance_meters, + ST_SIMPLIFY(polygon, tolerance) AS simplified_result +FROM example -/*-----------------+-----+---------+-------------------------* - | t | len | pattern | LPAD | - |-----------------|-----|---------|-------------------------| - | b"abc" | 8 | b"def" | b"defdeabc" | - | b"abc" | 5 | b"-" | b"--abc" | - | b"\xab\xcd\xef" | 5 | b"\x00" | b"\x00\x00\xab\xcd\xef" | - *-----------------+-----+---------+-------------------------*/ +/*-------------------------------------+------------------+-------------------------------------* + | original_triangle | tolerance_meters | simplified_result | + +-------------------------------------+------------------+-------------------------------------+ + | POLYGON((0 0, 0.1 0, 0.1 0.1, 0 0)) | 1000 | POLYGON((0 0, 0.1 0, 0.1 0.1, 0 0)) | + | POLYGON((0 0, 0.1 0, 0.1 0.1, 0 0)) | 10000 | LINESTRING(0 0, 0.1 0.1) | + | POLYGON((0 0, 0.1 0, 0.1 0.1, 0 0)) | 100000 | POINT(0 0) | + *-------------------------------------+------------------+-------------------------------------*/ ``` -### `LTRIM` +### `ST_SNAPTOGRID` ```sql -LTRIM(value1[, value2]) +ST_SNAPTOGRID(geography_expression, grid_size) ``` **Description** -Identical to [TRIM][string-link-to-trim], but only removes leading characters. +Returns the input `GEOGRAPHY`, where each vertex has +been snapped to a longitude/latitude grid. The grid size is determined by the +`grid_size` parameter which is given in degrees. + +**Constraints** + +Arbitrary grid sizes are not supported. 
The `grid_size` parameter is rounded so +that it is of the form `10^n`, where `-10 < n < 0`. **Return type** -`STRING` or `BYTES` +`GEOGRAPHY` -**Examples** +### `ST_STARTPOINT` ```sql -WITH items AS - (SELECT ' apple ' as item - UNION ALL - SELECT ' banana ' as item - UNION ALL - SELECT ' orange ' as item) +ST_STARTPOINT(linestring_geography) +``` -SELECT - CONCAT('#', LTRIM(item), '#') as example -FROM items; +**Description** -/*-------------* - | example | - +-------------+ - | #apple # | - | #banana # | - | #orange # | - *-------------*/ -``` +Returns the first point of a linestring geography as a point geography. Returns +an error if the input is not a linestring or if the input is empty. Use the +`SAFE` prefix to obtain `NULL` for invalid input instead of an error. -```sql -WITH items AS - (SELECT '***apple***' as item - UNION ALL - SELECT '***banana***' as item - UNION ALL - SELECT '***orange***' as item) +**Return Type** -SELECT - LTRIM(item, '*') as example -FROM items; +Point `GEOGRAPHY` -/*-----------* - | example | - +-----------+ - | apple*** | - | banana*** | - | orange*** | - *-----------*/ -``` +**Example** ```sql -WITH items AS - (SELECT 'xxxapplexxx' as item - UNION ALL - SELECT 'yyybananayyy' as item - UNION ALL - SELECT 'zzzorangezzz' as item - UNION ALL - SELECT 'xyzpearxyz' as item) +SELECT ST_STARTPOINT(ST_GEOGFROMTEXT('LINESTRING(1 1, 2 1, 3 2, 3 3)')) first + +/*--------------* + | first | + +--------------+ + | POINT(1 1) | + *--------------*/ ``` -```sql -SELECT - LTRIM(item, 'xyz') as example -FROM items; +### `ST_TOUCHES` -/*-----------* - | example | - +-----------+ - | applexxx | - | bananayyy | - | orangezzz | - | pearxyz | - *-----------*/ +```sql +ST_TOUCHES(geography_1, geography_2) ``` -[string-link-to-trim]: #trim +**Description** -### `NORMALIZE` +Returns `TRUE` provided the following two conditions are satisfied: + +1. `geography_1` intersects `geography_2`. +1. 
The interior of `geography_1` and the interior of `geography_2` are + disjoint. + +**Return type** + +`BOOL` + +### `ST_UNION` ```sql -NORMALIZE(value[, normalization_mode]) +ST_UNION(geography_1, geography_2) +``` + +```sql +ST_UNION(array_of_geography) ``` **Description** -Takes a string value and returns it as a normalized string. If you do not -provide a normalization mode, `NFC` is used. +Returns a `GEOGRAPHY` that represents the point set +union of all input `GEOGRAPHY`s. -[Normalization][string-link-to-normalization-wikipedia] is used to ensure that -two strings are equivalent. Normalization is often used in situations in which -two strings render the same on the screen but have different Unicode code -points. +`ST_UNION` comes in two variants. For the first variant, input must be two +`GEOGRAPHY`s. For the second, the input is an +`ARRAY` of type `GEOGRAPHY`. -`NORMALIZE` supports four optional normalization modes: +For the first variant of `ST_UNION`, if an input +`GEOGRAPHY` is `NULL`, `ST_UNION` returns `NULL`. +For the second variant, if the input `ARRAY` value +is `NULL`, `ST_UNION` returns `NULL`. +For a non-`NULL` input `ARRAY`, the union is computed +and `NULL` elements are ignored so that they do not affect the output. 
-| Value | Name | Description| -|---------|------------------------------------------------|------------| -| `NFC` | Normalization Form Canonical Composition | Decomposes and recomposes characters by canonical equivalence.| -| `NFKC` | Normalization Form Compatibility Composition | Decomposes characters by compatibility, then recomposes them by canonical equivalence.| -| `NFD` | Normalization Form Canonical Decomposition | Decomposes characters by canonical equivalence, and multiple combining characters are arranged in a specific order.| -| `NFKD` | Normalization Form Compatibility Decomposition | Decomposes characters by compatibility, and multiple combining characters are arranged in a specific order.| +See [`ST_UNION_AGG`][st-union-agg] for the aggregate version of `ST_UNION`. **Return type** -`STRING` +`GEOGRAPHY` -**Examples** +**Example** ```sql -SELECT a, b, a = b as normalized -FROM (SELECT NORMALIZE('\u00ea') as a, NORMALIZE('\u0065\u0302') as b); +SELECT ST_UNION( + ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -122.19 47.69)'), + ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -100.19 47.69)') +) AS results -/*---+---+------------* - | a | b | normalized | - +---+---+------------+ - | ê | ê | true | - *---+---+------------*/ +/*---------------------------------------------------------* + | results | + +---------------------------------------------------------+ + | LINESTRING(-100.19 47.69, -122.12 47.67, -122.19 47.69) | + *---------------------------------------------------------*/ ``` -The following example normalizes different space characters. 
+ +[st-union-agg]: #st_union_agg + +### `ST_UNION_AGG` ```sql -WITH EquivalentNames AS ( - SELECT name - FROM UNNEST([ - 'Jane\u2004Doe', - 'John\u2004Smith', - 'Jane\u2005Doe', - 'Jane\u2006Doe', - 'John Smith']) AS name -) -SELECT - NORMALIZE(name, NFKC) AS normalized_name, - COUNT(*) AS name_count -FROM EquivalentNames -GROUP BY 1; - -/*-----------------+------------* - | normalized_name | name_count | - +-----------------+------------+ - | John Smith | 2 | - | Jane Doe | 3 | - *-----------------+------------*/ +ST_UNION_AGG(geography) ``` -[string-link-to-normalization-wikipedia]: https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization +**Description** -### `NORMALIZE_AND_CASEFOLD` +Returns a `GEOGRAPHY` that represents the point set +union of all input `GEOGRAPHY`s. + +`ST_UNION_AGG` ignores `NULL` input `GEOGRAPHY` values. + +See [`ST_UNION`][st-union] for the non-aggregate version of `ST_UNION_AGG`. + +**Return type** + +`GEOGRAPHY` + +**Example** ```sql -NORMALIZE_AND_CASEFOLD(value[, normalization_mode]) +SELECT ST_UNION_AGG(items) AS results +FROM UNNEST([ + ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -122.19 47.69)'), + ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -100.19 47.69)'), + ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -122.19 47.69)')]) as items; + +/*---------------------------------------------------------* + | results | + +---------------------------------------------------------+ + | LINESTRING(-100.19 47.69, -122.12 47.67, -122.19 47.69) | + *---------------------------------------------------------*/ ``` -**Description** +[st-union]: #st_union -Takes a string value and returns it as a normalized string. If you do not -provide a normalization mode, `NFC` is used. +### `ST_WITHIN` -[Normalization][string-link-to-normalization-wikipedia] is used to ensure that -two strings are equivalent. Normalization is often used in situations in which -two strings render the same on the screen but have different Unicode code -points. 
+```sql +ST_WITHIN(geography_1, geography_2) +``` -[Case folding][string-link-to-case-folding-wikipedia] is used for the caseless -comparison of strings. If you need to compare strings and case should not be -considered, use `NORMALIZE_AND_CASEFOLD`, otherwise use -[`NORMALIZE`][string-link-to-normalize]. +**Description** -`NORMALIZE_AND_CASEFOLD` supports four optional normalization modes: +Returns `TRUE` if no point of `geography_1` is outside of `geography_2` and +the interiors of `geography_1` and `geography_2` intersect. -| Value | Name | Description| -|---------|------------------------------------------------|------------| -| `NFC` | Normalization Form Canonical Composition | Decomposes and recomposes characters by canonical equivalence.| -| `NFKC` | Normalization Form Compatibility Composition | Decomposes characters by compatibility, then recomposes them by canonical equivalence.| -| `NFD` | Normalization Form Canonical Decomposition | Decomposes characters by canonical equivalence, and multiple combining characters are arranged in a specific order.| -| `NFKD` | Normalization Form Compatibility Decomposition | Decomposes characters by compatibility, and multiple combining characters are arranged in a specific order.| +Given two geographies `a` and `b`, `ST_WITHIN(a, b)` returns the same result +as [`ST_CONTAINS`][st-contains]`(b, a)`. Note the opposite order of arguments. 
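+For example, a point strictly inside a polygon is within it, while a point
+outside is not (a sketch; the unit-square polygon and points are
+illustrative):
+
+```sql
+SELECT
+  ST_WITHIN(ST_GEOGPOINT(0.5, 0.5), square) AS inside,
+  ST_WITHIN(ST_GEOGPOINT(5, 5), square) AS outside
+FROM (SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))') AS square);
+
+/*--------+---------*
+ | inside | outside |
+ +--------+---------+
+ | true   | false   |
+ *--------+---------*/
+```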
**Return type** -`STRING` +`BOOL` -**Examples** +[st-contains]: #st_contains -```sql -SELECT - a, b, - NORMALIZE(a) = NORMALIZE(b) as normalized, - NORMALIZE_AND_CASEFOLD(a) = NORMALIZE_AND_CASEFOLD(b) as normalized_with_case_folding -FROM (SELECT 'The red barn' AS a, 'The Red Barn' AS b); +### `ST_X` -/*--------------+--------------+------------+------------------------------* - | a | b | normalized | normalized_with_case_folding | - +--------------+--------------+------------+------------------------------+ - | The red barn | The Red Barn | false | true | - *--------------+--------------+------------+------------------------------*/ +```sql +ST_X(point_geography_expression) ``` -```sql -WITH Strings AS ( - SELECT '\u2168' AS a, 'IX' AS b UNION ALL - SELECT '\u0041\u030A', '\u00C5' -) -SELECT a, b, - NORMALIZE_AND_CASEFOLD(a, NFD)=NORMALIZE_AND_CASEFOLD(b, NFD) AS nfd, - NORMALIZE_AND_CASEFOLD(a, NFC)=NORMALIZE_AND_CASEFOLD(b, NFC) AS nfc, - NORMALIZE_AND_CASEFOLD(a, NFKD)=NORMALIZE_AND_CASEFOLD(b, NFKD) AS nkfd, - NORMALIZE_AND_CASEFOLD(a, NFKC)=NORMALIZE_AND_CASEFOLD(b, NFKC) AS nkfc -FROM Strings; +**Description** -/*---+----+-------+-------+------+------* - | a | b | nfd | nfc | nkfd | nkfc | - +---+----+-------+-------+------+------+ - | â…¨ | IX | false | false | true | true | - | AÌŠ | Ã… | true | true | true | true | - *---+----+-------+-------+------+------*/ -``` +Returns the longitude in degrees of the single-point input +`GEOGRAPHY`. -[string-link-to-normalization-wikipedia]: https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization +For any input `GEOGRAPHY` that is not a single point, +including an empty `GEOGRAPHY`, `ST_X` returns an +error. Use the `SAFE.` prefix to obtain `NULL`. 
-[string-link-to-case-folding-wikipedia]: https://en.wikipedia.org/wiki/Letter_case#Case_folding +**Return type** -[string-link-to-normalize]: #normalize +`DOUBLE` -### `OCTET_LENGTH` +**Example** -```sql -OCTET_LENGTH(value) -``` +The following example uses `ST_X` and `ST_Y` to extract coordinates from +single-point geographies. -Alias for [`BYTE_LENGTH`][byte-length]. +```sql +WITH points AS + (SELECT ST_GEOGPOINT(i, i + 1) AS p FROM UNNEST([0, 5, 12]) AS i) + SELECT + p, + ST_X(p) as longitude, + ST_Y(p) as latitude +FROM points; -[byte-length]: #byte_length +/*--------------+-----------+----------* + | p | longitude | latitude | + +--------------+-----------+----------+ + | POINT(0 1) | 0.0 | 1.0 | + | POINT(5 6) | 5.0 | 6.0 | + | POINT(12 13) | 12.0 | 13.0 | + *--------------+-----------+----------*/ +``` -### `REGEXP_CONTAINS` +### `ST_Y` ```sql -REGEXP_CONTAINS(value, regexp) +ST_Y(point_geography_expression) ``` **Description** -Returns `TRUE` if `value` is a partial match for the regular expression, -`regexp`. +Returns the latitude in degrees of the single-point input +`GEOGRAPHY`. -If the `regexp` argument is invalid, the function returns an error. +For any input `GEOGRAPHY` that is not a single point, +including an empty `GEOGRAPHY`, `ST_Y` returns an +error. Use the `SAFE.` prefix to return `NULL` instead. -You can search for a full match by using `^` (beginning of text) and `$` (end of -text). Due to regular expression operator precedence, it is good practice to use -parentheses around everything between `^` and `$`. +**Return type** -Note: ZetaSQL provides regular expression support using the -[re2][string-link-to-re2] library; see that documentation for its -regular expression syntax. +`DOUBLE` -**Return type** +**Example** -`BOOL` +See [`ST_X`][st-x] for example usage. 
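+As with `ST_X`, the `SAFE.` prefix turns the error for a non-point input into
+`NULL` (a sketch; the linestring input is illustrative):
+
+```sql
+SELECT SAFE.ST_Y(ST_GEOGFROMTEXT('LINESTRING(0 0, 1 1)')) AS latitude;
+
+/*----------*
+ | latitude |
+ +----------+
+ | NULL     |
+ *----------*/
+```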
-**Examples** +[st-x]: #st_x -```sql -SELECT - email, - REGEXP_CONTAINS(email, r'@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+') AS is_valid -FROM - (SELECT - ['foo@example.com', 'bar@example.org', 'www.example.net'] - AS addresses), - UNNEST(addresses) AS email; +## Hash functions -/*-----------------+----------* - | email | is_valid | - +-----------------+----------+ - | foo@example.com | true | - | bar@example.org | true | - | www.example.net | false | - *-----------------+----------*/ +ZetaSQL supports the following hash functions. --- Performs a full match, using ^ and $. Due to regular expression operator --- precedence, it is good practice to use parentheses around everything between ^ --- and $. -SELECT - email, - REGEXP_CONTAINS(email, r'^([\w.+-]+@foo\.com|[\w.+-]+@bar\.org)$') - AS valid_email_address, - REGEXP_CONTAINS(email, r'^[\w.+-]+@foo\.com|[\w.+-]+@bar\.org$') - AS without_parentheses -FROM - (SELECT - ['a@foo.com', 'a@foo.computer', 'b@bar.org', '!b@bar.org', 'c@buz.net'] - AS addresses), - UNNEST(addresses) AS email; +### Function list -/*----------------+---------------------+---------------------* - | email | valid_email_address | without_parentheses | - +----------------+---------------------+---------------------+ - | a@foo.com | true | true | - | a@foo.computer | false | true | - | b@bar.org | true | true | - | !b@bar.org | false | true | - | c@buz.net | false | false | - *----------------+---------------------+---------------------*/ -``` + + + + + + + + -[string-link-to-re2]: https://github.com/google/re2/wiki/Syntax + + + + -```sql -REGEXP_EXTRACT(value, regexp) -``` + + + + -Returns the first substring in `value` that matches the -[re2 regular expression][string-link-to-re2], -`regexp`. Returns `NULL` if there is no match. + + + + -Returns an error if: + + + + + + + + + + + +
NameSummary
FARM_FINGERPRINT -### `REGEXP_EXTRACT` + + Computes the fingerprint of a STRING or + BYTES value, using the FarmHash Fingerprint64 algorithm. +
MD5 -**Description** + + Computes the hash of a STRING or + BYTES value, using the MD5 algorithm. +
SHA1 -If the regular expression contains a capturing group (`(...)`), and there is a -match for that capturing group, that match is returned. If there -are multiple matches for a capturing group, the first match is returned. + + Computes the hash of a STRING or + BYTES value, using the SHA-1 algorithm. +
SHA256 -+ The regular expression is invalid -+ The regular expression has more than one capturing group + + Computes the hash of a STRING or + BYTES value, using the SHA-256 algorithm. +
SHA512 + + + Computes the hash of a STRING or + BYTES value, using the SHA-512 algorithm. +
+ +### `FARM_FINGERPRINT` + +``` +FARM_FINGERPRINT(value) +``` + +**Description** + +Computes the fingerprint of the `STRING` or `BYTES` input using the +`Fingerprint64` function from the +[open-source FarmHash library][hash-link-to-farmhash-github]. The output +of this function for a particular input will never change. **Return type** -`STRING` or `BYTES` +INT64 **Examples** ```sql -WITH email_addresses AS - (SELECT 'foo@example.com' as email - UNION ALL - SELECT 'bar@example.org' as email - UNION ALL - SELECT 'baz@example.net' as email) - +WITH example AS ( + SELECT 1 AS x, "foo" AS y, true AS z UNION ALL + SELECT 2 AS x, "apple" AS y, false AS z UNION ALL + SELECT 3 AS x, "" AS y, true AS z +) SELECT - REGEXP_EXTRACT(email, r'^[a-zA-Z0-9_.+-]+') - AS user_name -FROM email_addresses; - -/*-----------* - | user_name | - +-----------+ - | foo | - | bar | - | baz | - *-----------*/ + *, + FARM_FINGERPRINT(CONCAT(CAST(x AS STRING), y, CAST(z AS STRING))) + AS row_fingerprint +FROM example; +/*---+-------+-------+----------------------* + | x | y | z | row_fingerprint | + +---+-------+-------+----------------------+ + | 1 | foo | true | -1541654101129638711 | + | 2 | apple | false | 2794438866806483259 | + | 3 | | true | -4880158226897771312 | + *---+-------+-------+----------------------*/ ``` -```sql -WITH email_addresses AS - (SELECT 'foo@example.com' as email - UNION ALL - SELECT 'bar@example.org' as email - UNION ALL - SELECT 'baz@example.net' as email) +[hash-link-to-farmhash-github]: https://github.com/google/farmhash -SELECT - REGEXP_EXTRACT(email, r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.([a-zA-Z0-9-.]+$)') - AS top_level_domain -FROM email_addresses; +### `MD5` -/*------------------* - | top_level_domain | - +------------------+ - | com | - | org | - | net | - *------------------*/ +``` +MD5(input) ``` -```sql -WITH - characters AS ( - SELECT 'ab' AS value, '.b' AS regex UNION ALL - SELECT 'ab' AS value, '(.)b' AS regex UNION ALL - SELECT 'xyztb' AS value, '(.)+b' 
AS regex UNION ALL - SELECT 'ab' AS value, '(z)?b' AS regex - ) -SELECT value, regex, REGEXP_EXTRACT(value, regex) AS result FROM characters; +**Description** -/*-------+---------+----------* - | value | regex | result | - +-------+---------+----------+ - | ab | .b | ab | - | ab | (.)b | a | - | xyztb | (.)+b | t | - | ab | (z)?b | NULL | - *-------+---------+----------*/ -``` +Computes the hash of the input using the +[MD5 algorithm][hash-link-to-md5-wikipedia]. The input can either be +`STRING` or `BYTES`. The string version treats the input as an array of bytes. -[string-link-to-re2]: https://github.com/google/re2/wiki/Syntax +This function returns 16 bytes. -### `REGEXP_EXTRACT_ALL` +Warning: MD5 is no longer considered secure. +For increased security use another hashing function. + +**Return type** + +`BYTES` + +**Example** ```sql -REGEXP_EXTRACT_ALL(value, regexp) +SELECT MD5("Hello World") as md5; + +/*-------------------------------------------------* + | md5 | + +-------------------------------------------------+ + | \xb1\n\x8d\xb1d\xe0uA\x05\xb7\xa9\x9b\xe7.?\xe5 | + *-------------------------------------------------*/ ``` -**Description** +[hash-link-to-md5-wikipedia]: https://en.wikipedia.org/wiki/MD5 -Returns an array of all substrings of `value` that match the -[re2 regular expression][string-link-to-re2], `regexp`. Returns an empty array -if there is no match. +### `SHA1` -If the regular expression contains a capturing group (`(...)`), and there is a -match for that capturing group, that match is added to the results. +``` +SHA1(input) +``` -The `REGEXP_EXTRACT_ALL` function only returns non-overlapping matches. For -example, using this function to extract `ana` from `banana` returns only one -substring, not two. +**Description** -Returns an error if: +Computes the hash of the input using the +[SHA-1 algorithm][hash-link-to-sha-1-wikipedia]. The input can either be +`STRING` or `BYTES`. The string version treats the input as an array of bytes. 
-+ The regular expression is invalid -+ The regular expression has more than one capturing group +This function returns 20 bytes. + +Warning: SHA1 is no longer considered secure. +For increased security, use another hashing function. **Return type** -`ARRAY` or `ARRAY` +`BYTES` -**Examples** +**Example** ```sql -WITH code_markdown AS - (SELECT 'Try `function(x)` or `function(y)`' as code) - -SELECT - REGEXP_EXTRACT_ALL(code, '`(.+?)`') AS example -FROM code_markdown; +SELECT SHA1("Hello World") as sha1; -/*----------------------------* - | example | - +----------------------------+ - | [function(x), function(y)] | - *----------------------------*/ +/*-----------------------------------------------------------* + | sha1 | + +-----------------------------------------------------------+ + | \nMU\xa8\xd7x\xe5\x02/\xabp\x19w\xc5\xd8@\xbb\xc4\x86\xd0 | + *-----------------------------------------------------------*/ ``` -[string-link-to-re2]: https://github.com/google/re2/wiki/Syntax +[hash-link-to-sha-1-wikipedia]: https://en.wikipedia.org/wiki/SHA-1 -### `REGEXP_INSTR` +### `SHA256` -```sql -REGEXP_INSTR(source_value, regexp [, position[, occurrence, [occurrence_position]]]) +``` +SHA256(input) ``` **Description** -Returns the lowest 1-based position of a regular expression, `regexp`, in -`source_value`. `source_value` and `regexp` must be the same type, either -`STRING` or `BYTES`. +Computes the hash of the input using the +[SHA-256 algorithm][hash-link-to-sha-2-wikipedia]. The input can either be +`STRING` or `BYTES`. The string version treats the input as an array of bytes. -If `position` is specified, the search starts at this position in -`source_value`, otherwise it starts at `1`, which is the beginning of -`source_value`. `position` is of type `INT64` and must be positive. +This function returns 32 bytes. -If `occurrence` is specified, the search returns the position of a specific -instance of `regexp` in `source_value`. 
If not specified, `occurrence` defaults -to `1` and returns the position of the first occurrence. For `occurrence` > 1, -the function searches for the next, non-overlapping occurrence. -`occurrence` is of type `INT64` and must be positive. +**Return type** -You can optionally use `occurrence_position` to specify where a position -in relation to an `occurrence` starts. Your choices are: +`BYTES` -+ `0`: Returns the start position of `occurrence`. -+ `1`: Returns the end position of `occurrence` + `1`. If the - end of the occurrence is at the end of `source_value `, - `LENGTH(source_value) + 1` is returned. +**Example** -Returns `0` if: +```sql +SELECT SHA256("Hello World") as sha256; +``` -+ No match is found. -+ If `occurrence` is greater than the number of matches found. -+ If `position` is greater than the length of `source_value`. -+ The regular expression is empty. +[hash-link-to-sha-2-wikipedia]: https://en.wikipedia.org/wiki/SHA-2 -Returns `NULL` if: +### `SHA512` -+ `position` is `NULL`. -+ `occurrence` is `NULL`. +``` +SHA512(input) +``` -Returns an error if: +**Description** -+ `position` is `0` or negative. -+ `occurrence` is `0` or negative. -+ `occurrence_position` is neither `0` nor `1`. -+ The regular expression is invalid. -+ The regular expression has more than one capturing group. +Computes the hash of the input using the +[SHA-512 algorithm][hash-link-to-sha-2-wikipedia]. The input can either be +`STRING` or `BYTES`. The string version treats the input as an array of bytes. + +This function returns 64 bytes. 
**Return type** -`INT64` +`BYTES` -**Examples** +**Example** ```sql -WITH example AS ( - SELECT 'ab@cd-ef' AS source_value, '@[^-]*' AS regexp UNION ALL - SELECT 'ab@d-ef', '@[^-]*' UNION ALL - SELECT 'abc@cd-ef', '@[^-]*' UNION ALL - SELECT 'abc-ef', '@[^-]*') -SELECT source_value, regexp, REGEXP_INSTR(source_value, regexp) AS instr -FROM example; - -/*--------------+--------+-------* - | source_value | regexp | instr | - +--------------+--------+-------+ - | ab@cd-ef | @[^-]* | 3 | - | ab@d-ef | @[^-]* | 3 | - | abc@cd-ef | @[^-]* | 4 | - | abc-ef | @[^-]* | 0 | - *--------------+--------+-------*/ +SELECT SHA512("Hello World") as sha512; ``` -```sql -WITH example AS ( - SELECT 'a@cd-ef b@cd-ef' AS source_value, '@[^-]*' AS regexp, 1 AS position UNION ALL - SELECT 'a@cd-ef b@cd-ef', '@[^-]*', 2 UNION ALL - SELECT 'a@cd-ef b@cd-ef', '@[^-]*', 3 UNION ALL - SELECT 'a@cd-ef b@cd-ef', '@[^-]*', 4) -SELECT - source_value, regexp, position, - REGEXP_INSTR(source_value, regexp, position) AS instr -FROM example; +[hash-link-to-sha-2-wikipedia]: https://en.wikipedia.org/wiki/SHA-2 -/*-----------------+--------+----------+-------* - | source_value | regexp | position | instr | - +-----------------+--------+----------+-------+ - | a@cd-ef b@cd-ef | @[^-]* | 1 | 2 | - | a@cd-ef b@cd-ef | @[^-]* | 2 | 2 | - | a@cd-ef b@cd-ef | @[^-]* | 3 | 10 | - | a@cd-ef b@cd-ef | @[^-]* | 4 | 10 | - *-----------------+--------+----------+-------*/ -``` +## HyperLogLog++ functions + -```sql -WITH example AS ( - SELECT 'a@cd-ef b@cd-ef c@cd-ef' AS source_value, - '@[^-]*' AS regexp, 1 AS position, 1 AS occurrence UNION ALL - SELECT 'a@cd-ef b@cd-ef c@cd-ef', '@[^-]*', 1, 2 UNION ALL - SELECT 'a@cd-ef b@cd-ef c@cd-ef', '@[^-]*', 1, 3) -SELECT - source_value, regexp, position, occurrence, - REGEXP_INSTR(source_value, regexp, position, occurrence) AS instr -FROM example; +The [HyperLogLog++ algorithm (HLL++)][hll-sketches] estimates +[cardinality][cardinality] from [sketches][hll-sketches]. 
-/*-------------------------+--------+----------+------------+-------*
- | source_value            | regexp | position | occurrence | instr |
- +-------------------------+--------+----------+------------+-------+
- | a@cd-ef b@cd-ef c@cd-ef | @[^-]* | 1        | 1          | 2     |
- | a@cd-ef b@cd-ef c@cd-ef | @[^-]* | 1        | 2          | 10    |
- | a@cd-ef b@cd-ef c@cd-ef | @[^-]* | 1        | 3          | 18    |
- *-------------------------+--------+----------+------------+-------*/
-```

+HLL++ functions are approximate aggregate functions.
+Approximate aggregation typically requires less
+memory than exact aggregation functions,
+like [`COUNT(DISTINCT)`][count-distinct], but also introduces statistical error.
+This makes HLL++ functions appropriate for large data streams for
+which linear memory usage is impractical, as well as for data that is
+already approximate.

-```sql
-WITH example AS (
-  SELECT 'a@cd-ef' AS source_value, '@[^-]*' AS regexp,
-    1 AS position, 1 AS occurrence, 0 AS o_position UNION ALL
-  SELECT 'a@cd-ef', '@[^-]*', 1, 1, 1)
-SELECT
-  source_value, regexp, position, occurrence, o_position,
-  REGEXP_INSTR(source_value, regexp, position, occurrence, o_position) AS instr
-FROM example;
+If you do not need materialized sketches, you can alternatively use an
+[approximate aggregate function with system-defined precision][approx-functions-reference],
+such as [`APPROX_COUNT_DISTINCT`][approx-count-distinct]. However,
+`APPROX_COUNT_DISTINCT` does not allow partial aggregations, re-aggregations,
+or custom precision.
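+Because each sketch is an ordinary `BYTES` value, partial aggregation and
+re-aggregation compose naturally. A sketch of the typical pipeline (the
+`Invoices` table and its columns are illustrative):
+
+```sql
+-- Build one sketch per country, merge the partial sketches into one,
+-- then read out the final cardinality estimate.
+SELECT HLL_COUNT.EXTRACT(merged_sketch) AS approx_distinct_customers
+FROM (
+  SELECT HLL_COUNT.MERGE_PARTIAL(hll_sketch) AS merged_sketch
+  FROM (
+    SELECT country, HLL_COUNT.INIT(customer_id) AS hll_sketch
+    FROM Invoices
+    GROUP BY country));
+```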
-/*--------------+--------+----------+------------+------------+-------* - | source_value | regexp | position | occurrence | o_position | instr | - +--------------+--------+----------+------------+------------+-------+ - | a@cd-ef | @[^-]* | 1 | 1 | 0 | 2 | - | a@cd-ef | @[^-]* | 1 | 1 | 1 | 5 | - *--------------+--------+----------+------------+------------+-------*/ -``` +ZetaSQL supports the following HLL++ functions: -### `REGEXP_MATCH` (Deprecated) - +### Function list -```sql -REGEXP_MATCH(value, regexp) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameSummary
HLL_COUNT.EXTRACT + + + Extracts a cardinality estimate of an HLL++ sketch. +
HLL_COUNT.INIT + + + Aggregates values of the same underlying type into a new HLL++ sketch. +
HLL_COUNT.MERGE + + + Merges HLL++ sketches of the same underlying type into a new sketch, and + then gets the cardinality of the new sketch. +
HLL_COUNT.MERGE_PARTIAL + + + Merges HLL++ sketches of the same underlying type into a new sketch. +
+ +### `HLL_COUNT.EXTRACT` + +``` +HLL_COUNT.EXTRACT(sketch) ``` **Description** -Returns `TRUE` if `value` is a full match for the regular expression, `regexp`. +A scalar function that extracts a cardinality estimate of a single +[HLL++][hll-link-to-research-whitepaper] sketch. -If the `regexp` argument is invalid, the function returns an error. +If `sketch` is `NULL`, this function returns a cardinality estimate of `0`. -This function is deprecated. When possible, use -[`REGEXP_CONTAINS`][regexp-contains] to find a partial match for a -regular expression. +**Supported input types** -Note: ZetaSQL provides regular expression support using the -[re2][string-link-to-re2] library; see that documentation for its -regular expression syntax. +`BYTES` **Return type** -`BOOL` +`INT64` -**Examples** +**Example** -```sql -WITH email_addresses AS - (SELECT 'foo@example.com' as email - UNION ALL - SELECT 'bar@example.org' as email - UNION ALL - SELECT 'notavalidemailaddress' as email) +The following query returns the number of distinct users for each country who +have at least one invoice. 
+```sql
 SELECT
-  email,
-  REGEXP_MATCH(email,
-               r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')
-    AS valid_email_address
-FROM email_addresses;
+  country,
+  HLL_COUNT.EXTRACT(hll_sketch) AS distinct_customers_with_open_invoice
+FROM
+  (
+    SELECT
+      country,
+      HLL_COUNT.INIT(customer_id) AS hll_sketch
+    FROM
+      UNNEST(
+        ARRAY<STRUCT<country STRING, customer_id STRING, invoice_id STRING>>[
+          ('UA', 'customer_id_1', 'invoice_id_11'),
+          ('BR', 'customer_id_3', 'invoice_id_31'),
+          ('CZ', 'customer_id_2', 'invoice_id_22'),
+          ('CZ', 'customer_id_2', 'invoice_id_23'),
+          ('BR', 'customer_id_3', 'invoice_id_31'),
+          ('UA', 'customer_id_2', 'invoice_id_24')])
+    GROUP BY country
+  );

-/*-----------------------+---------------------*
- | email                 | valid_email_address |
- +-----------------------+---------------------+
- | foo@example.com       | true                |
- | bar@example.org       | true                |
- | notavalidemailaddress | false               |
- *-----------------------+---------------------*/
+/*---------+--------------------------------------*
+ | country | distinct_customers_with_open_invoice |
+ +---------+--------------------------------------+
+ | UA      | 2                                    |
+ | BR      | 1                                    |
+ | CZ      | 1                                    |
+ *---------+--------------------------------------*/
 ```

-[string-link-to-re2]: https://github.com/google/re2/wiki/Syntax
-
-[regexp-contains]: #regexp_contains
+[hll-link-to-research-whitepaper]: https://research.google.com/pubs/pub40671.html

-### `REGEXP_REPLACE`
+### `HLL_COUNT.INIT`

-```sql
-REGEXP_REPLACE(value, regexp, replacement)
+```
+HLL_COUNT.INIT(input [, precision])
 ```

 **Description**

-Returns a `STRING` where all substrings of `value` that
-match regular expression `regexp` are replaced with `replacement`.
+An aggregate function that takes one or more `input` values and aggregates them
+into a [HLL++][hll-link-to-research-whitepaper] sketch. Each sketch
+is represented using the `BYTES` data type. You can then merge sketches using
+`HLL_COUNT.MERGE` or `HLL_COUNT.MERGE_PARTIAL`.
If no merging is needed, +you can extract the final count of distinct values from the sketch using +`HLL_COUNT.EXTRACT`. -You can use backslashed-escaped digits (\1 to \9) within the `replacement` -argument to insert text matching the corresponding parenthesized group in the -`regexp` pattern. Use \0 to refer to the entire matching text. +This function supports an optional parameter, `precision`. This parameter +defines the accuracy of the estimate at the cost of additional memory required +to process the sketches or store them on disk. The range for this value is +`10` to `24`. The default value is `15`. For more information about precision, +see [Precision for sketches][precision_hll]. -To add a backslash in your regular expression, you must first escape it. For -example, `SELECT REGEXP_REPLACE('abc', 'b(.)', 'X\\1');` returns `aXc`. You can -also use [raw strings][string-link-to-lexical-literals] to remove one layer of -escaping, for example `SELECT REGEXP_REPLACE('abc', 'b(.)', r'X\1');`. +If the input is `NULL`, this function returns `NULL`. -The `REGEXP_REPLACE` function only replaces non-overlapping matches. For -example, replacing `ana` within `banana` results in only one replacement, not -two. +For more information, see [HyperLogLog in Practice: Algorithmic Engineering of +a State of The Art Cardinality Estimation Algorithm][hll-link-to-research-whitepaper]. -If the `regexp` argument is not a valid regular expression, this function -returns an error. +**Supported input types** -Note: ZetaSQL provides regular expression support using the -[re2][string-link-to-re2] library; see that documentation for its -regular expression syntax. 
++ `INT64`
++ `UINT64`
++ `NUMERIC`
++ `BIGNUMERIC`
++ `STRING`
++ `BYTES`

**Return type**

-`STRING` or `BYTES`
+`BYTES`

-**Examples**
+**Example**

-```sql
-WITH markdown AS
-  (SELECT '# Heading' as heading
-  UNION ALL
-  SELECT '# Another heading' as heading)
+The following query creates HLL++ sketches that count the number of distinct
+users with at least one invoice per country.

+```sql
 SELECT
-  REGEXP_REPLACE(heading, r'^# ([a-zA-Z0-9\s]+$)', '<h1>\\1</h1>')
-    AS html
-FROM markdown;
+  country,
+  HLL_COUNT.INIT(customer_id, 10)
+    AS hll_sketch
+FROM
+  UNNEST(
+    ARRAY<STRUCT<country STRING, customer_id STRING, invoice_id STRING>>[
+      ('UA', 'customer_id_1', 'invoice_id_11'),
+      ('CZ', 'customer_id_2', 'invoice_id_22'),
+      ('CZ', 'customer_id_2', 'invoice_id_23'),
+      ('BR', 'customer_id_3', 'invoice_id_31'),
+      ('UA', 'customer_id_2', 'invoice_id_24')])
+GROUP BY country;

-/*--------------------------*
- | html                     |
- +--------------------------+
- | <h1>Heading</h1>         |
- | <h1>Another heading</h1> |
- *--------------------------*/
+/*---------+------------------------------------------------------------------------------------*
+ | country | hll_sketch                                                                         |
+ +---------+------------------------------------------------------------------------------------+
+ | UA      | "\010p\020\002\030\002 \013\202\007\r\020\002\030\n \0172\005\371\344\001\315\010" |
+ | CZ      | "\010p\020\002\030\002 \013\202\007\013\020\001\030\n \0172\003\371\344\001"       |
+ | BR      | "\010p\020\001\030\002 \013\202\007\013\020\001\030\n \0172\003\202\341\001"       |
+ *---------+------------------------------------------------------------------------------------*/
 ```

-[string-link-to-re2]: https://github.com/google/re2/wiki/Syntax
+[hll-link-to-research-whitepaper]: https://research.google.com/pubs/pub40671.html

-[string-link-to-lexical-literals]: https://github.com/google/zetasql/blob/master/docs/lexical.md#string_and_bytes_literals
+[precision_hll]: https://github.com/google/zetasql/blob/master/docs/sketches.md#precision_hll

-### `REPEAT`
+### `HLL_COUNT.MERGE`

-```sql
-REPEAT(original_value, repetitions)
+```
+HLL_COUNT.MERGE(sketch)
 ```

 **Description**

-Returns a `STRING` or `BYTES` value that consists of `original_value`, repeated.
-The `repetitions` parameter specifies the number of times to repeat
-`original_value`. Returns `NULL` if either `original_value` or `repetitions`
-are `NULL`.
+An aggregate function that returns the cardinality of several
+[HLL++][hll-link-to-research-whitepaper] sketches by computing their union.

-This function returns an error if the `repetitions` value is negative.
+Each `sketch` must be initialized on the same type. Attempts to merge sketches
+for different types result in an error. For example, you cannot merge a sketch
+initialized from `INT64` data with one initialized from `STRING` data.
+
+If the merged sketches were initialized with different precisions, the precision
+will be downgraded to the lowest precision involved in the merge.
+
+This function ignores `NULL` values when merging sketches. If the merge happens
+over zero rows or only over `NULL` values, the function returns `0`.
+
+**Supported input types**
+
+`BYTES`

**Return type**

-`STRING` or `BYTES`
+`INT64`

-**Examples**
+**Example**
+
+The following query counts the number of distinct users across all countries
+who have at least one invoice.

```sql
-SELECT t, n, REPEAT(t, n) AS REPEAT FROM UNNEST([
-  STRUCT('abc' AS t, 3 AS n),
-  ('例子', 2),
-  ('abc', null),
-  (null, 3)
-]);
+SELECT HLL_COUNT.MERGE(hll_sketch) AS distinct_customers_with_open_invoice
+FROM
+  (
+    SELECT
+      country,
+      HLL_COUNT.INIT(customer_id) AS hll_sketch
+    FROM
+      UNNEST(
+        ARRAY<STRUCT<country STRING, customer_id STRING, invoice_id STRING>>[
+          ('UA', 'customer_id_1', 'invoice_id_11'),
+          ('BR', 'customer_id_3', 'invoice_id_31'),
+          ('CZ', 'customer_id_2', 'invoice_id_22'),
+          ('CZ', 'customer_id_2', 'invoice_id_23'),
+          ('BR', 'customer_id_3', 'invoice_id_31'),
+          ('UA', 'customer_id_2', 'invoice_id_24')])
+    GROUP BY country
+  );

-/*------+------+-----------*
- | t    | n    | REPEAT    |
- |------|------|-----------|
- | abc  | 3    | abcabcabc |
- | 例子 | 2    | 例子例子  |
- | abc  | NULL | NULL      |
- | NULL | 3    | NULL      |
- *------+------+-----------*/
+/*--------------------------------------*
+ | distinct_customers_with_open_invoice |
+ +--------------------------------------+
+ | 3                                    |
+ *--------------------------------------*/
 ```

-### `REPLACE`
+[hll-link-to-research-whitepaper]: https://research.google.com/pubs/pub40671.html

-```sql
-REPLACE(original_value, from_pattern, to_pattern)
+### `HLL_COUNT.MERGE_PARTIAL`
+
+```
+HLL_COUNT.MERGE_PARTIAL(sketch)
 ```

 **Description**

-Replaces all occurrences of `from_pattern` with `to_pattern` in
-`original_value`. If `from_pattern` is empty, no replacement is made.
+An aggregate function that takes one or more
+[HLL++][hll-link-to-research-whitepaper] `sketch`
+inputs and merges them into a new sketch.

-This function supports specifying [collation][collation].
+Each `sketch` must be initialized on the same type. Attempts to merge sketches
+for different types result in an error. For example, you cannot merge a sketch
+initialized from `INT64` data with one initialized from `STRING` data.

-[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about
+If the merged sketches were initialized with different precisions, the precision
+will be downgraded to the lowest precision involved in the merge. For example,
+if `MERGE_PARTIAL` encounters sketches of precision 14 and 15, the returned new
+sketch will have precision 14.

-**Return type**
+This function returns `NULL` if there is no input or all inputs are `NULL`.

-`STRING` or `BYTES`
+**Supported input types**

-**Examples**
+`BYTES`

-```sql
-WITH desserts AS
-  (SELECT 'apple pie' as dessert
-  UNION ALL
-  SELECT 'blackberry pie' as dessert
-  UNION ALL
-  SELECT 'cherry pie' as dessert)
+**Return type**

-SELECT
-  REPLACE (dessert, 'pie', 'cobbler') as example
-FROM desserts;
+`BYTES`

-/*--------------------*
- | example            |
- +--------------------+
- | apple cobbler      |
- | blackberry cobbler |
- | cherry cobbler     |
- *--------------------*/
-```
+**Example**

-### `REVERSE`
+The following query returns an HLL++ sketch that counts the number of distinct
+users who have at least one invoice across all countries.
```sql -REVERSE(value) +SELECT HLL_COUNT.MERGE_PARTIAL(HLL_sketch) AS distinct_customers_with_open_invoice +FROM + ( + SELECT + country, + HLL_COUNT.INIT(customer_id) AS hll_sketch + FROM + UNNEST( + ARRAY>[ + ('UA', 'customer_id_1', 'invoice_id_11'), + ('BR', 'customer_id_3', 'invoice_id_31'), + ('CZ', 'customer_id_2', 'invoice_id_22'), + ('CZ', 'customer_id_2', 'invoice_id_23'), + ('BR', 'customer_id_3', 'invoice_id_31'), + ('UA', 'customer_id_2', 'invoice_id_24')]) + GROUP BY country + ); + +/*----------------------------------------------------------------------------------------------* + | distinct_customers_with_open_invoice | + +----------------------------------------------------------------------------------------------+ + | "\010p\020\006\030\002 \013\202\007\020\020\003\030\017 \0242\010\320\2408\352}\244\223\002" | + *----------------------------------------------------------------------------------------------*/ ``` -**Description** +[hll-link-to-research-whitepaper]: https://research.google.com/pubs/pub40671.html -Returns the reverse of the input `STRING` or `BYTES`. 
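+The precision downgrade can be sketched as follows. This is a minimal,
+hypothetical example that assumes the optional second (precision) argument to
+`HLL_COUNT.INIT`; because the two input sketches use precisions 15 and 14, the
+merged sketch takes the lower precision, 14:
+
+```sql
+SELECT
+  -- Sketches of precision 15 and 14 are merged; the result has precision 14.
+  HLL_COUNT.MERGE_PARTIAL(sketch) AS merged_sketch
+FROM
+  (
+    SELECT HLL_COUNT.INIT(customer_id, 15) AS sketch
+    FROM UNNEST(['customer_id_1', 'customer_id_2']) AS customer_id
+    UNION ALL
+    SELECT HLL_COUNT.INIT(customer_id, 14) AS sketch
+    FROM UNNEST(['customer_id_2', 'customer_id_3']) AS customer_id
+  );
+```
+
+Passing the merged sketch to `HLL_COUNT.EXTRACT` would then estimate the
+distinct count at the lower precision.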
+[hll-sketches]: https://github.com/google/zetasql/blob/master/docs/sketches.md#sketches_hll -**Return type** +[cardinality]: https://en.wikipedia.org/wiki/Cardinality -`STRING` or `BYTES` +[count-distinct]: #count -**Examples** +[approx-count-distinct]: #approx-count-distinct -```sql -WITH example AS ( - SELECT 'foo' AS sample_string, b'bar' AS sample_bytes UNION ALL - SELECT 'абвгд' AS sample_string, b'123' AS sample_bytes -) -SELECT - sample_string, - REVERSE(sample_string) AS reverse_string, - sample_bytes, - REVERSE(sample_bytes) AS reverse_bytes -FROM example; +[approx-functions-reference]: #approximate_aggregate_functions -/*---------------+----------------+--------------+---------------* - | sample_string | reverse_string | sample_bytes | reverse_bytes | - +---------------+----------------+--------------+---------------+ - | foo | oof | bar | rab | - | абвгд | дгвба | 123 | 321 | - *---------------+----------------+--------------+---------------*/ -``` +## Interval functions -### `RIGHT` +ZetaSQL supports the following interval functions. -```sql -RIGHT(value, length) -``` +### Function list -**Description** + + + + + + + + -Returns a `STRING` or `BYTES` value that consists of the specified -number of rightmost characters or bytes from `value`. The `length` is an -`INT64` that specifies the length of the returned -value. If `value` is `BYTES`, `length` is the number of rightmost bytes to -return. If `value` is `STRING`, `length` is the number of rightmost characters -to return. 
+ + + + -**Return type** + + + + -**Examples** - -```sql -WITH examples AS -(SELECT 'apple' as example -UNION ALL -SELECT 'banana' as example -UNION ALL -SELECT 'абвгд' as example -) -SELECT example, RIGHT(example, 3) AS right_example -FROM examples; - -/*---------+---------------* - | example | right_example | - +---------+---------------+ - | apple | ple | - | banana | ana | - | абвгд | вгд | - *---------+---------------*/ -``` - -```sql -WITH examples AS -(SELECT b'apple' as example -UNION ALL -SELECT b'banana' as example -UNION ALL -SELECT b'\xab\xcd\xef\xaa\xbb' as example -) -SELECT example, RIGHT(example, 3) AS right_example -FROM examples; - -/*----------------------+---------------* - | example | right_example | - +----------------------+---------------+ - | apple | ple | - | banana | ana | - | \xab\xcd\xef\xaa\xbb | \xef\xaa\xbb | - *----------------------+---------------* -``` - -### `RPAD` - -```sql -RPAD(original_value, return_length[, pattern]) -``` - -**Description** - -Returns a `STRING` or `BYTES` value that consists of `original_value` appended -with `pattern`. The `return_length` parameter is an -`INT64` that specifies the length of the -returned value. If `original_value` is `BYTES`, -`return_length` is the number of bytes. If `original_value` is `STRING`, -`return_length` is the number of characters. - -The default value of `pattern` is a blank space. - -Both `original_value` and `pattern` must be the same data type. - -If `return_length` is less than or equal to the `original_value` length, this -function returns the `original_value` value, truncated to the value of -`return_length`. For example, `RPAD('hello world', 7);` returns `'hello w'`. - -If `original_value`, `return_length`, or `pattern` is `NULL`, this function -returns `NULL`. 
- -This function returns an error if: - -+ `return_length` is negative -+ `pattern` is empty - -**Return type** - -`STRING` or `BYTES` - -**Examples** - -```sql -SELECT t, len, FORMAT('%T', RPAD(t, len)) AS RPAD FROM UNNEST([ - STRUCT('abc' AS t, 5 AS len), - ('abc', 2), - ('例å­', 4) -]); - -/*------+-----+----------* - | t | len | RPAD | - +------+-----+----------+ - | abc | 5 | "abc " | - | abc | 2 | "ab" | - | ä¾‹å­ | 4 | "ä¾‹å­ " | - *------+-----+----------*/ -``` + + + + -/*------+-----+---------+--------------* - | t | len | pattern | RPAD | - +------+-----+---------+--------------+ - | abc | 8 | def | "abcdefde" | - | abc | 5 | - | "abc--" | - | ä¾‹å­ | 5 | 中文 | "例å­ä¸­æ–‡ä¸­" | - *------+-----+---------+--------------*/ -``` + + + + -/*-----------------+-----+------------------* - | t | len | RPAD | - +-----------------+-----+------------------+ - | b"abc" | 5 | b"abc " | - | b"abc" | 2 | b"ab" | - | b"\xab\xcd\xef" | 4 | b"\xab\xcd\xef " | - *-----------------+-----+------------------*/ -``` + + + + -/*-----------------+-----+---------+-------------------------* - | t | len | pattern | RPAD | - +-----------------+-----+---------+-------------------------+ - | b"abc" | 8 | b"def" | b"abcdefde" | - | b"abc" | 5 | b"-" | b"abc--" | - | b"\xab\xcd\xef" | 5 | b"\x00" | b"\xab\xcd\xef\x00\x00" | - *-----------------+-----+---------+-------------------------*/ -``` + +
NameSummary
EXTRACT -If `length` is 0, an empty `STRING` or `BYTES` value will be -returned. If `length` is negative, an error will be returned. If `length` -exceeds the number of characters or bytes from `value`, the original `value` -will be returned. + + Extracts part of an INTERVAL value. +
JUSTIFY_DAYS -`STRING` or `BYTES` + + Normalizes the day part of an INTERVAL value. +
JUSTIFY_HOURS -```sql -SELECT t, len, pattern, FORMAT('%T', RPAD(t, len, pattern)) AS RPAD FROM UNNEST([ - STRUCT('abc' AS t, 8 AS len, 'def' AS pattern), - ('abc', 5, '-'), - ('例å­', 5, '中文') -]); + + Normalizes the time part of an INTERVAL value. +
JUSTIFY_INTERVAL -```sql -SELECT FORMAT('%T', t) AS t, len, FORMAT('%T', RPAD(t, len)) AS RPAD FROM UNNEST([ - STRUCT(b'abc' AS t, 5 AS len), - (b'abc', 2), - (b'\xab\xcd\xef', 4) -]); + + Normalizes the day and time parts of an INTERVAL value. +
MAKE_INTERVAL -```sql -SELECT - FORMAT('%T', t) AS t, - len, - FORMAT('%T', pattern) AS pattern, - FORMAT('%T', RPAD(t, len, pattern)) AS RPAD -FROM UNNEST([ - STRUCT(b'abc' AS t, 8 AS len, b'def' AS pattern), - (b'abc', 5, b'-'), - (b'\xab\xcd\xef', 5, b'\x00') -]); + + Constructs an INTERVAL value. +
-### `RTRIM`
+### `EXTRACT`

```sql
-RTRIM(value1[, value2])
+EXTRACT(part FROM interval_expression)
```

**Description**

-Identical to [TRIM][string-link-to-trim], but only removes trailing characters.
+Returns the value corresponding to the specified date part. The `part` must be
+one of `YEAR`, `MONTH`, `DAY`, `HOUR`, `MINUTE`, `SECOND`, `MILLISECOND`, or
+`MICROSECOND`.

-**Return type**
+**Return Data Type**

-`STRING` or `BYTES`
+`INT64`

**Examples**

-```sql
-WITH items AS
-  (SELECT '***apple***' as item
-  UNION ALL
-  SELECT '***banana***' as item
-  UNION ALL
-  SELECT '***orange***' as item)
+In the following example, different parts of two intervals are extracted.

+```sql
SELECT
-  RTRIM(item, '*') as example
-FROM items;
+  EXTRACT(YEAR FROM i) AS year,
+  EXTRACT(MONTH FROM i) AS month,
+  EXTRACT(DAY FROM i) AS day,
+  EXTRACT(HOUR FROM i) AS hour,
+  EXTRACT(MINUTE FROM i) AS minute,
+  EXTRACT(SECOND FROM i) AS second,
+  EXTRACT(MILLISECOND FROM i) AS milli,
+  EXTRACT(MICROSECOND FROM i) AS micro
+FROM
+  UNNEST([INTERVAL '1-2 3 4:5:6.789999' YEAR TO SECOND,
+          INTERVAL '0-13 370 48:61:61' YEAR TO SECOND]) AS i

-/*-----------*
- | example   |
- +-----------+
- | ***apple  |
- | ***banana |
- | ***orange |
- *-----------*/
+/*------+-------+-----+------+--------+--------+-------+--------*
+ | year | month | day | hour | minute | second | milli | micro  |
+ +------+-------+-----+------+--------+--------+-------+--------+
+ | 1    | 2     | 3   | 4    | 5      | 6      | 789   | 789999 |
+ | 1    | 1     | 370 | 49   | 2      | 1      | 0     | 0      |
+ *------+-------+-----+------+--------+--------+-------+--------*/
```

-```sql
-WITH items AS
-  (SELECT 'applexxx' as item
-  UNION ALL
-  SELECT 'bananayyy' as item
-  UNION ALL
-  SELECT 'orangezzz' as item
-  UNION ALL
-  SELECT 'pearxyz' as item)
+When a negative sign precedes the time part in an interval, the negative sign
+distributes over the hours, minutes, and seconds. 
For example: +```sql SELECT - RTRIM(item, 'xyz') as example -FROM items; + EXTRACT(HOUR FROM i) AS hour, + EXTRACT(MINUTE FROM i) AS minute +FROM + UNNEST([INTERVAL '10 -12:30' DAY TO MINUTE]) AS i -/*---------* - | example | - +---------+ - | apple | - | banana | - | orange | - | pear | - *---------*/ +/*------+--------* + | hour | minute | + +------+--------+ + | -12 | -30 | + *------+--------*/ ``` -[string-link-to-trim]: #trim - -### `SAFE_CONVERT_BYTES_TO_STRING` +When a negative sign precedes the year and month part in an interval, the +negative sign distributes over the years and months. For example: ```sql -SAFE_CONVERT_BYTES_TO_STRING(value) -``` - -**Description** - -Converts a sequence of `BYTES` to a `STRING`. Any invalid UTF-8 characters are -replaced with the Unicode replacement character, `U+FFFD`. - -**Return type** - -`STRING` - -**Examples** - -The following statement returns the Unicode replacement character, �. +SELECT + EXTRACT(YEAR FROM i) AS year, + EXTRACT(MONTH FROM i) AS month +FROM + UNNEST([INTERVAL '-22-6 10 -12:30' YEAR TO MINUTE]) AS i -```sql -SELECT SAFE_CONVERT_BYTES_TO_STRING(b'\xc2') as safe_convert; +/*------+--------* + | year | month | + +------+--------+ + | -22 | -6 | + *------+--------*/ ``` -### `SOUNDEX` +### `JUSTIFY_DAYS` ```sql -SOUNDEX(value) +JUSTIFY_DAYS(interval_expression) ``` **Description** -Returns a `STRING` that represents the -[Soundex][string-link-to-soundex-wikipedia] code for `value`. - -SOUNDEX produces a phonetic representation of a string. It indexes words by -sound, as pronounced in English. It is typically used to help determine whether -two strings, such as the family names _Levine_ and _Lavine_, or the words _to_ -and _too_, have similar English-language pronunciation. - -The result of the SOUNDEX consists of a letter followed by 3 digits. Non-latin -characters are ignored. If the remaining string is empty after removing -non-Latin characters, an empty `STRING` is returned. 
+Normalizes the day part of the interval to the range from -29 to 29 by +incrementing/decrementing the month or year part of the interval. -**Return type** +**Return Data Type** -`STRING` +`INTERVAL` -**Examples** +**Example** ```sql -WITH example AS ( - SELECT 'Ashcraft' AS value UNION ALL - SELECT 'Raven' AS value UNION ALL - SELECT 'Ribbon' AS value UNION ALL - SELECT 'apple' AS value UNION ALL - SELECT 'Hello world!' AS value UNION ALL - SELECT ' H3##!@llo w00orld!' AS value UNION ALL - SELECT '#1' AS value UNION ALL - SELECT NULL AS value -) -SELECT value, SOUNDEX(value) AS soundex -FROM example; +SELECT + JUSTIFY_DAYS(INTERVAL 29 DAY) AS i1, + JUSTIFY_DAYS(INTERVAL -30 DAY) AS i2, + JUSTIFY_DAYS(INTERVAL 31 DAY) AS i3, + JUSTIFY_DAYS(INTERVAL -65 DAY) AS i4, + JUSTIFY_DAYS(INTERVAL 370 DAY) AS i5 -/*----------------------+---------* - | value | soundex | - +----------------------+---------+ - | Ashcraft | A261 | - | Raven | R150 | - | Ribbon | R150 | - | apple | a140 | - | Hello world! | H464 | - | H3##!@llo w00orld! | H464 | - | #1 | | - | NULL | NULL | - *----------------------+---------*/ +/*--------------+--------------+-------------+---------------+--------------* + | i1 | i2 | i3 | i4 | i5 | + +--------------+--------------+-------------+---------------+--------------+ + | 0-0 29 0:0:0 | -0-1 0 0:0:0 | 0-1 1 0:0:0 | -0-2 -5 0:0:0 | 1-0 10 0:0:0 | + *--------------+--------------+-------------+---------------+--------------*/ ``` -[string-link-to-soundex-wikipedia]: https://en.wikipedia.org/wiki/Soundex - -### `SPLIT` +### `JUSTIFY_HOURS` ```sql -SPLIT(value[, delimiter]) +JUSTIFY_HOURS(interval_expression) ``` **Description** -Splits `value` using the `delimiter` argument. - -For `STRING`, the default delimiter is the comma `,`. - -For `BYTES`, you must specify a delimiter. - -Splitting on an empty delimiter produces an array of UTF-8 characters for -`STRING` values, and an array of `BYTES` for `BYTES` values. 
- -Splitting an empty `STRING` returns an -`ARRAY` with a single empty -`STRING`. - -This function supports specifying [collation][collation]. - -[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about +Normalizes the time part of the interval to the range from -23:59:59.999999 to +23:59:59.999999 by incrementing/decrementing the day part of the interval. -**Return type** +**Return Data Type** -`ARRAY` or `ARRAY` +`INTERVAL` -**Examples** +**Example** ```sql -WITH letters AS - (SELECT '' as letter_group - UNION ALL - SELECT 'a' as letter_group - UNION ALL - SELECT 'b c d' as letter_group) - -SELECT SPLIT(letter_group, ' ') as example -FROM letters; +SELECT + JUSTIFY_HOURS(INTERVAL 23 HOUR) AS i1, + JUSTIFY_HOURS(INTERVAL -24 HOUR) AS i2, + JUSTIFY_HOURS(INTERVAL 47 HOUR) AS i3, + JUSTIFY_HOURS(INTERVAL -12345 MINUTE) AS i4 -/*----------------------* - | example | - +----------------------+ - | [] | - | [a] | - | [b, c, d] | - *----------------------*/ +/*--------------+--------------+--------------+-----------------* + | i1 | i2 | i3 | i4 | + +--------------+--------------+--------------+-----------------+ + | 0-0 0 23:0:0 | 0-0 -1 0:0:0 | 0-0 1 23:0:0 | 0-0 -8 -13:45:0 | + *--------------+--------------+--------------+-----------------*/ ``` -### `STARTS_WITH` +### `JUSTIFY_INTERVAL` ```sql -STARTS_WITH(value, prefix) +JUSTIFY_INTERVAL(interval_expression) ``` **Description** -Takes two `STRING` or `BYTES` values. Returns `TRUE` if `prefix` is a -prefix of `value`. - -This function supports specifying [collation][collation]. - -[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about +Normalizes the days and time parts of the interval. 
-**Return type** +**Return Data Type** -`BOOL` +`INTERVAL` -**Examples** +**Example** ```sql -WITH items AS - (SELECT 'foo' as item - UNION ALL - SELECT 'bar' as item - UNION ALL - SELECT 'baz' as item) - -SELECT - STARTS_WITH(item, 'b') as example -FROM items; +SELECT JUSTIFY_INTERVAL(INTERVAL '29 49:00:00' DAY TO SECOND) AS i -/*---------* - | example | - +---------+ - | False | - | True | - | True | - *---------*/ +/*-------------* + | i | + +-------------+ + | 0-1 1 1:0:0 | + *-------------*/ ``` -### `STRPOS` +### `MAKE_INTERVAL` ```sql -STRPOS(value, subvalue) +MAKE_INTERVAL([year][, month][, day][, hour][, minute][, second]) ``` **Description** -Takes two `STRING` or `BYTES` values. Returns the 1-based position of the first -occurrence of `subvalue` inside `value`. Returns `0` if `subvalue` is not found. - -This function supports specifying [collation][collation]. - -[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about +Constructs an [`INTERVAL`][interval-type] object using `INT64` values +representing the year, month, day, hour, minute, and second. All arguments are +optional, `0` by default, and can be [named arguments][named-arguments]. 
-**Return type** +**Return Data Type** -`INT64` +`INTERVAL` -**Examples** +**Example** ```sql -WITH email_addresses AS - (SELECT - 'foo@example.com' AS email_address - UNION ALL - SELECT - 'foobar@example.com' AS email_address - UNION ALL - SELECT - 'foobarbaz@example.com' AS email_address - UNION ALL - SELECT - 'quxexample.com' AS email_address) - SELECT - STRPOS(email_address, '@') AS example -FROM email_addresses; - -/*---------* - | example | - +---------+ - | 4 | - | 7 | - | 10 | - | 0 | - *---------*/ -``` - -### `SUBSTR` + MAKE_INTERVAL(1, 6, 15) AS i1, + MAKE_INTERVAL(hour => 10, second => 20) AS i2, + MAKE_INTERVAL(1, minute => 5, day => 2) AS i3 -```sql -SUBSTR(value, position[, length]) +/*--------------+---------------+-------------* + | i1 | i2 | i3 | + +--------------+---------------+-------------+ + | 1-6 15 0:0:0 | 0-0 0 10:0:20 | 1-0 2 0:5:0 | + *--------------+---------------+-------------*/ ``` -**Description** +[interval-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#interval_type -Gets a portion (substring) of the supplied `STRING` or `BYTES` value. +[named-arguments]: https://github.com/google/zetasql/blob/master/docs/functions-reference.md#named_arguments -The `position` argument is an integer specifying the starting position of the -substring. +## JSON functions -+ If `position` is `1`, the substring starts from the first character or byte. -+ If `position` is `0` or less than `-LENGTH(value)`, `position` is set to `1`, - and the substring starts from the first character or byte. -+ If `position` is greater than the length of `value`, the function produces - an empty substring. -+ If `position` is negative, the function counts from the end of `value`, - with `-1` indicating the last character or byte. +ZetaSQL supports the following functions, which can retrieve and +transform JSON data. -The `length` argument specifies the maximum number of characters or bytes to -return. 
+### Categories -+ If `length` is not specified, the function produces a substring that starts - at the specified position and ends at the last character or byte of `value`. -+ If `length` is `0`, the function produces an empty substring. -+ If `length` is negative, the function produces an error. -+ The returned substring may be shorter than `length`, for example, when - `length` exceeds the length of `value`, or when the starting position of the - substring plus `length` is greater than the length of `value`. +The JSON functions are grouped into the following categories based on their +behavior: -**Return type** + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
CategoryFunctionsDescription
+ + Standard extractors + + + + JSON_QUERY
+ JSON_VALUE
+ + + JSON_QUERY_ARRAY
+ + + JSON_VALUE_ARRAY
+ +
+ Functions that extract JSON data. +
+ + Legacy extractors + + + + JSON_EXTRACT
+ JSON_EXTRACT_SCALAR
+ + + + JSON_EXTRACT_STRING_ARRAY
+ +
+ Functions that extract JSON data.
+ + While these functions are supported by ZetaSQL, we recommend + using the standard extractor functions. + +
Lax converters + + LAX_BOOL
+ + + LAX_DOUBLE
+ + + LAX_INT64
+ + + LAX_STRING
+ +
+ Functions that flexibly convert a JSON value to a scalar SQL value + without returning errors. +
Converters + + BOOL
+ + + DOUBLE
+ + + INT64
+ + + STRING
+ +
+ Functions that convert a JSON value to a scalar SQL value. +
Other converters + + PARSE_JSON
+ + + TO_JSON
+ + + TO_JSON_STRING
+ +
+ Other conversion functions from or to JSON. +
Constructors + JSON_ARRAY
+ JSON_OBJECT
+
+ Functions that create JSON. +
Mutators + + JSON_ARRAY_APPEND
+ + + JSON_ARRAY_INSERT
+ + + JSON_REMOVE
+ + + JSON_SET
+ + + JSON_STRIP_NULLS
+ +
+ Functions that mutate existing JSON. +
Accessors + + JSON_TYPE
+ +
+ Functions that provide access to JSON properties. +
-`STRING` or `BYTES` +### Function list -**Examples** + + + + + + + + -```sql -WITH items AS - (SELECT 'apple' as item - UNION ALL - SELECT 'banana' as item - UNION ALL - SELECT 'orange' as item) + + + + -/*---------* - | example | - +---------+ - | pple | - | anana | - | range | - *---------*/ -``` + + + + -SELECT - SUBSTR(item, 2, 2) as example -FROM items; + + + + -```sql -WITH items AS - (SELECT 'apple' as item - UNION ALL - SELECT 'banana' as item - UNION ALL - SELECT 'orange' as item) + + + + -/*---------* - | example | - +---------+ - | le | - | na | - | ge | - *---------*/ -``` + + + + -SELECT - SUBSTR(item, 1, 123) as example -FROM items; + + + + -```sql -WITH items AS - (SELECT 'apple' as item - UNION ALL - SELECT 'banana' as item - UNION ALL - SELECT 'orange' as item) + + + + -/*---------* - | example | - +---------+ - | | - | | - | | - *---------*/ -``` + + + + -SELECT - SUBSTR(item, 123, 5) as example -FROM items; + + + + -### `SUBSTRING` + + + + -Alias for [`SUBSTR`][substr]. + + + + -### `TO_BASE32` + + + + -**Description** + + + + -**Return type** + + + + -**Example** + + + + -/*------------------* - | base32_string | - +------------------+ - | MFRGGZDF74====== | - *------------------*/ -``` + + + + -### `TO_BASE64` + + + + -**Description** + + + + -There are several base64 encodings in common use that vary in exactly which -alphabet of 65 ASCII characters are used to encode the 64 digits and padding. -See [RFC 4648][RFC-4648] for details. This -function adds padding and uses the alphabet `[A-Za-z0-9+/=]`. + + + + -`STRING` + + + + -```sql -SELECT TO_BASE64(b'\377\340') AS base64_string; + + + + -To work with an encoding using a different base64 alphabet, you might need to -compose `TO_BASE64` with the `REPLACE` function. For instance, the -`base64url` url-safe and filename-safe encoding commonly used in web programming -uses `-_=` as the last characters rather than `+/=`. 
To encode a -`base64url`-encoded string, replace `+` and `/` with `-` and `_` respectively. + + + + -/*----------------* - | websafe_base64 | - +----------------+ - | _-A= | - *----------------*/ -``` + + + + -[RFC-4648]: https://tools.ietf.org/html/rfc4648#section-4 + + + + + + + + + + + +
NameSummary
BOOL -SELECT - SUBSTR(item, 2) as example -FROM items; + + Converts a JSON boolean to a SQL BOOL value. +
+ + DOUBLE -```sql -WITH items AS - (SELECT 'apple' as item - UNION ALL - SELECT 'banana' as item - UNION ALL - SELECT 'orange' as item) + + + Converts a JSON number to a SQL + DOUBLE value. +
INT64 -/*---------* - | example | - +---------+ - | pp | - | an | - | ra | - *---------*/ -``` + + Converts a JSON number to a SQL INT64 value. +
JSON_ARRAY -SELECT - SUBSTR(item, -2) as example -FROM items; +Creates a JSON array.
JSON_ARRAY_APPEND -```sql -WITH items AS - (SELECT 'apple' as item - UNION ALL - SELECT 'banana' as item - UNION ALL - SELECT 'orange' as item) +Appends JSON data to the end of a JSON array.
JSON_ARRAY_INSERT -/*---------* - | example | - +---------+ - | apple | - | banana | - | orange | - *---------*/ -``` +Inserts JSON data into a JSON array.
JSON_EXTRACT -SELECT - SUBSTR(item, 123) as example -FROM items; + + (Deprecated) + Extracts a JSON value and converts it to a SQL + JSON-formatted STRING + or + JSON + + value. +
JSON_EXTRACT_SCALAR -```sql -WITH items AS - (SELECT 'apple' as item - UNION ALL - SELECT 'banana' as item - UNION ALL - SELECT 'orange' as item) + + (Deprecated) + Extracts a JSON scalar value and converts it to a SQL + STRING value. +
JSON_OBJECT -/*---------* - | example | - +---------+ - | | - | | - | | - *---------*/ -``` +Creates a JSON object.
JSON_QUERY -```sql -SUBSTRING(value, position[, length]) -``` + + Extracts a JSON value and converts it to a SQL + JSON-formatted STRING + or + JSON + + value. +
JSON_QUERY_ARRAY -[substr]: #substr + + Extracts a JSON array and converts it to + a SQL ARRAY<JSON-formatted STRING> + or + ARRAY<JSON> + + value. +
JSON_REMOVE -```sql -TO_BASE32(bytes_expr) -``` +Produces JSON with the specified JSON data removed.
JSON_SET -Converts a sequence of `BYTES` into a base32-encoded `STRING`. To convert a -base32-encoded `STRING` into `BYTES`, use [FROM_BASE32][string-link-to-from-base32]. +Inserts or replaces JSON data.
JSON_STRIP_NULLS -`STRING` +Removes JSON nulls from JSON objects and JSON arrays.
JSON_TYPE -```sql -SELECT TO_BASE32(b'abcde\xFF') AS base32_string; + + Gets the JSON type of the outermost JSON value and converts the name of + this type to a SQL STRING value. +
JSON_VALUE -[string-link-to-from-base32]: #from_base32 + + Extracts a JSON scalar value and converts it to a SQL + STRING value. +
JSON_VALUE_ARRAY -```sql -TO_BASE64(bytes_expr) -``` + + Extracts a JSON array of scalar values and converts it to a SQL + ARRAY<STRING> value. +
LAX_BOOL -Converts a sequence of `BYTES` into a base64-encoded `STRING`. To convert a -base64-encoded `STRING` into `BYTES`, use [FROM_BASE64][string-link-to-from-base64]. + + Attempts to convert a JSON value to a SQL BOOL value. +
+ + LAX_DOUBLE -**Return type** + + + Attempts to convert a JSON value to a + SQL DOUBLE value. +
LAX_INT64 -**Example** + + Attempts to convert a JSON value to a SQL INT64 value. +
LAX_STRING -/*---------------* - | base64_string | - +---------------+ - | /+A= | - *---------------*/ -``` + + Attempts to convert a JSON value to a SQL STRING value. +
PARSE_JSON -```sql -SELECT REPLACE(REPLACE(TO_BASE64(b'\377\340'), '+', '-'), '/', '_') as websafe_base64; + + Converts a JSON-formatted STRING value to a + JSON value. +
STRING -[string-link-to-from-base64]: #from_base64 + + Converts a JSON string to a SQL STRING value. +
TO_JSON -### `TO_CODE_POINTS` + + Converts a SQL value to a JSON value. +
TO_JSON_STRING + + + Converts a SQL value to a JSON-formatted STRING value. +
+ +### `BOOL` + ```sql -TO_CODE_POINTS(value) +BOOL(json_expr) ``` **Description** -Takes a `STRING` or `BYTES` value and returns an array of `INT64` values that -represent code points or extended ASCII character values. +Converts a JSON boolean to a SQL `BOOL` value. -+ If `value` is a `STRING`, each element in the returned array represents a - [code point][string-link-to-code-points-wikipedia]. Each code point falls - within the range of [0, 0xD7FF] and [0xE000, 0x10FFFF]. -+ If `value` is `BYTES`, each element in the array is an extended ASCII - character value in the range of [0, 255]. +Arguments: -To convert from an array of code points to a `STRING` or `BYTES`, see -[CODE_POINTS_TO_STRING][string-link-to-codepoints-to-string] or -[CODE_POINTS_TO_BYTES][string-link-to-codepoints-to-bytes]. ++ `json_expr`: JSON. For example: + + ``` + JSON 'true' + ``` + + If the JSON value is not a boolean, an error is produced. If the expression + is SQL `NULL`, the function returns SQL `NULL`. **Return type** -`ARRAY` +`BOOL` **Examples** -The following example gets the code points for each element in an array of -words. - ```sql -SELECT word, TO_CODE_POINTS(word) AS code_points -FROM UNNEST(['foo', 'bar', 'baz', 'giraffe', 'llama']) AS word; +SELECT BOOL(JSON 'true') AS vacancy; -/*---------+------------------------------------* - | word | code_points | - +---------+------------------------------------+ - | foo | [102, 111, 111] | - | bar | [98, 97, 114] | - | baz | [98, 97, 122] | - | giraffe | [103, 105, 114, 97, 102, 102, 101] | - | llama | [108, 108, 97, 109, 97] | - *---------+------------------------------------*/ +/*---------* + | vacancy | + +---------+ + | true | + *---------*/ ``` -The following example converts integer representations of `BYTES` to their -corresponding ASCII character values. 
- ```sql -SELECT word, TO_CODE_POINTS(word) AS bytes_value_as_integer -FROM UNNEST([b'\x00\x01\x10\xff', b'\x66\x6f\x6f']) AS word; +SELECT BOOL(JSON_QUERY(JSON '{"hotel class": "5-star", "vacancy": true}', "$.vacancy")) AS vacancy; -/*------------------+------------------------* - | word | bytes_value_as_integer | - +------------------+------------------------+ - | \x00\x01\x10\xff | [0, 1, 16, 255] | - | foo | [102, 111, 111] | - *------------------+------------------------*/ +/*---------* + | vacancy | + +---------+ + | true | + *---------*/ ``` -The following example demonstrates the difference between a `BYTES` result and a -`STRING` result. +The following examples show how invalid requests are handled: ```sql -SELECT TO_CODE_POINTS(b'Ā') AS b_result, TO_CODE_POINTS('Ā') AS s_result; - -/*------------+----------* - | b_result | s_result | - +------------+----------+ - | [196, 128] | [256] | - *------------+----------*/ +-- An error is thrown if JSON is not of type bool. +SELECT BOOL(JSON '123') AS result; -- Throws an error +SELECT BOOL(JSON 'null') AS result; -- Throws an error +SELECT SAFE.BOOL(JSON '123') AS result; -- Returns a SQL NULL ``` -Notice that the character, Ā, is represented as a two-byte Unicode sequence. As -a result, the `BYTES` version of `TO_CODE_POINTS` returns an array with two -elements, while the `STRING` version returns an array with a single element. +### `DOUBLE` + -[string-link-to-code-points-wikipedia]: https://en.wikipedia.org/wiki/Code_point +```sql +DOUBLE(json_expr[, wide_number_mode=>{ 'exact' | 'round' }]) +``` -[string-link-to-codepoints-to-string]: #code_points_to_string +**Description** -[string-link-to-codepoints-to-bytes]: #code_points_to_bytes +Converts a JSON number to a SQL `DOUBLE` value. -### `TO_HEX` +Arguments: -```sql -TO_HEX(bytes) -``` ++ `json_expr`: JSON. For example: -**Description** + ``` + JSON '9.8' + ``` -Converts a sequence of `BYTES` into a hexadecimal `STRING`. 
Converts each byte -in the `STRING` as two hexadecimal characters in the range -`(0..9, a..f)`. To convert a hexadecimal-encoded -`STRING` to `BYTES`, use [FROM_HEX][string-link-to-from-hex]. + If the JSON value is not a number, an error is produced. If the expression + is a SQL `NULL`, the function returns SQL `NULL`. ++ `wide_number_mode`: Optional mandatory-named argument, + which defines what happens with a number that cannot be + represented as a `DOUBLE` without loss of + precision. This argument accepts one of the two case-sensitive values: + + + `exact`: The function fails if the result cannot be represented as a + `DOUBLE` without loss of precision. + + `round` (default): The numeric value stored in JSON will be rounded to + `DOUBLE`. If such rounding is not possible, + the function fails. **Return type** -`STRING` +`DOUBLE` -**Example** +**Examples** ```sql -WITH Input AS ( - SELECT b'\x00\x01\x02\x03\xAA\xEE\xEF\xFF' AS byte_str UNION ALL - SELECT b'foobar' -) -SELECT byte_str, TO_HEX(byte_str) AS hex_str -FROM Input; +SELECT DOUBLE(JSON '9.8') AS velocity; -/*----------------------------------+------------------* - | byte_string | hex_string | - +----------------------------------+------------------+ - | \x00\x01\x02\x03\xaa\xee\xef\xff | 00010203aaeeefff | - | foobar | 666f6f626172 | - *----------------------------------+------------------*/ +/*----------* + | velocity | + +----------+ + | 9.8 | + *----------*/ ``` -[string-link-to-from-hex]: #from_hex +```sql +SELECT DOUBLE(JSON_QUERY(JSON '{"vo2_max": 39.1, "age": 18}', "$.vo2_max")) AS vo2_max; -### `TRANSLATE` +/*---------* + | vo2_max | + +---------+ + | 39.1 | + *---------*/ +``` ```sql -TRANSLATE(expression, source_characters, target_characters) -``` +SELECT DOUBLE(JSON '18446744073709551615', wide_number_mode=>'round') as result; -**Description** +/*------------------------* + | result | + +------------------------+ + | 1.8446744073709552e+19 | + *------------------------*/ +``` -In 
`expression`, replaces each character in `source_characters` with the -corresponding character in `target_characters`. All inputs must be the same -type, either `STRING` or `BYTES`. +```sql +SELECT DOUBLE(JSON '18446744073709551615') as result; -+ Each character in `expression` is translated at most once. -+ A character in `expression` that is not present in `source_characters` is left - unchanged in `expression`. -+ A character in `source_characters` without a corresponding character in - `target_characters` is omitted from the result. -+ A duplicate character in `source_characters` results in an error. +/*------------------------* + | result | + +------------------------+ + | 1.8446744073709552e+19 | + *------------------------*/ +``` -**Return type** +The following examples show how invalid requests are handled: -`STRING` or `BYTES` +```sql +-- An error is thrown if JSON is not of type DOUBLE. +SELECT DOUBLE(JSON '"strawberry"') AS result; +SELECT DOUBLE(JSON 'null') AS result; -**Examples** +-- An error is thrown because `wide_number_mode` is case-sensitive and not "exact" or "round". 
+SELECT DOUBLE(JSON '123.4', wide_number_mode=>'EXACT') as result; +SELECT DOUBLE(JSON '123.4', wide_number_mode=>'exac') as result; -```sql -WITH example AS ( - SELECT 'This is a cookie' AS expression, 'sco' AS source_characters, 'zku' AS - target_characters UNION ALL - SELECT 'A coaster' AS expression, 'co' AS source_characters, 'k' as - target_characters -) -SELECT expression, source_characters, target_characters, TRANSLATE(expression, -source_characters, target_characters) AS translate -FROM example; +-- An error is thrown because the number cannot be converted to DOUBLE without loss of precision +SELECT DOUBLE(JSON '18446744073709551615', wide_number_mode=>'exact') as result; -/*------------------+-------------------+-------------------+------------------* - | expression | source_characters | target_characters | translate | - +------------------+-------------------+-------------------+------------------+ - | This is a cookie | sco | zku | Thiz iz a kuukie | - | A coaster | co | k | A kaster | - *------------------+-------------------+-------------------+------------------*/ +-- Returns a SQL NULL +SELECT SAFE.DOUBLE(JSON '"strawberry"') AS result; ``` -### `TRIM` +### `INT64` + ```sql -TRIM(value_to_trim[, set_of_characters_to_remove]) +INT64(json_expr) ``` **Description** -Takes a `STRING` or `BYTES` value to trim. +Converts a JSON number to a SQL `INT64` value. -If the value to trim is a `STRING`, removes from this value all leading and -trailing Unicode code points in `set_of_characters_to_remove`. -The set of code points is optional. If it is not specified, all -whitespace characters are removed from the beginning and end of the -value to trim. +Arguments: -If the value to trim is `BYTES`, removes from this value all leading and -trailing bytes in `set_of_characters_to_remove`. The set of bytes is required. ++ `json_expr`: JSON. 
For example: + + ``` + JSON '999' + ``` + + If the JSON value is not a number, or the JSON number is not in the SQL + `INT64` domain, an error is produced. If the expression is SQL `NULL`, the + function returns SQL `NULL`. **Return type** -+ `STRING` if `value_to_trim` is a `STRING` value. -+ `BYTES` if `value_to_trim` is a `BYTES` value. +`INT64` **Examples** -In the following example, all leading and trailing whitespace characters are -removed from `item` because `set_of_characters_to_remove` is not specified. - ```sql -WITH items AS - (SELECT ' apple ' as item - UNION ALL - SELECT ' banana ' as item - UNION ALL - SELECT ' orange ' as item) - -SELECT - CONCAT('#', TRIM(item), '#') as example -FROM items; +SELECT INT64(JSON '2005') AS flight_number; -/*----------* - | example | - +----------+ - | #apple# | - | #banana# | - | #orange# | - *----------*/ +/*---------------* + | flight_number | + +---------------+ + | 2005 | + *---------------*/ ``` -In the following example, all leading and trailing `*` characters are removed -from `item`. - ```sql -WITH items AS - (SELECT '***apple***' as item - UNION ALL - SELECT '***banana***' as item - UNION ALL - SELECT '***orange***' as item) - -SELECT - TRIM(item, '*') as example -FROM items; +SELECT INT64(JSON_QUERY(JSON '{"gate": "A4", "flight_number": 2005}', "$.flight_number")) AS flight_number; -/*---------* - | example | - +---------+ - | apple | - | banana | - | orange | - *---------*/ +/*---------------* + | flight_number | + +---------------+ + | 2005 | + *---------------*/ ``` -In the following example, all leading and trailing `x`, `y`, and `z` characters -are removed from `item`. 
- ```sql -WITH items AS - (SELECT 'xxxapplexxx' as item - UNION ALL - SELECT 'yyybananayyy' as item - UNION ALL - SELECT 'zzzorangezzz' as item - UNION ALL - SELECT 'xyzpearxyz' as item) - -SELECT - TRIM(item, 'xyz') as example -FROM items; +SELECT INT64(JSON '10.0') AS score; -/*---------* - | example | - +---------+ - | apple | - | banana | - | orange | - | pear | - *---------*/ +/*-------* + | score | + +-------+ + | 10 | + *-------*/ ``` -In the following example, examine how `TRIM` interprets characters as -Unicode code-points. If your trailing character set contains a combining -diacritic mark over a particular letter, `TRIM` might strip the -same diacritic mark from a different letter. +The following examples show how invalid requests are handled: ```sql -SELECT - TRIM('abaW̊', 'Y̊') AS a, - TRIM('W̊aba', 'Y̊') AS b, - TRIM('abaŪ̊', 'Y̊') AS c, - TRIM('Ū̊aba', 'Y̊') AS d; - -/*------+------+------+------* - | a | b | c | d | - +------+------+------+------+ - | abaW | W̊aba | abaŪ | Ūaba | - *------+------+------+------*/ +-- An error is thrown if JSON is not a number or cannot be converted to a 64-bit integer. +SELECT INT64(JSON '10.1') AS result; -- Throws an error +SELECT INT64(JSON '"strawberry"') AS result; -- Throws an error +SELECT INT64(JSON 'null') AS result; -- Throws an error +SELECT SAFE.INT64(JSON '"strawberry"') AS result; -- Returns a SQL NULL ``` -In the following example, all leading and trailing `b'n'`, `b'a'`, `b'\xab'` -bytes are removed from `item`. 
+### `JSON_ARRAY` ```sql -WITH items AS -( - SELECT b'apple' as item UNION ALL - SELECT b'banana' as item UNION ALL - SELECT b'\xab\xcd\xef\xaa\xbb' as item -) -SELECT item, TRIM(item, b'na\xab') AS examples -FROM items; - -/*----------------------+------------------* - | item | example | - +----------------------+------------------+ - | apple | pple | - | banana | b | - | \xab\xcd\xef\xaa\xbb | \xcd\xef\xaa\xbb | - *----------------------+------------------*/ +JSON_ARRAY([value][, ...]) ``` -### `UNICODE` +**Description** -```sql -UNICODE(value) -``` +Creates a JSON array from zero or more SQL values. -**Description** +Arguments: -Returns the Unicode [code point][string-code-point] for the first character in -`value`. Returns `0` if `value` is empty, or if the resulting Unicode code -point is `0`. ++ `value`: A [JSON encoding-supported][json-encodings] value to add + to a JSON array. **Return type** -`INT64` +`JSON` **Examples** +You can create an empty JSON array. For example: + ```sql -SELECT UNICODE('âbcd') as A, UNICODE('â') as B, UNICODE('') as C, UNICODE(NULL) as D; +SELECT JSON_ARRAY() AS json_data -/*-------+-------+-------+-------* - | A | B | C | D | - +-------+-------+-------+-------+ - | 226 | 226 | 0 | NULL | - *-------+-------+-------+-------*/ +/*-----------* + | json_data | + +-----------+ + | [] | + *-----------*/ ``` -[string-code-point]: https://en.wikipedia.org/wiki/Code_point - -### `UPPER` +The following query creates a JSON array with one value in it: ```sql -UPPER(value) -``` - -**Description** - -For `STRING` arguments, returns the original string with all alphabetic -characters in uppercase. Mapping between uppercase and lowercase is done -according to the -[Unicode Character Database][string-link-to-unicode-character-definitions] -without taking into account language-specific mappings. +SELECT JSON_ARRAY(10) AS json_data -For `BYTES` arguments, the argument is treated as ASCII text, with all bytes -greater than 127 left intact. 
+/*-----------* + | json_data | + +-----------+ + | [10] | + *-----------*/ +``` -**Return type** +You can create a JSON array with an empty JSON array in it. For example: -`STRING` or `BYTES` +```sql +SELECT JSON_ARRAY([]) AS json_data -**Examples** +/*-----------* + | json_data | + +-----------+ + | [[]] | + *-----------*/ +``` ```sql -WITH items AS - (SELECT - 'foo' as item - UNION ALL - SELECT - 'bar' as item - UNION ALL - SELECT - 'baz' as item) - -SELECT - UPPER(item) AS example -FROM items; +SELECT JSON_ARRAY(10, 'foo', NULL) AS json_data -/*---------* - | example | - +---------+ - | FOO | - | BAR | - | BAZ | - *---------*/ +/*-----------------* + | json_data | + +-----------------+ + | [10,"foo",null] | + *-----------------*/ ``` -[string-link-to-unicode-character-definitions]: http://unicode.org/ucd/ +```sql +SELECT JSON_ARRAY(STRUCT(10 AS a, 'foo' AS b)) AS json_data -[string-link-to-strpos]: #strpos +/*----------------------* + | json_data | + +----------------------+ + | [{"a":10,"b":"foo"}] | + *----------------------*/ +``` -## JSON functions +```sql +SELECT JSON_ARRAY(10, ['foo', 'bar'], [20, 30]) AS json_data -ZetaSQL supports the following functions, which can retrieve and -transform JSON data. +/*----------------------------* + | json_data | + +----------------------------+ + | [10,["foo","bar"],[20,30]] | + *----------------------------*/ +``` -### Categories +```sql +SELECT JSON_ARRAY(10, [JSON '20', JSON '"foo"']) AS json_data -The JSON functions are grouped into the following categories based on their -behavior: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Category | Functions | Description |
| -------- | --------- | ----------- |
| Standard extractors | `JSON_QUERY`, `JSON_VALUE`, `JSON_QUERY_ARRAY`, `JSON_VALUE_ARRAY` | Functions that extract JSON data. |
| Legacy extractors | `JSON_EXTRACT`, `JSON_EXTRACT_SCALAR`, `JSON_EXTRACT_STRING_ARRAY` | Functions that extract JSON data. While these functions are supported by ZetaSQL, we recommend using the standard extractor functions. |
| Lax converters | `LAX_BOOL`, `LAX_DOUBLE`, `LAX_INT64`, `LAX_STRING` | Functions that flexibly convert a JSON value to a scalar SQL value without returning errors. |
| Converters | `BOOL`, `DOUBLE`, `INT64`, `STRING` | Functions that convert a JSON value to a scalar SQL value. |
| Other converters | `PARSE_JSON`, `TO_JSON`, `TO_JSON_STRING` | Other conversion functions from or to JSON. |
| Constructors | `JSON_ARRAY`, `JSON_OBJECT` | Functions that create JSON. |
| Mutators | `JSON_ARRAY_APPEND`, `JSON_ARRAY_INSERT`, `JSON_REMOVE`, `JSON_SET`, `JSON_STRIP_NULLS` | Functions that mutate existing JSON. |
| Accessors | `JSON_TYPE` | Functions that provide access to JSON properties. |
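The contrast between the converter and lax converter categories above can be sketched as follows (a hedged illustration based on the summaries in this table: a strict converter such as `BOOL` produces an error when the JSON value has the wrong type, while its lax counterpart `LAX_BOOL` returns `NULL` instead):

```sql
-- Strict converter: produces an error because the JSON value is not a boolean.
SELECT BOOL(JSON '"purple"') AS strict_result;

-- Lax converter: the conversion is not possible, so a SQL NULL is returned
-- instead of an error.
SELECT LAX_BOOL(JSON '"purple"') AS lax_result;
```

Use a lax converter when the input JSON may contain mixed or unexpected types and a `NULL` is an acceptable fallback; use the strict converter when a type mismatch should surface as an error.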
+/*-----------------* + | json_data | + +-----------------+ + | [10,[20,"foo"]] | + *-----------------*/ +``` -### Function list +### `JSON_ARRAY_APPEND` - - - - - - - - +```sql +JSON_ARRAY_APPEND( + json_expr, + json_path_value_pair[, ...] + [, append_each_element=>{ TRUE | FALSE }] +) - - - - +Appends JSON data to the end of a JSON array. - - - - ++ `json_expr`: JSON. For example: - - - - + + `json_path`: Append `value` at this [JSONPath][JSONPath-format] + in `json_expr`. - - - - + + If `TRUE` (default), and `value` is a SQL array, + appends each element individually. - - - - +Details: - - - - +**Return type** - - - - +**Examples** - - - - +```sql +SELECT JSON_ARRAY_APPEND(JSON '["a", "b", "c"]', '$', 1) AS json_data - - - - +In the following example, `append_each_element` defaults to `TRUE`, so +`[1, 2]` is appended as individual elements. - - - - +/*-------------------* + | json_data | + +-------------------+ + | ["a","b","c",1,2] | + *-------------------*/ +``` - - - - +```sql +SELECT JSON_ARRAY_APPEND( + JSON '["a", "b", "c"]', + '$', [1, 2], + append_each_element=>FALSE) AS json_data - - - - +In the following example, `append_each_element` is `FALSE`, so +`[1, 2]` and `[3, 4]` are each appended as one element. - - - - +/*-----------------------------* + | json_data | + +-----------------------------+ + | ["a",["b",[1,2,[3,4]]],"c"] | + *-----------------------------*/ +``` - - - - +```sql +SELECT JSON_ARRAY_APPEND( + JSON '["a", ["b"], "c"]', + '$[1]', [1, 2], + '$[1][1]', [3, 4]) AS json_data - - - - +In the following example, path `$.a` is matched and appends `2`. - - - - +/*-------------* + | json_data | + +-------------+ + | {"a":[1,2]} | + *-------------*/ +``` - - - - +```sql +SELECT JSON_ARRAY_APPEND(JSON '{"a": null}', '$.a', 10) - - - - +In the following example, path `$.a` is not an array, so the operation is +ignored. 
- - - - +/*-----------* + | json_data | + +-----------+ + | {"a":1} | + *-----------*/ +``` - - - - +```sql +SELECT JSON_ARRAY_APPEND(JSON '{"a": 1}', '$.b', 2) AS json_data - - - - +### `JSON_ARRAY_INSERT` - - - - +json_path_value_pair: + json_path, value +``` - - - - +Arguments: - - - - + ``` + JSON '["a", "b", "c"]' + ``` ++ `json_path_value_pair`: A value and the [JSONPath][JSONPath-format] for + that value. This includes: - - - - + + `value`: A [JSON encoding-supported][json-encodings] value to + insert. ++ `insert_each_element`: An optional, mandatory named argument. - -
NameSummary
BOOL +json_path_value_pair: + json_path, value +``` - - Converts a JSON boolean to a SQL BOOL value. -
- - DOUBLE +Arguments: - - - Converts a JSON number to a SQL - DOUBLE value. -
INT64 + ``` + JSON '["a", "b", "c"]' + ``` ++ `json_path_value_pair`: A value and the [JSONPath][JSONPath-format] for + that value. This includes: - - Converts a JSON number to a SQL INT64 value. -
JSON_ARRAY

+ + `value`: A [JSON encoding-supported][json-encodings] value to
+ append.
++ `append_each_element`: An optional mandatory-named argument.

-Creates a JSON array.
JSON_ARRAY_APPEND

+ + If `FALSE` and `value` is a SQL array, appends
+ the array as one element.

-Appends JSON data to the end of a JSON array.
JSON_ARRAY_INSERT ++ Path value pairs are evaluated left to right. The JSON produced by + evaluating one pair becomes the JSON against which the next pair + is evaluated. ++ The operation is ignored if the path points to a JSON non-array value that + is not a JSON null. ++ If `json_path` points to a JSON null, the JSON null is replaced by a + JSON array that contains `value`. ++ If the path exists but has an incompatible type at any given path token, + the path value pair operation is ignored. ++ The function applies all path value pair append operations even if an + individual path value pair operation is invalid. For invalid operations, + the operation is ignored and the function continues to process the rest of + the path value pairs. ++ If any `json_path` is an invalid [JSONPath][JSONPath-format], an error is + produced. ++ If `json_expr` is SQL `NULL`, the function returns SQL `NULL`. ++ If `append_each_element` is SQL `NULL`, the function returns `json_expr`. ++ If `json_path` is SQL `NULL`, the `json_path_value_pair` operation is + ignored. -Inserts JSON data into a JSON array.
JSON_EXTRACT +`JSON` - - (Deprecated) - Extracts a JSON value and converts it to a SQL - JSON-formatted STRING - or - JSON - - value. -
JSON_EXTRACT_SCALAR +In the following example, path `$` is matched and appends `1`. - - (Deprecated) - Extracts a JSON scalar value and converts it to a SQL - STRING value. -
JSON_OBJECT +/*-----------------* + | json_data | + +-----------------+ + | ["a","b","c",1] | + *-----------------*/ +``` -Creates a JSON object.
JSON_QUERY +```sql +SELECT JSON_ARRAY_APPEND(JSON '["a", "b", "c"]', '$', [1, 2]) AS json_data - - Extracts a JSON value and converts it to a SQL - JSON-formatted STRING - or - JSON - - value. -
JSON_QUERY_ARRAY +In the following example, `append_each_element` is `FALSE`, so +`[1, 2]` is appended as one element. - - Extracts a JSON array and converts it to - a SQL ARRAY<JSON-formatted STRING> - or - ARRAY<JSON> - - value. -
JSON_REMOVE +/*---------------------* + | json_data | + +---------------------+ + | ["a","b","c",[1,2]] | + *---------------------*/ +``` -Produces JSON with the specified JSON data removed.
JSON_SET +```sql +SELECT JSON_ARRAY_APPEND( + JSON '["a", ["b"], "c"]', + '$[1]', [1, 2], + '$[1][1]', [3, 4], + append_each_element=>FALSE) AS json_data -Inserts or replaces JSON data.
JSON_STRIP_NULLS +In the following example, the first path `$[1]` appends `[1, 2]` as single +elements, and then the second path `$[1][1]` is not a valid path to an array, +so the second operation is ignored. -Removes JSON nulls from JSON objects and JSON arrays.
JSON_TYPE +/*---------------------* + | json_data | + +---------------------+ + | ["a",["b",1,2],"c"] | + *---------------------*/ +``` - - Gets the JSON type of the outermost JSON value and converts the name of - this type to a SQL STRING value. -
JSON_VALUE +```sql +SELECT JSON_ARRAY_APPEND(JSON '{"a": [1]}', '$.a', 2) AS json_data - - Extracts a JSON scalar value and converts it to a SQL - STRING value. -
JSON_VALUE_ARRAY +In the following example, a value is appended into a JSON null. - - Extracts a JSON array of scalar values and converts it to a SQL - ARRAY<STRING> value. -
LAX_BOOL +/*------------* + | json_data | + +------------+ + | {"a":[10]} | + *------------*/ +``` - - Attempts to convert a JSON value to a SQL BOOL value. -
- - LAX_DOUBLE +```sql +SELECT JSON_ARRAY_APPEND(JSON '{"a": 1}', '$.a', 2) AS json_data - - - Attempts to convert a JSON value to a - SQL DOUBLE value. -
LAX_INT64 +In the following example, path `$.b` does not exist, so the operation is +ignored. - - Attempts to convert a JSON value to a SQL INT64 value. -
LAX_STRING +/*-----------* + | json_data | + +-----------+ + | {"a":1} | + *-----------*/ +``` - - Attempts to convert a JSON value to a SQL STRING value. -
PARSE_JSON +```sql +JSON_ARRAY_INSERT( + json_expr, + json_path_value_pair[, ...] + [, insert_each_element=>{ TRUE | FALSE }] +) - - Converts a JSON-formatted STRING value to a - JSON value. -
STRING

+Inserts JSON data into a JSON array and produces a new JSON value.

- - Converts a JSON string to a SQL STRING value. -
TO_JSON ++ `json_expr`: JSON. For example: - - Converts a SQL value to a JSON value. -
TO_JSON_STRING + + `json_path`: Insert `value` at this [JSONPath][JSONPath-format] + in `json_expr`. - - Converts a SQL value to a JSON-formatted STRING value. -
+ + If `TRUE` (default), and `value` is a SQL array, + inserts each element individually. -### `BOOL` - + + If `FALSE,` and `value` is a SQL array, inserts + the array as one element. + +Details: + ++ Path value pairs are evaluated left to right. The JSON produced by + evaluating one pair becomes the JSON against which the next pair + is evaluated. ++ The operation is ignored if the path points to a JSON non-array value that + is not a JSON null. ++ If `json_path` points to a JSON null, the JSON null is replaced by a + JSON array of the appropriate size and padded on the left with JSON nulls. ++ If the path exists but has an incompatible type at any given path token, + the path value pair operator is ignored. ++ The function applies all path value pair append operations even if an + individual path value pair operation is invalid. For invalid operations, + the operation is ignored and the function continues to process the rest of + the path value pairs. ++ If the array index in `json_path` is larger than the size of the array, the + function extends the length of the array to the index, fills in + the array with JSON nulls, then adds `value` at the index. ++ If any `json_path` is an invalid [JSONPath][JSONPath-format], an error is + produced. ++ If `json_expr` is SQL `NULL`, the function returns SQL `NULL`. ++ If `insert_each_element` is SQL `NULL`, the function returns `json_expr`. ++ If `json_path` is SQL `NULL`, the `json_path_value_pair` operation is + ignored. + +**Return type** + +`JSON` + +**Examples** + +In the following example, path `$[1]` is matched and inserts `1`. ```sql -BOOL(json_expr) +SELECT JSON_ARRAY_INSERT(JSON '["a", ["b", "c"], "d"]', '$[1]', 1) AS json_data + +/*-----------------------* + | json_data | + +-----------------------+ + | ["a",1,["b","c"],"d"] | + *-----------------------*/ ``` -**Description** +In the following example, path `$[1][0]` is matched and inserts `1`. -Converts a JSON boolean to a SQL `BOOL` value. 
+```sql +SELECT JSON_ARRAY_INSERT(JSON '["a", ["b", "c"], "d"]', '$[1][0]', 1) AS json_data -Arguments: +/*-----------------------* + | json_data | + +-----------------------+ + | ["a",[1,"b","c"],"d"] | + *-----------------------*/ +``` -+ `json_expr`: JSON. For example: +In the following example, `insert_each_element` defaults to `TRUE`, so +`[1, 2]` is inserted as individual elements. - ``` - JSON 'true' - ``` +```sql +SELECT JSON_ARRAY_INSERT(JSON '["a", "b", "c"]', '$[1]', [1, 2]) AS json_data - If the JSON value is not a boolean, an error is produced. If the expression - is SQL `NULL`, the function returns SQL `NULL`. +/*-------------------* + | json_data | + +-------------------+ + | ["a",1,2,"b","c"] | + *-------------------*/ +``` -**Return type** +In the following example, `insert_each_element` is `FALSE`, so `[1, 2]` is +inserted as one element. -`BOOL` +```sql +SELECT JSON_ARRAY_INSERT( + JSON '["a", "b", "c"]', + '$[1]', [1, 2], + insert_each_element=>FALSE) AS json_data -**Examples** +/*---------------------* + | json_data | + +---------------------+ + | ["a",[1,2],"b","c"] | + *---------------------*/ +``` + +In the following example, path `$[7]` is larger than the length of the +matched array, so the array is extended with JSON nulls and `"e"` is inserted at +the end of the array. ```sql -SELECT BOOL(JSON 'true') AS vacancy; +SELECT JSON_ARRAY_INSERT(JSON '["a", "b", "c", "d"]', '$[7]', "e") AS json_data -/*---------* - | vacancy | - +---------+ - | true | - *---------*/ +/*--------------------------------------* + | json_data | + +--------------------------------------+ + | ["a","b","c","d",null,null,null,"e"] | + *--------------------------------------*/ ``` +In the following example, path `$.a` is an object, so the operation is ignored. 
+ ```sql -SELECT BOOL(JSON_QUERY(JSON '{"hotel class": "5-star", "vacancy": true}', "$.vacancy")) AS vacancy; +SELECT JSON_ARRAY_INSERT(JSON '{"a": {}}', '$.a[0]', 2) AS json_data -/*---------* - | vacancy | - +---------+ - | true | - *---------*/ +/*-----------* + | json_data | + +-----------+ + | {"a":{}} | + *-----------*/ ``` -The following examples show how invalid requests are handled: +In the following example, path `$` does not specify a valid array position, +so the operation is ignored. ```sql --- An error is thrown if JSON is not of type bool. -SELECT BOOL(JSON '123') AS result; -- Throws an error -SELECT BOOL(JSON 'null') AS result; -- Throws an error -SELECT SAFE.BOOL(JSON '123') AS result; -- Returns a SQL NULL +SELECT JSON_ARRAY_INSERT(JSON '[1, 2]', '$', 3) AS json_data + +/*-----------* + | json_data | + +-----------+ + | [1,2] | + *-----------*/ ``` -### `DOUBLE` - +In the following example, a value is inserted into a JSON null. ```sql -DOUBLE(json_expr[, wide_number_mode=>{ 'exact' | 'round' }]) +SELECT JSON_ARRAY_INSERT(JSON '{"a": null}', '$.a[2]', 10) AS json_data + +/*----------------------* + | json_data | + +----------------------+ + | {"a":[null,null,10]} | + *----------------------*/ +``` + +In the following example, the operation is ignored because you can't insert +data into a JSON number. + +```sql +SELECT JSON_ARRAY_INSERT(JSON '1', '$[0]', 'r1') AS json_data + +/*-----------* + | json_data | + +-----------+ + | 1 | + *-----------*/ +``` + +### `JSON_EXTRACT` + +Note: This function is deprecated. Consider using [JSON_QUERY][json-query]. + +```sql +JSON_EXTRACT(json_string_expr, json_path) +``` + +```sql +JSON_EXTRACT(json_expr, json_path) ``` **Description** -Converts a JSON number to a SQL `DOUBLE` value. +Extracts a JSON value and converts it to a +SQL JSON-formatted `STRING` or `JSON` value. +This function uses single quotes and brackets to escape invalid +[JSONPath][JSONPath-format] characters in JSON keys. For example: `['a.b']`. 
Arguments: ++ `json_string_expr`: A JSON-formatted string. For example: + + ``` + '{"class": {"students": [{"name": "Jane"}]}}' + ``` + + Extracts a SQL `NULL` when a JSON-formatted string `null` is encountered. + For example: + + ```sql + SELECT JSON_EXTRACT("null", "$") -- Returns a SQL NULL + ``` + `json_expr`: JSON. For example: ``` - JSON '9.8' + JSON '{"class": {"students": [{"name": "Jane"}]}}' ``` - If the JSON value is not a number, an error is produced. If the expression - is a SQL `NULL`, the function returns SQL `NULL`. -+ `wide_number_mode`: Optional mandatory-named argument, - which defines what happens with a number that cannot be - represented as a `DOUBLE` without loss of - precision. This argument accepts one of the two case-sensitive values: + Extracts a JSON `null` when a JSON `null` is encountered. - + `exact`: The function fails if the result cannot be represented as a - `DOUBLE` without loss of precision. - + `round` (default): The numeric value stored in JSON will be rounded to - `DOUBLE`. If such rounding is not possible, - the function fails. + ```sql + SELECT JSON_EXTRACT(JSON 'null', "$") -- Returns a JSON 'null' + ``` ++ `json_path`: The [JSONPath][JSONPath-format]. This identifies the data that + you want to obtain from the input. + +There are differences between the JSON-formatted string and JSON input types. +For details, see [Differences between the JSON and JSON-formatted STRING types][differences-json-and-string]. **Return type** -`DOUBLE` ++ `json_string_expr`: A JSON-formatted `STRING` ++ `json_expr`: `JSON` **Examples** +In the following example, JSON data is extracted and returned as JSON. 
+ ```sql -SELECT DOUBLE(JSON '9.8') AS velocity; +SELECT + JSON_EXTRACT(JSON '{"class": {"students": [{"id": 5}, {"id": 12}]}}', '$.class') + AS json_data; -/*----------* - | velocity | - +----------+ - | 9.8 | - *----------*/ +/*-----------------------------------* + | json_data | + +-----------------------------------+ + | {"students":[{"id":5},{"id":12}]} | + *-----------------------------------*/ ``` +In the following examples, JSON data is extracted and returned as +JSON-formatted strings. + ```sql -SELECT DOUBLE(JSON_QUERY(JSON '{"vo2_max": 39.1, "age": 18}', "$.vo2_max")) AS vo2_max; +SELECT JSON_EXTRACT(json_text, '$') AS json_text_string +FROM UNNEST([ + '{"class": {"students": [{"name": "Jane"}]}}', + '{"class": {"students": []}}', + '{"class": {"students": [{"name": "John"}, {"name": "Jamie"}]}}' + ]) AS json_text; -/*---------* - | vo2_max | - +---------+ - | 39.1 | - *---------*/ +/*-----------------------------------------------------------* + | json_text_string | + +-----------------------------------------------------------+ + | {"class":{"students":[{"name":"Jane"}]}} | + | {"class":{"students":[]}} | + | {"class":{"students":[{"name":"John"},{"name":"Jamie"}]}} | + *-----------------------------------------------------------*/ ``` ```sql -SELECT DOUBLE(JSON '18446744073709551615', wide_number_mode=>'round') as result; +SELECT JSON_EXTRACT(json_text, '$.class.students[0]') AS first_student +FROM UNNEST([ + '{"class": {"students": [{"name": "Jane"}]}}', + '{"class": {"students": []}}', + '{"class": {"students": [{"name": "John"}, {"name": "Jamie"}]}}' + ]) AS json_text; -/*------------------------* - | result | - +------------------------+ - | 1.8446744073709552e+19 | - *------------------------*/ +/*-----------------* + | first_student | + +-----------------+ + | {"name":"Jane"} | + | NULL | + | {"name":"John"} | + *-----------------*/ ``` ```sql -SELECT DOUBLE(JSON '18446744073709551615') as result; +SELECT JSON_EXTRACT(json_text, 
'$.class.students[1].name') AS second_student_name +FROM UNNEST([ + '{"class": {"students": [{"name": "Jane"}]}}', + '{"class": {"students": []}}', + '{"class": {"students": [{"name": "John"}, {"name": null}]}}', + '{"class": {"students": [{"name": "John"}, {"name": "Jamie"}]}}' + ]) AS json_text; -/*------------------------* - | result | - +------------------------+ - | 1.8446744073709552e+19 | - *------------------------*/ +/*----------------* + | second_student | + +----------------+ + | NULL | + | NULL | + | NULL | + | "Jamie" | + *----------------*/ ``` -The following examples show how invalid requests are handled: - ```sql --- An error is thrown if JSON is not of type DOUBLE. -SELECT DOUBLE(JSON '"strawberry"') AS result; -SELECT DOUBLE(JSON 'null') AS result; +SELECT JSON_EXTRACT(json_text, "$.class['students']") AS student_names +FROM UNNEST([ + '{"class": {"students": [{"name": "Jane"}]}}', + '{"class": {"students": []}}', + '{"class": {"students": [{"name": "John"}, {"name": "Jamie"}]}}' + ]) AS json_text; --- An error is thrown because `wide_number_mode` is case-sensitive and not "exact" or "round". 
-SELECT DOUBLE(JSON '123.4', wide_number_mode=>'EXACT') as result; -SELECT DOUBLE(JSON '123.4', wide_number_mode=>'exac') as result; +/*------------------------------------* + | student_names | + +------------------------------------+ + | [{"name":"Jane"}] | + | [] | + | [{"name":"John"},{"name":"Jamie"}] | + *------------------------------------*/ +``` --- An error is thrown because the number cannot be converted to DOUBLE without loss of precision -SELECT DOUBLE(JSON '18446744073709551615', wide_number_mode=>'exact') as result; +```sql +SELECT JSON_EXTRACT('{"a": null}', "$.a"); -- Returns a SQL NULL +SELECT JSON_EXTRACT('{"a": null}', "$.b"); -- Returns a SQL NULL +``` --- Returns a SQL NULL -SELECT SAFE.DOUBLE(JSON '"strawberry"') AS result; +```sql +SELECT JSON_EXTRACT(JSON '{"a": null}', "$.a"); -- Returns a JSON 'null' +SELECT JSON_EXTRACT(JSON '{"a": null}', "$.b"); -- Returns a SQL NULL ``` -### `INT64` - +[json-query]: #json_query + +[JSONPath-format]: #JSONPath_format + +[differences-json-and-string]: #differences_json_and_string + +### `JSON_EXTRACT_SCALAR` + +Note: This function is deprecated. Consider using [JSON_VALUE][json-value]. ```sql -INT64(json_expr) +JSON_EXTRACT_SCALAR(json_string_expr[, json_path]) +``` + +```sql +JSON_EXTRACT_SCALAR(json_expr[, json_path]) ``` **Description** -Converts a JSON number to a SQL `INT64` value. +Extracts a JSON scalar value and converts it to a SQL `STRING` value. +In addition, this function: + ++ Removes the outermost quotes and unescapes the return values. ++ Returns a SQL `NULL` if a non-scalar value is selected. ++ Uses single quotes and brackets to escape invalid [JSONPath][JSONPath-format] + characters in JSON keys. For example: `['a.b']`. Arguments: ++ `json_string_expr`: A JSON-formatted string. For example: + + ``` + '{"name": "Jane", "age": "6"}' + ``` + `json_expr`: JSON. For example: ``` - JSON '999' + JSON '{"name": "Jane", "age": "6"}' ``` ++ `json_path`: The [JSONPath][JSONPath-format]. 
This identifies the data that + you want to obtain from the input. If this optional parameter is not + provided, then the JSONPath `$` symbol is applied, which means that all of + the data is analyzed. - If the JSON value is not a number, or the JSON number is not in the SQL - `INT64` domain, an error is produced. If the expression is SQL `NULL`, the - function returns SQL `NULL`. + If `json_path` returns a JSON `null` or a non-scalar value (in other words, + if `json_path` refers to an object or an array), then a SQL `NULL` is + returned. + +There are differences between the JSON-formatted string and JSON input types. +For details, see [Differences between the JSON and JSON-formatted STRING types][differences-json-and-string]. **Return type** -`INT64` +`STRING` **Examples** +In the following example, `age` is extracted. + ```sql -SELECT INT64(JSON '2005') AS flight_number; +SELECT JSON_EXTRACT_SCALAR(JSON '{"name": "Jakob", "age": "6" }', '$.age') AS scalar_age; -/*---------------* - | flight_number | - +---------------+ - | 2005 | - *---------------*/ +/*------------* + | scalar_age | + +------------+ + | 6 | + *------------*/ ``` +The following example compares how results are returned for the `JSON_EXTRACT` +and `JSON_EXTRACT_SCALAR` functions. 
+ ```sql -SELECT INT64(JSON_QUERY(JSON '{"gate": "A4", "flight_number": 2005}', "$.flight_number")) AS flight_number; +SELECT JSON_EXTRACT('{"name": "Jakob", "age": "6" }', '$.name') AS json_name, + JSON_EXTRACT_SCALAR('{"name": "Jakob", "age": "6" }', '$.name') AS scalar_name, + JSON_EXTRACT('{"name": "Jakob", "age": "6" }', '$.age') AS json_age, + JSON_EXTRACT_SCALAR('{"name": "Jakob", "age": "6" }', '$.age') AS scalar_age; -/*---------------* - | flight_number | - +---------------+ - | 2005 | - *---------------*/ +/*-----------+-------------+----------+------------* + | json_name | scalar_name | json_age | scalar_age | + +-----------+-------------+----------+------------+ + | "Jakob" | Jakob | "6" | 6 | + *-----------+-------------+----------+------------*/ ``` ```sql -SELECT INT64(JSON '10.0') AS score; +SELECT JSON_EXTRACT('{"fruits": ["apple", "banana"]}', '$.fruits') AS json_extract, + JSON_EXTRACT_SCALAR('{"fruits": ["apple", "banana"]}', '$.fruits') AS json_extract_scalar; + +/*--------------------+---------------------* + | json_extract | json_extract_scalar | + +--------------------+---------------------+ + | ["apple","banana"] | NULL | + *--------------------+---------------------*/ +``` + +In cases where a JSON key uses invalid JSONPath characters, you can escape those +characters using single quotes and brackets, `[' ']`. For example: + +```sql +SELECT JSON_EXTRACT_SCALAR('{"a.b": {"c": "world"}}', "$['a.b'].c") AS hello; /*-------* - | score | + | hello | +-------+ - | 10 | + | world | *-------*/ ``` -The following examples show how invalid requests are handled: +[json-value]: #json_value -```sql --- An error is thrown if JSON is not a number or cannot be converted to a 64-bit integer. 
-SELECT INT64(JSON '10.1') AS result; -- Throws an error -SELECT INT64(JSON '"strawberry"') AS result; -- Throws an error -SELECT INT64(JSON 'null') AS result; -- Throws an error -SELECT SAFE.INT64(JSON '"strawberry"') AS result; -- Returns a SQL NULL -``` +[JSONPath-format]: #JSONPath_format -### `JSON_ARRAY` +[differences-json-and-string]: #differences_json_and_string + +### `JSON_OBJECT` + ++ [Signature 1](#json_object_signature1): + `JSON_OBJECT([json_key, json_value][, ...])` ++ [Signature 2](#json_object_signature2): + `JSON_OBJECT(json_key_array, json_value_array)` + +#### Signature 1 + ```sql -JSON_ARRAY([value][, ...]) +JSON_OBJECT([json_key, json_value][, ...]) ``` **Description** -Creates a JSON array from zero or more SQL values. +Creates a JSON object, using key-value pairs. Arguments: -+ `value`: A [JSON encoding-supported][json-encodings] value to add - to a JSON array. ++ `json_key`: A `STRING` value that represents a key. ++ `json_value`: A [JSON encoding-supported][json-encodings] value. + +Details: + ++ If two keys are passed in with the same name, only the first key-value pair + is preserved. ++ The order of key-value pairs is not preserved. ++ If `json_key` is `NULL`, an error is produced. **Return type** @@ -21370,336 +19776,111 @@ Arguments: **Examples** -You can create an empty JSON array. For example: +You can create an empty JSON object by passing in no JSON keys and values. +For example: ```sql -SELECT JSON_ARRAY() AS json_data +SELECT JSON_OBJECT() AS json_data /*-----------* | json_data | +-----------+ - | [] | - *-----------*/ -``` - -The following query creates a JSON array with one value in it: - -```sql -SELECT JSON_ARRAY(10) AS json_data - -/*-----------* - | json_data | - +-----------+ - | [10] | - *-----------*/ -``` - -You can create a JSON array with an empty JSON array in it. 
For example: - -```sql -SELECT JSON_ARRAY([]) AS json_data - -/*-----------* - | json_data | - +-----------+ - | [[]] | + | {} | *-----------*/ ``` -```sql -SELECT JSON_ARRAY(10, 'foo', NULL) AS json_data - -/*-----------------* - | json_data | - +-----------------+ - | [10,"foo",null] | - *-----------------*/ -``` +You can create a JSON object by passing in key-value pairs. For example: ```sql -SELECT JSON_ARRAY(STRUCT(10 AS a, 'foo' AS b)) AS json_data +SELECT JSON_OBJECT('foo', 10, 'bar', TRUE) AS json_data -/*----------------------* - | json_data | - +----------------------+ - | [{"a":10,"b":"foo"}] | - *----------------------*/ +/*-----------------------* + | json_data | + +-----------------------+ + | {"bar":true,"foo":10} | + *-----------------------*/ ``` ```sql -SELECT JSON_ARRAY(10, ['foo', 'bar'], [20, 30]) AS json_data +SELECT JSON_OBJECT('foo', 10, 'bar', ['a', 'b']) AS json_data /*----------------------------* | json_data | +----------------------------+ - | [10,["foo","bar"],[20,30]] | + | {"bar":["a","b"],"foo":10} | *----------------------------*/ ``` ```sql -SELECT JSON_ARRAY(10, [JSON '20', JSON '"foo"']) AS json_data - -/*-----------------* - | json_data | - +-----------------+ - | [10,[20,"foo"]] | - *-----------------*/ -``` - -### `JSON_ARRAY_APPEND` - -```sql -JSON_ARRAY_APPEND( - json_expr, - json_path_value_pair[, ...] - [, append_each_element=>{ TRUE | FALSE }] -) - -json_path_value_pair: - json_path, value -``` - -Appends JSON data to the end of a JSON array. - -Arguments: - -+ `json_expr`: JSON. For example: - - ``` - JSON '["a", "b", "c"]' - ``` -+ `json_path_value_pair`: A value and the [JSONPath][JSONPath-format] for - that value. This includes: - - + `json_path`: Append `value` at this [JSONPath][JSONPath-format] - in `json_expr`. - - + `value`: A [JSON encoding-supported][json-encodings] value to - append. -+ `append_each_element`: An optional, mandatory named argument. 
- - + If `TRUE` (default), and `value` is a SQL array, - appends each element individually. - - + If `FALSE,` and `value` is a SQL array, appends - the array as one element. - -Details: - -+ Path value pairs are evaluated left to right. The JSON produced by - evaluating one pair becomes the JSON against which the next pair - is evaluated. -+ The operation is ignored if the path points to a JSON non-array value that - is not a JSON null. -+ If `json_path` points to a JSON null, the JSON null is replaced by a - JSON array that contains `value`. -+ If the path exists but has an incompatible type at any given path token, - the path value pair operation is ignored. -+ The function applies all path value pair append operations even if an - individual path value pair operation is invalid. For invalid operations, - the operation is ignored and the function continues to process the rest of - the path value pairs. -+ If any `json_path` is an invalid [JSONPath][JSONPath-format], an error is - produced. -+ If `json_expr` is SQL `NULL`, the function returns SQL `NULL`. -+ If `append_each_element` is SQL `NULL`, the function returns `json_expr`. -+ If `json_path` is SQL `NULL`, the `json_path_value_pair` operation is - ignored. - -**Return type** - -`JSON` - -**Examples** - -In the following example, path `$` is matched and appends `1`. - -```sql -SELECT JSON_ARRAY_APPEND(JSON '["a", "b", "c"]', '$', 1) AS json_data - -/*-----------------* - | json_data | - +-----------------+ - | ["a","b","c",1] | - *-----------------*/ -``` - -In the following example, `append_each_element` defaults to `TRUE`, so -`[1, 2]` is appended as individual elements. - -```sql -SELECT JSON_ARRAY_APPEND(JSON '["a", "b", "c"]', '$', [1, 2]) AS json_data - -/*-------------------* - | json_data | - +-------------------+ - | ["a","b","c",1,2] | - *-------------------*/ -``` - -In the following example, `append_each_element` is `FALSE`, so -`[1, 2]` is appended as one element. 
- -```sql -SELECT JSON_ARRAY_APPEND( - JSON '["a", "b", "c"]', - '$', [1, 2], - append_each_element=>FALSE) AS json_data - -/*---------------------* - | json_data | - +---------------------+ - | ["a","b","c",[1,2]] | - *---------------------*/ -``` - -In the following example, `append_each_element` is `FALSE`, so -`[1, 2]` and `[3, 4]` are each appended as one element. - -```sql -SELECT JSON_ARRAY_APPEND( - JSON '["a", ["b"], "c"]', - '$[1]', [1, 2], - '$[1][1]', [3, 4], - append_each_element=>FALSE) AS json_data - -/*-----------------------------* - | json_data | - +-----------------------------+ - | ["a",["b",[1,2,[3,4]]],"c"] | - *-----------------------------*/ -``` - -In the following example, the first path `$[1]` appends `[1, 2]` as single -elements, and then the second path `$[1][1]` is not a valid path to an array, -so the second operation is ignored. - -```sql -SELECT JSON_ARRAY_APPEND( - JSON '["a", ["b"], "c"]', - '$[1]', [1, 2], - '$[1][1]', [3, 4]) AS json_data +SELECT JSON_OBJECT('a', NULL, 'b', JSON 'null') AS json_data /*---------------------* | json_data | +---------------------+ - | ["a",["b",1,2],"c"] | + | {"a":null,"b":null} | *---------------------*/ ``` -In the following example, path `$.a` is matched and appends `2`. - ```sql -SELECT JSON_ARRAY_APPEND(JSON '{"a": [1]}', '$.a', 2) AS json_data +SELECT JSON_OBJECT('a', 10, 'a', 'foo') AS json_data -/*-------------* - | json_data | - +-------------+ - | {"a":[1,2]} | - *-------------*/ +/*-----------* + | json_data | + +-----------+ + | {"a":10} | + *-----------*/ ``` -In the following example, a value is appended into a JSON null. 
- ```sql -SELECT JSON_ARRAY_APPEND(JSON '{"a": null}', '$.a', 10) +WITH Items AS (SELECT 'hello' AS key, 'world' AS value) +SELECT JSON_OBJECT(key, value) AS json_data FROM Items -/*------------* - | json_data | - +------------+ - | {"a":[10]} | - *------------*/ +/*-------------------* + | json_data | + +-------------------+ + | {"hello":"world"} | + *-------------------*/ ``` -In the following example, path `$.a` is not an array, so the operation is -ignored. +An error is produced if a SQL `NULL` is passed in for a JSON key. ```sql -SELECT JSON_ARRAY_APPEND(JSON '{"a": 1}', '$.a', 2) AS json_data - -/*-----------* - | json_data | - +-----------+ - | {"a":1} | - *-----------*/ +-- Error: A key cannot be NULL. +SELECT JSON_OBJECT(NULL, 1) AS json_data ``` -In the following example, path `$.b` does not exist, so the operation is -ignored. +An error is produced if the number of JSON keys and JSON values don't match: ```sql -SELECT JSON_ARRAY_APPEND(JSON '{"a": 1}', '$.b', 2) AS json_data - -/*-----------* - | json_data | - +-----------+ - | {"a":1} | - *-----------*/ +-- Error: No matching signature for function JSON_OBJECT for argument types: +-- STRING, INT64, STRING +SELECT JSON_OBJECT('a', 1, 'b') AS json_data ``` -### `JSON_ARRAY_INSERT` +#### Signature 2 + ```sql -JSON_ARRAY_INSERT( - json_expr, - json_path_value_pair[, ...] - [, insert_each_element=>{ TRUE | FALSE }] -) - -json_path_value_pair: - json_path, value +JSON_OBJECT(json_key_array, json_value_array) ``` -Produces a new JSON value that is created by inserting JSON data into -a JSON array. +Creates a JSON object, using an array of keys and values. Arguments: -+ `json_expr`: JSON. For example: - - ``` - JSON '["a", "b", "c"]' - ``` -+ `json_path_value_pair`: A value and the [JSONPath][JSONPath-format] for - that value. This includes: - - + `json_path`: Insert `value` at this [JSONPath][JSONPath-format] - in `json_expr`. - - + `value`: A [JSON encoding-supported][json-encodings] value to - insert. 
-+ `insert_each_element`: An optional, mandatory named argument. - - + If `TRUE` (default), and `value` is a SQL array, - inserts each element individually. - - + If `FALSE,` and `value` is a SQL array, inserts - the array as one element. ++ `json_key_array`: An array of zero or more `STRING` keys. ++ `json_value_array`: An array of zero or more + [JSON encoding-supported][json-encodings] values. Details: -+ Path value pairs are evaluated left to right. The JSON produced by - evaluating one pair becomes the JSON against which the next pair - is evaluated. -+ The operation is ignored if the path points to a JSON non-array value that - is not a JSON null. -+ If `json_path` points to a JSON null, the JSON null is replaced by a - JSON array of the appropriate size and padded on the left with JSON nulls. -+ If the path exists but has an incompatible type at any given path token, - the path value pair operator is ignored. -+ The function applies all path value pair append operations even if an - individual path value pair operation is invalid. For invalid operations, - the operation is ignored and the function continues to process the rest of - the path value pairs. -+ If the array index in `json_path` is larger than the size of the array, the - function extends the length of the array to the index, fills in - the array with JSON nulls, then adds `value` at the index. -+ If any `json_path` is an invalid [JSONPath][JSONPath-format], an error is ++ If two keys are passed in with the same name, only the first key-value pair + is preserved. ++ The order of key-value pairs is not preserved. ++ The number of keys must match the number of values, otherwise an error is produced. -+ If `json_expr` is SQL `NULL`, the function returns SQL `NULL`. -+ If `insert_each_element` is SQL `NULL`, the function returns `json_expr`. -+ If `json_path` is SQL `NULL`, the `json_path_value_pair` operation is - ignored. ++ If any argument is `NULL`, an error is produced. 
++ If a key in `json_key_array` is `NULL`, an error is produced.

 **Return type**

@@ -21707,141 +19888,132 @@ Details:

 **Examples**

-In the following example, path `$[1]` is matched and inserts `1`.
-
-```sql
-SELECT JSON_ARRAY_INSERT(JSON '["a", ["b", "c"], "d"]', '$[1]', 1) AS json_data
-
-/*-----------------------*
- | json_data             |
- +-----------------------+
- | ["a",1,["b","c"],"d"] |
- *-----------------------*/
-```
-
-In the following example, path `$[1][0]` is matched and inserts `1`.
+You can create an empty JSON object by passing in an empty array of
+keys and values. For example:

```sql
-SELECT JSON_ARRAY_INSERT(JSON '["a", ["b", "c"], "d"]', '$[1][0]', 1) AS json_data
+SELECT JSON_OBJECT(CAST([] AS ARRAY<STRING>), []) AS json_data

-/*-----------------------*
- | json_data             |
- +-----------------------+
- | ["a",[1,"b","c"],"d"] |
- *-----------------------*/
+/*-----------*
+ | json_data |
+ +-----------+
+ | {}        |
+ *-----------*/
```

-In the following example, `insert_each_element` defaults to `TRUE`, so
-`[1, 2]` is inserted as individual elements.
+You can create a JSON object by passing in an array of keys and an array of
+values. For example:

```sql
-SELECT JSON_ARRAY_INSERT(JSON '["a", "b", "c"]', '$[1]', [1, 2]) AS json_data
+SELECT JSON_OBJECT(['a', 'b'], [10, NULL]) AS json_data

/*-------------------*
 | json_data         |
 +-------------------+
- | ["a",1,2,"b","c"] |
+ | {"a":10,"b":null} |
 *-------------------*/
```

-In the following example, `insert_each_element` is `FALSE`, so `[1, 2]` is
-inserted as one element.
- ```sql -SELECT JSON_ARRAY_INSERT( - JSON '["a", "b", "c"]', - '$[1]', [1, 2], - insert_each_element=>FALSE) AS json_data +SELECT JSON_OBJECT(['a', 'b'], [JSON '10', JSON '"foo"']) AS json_data -/*---------------------* - | json_data | - +---------------------+ - | ["a",[1,2],"b","c"] | - *---------------------*/ +/*--------------------* + | json_data | + +--------------------+ + | {"a":10,"b":"foo"} | + *--------------------*/ ``` -In the following example, path `$[7]` is larger than the length of the -matched array, so the array is extended with JSON nulls and `"e"` is inserted at -the end of the array. - ```sql -SELECT JSON_ARRAY_INSERT(JSON '["a", "b", "c", "d"]', '$[7]', "e") AS json_data +SELECT + JSON_OBJECT( + ['a', 'b'], + [STRUCT(10 AS id, 'Red' AS color), STRUCT(20 AS id, 'Blue' AS color)]) + AS json_data -/*--------------------------------------* - | json_data | - +--------------------------------------+ - | ["a","b","c","d",null,null,null,"e"] | - *--------------------------------------*/ +/*------------------------------------------------------------* + | json_data | + +------------------------------------------------------------+ + | {"a":{"color":"Red","id":10},"b":{"color":"Blue","id":20}} | + *------------------------------------------------------------*/ ``` -In the following example, path `$.a` is an object, so the operation is ignored. - ```sql -SELECT JSON_ARRAY_INSERT(JSON '{"a": {}}', '$.a[0]', 2) AS json_data +SELECT + JSON_OBJECT( + ['a', 'b'], + [TO_JSON(10), TO_JSON(['foo', 'bar'])]) + AS json_data -/*-----------* - | json_data | - +-----------+ - | {"a":{}} | - *-----------*/ +/*----------------------------* + | json_data | + +----------------------------+ + | {"a":10,"b":["foo","bar"]} | + *----------------------------*/ ``` -In the following example, path `$` does not specify a valid array position, -so the operation is ignored. 
+The following query groups by `id` and then creates an array of keys and
+values from the rows with the same `id`:

```sql
-SELECT JSON_ARRAY_INSERT(JSON '[1, 2]', '$', 3) AS json_data
+WITH
+  Fruits AS (
+    SELECT 0 AS id, 'color' AS json_key, 'red' AS json_value UNION ALL
+    SELECT 0, 'fruit', 'apple' UNION ALL
+    SELECT 1, 'fruit', 'banana' UNION ALL
+    SELECT 1, 'ripe', 'true'
+  )
+SELECT JSON_OBJECT(ARRAY_AGG(json_key), ARRAY_AGG(json_value)) AS json_data
+FROM Fruits
+GROUP BY id

-/*-----------*
- | json_data |
- +-----------+
- | [1,2]     |
- *-----------*/
+/*----------------------------------*
+ | json_data                        |
+ +----------------------------------+
+ | {"color":"red","fruit":"apple"}  |
+ | {"fruit":"banana","ripe":"true"} |
+ *----------------------------------*/
```

-In the following example, a value is inserted into a JSON null.
+An error is produced if the sizes of the JSON keys and values arrays don't
+match:

```sql
-SELECT JSON_ARRAY_INSERT(JSON '{"a": null}', '$.a[2]', 10) AS json_data
-
-/*----------------------*
- | json_data            |
- +----------------------+
- | {"a":[null,null,10]} |
- *----------------------*/
+-- Error: The number of keys and values must match.
+SELECT JSON_OBJECT(['a', 'b'], [10]) AS json_data
```

-In the following example, the operation is ignored because you can't insert
-data into a JSON number.
+An error is produced if the array of JSON keys or JSON values is a SQL `NULL`.

```sql
-SELECT JSON_ARRAY_INSERT(JSON '1', '$[0]', 'r1') AS json_data
+-- Error: The keys array cannot be NULL.
+SELECT JSON_OBJECT(CAST(NULL AS ARRAY<STRING>), [10, 20]) AS json_data
+```

-/*-----------*
- | json_data |
- +-----------+
- | 1         |
- *-----------*/
+```sql
+-- Error: The values array cannot be NULL.
+SELECT JSON_OBJECT(['a', 'b'], CAST(NULL AS ARRAY<INT64>)) AS json_data
```

-### `JSON_EXTRACT`

+[json-encodings]: #json_encodings

-Note: This function is deprecated. Consider using [JSON_QUERY][json-query].
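As a migration sketch for the deprecated `JSON_EXTRACT` function, the main user-visible difference from its replacement is how JSON keys containing special characters are escaped in the JSONPath. The escaping rules below follow what this document states for each function; the queries themselves are illustrative, not taken from the reference examples:

```sql
-- JSON_EXTRACT (deprecated) escapes invalid JSONPath characters with
-- single quotes and brackets:
SELECT JSON_EXTRACT('{"a.b": {"c": "world"}}', "$['a.b'].c") AS result;

-- JSON_QUERY escapes the same key with double quotes:
SELECT JSON_QUERY('{"a.b": {"c": "world"}}', '$."a.b".c') AS result;
```

Both queries select the value under the `"a.b"` key; aside from the escaping syntax, the two functions extract and return JSON values in the same way.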
+### `JSON_QUERY` ```sql -JSON_EXTRACT(json_string_expr, json_path) +JSON_QUERY(json_string_expr, json_path) ``` ```sql -JSON_EXTRACT(json_expr, json_path) +JSON_QUERY(json_expr, json_path) ``` **Description** -Extracts a JSON value and converts it to a -SQL JSON-formatted `STRING` or `JSON` value. -This function uses single quotes and brackets to escape invalid -[JSONPath][JSONPath-format] characters in JSON keys. For example: `['a.b']`. +Extracts a JSON value and converts it to a SQL +JSON-formatted `STRING` or +`JSON` value. +This function uses double quotes to escape invalid +[JSONPath][JSONPath-format] characters in JSON keys. For example: `"a.b"`. Arguments: @@ -21855,7 +20027,7 @@ Arguments: For example: ```sql - SELECT JSON_EXTRACT("null", "$") -- Returns a SQL NULL + SELECT JSON_QUERY("null", "$") -- Returns a SQL NULL ``` + `json_expr`: JSON. For example: @@ -21866,7 +20038,7 @@ Arguments: Extracts a JSON `null` when a JSON `null` is encountered. ```sql - SELECT JSON_EXTRACT(JSON 'null', "$") -- Returns a JSON 'null' + SELECT JSON_QUERY(JSON 'null', "$") -- Returns a JSON 'null' ``` + `json_path`: The [JSONPath][JSONPath-format]. This identifies the data that you want to obtain from the input. @@ -21885,7 +20057,7 @@ In the following example, JSON data is extracted and returned as JSON. ```sql SELECT - JSON_EXTRACT(JSON '{"class": {"students": [{"id": 5}, {"id": 12}]}}', '$.class') + JSON_QUERY(JSON '{"class": {"students": [{"id": 5}, {"id": 12}]}}', '$.class') AS json_data; /*-----------------------------------* @@ -21899,7 +20071,7 @@ In the following examples, JSON data is extracted and returned as JSON-formatted strings. 
```sql -SELECT JSON_EXTRACT(json_text, '$') AS json_text_string +SELECT JSON_QUERY(json_text, '$') AS json_text_string FROM UNNEST([ '{"class": {"students": [{"name": "Jane"}]}}', '{"class": {"students": []}}', @@ -21916,7 +20088,7 @@ FROM UNNEST([ ``` ```sql -SELECT JSON_EXTRACT(json_text, '$.class.students[0]') AS first_student +SELECT JSON_QUERY(json_text, '$.class.students[0]') AS first_student FROM UNNEST([ '{"class": {"students": [{"name": "Jane"}]}}', '{"class": {"students": []}}', @@ -21933,7 +20105,7 @@ FROM UNNEST([ ``` ```sql -SELECT JSON_EXTRACT(json_text, '$.class.students[1].name') AS second_student_name +SELECT JSON_QUERY(json_text, '$.class.students[1].name') AS second_student_name FROM UNNEST([ '{"class": {"students": [{"name": "Jane"}]}}', '{"class": {"students": []}}', @@ -21952,7 +20124,7 @@ FROM UNNEST([ ``` ```sql -SELECT JSON_EXTRACT(json_text, "$.class['students']") AS student_names +SELECT JSON_QUERY(json_text, '$.class."students"') AS student_names FROM UNNEST([ '{"class": {"students": [{"name": "Jane"}]}}', '{"class": {"students": []}}', @@ -21969,272 +20141,246 @@ FROM UNNEST([ ``` ```sql -SELECT JSON_EXTRACT('{"a": null}', "$.a"); -- Returns a SQL NULL -SELECT JSON_EXTRACT('{"a": null}', "$.b"); -- Returns a SQL NULL +SELECT JSON_QUERY('{"a": null}', "$.a"); -- Returns a SQL NULL +SELECT JSON_QUERY('{"a": null}', "$.b"); -- Returns a SQL NULL ``` ```sql -SELECT JSON_EXTRACT(JSON '{"a": null}', "$.a"); -- Returns a JSON 'null' -SELECT JSON_EXTRACT(JSON '{"a": null}', "$.b"); -- Returns a SQL NULL +SELECT JSON_QUERY(JSON '{"a": null}', "$.a"); -- Returns a JSON 'null' +SELECT JSON_QUERY(JSON '{"a": null}', "$.b"); -- Returns a SQL NULL ``` -[json-query]: #json_query - [JSONPath-format]: #JSONPath_format [differences-json-and-string]: #differences_json_and_string -### `JSON_EXTRACT_SCALAR` - -Note: This function is deprecated. Consider using [JSON_VALUE][json-value]. 
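The replacement for the deprecated `JSON_EXTRACT_SCALAR` function is [JSON_VALUE][json-value]. As a minimal migration sketch, assuming `JSON_VALUE` accepts the same argument shapes as the deprecated function (a JSON-formatted string or `JSON` value, plus an optional JSONPath):

```sql
-- Deprecated:
SELECT JSON_EXTRACT_SCALAR('{"name": "Jakob", "age": "6"}', '$.age') AS scalar_age;

-- Replacement:
SELECT JSON_VALUE('{"name": "Jakob", "age": "6"}', '$.age') AS scalar_age;
```

Both forms return the selected scalar as a SQL `STRING` with the outermost quotes removed.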
+### `JSON_QUERY_ARRAY`

```sql
-JSON_EXTRACT_SCALAR(json_string_expr[, json_path])
+JSON_QUERY_ARRAY(json_string_expr[, json_path])
```

```sql
-JSON_EXTRACT_SCALAR(json_expr[, json_path])
+JSON_QUERY_ARRAY(json_expr[, json_path])
```

**Description**

-Extracts a JSON scalar value and converts it to a SQL `STRING` value.
-In addition, this function:
-
-+ Removes the outermost quotes and unescapes the return values.
-+ Returns a SQL `NULL` if a non-scalar value is selected.
-+ Uses single quotes and brackets to escape invalid [JSONPath][JSONPath-format]
-  characters in JSON keys. For example: `['a.b']`.
+Extracts a JSON array and converts it to
+a SQL `ARRAY<JSON-formatted STRING>` or
+`ARRAY<JSON>` value.
+In addition, this function uses double quotes to escape invalid
+[JSONPath][JSONPath-format] characters in JSON keys. For example: `"a.b"`.

Arguments:

+ `json_string_expr`: A JSON-formatted string. For example:

  ```
-  '{"name": "Jane", "age": "6"}'
+  '["a", "b", {"key": "c"}]'
  ```
+ `json_expr`: JSON. For example:

  ```
-  JSON '{"name": "Jane", "age": "6"}'
+  JSON '["a", "b", {"key": "c"}]'
  ```
+ `json_path`: The [JSONPath][JSONPath-format]. This identifies the data that
  you want to obtain from the input. If this optional parameter is not
  provided, then the JSONPath `$` symbol is applied, which means that all of
  the data is analyzed.

-  If `json_path` returns a JSON `null` or a non-scalar value (in other words,
-  if `json_path` refers to an object or an array), then a SQL `NULL` is
-  returned.
-
There are differences between the JSON-formatted string and JSON input types.
For details, see [Differences between the JSON and JSON-formatted STRING types][differences-json-and-string].

**Return type**

-`STRING`
++ `json_string_expr`: `ARRAY<JSON-formatted STRING>`
++ `json_expr`: `ARRAY<JSON>`

**Examples**

-In the following example, `age` is extracted.
+This extracts items in JSON to an array of `JSON` values: ```sql -SELECT JSON_EXTRACT_SCALAR(JSON '{"name": "Jakob", "age": "6" }', '$.age') AS scalar_age; +SELECT JSON_QUERY_ARRAY( + JSON '{"fruits": ["apples", "oranges", "grapes"]}', '$.fruits' + ) AS json_array; -/*------------* - | scalar_age | - +------------+ - | 6 | - *------------*/ +/*---------------------------------* + | json_array | + +---------------------------------+ + | ["apples", "oranges", "grapes"] | + *---------------------------------*/ ``` -The following example compares how results are returned for the `JSON_EXTRACT` -and `JSON_EXTRACT_SCALAR` functions. - -```sql -SELECT JSON_EXTRACT('{"name": "Jakob", "age": "6" }', '$.name') AS json_name, - JSON_EXTRACT_SCALAR('{"name": "Jakob", "age": "6" }', '$.name') AS scalar_name, - JSON_EXTRACT('{"name": "Jakob", "age": "6" }', '$.age') AS json_age, - JSON_EXTRACT_SCALAR('{"name": "Jakob", "age": "6" }', '$.age') AS scalar_age; - -/*-----------+-------------+----------+------------* - | json_name | scalar_name | json_age | scalar_age | - +-----------+-------------+----------+------------+ - | "Jakob" | Jakob | "6" | 6 | - *-----------+-------------+----------+------------*/ -``` +This extracts the items in a JSON-formatted string to a string array: ```sql -SELECT JSON_EXTRACT('{"fruits": ["apple", "banana"]}', '$.fruits') AS json_extract, - JSON_EXTRACT_SCALAR('{"fruits": ["apple", "banana"]}', '$.fruits') AS json_extract_scalar; +SELECT JSON_QUERY_ARRAY('[1, 2, 3]') AS string_array; -/*--------------------+---------------------* - | json_extract | json_extract_scalar | - +--------------------+---------------------+ - | ["apple","banana"] | NULL | - *--------------------+---------------------*/ +/*--------------* + | string_array | + +--------------+ + | [1, 2, 3] | + *--------------*/ ``` -In cases where a JSON key uses invalid JSONPath characters, you can escape those -characters using single quotes and brackets, `[' ']`. 
For example: +This extracts a string array and converts it to an integer array: ```sql -SELECT JSON_EXTRACT_SCALAR('{"a.b": {"c": "world"}}', "$['a.b'].c") AS hello; +SELECT ARRAY( + SELECT CAST(integer_element AS INT64) + FROM UNNEST( + JSON_QUERY_ARRAY('[1, 2, 3]','$') + ) AS integer_element +) AS integer_array; -/*-------* - | hello | - +-------+ - | world | - *-------*/ +/*---------------* + | integer_array | + +---------------+ + | [1, 2, 3] | + *---------------*/ ``` -[json-value]: #json_value - -[JSONPath-format]: #JSONPath_format - -[differences-json-and-string]: #differences_json_and_string - -### `JSON_OBJECT` - -+ [Signature 1](#json_object_signature1): - `JSON_OBJECT([json_key, json_value][, ...])` -+ [Signature 2](#json_object_signature2): - `JSON_OBJECT(json_key_array, json_value_array)` - -#### Signature 1 - +This extracts string values in a JSON-formatted string to an array: ```sql -JSON_OBJECT([json_key, json_value][, ...]) -``` - -**Description** - -Creates a JSON object, using key-value pairs. - -Arguments: - -+ `json_key`: A `STRING` value that represents a key. -+ `json_value`: A [JSON encoding-supported][json-encodings] value. - -Details: - -+ If two keys are passed in with the same name, only the first key-value pair - is preserved. -+ The order of key-value pairs is not preserved. -+ If `json_key` is `NULL`, an error is produced. 
+-- Doesn't strip the double quotes +SELECT JSON_QUERY_ARRAY('["apples", "oranges", "grapes"]', '$') AS string_array; -**Return type** +/*---------------------------------* + | string_array | + +---------------------------------+ + | ["apples", "oranges", "grapes"] | + *---------------------------------*/ -`JSON` +-- Strips the double quotes +SELECT ARRAY( + SELECT JSON_VALUE(string_element, '$') + FROM UNNEST(JSON_QUERY_ARRAY('["apples", "oranges", "grapes"]', '$')) AS string_element +) AS string_array; -**Examples** +/*---------------------------* + | string_array | + +---------------------------+ + | [apples, oranges, grapes] | + *---------------------------*/ +``` -You can create an empty JSON object by passing in no JSON keys and values. -For example: +This extracts only the items in the `fruit` property to an array: ```sql -SELECT JSON_OBJECT() AS json_data +SELECT JSON_QUERY_ARRAY( + '{"fruit": [{"apples": 5, "oranges": 10}, {"apples": 2, "oranges": 4}], "vegetables": [{"lettuce": 7, "kale": 8}]}', + '$.fruit' +) AS string_array; -/*-----------* - | json_data | - +-----------+ - | {} | - *-----------*/ +/*-------------------------------------------------------* + | string_array | + +-------------------------------------------------------+ + | [{"apples":5,"oranges":10}, {"apples":2,"oranges":4}] | + *-------------------------------------------------------*/ ``` -You can create a JSON object by passing in key-value pairs. 
For example: +These are equivalent: ```sql -SELECT JSON_OBJECT('foo', 10, 'bar', TRUE) AS json_data - -/*-----------------------* - | json_data | - +-----------------------+ - | {"bar":true,"foo":10} | - *-----------------------*/ -``` +SELECT JSON_QUERY_ARRAY('{"fruits": ["apples", "oranges", "grapes"]}', '$.fruits') AS string_array; -```sql -SELECT JSON_OBJECT('foo', 10, 'bar', ['a', 'b']) AS json_data +SELECT JSON_QUERY_ARRAY('{"fruits": ["apples", "oranges", "grapes"]}', '$."fruits"') AS string_array; -/*----------------------------* - | json_data | - +----------------------------+ - | {"bar":["a","b"],"foo":10} | - *----------------------------*/ +-- The queries above produce the following result: +/*---------------------------------* + | string_array | + +---------------------------------+ + | ["apples", "oranges", "grapes"] | + *---------------------------------*/ ``` -```sql -SELECT JSON_OBJECT('a', NULL, 'b', JSON 'null') AS json_data - -/*---------------------* - | json_data | - +---------------------+ - | {"a":null,"b":null} | - *---------------------*/ -``` +In cases where a JSON key uses invalid JSONPath characters, you can escape those +characters using double quotes: `" "`. For example: ```sql -SELECT JSON_OBJECT('a', 10, 'a', 'foo') AS json_data +SELECT JSON_QUERY_ARRAY('{"a.b": {"c": ["world"]}}', '$."a.b".c') AS hello; /*-----------* - | json_data | + | hello | +-----------+ - | {"a":10} | + | ["world"] | *-----------*/ ``` +The following examples show how invalid requests and empty arrays are handled: + ```sql -WITH Items AS (SELECT 'hello' AS key, 'world' AS value) -SELECT JSON_OBJECT(key, value) AS json_data FROM Items +-- An error is returned if you provide an invalid JSONPath. +SELECT JSON_QUERY_ARRAY('["foo", "bar", "baz"]', 'INVALID_JSONPath') AS result; -/*-------------------* - | json_data | - +-------------------+ - | {"hello":"world"} | - *-------------------*/ -``` +-- If the JSONPath does not refer to an array, then NULL is returned. 
+SELECT JSON_QUERY_ARRAY('{"a": "foo"}', '$.a') AS result; -An error is produced if a SQL `NULL` is passed in for a JSON key. +/*--------* + | result | + +--------+ + | NULL | + *--------*/ -```sql --- Error: A key cannot be NULL. -SELECT JSON_OBJECT(NULL, 1) AS json_data -``` +-- If a key that does not exist is specified, then the result is NULL. +SELECT JSON_QUERY_ARRAY('{"a": "foo"}', '$.b') AS result; -An error is produced if the number of JSON keys and JSON values don't match: +/*--------* + | result | + +--------+ + | NULL | + *--------*/ -```sql --- Error: No matching signature for function JSON_OBJECT for argument types: --- STRING, INT64, STRING -SELECT JSON_OBJECT('a', 1, 'b') AS json_data +-- Empty arrays in JSON-formatted strings are supported. +SELECT JSON_QUERY_ARRAY('{"a": "foo", "b": []}', '$.b') AS result; + +/*--------* + | result | + +--------+ + | [] | + *--------*/ ``` -#### Signature 2 - +[JSONPath-format]: #JSONPath_format + +[differences-json-and-string]: #differences_json_and_string + +### `JSON_REMOVE` ```sql -JSON_OBJECT(json_key_array, json_value_array) +JSON_REMOVE(json_expr, json_path[, ...]) ``` -Creates a JSON object, using an array of keys and values. +Produces a new SQL `JSON` value with the specified JSON data removed. Arguments: -+ `json_key_array`: An array of zero or more `STRING` keys. -+ `json_value_array`: An array of zero or more - [JSON encoding-supported][json-encodings] values. ++ `json_expr`: JSON. For example: + + ``` + JSON '{"class": {"students": [{"name": "Jane"}]}}' + ``` ++ `json_path`: Remove data at this [JSONPath][JSONPath-format] in `json_expr`. Details: -+ If two keys are passed in with the same name, only the first key-value pair - is preserved. -+ The order of key-value pairs is not preserved. -+ The number of keys must match the number of values, otherwise an error is ++ Paths are evaluated left to right. The JSON produced by evaluating the + first path is the JSON for the next path. 
++ The operation ignores non-existent paths and continues processing the rest
+  of the paths.
++ For each path, the entire matched JSON subtree is deleted.
++ If the path matches a JSON object key, this function deletes the
+  key-value pair.
++ If the path matches an array element, this function deletes the specific
+  element from the matched array.
++ If removing the path results in an empty JSON object or empty JSON array,
+  the empty structure is preserved.
++ If `json_path` is `$` or an invalid [JSONPath][JSONPath-format], an error is
+  produced.
-+ If any argument is `NULL`, an error is produced.
-+ If a key in `json_key_array` is `NULL`, an error is produced.
++ If `json_path` is SQL `NULL`, the path operation is ignored.

 **Return type**

@@ -22242,708 +20388,248 @@ Details:

 **Examples**

-You can create an empty JSON object by passing in an empty array of
-keys and values. For example:
+In the following example, the path `$[1]` is matched and removes
+`["b", "c"]`.

```sql
-SELECT JSON_OBJECT(CAST([] AS ARRAY<STRING>), []) AS json_data
+SELECT JSON_REMOVE(JSON '["a", ["b", "c"], "d"]', '$[1]') AS json_data

/*-----------*
 | json_data |
 +-----------+
- | {}        |
+ | ["a","d"] |
 *-----------*/
```

-You can create a JSON object by passing in an array of keys and an array of
-values. For example:
+You can use the field access operator to pass JSON data into this function.
+For example:

```sql
-SELECT JSON_OBJECT(['a', 'b'], [10, NULL]) AS json_data
+WITH T AS (SELECT JSON '{"a": {"b": 10, "c": 20}}' AS data)
+SELECT JSON_REMOVE(data.a, '$.b') AS json_data FROM T

-/*-------------------*
- | json_data         |
- +-------------------+
- | {"a":10,"b":null} |
- *-------------------*/
+/*-----------*
+ | json_data |
+ +-----------+
+ | {"c":20}  |
+ *-----------*/
```

+In the following example, the first path `$[1]` is matched and removes
+`["b", "c"]`. Then, the second path `$[1]` is matched and removes `"d"`.
+ ```sql -SELECT JSON_OBJECT(['a', 'b'], [JSON '10', JSON '"foo"']) AS json_data +SELECT JSON_REMOVE(JSON '["a", ["b", "c"], "d"]', '$[1]', '$[1]') AS json_data -/*--------------------* - | json_data | - +--------------------+ - | {"a":10,"b":"foo"} | - *--------------------*/ +/*-----------* + | json_data | + +-----------+ + | ["a"] | + *-----------*/ ``` +The structure of an empty array is preserved when all elements are deleted +from it. For example: + ```sql -SELECT - JSON_OBJECT( - ['a', 'b'], - [STRUCT(10 AS id, 'Red' AS color), STRUCT(20 AS id, 'Blue' AS color)]) - AS json_data +SELECT JSON_REMOVE(JSON '["a", ["b", "c"], "d"]', '$[1]', '$[1]', '$[0]') AS json_data -/*------------------------------------------------------------* - | json_data | - +------------------------------------------------------------+ - | {"a":{"color":"Red","id":10},"b":{"color":"Blue","id":20}} | - *------------------------------------------------------------*/ +/*-----------* + | json_data | + +-----------+ + | [] | + *-----------*/ ``` +In the following example, the path `$.a.b.c` is matched and removes the +`"c":"d"` key-value pair from the JSON object. + ```sql -SELECT - JSON_OBJECT( - ['a', 'b'], - [TO_JSON(10), TO_JSON(['foo', 'bar'])]) - AS json_data +SELECT JSON_REMOVE(JSON '{"a": {"b": {"c": "d"}}}', '$.a.b.c') AS json_data -/*----------------------------* - | json_data | - +----------------------------+ - | {"a":10,"b":["foo","bar"]} | - *----------------------------*/ +/*----------------* + | json_data | + +----------------+ + | {"a":{"b":{}}} | + *----------------*/ ``` -The following query groups by `id` and then creates an array of keys and -values from the rows with the same `id`: +In the following example, the path `$.a.b` is matched and removes the +`"b": {"c":"d"}` key-value pair from the JSON object. 
```sql -WITH - Fruits AS ( - SELECT 0 AS id, 'color' AS json_key, 'red' AS json_value UNION ALL - SELECT 0, 'fruit', 'apple' UNION ALL - SELECT 1, 'fruit', 'banana' UNION ALL - SELECT 1, 'ripe', 'true' - ) -SELECT JSON_OBJECT(ARRAY_AGG(json_key), ARRAY_AGG(json_value)) AS json_data -FROM Fruits -GROUP BY id +SELECT JSON_REMOVE(JSON '{"a": {"b": {"c": "d"}}}', '$.a.b') AS json_data -/*----------------------------------* - | json_data | - +----------------------------------+ - | {"color":"red","fruit":"apple"} | - | {"fruit":"banana","ripe":"true"} | - *----------------------------------*/ +/*-----------* + | json_data | + +-----------+ + | {"a":{}} | + *-----------*/ ``` -An error is produced if the size of the JSON keys and values arrays don't -match: +In the following example, the path `$.b` is not valid, so the operation makes +no changes. ```sql --- Error: The number of keys and values must match. -SELECT JSON_OBJECT(['a', 'b'], [10]) AS json_data +SELECT JSON_REMOVE(JSON '{"a": 1}', '$.b') AS json_data + +/*-----------* + | json_data | + +-----------+ + | {"a":1} | + *-----------*/ ``` -An error is produced if the array of JSON keys or JSON values is a SQL `NULL`. +In the following example, path `$.a.b` and `$.b` don't exist, so those +operations are ignored, but the others are processed. ```sql --- Error: The keys array cannot be NULL. -SELECT JSON_OBJECT(CAST(NULL AS ARRAY), [10, 20]) AS json_data +SELECT JSON_REMOVE(JSON '{"a": [1, 2, 3]}', '$.a[0]', '$.a.b', '$.b', '$.a[0]') AS json_data + +/*-----------* + | json_data | + +-----------+ + | {"a":[3]} | + *-----------*/ ``` +If you pass in `$` as the path, an error is produced. For example: + ```sql --- Error: The values array cannot be NULL. 
-SELECT JSON_OBJECT(['a', 'b'], CAST(NULL AS ARRAY)) AS json_data +-- Error: The JSONPath cannot be '$' +SELECT JSON_REMOVE(JSON '{}', '$') AS json_data ``` -[json-encodings]: #json_encodings - -### `JSON_QUERY` +In the following example, the operation is ignored because you can't remove +data from a JSON null. ```sql -JSON_QUERY(json_string_expr, json_path) +SELECT JSON_REMOVE(JSON 'null', '$.a.b') AS json_data + +/*-----------* + | json_data | + +-----------+ + | null | + *-----------*/ ``` +### `JSON_SET` + ```sql -JSON_QUERY(json_expr, json_path) -``` +JSON_SET( + json_expr, + json_path_value_pair[, ...] + [, create_if_missing=> { TRUE | FALSE }] +) -**Description** +json_path_value_pair: + json_path, value +``` -Extracts a JSON value and converts it to a SQL -JSON-formatted `STRING` or -`JSON` value. -This function uses double quotes to escape invalid -[JSONPath][JSONPath-format] characters in JSON keys. For example: `"a.b"`. +Produces a new SQL `JSON` value with the specified JSON data inserted +or replaced. Arguments: -+ `json_string_expr`: A JSON-formatted string. For example: ++ `json_expr`: JSON. For example: ``` - '{"class": {"students": [{"name": "Jane"}]}}' + JSON '{"class": {"students": [{"name": "Jane"}]}}' ``` ++ `json_path_value_pair`: A value and the [JSONPath][JSONPath-format] for + that value. This includes: - Extracts a SQL `NULL` when a JSON-formatted string `null` is encountered. - For example: + + `json_path`: Insert or replace `value` at this [JSONPath][JSONPath-format] + in `json_expr`. - ```sql - SELECT JSON_QUERY("null", "$") -- Returns a SQL NULL - ``` -+ `json_expr`: JSON. For example: + + `value`: A [JSON encoding-supported][json-encodings] value to + insert. ++ `create_if_missing`: An optional, mandatory named argument. - ``` - JSON '{"class": {"students": [{"name": "Jane"}]}}' - ``` + + If TRUE (default), replaces or inserts data if the path does not exist. - Extracts a JSON `null` when a JSON `null` is encountered. 
+ + If FALSE, only _existing_ JSONPath values are replaced. If the path + doesn't exist, the set operation is ignored. - ```sql - SELECT JSON_QUERY(JSON 'null', "$") -- Returns a JSON 'null' - ``` -+ `json_path`: The [JSONPath][JSONPath-format]. This identifies the data that - you want to obtain from the input. +Details: -There are differences between the JSON-formatted string and JSON input types. -For details, see [Differences between the JSON and JSON-formatted STRING types][differences-json-and-string]. ++ Path value pairs are evaluated left to right. The JSON produced by + evaluating one pair becomes the JSON against which the next pair + is evaluated. ++ If a matched path has an existing value, it overwrites the existing data + with `value`. ++ If `create_if_missing` is `TRUE`: + + + If a path doesn't exist, the remainder of the path is recursively + created. + + If the matched path prefix points to a JSON null, the remainder of the + path is recursively created, and `value` is inserted. + + If a path token points to a JSON array and the specified index is + _larger_ than the size of the array, pads the JSON array with JSON + nulls, recursively creates the remainder of the path at the specified + index, and inserts the path value pair. ++ This function applies all path value pair set operations even if an + individual path value pair operation is invalid. For invalid operations, + the operation is ignored and the function continues to process the rest + of the path value pairs. ++ If the path exists but has an incompatible type at any given path + token, no update happens for that specific path value pair. ++ If any `json_path` is an invalid [JSONPath][JSONPath-format], an error is + produced. ++ If `json_expr` is SQL `NULL`, the function returns SQL `NULL`. ++ If `json_path` is SQL `NULL`, the `json_path_value_pair` operation is + ignored. ++ If `create_if_missing` is SQL `NULL`, the set operation is ignored. 
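Because several of these rules interact (left-to-right evaluation of pairs, `create_if_missing`, null-padding of arrays), a small executable model can help. The following Python sketch illustrates the documented semantics on parsed JSON values; it is not ZetaSQL's implementation. JSONPath parsing is omitted, so paths are assumed to be pre-split token lists, for example `['a', 0]` for `$.a[0]`, an empty list for `$`, and `None` for a SQL `NULL` path.

```python
def json_set(doc, pairs, create_if_missing=True):
    """Illustrative model of the JSON_SET rules above, not ZetaSQL itself."""
    if doc is None:                 # SQL NULL input returns SQL NULL
        return None
    for tokens, value in pairs:     # pairs are applied left to right
        if tokens is None:          # SQL NULL path: this pair is ignored
            continue
        if not tokens:              # '$' replaces the entire value
            doc = value
            continue
        doc = _set(doc, tokens, value, create_if_missing)
    return doc

def _set(node, tokens, value, create):
    head, rest = tokens[0], tokens[1:]
    if isinstance(head, str):                  # object member
        if node is None and create:            # JSON null: rebuild the path
            node = {}
        if not isinstance(node, dict):         # incompatible type: no update
            return node
        if head not in node:
            if not create:                     # missing path: pair is ignored
                return node
            node[head] = _build(rest, value)
        elif not rest:
            node[head] = value                 # overwrite the existing value
        else:
            node[head] = _set(node[head], rest, value, create)
        return node
    if node is None and create:                # array element
        node = []
    if not isinstance(node, list):
        return node
    if head >= len(node):
        if not create:
            return node
        node.extend([None] * (head + 1 - len(node)))   # pad with JSON nulls
        node[head] = _build(rest, value)
    elif not rest:
        node[head] = value
    else:
        node[head] = _set(node[head], rest, value, create)
    return node

def _build(tokens, value):
    # Recursively create the remainder of a missing path.
    if not tokens:
        return value
    head, rest = tokens[0], tokens[1:]
    if isinstance(head, str):
        return {head: _build(rest, value)}
    arr = [None] * (head + 1)
    arr[head] = _build(rest, value)
    return arr
```

For example, `json_set({'a': 1}, [(['b'], 999)], create_if_missing=False)` returns `{'a': 1}` unchanged, matching the `create_if_missing` rule above, while `json_set({'a': [1]}, [(['a', 3], 9)])` pads the array with nulls and returns `{'a': [1, None, None, 9]}`.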
**Return type** -+ `json_string_expr`: A JSON-formatted `STRING` -+ `json_expr`: `JSON` +`JSON` **Examples** -In the following example, JSON data is extracted and returned as JSON. +In the following example, the path `$` matches the entire `JSON` value +and replaces it with `{"b": 2, "c": 3}`. ```sql -SELECT - JSON_QUERY(JSON '{"class": {"students": [{"id": 5}, {"id": 12}]}}', '$.class') - AS json_data; - -/*-----------------------------------* - | json_data | - +-----------------------------------+ - | {"students":[{"id":5},{"id":12}]} | - *-----------------------------------*/ -``` - -In the following examples, JSON data is extracted and returned as -JSON-formatted strings. - -```sql -SELECT JSON_QUERY(json_text, '$') AS json_text_string -FROM UNNEST([ - '{"class": {"students": [{"name": "Jane"}]}}', - '{"class": {"students": []}}', - '{"class": {"students": [{"name": "John"}, {"name": "Jamie"}]}}' - ]) AS json_text; - -/*-----------------------------------------------------------* - | json_text_string | - +-----------------------------------------------------------+ - | {"class":{"students":[{"name":"Jane"}]}} | - | {"class":{"students":[]}} | - | {"class":{"students":[{"name":"John"},{"name":"Jamie"}]}} | - *-----------------------------------------------------------*/ -``` - -```sql -SELECT JSON_QUERY(json_text, '$.class.students[0]') AS first_student -FROM UNNEST([ - '{"class": {"students": [{"name": "Jane"}]}}', - '{"class": {"students": []}}', - '{"class": {"students": [{"name": "John"}, {"name": "Jamie"}]}}' - ]) AS json_text; - -/*-----------------* - | first_student | - +-----------------+ - | {"name":"Jane"} | - | NULL | - | {"name":"John"} | - *-----------------*/ -``` - -```sql -SELECT JSON_QUERY(json_text, '$.class.students[1].name') AS second_student_name -FROM UNNEST([ - '{"class": {"students": [{"name": "Jane"}]}}', - '{"class": {"students": []}}', - '{"class": {"students": [{"name": "John"}, {"name": null}]}}', - '{"class": {"students": [{"name": 
"John"}, {"name": "Jamie"}]}}' - ]) AS json_text; - -/*----------------* - | second_student | - +----------------+ - | NULL | - | NULL | - | NULL | - | "Jamie" | - *----------------*/ -``` - -```sql -SELECT JSON_QUERY(json_text, '$.class."students"') AS student_names -FROM UNNEST([ - '{"class": {"students": [{"name": "Jane"}]}}', - '{"class": {"students": []}}', - '{"class": {"students": [{"name": "John"}, {"name": "Jamie"}]}}' - ]) AS json_text; - -/*------------------------------------* - | student_names | - +------------------------------------+ - | [{"name":"Jane"}] | - | [] | - | [{"name":"John"},{"name":"Jamie"}] | - *------------------------------------*/ -``` - -```sql -SELECT JSON_QUERY('{"a": null}', "$.a"); -- Returns a SQL NULL -SELECT JSON_QUERY('{"a": null}', "$.b"); -- Returns a SQL NULL -``` - -```sql -SELECT JSON_QUERY(JSON '{"a": null}', "$.a"); -- Returns a JSON 'null' -SELECT JSON_QUERY(JSON '{"a": null}', "$.b"); -- Returns a SQL NULL -``` - -[JSONPath-format]: #JSONPath_format - -[differences-json-and-string]: #differences_json_and_string - -### `JSON_QUERY_ARRAY` - -```sql -JSON_QUERY_ARRAY(json_string_expr[, json_path]) -``` - -```sql -JSON_QUERY_ARRAY(json_expr[, json_path]) -``` - -**Description** - -Extracts a JSON array and converts it to -a SQL `ARRAY` or -`ARRAY` value. -In addition, this function uses double quotes to escape invalid -[JSONPath][JSONPath-format] characters in JSON keys. For example: `"a.b"`. - -Arguments: - -+ `json_string_expr`: A JSON-formatted string. For example: - - ``` - '["a", "b", {"key": "c"}]' - ``` -+ `json_expr`: JSON. For example: - - ``` - JSON '["a", "b", {"key": "c"}]' - ``` -+ `json_path`: The [JSONPath][JSONPath-format]. This identifies the data that - you want to obtain from the input. If this optional parameter is not - provided, then the JSONPath `$` symbol is applied, which means that all of - the data is analyzed. - -There are differences between the JSON-formatted string and JSON input types. 
-For details, see [Differences between the JSON and JSON-formatted STRING types][differences-json-and-string]. - -**Return type** - -+ `json_string_expr`: `ARRAY` -+ `json_expr`: `ARRAY` - -**Examples** - -This extracts items in JSON to an array of `JSON` values: - -```sql -SELECT JSON_QUERY_ARRAY( - JSON '{"fruits": ["apples", "oranges", "grapes"]}', '$.fruits' - ) AS json_array; - -/*---------------------------------* - | json_array | - +---------------------------------+ - | ["apples", "oranges", "grapes"] | - *---------------------------------*/ -``` - -This extracts the items in a JSON-formatted string to a string array: - -```sql -SELECT JSON_QUERY_ARRAY('[1, 2, 3]') AS string_array; - -/*--------------* - | string_array | - +--------------+ - | [1, 2, 3] | - *--------------*/ -``` - -This extracts a string array and converts it to an integer array: - -```sql -SELECT ARRAY( - SELECT CAST(integer_element AS INT64) - FROM UNNEST( - JSON_QUERY_ARRAY('[1, 2, 3]','$') - ) AS integer_element -) AS integer_array; +SELECT JSON_SET(JSON '{"a": 1}', '$', JSON '{"b": 2, "c": 3}') AS json_data /*---------------* - | integer_array | + | json_data | +---------------+ - | [1, 2, 3] | + | {"b":2,"c":3} | *---------------*/ ``` -This extracts string values in a JSON-formatted string to an array: - -```sql --- Doesn't strip the double quotes -SELECT JSON_QUERY_ARRAY('["apples", "oranges", "grapes"]', '$') AS string_array; - -/*---------------------------------* - | string_array | - +---------------------------------+ - | ["apples", "oranges", "grapes"] | - *---------------------------------*/ - --- Strips the double quotes -SELECT ARRAY( - SELECT JSON_VALUE(string_element, '$') - FROM UNNEST(JSON_QUERY_ARRAY('["apples", "oranges", "grapes"]', '$')) AS string_element -) AS string_array; - -/*---------------------------* - | string_array | - +---------------------------+ - | [apples, oranges, grapes] | - *---------------------------*/ -``` - -This extracts only the items in the 
`fruit` property to an array: - -```sql -SELECT JSON_QUERY_ARRAY( - '{"fruit": [{"apples": 5, "oranges": 10}, {"apples": 2, "oranges": 4}], "vegetables": [{"lettuce": 7, "kale": 8}]}', - '$.fruit' -) AS string_array; - -/*-------------------------------------------------------* - | string_array | - +-------------------------------------------------------+ - | [{"apples":5,"oranges":10}, {"apples":2,"oranges":4}] | - *-------------------------------------------------------*/ -``` - -These are equivalent: - -```sql -SELECT JSON_QUERY_ARRAY('{"fruits": ["apples", "oranges", "grapes"]}', '$.fruits') AS string_array; - -SELECT JSON_QUERY_ARRAY('{"fruits": ["apples", "oranges", "grapes"]}', '$."fruits"') AS string_array; - --- The queries above produce the following result: -/*---------------------------------* - | string_array | - +---------------------------------+ - | ["apples", "oranges", "grapes"] | - *---------------------------------*/ -``` - -In cases where a JSON key uses invalid JSONPath characters, you can escape those -characters using double quotes: `" "`. For example: - -```sql -SELECT JSON_QUERY_ARRAY('{"a.b": {"c": ["world"]}}', '$."a.b".c') AS hello; - -/*-----------* - | hello | - +-----------+ - | ["world"] | - *-----------*/ -``` - -The following examples show how invalid requests and empty arrays are handled: +In the following example, `create_if_missing` is `FALSE` and the path `$.b` +doesn't exist, so the set operation is ignored. ```sql --- An error is returned if you provide an invalid JSONPath. -SELECT JSON_QUERY_ARRAY('["foo", "bar", "baz"]', 'INVALID_JSONPath') AS result; - --- If the JSONPath does not refer to an array, then NULL is returned. -SELECT JSON_QUERY_ARRAY('{"a": "foo"}', '$.a') AS result; - -/*--------* - | result | - +--------+ - | NULL | - *--------*/ - --- If a key that does not exist is specified, then the result is NULL. 
-SELECT JSON_QUERY_ARRAY('{"a": "foo"}', '$.b') AS result; - -/*--------* - | result | - +--------+ - | NULL | - *--------*/ - --- Empty arrays in JSON-formatted strings are supported. -SELECT JSON_QUERY_ARRAY('{"a": "foo", "b": []}', '$.b') AS result; - -/*--------* - | result | - +--------+ - | [] | - *--------*/ -``` - -[JSONPath-format]: #JSONPath_format - -[differences-json-and-string]: #differences_json_and_string - -### `JSON_REMOVE` - -```sql -JSON_REMOVE(json_expr, json_path[, ...]) -``` - -Produces a new SQL `JSON` value with the specified JSON data removed. - -Arguments: - -+ `json_expr`: JSON. For example: - - ``` - JSON '{"class": {"students": [{"name": "Jane"}]}}' - ``` -+ `json_path`: Remove data at this [JSONPath][JSONPath-format] in `json_expr`. - -Details: - -+ Paths are evaluated left to right. The JSON produced by evaluating the - first path is the JSON for the next path. -+ The operation ignores non-existent paths and continue processing the rest - of the paths. -+ For each path, the entire matched JSON subtree is deleted. -+ If the path matches a JSON object key, this function deletes the - key-value pair. -+ If the path matches an array element, this function deletes the specific - element from the matched array. -+ If removing the path results in an empty JSON object or empty JSON array, - the empty structure is preserved. -+ If `json_path` is `$` or an invalid [JSONPath][JSONPath-format], an error is - produced. -+ If `json_path` is SQL `NULL`, the path operation is ignored. - -**Return type** - -`JSON` - -**Examples** - -In the following example, the path `$[1]` is matched and removes -`["b", "c"]`. - -```sql -SELECT JSON_REMOVE(JSON '["a", ["b", "c"], "d"]', '$[1]') AS json_data - -/*-----------* - | json_data | - +-----------+ - | ["a","d"] | - *-----------*/ -``` - -You can use the field access operator to pass JSON data into this function. 
-For example: - -```sql -WITH T AS (SELECT JSON '{"a": {"b": 10, "c": 20}}' AS data) -SELECT JSON_REMOVE(data.a, '$.b') AS json_data FROM T - -/*-----------* - | json_data | - +-----------+ - | {"c":20} | - *-----------*/ -``` - -In the following example, the first path `$[1]` is matched and removes -`["b", "c"]`. Then, the second path `$[1]` is matched and removes `"d"`. - -```sql -SELECT JSON_REMOVE(JSON '["a", ["b", "c"], "d"]', '$[1]', '$[1]') AS json_data - -/*-----------* - | json_data | - +-----------+ - | ["a"] | - *-----------*/ -``` - -The structure of an empty array is preserved when all elements are deleted -from it. For example: - -```sql -SELECT JSON_REMOVE(JSON '["a", ["b", "c"], "d"]', '$[1]', '$[1]', '$[0]') AS json_data - -/*-----------* - | json_data | - +-----------+ - | [] | - *-----------*/ -``` - -In the following example, the path `$.a.b.c` is matched and removes the -`"c":"d"` key-value pair from the JSON object. - -```sql -SELECT JSON_REMOVE(JSON '{"a": {"b": {"c": "d"}}}', '$.a.b.c') AS json_data - -/*----------------* - | json_data | - +----------------+ - | {"a":{"b":{}}} | - *----------------*/ -``` - -In the following example, the path `$.a.b` is matched and removes the -`"b": {"c":"d"}` key-value pair from the JSON object. - -```sql -SELECT JSON_REMOVE(JSON '{"a": {"b": {"c": "d"}}}', '$.a.b') AS json_data - -/*-----------* - | json_data | - +-----------+ - | {"a":{}} | - *-----------*/ -``` - -In the following example, the path `$.b` is not valid, so the operation makes -no changes. - -```sql -SELECT JSON_REMOVE(JSON '{"a": 1}', '$.b') AS json_data - -/*-----------* - | json_data | - +-----------+ - | {"a":1} | - *-----------*/ -``` - -In the following example, path `$.a.b` and `$.b` don't exist, so those -operations are ignored, but the others are processed. 
- -```sql -SELECT JSON_REMOVE(JSON '{"a": [1, 2, 3]}', '$.a[0]', '$.a.b', '$.b', '$.a[0]') AS json_data - -/*-----------* - | json_data | - +-----------+ - | {"a":[3]} | - *-----------*/ -``` - -If you pass in `$` as the path, an error is produced. For example: - -```sql --- Error: The JSONPath cannot be '$' -SELECT JSON_REMOVE(JSON '{}', '$') AS json_data -``` - -In the following example, the operation is ignored because you can't remove -data from a JSON null. - -```sql -SELECT JSON_REMOVE(JSON 'null', '$.a.b') AS json_data - -/*-----------* - | json_data | - +-----------+ - | null | - *-----------*/ -``` - -### `JSON_SET` - -```sql -JSON_SET( - json_expr, - json_path_value_pair[, ...] -) +SELECT JSON_SET( + JSON '{"a": 1}', + "$.b", 999, + create_if_missing => false) AS json_data -json_path_value_pair: - json_path, value +/*------------* + | json_data | + +------------+ + | '{"a": 1}' | + *------------*/ ``` -Produces a new SQL `JSON` value with the specified JSON data inserted -or replaced. - -Arguments: - -+ `json_expr`: JSON. For example: - - ``` - JSON '{"class": {"students": [{"name": "Jane"}]}}' - ``` -+ `json_path_value_pair`: A value and the [JSONPath][JSONPath-format] for - that value. This includes: - - + `json_path`: Insert or replace `value` at this [JSONPath][JSONPath-format] - in `json_expr`. - - + `value`: A [JSON encoding-supported][json-encodings] value to - insert. - -Details: - -+ Path value pairs are evaluated left to right. The JSON produced by - evaluating one pair becomes the JSON against which the next pair - is evaluated. -+ If a path doesn't exist, the remainder of the path is recursively created. -+ If a matched path has an existing value, it overwrites the existing data - with `value`. -+ This function applies all path value pair set operations even if an - individual path value pair operation is invalid. For invalid operations, - the operation is ignored and the function continues to process the rest - of the path value pairs. 
-+ If the path exists but has an incompatible type at any given path - token, no update happens for that specific path value pair. -+ If the matched path prefix points to a JSON null, the remainder of the - path is recursively created, and `value` is inserted. -+ If a path token points to a JSON array and the specified - index is _larger_ than the size of the array, pads the JSON array with - JSON nulls, recursively creates the remainder of the path at the specified - index, and inserts the path value pair. -+ If a matched path points to a JSON array and the specified index is - _less than_ the length of the array, replaces the existing JSON array value - at index with `value`. -+ If any `json_path` is an invalid [JSONPath][JSONPath-format], an error is - produced. -+ If `json_expr` is SQL `NULL`, the function returns SQL `NULL`. -+ If `json_path` is SQL `NULL`, the `json_path_value_pair` operation is - ignored. - -**Return type** - -`JSON` - -**Examples** - -In the following example, the path `$` matches the entire `JSON` value -and replaces it with `{"b": 2, "c": 3}`. +In the following example, `create_if_missing` is `TRUE` and the path `$.a` +exists, so the value is replaced. ```sql -SELECT JSON_SET(JSON '{"a": 1}', '$', JSON '{"b": 2, "c": 3}') AS json_data +SELECT JSON_SET( + JSON '{"a": 1}', + "$.a", 999, + create_if_missing => false) AS json_data -/*---------------* - | json_data | - +---------------+ - | {"b":2,"c":3} | - *---------------*/ +/*--------------* + | json_data | + +--------------+ + | '{"a": 999}' | + *--------------*/ ``` In the following example, the path `$.a` is matched, but `$.a.b` does not @@ -23886,8 +21572,8 @@ Details: string - If the JSON string represents a JSON number, parses it as a - BIGNUMERIC value, and then safe casts the result as a + If the JSON string represents a JSON number, parses it as + a BIGNUMERIC value, and then safe casts the result as a DOUBLE value. If the JSON string can't be converted, returns NULL. 
@@ -24109,8 +21795,8 @@ Details: string - If the JSON string represents a JSON number, parses it as a - BIGNUMERIC value, and then safe casts the results as an + If the JSON string represents a JSON number, parses it as + a BIGNUMERIC value, and then safe casts the results as an INT64 value. If the JSON string can't be converted, returns NULL. @@ -24321,13 +22007,13 @@ Details: string - Returns the JSON string as a STRING value. + Returns the JSON string as a STRING value. number - Returns the JSON number as a STRING value. + Returns the JSON number as a STRING value. @@ -25432,9 +23118,26 @@ FROM t; [JSON-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#json_type -## Array functions +## KLL quantile functions -ZetaSQL supports the following array functions. +ZetaSQL supports KLL functions. + +The [KLL16 algorithm][kll-sketches] estimates +quantiles from [sketches][kll-sketches]. If you do not want +to work with sketches and do not need customized precision, consider +using [approximate aggregate functions][approx-functions-reference] +with system-defined precision. + +KLL functions are approximate aggregate functions. +Approximate aggregation requires significantly less memory than an exact +quantiles computation, but also introduces statistical error. +This makes approximate aggregation appropriate for large data streams for +which linear memory usage is impractical, as well as for data that is +already approximate. + +Note: While `APPROX_QUANTILES` is also returning approximate quantile results, +the functions from this section allow for partial aggregations and +re-aggregations. ### Function list @@ -25448,4219 +23151,4446 @@ ZetaSQL supports the following array functions. - ARRAY - - - - Produces an array with one element for each row in a subquery. - - - - - ARRAY_AVG - - - - Gets the average of non-NULL values in an array. - - - - - ARRAY_CONCAT - - - - Concatenates one or more arrays with the same element type into a - single array. 
- - - - - ARRAY_FILTER - - - - Takes an array, filters out unwanted elements, and returns the results - in a new array. - - - - - ARRAY_FIRST - - - - Gets the first element in an array. - - - - - ARRAY_INCLUDES - - - - Checks if there is an element in the array that is - equal to a search value. - - - - - ARRAY_INCLUDES_ALL + KLL_QUANTILES.EXTRACT_INT64 - Checks if all search values are in an array. + Gets a selected number of quantiles from an + INT64-initialized KLL sketch. - - ARRAY_INCLUDES_ANY + KLL_QUANTILES.EXTRACT_UINT64 - Checks if any search values are in an array. + Gets a selected number of quantiles from an + UINT64-initialized KLL sketch. - - ARRAY_IS_DISTINCT + KLL_QUANTILES.EXTRACT_DOUBLE - Checks if an array contains no repeated elements. + Gets a selected number of quantiles from a + DOUBLE-initialized KLL sketch. - ARRAY_LAST + KLL_QUANTILES.EXTRACT_POINT_INT64 - Gets the last element in an array. + Gets a specific quantile from an + INT64-initialized KLL sketch. - - ARRAY_LENGTH + KLL_QUANTILES.EXTRACT_POINT_UINT64 - Gets the number of elements in an array. + Gets a specific quantile from an + UINT64-initialized KLL sketch. - - ARRAY_MAX + KLL_QUANTILES.EXTRACT_POINT_DOUBLE - Gets the maximum non-NULL value in an array. + Gets a specific quantile from a + DOUBLE-initialized KLL sketch. - ARRAY_MIN + KLL_QUANTILES.INIT_INT64 - Gets the minimum non-NULL value in an array. + Aggregates values into an + INT64-initialized KLL sketch. - - ARRAY_REVERSE + KLL_QUANTILES.INIT_UINT64 - Reverses the order of elements in an array. + Aggregates values into an + UINT64-initialized KLL sketch. - - ARRAY_SLICE + KLL_QUANTILES.INIT_DOUBLE - Produces an array containing zero or more consecutive elements from an - input array. + Aggregates values into a + DOUBLE-initialized KLL sketch. - ARRAY_SUM + KLL_QUANTILES.MERGE_INT64 - Gets the sum of non-NULL values in an array. 
+ Merges INT64-initialized KLL sketches into a new sketch, and + then gets the quantiles from the new sketch. - - ARRAY_TO_STRING + KLL_QUANTILES.MERGE_UINT64 - Produces a concatenation of the elements in an array as a - STRING value. + Merges UINT64-initialized KLL sketches into a new sketch, and + then gets the quantiles from the new sketch. - - ARRAY_TRANSFORM + KLL_QUANTILES.MERGE_DOUBLE - Transforms the elements of an array, and returns the results in a new - array. + Merges DOUBLE-initialized KLL sketches + into a new sketch, and then gets the quantiles from the new sketch. - FLATTEN + KLL_QUANTILES.MERGE_PARTIAL - Flattens arrays of nested data to create a single flat array. + Merges KLL sketches of the same underlying type into a new sketch. - GENERATE_ARRAY + KLL_QUANTILES.MERGE_POINT_INT64 - Generates an array of values in a range. + Merges INT64-initialized KLL sketches into a new sketch, and + then gets a specific quantile from the new sketch. - - GENERATE_DATE_ARRAY + KLL_QUANTILES.MERGE_POINT_UINT64 - Generates an array of dates in a range. + Merges UINT64-initialized KLL sketches into a new sketch, and + then gets a specific quantile from the new sketch. - - GENERATE_TIMESTAMP_ARRAY + KLL_QUANTILES.MERGE_POINT_DOUBLE - Generates an array of timestamps in a range. + Merges DOUBLE-initialized KLL sketches + into a new sketch, and then gets a specific quantile from the new sketch. -### `ARRAY` +### `KLL_QUANTILES.EXTRACT_INT64` ```sql -ARRAY(subquery) +KLL_QUANTILES.EXTRACT_INT64(sketch, number_of_segments) ``` **Description** -The `ARRAY` function returns an `ARRAY` with one element for each row in a -[subquery][subqueries]. +Gets a selected number of approximate quantiles from an +`INT64`-initialized KLL sketch, including the minimum value and the +maximum value in the input set. -If `subquery` produces a -[SQL table][datamodel-sql-tables], -the table must have exactly one column. 
Each element in the output `ARRAY` is -the value of the single column of a row in the table. +**Definitions** -If `subquery` produces a -[value table][datamodel-value-tables], -then each element in the output `ARRAY` is the entire corresponding row of the -value table. ++ `sketch`: `BYTES` KLL sketch initialized on the `INT64` data type. + If this is not a valid KLL quantiles sketch or if the underlying type + of the sketch can't be coerced into the `INT64` type, an error is produced. ++ `number_of_segments`: A positive `INT64` value that represents the number of + roughly equal-sized subsets that the quantiles partition the sketch-captured + input values into. -**Constraints** +**Details** -+ Subqueries are unordered, so the elements of the output `ARRAY` are not -guaranteed to preserve any order in the source table for the subquery. However, -if the subquery includes an `ORDER BY` clause, the `ARRAY` function will return -an `ARRAY` that honors that clause. -+ If the subquery returns more than one column, the `ARRAY` function returns an -error. -+ If the subquery returns an `ARRAY` typed column or `ARRAY` typed rows, the - `ARRAY` function returns an error that ZetaSQL does not support - `ARRAY`s with elements of type - [`ARRAY`][array-data-type]. -+ If the subquery returns zero rows, the `ARRAY` function returns an empty -`ARRAY`. It never returns a `NULL` `ARRAY`. +The number of returned values produced is always `number_of_segments + 1` as +an array in this order: -**Return type** ++ minimum value in input set ++ each approximate quantile ++ maximum value in input set -`ARRAY` +For example, if `number_of_segments` is `3`, and the result of this function +is `[0, 34, 67, 100]`, this means that `0` is the minimum value, +`34` and `67` are the approximate quantiles, and `100` is the maximum value. +In addition, the result represents the following three segments: +`0 to 34`, `34 to 67`, and `67 to 100`. 
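The segment behavior described above can be modeled with an exact computation over raw values. The following Python sketch is only a stand-in for intuition: the real function reads a compressed KLL16 sketch and returns approximate quantiles, whereas this helper (the `extract_quantiles` name is hypothetical) computes exact boundaries. It shows the output shape: `number_of_segments + 1` values, from the minimum of the input set to its maximum.

```python
def extract_quantiles(values, number_of_segments):
    # Exact stand-in for the sketch-based extraction: returns
    # number_of_segments + 1 boundaries, starting at the minimum
    # and ending at the maximum of the input set.
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[round(i * (n - 1) / number_of_segments)]
            for i in range(number_of_segments + 1)]
```

With the inputs `0` through `100` and `number_of_segments = 4`, this returns `[0, 25, 50, 75, 100]`: the minimum, the three exact quartile boundaries, and the maximum, which is the same shape as the approximate results in the example query below.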
-**Examples** +Note: This scalar function is similar to the aggregate function +`KLL_QUANTILES.MERGE_INT64`. -```sql -SELECT ARRAY - (SELECT 1 UNION ALL - SELECT 2 UNION ALL - SELECT 3) AS new_array; +**Return Type** -/*-----------* - | new_array | - +-----------+ - | [1, 2, 3] | - *-----------*/ -``` +`ARRAY` -To construct an `ARRAY` from a subquery that contains multiple -columns, change the subquery to use `SELECT AS STRUCT`. Now -the `ARRAY` function will return an `ARRAY` of `STRUCT`s. The `ARRAY` will -contain one `STRUCT` for each row in the subquery, and each of these `STRUCT`s -will contain a field for each column in that row. +**Example** + +The following query initializes a KLL sketch, `kll_sketch`, from `Data`, and +then extracts the minimum value (`0`), the maximum value (`100`), and +approximate quantiles in between. ```sql +WITH Data AS ( + SELECT x FROM UNNEST(GENERATE_ARRAY(1, 100)) AS x +) SELECT - ARRAY - (SELECT AS STRUCT 1, 2, 3 - UNION ALL SELECT AS STRUCT 4, 5, 6) AS new_array; + KLL_QUANTILES.EXTRACT_INT64(kll_sketch, 2) AS median, + KLL_QUANTILES.EXTRACT_INT64(kll_sketch, 3) AS terciles, + KLL_QUANTILES.EXTRACT_INT64(kll_sketch, 4) AS quartiles, + KLL_QUANTILES.EXTRACT_INT64(kll_sketch, 6) AS sextiles, +FROM (SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch FROM Data); -/*------------------------* - | new_array | - +------------------------+ - | [{1, 2, 3}, {4, 5, 6}] | - *------------------------*/ +/*------------+---------------+------------------+------------------------* + | median | terciles | quartiles | sextiles | + +------------+---------------+------------------+------------------------+ + | [0,50,100] | [0,34,67,100] | [0,25,50,75,100] | [0,17,34,50,67,84,100] | + *------------+---------------+------------------+------------------------*/ ``` -Similarly, to construct an `ARRAY` from a subquery that contains -one or more `ARRAY`s, change the subquery to use `SELECT AS STRUCT`. 
+### `KLL_QUANTILES.EXTRACT_UINT64` ```sql -SELECT ARRAY - (SELECT AS STRUCT [1, 2, 3] UNION ALL - SELECT AS STRUCT [4, 5, 6]) AS new_array; - -/*----------------------------* - | new_array | - +----------------------------+ - | [{[1, 2, 3]}, {[4, 5, 6]}] | - *----------------------------*/ +KLL_QUANTILES.EXTRACT_UINT64(sketch, number_of_segments) ``` -[subqueries]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#subqueries +**Description** -[datamodel-sql-tables]: https://github.com/google/zetasql/blob/master/docs/data-model.md#standard_sql_tables +Like [`KLL_QUANTILES.EXTRACT_INT64`](#kll-quantilesextract-int64), +but accepts KLL sketches initialized on data of type `UINT64`. -[datamodel-value-tables]: https://github.com/google/zetasql/blob/master/docs/data-model.md#value_tables +**Return Type** -[array-data-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#array_type +`ARRAY` -### `ARRAY_AVG` +### `KLL_QUANTILES.EXTRACT_DOUBLE` ```sql -ARRAY_AVG(input_array) +KLL_QUANTILES.EXTRACT_DOUBLE(sketch, number_of_segments) ``` **Description** -Returns the average of non-`NULL` values in an array. +Like [`KLL_QUANTILES.EXTRACT_INT64`](#kll-quantilesextract-int64), +but accepts KLL sketches initialized on data of type +`DOUBLE`. -Caveats: +**Return Type** -+ If the array is `NULL`, empty, or contains only `NULL`s, returns - `NULL`. -+ If the array contains `NaN`, returns `NaN`. -+ If the array contains `[+|-]Infinity`, returns either `[+|-]Infinity` - or `NaN`. -+ If there is numeric overflow, produces an error. -+ If a [floating-point type][floating-point-types] is returned, the result is - [non-deterministic][non-deterministic], which means you might receive a - different result each time you use this function. 
+`ARRAY` -[floating-point-types]: https://github.com/google/zetasql/blob/master/docs/data-types.md#floating_point_types +### `KLL_QUANTILES.EXTRACT_POINT_INT64` -[non-deterministic]: https://github.com/google/zetasql/blob/master/docs/data-types.md#floating-point-semantics +```sql +KLL_QUANTILES.EXTRACT_POINT_INT64(sketch, phi) +``` -**Supported Argument Types** +**Description** -In the input array, `ARRAY`, `T` can represent one of the following -data types: +Takes a single KLL sketch as `BYTES` and returns a single quantile. +The `phi` argument specifies the quantile to return as a fraction of the total +number of rows in the input, normalized between 0 and 1. This means that the +function will return a value *v* such that approximately Φ * *n* inputs are less +than or equal to *v*, and approximately (1-Φ) * *n* inputs are greater than or +equal to *v*. +This is a scalar function. -+ Any numeric input type -+ `INTERVAL` +Returns an error if the underlying type of the input sketch is not compatible +with type `INT64`. -**Return type** +Returns an error if the input is not a valid KLL quantiles sketch. -The return type depends upon `T` in the input array: +**Supported Argument Types** - ++ `sketch`: `BYTES` KLL sketch initialized on `INT64` data type ++ `phi`: `DOUBLE` between 0 and 1 - - - - - - - - +**Return Type** -
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLEINTERVAL
OUTPUTDOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLEINTERVAL
+`INT64` -**Examples** +**Example** + +The following query initializes a KLL sketch from five rows of data. Then +it returns the value of the eighth decile or 80th percentile of the sketch. ```sql -SELECT ARRAY_AVG([0, 2, NULL, 4, 4, 5]) as avg +SELECT KLL_QUANTILES.EXTRACT_POINT_INT64(kll_sketch, .8) AS quintile +FROM (SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch + FROM (SELECT 1 AS x UNION ALL + SELECT 2 AS x UNION ALL + SELECT 3 AS x UNION ALL + SELECT 4 AS x UNION ALL + SELECT 5 AS x)); -/*-----* - | avg | - +-----+ - | 3 | - *-----*/ +/*----------* + | quintile | + +----------+ + | 4 | + *----------*/ ``` -### `ARRAY_CONCAT` +### `KLL_QUANTILES.EXTRACT_POINT_UINT64` ```sql -ARRAY_CONCAT(array_expression[, ...]) +KLL_QUANTILES.EXTRACT_POINT_UINT64(sketch, phi) ``` **Description** -Concatenates one or more arrays with the same element type into a single array. - -The function returns `NULL` if any input argument is `NULL`. - -Note: You can also use the [|| concatenation operator][array-link-to-operators] -to concatenate arrays. - -**Return type** - -`ARRAY` +Like [`KLL_QUANTILES.EXTRACT_POINT_INT64`](#kll-quantilesextract-point-int64), +but accepts KLL sketches initialized on data of type `UINT64`. 
-**Examples** +**Supported Argument Types** -```sql -SELECT ARRAY_CONCAT([1, 2], [3, 4], [5, 6]) as count_to_six; ++ `sketch`: `BYTES` KLL sketch initialized on `UINT64` data type ++ `phi`: `DOUBLE` between 0 and 1 -/*--------------------------------------------------* - | count_to_six | - +--------------------------------------------------+ - | [1, 2, 3, 4, 5, 6] | - *--------------------------------------------------*/ -``` +**Return Type** -[array-link-to-operators]: #operators +`UINT64` -### `ARRAY_FILTER` +### `KLL_QUANTILES.EXTRACT_POINT_DOUBLE` ```sql -ARRAY_FILTER(array_expression, lambda_expression) - -lambda_expression: - { - element_alias -> boolean_expression - | (element_alias, index_alias) -> boolean_expression - } +KLL_QUANTILES.EXTRACT_POINT_DOUBLE(sketch, phi) ``` **Description** -Takes an array, filters out unwanted elements, and returns the results in a new -array. +Like [`KLL_QUANTILES.EXTRACT_POINT_INT64`](#kll-quantilesextract-point-int64), +but accepts KLL sketches initialized on data of type +`DOUBLE`. -+ `array_expression`: The array to filter. -+ `lambda_expression`: Each element in `array_expression` is evaluated against - the [lambda expression][lambda-definition]. If the expression evaluates to - `FALSE` or `NULL`, the element is removed from the resulting array. -+ `element_alias`: An alias that represents an array element. -+ `index_alias`: An alias that represents the zero-based offset of the array - element. -+ `boolean_expression`: The predicate used to filter the array elements. +**Supported Argument Types** -Returns `NULL` if the `array_expression` is `NULL`. 
++ `sketch`: `BYTES` KLL sketch initialized on + `DOUBLE` data type ++ `phi`: `DOUBLE` between 0 and 1 -**Return type** +**Return Type** -ARRAY +`DOUBLE` -**Example** +### `KLL_QUANTILES.INIT_INT64` ```sql -SELECT -  ARRAY_FILTER([1 ,2, 3], e -> e > 1) AS a1, -  ARRAY_FILTER([0, 2, 3], (e, i) -> e > i) AS a2; +KLL_QUANTILES.INIT_INT64(input[, precision[, weight => input_weight]]) ``` -/*-------+-------* - | a1    | a2    | - +-------+-------+ - | [2,3] | [2,3] | - *-------+-------*/ -``` -[lambda-definition]: https://github.com/google/zetasql/blob/master/docs/functions-reference.md#lambdas -### `ARRAY_FIRST` **Description** -```sql -ARRAY_FIRST(array_expression) -``` Takes one or more `input` values and aggregates them into a +[KLL][kll-sketches] sketch. This function represents the output sketch +using the `BYTES` data type. This is an +aggregate function. -**Description** The `precision` argument defines the exactness of the returned approximate +quantile *q*. By default, the rank of the approximate quantile in the input can +be at most ±1/1000 * *n* off from ⌈Φ * *n*⌉, where *n* is the number of rows in +the input and ⌈Φ * *n*⌉ is the rank of the exact quantile. If you provide a +value for `precision`, the rank of the approximate quantile in the input can be +at most ±1/`precision` * *n* off from the rank of the exact quantile. The error +is within this error bound in 99.999% of cases. This error guarantee only +applies to the difference between exact and approximate ranks: the numerical +difference between the exact and approximated value for a quantile can be +arbitrarily large. -Takes an array and returns the first element in the array. +By default, values in an initialized KLL sketch are weighted equally as `1`. +If you would like to weight values differently, use the +mandatory-named argument, `weight`, which assigns weight to each input in the +resulting KLL sketch. `weight` is a multiplier.
For example, if you assign a +weight of `3` to an input value, it's as if three instances of the input value +are included in the generation of the KLL sketch. The maximum value for +`weight` is `2,147,483,647`. -Produces an error if the array is empty. +**Supported Argument Types** -Returns `NULL` if `array_expression` is `NULL`. ++ `input`: `INT64` ++ `precision`: `INT64` ++ `input_weight`: `INT64` -Note: To get the last element in an array, see [`ARRAY_LAST`][array-last]. +**Return Type** -**Return type** +KLL sketch as `BYTES` -Matches the data type of elements in `array_expression`. +**Examples** -**Example** +The following query takes a column of type `INT64` and outputs a sketch as +`BYTES` that allows you to retrieve values whose ranks are within +±1/1000 * 5 = ±1/200 ≈ 0 ranks of their exact quantile. ```sql -SELECT ARRAY_FIRST(['a','b','c','d']) as first_element +SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch +FROM (SELECT 1 AS x UNION ALL + SELECT 2 AS x UNION ALL + SELECT 3 AS x UNION ALL + SELECT 4 AS x UNION ALL + SELECT 5 AS x); -/*---------------* - | first_element | - +---------------+ - | a | - *---------------*/ +/*----------------------------------------------------------------------* + | kll_sketch | + +----------------------------------------------------------------------+ + | "\010q\020\005 \004\212\007\025\010\200 | + | \020\350\007\032\001\001\"\001\005*\007\n\005\001\002\003\004\005" | + *----------------------------------------------------------------------*/ ``` -[array-last]: #array_last - -### `ARRAY_INCLUDES` +The following examples illustrate how weight works when you initialize a +KLL sketch. The results are converted to quantiles. 
-+ [Signature 1](#array_includes_signature1): - `ARRAY_INCLUDES(array_to_search, search_value)` -+ [Signature 2](#array_includes_signature2): - `ARRAY_INCLUDES(array_to_search, lambda_expression)` +```sql +WITH points AS ( + SELECT 1 AS x, 1 AS y UNION ALL + SELECT 2 AS x, 1 AS y UNION ALL + SELECT 3 AS x, 1 AS y UNION ALL + SELECT 4 AS x, 1 AS y UNION ALL + SELECT 5 AS x, 1 AS y) +SELECT KLL_QUANTILES.EXTRACT_INT64(kll_sketch, 2) AS median +FROM + ( + SELECT KLL_QUANTILES.INIT_INT64(x, 1000, weight=>y) AS kll_sketch + FROM points + ); -#### Signature 1 - +/*---------* + | median | + +---------+ + | [1,3,5] | + *---------*/ +``` ```sql -ARRAY_INCLUDES(array_to_search, search_value) -``` +WITH points AS ( + SELECT 1 AS x, 1 AS y UNION ALL + SELECT 2 AS x, 3 AS y UNION ALL + SELECT 3 AS x, 1 AS y UNION ALL + SELECT 4 AS x, 1 AS y UNION ALL + SELECT 5 AS x, 1 AS y) +SELECT KLL_QUANTILES.EXTRACT_INT64(kll_sketch, 2) AS median +FROM + ( + SELECT KLL_QUANTILES.INIT_INT64(x, 1000, weight=>y) AS kll_sketch + FROM points + ); -**Description** - -Takes an array and returns `TRUE` if there is an element in the array that is -equal to the search_value. +/*---------* + | median | + +---------+ + | [1,2,5] | + *---------*/ +``` -+ `array_to_search`: The array to search. -+ `search_value`: The element to search for in the array. +### `KLL_QUANTILES.INIT_UINT64` -Returns `NULL` if `array_to_search` or `search_value` is `NULL`. +```sql +KLL_QUANTILES.INIT_UINT64(input[, precision[, weight => input_weight]]) +``` -**Return type** +**Description** -`BOOL` +Like [`KLL_QUANTILES.INIT_INT64`](#kll-quantilesinit-int64), +but accepts `input` of type `UINT64`. -**Example** +**Supported Argument Types** -In the following example, the query first checks to see if `0` exists in an -array. Then the query checks to see if `1` exists in an array. 
++ `input`: `UINT64` ++ `precision`: `INT64` ++ `input_weight`: `INT64` -```sql -SELECT - ARRAY_INCLUDES([1, 2, 3], 0) AS a1, - ARRAY_INCLUDES([1, 2, 3], 1) AS a2; +**Return Type** -/*-------+------* - | a1 | a2 | - +-------+------+ - | false | true | - *-------+------*/ -``` +KLL sketch as `BYTES` -#### Signature 2 - +### `KLL_QUANTILES.INIT_DOUBLE` ```sql -ARRAY_INCLUDES(array_to_search, lambda_expression) - -lambda_expression: element_alias -> boolean_expression +KLL_QUANTILES.INIT_DOUBLE(input[, precision[, weight => input_weight]]) ``` **Description** -Takes an array and returns `TRUE` if the lambda expression evaluates to `TRUE` -for any element in the array. - -+ `array_to_search`: The array to search. -+ `lambda_expression`: Each element in `array_to_search` is evaluated against - the [lambda expression][lambda-definition]. -+ `element_alias`: An alias that represents an array element. -+ `boolean_expression`: The predicate used to evaluate the array elements. - -Returns `NULL` if `array_to_search` is `NULL`. +Like [`KLL_QUANTILES.INIT_INT64`](#kll-quantilesinit-int64), +but accepts `input` of type `DOUBLE`. -**Return type** +`KLL_QUANTILES.INIT_DOUBLE` orders values according to the ZetaSQL +[floating point sort order][sort-order]. For example, `NaN` orders before +‑inf. -`BOOL` +**Supported Argument Types** -**Example** ++ `input`: `DOUBLE` ++ `precision`: `INT64` ++ `input_weight`: `INT64` -In the following example, the query first checks to see if any elements that are -greater than 3 exist in an array (`e > 3`). Then the query checks to see if any -any elements that are greater than 0 exist in an array (`e > 0`). 
+**Return Type** ```sql -SELECT -  ARRAY_INCLUDES([1, 2, 3], e -> e > 3) AS a1, -  ARRAY_INCLUDES([1, 2, 3], e -> e > 0) AS a2; +KLL sketch as `BYTES` -/*-------+------* - | a1    | a2   | - +-------+------+ - | false | true | - *-------+------*/ -``` [kll-sketches]: https://github.com/google/zetasql/blob/master/docs/sketches.md#sketches_kll -[lambda-definition]: https://github.com/google/zetasql/blob/master/docs/functions-reference.md#lambdas +[sort-order]: https://github.com/google/zetasql/blob/master/docs/data-types.md#comparison_operator_examples -### `ARRAY_INCLUDES_ALL` +### `KLL_QUANTILES.MERGE_INT64` ```sql -ARRAY_INCLUDES_ALL(array_to_search, search_values) +KLL_QUANTILES.MERGE_INT64(sketch, number) ``` **Description** -Takes an array to search and an array of search values. Returns `TRUE` if all -search values are in the array to search, otherwise returns `FALSE`. +Takes KLL sketches as `BYTES` and merges them into +a new sketch, then returns the quantiles that divide the input into +`number` equal-sized groups, along with the minimum and maximum values of the +input. The output is an `ARRAY` containing the exact minimum value from +the input data that you used to initialize the sketches, each +approximate quantile, and the exact maximum value from the initial input data. +This is an aggregate function. -+ `array_to_search`: The array to search. -+ `search_values`: The array that contains the elements to search for. +If the merged sketches were initialized with different precisions, the precision +is downgraded to the lowest precision involved in the merge, unless the +aggregations are small enough to still capture the input exactly, in which case +the mergee's precision is maintained. -Returns `NULL` if `array_to_search` or `search_values` is -`NULL`. +Returns an error if the underlying type of one or more input sketches is not +compatible with type `INT64`. -**Return type** +Returns an error if the input is not a valid KLL quantiles sketch.
-`BOOL` +**Supported Argument Types** + ++ `sketch`: `BYTES` KLL sketch initialized on `INT64` data type ++ `number`: `INT64` + +**Return Type** + +`ARRAY` **Example** -In the following example, the query first checks to see if `3`, `4`, and `5` -exists in an array. Then the query checks to see if `4`, `5`, and `6` exists in -an array. +The following query initializes two KLL sketches from five rows of data each. +Then it merges these two sketches and returns an `ARRAY` containing the minimum, +median, and maximum values in the input sketches. ```sql -SELECT - ARRAY_INCLUDES_ALL([1,2,3,4,5], [3,4,5]) AS a1, - ARRAY_INCLUDES_ALL([1,2,3,4,5], [4,5,6]) AS a2; +SELECT KLL_QUANTILES.MERGE_INT64(kll_sketch, 2) AS merged_sketch +FROM (SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch + FROM (SELECT 1 AS x UNION ALL + SELECT 2 AS x UNION ALL + SELECT 3 AS x UNION ALL + SELECT 4 AS x UNION ALL + SELECT 5) + UNION ALL + SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch + FROM (SELECT 6 AS x UNION ALL + SELECT 7 AS x UNION ALL + SELECT 8 AS x UNION ALL + SELECT 9 AS x UNION ALL + SELECT 10 AS x)); -/*------+-------* - | a1 | a2 | - +------+-------+ - | true | false | - *------+-------*/ +/*---------------* + | merged_sketch | + +---------------+ + | [1,5,10] | + *---------------*/ ``` -### `ARRAY_INCLUDES_ANY` +### `KLL_QUANTILES.MERGE_UINT64` ```sql -ARRAY_INCLUDES_ANY(array_to_search, search_values) +KLL_QUANTILES.MERGE_UINT64(sketch, number) ``` **Description** -Takes an array to search and an array of search values. Returns `TRUE` if any -search values are in the array to search, otherwise returns `FALSE`. - -+ `array_to_search`: The array to search. -+ `search_values`: The array that contains the elements to search for. - -Returns `NULL` if `array_to_search` or `search_values` is -`NULL`. - -**Return type** - -`BOOL` +Like [`KLL_QUANTILES.MERGE_INT64`](#kll-quantilesmerge-int64), +but accepts KLL sketches initialized on data of type `UINT64`. 
-**Example** +**Supported Argument Types** -In the following example, the query first checks to see if `3`, `4`, or `5` -exists in an array. Then the query checks to see if `4`, `5`, or `6` exists in -an array. ++ `sketch`: `BYTES` KLL sketch initialized on `UINT64` data type ++ `number`: `INT64` -```sql -SELECT - ARRAY_INCLUDES_ANY([1,2,3], [3,4,5]) AS a1, - ARRAY_INCLUDES_ANY([1,2,3], [4,5,6]) AS a2; +**Return Type** -/*------+-------* - | a1 | a2 | - +------+-------+ - | true | false | - *------+-------*/ -``` +`ARRAY` -### `ARRAY_IS_DISTINCT` +### `KLL_QUANTILES.MERGE_DOUBLE` ```sql -ARRAY_IS_DISTINCT(value) +KLL_QUANTILES.MERGE_DOUBLE(sketch, number) ``` **Description** -Returns `TRUE` if the array contains no repeated elements, using the same -equality comparison logic as `SELECT DISTINCT`. +Like [`KLL_QUANTILES.MERGE_INT64`](#kll-quantilesmerge-int64), +but accepts KLL sketches initialized on data of type +`DOUBLE`. -**Return type** +`KLL_QUANTILES.MERGE_DOUBLE` orders values according to the ZetaSQL +[floating point sort order][sort-order]. For example, `NaN` orders before +‑inf. 
-`BOOL` +**Supported Argument Types** -**Examples** ++ `sketch`: `BYTES` KLL sketch initialized on +  `DOUBLE` data type ++ `number`: `INT64` -```sql -WITH example AS ( -  SELECT [1, 2, 3] AS arr UNION ALL -  SELECT [1, 1, 1] AS arr UNION ALL -  SELECT [1, 2, NULL] AS arr UNION ALL -  SELECT [1, 1, NULL] AS arr UNION ALL -  SELECT [1, NULL, NULL] AS arr UNION ALL -  SELECT [] AS arr UNION ALL -  SELECT CAST(NULL AS ARRAY) AS arr -) -SELECT -  arr, -  ARRAY_IS_DISTINCT(arr) as is_distinct -FROM example; +**Return Type** -/*-----------------+-------------* - | arr             | is_distinct | - +-----------------+-------------+ - | [1, 2, 3]       | TRUE        | - | [1, 1, 1]       | FALSE       | - | [1, 2, NULL]    | TRUE        | - | [1, 1, NULL]    | FALSE       | - | [1, NULL, NULL] | FALSE       | - | []              | TRUE        | - | NULL            | NULL        | - *-----------------+-------------*/ -``` +`ARRAY` -### `ARRAY_LAST` [sort-order]: https://github.com/google/zetasql/blob/master/docs/data-types.md#comparison_operator_examples + +### `KLL_QUANTILES.MERGE_PARTIAL` ```sql -ARRAY_LAST(array_expression) +KLL_QUANTILES.MERGE_PARTIAL(sketch) ``` **Description** -Takes an array and returns the last element in the array. - -Produces an error if the array is empty. - -Returns `NULL` if `array_expression` is `NULL`. - -Note: To get the first element in an array, see [`ARRAY_FIRST`][array-first]. - -**Return type** - -Matches the data type of elements in `array_expression`. +Takes KLL sketches of the same underlying type and merges them to return a new +sketch of the same underlying type. This is an aggregate function. -**Example** +If the merged sketches were initialized with different precisions, the precision +is downgraded to the lowest precision involved in the merge, unless the +aggregations are small enough to still capture the input exactly, in which case +the mergee's precision is maintained.
-```sql -SELECT ARRAY_LAST(['a','b','c','d']) as last_element +Returns an error if two or more sketches don't have compatible underlying types, +such as one sketch of `INT64` values and another of +`DOUBLE` values. -/*---------------* - | last_element | - +---------------+ - | d | - *---------------*/ -``` +Returns an error if one or more inputs are not a valid KLL quantiles sketch. -[array-first]: #array_first +Ignores `NULL` sketches. If the input contains zero rows or only `NULL` +sketches, the function returns `NULL`. -### `ARRAY_LENGTH` +You can initialize sketches with different optional clauses and merge them. For +example, you can initialize a sketch with the `DISTINCT` clause and another +sketch without any optional clauses, and then merge these two sketches. +However, if you initialize sketches with the `DISTINCT` clause and merge them, +the resulting sketch may still contain duplicates. -```sql -ARRAY_LENGTH(array_expression) -``` +**Supported Argument Types** -**Description** ++ `sketch`: `BYTES` KLL sketch -Returns the size of the array. Returns 0 for an empty array. Returns `NULL` if -the `array_expression` is `NULL`. +**Return Type** -**Return type** +KLL sketch as `BYTES` -`INT64` +**Example** -**Examples** +The following query initializes two KLL sketches from five rows of data each. +Then it merges these two sketches into a new sketch, also as `BYTES`. Both +input sketches have the same underlying data type and precision. 
```sql -WITH items AS -  (SELECT ["coffee", NULL, "milk" ] as list -  UNION ALL -  SELECT ["cake", "pie"] as list) -SELECT ARRAY_TO_STRING(list, ', ', 'NULL'), ARRAY_LENGTH(list) AS size -FROM items -ORDER BY size DESC; +SELECT KLL_QUANTILES.MERGE_PARTIAL(kll_sketch) AS merged_sketch +FROM (SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch +      FROM (SELECT 1 AS x UNION ALL +            SELECT 2 AS x UNION ALL +            SELECT 3 AS x UNION ALL +            SELECT 4 AS x UNION ALL +            SELECT 5) +      UNION ALL +      SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch +      FROM (SELECT 6 AS x UNION ALL +            SELECT 7 AS x UNION ALL +            SELECT 8 AS x UNION ALL +            SELECT 9 AS x UNION ALL +            SELECT 10 AS x)); -/*--------------------+------* - | list               | size | - +--------------------+------+ - | coffee, NULL, milk | 3    | - | cake, pie          | 2    | - *--------------------+------*/ +/*-----------------------------------------------------------------------------* + | merged_sketch                                                               | + +-----------------------------------------------------------------------------+ + | "\010q\020\n \004\212\007\032\010\200 \020\350\007\032\001\001\"\001\n*     | + | \014\n\n\001\002\003\004\005\006\007\010\t\n"                               | + *-----------------------------------------------------------------------------*/ ``` -### `ARRAY_MAX` +### `KLL_QUANTILES.MERGE_POINT_INT64` ```sql -ARRAY_MAX(input_array) +KLL_QUANTILES.MERGE_POINT_INT64(sketch, phi) ``` **Description** -Returns the maximum non-`NULL` value in an array. +Takes KLL sketches as `BYTES` and merges them, then extracts a single +quantile from the merged sketch. The `phi` argument specifies the quantile +to return as a fraction of the total number of rows in the input, normalized +between 0 and 1. This means that the function will return a value *v* such that +approximately Φ * *n* inputs are less than or equal to *v*, and approximately +(1-Φ) * *n* inputs are greater than or equal to *v*. This is an aggregate function.
-Caveats: +If the merged sketches were initialized with different precisions, the precision +is downgraded to the lowest precision involved in the merge, unless the +aggregations are small enough to still capture the input exactly, in which case +the mergee's precision is maintained. -+ If the array is `NULL`, empty, or contains only `NULL`s, returns -  `NULL`. -+ If the array contains `NaN`, returns `NaN`. +Returns an error if the underlying type of one or more input sketches is not +compatible with type `INT64`. + +Returns an error if the input is not a valid KLL quantiles sketch. **Supported Argument Types** -In the input array, `ARRAY`, `T` can be an -[orderable data type][data-type-properties]. ++ `sketch`: `BYTES` KLL sketch initialized on `INT64` data type ++ `phi`: `DOUBLE` between 0 and 1 -**Return type** +**Return Type** -The same data type as `T` in the input array. +`INT64` -**Examples** +**Example** + +The following query initializes two KLL sketches from five rows of data each. +Then it merges these two sketches and returns the value of the ninth decile or +90th percentile of the merged sketch.
```sql -SELECT ARRAY_MAX([8, 37, NULL, 55, 4]) as max +SELECT KLL_QUANTILES.MERGE_POINT_INT64(kll_sketch, .9) AS merged_sketch +FROM (SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch + FROM (SELECT 1 AS x UNION ALL + SELECT 2 AS x UNION ALL + SELECT 3 AS x UNION ALL + SELECT 4 AS x UNION ALL + SELECT 5) + UNION ALL + SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch + FROM (SELECT 6 AS x UNION ALL + SELECT 7 AS x UNION ALL + SELECT 8 AS x UNION ALL + SELECT 9 AS x UNION ALL + SELECT 10 AS x)); -/*-----* - | max | - +-----+ - | 55 | - *-----*/ +/*---------------* + | merged_sketch | + +---------------+ + | 9 | + *---------------*/ ``` -[data-type-properties]: https://github.com/google/zetasql/blob/master/docs/data-types.md#data_type_properties - -### `ARRAY_MIN` +### `KLL_QUANTILES.MERGE_POINT_UINT64` ```sql -ARRAY_MIN(input_array) +KLL_QUANTILES.MERGE_POINT_UINT64(sketch, phi) ``` **Description** -Returns the minimum non-`NULL` value in an array. - -Caveats: - -+ If the array is `NULL`, empty, or contains only `NULL`s, returns - `NULL`. -+ If the array contains `NaN`, returns `NaN`. +Like [`KLL_QUANTILES.MERGE_POINT_INT64`](#kll-quantilesmerge-point-int64), +but accepts KLL sketches initialized on data of type `UINT64`. **Supported Argument Types** -In the input array, `ARRAY`, `T` can be an -[orderable data type][data-type-properties]. - -**Return type** - -The same data type as `T` in the input array. 
- -**Examples** - -```sql -SELECT ARRAY_MIN([8, 37, NULL, 4, 55]) as min ++ `sketch`: `BYTES` KLL sketch initialized on `UINT64` data type ++ `phi`: `DOUBLE` between 0 and 1 -/*-----* - | min | - +-----+ - | 4 | - *-----*/ -``` +**Return Type** -[data-type-properties]: https://github.com/google/zetasql/blob/master/docs/data-types.md#data_type_properties +`UINT64` -### `ARRAY_REVERSE` +### `KLL_QUANTILES.MERGE_POINT_DOUBLE` ```sql -ARRAY_REVERSE(value) +KLL_QUANTILES.MERGE_POINT_DOUBLE(sketch, phi) ``` **Description** -Returns the input `ARRAY` with elements in reverse order. +Like [`KLL_QUANTILES.MERGE_POINT_INT64`](#kll-quantilesmerge-point-int64), +but accepts KLL sketches initialized on data of type +`DOUBLE`. -**Return type** +`KLL_QUANTILES.MERGE_POINT_DOUBLE` orders values according to the +ZetaSQL [floating point sort order][sort-order]. For example, `NaN` +orders before ‑inf. -`ARRAY` +**Supported Argument Types** -**Examples** ++ `sketch`: `BYTES` KLL sketch initialized on + `DOUBLE` data type ++ `phi`: `DOUBLE` between 0 and 1 -```sql -WITH example AS ( - SELECT [1, 2, 3] AS arr UNION ALL - SELECT [4, 5] AS arr UNION ALL - SELECT [] AS arr -) -SELECT - arr, - ARRAY_REVERSE(arr) AS reverse_arr -FROM example; +**Return Type** -/*-----------+-------------* - | arr | reverse_arr | - +-----------+-------------+ - | [1, 2, 3] | [3, 2, 1] | - | [4, 5] | [5, 4] | - | [] | [] | - *-----------+-------------*/ -``` +`DOUBLE` -### `ARRAY_SLICE` +[sort-order]: https://github.com/google/zetasql/blob/master/docs/data-types.md#comparison_operator_examples -```sql -ARRAY_SLICE(array_to_slice, start_offset, end_offset) -``` +[kll-sketches]: https://github.com/google/zetasql/blob/master/docs/sketches.md#sketches_kll -**Description** +[approx-functions-reference]: #approximate_aggregate_functions -Returns an array containing zero or more consecutive elements from the -input array. 
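The `UINT64` and `DOUBLE` merge variants are documented above only by reference to `KLL_QUANTILES.MERGE_INT64`, so a short sketch may help. The following query is illustrative only: the input values and column alias are made up, and because KLL results are approximate, the exact value returned can vary. It merges two `DOUBLE` sketches and extracts the approximate median.

```sql
SELECT KLL_QUANTILES.MERGE_POINT_DOUBLE(kll_sketch, .5) AS approx_median
FROM (SELECT KLL_QUANTILES.INIT_DOUBLE(x, 1000) AS kll_sketch
      FROM UNNEST([1.5, 2.5, 3.5]) AS x
      UNION ALL
      SELECT KLL_QUANTILES.INIT_DOUBLE(x, 1000) AS kll_sketch
      FROM UNNEST([4.5, 5.5]) AS x);
```

Because `phi` is `0.5`, the function returns a value *v* with roughly half of the five inputs on each side of *v*.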
+## Mathematical functions -+ `array_to_slice`: The array that contains the elements you want to slice. -+ `start_offset`: The inclusive starting offset. -+ `end_offset`: The inclusive ending offset. +ZetaSQL supports mathematical functions. +All mathematical functions have the following behaviors: -An offset can be positive or negative. A positive offset starts from the -beginning of the input array and is 0-based. A negative offset starts from -the end of the input array. Out-of-bounds offsets are supported. Here are some -examples: ++ They return `NULL` if any of the input parameters is `NULL`. ++ They return `NaN` if any of the arguments is `NaN`. - +### Categories + +
- - - + + - - - + + + + + - - - + + - - + - - + + - - + + - - + + - - + + + + + + + + + + + + + + + + + +
Input offsetFinal offset in arrayNotesCategoryFunctions
0['a', 'b', 'c', 'd']The final offset is 0.Trigonometric + ACOS   + ACOSH   + ASIN   + ASINH   + ATAN   + ATAN2   + ATANH   + COS   + COSH   + COT   + COTH   + CSC   + CSCH   + SEC   + SECH   + SIN   + SINH   + TAN   + TANH   +
+ Exponential and
+ logarithmic +
+ EXP   + LN   + LOG   + LOG10   +
3['a', 'b', 'c', 'd']The final offset is 3. + Rounding and
+ truncation +
+ CEIL   + CEILING   + FLOOR   + ROUND   + TRUNC   +
5['a', 'b', 'c', 'd'] - Because the input offset is out of bounds, - the final offset is 3 (array length - 1). + Power and
+ root +
+ CBRT   + POW   + POWER   + SQRT  
-1['a', 'b', 'c', 'd']Sign - Because a negative offset is used, the offset starts at the end of the - array. The final offset is 3 - (array length - 1). + ABS   + SIGN  
-2['a', 'b', 'c', 'd'] - Because a negative offset is used, the offset starts at the end of the - array. The final offset is 2 - (array length - 2). + Distance + + COSINE_DISTANCE   + EUCLIDEAN_DISTANCE  
-4['a', 'b', 'c', 'd'] - Because a negative offset is used, the offset starts at the end of the - array. The final offset is 0 - (array length - 4). + Comparison + + GREATEST   + LEAST  
-5['a', 'b', 'c', 'd']Random number generator - Because the offset is negative and out of bounds, the final offset is - 0 (array length - array length). + RAND   +
Arithmetic and error handling + DIV   + IEEE_DIVIDE   + IS_INF   + IS_NAN   + MOD   + SAFE_ADD   + SAFE_DIVIDE   + SAFE_MULTIPLY   + SAFE_NEGATE   + SAFE_SUBTRACT   +
Bucket + RANGE_BUCKET   +
Numerical constants + PI   + PI_BIGNUMERIC   + PI_NUMERIC  
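The two propagation rules above can be seen directly in a query. This is an illustrative sketch; the column aliases are arbitrary:

```sql
SELECT
  SQRT(NULL) AS sqrt_of_null,                       -- any NULL input parameter yields NULL
  IS_NAN(SIN(CAST('NaN' AS DOUBLE))) AS sin_of_nan  -- any NaN argument yields NaN, so IS_NAN is true
```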
-Additional details: - -+ The input array can contain `NULL` elements. `NULL` elements are included - in the resulting array. -+ Returns `NULL` if `array_to_slice`, `start_offset`, or `end_offset` is - `NULL`. -+ Returns an empty array if `array_to_slice` is empty. -+ Returns an empty array if the position of the `start_offset` in the array is - after the position of the `end_offset`. +### Function list -**Return type** + + + + + + + + -`ARRAY` + + + + -```sql -SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], 1, 3) AS result + + + + -```sql -SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], -1, 3) AS result + + + + -```sql -SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], 1, -3) AS result + + + + -```sql -SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], -1, -3) AS result + + + + -```sql -SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], -3, -1) AS result + + + + -```sql -SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], 3, 3) AS result + + + + -```sql -SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], -3, -3) AS result + + + + -```sql -SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], 1, 30) AS result + + + + -```sql -SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], 1, -30) AS result + + + + -```sql -SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], -30, 30) AS result + + + + -```sql -SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], -30, -5) AS result + + + + -```sql -SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], 5, 30) AS result + + + + -```sql -SELECT ARRAY_SLICE(['a', 'b', 'c', 'd', 'e'], 1, NULL) AS result + + + + -```sql -SELECT ARRAY_SLICE(['a', 'b', NULL, 'd', 'e'], 1, 3) AS result + + + + -### `ARRAY_SUM` + + + + -**Description** + + + + -Caveats: + + + + -[floating-point-types]: https://github.com/google/zetasql/blob/master/docs/data-types.md#floating_point_types + + + + -**Supported Argument Types** + + + + -+ Any supported numeric data type -+ `INTERVAL` + + + + -The return type depends upon `T` in the input array: + + + + + -```sql -SELECT ARRAY_SUM([1, 2, 3, 4, 5, NULL, 4, 3, 2, 1]) as sum 
+ + + + -### `ARRAY_TO_STRING` + + + + -**Description** + + + + -If the `null_text` parameter is used, the function replaces any `NULL` values in -the array with the value of `null_text`. + + + + -**Return type** + + + + -**Examples** + + + + -SELECT ARRAY_TO_STRING(list, '--') AS text -FROM items; + + + + -```sql -WITH items AS - (SELECT ['coffee', 'tea', 'milk' ] as list - UNION ALL - SELECT ['cake', 'pie', NULL] as list) + + + + -/*--------------------------------* - | text | - +--------------------------------+ - | coffee--tea--milk | - | cake--pie--MISSING | - *--------------------------------*/ -``` + + + + -```sql -ARRAY_TRANSFORM(array_expression, lambda_expression) + + + + -**Return type** + + + + -**Example** + + + + -/*---------+---------* - | a1 | a2 | - +---------+---------+ - | [2,3,4] | [1,3,5] | - *---------+---------*/ -``` + + + + -### `FLATTEN` + + + + -**Description** + + + + -There are several ways to flatten nested data into arrays. To learn more, see -[Flattening nested data into an array][flatten-tree-to-array]. + + + + -`ARRAY` + + + + -In the following example, all of the arrays for `v.sales.quantity` are -concatenated in a flattened array. + + + + -/*--------------------------* - | all_values | - +--------------------------+ - | [1, 2, 3, 4, 5, 6, 7, 8] | - *--------------------------*/ -``` + + + + -```sql -WITH t AS ( - SELECT - [ - STRUCT([STRUCT([1,2,3] AS quantity), STRUCT([4,5,6] AS quantity)] AS sales), - STRUCT([STRUCT([7,8,9] AS quantity), STRUCT([10,11,12] AS quantity)] AS sales) - ] AS v -) -SELECT FLATTEN(v.sales.quantity[OFFSET(1)]) AS second_values -FROM t; + + + + -In the following example, all values for `v.price` are returned in a -flattened array. 
+ + + + -/*------------* - | all_prices | - +------------+ - | [1, 10] | - *------------*/ -``` + + + + -[flatten-tree-to-array]: https://github.com/google/zetasql/blob/master/docs/arrays.md#flattening_nested_data_into_arrays + + + + -### `GENERATE_ARRAY` + + + + -**Description** + + + + -The `GENERATE_ARRAY` function accepts the following data types as inputs: + + + + -The `step_expression` parameter determines the increment used to -generate array values. The default value for this parameter is `1`. + + + + -If any argument is `NULL`, the function will return a `NULL` array. + + + + -`ARRAY` + + + + -The following returns an array of integers, with a default step of 1. + +
NameSummary
ABS -**Examples** + + Computes the absolute value of X. +
ACOS -/*-----------* - | result | - +-----------+ - | [b, c, d] | - *-----------*/ -``` + + Computes the inverse cosine of X. +
ACOSH -/*-----------* - | result | - +-----------+ - | [] | - *-----------*/ -``` + + Computes the inverse hyperbolic cosine of X. +
ASIN -/*--------* - | result | - +--------+ - | [b, c] | - *--------*/ -``` + + Computes the inverse sine of X. +
ASINH -/*-----------* - | result | - +-----------+ - | [] | - *-----------*/ -``` + + Computes the inverse hyperbolic sine of X. +
ATAN -/*-----------* - | result | - +-----------+ - | [c, d, e] | - *-----------*/ -``` + + Computes the inverse tangent of X. +
ATAN2 -/*--------* - | result | - +--------+ - | [d] | - *--------*/ -``` + + Computes the inverse tangent of X/Y, using the signs of + X and Y to determine the quadrant. +
ATANH -/*--------* - | result | - +--------+ - | [c] | - *--------*/ -``` + + Computes the inverse hyperbolic tangent of X. +
CBRT -/*--------------* - | result | - +--------------+ - | [b, c, d, e] | - *--------------*/ -``` + + Computes the cube root of X. +
CEIL -/*-----------* - | result | - +-----------+ - | [] | - *-----------*/ -``` + + Gets the smallest integral value that is not less than X. +
CEILING -/*-----------------* - | result | - +-----------------+ - | [a, b, c, d, e] | - *-----------------*/ -``` + + Synonym of CEIL. +
COS -/*--------* - | result | - +--------+ - | [a] | - *--------*/ -``` + + Computes the cosine of X. +
COSH -/*--------* - | result | - +--------+ - | [] | - *--------*/ -``` + + Computes the hyperbolic cosine of X. +
COSINE_DISTANCE -/*-----------* - | result | - +-----------+ - | NULL | - *-----------*/ -``` +Computes the cosine distance between two vectors.
COT -/*--------------* - | result | - +--------------+ - | [b, NULL, d] | - *--------------*/ -``` + + Computes the cotangent of X. +
COTH -```sql -ARRAY_SUM(input_array) -``` + + Computes the hyperbolic cotangent of X. +
CSC -Returns the sum of non-`NULL` values in an array. + + Computes the cosecant of X. +
CSCH -+ If the array is `NULL`, empty, or contains only `NULL`s, returns - `NULL`. -+ If the array contains `NaN`, returns `NaN`. -+ If the array contains `[+|-]Infinity`, returns either `[+|-]Infinity` - or `NaN`. -+ If there is numeric overflow, produces an error. -+ If a [floating-point type][floating-point-types] is returned, the result is - [non-deterministic][non-deterministic], which means you might receive a - different result each time you use this function. + + Computes the hyperbolic cosecant of X. +
DIV -[non-deterministic]: https://github.com/google/zetasql/blob/master/docs/data-types.md#floating-point-semantics + + Divides integer X by integer Y. +
EXP -In the input array, `ARRAY`, `T` can represent: + + Computes e to the power of X. +
EUCLIDEAN_DISTANCE -**Return type** +Computes the Euclidean distance between two vectors.
FLOOR - + + + - - + + - - - - -
+ Gets the largest integral value that is not greater than X. +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLEINTERVALGREATEST + + + Gets the greatest value among X1,...,XN. +
OUTPUTINT64INT64UINT64UINT64NUMERICBIGNUMERICDOUBLEDOUBLEINTERVAL
+
IEEE_DIVIDE -**Examples** + + Divides X by Y, but does not generate errors for + division by zero or overflow. +
IS_INF -/*-----* - | sum | - +-----+ - | 25 | - *-----*/ -``` + + Checks if X is positive or negative infinity. +
IS_NAN -```sql -ARRAY_TO_STRING(array_expression, delimiter[, null_text]) -``` + + Checks if X is a NaN value. +
LEAST -Returns a concatenation of the elements in `array_expression` -as a `STRING`. The value for `array_expression` -can either be an array of `STRING` or -`BYTES` data types. + + Gets the least value among X1,...,XN. +
LN -If the `null_text` parameter is not used, the function omits the `NULL` value -and its preceding delimiter. + + Computes the natural logarithm of X. +
LOG -`STRING` + + Computes the natural logarithm of X or the logarithm of + X to base Y. +
LOG10 -```sql -WITH items AS - (SELECT ['coffee', 'tea', 'milk' ] as list - UNION ALL - SELECT ['cake', 'pie', NULL] as list) + + Computes the logarithm of X to base 10. +
MOD -/*--------------------------------* - | text | - +--------------------------------+ - | coffee--tea--milk | - | cake--pie | - *--------------------------------*/ -``` + + Gets the remainder of the division of X by Y. +
PI -SELECT ARRAY_TO_STRING(list, '--', 'MISSING') AS text -FROM items; + + Produces the mathematical constant π as a + DOUBLE value. +
PI_BIGNUMERIC -### `ARRAY_TRANSFORM` + + Produces the mathematical constant π as a BIGNUMERIC value. +
PI_NUMERIC -lambda_expression: - { - element_alias -> transform_expression - | (element_alias, index_alias) -> transform_expression - } -``` - -**Description** - -Takes an array, transforms the elements, and returns the results in a new array. -The output array always has the same length as the input array. - -+ `array_expression`: The array to transform. -+ `lambda_expression`: Each element in `array_expression` is evaluated against - the [lambda expression][lambda-definition]. The evaluation results are - returned in a new array. -+ `element_alias`: An alias that represents an array element. -+ `index_alias`: An alias that represents the zero-based offset of the array - element. -+ `transform_expression`: The expression used to transform the array elements. - -Returns `NULL` if the `array_expression` is `NULL`. + + Produces the mathematical constant π as a NUMERIC value. +
POW -`ARRAY` + + Produces the value of X raised to the power of Y. +
POWER -```sql -SELECT - ARRAY_TRANSFORM([1, 2, 3], e -> e + 1) AS a1, - ARRAY_TRANSFORM([1, 2, 3], (e, i) -> e + i) AS a2; + + Synonym of POW. +
RAND -[lambda-definition]: https://github.com/google/zetasql/blob/master/docs/functions-reference.md#lambdas + + Generates a pseudo-random value of type + DOUBLE in the range of + [0, 1). +
RANGE_BUCKET -```sql -FLATTEN(array_elements_field_access_expression) -``` + + Scans through a sorted array and returns the 0-based position + of a point's upper bound. +
ROUND -Takes a nested array and flattens a specific part of it into a single, flat -array with the -[array elements field access operator][array-el-field-operator]. -Returns `NULL` if the input value is `NULL`. -If `NULL` array elements are -encountered, they are added to the resulting array. + + Rounds X to the nearest integer or rounds X + to N decimal places after the decimal point. +
SAFE_ADD -**Return type** + + Equivalent to the addition operator (X + Y), but returns + NULL if overflow occurs. +
SAFE_DIVIDE -**Examples** + + Equivalent to the division operator (X / Y), but returns + NULL if an error occurs. +
SAFE_MULTIPLY -```sql -WITH t AS ( - SELECT - [ - STRUCT([STRUCT([1,2,3] AS quantity), STRUCT([4,5,6] AS quantity)] AS sales), - STRUCT([STRUCT([7,8] AS quantity), STRUCT([] AS quantity)] AS sales) - ] AS v -) -SELECT FLATTEN(v.sales.quantity) AS all_values -FROM t; + + Equivalent to the multiplication operator (X * Y), + but returns NULL if overflow occurs. +
SAFE_NEGATE -In the following example, `OFFSET` gets the second value in each array and -concatenates them. + + Equivalent to the unary minus operator (-X), but returns + NULL if overflow occurs. +
SAFE_SUBTRACT -/*---------------* - | second_values | - +---------------+ - | [2, 5, 8, 11] | - *---------------*/ -``` + + Equivalent to the subtraction operator (X - Y), but + returns NULL if overflow occurs. +
SEC -```sql -WITH t AS ( - SELECT - [ - STRUCT(1 AS price, 2 AS quantity), - STRUCT(10 AS price, 20 AS quantity) - ] AS v -) -SELECT FLATTEN(v.price) AS all_prices -FROM t; + + Computes the secant of X. +
SECH -For more examples, including how to use protocol buffers with `FLATTEN`, see the -[array elements field access operator][array-el-field-operator]. + + Computes the hyperbolic secant of X. +
SIGN -```sql -GENERATE_ARRAY(start_expression, end_expression[, step_expression]) -``` + + Produces -1, 0, or +1 for negative, zero, and positive arguments, respectively. +
SIN -```sql -GENERATE_ARRAY(start_expression, end_expression[, step_expression]) -``` + + Computes the sine of X. +
SINH -Returns an array of values. The `start_expression` and `end_expression` -parameters determine the inclusive start and end of the array. + + Computes the hyperbolic sine of X. +
SQRT -+ `INT64` -+ `UINT64` -+ `NUMERIC` -+ `BIGNUMERIC` -+ `DOUBLE` + + Computes the square root of X. +
TAN -This function returns an error if `step_expression` is set to 0, or if any -input is `NaN`. + + Computes the tangent of X. +
TANH -**Return Data Type** + + Computes the hyperbolic tangent of X. +
TRUNC -**Examples** + + Rounds a number like ROUND(X) or ROUND(X, N), + but always rounds towards zero and never overflows. +
-```sql -SELECT GENERATE_ARRAY(1, 5) AS example_array; +### `ABS` -/*-----------------* - | example_array | - +-----------------+ - | [1, 2, 3, 4, 5] | - *-----------------*/ +``` +ABS(X) ``` -The following returns an array using a user-specified step size. +**Description** -```sql -SELECT GENERATE_ARRAY(0, 10, 3) AS example_array; +Computes absolute value. Returns an error if the argument is an integer and the +output value cannot be represented as the same type; this happens only for the +largest negative input value, which has no positive representation. -/*---------------* - | example_array | - +---------------+ - | [0, 3, 6, 9] | - *---------------*/ -``` + + + + + + + + + + + + + + + + + + + + + + + + + +
XABS(X)
2525
-2525
+inf+inf
-inf+inf
-The following returns an array using a negative value, `-3` for its step size. +**Return Data Type** -```sql -SELECT GENERATE_ARRAY(10, 0, -3) AS example_array; + -/*---------------* - | example_array | - +---------------+ - | [10, 7, 4, 1] | - *---------------*/ -``` + + + + + + + + -The following returns an array using the same value for the `start_expression` -and `end_expression`. +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
OUTPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
-```sql -SELECT GENERATE_ARRAY(4, 4, 10) AS example_array; +### `ACOS` -/*---------------* - | example_array | - +---------------+ - | [4] | - *---------------*/ +``` +ACOS(X) ``` -The following returns an empty array, because the `start_expression` is greater -than the `end_expression`, and the `step_expression` value is positive. - -```sql -SELECT GENERATE_ARRAY(10, 0, 3) AS example_array; +**Description** -/*---------------* - | example_array | - +---------------+ - | [] | - *---------------*/ -``` +Computes the principal value of the inverse cosine of X. The return value is in +the range [0,π]. Generates an error if X is a value outside of the +range [-1, 1]. -The following returns a `NULL` array because `end_expression` is `NULL`. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
XACOS(X)
+infNaN
-infNaN
NaNNaN
X < -1Error
X > 1Error
-```sql -SELECT GENERATE_ARRAY(5, NULL, 1) AS example_array; +### `ACOSH` -/*---------------* - | example_array | - +---------------+ - | NULL | - *---------------*/ +``` +ACOSH(X) ``` -The following returns multiple arrays. +**Description** -```sql -SELECT GENERATE_ARRAY(start, 5) AS example_array -FROM UNNEST([3, 4, 5]) AS start; +Computes the inverse hyperbolic cosine of X. Generates an error if X is a value +less than 1. -/*---------------* - | example_array | - +---------------+ - | [3, 4, 5] | - | [4, 5] | - | [5] | - +---------------*/ -``` + + + + + + + + + + + + + + + + + + + + + + + + + +
XACOSH(X)
+inf+inf
-infNaN
NaNNaN
X < 1Error
-### `GENERATE_DATE_ARRAY` +### `ASIN` -```sql -GENERATE_DATE_ARRAY(start_date, end_date[, INTERVAL INT64_expr date_part]) +``` +ASIN(X) ``` **Description** -Returns an array of dates. The `start_date` and `end_date` -parameters determine the inclusive start and end of the array. - -The `GENERATE_DATE_ARRAY` function accepts the following data types as inputs: +Computes the principal value of the inverse sine of X. The return value is in +the range [-π/2,π/2]. Generates an error if X is outside of +the range [-1, 1]. -+ `start_date` must be a `DATE`. -+ `end_date` must be a `DATE`. -+ `INT64_expr` must be an `INT64`. -+ `date_part` must be either DAY, WEEK, MONTH, QUARTER, or YEAR. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
XASIN(X)
+infNaN
-infNaN
NaNNaN
X < -1Error
X > 1Error
-The `INT64_expr` parameter determines the increment used to generate dates. The -default value for this parameter is 1 day. +### `ASINH` -This function returns an error if `INT64_expr` is set to 0. +``` +ASINH(X) +``` -**Return Data Type** +**Description** -`ARRAY` containing 0 or more `DATE` values. +Computes the inverse hyperbolic sine of X. Does not fail. -**Examples** - -The following returns an array of dates, with a default step of 1. + + + + + + + + + + + + + + + + + + + + + +
XASINH(X)
+inf+inf
-inf-inf
NaNNaN
-```sql -SELECT GENERATE_DATE_ARRAY('2016-10-05', '2016-10-08') AS example; +### `ATAN` -/*--------------------------------------------------* - | example | - +--------------------------------------------------+ - | [2016-10-05, 2016-10-06, 2016-10-07, 2016-10-08] | - *--------------------------------------------------*/ ``` - -The following returns an array using a user-specified step size. - -```sql -SELECT GENERATE_DATE_ARRAY( - '2016-10-05', '2016-10-09', INTERVAL 2 DAY) AS example; - -/*--------------------------------------* - | example | - +--------------------------------------+ - | [2016-10-05, 2016-10-07, 2016-10-09] | - *--------------------------------------*/ +ATAN(X) ``` -The following returns an array using a negative value, `-3` for its step size. - -```sql -SELECT GENERATE_DATE_ARRAY('2016-10-05', - '2016-10-01', INTERVAL -3 DAY) AS example; +**Description** -/*--------------------------* - | example | - +--------------------------+ - | [2016-10-05, 2016-10-02] | - *--------------------------*/ -``` +Computes the principal value of the inverse tangent of X. The return value is +in the range [-π/2,π/2]. Does not fail. -The following returns an array using the same value for the `start_date`and -`end_date`. + + + + + + + + + + + + + + + + + + + + + +
XATAN(X)
+infπ/2
-inf-π/2
NaNNaN
-```sql -SELECT GENERATE_DATE_ARRAY('2016-10-05', - '2016-10-05', INTERVAL 8 DAY) AS example; +### `ATAN2` -/*--------------* - | example | - +--------------+ - | [2016-10-05] | - *--------------*/ ``` - -The following returns an empty array, because the `start_date` is greater -than the `end_date`, and the `step` value is positive. - -```sql -SELECT GENERATE_DATE_ARRAY('2016-10-05', - '2016-10-01', INTERVAL 1 DAY) AS example; - -/*---------* - | example | - +---------+ - | [] | - *---------*/ +ATAN2(X, Y) ``` -The following returns a `NULL` array, because one of its inputs is -`NULL`. - -```sql -SELECT GENERATE_DATE_ARRAY('2016-10-05', NULL) AS example; +**Description** -/*---------* - | example | - +---------+ - | NULL | - *---------*/ -``` +Calculates the principal value of the inverse tangent of X/Y using the signs of +the two arguments to determine the quadrant. The return value is in the range +[-π,π]. -The following returns an array of dates, using MONTH as the `date_part` -interval: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
XYATAN2(X, Y)
NaNAny valueNaN
Any valueNaNNaN
0.00.00.0
Positive Finite value-infπ
Negative Finite value-inf
Finite value+inf0.0
+infFinite valueπ/2
-infFinite value-π/2
+inf-inf¾π
-inf-inf-¾π
+inf+infπ/4
-inf+inf-π/4
-```sql -SELECT GENERATE_DATE_ARRAY('2016-01-01', - '2016-12-31', INTERVAL 2 MONTH) AS example; +### `ATANH` -/*--------------------------------------------------------------------------* - | example | - +--------------------------------------------------------------------------+ - | [2016-01-01, 2016-03-01, 2016-05-01, 2016-07-01, 2016-09-01, 2016-11-01] | - *--------------------------------------------------------------------------*/ +``` +ATANH(X) ``` -The following uses non-constant dates to generate an array. +**Description** -```sql -SELECT GENERATE_DATE_ARRAY(date_start, date_end, INTERVAL 1 WEEK) AS date_range -FROM ( - SELECT DATE '2016-01-01' AS date_start, DATE '2016-01-31' AS date_end - UNION ALL SELECT DATE "2016-04-01", DATE "2016-04-30" - UNION ALL SELECT DATE "2016-07-01", DATE "2016-07-31" - UNION ALL SELECT DATE "2016-10-01", DATE "2016-10-31" -) AS items; +Computes the inverse hyperbolic tangent of X. Generates an error if X is outside +of the range (-1, 1). -/*--------------------------------------------------------------* - | date_range | - +--------------------------------------------------------------+ - | [2016-01-01, 2016-01-08, 2016-01-15, 2016-01-22, 2016-01-29] | - | [2016-04-01, 2016-04-08, 2016-04-15, 2016-04-22, 2016-04-29] | - | [2016-07-01, 2016-07-08, 2016-07-15, 2016-07-22, 2016-07-29] | - | [2016-10-01, 2016-10-08, 2016-10-15, 2016-10-22, 2016-10-29] | - *--------------------------------------------------------------*/ -``` + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
XATANH(X)
+infNaN
-infNaN
NaNNaN
X < -1Error
X > 1Error
-### `GENERATE_TIMESTAMP_ARRAY` +### `CBRT` -```sql -GENERATE_TIMESTAMP_ARRAY(start_timestamp, end_timestamp, - INTERVAL step_expression date_part) +``` +CBRT(X) ``` **Description** -Returns an `ARRAY` of `TIMESTAMPS` separated by a given interval. The -`start_timestamp` and `end_timestamp` parameters determine the inclusive -lower and upper bounds of the `ARRAY`. - -The `GENERATE_TIMESTAMP_ARRAY` function accepts the following data types as -inputs: - -+ `start_timestamp`: `TIMESTAMP` -+ `end_timestamp`: `TIMESTAMP` -+ `step_expression`: `INT64` -+ Allowed `date_part` values are: - `NANOSECOND` - (if the SQL engine supports it), - `MICROSECOND`, `MILLISECOND`, `SECOND`, `MINUTE`, `HOUR`, or `DAY`. +Computes the cube root of `X`. `X` can be any data type +that [coerces to `DOUBLE`][conversion-rules]. +Supports the `SAFE.` prefix. -The `step_expression` parameter determines the increment used to generate -timestamps. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
XCBRT(X)
+infinf
-inf-inf
NaNNaN
00
NULLNULL
**Return Data Type** -An `ARRAY` containing 0 or more `TIMESTAMP` values. - -**Examples** +`DOUBLE` -The following example returns an `ARRAY` of `TIMESTAMP`s at intervals of 1 day. +**Example** ```sql -SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-05 00:00:00', '2016-10-07 00:00:00', - INTERVAL 1 DAY) AS timestamp_array; +SELECT CBRT(27) AS cube_root; -/*--------------------------------------------------------------------------* - | timestamp_array | - +--------------------------------------------------------------------------+ - | [2016-10-05 00:00:00+00, 2016-10-06 00:00:00+00, 2016-10-07 00:00:00+00] | - *--------------------------------------------------------------------------*/ +/*--------------------* + | cube_root | + +--------------------+ + | 3.0000000000000004 | + *--------------------*/ ``` -The following example returns an `ARRAY` of `TIMESTAMP`s at intervals of 1 -second. +[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#conversion_rules -```sql -SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-05 00:00:00', '2016-10-05 00:00:02', - INTERVAL 1 SECOND) AS timestamp_array; +### `CEIL` -/*--------------------------------------------------------------------------* - | timestamp_array | - +--------------------------------------------------------------------------+ - | [2016-10-05 00:00:00+00, 2016-10-05 00:00:01+00, 2016-10-05 00:00:02+00] | - *--------------------------------------------------------------------------*/ ``` - -The following example returns an `ARRAY` of `TIMESTAMPS` with a negative -interval. 
- -```sql -SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-06 00:00:00', '2016-10-01 00:00:00', - INTERVAL -2 DAY) AS timestamp_array; - -/*--------------------------------------------------------------------------* - | timestamp_array | - +--------------------------------------------------------------------------+ - | [2016-10-06 00:00:00+00, 2016-10-04 00:00:00+00, 2016-10-02 00:00:00+00] | - *--------------------------------------------------------------------------*/ +CEIL(X) ``` -The following example returns an `ARRAY` with a single element, because -`start_timestamp` and `end_timestamp` have the same value. +**Description** -```sql -SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-05 00:00:00', '2016-10-05 00:00:00', - INTERVAL 1 HOUR) AS timestamp_array; +Returns the smallest integral value that is not less than X. -/*--------------------------* - | timestamp_array | - +--------------------------+ - | [2016-10-05 00:00:00+00] | - *--------------------------*/ -``` + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
XCEIL(X)
2.02.0
2.33.0
2.83.0
2.53.0
-2.3-2.0
-2.8-2.0
-2.5-2.0
00
+inf+inf
-inf-inf
NaNNaN
-The following example returns an empty `ARRAY`, because `start_timestamp` is -later than `end_timestamp`. +**Return Data Type** -```sql -SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-06 00:00:00', '2016-10-05 00:00:00', - INTERVAL 1 HOUR) AS timestamp_array; + -/*-----------------* - | timestamp_array | - +-----------------+ - | [] | - *-----------------*/ -``` + + + + + + + + -The following example returns a null `ARRAY`, because one of the inputs is -`NULL`. +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
OUTPUTDOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
-```sql -SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-05 00:00:00', NULL, INTERVAL 1 HOUR) - AS timestamp_array; +### `CEILING` -/*-----------------* - | timestamp_array | - +-----------------+ - | NULL | - *-----------------*/ ``` - -The following example generates `ARRAY`s of `TIMESTAMP`s from columns containing -values for `start_timestamp` and `end_timestamp`. - -```sql -SELECT GENERATE_TIMESTAMP_ARRAY(start_timestamp, end_timestamp, INTERVAL 1 HOUR) - AS timestamp_array -FROM - (SELECT - TIMESTAMP '2016-10-05 00:00:00' AS start_timestamp, - TIMESTAMP '2016-10-05 02:00:00' AS end_timestamp - UNION ALL - SELECT - TIMESTAMP '2016-10-05 12:00:00' AS start_timestamp, - TIMESTAMP '2016-10-05 14:00:00' AS end_timestamp - UNION ALL - SELECT - TIMESTAMP '2016-10-05 23:59:00' AS start_timestamp, - TIMESTAMP '2016-10-06 01:59:00' AS end_timestamp); - -/*--------------------------------------------------------------------------* - | timestamp_array | - +--------------------------------------------------------------------------+ - | [2016-10-05 00:00:00+00, 2016-10-05 01:00:00+00, 2016-10-05 02:00:00+00] | - | [2016-10-05 12:00:00+00, 2016-10-05 13:00:00+00, 2016-10-05 14:00:00+00] | - | [2016-10-05 23:59:00+00, 2016-10-06 00:59:00+00, 2016-10-06 01:59:00+00] | - *--------------------------------------------------------------------------*/ +CEILING(X) ``` -### OFFSET and ORDINAL - -For information about using `OFFSET` and `ORDINAL` with arrays, see -[Array subscript operator][array-subscript-operator] and [Accessing array -elements][accessing-array-elements]. - - - -[array-subscript-operator]: #array_subscript_operator +**Description** -[accessing-array-elements]: https://github.com/google/zetasql/blob/master/docs/arrays.md#accessing_array_elements +Synonym of CEIL(X) - +### `COS` -## Date functions +``` +COS(X) +``` -ZetaSQL supports the following date functions. +**Description** -### Function list +Computes the cosine of X where X is specified in radians. Never fails. 
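`COS` takes its argument in radians, not degrees, so a quick sanity check is to evaluate it at `0` and at π. The following is a minimal illustrative sketch (not from the original reference), assuming a GoogleSQL engine where the `PI()` function listed above is available:

```sql
-- COS expects radians: COS(PI()) is -1; COS(180) is not.
SELECT COS(0) AS cos_zero, COS(PI()) AS cos_pi;

/*----------+--------*
 | cos_zero | cos_pi |
 +----------+--------+
 | 1.0      | -1.0   |
 *----------+--------*/
```

To work in degrees, scale the argument first, for example `COS(degrees * PI() / 180)`.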
- - + + - - - - - - - - - - - - - - - + + + + + + + + + + + + + +
NameSummaryXCOS(X)
CURRENT_DATE - - - Returns the current date as a DATE value. -
DATE - - - Constructs a DATE value. -
DATE_ADD + +
+infNaN
-infNaN
NaNNaN
- - - Adds a specified time interval to a DATE value. - - +### `COSH` - - DATE_DIFF +``` +COSH(X) +``` - - - Gets the number of intervals between two DATE values. - - +**Description** - - DATE_FROM_UNIX_DATE +Computes the hyperbolic cosine of X where X is specified in radians. +Generates an error if overflow occurs. - - - Interprets an INT64 expression as the number of days - since 1970-01-01. - - + + + + + + + + + + + + + + + + + + + + + +
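Because `COSH(X)` grows like `e^X / 2`, the overflow error noted above occurs for `DOUBLE` inputs once `X` exceeds roughly 710. The following is an illustrative sketch (not from the original reference), assuming a GoogleSQL engine:

```sql
-- COSH(0) is exactly 1; moderate arguments are safe.
SELECT COSH(0) AS cosh_zero, COSH(1) AS cosh_one;

/*-----------+--------------------*
 | cosh_zero | cosh_one           |
 +-----------+--------------------+
 | 1.0       | 1.5430806348152437 |
 *-----------+--------------------*/
```

If the engine supports the `SAFE.` prefix for this function, `SAFE.COSH(1000)` returns `NULL` instead of raising the overflow error.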
XCOSH(X)
+inf+inf
-inf+inf
NaNNaN
- - DATE_SUB ### `COSINE_DISTANCE` - - - Subtracts a specified time interval from a DATE value. - - +```sql +COSINE_DISTANCE(vector1, vector2) +``` - - DATE_TRUNC +**Description** - - - Truncates a DATE value. - - +Computes the [cosine distance][wiki-cosine-distance] between two vectors. - - EXTRACT +**Definitions** - - - Extracts part of a date from a DATE value. - - ++ `vector1`: A vector that is represented by an + `ARRAY<T>` value or a sparse vector that is + represented by an `ARRAY<STRUCT<dimension, magnitude>>` value. ++ `vector2`: A vector that is represented by an + `ARRAY<T>` value or a sparse vector that is + represented by an `ARRAY<STRUCT<dimension, magnitude>>` value. - - FORMAT_DATE +**Details** - - - Formats a DATE value according to a specified format string. - - ++ `ARRAY<T>` can be used to represent a vector. Each zero-based index in this + array represents a dimension. The value for each element in this array + represents a magnitude. - - LAST_DAY + `T` can represent the following and must be the same for both + vectors: - - - Gets the last day in a specified time period that contains a - DATE value. - - + + + - - PARSE_DATE + + `FLOAT` + + `DOUBLE` - - - Converts a STRING value to a DATE value. - - + + - - UNIX_DATE + In the following example vector, there are four dimensions. The magnitude + is `10.0` for dimension `0`, `55.0` for dimension `1`, `40.0` for + dimension `2`, and `34.0` for dimension `3`: - - - Converts a DATE value to the number of days since 1970-01-01. - - + ``` + [10.0, 55.0, 40.0, 34.0] + ``` ++ `ARRAY<STRUCT<dimension, magnitude>>` can be used to represent a + sparse vector. With a sparse vector, you only need to include + dimension-magnitude pairs for non-zero magnitudes. If a magnitude isn't + present in the sparse vector, the magnitude is implicitly understood to be + zero. - - + For example, if you have a vector with 10,000 dimensions, but only 10 + dimensions have non-zero magnitudes, then the vector is a sparse vector. 
+ As a result, it's more efficient to describe a sparse vector by only + mentioning its non-zero magnitudes. -### `CURRENT_DATE` + In `ARRAY<STRUCT<dimension, magnitude>>`, `STRUCT<dimension, magnitude>` + represents a dimension-magnitude pair for each non-zero magnitude in a + sparse vector. These parts need to be included for each dimension-magnitude + pair: -```sql -CURRENT_DATE() -``` + + `dimension`: A `STRING` or `INT64` value that represents a + dimension in a vector. -```sql -CURRENT_DATE(time_zone_expression) -``` + + `magnitude`: A `DOUBLE` value that represents a + non-zero magnitude for a specific dimension in a vector. -```sql -CURRENT_DATE -``` + You don't need to include empty dimension-magnitude pairs in a + sparse vector. For example, the following sparse vector and + non-sparse vector are equivalent: -**Description** + ```sql + -- sparse vector ARRAY<STRUCT<INT64, DOUBLE>> + [(1, 10.0), (2, 30.0), (5, 40.0)] + ``` -Returns the current date as a `DATE` object. Parentheses are optional when -called with no arguments. + ```sql + -- vector ARRAY<DOUBLE> + [0.0, 10.0, 30.0, 0.0, 0.0, 40.0] + ``` -This function supports the following arguments: + In a sparse vector, dimension-magnitude pairs don't need to be in any + particular order. The following sparse vectors are equivalent: -+ `time_zone_expression`: A `STRING` expression that represents a - [time zone][date-timezone-definitions]. If no time zone is specified, the - default time zone, which is implementation defined, is used. If this expression is - used and it evaluates to `NULL`, this function returns `NULL`. + ```sql + [('a', 10.0), ('b', 30.0), ('d', 40.0)] + ``` -The current date is recorded at the start of the query -statement which contains this function, not when this specific function is -evaluated. + ```sql + [('d', 40.0), ('a', 10.0), ('b', 30.0)] + ``` ++ Both non-sparse vectors + in this function must share the same dimensions, and if they don't, an error + is produced. ++ A vector can't be a zero vector. 
A vector is a zero vector if it has + no dimensions or all dimensions have a magnitude of `0`, such as `[]` or + `[0.0, 0.0]`. If a zero vector is encountered, an error is produced. ++ An error is produced if a magnitude in a vector is `NULL`. ++ If a vector is `NULL`, `NULL` is returned. -**Return Data Type** +**Return type** -`DATE` +`DOUBLE` **Examples** -The following query produces the current date in the default time zone: +In the following example, non-sparse vectors +are used to compute the cosine distance: ```sql -SELECT CURRENT_DATE() AS the_date; +SELECT COSINE_DISTANCE([1.0, 2.0], [3.0, 4.0]) AS results; -/*--------------* - | the_date | - +--------------+ - | 2016-12-25 | - *--------------*/ +/*----------* + | results | + +----------+ + | 0.016130 | + *----------*/ ``` -The following queries produce the current date in a specified time zone: +In the following example, sparse vectors are used to compute the +cosine distance: ```sql -SELECT CURRENT_DATE('America/Los_Angeles') AS the_date; +SELECT COSINE_DISTANCE( + [(1, 1.0), (2, 2.0)], + [(2, 4.0), (1, 3.0)]) AS results; -/*--------------* - | the_date | - +--------------+ - | 2016-12-25 | - *--------------*/ + /*----------* + | results | + +----------+ + | 0.016130 | + *----------*/ ``` -```sql -SELECT CURRENT_DATE('-08') AS the_date; +The ordering of numeric values in a vector doesn't impact the results +produced by this function. For example, these queries produce the same results +even though the numeric values in each vector are in a different order: -/*--------------* - | the_date | - +--------------+ - | 2016-12-25 | - *--------------*/ +```sql +SELECT COSINE_DISTANCE([1.0, 2.0], [3.0, 4.0]) AS results; ``` -The following query produces the current date in the default time zone. -Parentheses are not needed if the function has no arguments. 
- ```sql -SELECT CURRENT_DATE AS the_date; - -/*--------------* - | the_date | - +--------------+ - | 2016-12-25 | - *--------------*/ +SELECT COSINE_DISTANCE([2.0, 1.0], [4.0, 3.0]) AS results; ``` -When a column named `current_date` is present, the column name and the function -call without parentheses are ambiguous. To ensure the function call, add -parentheses; to ensure the column name, qualify it with its -[range variable][date-range-variables]. For example, the -following query will select the function in the `the_date` column and the table -column in the `current_date` column. - ```sql -WITH t AS (SELECT 'column value' AS `current_date`) -SELECT current_date() AS the_date, t.current_date FROM t; - -/*------------+--------------* - | the_date | current_date | - +------------+--------------+ - | 2016-12-25 | column value | - *------------+--------------*/ +SELECT COSINE_DISTANCE([(1, 1.0), (2, 2.0)], [(1, 3.0), (2, 4.0)]) AS results; ``` -[date-range-variables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#range_variables +```sql + /*----------* + | results | + +----------+ + | 0.016130 | + *----------*/ +``` -[date-timezone-definitions]: https://github.com/google/zetasql/blob/master/docs/data-types.md#time_zones +In the following example, the function can't compute cosine distance against +the first vector, which is a zero vector: -### `DATE` +```sql +-- ERROR +SELECT COSINE_DISTANCE([0.0, 0.0], [3.0, 4.0]) AS results; +``` ```sql -DATE(year, month, day) +-- ERROR +SELECT COSINE_DISTANCE([(1, 0.0), (2, 0.0)], [(1, 3.0), (2, 4.0)]) AS results; ``` +Both non-sparse vectors must have the same +dimensions. If not, an error is produced. 
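Cosine distance is `1 - cosine similarity`, that is, one minus the dot product of the two vectors divided by the product of their magnitudes. As a cross-check of the `0.016130` result shown above, here is a small Python sketch of the same arithmetic (illustrative only, not part of ZetaSQL; the dimension and zero-vector checks mirror the error cases described for this function):

```python
import math

def cosine_distance(v1, v2):
    # Cosine distance = 1 - (v1 . v2) / (|v1| * |v2|).
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same dimensions")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("zero vector")  # zero vectors are an error
    return 1.0 - dot / (norm1 * norm2)

print(round(cosine_distance([1.0, 2.0], [3.0, 4.0]), 6))  # prints 0.01613
```

Because the formula normalizes by magnitude, scaling either vector leaves the distance unchanged; only the angle between them matters.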
In the following example, the +first vector has two dimensions and the second vector has three: + ```sql -DATE(timestamp_expression) +-- ERROR +SELECT COSINE_DISTANCE([9.0, 7.0], [8.0, 4.0, 5.0]) AS results; ``` +If you use sparse vectors and you repeat a dimension, an error is +produced: + ```sql -DATE(timestamp_expression, time_zone_expression) +-- ERROR +SELECT COSINE_DISTANCE( + [(1, 9.0), (2, 7.0), (2, 8.0)], [(1, 8.0), (2, 4.0), (3, 5.0)]) AS results; ``` +[wiki-cosine-distance]: https://en.wikipedia.org/wiki/Cosine_similarity#Cosine_distance + +### `COT` + ``` -DATE(datetime_expression) +COT(X) ``` **Description** -Constructs or extracts a date. - -This function supports the following arguments: +Computes the cotangent for the angle of `X`, where `X` is specified in radians. +`X` can be any data type +that [coerces to `DOUBLE`][conversion-rules]. +Supports the `SAFE.` prefix. -+ `year`: The `INT64` value for year. -+ `month`: The `INT64` value for month. -+ `day`: The `INT64` value for day. -+ `timestamp_expression`: A `TIMESTAMP` expression that contains the date. -+ `time_zone_expression`: A `STRING` expression that represents a - [time zone][date-timezone-definitions]. If no time zone is specified with - `timestamp_expression`, the default time zone, which is implementation defined, is - used. -+ `datetime_expression`: A `DATETIME` expression that contains the date. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
XCOT(X)
+infNaN
-infNaN
NaNNaN
0Error
NULLNULL
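The cotangent is the reciprocal of the tangent. The special cases in the table can be cross-checked in any IEEE 754 environment; for example, in Python (illustrative only, not ZetaSQL):

```python
import math

def cot(x):
    # cot(x) = cos(x) / sin(x); cot(0) divides by zero, matching the Error row.
    return math.cos(x) / math.sin(x)

print(round(cot(1.0), 6))  # prints 0.642093
```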
**Return Data Type** -`DATE` +`DOUBLE` **Example** ```sql -SELECT - DATE(2016, 12, 25) AS date_ymd, - DATE(DATETIME '2016-12-25 23:59:59') AS date_dt, - DATE(TIMESTAMP '2016-12-25 05:30:00+07', 'America/Los_Angeles') AS date_tstz; +SELECT COT(1) AS a, SAFE.COT(0) AS b; -/*------------+------------+------------* - | date_ymd | date_dt | date_tstz | - +------------+------------+------------+ - | 2016-12-25 | 2016-12-25 | 2016-12-24 | - *------------+------------+------------*/ +/*---------------------+------* + | a | b | + +---------------------+------+ + | 0.64209261593433065 | NULL | + *---------------------+------*/ ``` -[date-timezone-definitions]: #timezone_definitions +[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#conversion_rules -### `DATE_ADD` +### `COTH` -```sql -DATE_ADD(date_expression, INTERVAL int64_expression date_part) +``` +COTH(X) ``` **Description** -Adds a specified time interval to a DATE. - -`DATE_ADD` supports the following `date_part` values: - -+ `DAY` -+ `WEEK`. Equivalent to 7 `DAY`s. -+ `MONTH` -+ `QUARTER` -+ `YEAR` +Computes the hyperbolic cotangent for the angle of `X`, where `X` is specified +in radians. `X` can be any data type +that [coerces to `DOUBLE`][conversion-rules]. +Supports the `SAFE.` prefix. -Special handling is required for MONTH, QUARTER, and YEAR parts when -the date is at (or near) the last day of the month. If the resulting -month has fewer days than the original date's day, then the resulting -date is the last date of that month. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
XCOTH(X)
+inf1
-inf-1
NaNNaN
0Error
NULLNULL
**Return Data Type** -DATE +`DOUBLE` **Example** ```sql -SELECT DATE_ADD(DATE '2008-12-25', INTERVAL 5 DAY) AS five_days_later; +SELECT COTH(1) AS a, SAFE.COTH(0) AS b; -/*--------------------* - | five_days_later | - +--------------------+ - | 2008-12-30 | - *--------------------*/ +/*----------------+------* + | a | b | + +----------------+------+ + | 1.313035285499 | NULL | + *----------------+------*/ ``` -### `DATE_DIFF` +[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#conversion_rules -```sql -DATE_DIFF(date_expression_a, date_expression_b, date_part) +### `CSC` + +``` +CSC(X) ``` **Description** -Returns the whole number of specified `date_part` intervals between two -`DATE` objects (`date_expression_a` - `date_expression_b`). -If the first `DATE` is earlier than the second one, -the output is negative. - -`DATE_DIFF` supports the following `date_part` values: +Computes the cosecant of the input angle, which is in radians. +`X` can be any data type +that [coerces to `DOUBLE`][conversion-rules]. +Supports the `SAFE.` prefix. -+ `DAY` -+ `WEEK` This date part begins on Sunday. -+ `WEEK()`: This date part begins on `WEEKDAY`. Valid values for - `WEEKDAY` are `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, - `FRIDAY`, and `SATURDAY`. -+ `ISOWEEK`: Uses [ISO 8601 week][ISO-8601-week] - boundaries. ISO weeks begin on Monday. -+ `MONTH`, except when the first two arguments are `TIMESTAMP` objects. -+ `QUARTER` -+ `YEAR` -+ `ISOYEAR`: Uses the [ISO 8601][ISO-8601] - week-numbering year boundary. The ISO year boundary is the Monday of the - first week whose Thursday belongs to the corresponding Gregorian calendar - year. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
XCSC(X)
+infNaN
-infNaN
NaNNaN
0Error
NULLNULL
**Return Data Type** -INT64 +`DOUBLE` **Example** ```sql -SELECT DATE_DIFF(DATE '2010-07-07', DATE '2008-12-25', DAY) AS days_diff; +SELECT CSC(100) AS a, CSC(-1) AS b, SAFE.CSC(0) AS c; -/*-----------* - | days_diff | - +-----------+ - | 559 | - *-----------*/ +/*----------------+-----------------+------* + | a | b | c | + +----------------+-----------------+------+ + | -1.97485753142 | -1.188395105778 | NULL | + *----------------+-----------------+------*/ ``` -```sql -SELECT - DATE_DIFF(DATE '2017-10-15', DATE '2017-10-14', DAY) AS days_diff, - DATE_DIFF(DATE '2017-10-15', DATE '2017-10-14', WEEK) AS weeks_diff; +[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#conversion_rules + +### `CSCH` -/*-----------+------------* - | days_diff | weeks_diff | - +-----------+------------+ - | 1 | 1 | - *-----------+------------*/ +``` +CSCH(X) ``` -The example above shows the result of `DATE_DIFF` for two days in succession. -`DATE_DIFF` with the date part `WEEK` returns 1 because `DATE_DIFF` counts the -number of date part boundaries in this range of dates. Each `WEEK` begins on -Sunday, so there is one date part boundary between Saturday, 2017-10-14 -and Sunday, 2017-10-15. +**Description** -The following example shows the result of `DATE_DIFF` for two dates in different -years. `DATE_DIFF` with the date part `YEAR` returns 3 because it counts the -number of Gregorian calendar year boundaries between the two dates. `DATE_DIFF` -with the date part `ISOYEAR` returns 2 because the second date belongs to the -ISO year 2015. The first Thursday of the 2015 calendar year was 2015-01-01, so -the ISO year 2015 begins on the preceding Monday, 2014-12-29. +Computes the hyperbolic cosecant of the input angle, which is in radians. +`X` can be any data type +that [coerces to `DOUBLE`][conversion-rules]. +Supports the `SAFE.` prefix. 
-```sql -SELECT - DATE_DIFF('2017-12-30', '2014-12-30', YEAR) AS year_diff, - DATE_DIFF('2017-12-30', '2014-12-30', ISOYEAR) AS isoyear_diff; + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
XCSCH(X)
+inf0
-inf0
NaNNaN
0Error
NULLNULL
-/*-----------+--------------* - | year_diff | isoyear_diff | - +-----------+--------------+ - | 3 | 2 | - *-----------+--------------*/ -``` +**Return Data Type** -The following example shows the result of `DATE_DIFF` for two days in -succession. The first date falls on a Monday and the second date falls on a -Sunday. `DATE_DIFF` with the date part `WEEK` returns 0 because this date part -uses weeks that begin on Sunday. `DATE_DIFF` with the date part `WEEK(MONDAY)` -returns 1. `DATE_DIFF` with the date part `ISOWEEK` also returns 1 because -ISO weeks begin on Monday. +`DOUBLE` + +**Example** ```sql -SELECT - DATE_DIFF('2017-12-18', '2017-12-17', WEEK) AS week_diff, - DATE_DIFF('2017-12-18', '2017-12-17', WEEK(MONDAY)) AS week_weekday_diff, - DATE_DIFF('2017-12-18', '2017-12-17', ISOWEEK) AS isoweek_diff; +SELECT CSCH(0.5) AS a, CSCH(-2) AS b, SAFE.CSCH(0) AS c; -/*-----------+-------------------+--------------* - | week_diff | week_weekday_diff | isoweek_diff | - +-----------+-------------------+--------------+ - | 0 | 1 | 1 | - *-----------+-------------------+--------------*/ +/*----------------+----------------+------* + | a | b | c | + +----------------+----------------+------+ + | 1.919034751334 | -0.27572056477 | NULL | + *----------------+----------------+------*/ ``` -[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 - -[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date +[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#conversion_rules -### `DATE_FROM_UNIX_DATE` +### `DIV` -```sql -DATE_FROM_UNIX_DATE(int64_expression) +``` +DIV(X, Y) ``` **Description** -Interprets `int64_expression` as the number of days since 1970-01-01. +Returns the result of integer division of X by Y. Division by zero returns +an error. Division by -1 may overflow. -**Return Data Type** + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
XYDIV(X, Y)
2045
12-7-1
2036
0200
200Error
-DATE +**Return Data Type** -**Example** +The return data type is determined by the argument types with the following +table. + -```sql -SELECT DATE_FROM_UNIX_DATE(14238) AS date_from_epoch; + + + + + + + + + + + + + -/*-----------------* - | date_from_epoch | - +-----------------+ - | 2008-12-25 | - *-----------------+*/ -``` +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERIC
INT32INT64INT64INT64ERRORNUMERICBIGNUMERIC
INT64INT64INT64INT64ERRORNUMERICBIGNUMERIC
UINT32INT64INT64UINT64UINT64NUMERICBIGNUMERIC
UINT64ERRORERRORUINT64UINT64NUMERICBIGNUMERIC
NUMERICNUMERICNUMERICNUMERICNUMERICNUMERICBIGNUMERIC
BIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERIC
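Note from the behavior table that `DIV(12, -7)` is `-1`: integer division truncates toward zero. The following Python sketch makes the truncation explicit (illustrative only; Python's own `//` operator floors instead, so `12 // -7` would give `-2`):

```python
def div(x, y):
    # Integer division truncating toward zero, matching DIV(12, -7) = -1.
    if y == 0:
        raise ZeroDivisionError("division by zero is an error")
    q = abs(x) // abs(y)
    return q if (x >= 0) == (y >= 0) else -q

print(div(20, 4), div(12, -7), div(20, 3), div(0, 20))  # prints 5 -1 6 0
```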
-### `DATE_SUB` +### `EXP` -```sql -DATE_SUB(date_expression, INTERVAL int64_expression date_part) +``` +EXP(X) ``` **Description** -Subtracts a specified time interval from a DATE. - -`DATE_SUB` supports the following `date_part` values: - -+ `DAY` -+ `WEEK`. Equivalent to 7 `DAY`s. -+ `MONTH` -+ `QUARTER` -+ `YEAR` +Computes *e* to the power of X, also called the natural exponential function. If +the result underflows, this function returns a zero. Generates an error if the +result overflows. -Special handling is required for MONTH, QUARTER, and YEAR parts when -the date is at (or near) the last day of the month. If the resulting -month has fewer days than the original date's day, then the resulting -date is the last date of that month. + + + + + + + + + + + + + + + + + + + + + +
XEXP(X)
0.01.0
+inf+inf
-inf0.0
**Return Data Type** -DATE - -**Example** + -```sql -SELECT DATE_SUB(DATE '2008-12-25', INTERVAL 5 DAY) AS five_days_ago; + + + + + + + + -/*---------------* - | five_days_ago | - +---------------+ - | 2008-12-20 | - *---------------*/ -``` +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
OUTPUTDOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
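The underflow and overflow behavior described above parallels what most IEEE 754 environments do, as this Python sketch shows (illustrative only, not ZetaSQL):

```python
import math

print(math.exp(0.0))    # prints 1.0
print(math.exp(-1000))  # prints 0.0: the result underflows to zero
try:
    math.exp(1000)      # the result overflows, producing an error
except OverflowError:
    print("overflow error")
```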
-### `DATE_TRUNC` +### `EUCLIDEAN_DISTANCE` ```sql -DATE_TRUNC(date_expression, date_part) +EUCLIDEAN_DISTANCE(vector1, vector2) ``` **Description** -Truncates a `DATE` value to the granularity of `date_part`. The `DATE` value -is always rounded to the beginning of `date_part`, which can be one of the -following: - -+ `DAY`: The day in the Gregorian calendar year that contains the - `DATE` value. -+ `WEEK`: The first day of the week in the week that contains the - `DATE` value. Weeks begin on Sundays. `WEEK` is equivalent to - `WEEK(SUNDAY)`. -+ `WEEK(WEEKDAY)`: The first day of the week in the week that contains the - `DATE` value. Weeks begin on `WEEKDAY`. `WEEKDAY` must be one of the - following: `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, `FRIDAY`, - or `SATURDAY`. -+ `ISOWEEK`: The first day of the [ISO 8601 week][ISO-8601-week] in the - ISO week that contains the `DATE` value. The ISO week begins on - Monday. The first ISO week of each ISO year contains the first Thursday of the - corresponding Gregorian calendar year. -+ `MONTH`: The first day of the month in the month that contains the - `DATE` value. -+ `QUARTER`: The first day of the quarter in the quarter that contains the - `DATE` value. -+ `YEAR`: The first day of the year in the year that contains the - `DATE` value. -+ `ISOYEAR`: The first day of the [ISO 8601][ISO-8601] week-numbering year - in the ISO year that contains the `DATE` value. The ISO year is the - Monday of the first week whose Thursday belongs to the corresponding - Gregorian calendar year. +Computes the [Euclidean distance][wiki-euclidean-distance] between two vectors. - +**Definitions** -[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 ++ `vector1`: A vector that is represented by an + `ARRAY` value or a sparse vector that is + represented by an `ARRAY>` value. ++ `vector2`: A vector that is represented by an + `ARRAY` value or a sparse vector that is + represented by an `ARRAY>` value. 
-[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date +**Details** - ++ `ARRAY` can be used to represent a vector. Each zero-based index in this + array represents a dimension. The value for each element in this array + represents a magnitude. -**Return Data Type** + `T` can represent the following and must be the same for both + vectors: -DATE + + + -**Examples** + + `FLOAT` + + `DOUBLE` -```sql -SELECT DATE_TRUNC(DATE '2008-12-25', MONTH) AS month; + + -/*------------* - | month | - +------------+ - | 2008-12-01 | - *------------*/ -``` + In the following example vector, there are four dimensions. The magnitude + is `10.0` for dimension `0`, `55.0` for dimension `1`, `40.0` for + dimension `2`, and `34.0` for dimension `3`: -In the following example, the original date falls on a Sunday. Because -the `date_part` is `WEEK(MONDAY)`, `DATE_TRUNC` returns the `DATE` for the -preceding Monday. + ``` + [10.0, 55.0, 40.0, 34.0] + ``` ++ `ARRAY>` can be used to represent a + sparse vector. With a sparse vector, you only need to include + dimension-magnitude pairs for non-zero magnitudes. If a magnitude isn't + present in the sparse vector, the magnitude is implicitly understood to be + zero. -```sql -SELECT date AS original, DATE_TRUNC(date, WEEK(MONDAY)) AS truncated -FROM (SELECT DATE('2017-11-05') AS date); + For example, if you have a vector with 10,000 dimensions, but only 10 + dimensions have non-zero magnitudes, then the vector is a sparse vector. + As a result, it's more efficient to describe a sparse vector by only + mentioning its non-zero magnitudes. -/*------------+------------* - | original | truncated | - +------------+------------+ - | 2017-11-05 | 2017-10-30 | - *------------+------------*/ -``` + In `ARRAY>`, `STRUCT` + represents a dimension-magnitude pair for each non-zero magnitude in a + sparse vector. 
These parts need to be included for each dimension-magnitude + pair: -In the following example, the original `date_expression` is in the Gregorian -calendar year 2015. However, `DATE_TRUNC` with the `ISOYEAR` date part -truncates the `date_expression` to the beginning of the ISO year, not the -Gregorian calendar year. The first Thursday of the 2015 calendar year was -2015-01-01, so the ISO year 2015 begins on the preceding Monday, 2014-12-29. -Therefore the ISO year boundary preceding the `date_expression` 2015-06-15 is -2014-12-29. + + `dimension`: A `STRING` or `INT64` value that represents a + dimension in a vector. -```sql -SELECT - DATE_TRUNC('2015-06-15', ISOYEAR) AS isoyear_boundary, - EXTRACT(ISOYEAR FROM DATE '2015-06-15') AS isoyear_number; + + `magnitude`: A `DOUBLE` value that represents a + non-zero magnitude for a specific dimension in a vector. -/*------------------+----------------* - | isoyear_boundary | isoyear_number | - +------------------+----------------+ - | 2014-12-29 | 2015 | - *------------------+----------------*/ -``` + You don't need to include empty dimension-magnitude pairs in a + sparse vector. For example, the following sparse vector and + non-sparse vector are equivalent: -### `EXTRACT` + ```sql + -- sparse vector ARRAY> + [(1, 10.0), (2, 30.0), (5, 40.0)] + ``` -```sql -EXTRACT(part FROM date_expression) -``` + ```sql + -- vector ARRAY + [0.0, 10.0, 30.0, 0.0, 0.0, 40.0] + ``` -**Description** + In a sparse vector, dimension-magnitude pairs don't need to be in any + particular order. The following sparse vectors are equivalent: -Returns the value corresponding to the specified date part. The `part` must -be one of: + ```sql + [('a', 10.0), ('b', 30.0), ('d', 40.0)] + ``` -+ `DAYOFWEEK`: Returns values in the range [1,7] with Sunday as the first day - of the week. -+ `DAY` -+ `DAYOFYEAR` -+ `WEEK`: Returns the week number of the date in the range [0, 53]. 
Weeks begin - with Sunday, and dates prior to the first Sunday of the year are in week 0. -+ `WEEK()`: Returns the week number of the date in the range [0, 53]. - Weeks begin on `WEEKDAY`. Dates prior to - the first `WEEKDAY` of the year are in week 0. Valid values for `WEEKDAY` are - `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, `FRIDAY`, and - `SATURDAY`. -+ `ISOWEEK`: Returns the [ISO 8601 week][ISO-8601-week] - number of the `date_expression`. `ISOWEEK`s begin on Monday. Return values - are in the range [1, 53]. The first `ISOWEEK` of each ISO year begins on the - Monday before the first Thursday of the Gregorian calendar year. -+ `MONTH` -+ `QUARTER`: Returns values in the range [1,4]. -+ `YEAR` -+ `ISOYEAR`: Returns the [ISO 8601][ISO-8601] - week-numbering year, which is the Gregorian calendar year containing the - Thursday of the week to which `date_expression` belongs. + ```sql + [('d', 40.0), ('a', 10.0), ('b', 30.0)] + ``` ++ Both non-sparse vectors + in this function must share the same dimensions, and if they don't, an error + is produced. ++ A vector can be a zero vector. A vector is a zero vector if it has + no dimensions or all dimensions have a magnitude of `0`, such as `[]` or + `[0.0, 0.0]`. ++ An error is produced if a magnitude in a vector is `NULL`. ++ If a vector is `NULL`, `NULL` is returned. -**Return Data Type** +**Return type** -INT64 +`DOUBLE` **Examples** -In the following example, `EXTRACT` returns a value corresponding to the `DAY` -date part. 
+In the following example, non-sparse vectors +are used to compute the Euclidean distance: ```sql -SELECT EXTRACT(DAY FROM DATE '2013-12-25') AS the_day; +SELECT EUCLIDEAN_DISTANCE([1.0, 2.0], [3.0, 4.0]) AS results; -/*---------* - | the_day | - +---------+ - | 25 | - *---------*/ +/*----------* + | results | + +----------+ + | 2.828 | + *----------*/ ``` -In the following example, `EXTRACT` returns values corresponding to different -date parts from a column of dates near the end of the year. +In the following example, sparse vectors are used to compute the +Euclidean distance: ```sql -SELECT - date, - EXTRACT(ISOYEAR FROM date) AS isoyear, - EXTRACT(ISOWEEK FROM date) AS isoweek, - EXTRACT(YEAR FROM date) AS year, - EXTRACT(WEEK FROM date) AS week -FROM UNNEST(GENERATE_DATE_ARRAY('2015-12-23', '2016-01-09')) AS date -ORDER BY date; +SELECT EUCLIDEAN_DISTANCE( + [(1, 1.0), (2, 2.0)], + [(2, 4.0), (1, 3.0)]) AS results; -/*------------+---------+---------+------+------* - | date | isoyear | isoweek | year | week | - +------------+---------+---------+------+------+ - | 2015-12-23 | 2015 | 52 | 2015 | 51 | - | 2015-12-24 | 2015 | 52 | 2015 | 51 | - | 2015-12-25 | 2015 | 52 | 2015 | 51 | - | 2015-12-26 | 2015 | 52 | 2015 | 51 | - | 2015-12-27 | 2015 | 52 | 2015 | 52 | - | 2015-12-28 | 2015 | 53 | 2015 | 52 | - | 2015-12-29 | 2015 | 53 | 2015 | 52 | - | 2015-12-30 | 2015 | 53 | 2015 | 52 | - | 2015-12-31 | 2015 | 53 | 2015 | 52 | - | 2016-01-01 | 2015 | 53 | 2016 | 0 | - | 2016-01-02 | 2015 | 53 | 2016 | 0 | - | 2016-01-03 | 2015 | 53 | 2016 | 1 | - | 2016-01-04 | 2016 | 1 | 2016 | 1 | - | 2016-01-05 | 2016 | 1 | 2016 | 1 | - | 2016-01-06 | 2016 | 1 | 2016 | 1 | - | 2016-01-07 | 2016 | 1 | 2016 | 1 | - | 2016-01-08 | 2016 | 1 | 2016 | 1 | - | 2016-01-09 | 2016 | 1 | 2016 | 1 | - *------------+---------+---------+------+------*/ + /*----------* + | results | + +----------+ + | 2.828 | + *----------*/ ``` -In the following example, `date_expression` falls on a Sunday. 
`EXTRACT` -calculates the first column using weeks that begin on Sunday, and it calculates -the second column using weeks that begin on Monday. +The ordering of magnitudes in a vector doesn't impact the results +produced by this function. For example, these queries produce the same results +even though the magnitudes in each vector are in a different order: ```sql -WITH table AS (SELECT DATE('2017-11-05') AS date) -SELECT - date, - EXTRACT(WEEK(SUNDAY) FROM date) AS week_sunday, - EXTRACT(WEEK(MONDAY) FROM date) AS week_monday FROM table; - -/*------------+-------------+-------------* - | date | week_sunday | week_monday | - +------------+-------------+-------------+ - | 2017-11-05 | 45 | 44 | - *------------+-------------+-------------*/ +SELECT EUCLIDEAN_DISTANCE([1.0, 2.0], [3.0, 4.0]); ``` -[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 - -[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date - -### `FORMAT_DATE` - ```sql -FORMAT_DATE(format_string, date_expr) +SELECT EUCLIDEAN_DISTANCE([2.0, 1.0], [4.0, 3.0]); ``` -**Description** - -Formats the `date_expr` according to the specified `format_string`. - -See [Supported Format Elements For DATE][date-format-elements] -for a list of format elements that this function supports. - -**Return Data Type** - -STRING - -**Examples** - ```sql -SELECT FORMAT_DATE('%x', DATE '2008-12-25') AS US_format; - -/*------------* - | US_format | - +------------+ - | 12/25/08 | - *------------*/ +SELECT EUCLIDEAN_DISTANCE([(1, 1.0), (2, 2.0)], [(1, 3.0), (2, 4.0)]) AS results; ``` ```sql -SELECT FORMAT_DATE('%b-%d-%Y', DATE '2008-12-25') AS formatted; - -/*-------------* - | formatted | - +-------------+ - | Dec-25-2008 | - *-------------*/ + /*----------* + | results | + +----------+ + | 2.828 | + *----------*/ ``` ```sql -SELECT FORMAT_DATE('%b %Y', DATE '2008-12-25') AS formatted; +Both non-sparse vectors must have the same +dimensions. If not, an error is produced. 
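Euclidean distance is the square root of the sum of the squared per-dimension differences. As a cross-check of the `2.828` result shown above, here is a small Python sketch of the same arithmetic (illustrative only, not ZetaSQL):

```python
import math

def euclidean_distance(v1, v2):
    # Both vectors must have the same dimensions, as in the SQL function.
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

print(round(euclidean_distance([1.0, 2.0], [3.0, 4.0]), 3))  # prints 2.828
```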
In the following example, the first +vector has two dimensions and the second vector has three: -/*-------------* - | formatted | - +-------------+ - | Dec 2008 | - *-------------*/ +```sql +-- ERROR +SELECT EUCLIDEAN_DISTANCE([9.0, 7.0], [8.0, 4.0, 5.0]) AS results; ``` -[date-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time - -### `LAST_DAY` +If you use sparse vectors and you repeat a dimension, an error is +produced: ```sql -LAST_DAY(date_expression[, date_part]) +-- ERROR +SELECT EUCLIDEAN_DISTANCE( + [(1, 9.0), (2, 7.0), (2, 8.0)], [(1, 8.0), (2, 4.0), (3, 5.0)]) AS results; ``` -**Description** - -Returns the last day from a date expression. This is commonly used to return -the last day of the month. - -You can optionally specify the date part for which the last day is returned. -If this parameter is not used, the default value is `MONTH`. -`LAST_DAY` supports the following values for `date_part`: - -+ `YEAR` -+ `QUARTER` -+ `MONTH` -+ `WEEK`. Equivalent to 7 `DAY`s. -+ `WEEK()`. `` represents the starting day of the week. - Valid values are `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, - `FRIDAY`, and `SATURDAY`. -+ `ISOWEEK`. Uses [ISO 8601][ISO-8601-week] week boundaries. ISO weeks begin - on Monday. -+ `ISOYEAR`. Uses the [ISO 8601][ISO-8601] week-numbering year boundary. - The ISO year boundary is the Monday of the first week whose Thursday belongs - to the corresponding Gregorian calendar year. - -**Return Data Type** +[wiki-euclidean-distance]: https://en.wikipedia.org/wiki/Euclidean_distance -`DATE` +### `FLOOR` -**Example** +``` +FLOOR(X) +``` -These both return the last day of the month: +**Description** -```sql -SELECT LAST_DAY(DATE '2008-11-25', MONTH) AS last_day +Returns the largest integral value that is not greater than X. 
-/*------------* - | last_day | - +------------+ - | 2008-11-30 | - *------------*/ -``` + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
XFLOOR(X)
2.02.0
2.32.0
2.82.0
2.52.0
-2.3-3.0
-2.8-3.0
-2.5-3.0
00
+inf+inf
-inf-inf
NaNNaN
-```sql -SELECT LAST_DAY(DATE '2008-11-25') AS last_day +**Return Data Type** -/*------------* - | last_day | - +------------+ - | 2008-11-30 | - *------------*/ -``` + -This returns the last day of the year: + + + + + + + + -```sql -SELECT LAST_DAY(DATE '2008-11-25', YEAR) AS last_day +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
OUTPUTDOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
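A quick way to see the rounding direction for negative inputs is Python's `math.floor`, which uses the same definition (illustrative only; note that Python returns an integer, whereas this function preserves the numeric type shown in the table above):

```python
import math

# FLOOR always rounds toward negative infinity, so -2.5 goes to -3, not -2.
for x in (2.3, 2.5, 2.8, -2.3, -2.5, -2.8):
    print(x, math.floor(x))
```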
-/*------------* - | last_day | - +------------+ - | 2008-12-31 | - *------------*/ -``` +### `GREATEST` -This returns the last day of the week for a week that starts on a Sunday: +``` +GREATEST(X1,...,XN) +``` -```sql -SELECT LAST_DAY(DATE '2008-11-10', WEEK(SUNDAY)) AS last_day +**Description** -/*------------* - | last_day | - +------------+ - | 2008-11-15 | - *------------*/ -``` +Returns the greatest value among `X1,...,XN`. If any argument is `NULL`, returns +`NULL`. Otherwise, in the case of floating-point arguments, if any argument is +`NaN`, returns `NaN`. In all other cases, returns the value among `X1,...,XN` +that has the greatest value according to the ordering used by the `ORDER BY` +clause. The arguments `X1, ..., XN` must be coercible to a common supertype, and +the supertype must support ordering. -This returns the last day of the week for a week that starts on a Monday: + + + + + + + + + + + + + +
X1,...,XNGREATEST(X1,...,XN)
3,5,15
-```sql -SELECT LAST_DAY(DATE '2008-11-10', WEEK(MONDAY)) AS last_day +This function supports specifying [collation][collation]. -/*------------* - | last_day | - +------------+ - | 2008-11-16 | - *------------*/ -``` +[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about -[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 +**Return Data Types** -[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date +Data type of the input values. -### `PARSE_DATE` +### `IEEE_DIVIDE` -```sql -PARSE_DATE(format_string, date_string) +``` +IEEE_DIVIDE(X, Y) ``` **Description** -Converts a [string representation of date][date-format] to a -`DATE` object. +Divides X by Y; this function never fails. Returns +`DOUBLE` unless +both X and Y are `FLOAT`, in which case it returns +`FLOAT`. Unlike the division operator (/), +this function does not generate errors for division by zero or overflow.

-`format_string` contains the [format elements][date-format-elements] -that define how `date_string` is formatted. Each element in -`date_string` must have a corresponding element in `format_string`. The -location of each element in `format_string` must match the location of -each element in `date_string`. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
XYIEEE_DIVIDE(X, Y)
20.04.05.0
0.025.00.0
25.00.0+inf
-25.00.0-inf
0.00.0NaN
0.0NaNNaN
NaN0.0NaN
+inf+infNaN
-inf-infNaN
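Unlike the `/` operator, `IEEE_DIVIDE` follows IEEE 754 semantics for division by zero. Python raises an exception on float division by zero, so a sketch of the table above has to restore those semantics explicitly (illustrative only, not ZetaSQL):

```python
import math

def ieee_divide(x, y):
    # IEEE 754 division semantics: x/0 yields +/-inf, 0/0 and NaN/0 yield NaN.
    try:
        return x / y
    except ZeroDivisionError:
        if x == 0.0 or math.isnan(x):
            return math.nan
        return math.copysign(math.inf, x) * math.copysign(1.0, y)

print(ieee_divide(20.0, 4.0))   # prints 5.0
print(ieee_divide(25.0, 0.0))   # prints inf
print(ieee_divide(-25.0, 0.0))  # prints -inf
print(ieee_divide(0.0, 0.0))    # prints nan
```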
-```sql --- This works because elements on both sides match. -SELECT PARSE_DATE('%A %b %e %Y', 'Thursday Dec 25 2008') +### `IS_INF` --- This produces an error because the year element is in different locations. -SELECT PARSE_DATE('%Y %A %b %e', 'Thursday Dec 25 2008') +``` +IS_INF(X) +``` --- This produces an error because one of the year elements is missing. -SELECT PARSE_DATE('%A %b %e', 'Thursday Dec 25 2008') +**Description** --- This works because %F can find all matching elements in date_string. -SELECT PARSE_DATE('%F', '2000-12-30') -``` +Returns `TRUE` if the value is positive or negative infinity. -When using `PARSE_DATE`, keep the following in mind: + + + + + + + + + + + + + + + + + + + + + +
XIS_INF(X)
+infTRUE
-infTRUE
25FALSE
-+ **Unspecified fields.** Any unspecified field is initialized from `1970-01-01`. -+ **Case insensitivity.** Names, such as `Monday`, `February`, and so on, are - case insensitive. -+ **Whitespace.** One or more consecutive white spaces in the format string - matches zero or more consecutive white spaces in the date string. In - addition, leading and trailing white spaces in the date string are always - allowed -- even if they are not in the format string. -+ **Format precedence.** When two (or more) format elements have overlapping - information (for example both `%F` and `%Y` affect the year), the last one - generally overrides any earlier ones. +### `IS_NAN` -**Return Data Type** +``` +IS_NAN(X) +``` -DATE +**Description** -**Examples** +Returns `TRUE` if the value is a `NaN` value. -This example converts a `MM/DD/YY` formatted string to a `DATE` object: + + + + + + + + + + + + + + + + + +
XIS_NAN(X)
NaNTRUE
25FALSE
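Both `IS_INF` and `IS_NAN` mirror the standard IEEE 754 classification predicates; in Python terms (illustrative only, not ZetaSQL):

```python
import math

# IS_INF is true only for +inf and -inf; IS_NAN is true only for NaN.
print(math.isinf(math.inf), math.isinf(-math.inf), math.isinf(25.0))  # prints True True False
print(math.isnan(math.nan), math.isnan(25.0))                         # prints True False
```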
-```sql -SELECT PARSE_DATE('%x', '12/25/08') AS parsed; +### `LEAST` -/*------------* - | parsed | - +------------+ - | 2008-12-25 | - *------------*/ +``` +LEAST(X1,...,XN) ``` -This example converts a `YYYYMMDD` formatted string to a `DATE` object: +**Description** -```sql -SELECT PARSE_DATE('%Y%m%d', '20081225') AS parsed; +Returns the least value among `X1,...,XN`. If any argument is `NULL`, returns +`NULL`. Otherwise, in the case of floating-point arguments, if any argument is +`NaN`, returns `NaN`. In all other cases, returns the value among `X1,...,XN` +that has the least value according to the ordering used by the `ORDER BY` +clause. The arguments `X1, ..., XN` must be coercible to a common supertype, and +the supertype must support ordering. -/*------------* - | parsed | - +------------+ - | 2008-12-25 | - *------------*/ -``` + + + + + + + + + + + + + +
X1,...,XNLEAST(X1,...,XN)
3,5,11
-[date-format]: #format_date +This function supports specifying [collation][collation]. -[date-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time +[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about -### `UNIX_DATE` +**Return Data Types** + +Data type of the input values. + +### `LN` -```sql -UNIX_DATE(date_expression) +``` +LN(X) ``` **Description** -Returns the number of days since `1970-01-01`. +Computes the natural logarithm of X. Generates an error if X is less than or +equal to zero. + + + + + + + + + + + + + + + + + + + + + + +
| X     | LN(X) |
|-------|-------|
| 1.0   | 0.0   |
| +inf  | +inf  |
| X < 0 | Error |
**Return Data Type** -INT64 + -**Example** + + + + + + + + -```sql -SELECT UNIX_DATE(DATE '2008-12-25') AS days_from_epoch; +
| INPUT  | INT32  | INT64  | UINT32 | UINT64 | NUMERIC | BIGNUMERIC | FLOAT  | DOUBLE |
|--------|--------|--------|--------|--------|---------|------------|--------|--------|
| OUTPUT | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
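A small sanity check (the error case is shown commented out, since it would abort the query):

```sql
SELECT LN(1.0) AS ln_one;  -- 0.0

-- SELECT LN(0);  -- generates an error: the argument must be greater than zero
```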
-/*-----------------* - | days_from_epoch | - +-----------------+ - | 14238 | - *-----------------*/ -``` +### `LOG` -## Datetime functions +``` +LOG(X [, Y]) +``` -ZetaSQL supports the following datetime functions. +**Description** -### Function list +If only X is present, `LOG` is a synonym of `LN`. If Y is also present, +`LOG` computes the logarithm of X to base Y. - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| X | Y | LOG(X, Y) |
|---|---|-----------|
| 100.0 | 10.0 | 2.0 |
| -inf | Any value | NaN |
| Any value | +inf | NaN |
| +inf | 0.0 < Y < 1.0 | -inf |
| +inf | Y > 1.0 | +inf |
| X <= 0 | Any value | Error |
| Any value | Y <= 0 | Error |
| Any value | 1.0 | Error |
- - CURRENT_DATETIME - - - - Returns the current date and time as a DATETIME value. - - - - - DATETIME +**Return Data Type** - - - Constructs a DATETIME value. - - + + - - + + + + + + + + + + + + - - - - +
- DATETIME_ADD: Adds a specified time interval to a DATETIME value.

| INPUT | INT32 | INT64 | UINT32 | UINT64 | NUMERIC | BIGNUMERIC | FLOAT | DOUBLE |
|-------|-------|-------|--------|--------|---------|------------|-------|--------|
| INT32 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| INT64 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| UINT32 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| UINT64 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| FLOAT | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
| DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
- DATETIME_DIFF: Gets the number of intervals between two DATETIME values.
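As a quick illustration of `LOG` (values taken from the behavior table above; with a single argument, `LOG(X)` behaves like `LN(X)`):

```sql
SELECT LOG(100.0, 10.0) AS log_base_10;  -- 2.0
```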
- - DATETIME_SUB +### `LOG10` - - - Subtracts a specified time interval from a DATETIME value. - - +``` +LOG10(X) +``` - - DATETIME_TRUNC +**Description** - - - Truncates a DATETIME value. - - +Similar to `LOG`, but computes logarithm to base 10. - - EXTRACT - - - - Extracts part of a date and time from a DATETIME value. - - - - - FORMAT_DATETIME - - - - Formats a DATETIME value according to a specified - format string. - - + + + + + + + + + + + + + + + + + + + + + + + + + +
| X      | LOG10(X) |
|--------|----------|
| 100.0  | 2.0      |
| -inf   | NaN      |
| +inf   | +inf     |
| X <= 0 | Error    |
- - LAST_DAY +**Return Data Type** - - - Gets the last day in a specified time period that contains a - DATETIME value. - - + + - - + + + + + -
- PARSE_DATETIME: Converts a STRING value to a DATETIME value.

| INPUT  | INT32  | INT64  | UINT32 | UINT64 | NUMERIC | BIGNUMERIC | FLOAT  | DOUBLE |
|--------|--------|--------|--------|--------|---------|------------|--------|--------|
| OUTPUT | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
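For example (value drawn from the behavior table above):

```sql
SELECT LOG10(100.0) AS log10_100;  -- 2.0
```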
-### `CURRENT_DATETIME` +### `MOD` -```sql -CURRENT_DATETIME([time_zone]) ``` - -```sql -CURRENT_DATETIME +MOD(X, Y) ``` **Description** -Returns the current time as a `DATETIME` object. Parentheses are optional when -called with no arguments. - -This function supports an optional `time_zone` parameter. -See [Time zone definitions][datetime-timezone-definitions] for -information on how to specify a time zone. +Modulo function: returns the remainder of the division of X by Y. Returned +value has the same sign as X. An error is generated if Y is 0. -The current date and time is recorded at the start of the query -statement which contains this function, not when this specific function is -evaluated. + + + + + + + + + + + + + + + + + + + +
| X  | Y  | MOD(X, Y) |
|----|----|-----------|
| 25 | 12 | 1         |
| 25 | 0  | Error     |
**Return Data Type** -`DATETIME` +The return data type is determined by the argument types with the following +table. + -**Example** + + + + + + + + + + + + + -```sql -SELECT CURRENT_DATETIME() as now; +
| INPUT | INT32 | INT64 | UINT32 | UINT64 | NUMERIC | BIGNUMERIC |
|-------|-------|-------|--------|--------|---------|------------|
| INT32 | INT64 | INT64 | INT64 | ERROR | NUMERIC | BIGNUMERIC |
| INT64 | INT64 | INT64 | INT64 | ERROR | NUMERIC | BIGNUMERIC |
| UINT32 | INT64 | INT64 | UINT64 | UINT64 | NUMERIC | BIGNUMERIC |
| UINT64 | ERROR | ERROR | UINT64 | UINT64 | NUMERIC | BIGNUMERIC |
| NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC |
| BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC |
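A short example of the sign rule from the description (the result takes the sign of X):

```sql
SELECT
  MOD(25, 12) AS positive_x,   -- 1
  MOD(-25, 12) AS negative_x;  -- -1
```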
-/*----------------------------* - | now | - +----------------------------+ - | 2016-05-19 10:38:47.046465 | - *----------------------------*/ +### `PI` + +```sql +PI() ``` -When a column named `current_datetime` is present, the column name and the -function call without parentheses are ambiguous. To ensure the function call, -add parentheses; to ensure the column name, qualify it with its -[range variable][datetime-range-variables]. For example, the -following query will select the function in the `now` column and the table -column in the `current_datetime` column. +**Description** -```sql -WITH t AS (SELECT 'column value' AS `current_datetime`) -SELECT current_datetime() as now, t.current_datetime FROM t; +Returns the mathematical constant `Ï€` as a `DOUBLE` +value. -/*----------------------------+------------------* - | now | current_datetime | - +----------------------------+------------------+ - | 2016-05-19 10:38:47.046465 | column value | - *----------------------------+------------------*/ -``` +**Return type** -[datetime-range-variables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#range_variables +`DOUBLE` -[datetime-timezone-definitions]: #timezone_definitions +**Example** -### `DATETIME` +```sql +SELECT PI() AS pi + +/*--------------------* + | pi | + +--------------------+ + | 3.1415926535897931 | + *--------------------*/ +``` + +### `PI_BIGNUMERIC` ```sql -1. DATETIME(year, month, day, hour, minute, second) -2. DATETIME(date_expression[, time_expression]) -3. DATETIME(timestamp_expression [, time_zone]) +PI_BIGNUMERIC() ``` **Description** -1. Constructs a `DATETIME` object using `INT64` values - representing the year, month, day, hour, minute, and second. -2. Constructs a `DATETIME` object using a DATE object and an optional `TIME` - object. -3. Constructs a `DATETIME` object using a `TIMESTAMP` object. It supports an - optional parameter to - [specify a time zone][datetime-timezone-definitions]. 
- If no time zone is specified, the default time zone, which is implementation defined, - is used. +Returns the mathematical constant `Ï€` as a `BIGNUMERIC` value. -**Return Data Type** +**Return type** -`DATETIME` +`BIGNUMERIC` **Example** ```sql -SELECT - DATETIME(2008, 12, 25, 05, 30, 00) as datetime_ymdhms, - DATETIME(TIMESTAMP "2008-12-25 05:30:00+00", "America/Los_Angeles") as datetime_tstz; +SELECT PI_BIGNUMERIC() AS pi -/*---------------------+---------------------* - | datetime_ymdhms | datetime_tstz | - +---------------------+---------------------+ - | 2008-12-25 05:30:00 | 2008-12-24 21:30:00 | - *---------------------+---------------------*/ +/*-----------------------------------------* + | pi | + +-----------------------------------------+ + | 3.1415926535897932384626433832795028842 | + *-----------------------------------------*/ ``` -[datetime-timezone-definitions]: #timezone_definitions - -### `DATETIME_ADD` +### `PI_NUMERIC` ```sql -DATETIME_ADD(datetime_expression, INTERVAL int64_expression part) +PI_NUMERIC() ``` **Description** -Adds `int64_expression` units of `part` to the `DATETIME` object. - -`DATETIME_ADD` supports the following values for `part`: - -+ `NANOSECOND` - (if the SQL engine supports it) -+ `MICROSECOND` -+ `MILLISECOND` -+ `SECOND` -+ `MINUTE` -+ `HOUR` -+ `DAY` -+ `WEEK`. Equivalent to 7 `DAY`s. -+ `MONTH` -+ `QUARTER` -+ `YEAR` - -Special handling is required for MONTH, QUARTER, and YEAR parts when the -date is at (or near) the last day of the month. If the resulting month has fewer -days than the original DATETIME's day, then the result day is the last day of -the new month. +Returns the mathematical constant `Ï€` as a `NUMERIC` value. 
-**Return Data Type** +**Return type** -`DATETIME` +`NUMERIC` **Example** ```sql -SELECT - DATETIME "2008-12-25 15:30:00" as original_date, - DATETIME_ADD(DATETIME "2008-12-25 15:30:00", INTERVAL 10 MINUTE) as later; +SELECT PI_NUMERIC() AS pi -/*-----------------------------+------------------------* - | original_date | later | - +-----------------------------+------------------------+ - | 2008-12-25 15:30:00 | 2008-12-25 15:40:00 | - *-----------------------------+------------------------*/ +/*-------------* + | pi | + +-------------+ + | 3.141592654 | + *-------------*/ ``` -### `DATETIME_DIFF` +### `POW` -```sql -DATETIME_DIFF(datetime_expression_a, datetime_expression_b, part) +``` +POW(X, Y) ``` **Description** -Returns the whole number of specified `part` intervals between two -`DATETIME` objects (`datetime_expression_a` - `datetime_expression_b`). -If the first `DATETIME` is earlier than the second one, -the output is negative. Throws an error if the computation overflows the -result type, such as if the difference in -nanoseconds -between the two `DATETIME` objects would overflow an -`INT64` value. - -`DATETIME_DIFF` supports the following values for `part`: +Returns the value of X raised to the power of Y. If the result underflows and is +not representable, then the function returns a value of zero. -+ `NANOSECOND` - (if the SQL engine supports it) -+ `MICROSECOND` -+ `MILLISECOND` -+ `SECOND` -+ `MINUTE` -+ `HOUR` -+ `DAY` -+ `WEEK`: This date part begins on Sunday. -+ `WEEK()`: This date part begins on `WEEKDAY`. Valid values for - `WEEKDAY` are `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, - `FRIDAY`, and `SATURDAY`. -+ `ISOWEEK`: Uses [ISO 8601 week][ISO-8601-week] - boundaries. ISO weeks begin on Monday. -+ `MONTH`, except when the first two arguments are `TIMESTAMP` objects. -+ `QUARTER` -+ `YEAR` -+ `ISOYEAR`: Uses the [ISO 8601][ISO-8601] - week-numbering year boundary. 
The ISO year boundary is the Monday of the - first week whose Thursday belongs to the corresponding Gregorian calendar - year. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| X | Y | POW(X, Y) |
|---|---|-----------|
| 2.0 | 3.0 | 8.0 |
| 1.0 | Any value including NaN | 1.0 |
| Any value including NaN | 0 | 1.0 |
| -1.0 | +inf | 1.0 |
| -1.0 | -inf | 1.0 |
| ABS(X) < 1 | -inf | +inf |
| ABS(X) > 1 | -inf | 0.0 |
| ABS(X) < 1 | +inf | 0.0 |
| ABS(X) > 1 | +inf | +inf |
| -inf | Y < 0 | 0.0 |
| -inf | Y > 0 | -inf if Y is an odd integer, +inf otherwise |
| +inf | Y < 0 | 0 |
| +inf | Y > 0 | +inf |
| Finite value < 0 | Non-integer | Error |
| 0 | Finite value < 0 | Error |
**Return Data Type** -`INT64` +The return data type is determined by the argument types with the following +table. -**Example** + -```sql -SELECT - DATETIME "2010-07-07 10:20:00" as first_datetime, - DATETIME "2008-12-25 15:30:00" as second_datetime, - DATETIME_DIFF(DATETIME "2010-07-07 10:20:00", - DATETIME "2008-12-25 15:30:00", DAY) as difference; + + + + + + + + + + + + + + + -/*----------------------------+------------------------+------------------------* - | first_datetime | second_datetime | difference | - +----------------------------+------------------------+------------------------+ - | 2010-07-07 10:20:00 | 2008-12-25 15:30:00 | 559 | - *----------------------------+------------------------+------------------------*/ -``` +
| INPUT | INT32 | INT64 | UINT32 | UINT64 | NUMERIC | BIGNUMERIC | FLOAT | DOUBLE |
|-------|-------|-------|--------|--------|---------|------------|-------|--------|
| INT32 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| INT64 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| UINT32 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| UINT64 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| FLOAT | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
| DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
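For example, two cases drawn from the behavior table above:

```sql
SELECT
  POW(2.0, 3.0) AS cube,                          -- 8.0
  POW(1.0, CAST('NaN' AS DOUBLE)) AS one_to_nan;  -- 1.0 (1.0 raised to anything is 1.0)
```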
-```sql -SELECT - DATETIME_DIFF(DATETIME '2017-10-15 00:00:00', - DATETIME '2017-10-14 00:00:00', DAY) as days_diff, - DATETIME_DIFF(DATETIME '2017-10-15 00:00:00', - DATETIME '2017-10-14 00:00:00', WEEK) as weeks_diff; +### `POWER` -/*-----------+------------* - | days_diff | weeks_diff | - +-----------+------------+ - | 1 | 1 | - *-----------+------------*/ +``` +POWER(X, Y) ``` -The example above shows the result of `DATETIME_DIFF` for two `DATETIME`s that -are 24 hours apart. `DATETIME_DIFF` with the part `WEEK` returns 1 because -`DATETIME_DIFF` counts the number of part boundaries in this range of -`DATETIME`s. Each `WEEK` begins on Sunday, so there is one part boundary between -Saturday, `2017-10-14 00:00:00` and Sunday, `2017-10-15 00:00:00`. - -The following example shows the result of `DATETIME_DIFF` for two dates in -different years. `DATETIME_DIFF` with the date part `YEAR` returns 3 because it -counts the number of Gregorian calendar year boundaries between the two -`DATETIME`s. `DATETIME_DIFF` with the date part `ISOYEAR` returns 2 because the -second `DATETIME` belongs to the ISO year 2015. The first Thursday of the 2015 -calendar year was 2015-01-01, so the ISO year 2015 begins on the preceding -Monday, 2014-12-29. - -```sql -SELECT - DATETIME_DIFF('2017-12-30 00:00:00', - '2014-12-30 00:00:00', YEAR) AS year_diff, - DATETIME_DIFF('2017-12-30 00:00:00', - '2014-12-30 00:00:00', ISOYEAR) AS isoyear_diff; +**Description** -/*-----------+--------------* - | year_diff | isoyear_diff | - +-----------+--------------+ - | 3 | 2 | - *-----------+--------------*/ -``` +Synonym of [`POW(X, Y)`][pow]. -The following example shows the result of `DATETIME_DIFF` for two days in -succession. The first date falls on a Monday and the second date falls on a -Sunday. `DATETIME_DIFF` with the date part `WEEK` returns 0 because this time -part uses weeks that begin on Sunday. `DATETIME_DIFF` with the date part -`WEEK(MONDAY)` returns 1. 
`DATETIME_DIFF` with the date part -`ISOWEEK` also returns 1 because ISO weeks begin on Monday. +[pow]: #pow -```sql -SELECT - DATETIME_DIFF('2017-12-18', '2017-12-17', WEEK) AS week_diff, - DATETIME_DIFF('2017-12-18', '2017-12-17', WEEK(MONDAY)) AS week_weekday_diff, - DATETIME_DIFF('2017-12-18', '2017-12-17', ISOWEEK) AS isoweek_diff; +### `RAND` -/*-----------+-------------------+--------------* - | week_diff | week_weekday_diff | isoweek_diff | - +-----------+-------------------+--------------+ - | 0 | 1 | 1 | - *-----------+-------------------+--------------*/ +``` +RAND() ``` -[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 +**Description** -[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date +Generates a pseudo-random value of type `DOUBLE` in +the range of [0, 1), inclusive of 0 and exclusive of 1. -### `DATETIME_SUB` +### `RANGE_BUCKET` ```sql -DATETIME_SUB(datetime_expression, INTERVAL int64_expression part) +RANGE_BUCKET(point, boundaries_array) ``` **Description** -Subtracts `int64_expression` units of `part` from the `DATETIME`. +`RANGE_BUCKET` scans through a sorted array and returns the 0-based position +of the point's upper bound. This can be useful if you need to group your data to +build partitions, histograms, business-defined rules, and more. -`DATETIME_SUB` supports the following values for `part`: +`RANGE_BUCKET` follows these rules: -+ `NANOSECOND` - (if the SQL engine supports it) -+ `MICROSECOND` -+ `MILLISECOND` -+ `SECOND` -+ `MINUTE` -+ `HOUR` -+ `DAY` -+ `WEEK`. Equivalent to 7 `DAY`s. -+ `MONTH` -+ `QUARTER` -+ `YEAR` ++ If the point exists in the array, returns the index of the next larger value. -Special handling is required for `MONTH`, `QUARTER`, and `YEAR` parts when the -date is at (or near) the last day of the month. If the resulting month has fewer -days than the original `DATETIME`'s day, then the result day is the last day of -the new month. 
+ ```sql + RANGE_BUCKET(20, [0, 10, 20, 30, 40]) -- 3 is return value + RANGE_BUCKET(20, [0, 10, 20, 20, 40, 40]) -- 4 is return value + ``` ++ If the point does not exist in the array, but it falls between two values, + returns the index of the larger value. -**Return Data Type** + ```sql + RANGE_BUCKET(25, [0, 10, 20, 30, 40]) -- 3 is return value + ``` ++ If the point is smaller than the first value in the array, returns 0. -`DATETIME` + ```sql + RANGE_BUCKET(-10, [5, 10, 20, 30, 40]) -- 0 is return value + ``` ++ If the point is greater than or equal to the last value in the array, + returns the length of the array. -**Example** + ```sql + RANGE_BUCKET(80, [0, 10, 20, 30, 40]) -- 5 is return value + ``` ++ If the array is empty, returns 0. -```sql -SELECT - DATETIME "2008-12-25 15:30:00" as original_date, - DATETIME_SUB(DATETIME "2008-12-25 15:30:00", INTERVAL 10 MINUTE) as earlier; + ```sql + RANGE_BUCKET(80, []) -- 0 is return value + ``` ++ If the point is `NULL` or `NaN`, returns `NULL`. -/*-----------------------------+------------------------* - | original_date | earlier | - +-----------------------------+------------------------+ - | 2008-12-25 15:30:00 | 2008-12-25 15:20:00 | - *-----------------------------+------------------------*/ -``` - -### `DATETIME_TRUNC` + ```sql + RANGE_BUCKET(NULL, [0, 10, 20, 30, 40]) -- NULL is return value + ``` ++ The data type for the point and array must be compatible. -```sql -DATETIME_TRUNC(datetime_expression, date_time_part) -``` + ```sql + RANGE_BUCKET('a', ['a', 'b', 'c', 'd']) -- 1 is return value + RANGE_BUCKET(1.2, [1, 1.2, 1.4, 1.6]) -- 2 is return value + RANGE_BUCKET(1.2, [1, 2, 4, 6]) -- execution failure + ``` -**Description** +Execution failure occurs when: -Truncates a `DATETIME` value to the granularity of `date_time_part`. -The `DATETIME` value is always rounded to the beginning of `date_time_part`, -which can be one of the following: ++ The array has a `NaN` or `NULL` value in it. 
-+ `NANOSECOND`: If used, nothing is truncated from the value. -+ `MICROSECOND`: The nearest lessor or equal microsecond. -+ `MILLISECOND`: The nearest lessor or equal millisecond. -+ `SECOND`: The nearest lessor or equal second. -+ `MINUTE`: The nearest lessor or equal minute. -+ `HOUR`: The nearest lessor or equal hour. -+ `DAY`: The day in the Gregorian calendar year that contains the - `DATETIME` value. -+ `WEEK`: The first day of the week in the week that contains the - `DATETIME` value. Weeks begin on Sundays. `WEEK` is equivalent to - `WEEK(SUNDAY)`. -+ `WEEK(WEEKDAY)`: The first day of the week in the week that contains the - `DATETIME` value. Weeks begin on `WEEKDAY`. `WEEKDAY` must be one of the - following: `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, `FRIDAY`, - or `SATURDAY`. -+ `ISOWEEK`: The first day of the [ISO 8601 week][ISO-8601-week] in the - ISO week that contains the `DATETIME` value. The ISO week begins on - Monday. The first ISO week of each ISO year contains the first Thursday of the - corresponding Gregorian calendar year. -+ `MONTH`: The first day of the month in the month that contains the - `DATETIME` value. -+ `QUARTER`: The first day of the quarter in the quarter that contains the - `DATETIME` value. -+ `YEAR`: The first day of the year in the year that contains the - `DATETIME` value. -+ `ISOYEAR`: The first day of the [ISO 8601][ISO-8601] week-numbering year - in the ISO year that contains the `DATETIME` value. The ISO year is the - Monday of the first week whose Thursday belongs to the corresponding - Gregorian calendar year. + ```sql + RANGE_BUCKET(80, [NULL, 10, 20, 30, 40]) -- execution failure + ``` ++ The array is not sorted in ascending order. - + ```sql + RANGE_BUCKET(30, [10, 30, 20, 40, 50]) -- execution failure + ``` -[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 +**Parameters** -[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date ++ `point`: A generic value. 
++ `boundaries_array`: A generic array of values. - +Note: The data type for `point` and the element type of `boundaries_array` +must be equivalent. The data type must be [comparable][data-type-properties]. -**Return Data Type** +**Return Value** -`DATETIME` +`INT64` **Examples** -```sql -SELECT - DATETIME "2008-12-25 15:30:00" as original, - DATETIME_TRUNC(DATETIME "2008-12-25 15:30:00", DAY) as truncated; - -/*----------------------------+------------------------* - | original | truncated | - +----------------------------+------------------------+ - | 2008-12-25 15:30:00 | 2008-12-25 00:00:00 | - *----------------------------+------------------------*/ -``` +In a table called `students`, check to see how many records would +exist in each `age_group` bucket, based on a student's age: -In the following example, the original `DATETIME` falls on a Sunday. Because the -`part` is `WEEK(MONDAY)`, `DATE_TRUNC` returns the `DATETIME` for the -preceding Monday. ++ age_group 0 (age < 10) ++ age_group 1 (age >= 10, age < 20) ++ age_group 2 (age >= 20, age < 30) ++ age_group 3 (age >= 30) ```sql -SELECT - datetime AS original, - DATETIME_TRUNC(datetime, WEEK(MONDAY)) AS truncated -FROM (SELECT DATETIME(TIMESTAMP "2017-11-05 00:00:00+00", "UTC") AS datetime); +WITH students AS +( + SELECT 9 AS age UNION ALL + SELECT 20 AS age UNION ALL + SELECT 25 AS age UNION ALL + SELECT 31 AS age UNION ALL + SELECT 32 AS age UNION ALL + SELECT 33 AS age +) +SELECT RANGE_BUCKET(age, [10, 20, 30]) AS age_group, COUNT(*) AS count +FROM students +GROUP BY 1 -/*---------------------+---------------------* - | original | truncated | - +---------------------+---------------------+ - | 2017-11-05 00:00:00 | 2017-10-30 00:00:00 | - *---------------------+---------------------*/ +/*--------------+-------* + | age_group | count | + +--------------+-------+ + | 0 | 1 | + | 2 | 2 | + | 3 | 3 | + *--------------+-------*/ ``` -In the following example, the original `datetime_expression` is in the 
Gregorian -calendar year 2015. However, `DATETIME_TRUNC` with the `ISOYEAR` date part -truncates the `datetime_expression` to the beginning of the ISO year, not the -Gregorian calendar year. The first Thursday of the 2015 calendar year was -2015-01-01, so the ISO year 2015 begins on the preceding Monday, 2014-12-29. -Therefore the ISO year boundary preceding the `datetime_expression` -2015-06-15 00:00:00 is 2014-12-29. +[data-type-properties]: https://github.com/google/zetasql/blob/master/docs/data-types.md#data_type_properties -```sql -SELECT - DATETIME_TRUNC('2015-06-15 00:00:00', ISOYEAR) AS isoyear_boundary, - EXTRACT(ISOYEAR FROM DATETIME '2015-06-15 00:00:00') AS isoyear_number; +### `ROUND` -/*---------------------+----------------* - | isoyear_boundary | isoyear_number | - +---------------------+----------------+ - | 2014-12-29 00:00:00 | 2015 | - *---------------------+----------------*/ ``` - -### `EXTRACT` - -```sql -EXTRACT(part FROM datetime_expression) +ROUND(X [, N]) ``` **Description** -Returns a value that corresponds to the -specified `part` from a supplied `datetime_expression`. - -Allowed `part` values are: - -+ `NANOSECOND` - (if the SQL engine supports it) -+ `MICROSECOND` -+ `MILLISECOND` -+ `SECOND` -+ `MINUTE` -+ `HOUR` -+ `DAYOFWEEK`: Returns values in the range [1,7] with Sunday as the first day of - of the week. -+ `DAY` -+ `DAYOFYEAR` -+ `WEEK`: Returns the week number of the date in the range [0, 53]. Weeks begin - with Sunday, and dates prior to the first Sunday of the year are in week 0. -+ `WEEK()`: Returns the week number of `datetime_expression` in the - range [0, 53]. Weeks begin on `WEEKDAY`. - `datetime`s prior to the first `WEEKDAY` of the year are in week 0. Valid - values for `WEEKDAY` are `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, - `THURSDAY`, `FRIDAY`, and `SATURDAY`. -+ `ISOWEEK`: Returns the [ISO 8601 week][ISO-8601-week] - number of the `datetime_expression`. `ISOWEEK`s begin on Monday. 
Return values - are in the range [1, 53]. The first `ISOWEEK` of each ISO year begins on the - Monday before the first Thursday of the Gregorian calendar year. -+ `MONTH` -+ `QUARTER` -+ `YEAR` -+ `ISOYEAR`: Returns the [ISO 8601][ISO-8601] - week-numbering year, which is the Gregorian calendar year containing the - Thursday of the week to which `date_expression` belongs. -+ `DATE` -+ `TIME` +If only X is present, rounds X to the nearest integer. If N is present, +rounds X to N decimal places after the decimal point. If N is negative, +rounds off digits to the left of the decimal point. Rounds halfway cases +away from zero. Generates an error if overflow occurs. -Returned values truncate lower order time periods. For example, when extracting -seconds, `EXTRACT` truncates the millisecond and microsecond values. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Expression | Return Value |
|------------|--------------|
| ROUND(2.0) | 2.0 |
| ROUND(2.3) | 2.0 |
| ROUND(2.8) | 3.0 |
| ROUND(2.5) | 3.0 |
| ROUND(-2.3) | -2.0 |
| ROUND(-2.8) | -3.0 |
| ROUND(-2.5) | -3.0 |
| ROUND(0) | 0 |
| ROUND(+inf) | +inf |
| ROUND(-inf) | -inf |
| ROUND(NaN) | NaN |
| ROUND(123.7, -1) | 120.0 |
| ROUND(1.235, 2) | 1.24 |
**Return Data Type** -`INT64`, except in the following cases: - -+ If `part` is `DATE`, returns a `DATE` object. -+ If `part` is `TIME`, returns a `TIME` object. + -**Examples** + + + + + + + + -In the following example, `EXTRACT` returns a value corresponding to the `HOUR` -time part. +
| INPUT  | INT32  | INT64  | UINT32 | UINT64 | NUMERIC | BIGNUMERIC | FLOAT  | DOUBLE |
|--------|--------|--------|--------|--------|---------|------------|--------|--------|
| OUTPUT | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
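The two rules most likely to surprise, in query form (values taken from the table above):

```sql
SELECT
  ROUND(2.5) AS half_away,       -- 3.0 (halfway cases round away from zero)
  ROUND(-2.5) AS neg_half_away,  -- -3.0
  ROUND(123.7, -1) AS tens;      -- 120.0 (negative N rounds left of the decimal point)
```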
-```sql -SELECT EXTRACT(HOUR FROM DATETIME(2008, 12, 25, 15, 30, 00)) as hour; +### `SAFE_ADD` -/*------------------* - | hour | - +------------------+ - | 15 | - *------------------*/ +``` +SAFE_ADD(X, Y) ``` -In the following example, `EXTRACT` returns values corresponding to different -time parts from a column of datetimes. - -```sql -WITH Datetimes AS ( - SELECT DATETIME '2005-01-03 12:34:56' AS datetime UNION ALL - SELECT DATETIME '2007-12-31' UNION ALL - SELECT DATETIME '2009-01-01' UNION ALL - SELECT DATETIME '2009-12-31' UNION ALL - SELECT DATETIME '2017-01-02' UNION ALL - SELECT DATETIME '2017-05-26' -) -SELECT - datetime, - EXTRACT(ISOYEAR FROM datetime) AS isoyear, - EXTRACT(ISOWEEK FROM datetime) AS isoweek, - EXTRACT(YEAR FROM datetime) AS year, - EXTRACT(WEEK FROM datetime) AS week -FROM Datetimes -ORDER BY datetime; +**Description** -/*---------------------+---------+---------+------+------* - | datetime | isoyear | isoweek | year | week | - +---------------------+---------+---------+------+------+ - | 2005-01-03 12:34:56 | 2005 | 1 | 2005 | 1 | - | 2007-12-31 00:00:00 | 2008 | 1 | 2007 | 52 | - | 2009-01-01 00:00:00 | 2009 | 1 | 2009 | 0 | - | 2009-12-31 00:00:00 | 2009 | 53 | 2009 | 52 | - | 2017-01-02 00:00:00 | 2017 | 1 | 2017 | 1 | - | 2017-05-26 00:00:00 | 2017 | 21 | 2017 | 21 | - *---------------------+---------+---------+------+------*/ -``` +Equivalent to the addition operator (`+`), but returns +`NULL` if overflow occurs. -In the following example, `datetime_expression` falls on a Sunday. `EXTRACT` -calculates the first column using weeks that begin on Sunday, and it calculates -the second column using weeks that begin on Monday. + + + + + + + + + + + + + + + +
| X | Y | SAFE_ADD(X, Y) |
|---|---|----------------|
| 5 | 4 | 9              |
-```sql -WITH table AS (SELECT DATETIME(TIMESTAMP "2017-11-05 00:00:00+00", "UTC") AS datetime) -SELECT - datetime, - EXTRACT(WEEK(SUNDAY) FROM datetime) AS week_sunday, - EXTRACT(WEEK(MONDAY) FROM datetime) AS week_monday -FROM table; +**Return Data Type** -/*---------------------+-------------+---------------* - | datetime | week_sunday | week_monday | - +---------------------+-------------+---------------+ - | 2017-11-05 00:00:00 | 45 | 44 | - *---------------------+-------------+---------------*/ -``` + -[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 + + + + + + + + + + + + + + + -[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date +
| INPUT | INT32 | INT64 | UINT32 | UINT64 | NUMERIC | BIGNUMERIC | FLOAT | DOUBLE |
|-------|-------|-------|--------|--------|---------|------------|-------|--------|
| INT32 | INT64 | INT64 | INT64 | ERROR | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| INT64 | INT64 | INT64 | INT64 | ERROR | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| UINT32 | INT64 | INT64 | UINT64 | UINT64 | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| UINT64 | ERROR | ERROR | UINT64 | UINT64 | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| FLOAT | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
| DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
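A short sketch of the overflow behavior (the second addend pushes an `INT64` past its maximum):

```sql
SELECT
  SAFE_ADD(5, 4) AS ok,                            -- 9
  SAFE_ADD(9223372036854775807, 1) AS overflowed;  -- NULL (INT64 overflow)
```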
-### `FORMAT_DATETIME` +### `SAFE_DIVIDE` -```sql -FORMAT_DATETIME(format_string, datetime_expression) +``` +SAFE_DIVIDE(X, Y) ``` **Description** -Formats a `DATETIME` object according to the specified `format_string`. See -[Supported Format Elements For DATETIME][datetime-format-elements] -for a list of format elements that this function supports. - -**Return Data Type** - -`STRING` +Equivalent to the division operator (`X / Y`), but returns +`NULL` if an error occurs, such as a division by zero error. -**Examples** + + + + + + + + + + + + + + + + + + + + + + + + + +
| X  | Y  | SAFE_DIVIDE(X, Y) |
|----|----|-------------------|
| 20 | 4  | 5                 |
| 0  | 20 | 0                 |
| 20 | 0  | NULL              |
-```sql -SELECT - FORMAT_DATETIME("%c", DATETIME "2008-12-25 15:30:00") - AS formatted; +**Return Data Type** -/*--------------------------* - | formatted | - +--------------------------+ - | Thu Dec 25 15:30:00 2008 | - *--------------------------*/ -``` + -```sql -SELECT - FORMAT_DATETIME("%b-%d-%Y", DATETIME "2008-12-25 15:30:00") - AS formatted; + + + + + + + + + + + + + + + -/*-------------* - | formatted | - +-------------+ - | Dec-25-2008 | - *-------------*/ -``` +
| INPUT | INT32 | INT64 | UINT32 | UINT64 | NUMERIC | BIGNUMERIC | FLOAT | DOUBLE |
|-------|-------|-------|--------|--------|---------|------------|-------|--------|
| INT32 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| INT64 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| UINT32 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| UINT64 | DOUBLE | DOUBLE | DOUBLE | DOUBLE | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| FLOAT | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
| DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
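For example (per the return-type table above, integer inputs yield a `DOUBLE`):

```sql
SELECT
  SAFE_DIVIDE(20, 4) AS ok,           -- 5 (as a DOUBLE)
  SAFE_DIVIDE(20, 0) AS div_by_zero;  -- NULL instead of a division-by-zero error
```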
-```sql -SELECT - FORMAT_DATETIME("%b %Y", DATETIME "2008-12-25 15:30:00") - AS formatted; +### `SAFE_MULTIPLY` -/*-------------* - | formatted | - +-------------+ - | Dec 2008 | - *-------------*/ ``` - -[datetime-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time - -### `LAST_DAY` - -```sql -LAST_DAY(datetime_expression[, date_part]) +SAFE_MULTIPLY(X, Y) ``` **Description** -Returns the last day from a datetime expression that contains the date. -This is commonly used to return the last day of the month. - -You can optionally specify the date part for which the last day is returned. -If this parameter is not used, the default value is `MONTH`. -`LAST_DAY` supports the following values for `date_part`: +Equivalent to the multiplication operator (`*`), but returns +`NULL` if overflow occurs. -+ `YEAR` -+ `QUARTER` -+ `MONTH` -+ `WEEK`. Equivalent to 7 `DAY`s. -+ `WEEK()`. `` represents the starting day of the week. - Valid values are `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, - `FRIDAY`, and `SATURDAY`. -+ `ISOWEEK`. Uses [ISO 8601][ISO-8601-week] week boundaries. ISO weeks begin - on Monday. -+ `ISOYEAR`. Uses the [ISO 8601][ISO-8601] week-numbering year boundary. - The ISO year boundary is the Monday of the first week whose Thursday belongs - to the corresponding Gregorian calendar year. + + + + + + + + + + + + + + + +
| X  | Y | SAFE_MULTIPLY(X, Y) |
|----|---|---------------------|
| 20 | 4 | 80                  |
**Return Data Type** -`DATE` + -**Example** + + + + + + + + + + + + + + + -These both return the last day of the month: +
| INPUT | INT32 | INT64 | UINT32 | UINT64 | NUMERIC | BIGNUMERIC | FLOAT | DOUBLE |
|-------|-------|-------|--------|--------|---------|------------|-------|--------|
| INT32 | INT64 | INT64 | INT64 | ERROR | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| INT64 | INT64 | INT64 | INT64 | ERROR | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| UINT32 | INT64 | INT64 | UINT64 | UINT64 | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| UINT64 | ERROR | ERROR | UINT64 | UINT64 | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | DOUBLE | DOUBLE |
| FLOAT | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
| DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
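A short sketch of the overflow behavior (doubling the `INT64` maximum overflows):

```sql
SELECT
  SAFE_MULTIPLY(20, 4) AS ok,                           -- 80
  SAFE_MULTIPLY(9223372036854775807, 2) AS overflowed;  -- NULL (INT64 overflow)
```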
-```sql -SELECT LAST_DAY(DATETIME '2008-11-25', MONTH) AS last_day +### `SAFE_NEGATE` -/*------------* - | last_day | - +------------+ - | 2008-11-30 | - *------------*/ ``` - -```sql -SELECT LAST_DAY(DATETIME '2008-11-25') AS last_day - -/*------------* - | last_day | - +------------+ - | 2008-11-30 | - *------------*/ +SAFE_NEGATE(X) ``` -This returns the last day of the year: - -```sql -SELECT LAST_DAY(DATETIME '2008-11-25 15:30:00', YEAR) AS last_day - -/*------------* - | last_day | - +------------+ - | 2008-12-31 | - *------------*/ -``` +**Description** -This returns the last day of the week for a week that starts on a Sunday: +Equivalent to the unary minus operator (`-`), but returns +`NULL` if overflow occurs. -```sql -SELECT LAST_DAY(DATETIME '2008-11-10 15:30:00', WEEK(SUNDAY)) AS last_day - -/*------------* - | last_day | - +------------+ - | 2008-11-15 | - *------------*/ -``` - -This returns the last day of the week for a week that starts on a Monday: - -```sql -SELECT LAST_DAY(DATETIME '2008-11-10 15:30:00', WEEK(MONDAY)) AS last_day - -/*------------* - | last_day | - +------------+ - | 2008-11-16 | - *------------*/ -``` - -[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 - -[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date - -### `PARSE_DATETIME` - -```sql -PARSE_DATETIME(format_string, datetime_string) -``` -**Description** - -Converts a [string representation of a datetime][datetime-format] to a -`DATETIME` object. - -`format_string` contains the [format elements][datetime-format-elements] -that define how `datetime_string` is formatted. Each element in -`datetime_string` must have a corresponding element in `format_string`. The -location of each element in `format_string` must match the location of -each element in `datetime_string`. - -```sql --- This works because elements on both sides match. 
-SELECT PARSE_DATETIME("%a %b %e %I:%M:%S %Y", "Thu Dec 25 07:30:00 2008") - --- This produces an error because the year element is in different locations. -SELECT PARSE_DATETIME("%a %b %e %Y %I:%M:%S", "Thu Dec 25 07:30:00 2008") - --- This produces an error because one of the year elements is missing. -SELECT PARSE_DATETIME("%a %b %e %I:%M:%S", "Thu Dec 25 07:30:00 2008") - --- This works because %c can find all matching elements in datetime_string. -SELECT PARSE_DATETIME("%c", "Thu Dec 25 07:30:00 2008") -``` - -`PARSE_DATETIME` parses `string` according to the following rules: - -+ **Unspecified fields.** Any unspecified field is initialized from - `1970-01-01 00:00:00.0`. For example, if the year is unspecified then it - defaults to `1970`. -+ **Case insensitivity.** Names, such as `Monday` and `February`, - are case insensitive. -+ **Whitespace.** One or more consecutive white spaces in the format string - matches zero or more consecutive white spaces in the - `DATETIME` string. Leading and trailing - white spaces in the `DATETIME` string are always - allowed, even if they are not in the format string. -+ **Format precedence.** When two or more format elements have overlapping - information, the last one generally overrides any earlier ones, with some - exceptions. For example, both `%F` and `%Y` affect the year, so the earlier - element overrides the later. See the descriptions - of `%s`, `%C`, and `%y` in - [Supported Format Elements For DATETIME][datetime-format-elements]. -+ **Format divergence.** `%p` can be used with `am`, `AM`, `pm`, and `PM`. + + + + + + + + + + + + + + + + + + + + + +
XSAFE_NEGATE(X)
+1-1
-1+1
00
**Return Data Type** -`DATETIME` - -**Examples** - -The following examples parse a `STRING` literal as a -`DATETIME`. + -```sql -SELECT PARSE_DATETIME('%Y-%m-%d %H:%M:%S', '1998-10-18 13:45:55') AS datetime; + + + + + + + + -/*---------------------* - | datetime | - +---------------------+ - | 1998-10-18 13:45:55 | - *---------------------*/ -``` +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
OUTPUTINT32INT64ERRORERRORNUMERICBIGNUMERICFLOATDOUBLE
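**Example**

A minimal illustrative query (not from the upstream reference); negating a value that would overflow its type, such as the minimum `INT64` value, would return `NULL` instead of an error:

```sql
SELECT SAFE_NEGATE(25) AS a, SAFE_NEGATE(-25) AS b;

/*-----+----*
 | a   | b  |
 +-----+----+
 | -25 | 25 |
 *-----+----*/
```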
-```sql -SELECT PARSE_DATETIME('%m/%d/%Y %I:%M:%S %p', '8/30/2018 2:23:38 pm') AS datetime +### `SAFE_SUBTRACT` -/*---------------------* - | datetime | - +---------------------+ - | 2018-08-30 14:23:38 | - *---------------------*/ ``` - -The following example parses a `STRING` literal -containing a date in a natural language format as a -`DATETIME`. - -```sql -SELECT PARSE_DATETIME('%A, %B %e, %Y','Wednesday, December 19, 2018') - AS datetime; - -/*---------------------* - | datetime | - +---------------------+ - | 2018-12-19 00:00:00 | - *---------------------*/ +SAFE_SUBTRACT(X, Y) ``` -[datetime-format]: #format_datetime - -[datetime-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time - -[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 - -## Time functions - -ZetaSQL supports the following time functions. +**Description** -### Function list +Returns the result of Y subtracted from X. +Equivalent to the subtraction operator (`-`), but returns +`NULL` if overflow occurs. - - + + + + + + + + + +
NameSummaryXYSAFE_SUBTRACT(X, Y)
541
- - CURRENT_TIME - - - - Returns the current time as a TIME value. - - - - - EXTRACT +**Return Data Type** - - - Extracts part of a TIME value. - - + + - - + + + + + + + + + + + + - - - - +
FORMAT_TIME - - - Formats a TIME value according to the specified format string. - INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
INT32INT64INT64INT64ERRORNUMERICBIGNUMERICDOUBLEDOUBLE
INT64INT64INT64INT64ERRORNUMERICBIGNUMERICDOUBLEDOUBLE
UINT32INT64INT64INT64INT64NUMERICBIGNUMERICDOUBLEDOUBLE
UINT64ERRORERRORINT64INT64NUMERICBIGNUMERICDOUBLEDOUBLE
NUMERICNUMERICNUMERICNUMERICNUMERICNUMERICBIGNUMERICDOUBLEDOUBLE
BIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICBIGNUMERICDOUBLEDOUBLE
FLOATDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLE
DOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLEDOUBLE
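**Example**

A minimal illustrative query (not from the upstream reference); if the subtraction overflowed, the result would be `NULL` rather than an error:

```sql
SELECT SAFE_SUBTRACT(5, 4) AS a, SAFE_SUBTRACT(4, 5) AS b;

/*---+----*
 | a | b  |
 +---+----+
 | 1 | -1 |
 *---+----*/
```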
PARSE_TIME - - - Converts a STRING value to a TIME value. -
- - TIME +### `SEC` - - - Constructs a TIME value. - - +``` +SEC(X) +``` - - TIME_ADD +**Description** - - - Adds a specified time interval to a TIME value. - - +Computes the secant for the angle of `X`, where `X` is specified in radians. +`X` can be any data type +that [coerces to `DOUBLE`][conversion-rules]. - - TIME_DIFF + + + + + + + + + + + + + + + + + + + + + + + + + +
XSEC(X)
+infNaN
-infNaN
NaNNaN
NULLNULL
- - - Gets the number of intervals between two TIME values. - - +**Return Data Type** - - TIME_SUB +`DOUBLE` - - - Subtracts a specified time interval from a TIME value. - - +**Example** - - TIME_TRUNC +```sql +SELECT SEC(100) AS a, SEC(-1) AS b; - - - Truncates a TIME value. - - +/*----------------+---------------* + | a | b | + +----------------+---------------+ + | 1.159663822905 | 1.85081571768 | + *----------------+---------------*/ +``` - - +[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#conversion_rules -### `CURRENT_TIME` +### `SECH` -```sql -CURRENT_TIME([time_zone]) ``` - -```sql -CURRENT_TIME +SECH(X) ``` **Description** -Returns the current time as a `TIME` object. Parentheses are optional when -called with no arguments. - -This function supports an optional `time_zone` parameter. -See [Time zone definitions][time-link-to-timezone-definitions] for information -on how to specify a time zone. +Computes the hyperbolic secant for the angle of `X`, where `X` is specified +in radians. `X` can be any data type +that [coerces to `DOUBLE`][conversion-rules]. +Never produces an error. -The current time is recorded at the start of the query -statement which contains this function, not when this specific function is -evaluated. + + + + + + + + + + + + + + + + + + + + + + + + + +
XSECH(X)
+inf0
-inf0
NaNNaN
NULLNULL
**Return Data Type** -`TIME` +`DOUBLE` **Example** ```sql -SELECT CURRENT_TIME() as now; +SELECT SECH(0.5) AS a, SECH(-2) AS b, SECH(100) AS c; -/*----------------------------* - | now | - +----------------------------+ - | 15:31:38.776361 | - *----------------------------*/ +/*----------------+----------------+---------------------* + | a | b | c | + +----------------+----------------+---------------------+ + | 0.88681888397 | 0.265802228834 | 7.4401519520417E-44 | + *----------------+----------------+---------------------*/ ``` -When a column named `current_time` is present, the column name and the function -call without parentheses are ambiguous. To ensure the function call, add -parentheses; to ensure the column name, qualify it with its -[range variable][time-functions-link-to-range-variables]. For example, the -following query will select the function in the `now` column and the table -column in the `current_time` column. +[conversion-rules]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#conversion_rules -```sql -WITH t AS (SELECT 'column value' AS `current_time`) -SELECT current_time() as now, t.current_time FROM t; +### `SIGN` -/*-----------------+--------------* - | now | current_time | - +-----------------+--------------+ - | 15:31:38.776361 | column value | - *-----------------+--------------*/ ``` - -[time-functions-link-to-range-variables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#range_variables - -[time-link-to-timezone-definitions]: #timezone_definitions - -### `EXTRACT` - -```sql -EXTRACT(part FROM time_expression) +SIGN(X) ``` **Description** -Returns a value that corresponds to the specified `part` from -a supplied `time_expression`. - -Allowed `part` values are: - -+ `NANOSECOND` - (if the SQL engine supports it) -+ `MICROSECOND` -+ `MILLISECOND` -+ `SECOND` -+ `MINUTE` -+ `HOUR` +Returns `-1`, `0`, or `+1` for negative, zero and positive arguments +respectively. 
For floating point arguments, this function does not distinguish +between positive and negative zero. -Returned values truncate lower order time periods. For example, when extracting -seconds, `EXTRACT` truncates the millisecond and microsecond values. + + + + + + + + + + + + + + + + + + + + + + + + + +
XSIGN(X)
25+1
00
-25-1
NaNNaN
**Return Data Type** -`INT64` + -**Example** + + + + + + + + -In the following example, `EXTRACT` returns a value corresponding to the `HOUR` -time part. +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
OUTPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
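**Example**

An illustrative query (not from the upstream reference) showing the three possible results for numeric input:

```sql
SELECT SIGN(25) AS a, SIGN(0) AS b, SIGN(-25) AS c;

/*---+---+----*
 | a | b | c  |
 +---+---+----+
 | 1 | 0 | -1 |
 *---+---+----*/
```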
-```sql -SELECT EXTRACT(HOUR FROM TIME "15:30:00") as hour; +### `SIN` -/*------------------* - | hour | - +------------------+ - | 15 | - *------------------*/ ``` - -### `FORMAT_TIME` - -```sql -FORMAT_TIME(format_string, time_object) +SIN(X) ``` **Description** -Formats a `TIME` object according to the specified `format_string`. See -[Supported Format Elements For TIME][time-format-elements] -for a list of format elements that this function supports. - -**Return Data Type** -`STRING` +Computes the sine of X where X is specified in radians. Never fails. -**Example** + + + + + + + + + + + + + + + + + + + + + +
XSIN(X)
+infNaN
-infNaN
NaNNaN
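**Example**

An illustrative query (not from the upstream reference; results rounded to the precision shown):

```sql
SELECT SIN(0) AS a, SIN(1) AS b;

/*---+----------------*
 | a | b              |
 +---+----------------+
 | 0 | 0.841470984808 |
 *---+----------------*/
```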
-```sql -SELECT FORMAT_TIME("%R", TIME "15:30:00") as formatted_time; +### `SINH` -/*----------------* - | formatted_time | - +----------------+ - | 15:30 | - *----------------*/ ``` - -[time-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time - -### `PARSE_TIME` - -```sql -PARSE_TIME(format_string, time_string) +SINH(X) ``` **Description** -Converts a [string representation of time][time-format] to a -`TIME` object. - -`format_string` contains the [format elements][time-format-elements] -that define how `time_string` is formatted. Each element in -`time_string` must have a corresponding element in `format_string`. The -location of each element in `format_string` must match the location of -each element in `time_string`. +Computes the hyperbolic sine of X where X is specified in radians. Generates +an error if overflow occurs. -```sql --- This works because elements on both sides match. -SELECT PARSE_TIME("%I:%M:%S", "07:30:00") - --- This produces an error because the seconds element is in different locations. -SELECT PARSE_TIME("%S:%I:%M", "07:30:00") - --- This produces an error because one of the seconds elements is missing. -SELECT PARSE_TIME("%I:%M", "07:30:00") - --- This works because %T can find all matching elements in time_string. -SELECT PARSE_TIME("%T", "07:30:00") -``` - -When using `PARSE_TIME`, keep the following in mind: - -+ **Unspecified fields.** Any unspecified field is initialized from - `00:00:00.0`. For instance, if `seconds` is unspecified then it - defaults to `00`, and so on. -+ **Whitespace.** One or more consecutive white spaces in the format string - matches zero or more consecutive white spaces in the `TIME` string. In - addition, leading and trailing white spaces in the `TIME` string are always - allowed, even if they are not in the format string. 
-+ **Format precedence.** When two (or more) format elements have overlapping - information, the last one generally overrides any earlier ones. -+ **Format divergence.** `%p` can be used with `am`, `AM`, `pm`, and `PM`. - -**Return Data Type** - -`TIME` - -**Example** - -```sql -SELECT PARSE_TIME("%H", "15") as parsed_time; - -/*-------------* - | parsed_time | - +-------------+ - | 15:00:00 | - *-------------*/ -``` - -```sql -SELECT PARSE_TIME('%I:%M:%S %p', '2:23:38 pm') AS parsed_time - -/*-------------* - | parsed_time | - +-------------+ - | 14:23:38 | - *-------------*/ -``` - -[time-format]: #format_time - -[time-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time - -### `TIME` - -```sql -1. TIME(hour, minute, second) -2. TIME(timestamp, [time_zone]) -3. TIME(datetime) -``` - -**Description** - -1. Constructs a `TIME` object using `INT64` - values representing the hour, minute, and second. -2. Constructs a `TIME` object using a `TIMESTAMP` object. It supports an - optional - parameter to [specify a time zone][time-link-to-timezone-definitions]. If no - time zone is specified, the default time zone, which is implementation defined, is - used. -3. Constructs a `TIME` object using a - `DATETIME` object. - -**Return Data Type** - -`TIME` - -**Example** - -```sql -SELECT - TIME(15, 30, 00) as time_hms, - TIME(TIMESTAMP "2008-12-25 15:30:00+08", "America/Los_Angeles") as time_tstz; - -/*----------+-----------* - | time_hms | time_tstz | - +----------+-----------+ - | 15:30:00 | 23:30:00 | - *----------+-----------*/ -``` + + + + + + + + + + + + + + + + + + + + + +
XSINH(X)
+inf+inf
-inf-inf
NaNNaN
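**Example**

An illustrative query (not from the upstream reference; results rounded to the precision shown):

```sql
SELECT SINH(0) AS a, SINH(1) AS b;

/*---+---------------*
 | a | b             |
 +---+---------------+
 | 0 | 1.17520119364 |
 *---+---------------*/
```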
-```sql -SELECT TIME(DATETIME "2008-12-25 15:30:00.000000") AS time_dt; +### `SQRT` -/*----------* - | time_dt | - +----------+ - | 15:30:00 | - *----------*/ ``` - -[time-link-to-timezone-definitions]: #timezone_definitions - -### `TIME_ADD` - -```sql -TIME_ADD(time_expression, INTERVAL int64_expression part) +SQRT(X) ``` **Description** -Adds `int64_expression` units of `part` to the `TIME` object. - -`TIME_ADD` supports the following values for `part`: +Computes the square root of X. Generates an error if X is less than 0. -+ `NANOSECOND` - (if the SQL engine supports it) -+ `MICROSECOND` -+ `MILLISECOND` -+ `SECOND` -+ `MINUTE` -+ `HOUR` + + + + + + + + + + + + + + + + + + + + + +
XSQRT(X)
25.05.0
+inf+inf
X < 0Error
-This function automatically adjusts when values fall outside of the 00:00:00 to -24:00:00 boundary. For example, if you add an hour to `23:30:00`, the returned -value is `00:30:00`. +**Return Data Type** -**Return Data Types** + -`TIME` + + + + + + + + -**Example** +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
OUTPUTDOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
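**Example**

An illustrative query (not from the upstream reference; results rounded to the precision shown). `SQRT(-1)` would instead raise an error:

```sql
SELECT SQRT(25) AS a, SQRT(2) AS b;

/*-----+---------------*
 | a   | b             |
 +-----+---------------+
 | 5.0 | 1.41421356237 |
 *-----+---------------*/
```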
-```sql -SELECT - TIME "15:30:00" as original_time, - TIME_ADD(TIME "15:30:00", INTERVAL 10 MINUTE) as later; +### `TAN` -/*-----------------------------+------------------------* - | original_time | later | - +-----------------------------+------------------------+ - | 15:30:00 | 15:40:00 | - *-----------------------------+------------------------*/ ``` - -### `TIME_DIFF` - -```sql -TIME_DIFF(time_expression_a, time_expression_b, part) +TAN(X) ``` **Description** -Returns the whole number of specified `part` intervals between two -`TIME` objects (`time_expression_a` - `time_expression_b`). If the first -`TIME` is earlier than the second one, the output is negative. Throws an error -if the computation overflows the result type, such as if the difference in -nanoseconds -between the two `TIME` objects would overflow an -`INT64` value. - -`TIME_DIFF` supports the following values for `part`: - -+ `NANOSECOND` - (if the SQL engine supports it) -+ `MICROSECOND` -+ `MILLISECOND` -+ `SECOND` -+ `MINUTE` -+ `HOUR` - -**Return Data Type** - -`INT64` +Computes the tangent of X where X is specified in radians. Generates an error if +overflow occurs. -**Example** + + + + + + + + + + + + + + + + + + + + + +
XTAN(X)
+infNaN
-infNaN
NaNNaN
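**Example**

An illustrative query (not from the upstream reference; results rounded to the precision shown):

```sql
SELECT TAN(0) AS a, TAN(1) AS b;

/*---+---------------*
 | a | b             |
 +---+---------------+
 | 0 | 1.55740772465 |
 *---+---------------*/
```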
-```sql -SELECT - TIME "15:30:00" as first_time, - TIME "14:35:00" as second_time, - TIME_DIFF(TIME "15:30:00", TIME "14:35:00", MINUTE) as difference; +### `TANH` -/*----------------------------+------------------------+------------------------* - | first_time | second_time | difference | - +----------------------------+------------------------+------------------------+ - | 15:30:00 | 14:35:00 | 55 | - *----------------------------+------------------------+------------------------*/ ``` - -### `TIME_SUB` - -```sql -TIME_SUB(time_expression, INTERVAL int64_expression part) +TANH(X) ``` **Description** -Subtracts `int64_expression` units of `part` from the `TIME` object. - -`TIME_SUB` supports the following values for `part`: - -+ `NANOSECOND` - (if the SQL engine supports it) -+ `MICROSECOND` -+ `MILLISECOND` -+ `SECOND` -+ `MINUTE` -+ `HOUR` - -This function automatically adjusts when values fall outside of the 00:00:00 to -24:00:00 boundary. For example, if you subtract an hour from `00:30:00`, the -returned value is `23:30:00`. - -**Return Data Type** - -`TIME` +Computes the hyperbolic tangent of X where X is specified in radians. Does not +fail. -**Example** + + + + + + + + + + + + + + + + + + + + + +
XTANH(X)
+inf1.0
-inf-1.0
NaNNaN
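**Example**

An illustrative query (not from the upstream reference; results rounded to the precision shown):

```sql
SELECT TANH(0) AS a, TANH(1) AS b;

/*---+----------------*
 | a | b              |
 +---+----------------+
 | 0 | 0.761594155956 |
 *---+----------------*/
```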
-```sql -SELECT - TIME "15:30:00" as original_date, - TIME_SUB(TIME "15:30:00", INTERVAL 10 MINUTE) as earlier; +### `TRUNC` -/*-----------------------------+------------------------* - | original_date | earlier | - +-----------------------------+------------------------+ - | 15:30:00 | 15:20:00 | - *-----------------------------+------------------------*/ ``` - -### `TIME_TRUNC` - -```sql -TIME_TRUNC(time_expression, time_part) +TRUNC(X [, N]) ``` **Description** -Truncates a `TIME` value to the granularity of `time_part`. The `TIME` value -is always rounded to the beginning of `time_part`, which can be one of the -following: +If only X is present, `TRUNC` rounds X to the nearest integer whose absolute +value is not greater than the absolute value of X. If N is also present, `TRUNC` +behaves like `ROUND(X, N)`, but always rounds towards zero and never overflows. -+ `NANOSECOND`: If used, nothing is truncated from the value. -+ `MICROSECOND`: The nearest lessor or equal microsecond. -+ `MILLISECOND`: The nearest lessor or equal millisecond. -+ `SECOND`: The nearest lessor or equal second. -+ `MINUTE`: The nearest lessor or equal minute. -+ `HOUR`: The nearest lessor or equal hour. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
XTRUNC(X)
2.02.0
2.32.0
2.82.0
2.52.0
-2.3-2.0
-2.8-2.0
-2.5-2.0
00
+inf+inf
-inf-inf
NaNNaN
**Return Data Type** -`TIME` - -**Example** - -```sql -SELECT - TIME "15:30:00" as original, - TIME_TRUNC(TIME "15:30:00", HOUR) as truncated; + -/*----------------------------+------------------------* - | original | truncated | - +----------------------------+------------------------+ - | 15:30:00 | 15:00:00 | - *----------------------------+------------------------*/ -``` + + + + + + + + -[time-to-string]: #cast +
INPUTINT32INT64UINT32UINT64NUMERICBIGNUMERICFLOATDOUBLE
OUTPUTDOUBLEDOUBLEDOUBLEDOUBLENUMERICBIGNUMERICDOUBLEDOUBLE
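**Example**

An illustrative query (not from the upstream reference) showing both the one-argument form and the two-argument form, which truncates toward zero at the given number of decimal places (output formatting may vary by engine):

```sql
SELECT TRUNC(8.7) AS a, TRUNC(-8.7) AS b, TRUNC(1.234, 2) AS c;

/*-----+------+------*
 | a   | b    | c    |
 +-----+------+------+
 | 8.0 | -8.0 | 1.23 |
 *-----+------+------*/
```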
-## Timestamp functions +## Navigation functions -ZetaSQL supports the following timestamp functions. +ZetaSQL supports navigation functions. +Navigation functions are a subset of window functions. To create a +window function call and learn about the syntax for window functions, +see [Window function_calls][window-function-calls]. -IMPORTANT: Before working with these functions, you need to understand -the difference between the formats in which timestamps are stored and displayed, -and how time zones are used for the conversion between these formats. -To learn more, see -[How time zones work with timestamp functions][timestamp-link-to-timezone-definitions]. +Navigation functions generally compute some +`value_expression` over a different row in the window frame from the +current row. The `OVER` clause syntax varies across navigation functions. -NOTE: These functions return a runtime error if overflow occurs; result -values are bounded by the defined [`DATE` range][data-types-link-to-date_type] -and [`TIMESTAMP` range][data-types-link-to-timestamp_type]. +For all navigation functions, the result data type is the same type as +`value_expression`. ### Function list @@ -29674,1725 +27604,1765 @@ and [`TIMESTAMP` range][data-types-link-to-timestamp_type]. - CURRENT_TIMESTAMP + FIRST_VALUE - Returns the current date and time as a TIMESTAMP object. + Gets a value for the first row in the current window frame. - EXTRACT + LAG - Extracts part of a TIMESTAMP value. + Gets a value for a preceding row. - FORMAT_TIMESTAMP + LAST_VALUE - Formats a TIMESTAMP value according to the specified - format string. + Gets a value for the last row in the current window frame. - PARSE_TIMESTAMP + LEAD - Converts a STRING value to a TIMESTAMP value. + Gets a value for a subsequent row. - STRING + NTH_VALUE - Converts a TIMESTAMP value to a STRING value. + Gets a value for the Nth row of the current window frame. - TIMESTAMP + PERCENTILE_CONT - Constructs a TIMESTAMP value. 
+ Computes the specified percentile for a value, using + linear interpolation. - TIMESTAMP_ADD + PERCENTILE_DISC - Adds a specified time interval to a TIMESTAMP value. + Computes the specified percentile for a discrete value. - - TIMESTAMP_DIFF - - - - Gets the number of intervals between two TIMESTAMP values. - - + + - - TIMESTAMP_FROM_UNIX_MICROS +### `FIRST_VALUE` - - - Similar to TIMESTAMP_MICROS, except that additionally, a - TIMESTAMP value can be passed in. - - +```sql +FIRST_VALUE (value_expression [{RESPECT | IGNORE} NULLS]) +OVER over_clause - - TIMESTAMP_FROM_UNIX_MILLIS +over_clause: + { named_window | ( [ window_specification ] ) } - - - Similar to TIMESTAMP_MILLIS, except that additionally, a - TIMESTAMP value can be passed in. - - +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + ORDER BY expression [ { ASC | DESC } ] [, ...] + [ window_frame_clause ] - - TIMESTAMP_FROM_UNIX_SECONDS +``` - - - Similar to TIMESTAMP_SECONDS, except that additionally, a - TIMESTAMP value can be passed in. - - +**Description** - - TIMESTAMP_MICROS +Returns the value of the `value_expression` for the first row in the current +window frame. - - - Converts the number of microseconds since - 1970-01-01 00:00:00 UTC to a TIMESTAMP
. - - - - - TIMESTAMP_MILLIS +This function includes `NULL` values in the calculation unless `IGNORE NULLS` is +present. If `IGNORE NULLS` is present, the function excludes `NULL` values from +the calculation. - - - Converts the number of milliseconds since - 1970-01-01 00:00:00 UTC to a TIMESTAMP
. - - +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. - - TIMESTAMP_SECONDS + - - - Converts the number of seconds since - 1970-01-01 00:00:00 UTC to a TIMESTAMP. - - +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md - - TIMESTAMP_SUB + - - - Subtracts a specified time interval from a TIMESTAMP value. - - +**Supported Argument Types** - - TIMESTAMP_TRUNC +`value_expression` can be any data type that an expression can return. - - - Truncates a TIMESTAMP value. - - +**Return Data Type** - - UNIX_MICROS +Same type as `value_expression`. - - - Converts a TIMESTAMP value to the number of microseconds since - 1970-01-01 00:00:00 UTC. - - +**Examples** - - UNIX_MILLIS +The following example computes the fastest time for each division. - - - Converts a TIMESTAMP value to the number of milliseconds - since 1970-01-01 00:00:00 UTC. - - +```sql +WITH finishers AS + (SELECT 'Sophia Liu' as name, + TIMESTAMP '2016-10-18 2:51:45' as finish_time, + 'F30-34' as division + UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' + UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' + UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' + UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' + UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' + UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' + UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34' + UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29' + UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34') +SELECT name, + FORMAT_TIMESTAMP('%X', finish_time) AS finish_time, + division, + FORMAT_TIMESTAMP('%X', fastest_time) AS fastest_time, + TIMESTAMP_DIFF(finish_time, fastest_time, SECOND) AS delta_in_seconds +FROM ( + SELECT 
name, + finish_time, + division, + FIRST_VALUE(finish_time) + OVER (PARTITION BY division ORDER BY finish_time ASC + ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS fastest_time + FROM finishers); - - UNIX_SECONDS +/*-----------------+-------------+----------+--------------+------------------* + | name | finish_time | division | fastest_time | delta_in_seconds | + +-----------------+-------------+----------+--------------+------------------+ + | Carly Forte | 03:08:58 | F25-29 | 03:08:58 | 0 | + | Sophia Liu | 02:51:45 | F30-34 | 02:51:45 | 0 | + | Nikki Leith | 02:59:01 | F30-34 | 02:51:45 | 436 | + | Jen Edwards | 03:06:36 | F30-34 | 02:51:45 | 891 | + | Meghan Lederer | 03:07:41 | F30-34 | 02:51:45 | 956 | + | Lauren Reasoner | 03:10:14 | F30-34 | 02:51:45 | 1109 | + | Lisa Stelzner | 02:54:11 | F35-39 | 02:54:11 | 0 | + | Lauren Matthews | 03:01:17 | F35-39 | 02:54:11 | 426 | + | Desiree Berry | 03:05:42 | F35-39 | 02:54:11 | 691 | + | Suzy Slane | 03:06:24 | F35-39 | 02:54:11 | 733 | + *-----------------+-------------+----------+--------------+------------------*/ +``` - - - Converts a TIMESTAMP value to the number of seconds since - 1970-01-01 00:00:00 UTC. - - +### `LAG` - - +```sql +LAG (value_expression[, offset [, default_expression]]) +OVER over_clause -### `CURRENT_TIMESTAMP` +over_clause: + { named_window | ( [ window_specification ] ) } -```sql -CURRENT_TIMESTAMP() -``` +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + ORDER BY expression [ { ASC | DESC } ] [, ...] -```sql -CURRENT_TIMESTAMP ``` **Description** -Returns the current date and time as a timestamp object. The timestamp is -continuous, non-ambiguous, has exactly 60 seconds per minute and does not repeat -values over the leap second. Parentheses are optional. +Returns the value of the `value_expression` on a preceding row. 
Changing the +`offset` value changes which preceding row is returned; the default value is +`1`, indicating the previous row in the window frame. An error occurs if +`offset` is NULL or a negative value. -This function handles leap seconds by smearing them across a window of 20 hours -around the inserted leap second. +The optional `default_expression` is used if there isn't a row in the window +frame at the specified offset. This expression must be a constant expression and +its type must be implicitly coercible to the type of `value_expression`. If left +unspecified, `default_expression` defaults to NULL. -The current date and time is recorded at the start of the query -statement which contains this function, not when this specific function is -evaluated. +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. -**Supported Input Types** + -Not applicable +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md -**Result Data Type** + -`TIMESTAMP` +**Supported Argument Types** + ++ `value_expression` can be any data type that can be returned from an + expression. ++ `offset` must be a non-negative integer literal or parameter. ++ `default_expression` must be compatible with the value expression type. + +**Return Data Type** + +Same type as `value_expression`. **Examples** +The following example illustrates a basic use of the `LAG` function. 
+ ```sql -SELECT CURRENT_TIMESTAMP() AS now; +WITH finishers AS + (SELECT 'Sophia Liu' as name, + TIMESTAMP '2016-10-18 2:51:45' as finish_time, + 'F30-34' as division + UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' + UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' + UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' + UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' + UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' + UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' + UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34' + UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29' + UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34') +SELECT name, + finish_time, + division, + LAG(name) + OVER (PARTITION BY division ORDER BY finish_time ASC) AS preceding_runner +FROM finishers; -/*---------------------------------------------* - | now | - +---------------------------------------------+ - | 2020-06-02 17:00:53.110 America/Los_Angeles | - *---------------------------------------------*/ +/*-----------------+-------------+----------+------------------* + | name | finish_time | division | preceding_runner | + +-----------------+-------------+----------+------------------+ + | Carly Forte | 03:08:58 | F25-29 | NULL | + | Sophia Liu | 02:51:45 | F30-34 | NULL | + | Nikki Leith | 02:59:01 | F30-34 | Sophia Liu | + | Jen Edwards | 03:06:36 | F30-34 | Nikki Leith | + | Meghan Lederer | 03:07:41 | F30-34 | Jen Edwards | + | Lauren Reasoner | 03:10:14 | F30-34 | Meghan Lederer | + | Lisa Stelzner | 02:54:11 | F35-39 | NULL | + | Lauren Matthews | 03:01:17 | F35-39 | Lisa Stelzner | + | Desiree Berry | 03:05:42 | F35-39 | Lauren Matthews | + | Suzy Slane | 03:06:24 | F35-39 | Desiree Berry | + *-----------------+-------------+----------+------------------*/ ``` 
-When a column named `current_timestamp` is present, the column name and the -function call without parentheses are ambiguous. To ensure the function call, -add parentheses; to ensure the column name, qualify it with its -[range variable][timestamp-functions-link-to-range-variables]. For example, the -following query selects the function in the `now` column and the table -column in the `current_timestamp` column. +This next example uses the optional `offset` parameter. ```sql -WITH t AS (SELECT 'column value' AS `current_timestamp`) -SELECT current_timestamp() AS now, t.current_timestamp FROM t; +WITH finishers AS + (SELECT 'Sophia Liu' as name, + TIMESTAMP '2016-10-18 2:51:45' as finish_time, + 'F30-34' as division + UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' + UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' + UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' + UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' + UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' + UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' + UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34' + UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29' + UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34') +SELECT name, + finish_time, + division, + LAG(name, 2) + OVER (PARTITION BY division ORDER BY finish_time ASC) AS two_runners_ahead +FROM finishers; -/*---------------------------------------------+-------------------* - | now | current_timestamp | - +---------------------------------------------+-------------------+ - | 2020-06-02 17:00:53.110 America/Los_Angeles | column value | - *---------------------------------------------+-------------------*/ +/*-----------------+-------------+----------+-------------------* + | name | finish_time | division | two_runners_ahead | + 
+-----------------+-------------+----------+-------------------+ + | Carly Forte | 03:08:58 | F25-29 | NULL | + | Sophia Liu | 02:51:45 | F30-34 | NULL | + | Nikki Leith | 02:59:01 | F30-34 | NULL | + | Jen Edwards | 03:06:36 | F30-34 | Sophia Liu | + | Meghan Lederer | 03:07:41 | F30-34 | Nikki Leith | + | Lauren Reasoner | 03:10:14 | F30-34 | Jen Edwards | + | Lisa Stelzner | 02:54:11 | F35-39 | NULL | + | Lauren Matthews | 03:01:17 | F35-39 | NULL | + | Desiree Berry | 03:05:42 | F35-39 | Lisa Stelzner | + | Suzy Slane | 03:06:24 | F35-39 | Lauren Matthews | + *-----------------+-------------+----------+-------------------*/ ``` -[timestamp-functions-link-to-range-variables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#range_variables +The following example replaces NULL values with a default value. -### `EXTRACT` +```sql +WITH finishers AS + (SELECT 'Sophia Liu' as name, + TIMESTAMP '2016-10-18 2:51:45' as finish_time, + 'F30-34' as division + UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' + UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' + UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' + UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' + UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' + UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' + UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34' + UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29' + UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34') +SELECT name, + finish_time, + division, + LAG(name, 2, 'Nobody') + OVER (PARTITION BY division ORDER BY finish_time ASC) AS two_runners_ahead +FROM finishers; + +/*-----------------+-------------+----------+-------------------* + | name | finish_time | division | two_runners_ahead | + 
+-----------------+-------------+----------+-------------------+ + | Carly Forte | 03:08:58 | F25-29 | Nobody | + | Sophia Liu | 02:51:45 | F30-34 | Nobody | + | Nikki Leith | 02:59:01 | F30-34 | Nobody | + | Jen Edwards | 03:06:36 | F30-34 | Sophia Liu | + | Meghan Lederer | 03:07:41 | F30-34 | Nikki Leith | + | Lauren Reasoner | 03:10:14 | F30-34 | Jen Edwards | + | Lisa Stelzner | 02:54:11 | F35-39 | Nobody | + | Lauren Matthews | 03:01:17 | F35-39 | Nobody | + | Desiree Berry | 03:05:42 | F35-39 | Lisa Stelzner | + | Suzy Slane | 03:06:24 | F35-39 | Lauren Matthews | + *-----------------+-------------+----------+-------------------*/ +``` + +### `LAST_VALUE` ```sql -EXTRACT(part FROM timestamp_expression [AT TIME ZONE time_zone]) +LAST_VALUE (value_expression [{RESPECT | IGNORE} NULLS]) +OVER over_clause + +over_clause: + { named_window | ( [ window_specification ] ) } + +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + ORDER BY expression [ { ASC | DESC } ] [, ...] + [ window_frame_clause ] + ``` **Description** -Returns a value that corresponds to the specified `part` from -a supplied `timestamp_expression`. This function supports an optional -`time_zone` parameter. See -[Time zone definitions][timestamp-link-to-timezone-definitions] for information -on how to specify a time zone. +Returns the value of the `value_expression` for the last row in the current +window frame. -Allowed `part` values are: +This function includes `NULL` values in the calculation unless `IGNORE NULLS` is +present. If `IGNORE NULLS` is present, the function excludes `NULL` values from +the calculation. -+ `NANOSECOND` - (if the SQL engine supports it) -+ `MICROSECOND` -+ `MILLISECOND` -+ `SECOND` -+ `MINUTE` -+ `HOUR` -+ `DAYOFWEEK`: Returns values in the range [1,7] with Sunday as the first day of - of the week. -+ `DAY` -+ `DAYOFYEAR` -+ `WEEK`: Returns the week number of the date in the range [0, 53]. 
Weeks begin - with Sunday, and dates prior to the first Sunday of the year are in week 0. -+ `WEEK()`: Returns the week number of `timestamp_expression` in the - range [0, 53]. Weeks begin on `WEEKDAY`. `datetime`s prior to the first - `WEEKDAY` of the year are in week 0. Valid values for `WEEKDAY` are `SUNDAY`, - `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, `FRIDAY`, and `SATURDAY`. -+ `ISOWEEK`: Returns the [ISO 8601 week][ISO-8601-week] - number of the `datetime_expression`. `ISOWEEK`s begin on Monday. Return values - are in the range [1, 53]. The first `ISOWEEK` of each ISO year begins on the - Monday before the first Thursday of the Gregorian calendar year. -+ `MONTH` -+ `QUARTER` -+ `YEAR` -+ `ISOYEAR`: Returns the [ISO 8601][ISO-8601] - week-numbering year, which is the Gregorian calendar year containing the - Thursday of the week to which `date_expression` belongs. -+ `DATE` -+ DATETIME -+ TIME +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. -Returned values truncate lower order time periods. For example, when extracting -seconds, `EXTRACT` truncates the millisecond and microsecond values. + -**Return Data Type** +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md -`INT64`, except in the following cases: + -+ If `part` is `DATE`, the function returns a `DATE` object. +**Supported Argument Types** -**Examples** +`value_expression` can be any data type that an expression can return. -In the following example, `EXTRACT` returns a value corresponding to the `DAY` -time part. +**Return Data Type** -```sql -WITH Input AS (SELECT TIMESTAMP("2008-12-25 05:30:00+00") AS timestamp_value) -SELECT - EXTRACT(DAY FROM timestamp_value AT TIME ZONE "UTC") AS the_day_utc, - EXTRACT(DAY FROM timestamp_value AT TIME ZONE "America/Los_Angeles") AS the_day_california -FROM Input +Same type as `value_expression`. 
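The frame semantics described above can be sketched outside of SQL. The following Python sketch is an illustration only (not ZetaSQL's implementation; the list-of-frame-values representation and function name are assumptions) showing how `LAST_VALUE` and `IGNORE NULLS` interact:

```python
def last_value(frame_rows, ignore_nulls=False):
    """Mimic LAST_VALUE over the ordered values of one window frame.

    frame_rows holds the value_expression results for the current row's
    frame, in frame order; None stands in for SQL NULL.
    """
    if ignore_nulls:
        # IGNORE NULLS removes NULL values before the last row is picked.
        frame_rows = [v for v in frame_rows if v is not None]
    # The "last row" is simply the final row remaining in the frame.
    return frame_rows[-1] if frame_rows else None

print(last_value(['02:51:45', '02:59:01', None]))                     # None
print(last_value(['02:51:45', '02:59:01', None], ignore_nulls=True))  # 02:59:01
```

Note that with the default window frame, which ends at the current row, the last row is the current row itself; the example below uses `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` so that the last row is the final row of the partition.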
-/*-------------+--------------------* - | the_day_utc | the_day_california | - +-------------+--------------------+ - | 25 | 24 | - *-------------+--------------------*/ -``` +**Examples** -In the following example, `EXTRACT` returns values corresponding to different -time parts from a column of type `TIMESTAMP`. +The following example computes the slowest time for each division. ```sql -WITH Timestamps AS ( - SELECT TIMESTAMP("2005-01-03 12:34:56+00") AS timestamp_value UNION ALL - SELECT TIMESTAMP("2007-12-31 12:00:00+00") UNION ALL - SELECT TIMESTAMP("2009-01-01 12:00:00+00") UNION ALL - SELECT TIMESTAMP("2009-12-31 12:00:00+00") UNION ALL - SELECT TIMESTAMP("2017-01-02 12:00:00+00") UNION ALL - SELECT TIMESTAMP("2017-05-26 12:00:00+00") -) -SELECT - timestamp_value, - EXTRACT(ISOYEAR FROM timestamp_value) AS isoyear, - EXTRACT(ISOWEEK FROM timestamp_value) AS isoweek, - EXTRACT(YEAR FROM timestamp_value) AS year, - EXTRACT(WEEK FROM timestamp_value) AS week -FROM Timestamps -ORDER BY timestamp_value; +WITH finishers AS + (SELECT 'Sophia Liu' as name, + TIMESTAMP '2016-10-18 2:51:45' as finish_time, + 'F30-34' as division + UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' + UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' + UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' + UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' + UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' + UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' + UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34' + UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29' + UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34') +SELECT name, + FORMAT_TIMESTAMP('%X', finish_time) AS finish_time, + division, + FORMAT_TIMESTAMP('%X', slowest_time) AS slowest_time, + 
TIMESTAMP_DIFF(slowest_time, finish_time, SECOND) AS delta_in_seconds +FROM ( + SELECT name, + finish_time, + division, + LAST_VALUE(finish_time) + OVER (PARTITION BY division ORDER BY finish_time ASC + ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS slowest_time + FROM finishers); --- Display of results may differ, depending upon the environment and time zone where this query was executed. -/*---------------------------------------------+---------+---------+------+------* - | timestamp_value | isoyear | isoweek | year | week | - +---------------------------------------------+---------+---------+------+------+ - | 2005-01-03 04:34:56.000 America/Los_Angeles | 2005 | 1 | 2005 | 1 | - | 2007-12-31 04:00:00.000 America/Los_Angeles | 2008 | 1 | 2007 | 52 | - | 2009-01-01 04:00:00.000 America/Los_Angeles | 2009 | 1 | 2009 | 0 | - | 2009-12-31 04:00:00.000 America/Los_Angeles | 2009 | 53 | 2009 | 52 | - | 2017-01-02 04:00:00.000 America/Los_Angeles | 2017 | 1 | 2017 | 1 | - | 2017-05-26 05:00:00.000 America/Los_Angeles | 2017 | 21 | 2017 | 21 | - *---------------------------------------------+---------+---------+------+------*/ +/*-----------------+-------------+----------+--------------+------------------* + | name | finish_time | division | slowest_time | delta_in_seconds | + +-----------------+-------------+----------+--------------+------------------+ + | Carly Forte | 03:08:58 | F25-29 | 03:08:58 | 0 | + | Sophia Liu | 02:51:45 | F30-34 | 03:10:14 | 1109 | + | Nikki Leith | 02:59:01 | F30-34 | 03:10:14 | 673 | + | Jen Edwards | 03:06:36 | F30-34 | 03:10:14 | 218 | + | Meghan Lederer | 03:07:41 | F30-34 | 03:10:14 | 153 | + | Lauren Reasoner | 03:10:14 | F30-34 | 03:10:14 | 0 | + | Lisa Stelzner | 02:54:11 | F35-39 | 03:06:24 | 733 | + | Lauren Matthews | 03:01:17 | F35-39 | 03:06:24 | 307 | + | Desiree Berry | 03:05:42 | F35-39 | 03:06:24 | 42 | + | Suzy Slane | 03:06:24 | F35-39 | 03:06:24 | 0 | + 
*-----------------+-------------+----------+--------------+------------------*/ ``` -In the following example, `timestamp_expression` falls on a Monday. `EXTRACT` -calculates the first column using weeks that begin on Sunday, and it calculates -the second column using weeks that begin on Monday. +### `LEAD` ```sql -WITH table AS (SELECT TIMESTAMP("2017-11-06 00:00:00+00") AS timestamp_value) -SELECT - timestamp_value, - EXTRACT(WEEK(SUNDAY) FROM timestamp_value) AS week_sunday, - EXTRACT(WEEK(MONDAY) FROM timestamp_value) AS week_monday -FROM table; +LEAD (value_expression[, offset [, default_expression]]) +OVER over_clause + +over_clause: + { named_window | ( [ window_specification ] ) } + +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + ORDER BY expression [ { ASC | DESC } ] [, ...] --- Display of results may differ, depending upon the environment and time zone where this query was executed. -/*---------------------------------------------+-------------+---------------* - | timestamp_value | week_sunday | week_monday | - +---------------------------------------------+-------------+---------------+ - | 2017-11-05 16:00:00.000 America/Los_Angeles | 45 | 44 | - *---------------------------------------------+-------------+---------------*/ ``` -[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 +**Description** -[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date +Returns the value of the `value_expression` on a subsequent row. Changing the +`offset` value changes which subsequent row is returned; the default value is +`1`, indicating the next row in the window frame. An error occurs if `offset` is +NULL or a negative value. -[timestamp-link-to-timezone-definitions]: #timezone_definitions +The optional `default_expression` is used if there isn't a row in the window +frame at the specified offset. 
This expression must be a constant expression and +its type must be implicitly coercible to the type of `value_expression`. If left +unspecified, `default_expression` defaults to NULL. -### `FORMAT_TIMESTAMP` +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. -```sql -FORMAT_TIMESTAMP(format_string, timestamp[, time_zone]) -``` + -**Description** +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md -Formats a timestamp according to the specified `format_string`. + -See [Format elements for date and time parts][timestamp-format-elements] -for a list of format elements that this function supports. +**Supported Argument Types** + ++ `value_expression` can be any data type that can be returned from an + expression. ++ `offset` must be a non-negative integer literal or parameter. ++ `default_expression` must be compatible with the value expression type. **Return Data Type** -`STRING` +Same type as `value_expression`. -**Example** +**Examples** + +The following example illustrates a basic use of the `LEAD` function. 
```sql -SELECT FORMAT_TIMESTAMP("%c", TIMESTAMP "2050-12-25 15:30:55+00", "UTC") - AS formatted; +WITH finishers AS + (SELECT 'Sophia Liu' as name, + TIMESTAMP '2016-10-18 2:51:45' as finish_time, + 'F30-34' as division + UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' + UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' + UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' + UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' + UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' + UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' + UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34' + UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29' + UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34') +SELECT name, + finish_time, + division, + LEAD(name) + OVER (PARTITION BY division ORDER BY finish_time ASC) AS followed_by +FROM finishers; -/*--------------------------* - | formatted | - +--------------------------+ - | Sun Dec 25 15:30:55 2050 | - *--------------------------*/ +/*-----------------+-------------+----------+-----------------* + | name | finish_time | division | followed_by | + +-----------------+-------------+----------+-----------------+ + | Carly Forte | 03:08:58 | F25-29 | NULL | + | Sophia Liu | 02:51:45 | F30-34 | Nikki Leith | + | Nikki Leith | 02:59:01 | F30-34 | Jen Edwards | + | Jen Edwards | 03:06:36 | F30-34 | Meghan Lederer | + | Meghan Lederer | 03:07:41 | F30-34 | Lauren Reasoner | + | Lauren Reasoner | 03:10:14 | F30-34 | NULL | + | Lisa Stelzner | 02:54:11 | F35-39 | Lauren Matthews | + | Lauren Matthews | 03:01:17 | F35-39 | Desiree Berry | + | Desiree Berry | 03:05:42 | F35-39 | Suzy Slane | + | Suzy Slane | 03:06:24 | F35-39 | NULL | + *-----------------+-------------+----------+-----------------*/ ``` +This next example uses the 
optional `offset` parameter. + ```sql -SELECT FORMAT_TIMESTAMP("%b-%d-%Y", TIMESTAMP "2050-12-25 15:30:55+00") - AS formatted; +WITH finishers AS + (SELECT 'Sophia Liu' as name, + TIMESTAMP '2016-10-18 2:51:45' as finish_time, + 'F30-34' as division + UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' + UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' + UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' + UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' + UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' + UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' + UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34' + UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29' + UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34') +SELECT name, + finish_time, + division, + LEAD(name, 2) + OVER (PARTITION BY division ORDER BY finish_time ASC) AS two_runners_back +FROM finishers; -/*-------------* - | formatted | - +-------------+ - | Dec-25-2050 | - *-------------*/ +/*-----------------+-------------+----------+------------------* + | name | finish_time | division | two_runners_back | + +-----------------+-------------+----------+------------------+ + | Carly Forte | 03:08:58 | F25-29 | NULL | + | Sophia Liu | 02:51:45 | F30-34 | Jen Edwards | + | Nikki Leith | 02:59:01 | F30-34 | Meghan Lederer | + | Jen Edwards | 03:06:36 | F30-34 | Lauren Reasoner | + | Meghan Lederer | 03:07:41 | F30-34 | NULL | + | Lauren Reasoner | 03:10:14 | F30-34 | NULL | + | Lisa Stelzner | 02:54:11 | F35-39 | Desiree Berry | + | Lauren Matthews | 03:01:17 | F35-39 | Suzy Slane | + | Desiree Berry | 03:05:42 | F35-39 | NULL | + | Suzy Slane | 03:06:24 | F35-39 | NULL | + *-----------------+-------------+----------+------------------*/ ``` +The following example replaces NULL values with a 
default value. + ```sql -SELECT FORMAT_TIMESTAMP("%b %Y", TIMESTAMP "2050-12-25 15:30:55+00") - AS formatted; +WITH finishers AS + (SELECT 'Sophia Liu' as name, + TIMESTAMP '2016-10-18 2:51:45' as finish_time, + 'F30-34' as division + UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' + UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' + UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' + UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' + UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' + UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' + UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34' + UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29' + UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34') +SELECT name, + finish_time, + division, + LEAD(name, 2, 'Nobody') + OVER (PARTITION BY division ORDER BY finish_time ASC) AS two_runners_back +FROM finishers; -/*-------------* - | formatted | - +-------------+ - | Dec 2050 | - *-------------*/ +/*-----------------+-------------+----------+------------------* + | name | finish_time | division | two_runners_back | + +-----------------+-------------+----------+------------------+ + | Carly Forte | 03:08:58 | F25-29 | Nobody | + | Sophia Liu | 02:51:45 | F30-34 | Jen Edwards | + | Nikki Leith | 02:59:01 | F30-34 | Meghan Lederer | + | Jen Edwards | 03:06:36 | F30-34 | Lauren Reasoner | + | Meghan Lederer | 03:07:41 | F30-34 | Nobody | + | Lauren Reasoner | 03:10:14 | F30-34 | Nobody | + | Lisa Stelzner | 02:54:11 | F35-39 | Desiree Berry | + | Lauren Matthews | 03:01:17 | F35-39 | Suzy Slane | + | Desiree Berry | 03:05:42 | F35-39 | Nobody | + | Suzy Slane | 03:06:24 | F35-39 | Nobody | + *-----------------+-------------+----------+------------------*/ ``` -```sql -SELECT FORMAT_TIMESTAMP("%Y-%m-%dT%H:%M:%SZ", 
TIMESTAMP "2050-12-25 15:30:55", "UTC") - AS formatted; +### `NTH_VALUE` -/*+---------------------* - | formatted | - +----------------------+ - | 2050-12-25T15:30:55Z | - *----------------------*/ -``` +```sql +NTH_VALUE (value_expression, constant_integer_expression [{RESPECT | IGNORE} NULLS]) +OVER over_clause -[timestamp-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time +over_clause: + { named_window | ( [ window_specification ] ) } -### `PARSE_TIMESTAMP` +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + ORDER BY expression [ { ASC | DESC } ] [, ...] + [ window_frame_clause ] -```sql -PARSE_TIMESTAMP(format_string, timestamp_string[, time_zone]) ``` **Description** -Converts a [string representation of a timestamp][timestamp-format] to a -`TIMESTAMP` object. +Returns the value of `value_expression` at the Nth row of the current window +frame, where Nth is defined by `constant_integer_expression`. Returns NULL if +there is no such row. -`format_string` contains the [format elements][timestamp-format-elements] -that define how `timestamp_string` is formatted. Each element in -`timestamp_string` must have a corresponding element in `format_string`. The -location of each element in `format_string` must match the location of -each element in `timestamp_string`. +This function includes `NULL` values in the calculation unless `IGNORE NULLS` is +present. If `IGNORE NULLS` is present, the function excludes `NULL` values from +the calculation. -```sql --- This works because elements on both sides match. -SELECT PARSE_TIMESTAMP("%a %b %e %I:%M:%S %Y", "Thu Dec 25 07:30:00 2008") +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. --- This produces an error because the year element is in different locations. 
-SELECT PARSE_TIMESTAMP("%a %b %e %Y %I:%M:%S", "Thu Dec 25 07:30:00 2008") + --- This produces an error because one of the year elements is missing. -SELECT PARSE_TIMESTAMP("%a %b %e %I:%M:%S", "Thu Dec 25 07:30:00 2008") +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md --- This works because %c can find all matching elements in timestamp_string. -SELECT PARSE_TIMESTAMP("%c", "Thu Dec 25 07:30:00 2008") -``` + -When using `PARSE_TIMESTAMP`, keep the following in mind: +**Supported Argument Types** -+ **Unspecified fields.** Any unspecified field is initialized from `1970-01-01 - 00:00:00.0`. This initialization value uses the time zone specified by the - function's time zone argument, if present. If not, the initialization value - uses the default time zone, which is implementation defined. For instance, if the year - is unspecified then it defaults to `1970`, and so on. -+ **Case insensitivity.** Names, such as `Monday`, `February`, and so on, are - case insensitive. -+ **Whitespace.** One or more consecutive white spaces in the format string - matches zero or more consecutive white spaces in the timestamp string. In - addition, leading and trailing white spaces in the timestamp string are always - allowed, even if they are not in the format string. -+ **Format precedence.** When two (or more) format elements have overlapping - information (for example both `%F` and `%Y` affect the year), the last one - generally overrides any earlier ones, with some exceptions (see the - descriptions of `%s`, `%C`, and `%y`). -+ **Format divergence.** `%p` can be used with `am`, `AM`, `pm`, and `PM`. ++ `value_expression` can be any data type that can be returned from an + expression. ++ `constant_integer_expression` can be any constant expression that returns an + integer. **Return Data Type** -`TIMESTAMP` +Same type as `value_expression`. 
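The row-selection rule above can be made concrete with a small Python sketch (illustrative only; the list-of-frame-values representation and function name are assumptions, not ZetaSQL API):

```python
def nth_value(frame_rows, n, ignore_nulls=False):
    """Mimic NTH_VALUE: the value at the Nth (1-based) row of the frame.

    frame_rows holds the value_expression results for the current frame,
    in frame order; None stands in for SQL NULL. Returns None (NULL) if
    the frame has no Nth row.
    """
    if ignore_nulls:
        # IGNORE NULLS drops NULL values before rows are counted.
        frame_rows = [v for v in frame_rows if v is not None]
    return frame_rows[n - 1] if 1 <= n <= len(frame_rows) else None

# A one-row frame has no 2nd row, so NTH_VALUE(..., 2) yields NULL,
# as happens for the single-runner F25-29 division in the example below.
print(nth_value(['03:08:58'], 2))              # None
print(nth_value(['02:51:45', '02:59:01'], 2))  # 02:59:01
```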
-**Example**

+**Examples**

```sql
-SELECT PARSE_TIMESTAMP("%c", "Thu Dec 25 07:30:00 2008") AS parsed;
+WITH finishers AS
+ (SELECT 'Sophia Liu' as name,
+ TIMESTAMP '2016-10-18 2:51:45' as finish_time,
+ 'F30-34' as division
+ UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
+ UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
+ UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
+ UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
+ UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
+ UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
+ UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
+ UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
+ UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
+SELECT name,
+ FORMAT_TIMESTAMP('%X', finish_time) AS finish_time,
+ division,
+ FORMAT_TIMESTAMP('%X', fastest_time) AS fastest_time,
+ FORMAT_TIMESTAMP('%X', second_fastest) AS second_fastest
+FROM (
+ SELECT name,
+ finish_time,
+ division,
+ FIRST_VALUE(finish_time)
+ OVER w1 AS fastest_time,
+ NTH_VALUE(finish_time, 2)
+ OVER w1 as second_fastest
+ FROM finishers
+ WINDOW w1 AS (
+ PARTITION BY division ORDER BY finish_time ASC
+ ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING));

--- Display of results may differ, depending upon the environment and time zone where this query was executed.
-/*---------------------------------------------* - | parsed | - +---------------------------------------------+ - | 2008-12-25 07:30:00.000 America/Los_Angeles | - *---------------------------------------------*/ +/*-----------------+-------------+----------+--------------+----------------* + | name | finish_time | division | fastest_time | second_fastest | + +-----------------+-------------+----------+--------------+----------------+ + | Carly Forte | 03:08:58 | F25-29 | 03:08:58 | NULL | + | Sophia Liu | 02:51:45 | F30-34 | 02:51:45 | 02:59:01 | + | Nikki Leith | 02:59:01 | F30-34 | 02:51:45 | 02:59:01 | + | Jen Edwards | 03:06:36 | F30-34 | 02:51:45 | 02:59:01 | + | Meghan Lederer | 03:07:41 | F30-34 | 02:51:45 | 02:59:01 | + | Lauren Reasoner | 03:10:14 | F30-34 | 02:51:45 | 02:59:01 | + | Lisa Stelzner | 02:54:11 | F35-39 | 02:54:11 | 03:01:17 | + | Lauren Matthews | 03:01:17 | F35-39 | 02:54:11 | 03:01:17 | + | Desiree Berry | 03:05:42 | F35-39 | 02:54:11 | 03:01:17 | + | Suzy Slane | 03:06:24 | F35-39 | 02:54:11 | 03:01:17 | + *-----------------+-------------+----------+--------------+----------------*/ ``` -[timestamp-format]: #format_timestamp +### `PERCENTILE_CONT` -[timestamp-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time +```sql +PERCENTILE_CONT (value_expression, percentile [{RESPECT | IGNORE} NULLS]) +OVER over_clause -### `STRING` +over_clause: + { named_window | ( [ window_specification ] ) } + +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] -```sql -STRING(timestamp_expression[, time_zone]) ``` **Description** -Converts a timestamp to a string. Supports an optional -parameter to specify a time zone. See -[Time zone definitions][timestamp-link-to-timezone-definitions] for information -on how to specify a time zone. +Computes the specified percentile value for the value_expression, with linear +interpolation. 
-**Return Data Type** +This function ignores NULL +values if +`RESPECT NULLS` is absent. If `RESPECT NULLS` is present: -`STRING` ++ Interpolation between two `NULL` values returns `NULL`. ++ Interpolation between a `NULL` value and a non-`NULL` value returns the + non-`NULL` value. -**Example** +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. -```sql -SELECT STRING(TIMESTAMP "2008-12-25 15:30:00+00", "UTC") AS string; + -/*-------------------------------* - | string | - +-------------------------------+ - | 2008-12-25 15:30:00+00 | - *-------------------------------*/ -``` +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md -[timestamp-link-to-timezone-definitions]: #timezone_definitions + -### `TIMESTAMP` +`PERCENTILE_CONT` can be used with differential privacy. To learn more, see +[Differentially private aggregate functions][dp-functions]. -```sql -TIMESTAMP(string_expression[, time_zone]) -TIMESTAMP(date_expression[, time_zone]) -TIMESTAMP(datetime_expression[, time_zone]) -``` +**Supported Argument Types** -**Description** ++ `value_expression` and `percentile` must have one of the following types: + + `NUMERIC` + + `BIGNUMERIC` + + `DOUBLE` ++ `percentile` must be a literal in the range `[0, 1]`. -+ `string_expression[, time_zone]`: Converts a string to a - timestamp. `string_expression` must include a - timestamp literal. - If `string_expression` includes a time zone in the timestamp literal, do - not include an explicit `time_zone` - argument. -+ `date_expression[, time_zone]`: Converts a date to a timestamp. - The value returned is the earliest timestamp that falls within - the given date. -+ `datetime_expression[, time_zone]`: Converts a - datetime to a timestamp. +**Return Data Type** -This function supports an optional -parameter to [specify a time zone][timestamp-link-to-timezone-definitions]. 
If -no time zone is specified, the default time zone, which is implementation defined, -is used. +The return data type is determined by the argument types with the following +table. + -**Return Data Type** + + + + + + + + + + -`TIMESTAMP` +
| INPUT        | `NUMERIC`    | `BIGNUMERIC` | `DOUBLE` |
| ------------ | ------------ | ------------ | -------- |
| `NUMERIC`    | `NUMERIC`    | `BIGNUMERIC` | `DOUBLE` |
| `BIGNUMERIC` | `BIGNUMERIC` | `BIGNUMERIC` | `DOUBLE` |
| `DOUBLE`     | `DOUBLE`     | `DOUBLE`     | `DOUBLE` |
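A short Python sketch of the linear-interpolation rule (illustrative only, covering the default `IGNORE NULLS` behavior; the function name is an assumption, not ZetaSQL API) reproduces the results shown in the examples below:

```python
import math

def percentile_cont(values, p):
    """Linear-interpolation percentile over the non-NULL values."""
    xs = sorted(v for v in values if v is not None)  # IGNORE NULLS
    if not xs:
        return None
    pos = p * (len(xs) - 1)        # fractional rank into the sorted values
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    # Interpolate linearly between the two neighboring sorted values.
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

print(percentile_cont([0, 3, None, 1, 2], 0.5))            # 1.5
print(round(percentile_cont([0, 3, None, 1, 2], 0.9), 2))  # 2.7
```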
**Examples** +The following example computes the value for some percentiles from a column of +values while ignoring nulls. + ```sql -SELECT TIMESTAMP("2008-12-25 15:30:00+00") AS timestamp_str; +SELECT + PERCENTILE_CONT(x, 0) OVER() AS min, + PERCENTILE_CONT(x, 0.01) OVER() AS percentile1, + PERCENTILE_CONT(x, 0.5) OVER() AS median, + PERCENTILE_CONT(x, 0.9) OVER() AS percentile90, + PERCENTILE_CONT(x, 1) OVER() AS max +FROM UNNEST([0, 3, NULL, 1, 2]) AS x LIMIT 1; --- Display of results may differ, depending upon the environment and time zone where this query was executed. -/*---------------------------------------------* - | timestamp_str | - +---------------------------------------------+ - | 2008-12-25 07:30:00.000 America/Los_Angeles | - *---------------------------------------------*/ + /*-----+-------------+--------+--------------+-----* + | min | percentile1 | median | percentile90 | max | + +-----+-------------+--------+--------------+-----+ + | 0 | 0.03 | 1.5 | 2.7 | 3 | + *-----+-------------+--------+--------------+-----+ ``` +The following example computes the value for some percentiles from a column of +values while respecting nulls. + ```sql -SELECT TIMESTAMP("2008-12-25 15:30:00", "America/Los_Angeles") AS timestamp_str; +SELECT + PERCENTILE_CONT(x, 0 RESPECT NULLS) OVER() AS min, + PERCENTILE_CONT(x, 0.01 RESPECT NULLS) OVER() AS percentile1, + PERCENTILE_CONT(x, 0.5 RESPECT NULLS) OVER() AS median, + PERCENTILE_CONT(x, 0.9 RESPECT NULLS) OVER() AS percentile90, + PERCENTILE_CONT(x, 1 RESPECT NULLS) OVER() AS max +FROM UNNEST([0, 3, NULL, 1, 2]) AS x LIMIT 1; --- Display of results may differ, depending upon the environment and time zone where this query was executed. 
-/*---------------------------------------------* - | timestamp_str | - +---------------------------------------------+ - | 2008-12-25 15:30:00.000 America/Los_Angeles | - *---------------------------------------------*/ +/*------+-------------+--------+--------------+-----* + | min | percentile1 | median | percentile90 | max | + +------+-------------+--------+--------------+-----+ + | NULL | 0 | 1 | 2.6 | 3 | + *------+-------------+--------+--------------+-----+ ``` -```sql -SELECT TIMESTAMP("2008-12-25 15:30:00 UTC") AS timestamp_str; +[dp-functions]: #aggregate-dp-functions --- Display of results may differ, depending upon the environment and time zone where this query was executed. -/*---------------------------------------------* - | timestamp_str | - +---------------------------------------------+ - | 2008-12-25 07:30:00.000 America/Los_Angeles | - *---------------------------------------------*/ -``` +### `PERCENTILE_DISC` ```sql -SELECT TIMESTAMP(DATETIME "2008-12-25 15:30:00") AS timestamp_datetime; +PERCENTILE_DISC (value_expression, percentile [{RESPECT | IGNORE} NULLS]) +OVER over_clause --- Display of results may differ, depending upon the environment and time zone where this query was executed. -/*---------------------------------------------* - | timestamp_datetime | - +---------------------------------------------+ - | 2008-12-25 15:30:00.000 America/Los_Angeles | - *---------------------------------------------*/ -``` +over_clause: + { named_window | ( [ window_specification ] ) } -```sql -SELECT TIMESTAMP(DATE "2008-12-25") AS timestamp_date; +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] --- Display of results may differ, depending upon the environment and time zone where this query was executed. 
-/*---------------------------------------------* - | timestamp_date | - +---------------------------------------------+ - | 2008-12-25 00:00:00.000 America/Los_Angeles | - *---------------------------------------------*/ ``` -[timestamp-literals]: https://github.com/google/zetasql/blob/master/docs/lexical.md#timestamp_literals +**Description** -[timestamp-link-to-timezone-definitions]: #timezone_definitions +Computes the specified percentile value for a discrete `value_expression`. The +returned value is the first sorted value of `value_expression` with cumulative +distribution greater than or equal to the given `percentile` value. -### `TIMESTAMP_ADD` +This function ignores `NULL` +values unless +`RESPECT NULLS` is present. -```sql -TIMESTAMP_ADD(timestamp_expression, INTERVAL int64_expression date_part) -``` +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. -**Description** + -Adds `int64_expression` units of `date_part` to the timestamp, independent of -any time zone. +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md -`TIMESTAMP_ADD` supports the following values for `date_part`: + -+ `NANOSECOND` - (if the SQL engine supports it) -+ `MICROSECOND` -+ `MILLISECOND` -+ `SECOND` -+ `MINUTE` -+ `HOUR`. Equivalent to 60 `MINUTE` parts. -+ `DAY`. Equivalent to 24 `HOUR` parts. +**Supported Argument Types** -**Return Data Types** ++ `value_expression` can be any orderable type. ++ `percentile` must be a literal in the range `[0, 1]`, with one of the + following types: + + `NUMERIC` + + `BIGNUMERIC` + + `DOUBLE` -`TIMESTAMP` +**Return Data Type** -**Example** +Same type as `value_expression`. + +**Examples** + +The following example computes the value for some percentiles from a column of +values while ignoring nulls. 
```sql SELECT - TIMESTAMP("2008-12-25 15:30:00+00") AS original, - TIMESTAMP_ADD(TIMESTAMP "2008-12-25 15:30:00+00", INTERVAL 10 MINUTE) AS later; + x, + PERCENTILE_DISC(x, 0) OVER() AS min, + PERCENTILE_DISC(x, 0.5) OVER() AS median, + PERCENTILE_DISC(x, 1) OVER() AS max +FROM UNNEST(['c', NULL, 'b', 'a']) AS x; --- Display of results may differ, depending upon the environment and time zone where this query was executed. -/*---------------------------------------------+---------------------------------------------* - | original | later | - +---------------------------------------------+---------------------------------------------+ - | 2008-12-25 07:30:00.000 America/Los_Angeles | 2008-12-25 07:40:00.000 America/Los_Angeles | - *---------------------------------------------+---------------------------------------------*/ +/*------+-----+--------+-----* + | x | min | median | max | + +------+-----+--------+-----+ + | c | a | b | c | + | NULL | a | b | c | + | b | a | b | c | + | a | a | b | c | + *------+-----+--------+-----*/ ``` -### `TIMESTAMP_DIFF` +The following example computes the value for some percentiles from a column of +values while respecting nulls. ```sql -TIMESTAMP_DIFF(timestamp_expression_a, timestamp_expression_b, date_part) -``` - -**Description** +SELECT + x, + PERCENTILE_DISC(x, 0 RESPECT NULLS) OVER() AS min, + PERCENTILE_DISC(x, 0.5 RESPECT NULLS) OVER() AS median, + PERCENTILE_DISC(x, 1 RESPECT NULLS) OVER() AS max +FROM UNNEST(['c', NULL, 'b', 'a']) AS x; -Returns the whole number of specified `date_part` intervals between two -timestamps (`timestamp_expression_a` - `timestamp_expression_b`). -If the first timestamp is earlier than the second one, -the output is negative. Produces an error if the computation overflows the -result type, such as if the difference in -nanoseconds -between the two timestamps would overflow an -`INT64` value. 
- -`TIMESTAMP_DIFF` supports the following values for `date_part`: - -+ `NANOSECOND` - (if the SQL engine supports it) -+ `MICROSECOND` -+ `MILLISECOND` -+ `SECOND` -+ `MINUTE` -+ `HOUR`. Equivalent to 60 `MINUTE`s. -+ `DAY`. Equivalent to 24 `HOUR`s. +/*------+------+--------+-----* + | x | min | median | max | + +------+------+--------+-----+ + | c | NULL | a | c | + | NULL | NULL | a | c | + | b | NULL | a | c | + | a | NULL | a | c | + *------+------+--------+-----*/ -**Return Data Type** +``` -`INT64` +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md -**Example** +## Net functions -```sql -SELECT - TIMESTAMP("2010-07-07 10:20:00+00") AS later_timestamp, - TIMESTAMP("2008-12-25 15:30:00+00") AS earlier_timestamp, - TIMESTAMP_DIFF(TIMESTAMP "2010-07-07 10:20:00+00", TIMESTAMP "2008-12-25 15:30:00+00", HOUR) AS hours; +ZetaSQL supports the following Net functions. --- Display of results may differ, depending upon the environment and time zone where this query was executed. -/*---------------------------------------------+---------------------------------------------+-------* - | later_timestamp | earlier_timestamp | hours | - +---------------------------------------------+---------------------------------------------+-------+ - | 2010-07-07 03:20:00.000 America/Los_Angeles | 2008-12-25 07:30:00.000 America/Los_Angeles | 13410 | - *---------------------------------------------+---------------------------------------------+-------*/ -``` +### Function list -In the following example, the first timestamp occurs before the -second timestamp, resulting in a negative output. + + + + + + + + -```sql -SELECT TIMESTAMP_DIFF(TIMESTAMP "2018-08-14", TIMESTAMP "2018-10-14", DAY) AS negative_diff; + + + + -In this example, the result is 0 because only the number of whole specified -`HOUR` intervals are included. 
+ + + + -/*---------------* - | diff | - +---------------+ - | 0 | - *---------------+ -``` + + + + -```sql -TIMESTAMP_FROM_UNIX_MICROS(int64_expression) -``` + + + + -**Description** + + + + -**Return Data Type** + + + + -**Example** + + + + --- Display of results may differ, depending upon the environment and time zone where this query was executed. -/*------------------------* - | timestamp_value | - +------------------------+ - | 2008-12-25 15:30:00+00 | - *------------------------*/ -``` + + + + -```sql -TIMESTAMP_FROM_UNIX_MILLIS(int64_expression) -``` + + + + -**Description** + + + + -**Return Data Type** + + + + -**Example** + + + + --- Display of results may differ, depending upon the environment and time zone where this query was executed. -/*------------------------* - | timestamp_value | - +------------------------+ - | 2008-12-25 15:30:00+00 | - *------------------------*/ -``` + + + + -```sql -TIMESTAMP_FROM_UNIX_SECONDS(int64_expression) -``` + + + + -**Description** + + + + -**Return Data Type** + + + + -**Example** + +
NameSummary
NET.FORMAT_IP -/*---------------* - | negative_diff | - +---------------+ - | -61 | - *---------------+ -``` + + (Deprecated) Converts an + IPv4 address from an INT64 value to a + STRING value. +
NET.FORMAT_PACKED_IP -```sql -SELECT TIMESTAMP_DIFF("2001-02-01 01:00:00", "2001-02-01 00:00:01", HOUR) AS diff; + + (Deprecated) Converts an + IPv4 or IPv6 address from a BYTES value to a + STRING value. +
NET.HOST -### `TIMESTAMP_FROM_UNIX_MICROS` + + Gets the hostname from a URL. +
NET.IP_FROM_STRING -```sql -TIMESTAMP_FROM_UNIX_MICROS(timestamp_expression) -``` + + Converts an IPv4 or IPv6 address from a STRING value to + a BYTES value in network byte order. +
NET.IP_IN_NET -Interprets `int64_expression` as the number of microseconds since -1970-01-01 00:00:00 UTC and returns a timestamp. If a timestamp is passed in, -the same timestamp is returned. + + Checks if an IP address is in a subnet. +
NET.IP_NET_MASK -`TIMESTAMP` + + Gets a network mask. +
NET.IP_TO_STRING -```sql -SELECT TIMESTAMP_FROM_UNIX_MICROS(1230219000000000) AS timestamp_value; + + Converts an IPv4 or IPv6 address from a BYTES value in + network byte order to a STRING value. +
NET.IP_TRUNC -### `TIMESTAMP_FROM_UNIX_MILLIS` + + Converts a BYTES IPv4 or IPv6 address in + network byte order to a BYTES subnet address. +
NET.IPV4_FROM_INT64 -```sql -TIMESTAMP_FROM_UNIX_MILLIS(timestamp_expression) -``` + + Converts an IPv4 address from an INT64 value to a + BYTES value in network byte order. +
NET.IPV4_TO_INT64 -Interprets `int64_expression` as the number of milliseconds since -1970-01-01 00:00:00 UTC and returns a timestamp. If a timestamp is passed in, -the same timestamp is returned. + + Converts an IPv4 address from a BYTES value in network + byte order to an INT64 value. +
NET.MAKE_NET -`TIMESTAMP` + + Takes a IPv4 or IPv6 address and the prefix length, and produces a + CIDR subnet. +
NET.PARSE_IP -```sql -SELECT TIMESTAMP_FROM_UNIX_MILLIS(1230219000000) AS timestamp_value; + + (Deprecated) Converts an + IPv4 address from a STRING value to an + INT64 value. +
NET.PARSE_PACKED_IP -### `TIMESTAMP_FROM_UNIX_SECONDS` + + (Deprecated) Converts an + IPv4 or IPv6 address from a STRING value to a + BYTES value. +
NET.PUBLIC_SUFFIX -```sql -TIMESTAMP_FROM_UNIX_SECONDS(timestamp_expression) -``` + + Gets the public suffix from a URL. +
NET.REG_DOMAIN -Interprets `int64_expression` as the number of seconds since -1970-01-01 00:00:00 UTC and returns a timestamp. If a timestamp is passed in, -the same timestamp is returned. + + Gets the registered or registrable domain from a URL. +
NET.SAFE_IP_FROM_STRING -`TIMESTAMP` + + Similar to the NET.IP_FROM_STRING, but returns + NULL instead of producing an error if the input is invalid. +
-```sql -SELECT TIMESTAMP_FROM_UNIX_SECONDS(1230219000) AS timestamp_value; +### `NET.FORMAT_IP` (DEPRECATED) + --- Display of results may differ, depending upon the environment and time zone where this query was executed. -/*------------------------* - | timestamp_value | - +------------------------+ - | 2008-12-25 15:30:00+00 | - *------------------------*/ ``` - -### `TIMESTAMP_MICROS` - -```sql -TIMESTAMP_MICROS(int64_expression) +NET.FORMAT_IP(integer) ``` **Description** -Interprets `int64_expression` as the number of microseconds since 1970-01-01 -00:00:00 UTC and returns a timestamp. +This function is deprecated. It is the same as +[`NET.IP_TO_STRING`][net-link-to-ip-to-string]`(`[`NET.IPV4_FROM_INT64`][net-link-to-ipv4-from-int64]`(integer))`, +except that this function does not allow negative input values. **Return Data Type** -`TIMESTAMP` - -**Example** +STRING -```sql -SELECT TIMESTAMP_MICROS(1230219000000000) AS timestamp_value; +[net-link-to-ip-to-string]: #netip_to_string --- Display of results may differ, depending upon the environment and time zone where this query was executed. -/*------------------------* - | timestamp_value | - +------------------------+ - | 2008-12-25 15:30:00+00 | - *------------------------*/ -``` +[net-link-to-ipv4-from-int64]: #netipv4_from_int64 -### `TIMESTAMP_MILLIS` +### `NET.FORMAT_PACKED_IP` (DEPRECATED) + -```sql -TIMESTAMP_MILLIS(int64_expression) +``` +NET.FORMAT_PACKED_IP(bytes_value) ``` **Description** -Interprets `int64_expression` as the number of milliseconds since 1970-01-01 -00:00:00 UTC and returns a timestamp. +This function is deprecated. It is the same as [`NET.IP_TO_STRING`][net-link-to-ip-to-string]. **Return Data Type** -`TIMESTAMP` +STRING -**Example** +[net-link-to-ip-to-string]: #netip_to_string -```sql -SELECT TIMESTAMP_MILLIS(1230219000000) AS timestamp_value; +### `NET.HOST` --- Display of results may differ, depending upon the environment and time zone where this query was executed. 
-/*------------------------* - | timestamp_value | - +------------------------+ - | 2008-12-25 15:30:00+00 | - *------------------------*/ ``` - -### `TIMESTAMP_SECONDS` - -```sql -TIMESTAMP_SECONDS(int64_expression) +NET.HOST(url) ``` **Description** -Interprets `int64_expression` as the number of seconds since 1970-01-01 00:00:00 -UTC and returns a timestamp. +Takes a URL as a `STRING` value and returns the host. For best results, URL +values should comply with the format as defined by +[RFC 3986][net-link-to-rfc-3986-appendix-a]. If the URL value does not comply +with RFC 3986 formatting, this function makes a best effort to parse the input +and return a relevant result. If the function cannot parse the input, it +returns `NULL`. + +Note: The function does not perform any normalization. **Return Data Type** -`TIMESTAMP` +`STRING` **Example** ```sql -SELECT TIMESTAMP_SECONDS(1230219000) AS timestamp_value; - --- Display of results may differ, depending upon the environment and time zone where this query was executed. 
-/*------------------------*
- | timestamp_value        |
- +------------------------+
- | 2008-12-25 15:30:00+00 |
- *------------------------*/
+SELECT
+  FORMAT("%T", input) AS input,
+  description,
+  FORMAT("%T", NET.HOST(input)) AS host,
+  FORMAT("%T", NET.PUBLIC_SUFFIX(input)) AS suffix,
+  FORMAT("%T", NET.REG_DOMAIN(input)) AS domain
+FROM (
+  SELECT "" AS input, "invalid input" AS description
+  UNION ALL SELECT "http://abc.xyz", "standard URL"
+  UNION ALL SELECT "//user:password@a.b:80/path?query",
+    "standard URL with relative scheme, port, path and query, but no public suffix"
+  UNION ALL SELECT "https://[::1]:80", "standard URL with IPv6 host"
+  UNION ALL SELECT "http://例子.卷筒纸.中国", "standard URL with internationalized domain name"
+  UNION ALL SELECT "    www.Example.Co.UK    ",
+    "non-standard URL with spaces, upper case letters, and without scheme"
+  UNION ALL SELECT "mailto:?to=&subject=&body=", "URI rather than URL--unsupported"
+);
```

-### `TIMESTAMP_SUB`

+| input                               | description                                                                    | host                | suffix  | domain          |
+|-------------------------------------|--------------------------------------------------------------------------------|---------------------|---------|-----------------|
+| ""                                  | invalid input                                                                  | NULL                | NULL    | NULL            |
+| "http://abc.xyz"                    | standard URL                                                                   | "abc.xyz"           | "xyz"   | "abc.xyz"       |
+| "//user:password@a.b:80/path?query" | standard URL with relative scheme, port, path and query, but no public suffix  | "a.b"               | NULL    | NULL            |
+| "https://[::1]:80"                  | standard URL with IPv6 host                                                    | "[::1]"             | NULL    | NULL            |
+| "http://例子.卷筒纸.中国"           | standard URL with internationalized domain name                                | "例子.卷筒纸.中国"  | "中国"  | "卷筒纸.中国"   |
+| "    www.Example.Co.UK    "         | non-standard URL with spaces, upper case letters, and without scheme           | "www.Example.Co.UK" | "Co.UK" | "Example.Co.UK" |
+| "mailto:?to=&subject=&body="        | URI rather than URL--unsupported                                               | "mailto"            | NULL    | NULL            |
+
+[net-link-to-rfc-3986-appendix-a]:
https://tools.ietf.org/html/rfc3986#appendix-A + +### `NET.IP_FROM_STRING` -```sql -TIMESTAMP_SUB(timestamp_expression, INTERVAL int64_expression date_part) +``` +NET.IP_FROM_STRING(addr_str) ``` **Description** -Subtracts `int64_expression` units of `date_part` from the timestamp, -independent of any time zone. +Converts an IPv4 or IPv6 address from text (STRING) format to binary (BYTES) +format in network byte order. -`TIMESTAMP_SUB` supports the following values for `date_part`: +This function supports the following formats for `addr_str`: -+ `NANOSECOND` - (if the SQL engine supports it) -+ `MICROSECOND` -+ `MILLISECOND` -+ `SECOND` -+ `MINUTE` -+ `HOUR`. Equivalent to 60 `MINUTE` parts. -+ `DAY`. Equivalent to 24 `HOUR` parts. ++ IPv4: Dotted-quad format. For example, `10.1.2.3`. ++ IPv6: Colon-separated format. For example, + `1234:5678:90ab:cdef:1234:5678:90ab:cdef`. For more examples, see the + [IP Version 6 Addressing Architecture][net-link-to-ipv6-rfc]. + +This function does not support [CIDR notation][net-link-to-cidr-notation], such as `10.1.2.3/32`. + +If this function receives a `NULL` input, it returns `NULL`. If the input is +considered invalid, an `OUT_OF_RANGE` error occurs. **Return Data Type** -`TIMESTAMP` +BYTES **Example** ```sql SELECT - TIMESTAMP("2008-12-25 15:30:00+00") AS original, - TIMESTAMP_SUB(TIMESTAMP "2008-12-25 15:30:00+00", INTERVAL 10 MINUTE) AS earlier; + addr_str, FORMAT("%T", NET.IP_FROM_STRING(addr_str)) AS ip_from_string +FROM UNNEST([ + '48.49.50.51', + '::1', + '3031:3233:3435:3637:3839:4041:4243:4445', + '::ffff:192.0.2.128' +]) AS addr_str; --- Display of results may differ, depending upon the environment and time zone where this query was executed. 
-/*---------------------------------------------+---------------------------------------------* - | original | earlier | - +---------------------------------------------+---------------------------------------------+ - | 2008-12-25 07:30:00.000 America/Los_Angeles | 2008-12-25 07:20:00.000 America/Los_Angeles | - *---------------------------------------------+---------------------------------------------*/ +/*---------------------------------------------------------------------------------------------------------------* + | addr_str | ip_from_string | + +---------------------------------------------------------------------------------------------------------------+ + | 48.49.50.51 | b"0123" | + | ::1 | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01" | + | 3031:3233:3435:3637:3839:4041:4243:4445 | b"0123456789@ABCDE" | + | ::ffff:192.0.2.128 | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc0\x00\x02\x80" | + *---------------------------------------------------------------------------------------------------------------*/ ``` -### `TIMESTAMP_TRUNC` +[net-link-to-ipv6-rfc]: http://www.ietf.org/rfc/rfc2373.txt -```sql -TIMESTAMP_TRUNC(timestamp_expression, date_time_part[, time_zone]) +[net-link-to-cidr-notation]: https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing + +### `NET.IP_IN_NET` + +``` +NET.IP_IN_NET(address, subnet) ``` **Description** -Truncates a timestamp to the granularity of `date_time_part`. -The timestamp is always rounded to the beginning of `date_time_part`, -which can be one of the following: +Takes an IP address and a subnet CIDR as STRING and returns true if the IP +address is contained in the subnet. -+ `NANOSECOND`: If used, nothing is truncated from the value. -+ `MICROSECOND`: The nearest lessor or equal microsecond. -+ `MILLISECOND`: The nearest lessor or equal millisecond. -+ `SECOND`: The nearest lessor or equal second. -+ `MINUTE`: The nearest lessor or equal minute. 
-+ `HOUR`: The nearest lessor or equal hour. -+ `DAY`: The day in the Gregorian calendar year that contains the - `TIMESTAMP` value. -+ `WEEK`: The first day of the week in the week that contains the - `TIMESTAMP` value. Weeks begin on Sundays. `WEEK` is equivalent to - `WEEK(SUNDAY)`. -+ `WEEK(WEEKDAY)`: The first day of the week in the week that contains the - `TIMESTAMP` value. Weeks begin on `WEEKDAY`. `WEEKDAY` must be one of the - following: `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, `FRIDAY`, - or `SATURDAY`. -+ `ISOWEEK`: The first day of the [ISO 8601 week][ISO-8601-week] in the - ISO week that contains the `TIMESTAMP` value. The ISO week begins on - Monday. The first ISO week of each ISO year contains the first Thursday of the - corresponding Gregorian calendar year. -+ `MONTH`: The first day of the month in the month that contains the - `TIMESTAMP` value. -+ `QUARTER`: The first day of the quarter in the quarter that contains the - `TIMESTAMP` value. -+ `YEAR`: The first day of the year in the year that contains the - `TIMESTAMP` value. -+ `ISOYEAR`: The first day of the [ISO 8601][ISO-8601] week-numbering year - in the ISO year that contains the `TIMESTAMP` value. The ISO year is the - Monday of the first week whose Thursday belongs to the corresponding - Gregorian calendar year. +This function supports the following formats for `address` and `subnet`: - ++ IPv4: Dotted-quad format. For example, `10.1.2.3`. ++ IPv6: Colon-separated format. For example, + `1234:5678:90ab:cdef:1234:5678:90ab:cdef`. For more examples, see the + [IP Version 6 Addressing Architecture][net-link-to-ipv6-rfc]. ++ CIDR (IPv4): Dotted-quad format. For example, `10.1.2.0/24` ++ CIDR (IPv6): Colon-separated format. For example, `1:2::/48`. -[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 +If this function receives a `NULL` input, it returns `NULL`. If the input is +considered invalid, an `OUT_OF_RANGE` error occurs. 
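+For example, the following query checks one address that falls inside an IPv4
+CIDR subnet and one that does not. This is an illustrative sketch, not one of
+the reference examples in this document:
+
+```sql
+SELECT
+  NET.IP_IN_NET('192.168.1.5', '192.168.1.0/24') AS in_net,
+  NET.IP_IN_NET('10.0.0.1', '192.168.1.0/24') AS other_net;
+
+/*--------+-----------*
+ | in_net | other_net |
+ +--------+-----------+
+ | true   | false     |
+ *--------+-----------*/
+```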
-[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date +**Return Data Type** - +BOOL -`TIMESTAMP_TRUNC` function supports an optional `time_zone` parameter. This -parameter applies to the following `date_time_part`: +[net-link-to-ipv6-rfc]: http://www.ietf.org/rfc/rfc2373.txt -+ `MINUTE` -+ `HOUR` -+ `DAY` -+ `WEEK` -+ `WEEK()` -+ `ISOWEEK` -+ `MONTH` -+ `QUARTER` -+ `YEAR` -+ `ISOYEAR` +### `NET.IP_NET_MASK` -Use this parameter if you want to use a time zone other than the -default time zone, which is implementation defined, as part of the -truncate operation. +``` +NET.IP_NET_MASK(num_output_bytes, prefix_length) +``` -When truncating a timestamp to `MINUTE` -or`HOUR` parts, `TIMESTAMP_TRUNC` determines the civil time of the -timestamp in the specified (or default) time zone -and subtracts the minutes and seconds (when truncating to `HOUR`) or the seconds -(when truncating to `MINUTE`) from that timestamp. -While this provides intuitive results in most cases, the result is -non-intuitive near daylight savings transitions that are not hour-aligned. - -**Return Data Type** - -`TIMESTAMP` +**Description** -**Examples** +Returns a network mask: a byte sequence with length equal to `num_output_bytes`, +where the first `prefix_length` bits are set to 1 and the other bits are set to +0. `num_output_bytes` and `prefix_length` are INT64. +This function throws an error if `num_output_bytes` is not 4 (for IPv4) or 16 +(for IPv6). It also throws an error if `prefix_length` is negative or greater +than `8 * num_output_bytes`. -```sql -SELECT - TIMESTAMP_TRUNC(TIMESTAMP "2008-12-25 15:30:00+00", DAY, "UTC") AS utc, - TIMESTAMP_TRUNC(TIMESTAMP "2008-12-25 15:30:00+00", DAY, "America/Los_Angeles") AS la; +**Return Data Type** --- Display of results may differ, depending upon the environment and time zone where this query was executed. 
-/*---------------------------------------------+---------------------------------------------* - | utc | la | - +---------------------------------------------+---------------------------------------------+ - | 2008-12-24 16:00:00.000 America/Los_Angeles | 2008-12-25 00:00:00.000 America/Los_Angeles | - *---------------------------------------------+---------------------------------------------*/ -``` +BYTES -In the following example, `timestamp_expression` has a time zone offset of +12. -The first column shows the `timestamp_expression` in UTC time. The second -column shows the output of `TIMESTAMP_TRUNC` using weeks that start on Monday. -Because the `timestamp_expression` falls on a Sunday in UTC, `TIMESTAMP_TRUNC` -truncates it to the preceding Monday. The third column shows the same function -with the optional [Time zone definition][timestamp-link-to-timezone-definitions] -argument 'Pacific/Auckland'. Here, the function truncates the -`timestamp_expression` using New Zealand Daylight Time, where it falls on a -Monday. +**Example** ```sql -SELECT - timestamp_value AS timestamp_value, - TIMESTAMP_TRUNC(timestamp_value, WEEK(MONDAY), "UTC") AS utc_truncated, - TIMESTAMP_TRUNC(timestamp_value, WEEK(MONDAY), "Pacific/Auckland") AS nzdt_truncated -FROM (SELECT TIMESTAMP("2017-11-06 00:00:00+12") AS timestamp_value); +SELECT x, y, FORMAT("%T", NET.IP_NET_MASK(x, y)) AS ip_net_mask +FROM UNNEST([ + STRUCT(4 as x, 0 as y), + (4, 20), + (4, 32), + (16, 0), + (16, 1), + (16, 128) +]); --- Display of results may differ, depending upon the environment and time zone where this query was executed. 
-/*---------------------------------------------+---------------------------------------------+---------------------------------------------* - | timestamp_value | utc_truncated | nzdt_truncated | - +---------------------------------------------+---------------------------------------------+---------------------------------------------+ - | 2017-11-05 04:00:00.000 America/Los_Angeles | 2017-10-29 17:00:00.000 America/Los_Angeles | 2017-11-05 03:00:00.000 America/Los_Angeles | - *---------------------------------------------+---------------------------------------------+---------------------------------------------*/ +/*--------------------------------------------------------------------------------* + | x | y | ip_net_mask | + +--------------------------------------------------------------------------------+ + | 4 | 0 | b"\x00\x00\x00\x00" | + | 4 | 20 | b"\xff\xff\xf0\x00" | + | 4 | 32 | b"\xff\xff\xff\xff" | + | 16 | 0 | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" | + | 16 | 1 | b"\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" | + | 16 | 128 | b"\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff" | + *--------------------------------------------------------------------------------*/ ``` -In the following example, the original `timestamp_expression` is in the -Gregorian calendar year 2015. However, `TIMESTAMP_TRUNC` with the `ISOYEAR` date -part truncates the `timestamp_expression` to the beginning of the ISO year, not -the Gregorian calendar year. The first Thursday of the 2015 calendar year was -2015-01-01, so the ISO year 2015 begins on the preceding Monday, 2014-12-29. -Therefore the ISO year boundary preceding the `timestamp_expression` -2015-06-15 00:00:00+00 is 2014-12-29. 
- -```sql -SELECT - TIMESTAMP_TRUNC("2015-06-15 00:00:00+00", ISOYEAR) AS isoyear_boundary, - EXTRACT(ISOYEAR FROM TIMESTAMP "2015-06-15 00:00:00+00") AS isoyear_number; +### `NET.IP_TO_STRING` --- Display of results may differ, depending upon the environment and time zone where this query was executed. -/*---------------------------------------------+----------------* - | isoyear_boundary | isoyear_number | - +---------------------------------------------+----------------+ - | 2014-12-29 00:00:00.000 America/Los_Angeles | 2015 | - *---------------------------------------------+----------------*/ ``` - -[timestamp-link-to-timezone-definitions]: #timezone_definitions - -### `UNIX_MICROS` - -```sql -UNIX_MICROS(timestamp_expression) +NET.IP_TO_STRING(addr_bin) ``` **Description** +Converts an IPv4 or IPv6 address from binary (BYTES) format in network byte +order to text (STRING) format. -Returns the number of microseconds since `1970-01-01 00:00:00 UTC`. -Truncates higher levels of precision by -rounding down to the beginning of the microsecond. +If the input is 4 bytes, this function returns an IPv4 address as a STRING. If +the input is 16 bytes, it returns an IPv6 address as a STRING. + +If this function receives a `NULL` input, it returns `NULL`. If the input has +a length different from 4 or 16, an `OUT_OF_RANGE` error occurs. 
**Return Data Type** -`INT64` +STRING -**Examples** +**Example** ```sql -SELECT UNIX_MICROS(TIMESTAMP "2008-12-25 15:30:00+00") AS micros; +SELECT FORMAT("%T", x) AS addr_bin, NET.IP_TO_STRING(x) AS ip_to_string +FROM UNNEST([ + b"0123", + b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01", + b"0123456789@ABCDE", + b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc0\x00\x02\x80" +]) AS x; -/*------------------* - | micros | - +------------------+ - | 1230219000000000 | - *------------------*/ +/*---------------------------------------------------------------------------------------------------------------* + | addr_bin | ip_to_string | + +---------------------------------------------------------------------------------------------------------------+ + | b"0123" | 48.49.50.51 | + | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01" | ::1 | + | b"0123456789@ABCDE" | 3031:3233:3435:3637:3839:4041:4243:4445 | + | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc0\x00\x02\x80" | ::ffff:192.0.2.128 | + *---------------------------------------------------------------------------------------------------------------*/ ``` -```sql -SELECT UNIX_MICROS(TIMESTAMP "1970-01-01 00:00:00.0000018+00") AS micros; +### `NET.IP_TRUNC` -/*------------------* - | micros | - +------------------+ - | 1 | - *------------------*/ ``` - -### `UNIX_MILLIS` - -```sql -UNIX_MILLIS(timestamp_expression) +NET.IP_TRUNC(addr_bin, prefix_length) ``` **Description** +Takes `addr_bin`, an IPv4 or IPv6 address in binary (BYTES) format in network +byte order, and returns a subnet address in the same format. The result has the +same length as `addr_bin`, where the first `prefix_length` bits are equal to +those in `addr_bin` and the remaining bits are 0. -Returns the number of milliseconds since `1970-01-01 00:00:00 UTC`. Truncates -higher levels of precision by rounding down to the beginning of the millisecond. 
+This function throws an error if `LENGTH(addr_bin)` is not 4 or 16, or if
+`prefix_length` is negative or greater than `LENGTH(addr_bin) * 8`.

**Return Data Type**

-`INT64`
+BYTES

-**Examples**
+**Example**

```sql
-SELECT UNIX_MILLIS(TIMESTAMP "2008-12-25 15:30:00+00") AS millis;
+SELECT
+  FORMAT("%T", x) AS addr_bin, prefix_length,
+  FORMAT("%T", NET.IP_TRUNC(x, prefix_length)) AS ip_trunc
+FROM UNNEST([
+  STRUCT(b"\xAA\xBB\xCC\xDD" AS x, 0 AS prefix_length),
+  (b"\xAA\xBB\xCC\xDD", 11), (b"\xAA\xBB\xCC\xDD", 12),
+  (b"\xAA\xBB\xCC\xDD", 24), (b"\xAA\xBB\xCC\xDD", 32),
+  (b'0123456789@ABCDE', 80)
+]);

-/*---------------*
- | millis        |
- +---------------+
- | 1230219000000 |
- *---------------+
+/*-----------------------------------------------------------------------------*
+ | addr_bin            | prefix_length | ip_trunc                              |
+ +-----------------------------------------------------------------------------+
+ | b"\xaa\xbb\xcc\xdd" | 0             | b"\x00\x00\x00\x00"                   |
+ | b"\xaa\xbb\xcc\xdd" | 11            | b"\xaa\xa0\x00\x00"                   |
+ | b"\xaa\xbb\xcc\xdd" | 12            | b"\xaa\xb0\x00\x00"                   |
+ | b"\xaa\xbb\xcc\xdd" | 24            | b"\xaa\xbb\xcc\x00"                   |
+ | b"\xaa\xbb\xcc\xdd" | 32            | b"\xaa\xbb\xcc\xdd"                   |
+ | b"0123456789@ABCDE" | 80            | b"0123456789\x00\x00\x00\x00\x00\x00" |
+ *-----------------------------------------------------------------------------*/
```

-```sql
-SELECT UNIX_MILLIS(TIMESTAMP "1970-01-01 00:00:00.0018+00") AS millis;
+### `NET.IPV4_FROM_INT64`

-/*---------------*
- | millis        |
- +---------------+
- | 1             |
- *---------------+
```
-
-### `UNIX_SECONDS`
-
-```sql
-UNIX_SECONDS(timestamp_expression)
+NET.IPV4_FROM_INT64(integer_value)
```

**Description**

-Returns the number of seconds since `1970-01-01 00:00:00 UTC`. Truncates higher
-levels of precision by rounding down to the beginning of the second.
+Converts an IPv4 address from integer format to binary (BYTES) format in network
+byte order.
In the integer input, the least significant bit of the IP address is +stored in the least significant bit of the integer, regardless of host or client +architecture. For example, `1` means `0.0.0.1`, and `0x1FF` means `0.0.1.255`. + +This function checks that either all the most significant 32 bits are 0, or all +the most significant 33 bits are 1 (sign-extended from a 32-bit integer). +In other words, the input should be in the range `[-0x80000000, 0xFFFFFFFF]`; +otherwise, this function throws an error. + +This function does not support IPv6. **Return Data Type** -`INT64` +BYTES -**Examples** +**Example** ```sql -SELECT UNIX_SECONDS(TIMESTAMP "2008-12-25 15:30:00+00") AS seconds; +SELECT x, x_hex, FORMAT("%T", NET.IPV4_FROM_INT64(x)) AS ipv4_from_int64 +FROM ( + SELECT CAST(x_hex AS INT64) x, x_hex + FROM UNNEST(["0x0", "0xABCDEF", "0xFFFFFFFF", "-0x1", "-0x2"]) AS x_hex +); -/*------------* - | seconds | - +------------+ - | 1230219000 | - *------------*/ +/*-----------------------------------------------* + | x | x_hex | ipv4_from_int64 | + +-----------------------------------------------+ + | 0 | 0x0 | b"\x00\x00\x00\x00" | + | 11259375 | 0xABCDEF | b"\x00\xab\xcd\xef" | + | 4294967295 | 0xFFFFFFFF | b"\xff\xff\xff\xff" | + | -1 | -0x1 | b"\xff\xff\xff\xff" | + | -2 | -0x2 | b"\xff\xff\xff\xfe" | + *-----------------------------------------------*/ ``` -```sql -SELECT UNIX_SECONDS(TIMESTAMP "1970-01-01 00:00:01.8+00") AS seconds; +### `NET.IPV4_TO_INT64` -/*------------* - | seconds | - +------------+ - | 1 | - *------------*/ +``` +NET.IPV4_TO_INT64(addr_bin) ``` -### How time zones work with timestamp functions - +**Description** -A timestamp represents an absolute point in time, independent of any time -zone. However, when a timestamp value is displayed, it is usually converted to -a human-readable format consisting of a civil date and time -(YYYY-MM-DD HH:MM:SS) -and a time zone. 
This is not the internal representation of the -`TIMESTAMP`; it is only a human-understandable way to describe the point in time -that the timestamp represents. +Converts an IPv4 address from binary (BYTES) format in network byte order to +integer format. In the integer output, the least significant bit of the IP +address is stored in the least significant bit of the integer, regardless of +host or client architecture. For example, `1` means `0.0.0.1`, and `0x1FF` means +`0.0.1.255`. The output is in the range `[0, 0xFFFFFFFF]`. -Some timestamp functions have a time zone argument. A time zone is needed to -convert between civil time (YYYY-MM-DD HH:MM:SS) and the absolute time -represented by a timestamp. -A function like `PARSE_TIMESTAMP` takes an input string that represents a -civil time and returns a timestamp that represents an absolute time. A -time zone is needed for this conversion. A function like `EXTRACT` takes an -input timestamp (absolute time) and converts it to civil time in order to -extract a part of that civil time. This conversion requires a time zone. -If no time zone is specified, the default time zone, which is implementation defined, -is used. +If the input length is not 4, this function throws an error. -Certain date and timestamp functions allow you to override the default time zone -and specify a different one. You can specify a time zone by either supplying -the time zone name (for example, `America/Los_Angeles`) -or time zone offset from UTC (for example, -08). +This function does not support IPv6. -To learn more about how time zones work with the `TIMESTAMP` type, see -[Time zones][data-types-timezones]. 
+**Return Data Type** -[timezone-by-name]: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones +INT64 -[data-types-timezones]: https://github.com/google/zetasql/blob/master/docs/data-types.md#time_zones +**Example** -[timestamp-link-to-timezone-definitions]: #timezone_definitions +```sql +SELECT + FORMAT("%T", x) AS addr_bin, + FORMAT("0x%X", NET.IPV4_TO_INT64(x)) AS ipv4_to_int64 +FROM +UNNEST([b"\x00\x00\x00\x00", b"\x00\xab\xcd\xef", b"\xff\xff\xff\xff"]) AS x; -[data-types-link-to-date_type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#date_type +/*-------------------------------------* + | addr_bin | ipv4_to_int64 | + +-------------------------------------+ + | b"\x00\x00\x00\x00" | 0x0 | + | b"\x00\xab\xcd\xef" | 0xABCDEF | + | b"\xff\xff\xff\xff" | 0xFFFFFFFF | + *-------------------------------------*/ +``` -[data-types-link-to-timestamp_type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#timestamp_type +### `NET.MAKE_NET` -## Interval functions +``` +NET.MAKE_NET(address, prefix_length) +``` -ZetaSQL supports the following interval functions. +**Description** -### Function list +Takes an IPv4 or IPv6 address as STRING and an integer representing the prefix +length (the number of leading 1-bits in the network mask). Returns a +STRING representing the [CIDR subnet][net-link-to-cidr-notation] with the given prefix length. - - - - - - - - +The value of `prefix_length` must be greater than or equal to 0. A smaller value +means a bigger subnet, covering more IP addresses. The result CIDR subnet must +be no smaller than `address`, meaning that the value of `prefix_length` must be +less than or equal to the prefix length in `address`. See the effective upper +bound below. - - - - ++ IPv4: Dotted-quad format, such as `10.1.2.3`. The value of `prefix_length` + must be less than or equal to 32. ++ IPv6: Colon-separated format, such as + `1234:5678:90ab:cdef:1234:5678:90ab:cdef`. 
The value of `prefix_length` must + be less than or equal to 128. ++ CIDR (IPv4): Dotted-quad format, such as `10.1.2.0/24`. + The value of `prefix_length` must be less than or equal to the number after + the slash in `address` (24 in the example), which must be less than or equal + to 32. ++ CIDR (IPv6): Colon-separated format, such as `1:2::/48`. + The value of `prefix_length` must be less than or equal to the number after + the slash in `address` (48 in the example), which must be less than or equal + to 128. - - - - +**Return Data Type** - - - - +[net-link-to-cidr-notation]: https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing - - - - +``` +NET.PARSE_IP(addr_str) +``` - - - - +This function is deprecated. It is the same as +[`NET.IPV4_TO_INT64`][net-link-to-ipv4-to-int64]`(`[`NET.IP_FROM_STRING`][net-link-to-ip-from-string]`(addr_str))`, +except that this function truncates the input at the first `'\x00'` character, +if any, while `NET.IP_FROM_STRING` treats `'\x00'` as invalid. - -
Name | Summary
EXTRACT +This function supports the following formats for `address`: - - Extracts part of an INTERVAL value. -
JUSTIFY_DAYS +If this function receives a `NULL` input, it returns `NULL`. If the input is +considered invalid, an `OUT_OF_RANGE` error occurs. - - Normalizes the day part of an INTERVAL value. -
JUSTIFY_HOURS +STRING - - Normalizes the time part of an INTERVAL value. -
JUSTIFY_INTERVAL +### `NET.PARSE_IP` (DEPRECATED) + - - Normalizes the day and time parts of an INTERVAL value. -
MAKE_INTERVAL +**Description** - - Constructs an INTERVAL value. -
+**Return Data Type** -### `EXTRACT` +INT64 + +[net-link-to-ip-to-string]: #netip_to_string + +[net-link-to-ipv4-to-int64]: #netipv4_to_int64 + +### `NET.PARSE_PACKED_IP` (DEPRECATED) + -```sql -EXTRACT(part FROM interval_expression) +``` +NET.PARSE_PACKED_IP(addr_str) ``` **Description** -Returns the value corresponding to the specified date part. The `part` must be -one of `YEAR`, `MONTH`, `DAY`, `HOUR`, `MINUTE`, `SECOND`, `MILLISECOND` or -`MICROSECOND`. +This function is deprecated. It is the same as +[`NET.IP_FROM_STRING`][net-link-to-ip-from-string], except that this function truncates +the input at the first `'\x00'` character, if any, while `NET.IP_FROM_STRING` +treats `'\x00'` as invalid. **Return Data Type** -`INTERVAL` - -**Examples** +BYTES -In the following example, different parts of two intervals are extracted. +[net-link-to-ip-from-string]: #netip_from_string -```sql -SELECT - EXTRACT(YEAR FROM i) AS year, - EXTRACT(MONTH FROM i) AS month, - EXTRACT(DAY FROM i) AS day, - EXTRACT(HOUR FROM i) AS hour, - EXTRACT(MINUTE FROM i) AS minute, - EXTRACT(SECOND FROM i) AS second, - EXTRACT(MILLISECOND FROM i) AS milli, - EXTRACT(MICROSECOND FROM i) AS micro -FROM - UNNEST([INTERVAL '1-2 3 4:5:6.789999' YEAR TO SECOND, - INTERVAL '0-13 370 48:61:61' YEAR TO SECOND]) AS i +### `NET.PUBLIC_SUFFIX` -/*------+-------+-----+------+--------+--------+-------+--------* - | year | month | day | hour | minute | second | milli | micro | - +------+-------+-----+------+--------+--------+-------+--------+ - | 1 | 2 | 3 | 4 | 5 | 6 | 789 | 789999 | - | 1 | 1 | 370 | 49 | 2 | 1 | 0 | 0 | - *------+-------+-----+------+--------+--------+-------+--------*/ ``` - -When a negative sign precedes the time part in an interval, the negative sign -distributes over the hours, minutes, and seconds. 
For example: - -```sql -SELECT - EXTRACT(HOUR FROM i) AS hour, - EXTRACT(MINUTE FROM i) AS minute -FROM - UNNEST([INTERVAL '10 -12:30' DAY TO MINUTE]) AS i - -/*------+--------* - | hour | minute | - +------+--------+ - | -12 | -30 | - *------+--------*/ +NET.PUBLIC_SUFFIX(url) ``` -When a negative sign precedes the year and month part in an interval, the -negative sign distributes over the years and months. For example: +**Description** -```sql -SELECT - EXTRACT(YEAR FROM i) AS year, - EXTRACT(MONTH FROM i) AS month -FROM - UNNEST([INTERVAL '-22-6 10 -12:30' YEAR TO MINUTE]) AS i +Takes a URL as a `STRING` value and returns the public suffix (such as `com`, +`org`, or `net`). A public suffix is an ICANN domain registered at +[publicsuffix.org][net-link-to-public-suffix]. For best results, URL values +should comply with the format as defined by +[RFC 3986][net-link-to-rfc-3986-appendix-a]. If the URL value does not comply +with RFC 3986 formatting, this function makes a best effort to parse the input +and return a relevant result. -/*------+--------* - | year | month | - +------+--------+ - | -22 | -6 | - *------+--------*/ -``` +This function returns `NULL` if any of the following is true: -### `JUSTIFY_DAYS` ++ It cannot parse the host from the input; ++ The parsed host contains adjacent dots in the middle + (not leading or trailing); ++ The parsed host does not contain any public suffix. -```sql -JUSTIFY_DAYS(interval_expression) -``` +Before looking up the public suffix, this function temporarily normalizes the +host by converting uppercase English letters to lowercase and encoding all +non-ASCII characters with [Punycode][net-link-to-punycode]. +The function then returns the public suffix as part of the original host instead +of the normalized host. -**Description** +Note: The function does not perform +[Unicode normalization][unicode-normalization]. 
-Normalizes the day part of the interval to the range from -29 to 29 by -incrementing/decrementing the month or year part of the interval. +Note: The public suffix data at +[publicsuffix.org][net-link-to-public-suffix] also contains +private domains. This function ignores the private domains. + +Note: The public suffix data may change over time. Consequently, input that +produces a `NULL` result now may produce a non-`NULL` value in the future. **Return Data Type** -`INTERVAL` +`STRING` **Example** ```sql SELECT - JUSTIFY_DAYS(INTERVAL 29 DAY) AS i1, - JUSTIFY_DAYS(INTERVAL -30 DAY) AS i2, - JUSTIFY_DAYS(INTERVAL 31 DAY) AS i3, - JUSTIFY_DAYS(INTERVAL -65 DAY) AS i4, - JUSTIFY_DAYS(INTERVAL 370 DAY) AS i5 - -/*--------------+--------------+-------------+---------------+--------------* - | i1 | i2 | i3 | i4 | i5 | - +--------------+--------------+-------------+---------------+--------------+ - | 0-0 29 0:0:0 | -0-1 0 0:0:0 | 0-1 1 0:0:0 | -0-2 -5 0:0:0 | 1-0 10 0:0:0 | - *--------------+--------------+-------------+---------------+--------------*/ -``` - -### `JUSTIFY_HOURS` - -```sql -JUSTIFY_HOURS(interval_expression) + FORMAT("%T", input) AS input, + description, + FORMAT("%T", NET.HOST(input)) AS host, + FORMAT("%T", NET.PUBLIC_SUFFIX(input)) AS suffix, + FORMAT("%T", NET.REG_DOMAIN(input)) AS domain +FROM ( + SELECT "" AS input, "invalid input" AS description + UNION ALL SELECT "http://abc.xyz", "standard URL" + UNION ALL SELECT "//user:password@a.b:80/path?query", + "standard URL with relative scheme, port, path and query, but no public suffix" + UNION ALL SELECT "https://[::1]:80", "standard URL with IPv6 host" + UNION ALL SELECT "http://例å­.å·ç­’纸.中国", "standard URL with internationalized domain name" + UNION ALL SELECT " www.Example.Co.UK ", + "non-standard URL with spaces, upper case letters, and without scheme" + UNION ALL SELECT "mailto:?to=&subject=&body=", "URI rather than URL--unsupported" +); ``` -**Description** +| input | description | host | 
suffix | domain |
|--------------------------------------------------------------------|-------------------------------------------------------------------------------|--------------------|---------|----------------|
| "" | invalid input | NULL | NULL | NULL |
| "http://abc.xyz" | standard URL | "abc.xyz" | "xyz" | "abc.xyz" |
| "//user:password@a.b:80/path?query" | standard URL with relative scheme, port, path and query, but no public suffix | "a.b" | NULL | NULL |
| "https://[::1]:80" | standard URL with IPv6 host | "[::1]" | NULL | NULL |
| "http://例子.卷筒纸.中国" | standard URL with internationalized domain name | "例子.卷筒纸.中国" | "中国" | "卷筒纸.中国" |
| "&nbsp;&nbsp;&nbsp;&nbsp;www.Example.Co.UK&nbsp;&nbsp;&nbsp;&nbsp;"| non-standard URL with spaces, upper case letters, and without scheme | "www.Example.Co.UK"| "Co.UK" | "Example.Co.UK"|
| "mailto:?to=&subject=&body=" | URI rather than URL--unsupported | "mailto" | NULL | NULL |

-Normalizes the time part of the interval to the range from -23:59:59.999999 to
-23:59:59.999999 by incrementing/decrementing the day part of the interval.
+[unicode-normalization]: https://en.wikipedia.org/wiki/Unicode_equivalence -**Return Data Type** +[net-link-to-punycode]: https://en.wikipedia.org/wiki/Punycode -`INTERVAL` +[net-link-to-public-suffix]: https://publicsuffix.org/list/ -**Example** +[net-link-to-rfc-3986-appendix-a]: https://tools.ietf.org/html/rfc3986#appendix-A -```sql -SELECT - JUSTIFY_HOURS(INTERVAL 23 HOUR) AS i1, - JUSTIFY_HOURS(INTERVAL -24 HOUR) AS i2, - JUSTIFY_HOURS(INTERVAL 47 HOUR) AS i3, - JUSTIFY_HOURS(INTERVAL -12345 MINUTE) AS i4 +### `NET.REG_DOMAIN` -/*--------------+--------------+--------------+-----------------* - | i1 | i2 | i3 | i4 | - +--------------+--------------+--------------+-----------------+ - | 0-0 0 23:0:0 | 0-0 -1 0:0:0 | 0-0 1 23:0:0 | 0-0 -8 -13:45:0 | - *--------------+--------------+--------------+-----------------*/ ``` - -### `JUSTIFY_INTERVAL` - -```sql -JUSTIFY_INTERVAL(interval_expression) +NET.REG_DOMAIN(url) ``` **Description** -Normalizes the days and time parts of the interval. +Takes a URL as a string and returns the registered or registrable domain (the +[public suffix](#netpublic_suffix) plus one preceding label), as a +string. For best results, URL values should comply with the format as defined by +[RFC 3986][net-link-to-rfc-3986-appendix-a]. If the URL value does not comply +with RFC 3986 formatting, this function makes a best effort to parse the input +and return a relevant result. + +This function returns `NULL` if any of the following is true: + ++ It cannot parse the host from the input; ++ The parsed host contains adjacent dots in the middle + (not leading or trailing); ++ The parsed host does not contain any public suffix; ++ The parsed host contains only a public suffix without any preceding label. + +Before looking up the public suffix, this function temporarily normalizes the +host by converting uppercase English letters to lowercase and encoding all +non-ASCII characters with [Punycode][net-link-to-punycode]. 
The function then +returns the registered or registerable domain as part of the original host +instead of the normalized host. + +Note: The function does not perform +[Unicode normalization][unicode-normalization]. + +Note: The public suffix data at +[publicsuffix.org][net-link-to-public-suffix] also contains +private domains. This function does not treat a private domain as a public +suffix. For example, if `us.com` is a private domain in the public suffix data, +`NET.REG_DOMAIN("foo.us.com")` returns `us.com` (the public suffix `com` plus +the preceding label `us`) rather than `foo.us.com` (the private domain `us.com` +plus the preceding label `foo`). + +Note: The public suffix data may change over time. +Consequently, input that produces a `NULL` result now may produce a non-`NULL` +value in the future. **Return Data Type** -`INTERVAL` +`STRING` **Example** ```sql -SELECT JUSTIFY_INTERVAL(INTERVAL '29 49:00:00' DAY TO SECOND) AS i - -/*-------------* - | i | - +-------------+ - | 0-1 1 1:0:0 | - *-------------*/ +SELECT + FORMAT("%T", input) AS input, + description, + FORMAT("%T", NET.HOST(input)) AS host, + FORMAT("%T", NET.PUBLIC_SUFFIX(input)) AS suffix, + FORMAT("%T", NET.REG_DOMAIN(input)) AS domain +FROM ( + SELECT "" AS input, "invalid input" AS description + UNION ALL SELECT "http://abc.xyz", "standard URL" + UNION ALL SELECT "//user:password@a.b:80/path?query", + "standard URL with relative scheme, port, path and query, but no public suffix" + UNION ALL SELECT "https://[::1]:80", "standard URL with IPv6 host" + UNION ALL SELECT "http://例å­.å·ç­’纸.中国", "standard URL with internationalized domain name" + UNION ALL SELECT " www.Example.Co.UK ", + "non-standard URL with spaces, upper case letters, and without scheme" + UNION ALL SELECT "mailto:?to=&subject=&body=", "URI rather than URL--unsupported" +); ``` -### `MAKE_INTERVAL` +| input | description | host | suffix | domain | 
+|--------------------------------------------------------------------|-------------------------------------------------------------------------------|--------------------|---------|----------------|
+| "" | invalid input | NULL | NULL | NULL |
+| "http://abc.xyz" | standard URL | "abc.xyz" | "xyz" | "abc.xyz" |
+| "//user:password@a.b:80/path?query" | standard URL with relative scheme, port, path and query, but no public suffix | "a.b" | NULL | NULL |
+| "https://[::1]:80" | standard URL with IPv6 host | "[::1]" | NULL | NULL |
+| "http://例子.卷筒纸.中国" | standard URL with internationalized domain name | "例子.卷筒纸.中国" | "中国" | "卷筒纸.中国" |
+| "&nbsp;&nbsp;&nbsp;&nbsp;www.Example.Co.UK&nbsp;&nbsp;&nbsp;&nbsp;"| non-standard URL with spaces, upper case letters, and without scheme | "www.Example.Co.UK"| "Co.UK" | "Example.Co.UK"|
+| "mailto:?to=&subject=&body=" | URI rather than URL--unsupported | "mailto" | NULL | NULL |

-```sql
-MAKE_INTERVAL([year][, month][, day][, hour][, minute][, second])
+[unicode-normalization]: https://en.wikipedia.org/wiki/Unicode_equivalence
+
+[net-link-to-public-suffix]: https://publicsuffix.org/list/
+
+[net-link-to-punycode]: https://en.wikipedia.org/wiki/Punycode
+
+[net-link-to-rfc-3986-appendix-a]: https://tools.ietf.org/html/rfc3986#appendix-A
+
+### `NET.SAFE_IP_FROM_STRING`
+
+```
+NET.SAFE_IP_FROM_STRING(addr_str)
```

**Description**

-Constructs an [`INTERVAL`][interval-type] object using `INT64` values
-representing the year, month, day, hour, minute, and second. All arguments are
-optional, `0` by default, and can be [named arguments][named-arguments].
+Similar to [`NET.IP_FROM_STRING`][net-link-to-ip-from-string], but returns `NULL`
+instead of throwing an error if the input is invalid.
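For intuition, the same NULL-instead-of-error contract can be mimicked outside SQL. This is a sketch, not ZetaSQL behavior: it uses Python's standard `ipaddress` module, and `safe_ip_from_string` is a hypothetical helper name chosen to mirror the function above.

```python
import ipaddress

def safe_ip_from_string(addr_str):
    # Return the packed network-byte-order bytes of an IP address
    # (4 bytes for IPv4, 16 for IPv6), or None for invalid input --
    # analogous to NET.SAFE_IP_FROM_STRING returning NULL rather
    # than raising an error.
    try:
        return ipaddress.ip_address(addr_str).packed
    except ValueError:
        return None

print(safe_ip_from_string("48.49.50.51"))  # b'0123' (bytes 0x30313233)
print(safe_ip_from_string("48.49.50"))     # None: not a valid IPv4 literal
print(safe_ip_from_string("::wxyz"))       # None: not a valid IPv6 literal
```

Note that, like `NET.IP_FROM_STRING`, `ipaddress.ip_address` rejects CIDR strings such as `48.49.50.51/32`; only a bare address parses.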
**Return Data Type** -`INTERVAL` +BYTES **Example** ```sql SELECT - MAKE_INTERVAL(1, 6, 15) AS i1, - MAKE_INTERVAL(hour => 10, second => 20) AS i2, - MAKE_INTERVAL(1, minute => 5, day => 2) AS i3 + addr_str, + FORMAT("%T", NET.SAFE_IP_FROM_STRING(addr_str)) AS safe_ip_from_string +FROM UNNEST([ + '48.49.50.51', + '::1', + '3031:3233:3435:3637:3839:4041:4243:4445', + '::ffff:192.0.2.128', + '48.49.50.51/32', + '48.49.50', + '::wxyz' +]) AS addr_str; -/*--------------+---------------+-------------* - | i1 | i2 | i3 | - +--------------+---------------+-------------+ - | 1-6 15 0:0:0 | 0-0 0 10:0:20 | 1-0 2 0:5:0 | - *--------------+---------------+-------------*/ +/*---------------------------------------------------------------------------------------------------------------* + | addr_str | safe_ip_from_string | + +---------------------------------------------------------------------------------------------------------------+ + | 48.49.50.51 | b"0123" | + | ::1 | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01" | + | 3031:3233:3435:3637:3839:4041:4243:4445 | b"0123456789@ABCDE" | + | ::ffff:192.0.2.128 | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc0\x00\x02\x80" | + | 48.49.50.51/32 | NULL | + | 48.49.50 | NULL | + | ::wxyz | NULL | + *---------------------------------------------------------------------------------------------------------------*/ ``` -[interval-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#interval_type - -[named-arguments]: https://github.com/google/zetasql/blob/master/docs/functions-reference.md#named_arguments - -## Geography functions - -ZetaSQL supports geography functions. -Geography functions operate on or generate ZetaSQL -`GEOGRAPHY` values. The signature of most geography -functions starts with `ST_`. 
ZetaSQL supports the following functions -that can be used to analyze geographical data, determine spatial relationships -between geographical features, and construct or manipulate -`GEOGRAPHY`s. - -All ZetaSQL geography functions return `NULL` if any input argument -is `NULL`. +[net-link-to-ip-from-string]: #netip_from_string -### Categories +## Numbering functions -The geography functions are grouped into the following categories based on their -behavior: +ZetaSQL supports numbering functions. +Numbering functions are a subset of window functions. To create a +window function call and learn about the syntax for window functions, +see [Window function calls][window-function-calls]. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Category | Functions | Description
Constructors - ST_GEOGPOINT
- ST_MAKELINE
- ST_MAKEPOLYGON
- ST_MAKEPOLYGONORIENTED -
- Functions that build new - geography values from coordinates - or existing geographies. -
Parsers - ST_GEOGFROM
- ST_GEOGFROMGEOJSON
- ST_GEOGFROMKML
- ST_GEOGFROMTEXT
- ST_GEOGFROMWKB
- ST_GEOGPOINTFROMGEOHASH
-
- Functions that create geographies - from an external format such as - WKT and - GeoJSON. -
Formatters - ST_ASBINARY
- ST_ASGEOJSON
- ST_ASKML
- ST_ASTEXT
- ST_GEOHASH -
- Functions that export geographies - to an external format such as WKT. -
Transformations - ST_ACCUM (Aggregate)
- ST_BOUNDARY
- ST_BUFFER
- ST_BUFFERWITHTOLERANCE
- ST_CENTROID
- ST_CENTROID_AGG (Aggregate)
- ST_CLOSESTPOINT
- ST_CONVEXHULL
- ST_DIFFERENCE
- ST_EXTERIORRING
- ST_INTERIORRINGS
- ST_INTERSECTION
- ST_LINESUBSTRING
- ST_SIMPLIFY
- ST_SNAPTOGRID
- ST_UNION
- ST_UNION_AGG (Aggregate)
-
- Functions that generate a new - geography based on input. -
Accessors - ST_DIMENSION
- ST_DUMP
- ST_DUMPPOINTS
- ST_ENDPOINT
- ST_GEOMETRYTYPE
- ST_ISCLOSED
- ST_ISCOLLECTION
- ST_ISEMPTY
- ST_ISRING
- ST_NPOINTS
- ST_NUMGEOMETRIES
- ST_NUMPOINTS
- ST_POINTN
- ST_STARTPOINT
- ST_X
- ST_Y
-
- Functions that provide access to - properties of a geography without - side-effects. -
Predicates - ST_CONTAINS
- ST_COVEREDBY
- ST_COVERS
- ST_DISJOINT
- ST_DWITHIN
- ST_EQUALS
- ST_INTERSECTS
- ST_INTERSECTSBOX
- ST_TOUCHES
- ST_WITHIN
-
- Functions that return TRUE or - FALSE for some spatial - relationship between two - geographies or some property of - a geography. These functions - are commonly used in filter - clauses. -
Measures - ST_ANGLE
- ST_AREA
- ST_AZIMUTH
- ST_BOUNDINGBOX
- ST_DISTANCE
- ST_EXTENT (Aggregate)
- ST_HAUSDORFFDISTANCE
- ST_LINELOCATEPOINT
- ST_LENGTH
- ST_MAXDISTANCE
- ST_PERIMETER
-
- Functions that compute measurements - of one or more geographies. -
Clustering - ST_CLUSTERDBSCAN - - Functions that perform clustering on geographies. -
+Numbering functions assign integer values to each row based on their position +within the specified window. The `OVER` clause syntax varies across +numbering functions. ### Function list @@ -31406,3608 +29376,8097 @@ behavior: - ST_ACCUM + CUME_DIST - Aggregates GEOGRAPHY values into an array of - GEOGRAPHY elements. + Gets the cumulative distribution (relative position (0,1]) of each row + within a window. - ST_ANGLE + DENSE_RANK - Takes three point GEOGRAPHY values, which represent two - intersecting lines, and returns the angle between these lines. + Gets the dense rank (1-based, no gaps) of each row within a window. - ST_AREA + NTILE - Gets the area covered by the polygons in a GEOGRAPHY value. + Gets the quantile bucket number (1-based) of each row within a window. - ST_ASBINARY + PERCENT_RANK - Converts a GEOGRAPHY value to a - BYTES WKB geography value. + Gets the percentile rank (from 0 to 1) of each row within a window. - ST_ASGEOJSON + RANK - Converts a GEOGRAPHY value to a STRING - GeoJSON geography value. + Gets the rank (1-based) of each row within a window. - ST_ASKML + ROW_NUMBER - Converts a GEOGRAPHY value to a STRING - KML geometry value. + Gets the sequential row number (1-based) of each row within a window. - - ST_ASTEXT - - - - Converts a GEOGRAPHY value to a - STRING WKT geography value. - - + + - - ST_AZIMUTH +### `CUME_DIST` - - - Gets the azimuth of a line segment formed by two - point GEOGRAPHY values. - - +```sql +CUME_DIST() +OVER over_clause - - ST_BOUNDARY +over_clause: + { named_window | ( [ window_specification ] ) } - - - Gets the union of component boundaries in a - GEOGRAPHY value. - - +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + ORDER BY expression [ { ASC | DESC } ] [, ...] - - ST_BOUNDINGBOX +``` - - - Gets the bounding box for a GEOGRAPHY value. - - +**Description** - - ST_BUFFER +Return the relative rank of a row defined as NP/NR. 
NP is defined to be the +number of rows that either precede or are peers with the current row. NR is the +number of rows in the partition. - - - Gets the buffer around a GEOGRAPHY value, using a specific - number of segments. - - +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. - - ST_BUFFERWITHTOLERANCE + - - - Gets the buffer around a GEOGRAPHY value, using tolerance. - - +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md - - ST_CENTROID + - - - Gets the centroid of a GEOGRAPHY value. - - +**Return Type** - - ST_CLOSESTPOINT +`DOUBLE` - - - Gets the point on a GEOGRAPHY value which is closest to any - point in a second GEOGRAPHY value. - - +**Example** - - ST_CLUSTERDBSCAN +```sql +WITH finishers AS + (SELECT 'Sophia Liu' as name, + TIMESTAMP '2016-10-18 2:51:45' as finish_time, + 'F30-34' as division + UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' + UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' + UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' + UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' + UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' + UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' + UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34') +SELECT name, + finish_time, + division, + CUME_DIST() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank +FROM finishers; - - - Performs DBSCAN clustering on a group of GEOGRAPHY values and - produces a 0-based cluster number for this row. - - - - - ST_CONTAINS - - - - Checks if one GEOGRAPHY value contains another - GEOGRAPHY value. 
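As a cross-check of the NP/NR definition, the per-row arithmetic can be sketched in plain Python. `cume_dist` here is a hypothetical helper for one ordered partition, not ZetaSQL syntax; note how peer rows (ties) share the same NP.

```python
def cume_dist(values):
    # CUME_DIST() = NP / NR for each row, where NP counts the rows that
    # precede or are peers with (compare equal to) the row, and NR is
    # the number of rows in the partition.
    nr = len(values)
    ordered = sorted(values)
    return [sum(1 for v in ordered if v <= x) / nr for x in ordered]

# Four finish times in one division; the two tied rows share NP = 3.
print(cume_dist([1, 2, 2, 4]))  # [0.25, 0.75, 0.75, 1.0]
```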
- - +/*-----------------+------------------------+----------+-------------* + | name | finish_time | division | finish_rank | + +-----------------+------------------------+----------+-------------+ + | Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 0.25 | + | Meghan Lederer | 2016-10-18 09:59:01+00 | F30-34 | 0.75 | + | Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 0.75 | + | Jen Edwards | 2016-10-18 10:06:36+00 | F30-34 | 1 | + | Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 0.25 | + | Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 0.5 | + | Desiree Berry | 2016-10-18 10:05:42+00 | F35-39 | 0.75 | + | Suzy Slane | 2016-10-18 10:06:24+00 | F35-39 | 1 | + *-----------------+------------------------+----------+-------------*/ +``` - - ST_CONVEXHULL +### `DENSE_RANK` - - - Returns the convex hull for a GEOGRAPHY value. - - +```sql +DENSE_RANK() +OVER over_clause - - ST_COVEREDBY +over_clause: + { named_window | ( [ window_specification ] ) } - - - Checks if all points of a GEOGRAPHY value are on the boundary - or interior of another GEOGRAPHY value. - - +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + ORDER BY expression [ { ASC | DESC } ] [, ...] - - ST_COVERS +``` - - - Checks if all points of a GEOGRAPHY value are on the boundary - or interior of another GEOGRAPHY value. - - +**Description** - - ST_DIFFERENCE +Returns the ordinal (1-based) rank of each row within the window partition. +All peer rows receive the same rank value, and the subsequent rank value is +incremented by one. - - - Gets the point set difference between two GEOGRAPHY values. - - +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. - - ST_DIMENSION + - - - Gets the dimension of the highest-dimensional element in a - GEOGRAPHY value. 
- - +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md - - ST_DISJOINT + - - - Checks if two GEOGRAPHY values are disjoint (do not intersect). - - +**Return Type** - - ST_DISTANCE +`INT64` - - - Gets the shortest distance in meters between two GEOGRAPHY - values. - - +**Examples** - - ST_DUMP +```sql +WITH Numbers AS + (SELECT 1 as x + UNION ALL SELECT 2 + UNION ALL SELECT 2 + UNION ALL SELECT 5 + UNION ALL SELECT 8 + UNION ALL SELECT 10 + UNION ALL SELECT 10 +) +SELECT x, + DENSE_RANK() OVER (ORDER BY x ASC) AS dense_rank +FROM Numbers - - - Returns an array of simple GEOGRAPHY components in a - GEOGRAPHY value. - - +/*-------------------------* + | x | dense_rank | + +-------------------------+ + | 1 | 1 | + | 2 | 2 | + | 2 | 2 | + | 5 | 3 | + | 8 | 4 | + | 10 | 5 | + | 10 | 5 | + *-------------------------*/ +``` - - ST_DUMPPOINTS +```sql +WITH finishers AS + (SELECT 'Sophia Liu' as name, + TIMESTAMP '2016-10-18 2:51:45' as finish_time, + 'F30-34' as division + UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' + UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' + UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' + UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' + UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' + UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' + UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34') +SELECT name, + finish_time, + division, + DENSE_RANK() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank +FROM finishers; - - - Produces an array of GEOGRAPHY points with all points, line - vertices, and polygon vertices in a GEOGRAPHY value. 
- - +/*-----------------+------------------------+----------+-------------* + | name | finish_time | division | finish_rank | + +-----------------+------------------------+----------+-------------+ + | Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 1 | + | Meghan Lederer | 2016-10-18 09:59:01+00 | F30-34 | 2 | + | Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 2 | + | Jen Edwards | 2016-10-18 10:06:36+00 | F30-34 | 3 | + | Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 1 | + | Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 2 | + | Desiree Berry | 2016-10-18 10:05:42+00 | F35-39 | 3 | + | Suzy Slane | 2016-10-18 10:06:24+00 | F35-39 | 4 | + *-----------------+------------------------+----------+-------------*/ +``` - - ST_DWITHIN +### `NTILE` - - - Checks if any points in two GEOGRAPHY values are within a given - distance. - - +```sql +NTILE(constant_integer_expression) +OVER over_clause - - ST_ENDPOINT +over_clause: + { named_window | ( [ window_specification ] ) } - - - Gets the last point of a linestring GEOGRAPHY value. - - +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + ORDER BY expression [ { ASC | DESC } ] [, ...] - - ST_EQUALS +``` - - - Checks if two GEOGRAPHY values represent the same - GEOGRAPHY value. - - +**Description** - - ST_EXTENT +This function divides the rows into `constant_integer_expression` +buckets based on row ordering and returns the 1-based bucket number that is +assigned to each row. The number of rows in the buckets can differ by at most 1. +The remainder values (the remainder of number of rows divided by buckets) are +distributed one for each bucket, starting with bucket 1. If +`constant_integer_expression` evaluates to NULL, 0 or negative, an +error is provided. - - - Gets the bounding box for a group of GEOGRAPHY values. - - +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. 
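The bucket-size rule described above (sizes differ by at most one, with the remainder going to the lowest-numbered buckets first) can be sketched in Python. `ntile_buckets` is a hypothetical illustration of the assignment arithmetic, not ZetaSQL API:

```python
def ntile_buckets(num_buckets, num_rows):
    # Assign a 1-based bucket number to each of num_rows ordered rows.
    # divmod splits the rows evenly; the first `remainder` buckets each
    # take one extra row, matching the NTILE distribution rule.
    if num_buckets <= 0:
        raise ValueError("NTILE requires a positive bucket count")
    base, remainder = divmod(num_rows, num_buckets)
    out = []
    for bucket in range(1, num_buckets + 1):
        out.extend([bucket] * (base + (1 if bucket <= remainder else 0)))
    return out

# NTILE(3) over a 4-row partition: bucket sizes are 2, 1, 1.
print(ntile_buckets(3, 4))  # [1, 1, 2, 3]
```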
- - ST_EXTERIORRING + - - - Returns a linestring GEOGRAPHY value that corresponds to the - outermost ring of a polygon GEOGRAPHY value. - - +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md - - ST_GEOGFROM + - - - Converts a STRING or BYTES value - into a GEOGRAPHY value. - - +**Return Type** - - ST_GEOGFROMGEOJSON +`INT64` - - - Converts a STRING GeoJSON geometry value into a - GEOGRAPHY value. - - +**Example** - - ST_GEOGFROMKML +```sql +WITH finishers AS + (SELECT 'Sophia Liu' as name, + TIMESTAMP '2016-10-18 2:51:45' as finish_time, + 'F30-34' as division + UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' + UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' + UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' + UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' + UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' + UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' + UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34') +SELECT name, + finish_time, + division, + NTILE(3) OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank +FROM finishers; - - - Converts a STRING KML geometry value into a - GEOGRAPHY value. 
- - +/*-----------------+------------------------+----------+-------------* + | name | finish_time | division | finish_rank | + +-----------------+------------------------+----------+-------------+ + | Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 1 | + | Meghan Lederer | 2016-10-18 09:59:01+00 | F30-34 | 1 | + | Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 2 | + | Jen Edwards | 2016-10-18 10:06:36+00 | F30-34 | 3 | + | Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 1 | + | Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 1 | + | Desiree Berry | 2016-10-18 10:05:42+00 | F35-39 | 2 | + | Suzy Slane | 2016-10-18 10:06:24+00 | F35-39 | 3 | + *-----------------+------------------------+----------+-------------*/ +``` - - ST_GEOGFROMTEXT +### `PERCENT_RANK` - - - Converts a STRING WKT geometry value into a - GEOGRAPHY value. - - +```sql +PERCENT_RANK() +OVER over_clause - - ST_GEOGFROMWKB +over_clause: + { named_window | ( [ window_specification ] ) } - - - Converts a BYTES or hexadecimal-text STRING WKT - geometry value into a GEOGRAPHY value. - - +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + ORDER BY expression [ { ASC | DESC } ] [, ...] - - ST_GEOGPOINT +``` - - - Creates a point GEOGRAPHY value for a given longitude and - latitude. - - +**Description** - - ST_GEOGPOINTFROMGEOHASH +Return the percentile rank of a row defined as (RK-1)/(NR-1), where RK is +the `RANK` of the row and NR is the number of rows in the partition. +Returns 0 if NR=1. - - - Gets a point GEOGRAPHY value that is in the middle of a - bounding box defined in a STRING GeoHash value. - - +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. - - ST_GEOHASH + - - - Converts a point GEOGRAPHY value to a STRING - GeoHash value. 
- - +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md - - ST_GEOMETRYTYPE + - - - Gets the Open Geospatial Consortium (OGC) geometry type for a - GEOGRAPHY value. - - +**Return Type** - - ST_HAUSDORFFDISTANCE +`DOUBLE` - - Gets the discrete Hausdorff distance between two geometries. - +**Example** - - ST_INTERIORRINGS +```sql +WITH finishers AS + (SELECT 'Sophia Liu' as name, + TIMESTAMP '2016-10-18 2:51:45' as finish_time, + 'F30-34' as division + UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' + UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' + UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' + UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' + UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' + UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' + UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34') +SELECT name, + finish_time, + division, + PERCENT_RANK() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank +FROM finishers; - - - Gets the interior rings of a polygon GEOGRAPHY value. 
- -

+/*-----------------+------------------------+----------+---------------------*
+ | name            | finish_time            | division | finish_rank         |
+ +-----------------+------------------------+----------+---------------------+
+ | Sophia Liu      | 2016-10-18 09:51:45+00 | F30-34   | 0                   |
+ | Meghan Lederer  | 2016-10-18 09:59:01+00 | F30-34   | 0.33333333333333331 |
+ | Nikki Leith     | 2016-10-18 09:59:01+00 | F30-34   | 0.33333333333333331 |
+ | Jen Edwards     | 2016-10-18 10:06:36+00 | F30-34   | 1                   |
+ | Lisa Stelzner   | 2016-10-18 09:54:11+00 | F35-39   | 0                   |
+ | Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39   | 0.33333333333333331 |
+ | Desiree Berry   | 2016-10-18 10:05:42+00 | F35-39   | 0.66666666666666663 |
+ | Suzy Slane      | 2016-10-18 10:06:24+00 | F35-39   | 1                   |
+ *-----------------+------------------------+----------+---------------------*/
+```

- - ST_INTERSECTION

+### `RANK`

- -
- Gets the point set intersection of two GEOGRAPHY values.
- -

+```sql
+RANK()
+OVER over_clause

- - ST_INTERSECTS

+over_clause:
+  { named_window | ( [ window_specification ] ) }

- -
- Checks if at least one point appears in two GEOGRAPHY
- values.
- -

+window_specification:
+  [ named_window ]
+  [ PARTITION BY partition_expression [, ...] ]
+  ORDER BY expression [ { ASC | DESC } ] [, ...]

- - ST_INTERSECTSBOX

+```

- -
- Checks if a GEOGRAPHY value intersects a rectangle.
- -

+**Description**

- - ST_ISCLOSED

+Returns the ordinal (1-based) rank of each row within the ordered partition.
+All peer rows receive the same rank value. The next row or set of peer rows
+receives a rank value that increments by the number of peers with the previous
+rank value. This is unlike `DENSE_RANK`, which always increments by 1.

- -
- Checks if all components in a GEOGRAPHY value are closed.
- -

+To learn more about the `OVER` clause and how to use it, see
+[Window function calls][window-function-calls].

- - ST_ISCOLLECTION

+

- -
- Checks if the total number of points, linestrings, and polygons is
- greater than one in a GEOGRAPHY value.
- - +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md - - ST_ISEMPTY + - - - Checks if a GEOGRAPHY value is empty. - - +**Return Type** - - ST_ISRING +`INT64` - - - Checks if a GEOGRAPHY value is a closed, simple - linestring. - - +**Examples** - - ST_LENGTH +```sql +WITH Numbers AS + (SELECT 1 as x + UNION ALL SELECT 2 + UNION ALL SELECT 2 + UNION ALL SELECT 5 + UNION ALL SELECT 8 + UNION ALL SELECT 10 + UNION ALL SELECT 10 +) +SELECT x, + RANK() OVER (ORDER BY x ASC) AS rank +FROM Numbers - - - Gets the total length of lines in a GEOGRAPHY value. - - +/*-------------------------* + | x | rank | + +-------------------------+ + | 1 | 1 | + | 2 | 2 | + | 2 | 2 | + | 5 | 4 | + | 8 | 5 | + | 10 | 6 | + | 10 | 6 | + *-------------------------*/ +``` - - ST_LINELOCATEPOINT +```sql +WITH finishers AS + (SELECT 'Sophia Liu' as name, + TIMESTAMP '2016-10-18 2:51:45' as finish_time, + 'F30-34' as division + UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' + UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' + UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' + UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' + UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' + UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' + UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34') +SELECT name, + finish_time, + division, + RANK() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank +FROM finishers; - - - Gets a section of a linestring GEOGRAPHY value between the - start point and a point GEOGRAPHY value. 
- -

+/*-----------------+------------------------+----------+-------------*
+ | name            | finish_time            | division | finish_rank |
+ +-----------------+------------------------+----------+-------------+
+ | Sophia Liu      | 2016-10-18 09:51:45+00 | F30-34   | 1           |
+ | Meghan Lederer  | 2016-10-18 09:59:01+00 | F30-34   | 2           |
+ | Nikki Leith     | 2016-10-18 09:59:01+00 | F30-34   | 2           |
+ | Jen Edwards     | 2016-10-18 10:06:36+00 | F30-34   | 4           |
+ | Lisa Stelzner   | 2016-10-18 09:54:11+00 | F35-39   | 1           |
+ | Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39   | 2           |
+ | Desiree Berry   | 2016-10-18 10:05:42+00 | F35-39   | 3           |
+ | Suzy Slane      | 2016-10-18 10:06:24+00 | F35-39   | 4           |
+ *-----------------+------------------------+----------+-------------*/
+```

- - ST_LINESUBSTRING

+### `ROW_NUMBER`

- -
- Gets a segment of a single linestring at a specific starting and
- ending fraction.
- -

+```sql
+ROW_NUMBER()
+OVER over_clause

- - ST_MAKELINE

+over_clause:
+  { named_window | ( [ window_specification ] ) }

- -
- Creates a linestring GEOGRAPHY value by concatenating the point
- and linestring vertices of GEOGRAPHY values.
- -

+window_specification:
+  [ named_window ]
+  [ PARTITION BY partition_expression [, ...] ]
+  [ ORDER BY expression [ { ASC | DESC } ] [, ...] ]

- - ST_MAKEPOLYGON

+```

- -
- Constructs a polygon GEOGRAPHY value by combining
- a polygon shell with polygon holes.
- -

+**Description**

- - ST_MAKEPOLYGONORIENTED

+Does not require the `ORDER BY` clause. Returns the sequential
+row ordinal (1-based) of each row within each ordered partition. If the
+`ORDER BY` clause is unspecified, the result is
+non-deterministic.

- -
- Constructs a polygon GEOGRAPHY value, using an array of
- linestring GEOGRAPHY values. The vertex ordering of each
- linestring determines the orientation of each polygon ring.
- -

+To learn more about the `OVER` clause and how to use it, see
+[Window function calls][window-function-calls].
- - ST_MAXDISTANCE + - - - Gets the longest distance between two non-empty - GEOGRAPHY values. - - +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md - - ST_NPOINTS + - - - An alias of ST_NUMPOINTS. - - +**Return Type** - - ST_NUMGEOMETRIES +`INT64` - - - Gets the number of geometries in a GEOGRAPHY value. - - +**Examples** - - ST_NUMPOINTS +```sql +WITH Numbers AS + (SELECT 1 as x + UNION ALL SELECT 2 + UNION ALL SELECT 2 + UNION ALL SELECT 5 + UNION ALL SELECT 8 + UNION ALL SELECT 10 + UNION ALL SELECT 10 +) +SELECT x, + ROW_NUMBER() OVER (ORDER BY x) AS row_num +FROM Numbers - - - Gets the number of vertices in the a GEOGRAPHY value. - - +/*-------------------------* + | x | row_num | + +-------------------------+ + | 1 | 1 | + | 2 | 2 | + | 2 | 3 | + | 5 | 4 | + | 8 | 5 | + | 10 | 6 | + | 10 | 7 | + *-------------------------*/ +``` - - ST_PERIMETER +```sql +WITH finishers AS + (SELECT 'Sophia Liu' as name, + TIMESTAMP '2016-10-18 2:51:45' as finish_time, + 'F30-34' as division + UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39' + UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34' + UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39' + UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39' + UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39' + UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34' + UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34') +SELECT name, + finish_time, + division, + ROW_NUMBER() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank +FROM finishers; - - - Gets the length of the boundary of the polygons in a - GEOGRAPHY value. 
- - +/*-----------------+------------------------+----------+-------------* + | name | finish_time | division | finish_rank | + +-----------------+------------------------+----------+-------------+ + | Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 1 | + | Meghan Lederer | 2016-10-18 09:59:01+00 | F30-34 | 2 | + | Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 3 | + | Jen Edwards | 2016-10-18 10:06:36+00 | F30-34 | 4 | + | Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 1 | + | Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 2 | + | Desiree Berry | 2016-10-18 10:05:42+00 | F35-39 | 3 | + | Suzy Slane | 2016-10-18 10:06:24+00 | F35-39 | 4 | + *-----------------+------------------------+----------+-------------*/ +``` - - ST_POINTN + - - - Gets the point at a specific index of a linestring GEOGRAPHY - value. - - +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md - - ST_SIMPLIFY + - - - Converts a GEOGRAPHY value into a simplified - GEOGRAPHY value, using tolerance. - - +## Protocol buffer functions + +ZetaSQL supports the following protocol buffer functions. + +### Function list + + + + + + + + + - - - - - - - -
NameSummary
ST_SNAPTOGRID + CONTAINS_KEY - Produces a GEOGRAPHY value, where each vertex has - been snapped to a longitude/latitude grid. + Checks if a protocol buffer map field contains a given key.
ST_STARTPOINT + EXTRACT - Gets the first point of a linestring GEOGRAPHY value. + Extracts a value or metadata from a protocol buffer.
ST_TOUCHES
+ FILTER_FIELDS

- Checks if two GEOGRAPHY values intersect and their interiors
- have no elements in common.
+ Removes unwanted fields from a protocol buffer.
ST_UNION
+ FROM_PROTO

- Gets the point set union of multiple GEOGRAPHY values.
+ Converts a protocol buffer value into a ZetaSQL value.
ST_UNION_AGG + MODIFY_MAP - Aggregates over GEOGRAPHY values and gets their - point set union. + Modifies a protocol buffer map field.
ST_WITHIN + PROTO_DEFAULT_IF_NULL - Checks if one GEOGRAPHY value contains another - GEOGRAPHY value. + Produces the default protocol buffer field value if the + protocol buffer field is NULL. Otherwise, returns the + protocol buffer field value.
ST_X + REPLACE_FIELDS - Gets the longitude from a point GEOGRAPHY value. + Replaces the values in one or more protocol buffer fields.
ST_Y + TO_PROTO - Gets the latitude from a point GEOGRAPHY value. + Converts a ZetaSQL value into a protocol buffer value.
-### `ST_ACCUM`
+### `CONTAINS_KEY`

```sql
-ST_ACCUM(geography)
+CONTAINS_KEY(proto_map_field_expression, key)
```

**Description**

-Takes a `GEOGRAPHY` and returns an array of
-`GEOGRAPHY` elements.
-This function is identical to [ARRAY_AGG][geography-link-array-agg],
-but only applies to `GEOGRAPHY` objects.
-
-**Return type**
-
-`ARRAY`
+Returns whether a [protocol buffer map field][proto-map] contains a given key.

-[geography-link-array-agg]: #array_agg
+Input values:

-### `ST_ANGLE`
++ `proto_map_field_expression`: A protocol buffer map field.
++ `key`: A key in the protocol buffer map field.

-```sql
-ST_ANGLE(point_geography_1, point_geography_2, point_geography_3)
-```
+`NULL` handling:

-**Description**
++ If `proto_map_field_expression` is `NULL`, returns `NULL`.
++ If `key` is `NULL`, returns `FALSE`.

-Takes three point `GEOGRAPHY` values, which represent two intersecting lines.
-Returns the angle between these lines. Point 2 and point 1 represent the first
-line and point 2 and point 3 represent the second line. The angle between
-these lines is in radians, in the range `[0, 2pi)`. The angle is measured
-clockwise from the first line to the second line.
+**Return type**

-`ST_ANGLE` has the following edge cases:
+`BOOL`

-+ If points 2 and 3 are the same, returns `NULL`.
-+ If points 2 and 1 are the same, returns `NULL`.
-+ If points 2 and 3 are exactly antipodal, returns `NULL`.
-+ If points 2 and 1 are exactly antipodal, returns `NULL`.
-+ If any of the input geographies are not single points or are the empty
-  geography, then throws an error.
+**Examples**

-**Return type**
+To illustrate the use of this function, consider the protocol buffer message
+`Item`:

-`DOUBLE`
+```proto
+message Item {
+  optional map<string, int64> purchased = 1;
+};
+```

-**Example**
+In the following example, the function returns `TRUE` when the key is
+present, `FALSE` otherwise.
```sql -WITH geos AS ( - SELECT 1 id, ST_GEOGPOINT(1, 0) geo1, ST_GEOGPOINT(0, 0) geo2, ST_GEOGPOINT(0, 1) geo3 UNION ALL - SELECT 2 id, ST_GEOGPOINT(0, 0), ST_GEOGPOINT(1, 0), ST_GEOGPOINT(0, 1) UNION ALL - SELECT 3 id, ST_GEOGPOINT(1, 0), ST_GEOGPOINT(0, 0), ST_GEOGPOINT(1, 0) UNION ALL - SELECT 4 id, ST_GEOGPOINT(1, 0) geo1, ST_GEOGPOINT(0, 0) geo2, ST_GEOGPOINT(0, 0) geo3 UNION ALL - SELECT 5 id, ST_GEOGPOINT(0, 0), ST_GEOGPOINT(-30, 0), ST_GEOGPOINT(150, 0) UNION ALL - SELECT 6 id, ST_GEOGPOINT(0, 0), NULL, NULL UNION ALL - SELECT 7 id, NULL, ST_GEOGPOINT(0, 0), NULL UNION ALL - SELECT 8 id, NULL, NULL, ST_GEOGPOINT(0, 0)) -SELECT ST_ANGLE(geo1,geo2,geo3) AS angle FROM geos ORDER BY id; +SELECT + CONTAINS_KEY(m.purchased, 'A') AS contains_a, + CONTAINS_KEY(m.purchased, 'B') AS contains_b +FROM + (SELECT AS VALUE CAST("purchased { key: 'A' value: 2 }" AS Item)) AS m; -/*---------------------* - | angle | - +---------------------+ - | 4.71238898038469 | - | 0.78547432161873854 | - | 0 | - | NULL | - | NULL | - | NULL | - | NULL | - | NULL | - *---------------------*/ +/*------------+------------* + | contains_a | contains_b | + +------------+------------+ + | TRUE | FALSE | + *------------+------------*/ ``` -### `ST_AREA` +[proto-map]: https://developers.google.com/protocol-buffers/docs/proto3#maps + +### `EXTRACT` + ```sql -ST_AREA(geography_expression[, use_spheroid]) +EXTRACT( extraction_type (proto_field) FROM proto_expression ) + +extraction_type: + { FIELD | RAW | HAS | ONEOF_CASE } ``` **Description** -Returns the area in square meters covered by the polygons in the input -`GEOGRAPHY`. +Extracts a value from a protocol buffer. `proto_expression` represents the +expression that returns a protocol buffer, `proto_field` represents the field +of the protocol buffer to extract from, and `extraction_type` determines the +type of data to return. `EXTRACT` can be used to get values of ambiguous fields. 
+An alternative to `EXTRACT` is the [dot operator][querying-protocol-buffers].

-If `geography_expression` is a point or a line, returns zero. If
-`geography_expression` is a collection, returns the area of the polygons in the
-collection; if the collection does not contain polygons, returns zero.
+**Extraction Types**

-The optional `use_spheroid` parameter determines how this function measures
-distance. If `use_spheroid` is `FALSE`, the function measures distance on the
-surface of a perfect sphere.
+You can choose the type of information to get with `EXTRACT`. Your choices are:

-The `use_spheroid` parameter currently only supports
-the value `FALSE`. The default value of `use_spheroid` is `FALSE`.
++ `FIELD`: Extract a value from a protocol buffer field.
++ `RAW`: Extract an uninterpreted value from a
+  protocol buffer field. Raw values
+  ignore any ZetaSQL type annotations.
++ `HAS`: Returns `TRUE` if a protocol buffer field is set in a proto message;
+  otherwise, `FALSE`. Returns an error if this is used with a scalar proto3
+  field. Alternatively, use [`has_x`][has-value] to perform this task.
++ `ONEOF_CASE`: Returns the name of the set protocol buffer field in a Oneof.
+  If no field is set, returns an empty string.

-**Return type**
+**Return Type**

-`DOUBLE`
+The return type depends upon the extraction type in the query.

-[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System
++ `FIELD`: Protocol buffer field type.
++ `RAW`: Protocol buffer field
+  type. Format annotations are
+  ignored.
++ `HAS`: `BOOL`
++ `ONEOF_CASE`: `STRING`

-### `ST_ASBINARY`
+**Examples**
+
+The examples in this section reference two protocol buffers called `Album` and
+`Chart`, and one table called `AlbumList`.
+ +```proto +message Album { + optional string album_name = 1; + repeated string song = 2; + oneof group_name { + string solo = 3; + string duet = 4; + string band = 5; + } +} +``` + +```proto +message Chart { + optional int64 date = 1 [(zetasql.format) = DATE]; + optional string chart_name = 2; + optional int64 rank = 3; +} +``` ```sql -ST_ASBINARY(geography_expression) +WITH AlbumList AS ( + SELECT + NEW Album( + 'Alana Yah' AS solo, + 'New Moon' AS album_name, + ['Sandstorm','Wait'] AS song) AS album_col, + NEW Chart( + 'Billboard' AS chart_name, + '2016-04-23' AS date, + 1 AS rank) AS chart_col + UNION ALL + SELECT + NEW Album( + 'The Roadlands' AS band, + 'Grit' AS album_name, + ['The Way', 'Awake', 'Lost Things'] AS song) AS album_col, + NEW Chart( + 'Billboard' AS chart_name, + 1 as rank) AS chart_col +) +SELECT * FROM AlbumList ``` -**Description** +The following example extracts the album names from a table called `AlbumList` +that contains a proto-typed column called `Album`. -Returns the [WKB][wkb-link] representation of an input -`GEOGRAPHY`. +```sql +SELECT EXTRACT(FIELD(album_name) FROM album_col) AS name_of_album +FROM AlbumList -See [`ST_GEOGFROMWKB`][st-geogfromwkb] to construct a -`GEOGRAPHY` from WKB. +/*------------------* + | name_of_album | + +------------------+ + | New Moon | + | Grit | + *------------------*/ +``` -**Return type** +A table called `AlbumList` contains a proto-typed column called `Album`. +`Album` contains a field called `date`, which can store an integer. The +`date` field has an annotated format called `DATE` assigned to it, which means +that when you extract the value in this field, it returns a `DATE`, not an +`INT64`. -`BYTES` +If you would like to return the value for `date` as an `INT64`, not +as a `DATE`, use the `RAW` extraction type in your query. 
For example:

-[wkb-link]: https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
+```sql
+SELECT
+  EXTRACT(RAW(date) FROM chart_col) AS raw_date,
+  EXTRACT(FIELD(date) FROM chart_col) AS formatted_date
+FROM AlbumList

-[st-geogfromwkb]: #st_geogfromwkb
+/*----------+----------------*
+ | raw_date | formatted_date |
+ +----------+----------------+
+ | 16914    | 2016-04-23     |
+ | 0        | 1970-01-01     |
+ *----------+----------------*/
+```

-### `ST_ASGEOJSON`
+The following example checks to see if release dates exist in a table called
+`AlbumList` that contains a protocol buffer called `Chart`.

```sql
-ST_ASGEOJSON(geography_expression)
+SELECT EXTRACT(HAS(date) FROM chart_col) AS has_release_date
+FROM AlbumList
+
+/*------------------*
+ | has_release_date |
+ +------------------+
+ | TRUE             |
+ | FALSE            |
+ *------------------*/
```

**Description**

+The following example extracts the group name that is assigned to an artist in
+a table called `AlbumList`. The group name is set for exactly one
+protocol buffer field inside of the `group_name` Oneof. The `group_name` Oneof
+exists inside the `Album` protocol buffer.

-Returns the [RFC 7946][GeoJSON-spec-link] compliant [GeoJSON][geojson-link]
-representation of the input `GEOGRAPHY`.
+```sql
+SELECT EXTRACT(ONEOF_CASE(group_name) FROM album_col) AS artist_type
+FROM AlbumList;

-A ZetaSQL `GEOGRAPHY` has spherical
-geodesic edges, whereas a GeoJSON `Geometry` object explicitly has planar edges.
-To convert between these two types of edges, ZetaSQL adds additional
-points to the line where necessary so that the resulting sequence of edges
-remains within 10 meters of the original edge.
+/*-------------*
+ | artist_type |
+ +-------------+
+ | solo        |
+ | band        |
+ *-------------*/
+```

-See [`ST_GEOGFROMGEOJSON`][st-geogfromgeojson] to construct a
-`GEOGRAPHY` from GeoJSON.
+[querying-protocol-buffers]: https://github.com/google/zetasql/blob/master/docs/protocol-buffers.md#querying_protocol_buffers -**Return type** - -`STRING` - -[geojson-spec-link]: https://tools.ietf.org/html/rfc7946 +[has-value]: https://github.com/google/zetasql/blob/master/docs/protocol-buffers.md#checking_if_a_field_has_a_value -[geojson-link]: https://en.wikipedia.org/wiki/GeoJSON +### `FILTER_FIELDS` -[st-geogfromgeojson]: #st_geogfromgeojson +```sql +FILTER_FIELDS(proto_expression, proto_field_list [, reset_fields_named_arg]) -### `ST_ASKML` +proto_field_list: + {+|-}proto_field_path[, ...] -```sql -ST_ASKML(geography) +reset_fields_named_arg: + RESET_CLEARED_REQUIRED_FIELDS => { TRUE | FALSE } ``` **Description** -Takes a `GEOGRAPHY` and returns a `STRING` [KML geometry][kml-geometry-link]. -Coordinates are formatted with as few digits as possible without loss -of precision. - -**Return type** +Takes a protocol buffer and a list of its fields to include or exclude. +Returns a version of that protocol buffer with unwanted fields removed. +Returns `NULL` if the protocol buffer is `NULL`. -`STRING` +Input values: -[kml-geometry-link]: https://developers.google.com/kml/documentation/kmlreference#geometry ++ `proto_expression`: The protocol buffer to filter. ++ `proto_field_list`: The fields to exclude or include in the resulting + protocol buffer. ++ `+`: Include a protocol buffer field and its children in the results. ++ `-`: Exclude a protocol buffer field and its children in the results. ++ `proto_field_path`: The protocol buffer field to include or exclude. + If the field represents an [extension][querying-proto-extensions], you can use + syntax for that extension in the path. ++ `reset_fields_named_arg`: You can optionally add the + `RESET_CLEARED_REQUIRED_FIELDS` named argument. + If not explicitly set, `FALSE` is used implicitly. + If `FALSE`, you must include all protocol buffer `required` fields in the + `FILTER_FIELDS` function. 
If `TRUE`, you do not need to include all required
+  protocol buffer fields and the value of required fields
+  defaults to these values:

-### `ST_ASTEXT`
+  Type                    | Default value
+  ----------------------- | --------
+  Floating point          | `0.0`
+  Integer                 | `0`
+  Boolean                 | `FALSE`
+  String, byte            | `""`
+  Protocol buffer message | Empty message

-```sql
-ST_ASTEXT(geography_expression)
-```
+Protocol buffer field expression behavior:

-**Description**
++ The first field in `proto_field_list` determines the default
+  inclusion/exclusion. By default, when you include the first field, all other
+  fields are excluded. Conversely, when you exclude the first field, all
+  other fields are included.
++ A required field in the protocol buffer cannot be excluded explicitly or
+  implicitly, unless you have the
+  `RESET_CLEARED_REQUIRED_FIELDS` named argument set to `TRUE`.
++ If a field is included, its child fields and descendants are implicitly
+  included in the results.
++ If a field is excluded, its child fields and descendants are
+  implicitly excluded in the results.
++ A child field must be listed after its parent field in the argument list,
+  but does not need to come right after the parent field.

-Returns the [WKT][wkt-link] representation of an input
-`GEOGRAPHY`.
+Caveats:

-See [`ST_GEOGFROMTEXT`][st-geogfromtext] to construct a
-`GEOGRAPHY` from WKT.
++ If you attempt to explicitly include or exclude a field that has already
+  been implicitly included or excluded, an error is produced.

**Return type**

-`STRING`
+Type of `proto_expression`

-[wkt-link]: https://en.wikipedia.org/wiki/Well-known_text
+**Examples**

-[st-geogfromtext]: #st_geogfromtext
+The examples in this section reference a protocol buffer called `Award` and
+a table called `MusicAwards`.
-### `ST_AZIMUTH` +```proto +message Award { + required int32 year = 1; + optional int32 month = 2; + repeated Type type = 3; + + message Type { + optional string award_name = 1; + optional string category = 2; + } +} +``` ```sql -ST_AZIMUTH(point_geography_1, point_geography_2) +WITH + MusicAwards AS ( + SELECT + CAST( + ''' + year: 2001 + month: 9 + type { award_name: 'Best Artist' category: 'Artist' } + type { award_name: 'Best Album' category: 'Album' } + ''' + AS zetasql.examples.music.Award) AS award_col + UNION ALL + SELECT + CAST( + ''' + year: 2001 + month: 12 + type { award_name: 'Best Song' category: 'Song' } + ''' + AS zetasql.examples.music.Award) AS award_col + ) +SELECT * +FROM MusicAwards + +/*---------------------------------------------------------* + | award_col | + +---------------------------------------------------------+ + | { | + | year: 2001 | + | month: 9 | + | type { award_name: "Best Artist" category: "Artist" } | + | type { award_name: "Best Album" category: "Album" } | + | } | + | { | + | year: 2001 | + | month: 12 | + | type { award_name: "Best Song" category: "Song" } | + | } | + *---------------------------------------------------------*/ ``` -**Description** +The following example returns protocol buffers that only include the `year` +field. -Takes two point `GEOGRAPHY` values, and returns the azimuth of the line segment -formed by points 1 and 2. The azimuth is the angle in radians measured between -the line from point 1 facing true North to the line segment from point 1 to -point 2. +```sql +SELECT FILTER_FIELDS(award_col, +year) AS filtered_fields +FROM MusicAwards -The positive angle is measured clockwise on the surface of a sphere. 
For -example, the azimuth for a line segment: +/*-----------------* + | filtered_fields | + +-----------------+ + | {year: 2001} | + | {year: 2001} | + *-----------------*/ +``` -+ Pointing North is `0` -+ Pointing East is `PI/2` -+ Pointing South is `PI` -+ Pointing West is `3PI/2` +The following example returns protocol buffers that include all but the `type` +field. -`ST_AZIMUTH` has the following edge cases: +```sql +SELECT FILTER_FIELDS(award_col, -type) AS filtered_fields +FROM MusicAwards -+ If the two input points are the same, returns `NULL`. -+ If the two input points are exactly antipodal, returns `NULL`. -+ If either of the input geographies are not single points or are the empty - geography, throws an error. +/*------------------------* + | filtered_fields | + +------------------------+ + | {year: 2001 month: 9} | + | {year: 2001 month: 12} | + *------------------------*/ +``` -**Return type** +The following example returns protocol buffers that only include the `year` and +`type.award_name` fields. -`DOUBLE` +```sql +SELECT FILTER_FIELDS(award_col, +year, +type.award_name) AS filtered_fields +FROM MusicAwards -**Example** +/*--------------------------------------* + | filtered_fields | + +--------------------------------------+ + | { | + | year: 2001 | + | type { award_name: "Best Artist" } | + | type { award_name: "Best Album" } | + | } | + | { | + | year: 2001 | + | type { award_name: "Best Song" } | + | } | + *--------------------------------------*/ +``` + +The following example returns the `year` and `type` fields, but excludes the +`award_name` field in the `type` field. 
```sql -WITH geos AS ( - SELECT 1 id, ST_GEOGPOINT(1, 0) AS geo1, ST_GEOGPOINT(0, 0) AS geo2 UNION ALL - SELECT 2, ST_GEOGPOINT(0, 0), ST_GEOGPOINT(1, 0) UNION ALL - SELECT 3, ST_GEOGPOINT(0, 0), ST_GEOGPOINT(0, 1) UNION ALL - -- identical - SELECT 4, ST_GEOGPOINT(0, 0), ST_GEOGPOINT(0, 0) UNION ALL - -- antipode - SELECT 5, ST_GEOGPOINT(-30, 0), ST_GEOGPOINT(150, 0) UNION ALL - -- nulls - SELECT 6, ST_GEOGPOINT(0, 0), NULL UNION ALL - SELECT 7, NULL, ST_GEOGPOINT(0, 0)) -SELECT ST_AZIMUTH(geo1, geo2) AS azimuth FROM geos ORDER BY id; +SELECT FILTER_FIELDS(award_col, +year, +type, -type.award_name) AS filtered_fields +FROM MusicAwards -/*--------------------* - | azimuth | - +--------------------+ - | 4.71238898038469 | - | 1.5707963267948966 | - | 0 | - | NULL | - | NULL | - | NULL | - | NULL | - *--------------------*/ +/*---------------------------------* + | filtered_fields | + +---------------------------------+ + | { | + | year: 2001 | + | type { category: "Artist" } | + | type { category: "Album" } | + | } | + | { | + | year: 2001 | + | type { category: "Song" } | + | } | + *---------------------------------*/ ``` -### `ST_BOUNDARY` +The following example produces an error because `year` is a required field +and cannot be excluded explicitly or implicitly from the results. ```sql -ST_BOUNDARY(geography_expression) +SELECT FILTER_FIELDS(award_col, -year) AS filtered_fields +FROM MusicAwards + +-- Error ``` -**Description** +The following example produces an error because when `year` was included, +`month` was implicitly excluded. You cannot explicitly exclude a field that +has already been implicitly excluded. -Returns a single `GEOGRAPHY` that contains the union -of the boundaries of each component in the given input -`GEOGRAPHY`. +```sql +SELECT FILTER_FIELDS(award_col, +year, -month) AS filtered_fields +FROM MusicAwards -The boundary of each component of a `GEOGRAPHY` is -defined as follows: +-- Error +``` -+ The boundary of a point is empty. 
-+ The boundary of a linestring consists of the endpoints of the linestring. -+ The boundary of a polygon consists of the linestrings that form the polygon - shell and each of the polygon's holes. +When `RESET_CLEARED_REQUIRED_FIELDS` is set as `TRUE`, `FILTER_FIELDS` doesn't +need to include required fields. In the example below, `MusicAwards` has a +required field called `year`, but this is not added as an argument for +`FILTER_FIELDS`. `year` is added to the results with its default value, `0`. -**Return type** +```sql +SELECT FILTER_FIELDS( + award_col, + +month, + RESET_CLEARED_REQUIRED_FIELDS => TRUE) AS filtered_fields +FROM MusicAwards; -`GEOGRAPHY` +/*---------------------------------* + | filtered_fields | + +---------------------------------+ + | { | + | year: 0, | + | month: 9 | + | } | + | { | + | year: 0, | + | month: 12 | + | } | + *---------------------------------*/ +``` -### `ST_BOUNDINGBOX` +[querying-proto-extensions]: https://github.com/google/zetasql/blob/master/docs/protocol-buffers.md#extensions + +### `FROM_PROTO` ```sql -ST_BOUNDINGBOX(geography_expression) +FROM_PROTO(expression) ``` **Description** -Returns a `STRUCT` that represents the bounding box for the specified geography. -The bounding box is the minimal rectangle that encloses the geography. The edges -of the rectangle follow constant lines of longitude and latitude. - -Caveats: +Returns a ZetaSQL value. The valid `expression` types are defined +in the table below, along with the return types that they produce. +Other input `expression` types are invalid. If `expression` cannot be converted +to a valid value, an error is returned. -+ Returns `NULL` if the input is `NULL` or an empty geography. -+ The bounding box might cross the antimeridian if this allows for a smaller - rectangle. In this case, the bounding box has one of its longitudinal bounds - outside of the [-180, 180] range, so that `xmin` is smaller than the eastmost - value `xmax`. 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| `expression` type | Return type |
| ----------------- | ----------- |
| `INT32`, `google.protobuf.Int32Value` | `INT32` |
| `UINT32`, `google.protobuf.UInt32Value` | `UINT32` |
| `INT64`, `google.protobuf.Int64Value` | `INT64` |
| `UINT64`, `google.protobuf.UInt64Value` | `UINT64` |
| `FLOAT`, `google.protobuf.FloatValue` | `FLOAT` |
| `DOUBLE`, `google.protobuf.DoubleValue` | `DOUBLE` |
| `BOOL`, `google.protobuf.BoolValue` | `BOOL` |
| `STRING`, `google.protobuf.StringValue` (the `StringValue` value field must be UTF-8 encoded) | `STRING` |
| `BYTES`, `google.protobuf.BytesValue` | `BYTES` |
| `DATE`, `google.type.Date` | `DATE` |
| `TIME`, `google.type.TimeOfDay` | `TIME` |
| `TIMESTAMP`, `google.protobuf.Timestamp` | `TIMESTAMP` |

-**Return type**
-
-`STRUCT`.
-
-Bounding box parts:
-
-+ `xmin`: The westmost constant longitude line that bounds the rectangle.
-+ `xmax`: The eastmost constant longitude line that bounds the rectangle.
-+ `ymin`: The minimum constant latitude line that bounds the rectangle.
-+ `ymax`: The maximum constant latitude line that bounds the rectangle.
-
-**Example**
-
-```sql
-WITH data AS (
-  SELECT 1 id, ST_GEOGFROMTEXT('POLYGON((-125 48, -124 46, -117 46, -117 49, -125 48))') g
-  UNION ALL
-  SELECT 2 id, ST_GEOGFROMTEXT('POLYGON((172 53, -130 55, -141 70, 172 53))') g
-  UNION ALL
-  SELECT 3 id, ST_GEOGFROMTEXT('POINT EMPTY') g
-  UNION ALL
-  SELECT 4 id, ST_GEOGFROMTEXT('POLYGON((172 53, -141 70, -130 55, 172 53))', oriented => TRUE)
-)
-SELECT id, ST_BOUNDINGBOX(g) AS box
-FROM data
-/*----+------------------------------------------* - | id | box | - +----+------------------------------------------+ - | 1 | {xmin:-125, ymin:46, xmax:-117, ymax:49} | - | 2 | {xmin:172, ymin:53, xmax:230, ymax:70} | - | 3 | NULL | - | 4 | {xmin:-180, ymin:-90, xmax:180, ymax:90} | - *----+------------------------------------------*/ -``` +**Return Type** -See [`ST_EXTENT`][st-extent] for the aggregate version of `ST_BOUNDINGBOX`. +The return type depends upon the `expression` type. See the return types +in the table above. -[st-extent]: #st_extent +**Examples** -### `ST_BUFFER` +Convert a `google.type.Date` type into a `DATE` type. ```sql -ST_BUFFER( - geography, - buffer_radius - [, num_seg_quarter_circle => num_segments] - [, use_spheroid => boolean_expression] - [, endcap => endcap_style] - [, side => line_side]) +SELECT FROM_PROTO( + new google.type.Date( + 2019 as year, + 10 as month, + 30 as day + ) +) + +/*------------* + | $col1 | + +------------+ + | 2019-10-30 | + *------------*/ ``` -**Description** +Pass in and return a `DATE` type. -Returns a `GEOGRAPHY` that represents the buffer around the input `GEOGRAPHY`. -This function is similar to [`ST_BUFFERWITHTOLERANCE`][st-bufferwithtolerance], -but you specify the number of segments instead of providing tolerance to -determine how much the resulting geography can deviate from the ideal -buffer radius. +```sql +SELECT FROM_PROTO(DATE '2019-10-30') -+ `geography`: The input `GEOGRAPHY` to encircle with the buffer radius. -+ `buffer_radius`: `DOUBLE` that represents the radius of the buffer - around the input geography. The radius is in meters. Note that polygons - contract when buffered with a negative `buffer_radius`. Polygon shells and - holes that are contracted to a point are discarded. -+ `num_seg_quarter_circle`: (Optional) `DOUBLE` specifies the number of - segments that are used to approximate a quarter circle. The default value is - `8.0`. Naming this argument is optional. 
-+ `endcap`: (Optional) `STRING` allows you to specify one of two endcap - styles: `ROUND` and `FLAT`. The default value is `ROUND`. This option only - affects the endcaps of buffered linestrings. -+ `side`: (Optional) `STRING` allows you to specify one of three possibilities - for lines: `BOTH`, `LEFT`, and `RIGHT`. The default is `BOTH`. This option - only affects how linestrings are buffered. -+ `use_spheroid`: (Optional) `BOOL` determines how this function measures - distance. If `use_spheroid` is `FALSE`, the function measures distance on - the surface of a perfect sphere. The `use_spheroid` parameter - currently only supports the value `FALSE`. The default value of - `use_spheroid` is `FALSE`. - -**Return type** - -Polygon `GEOGRAPHY` - -**Example** +/*------------* + | $col1 | + +------------+ + | 2019-10-30 | + *------------*/ +``` -The following example shows the result of `ST_BUFFER` on a point. A buffered -point is an approximated circle. When `num_seg_quarter_circle = 2`, there are -two line segments in a quarter circle, and therefore the buffered circle has -eight sides and [`ST_NUMPOINTS`][st-numpoints] returns nine vertices. When -`num_seg_quarter_circle = 8`, there are eight line segments in a quarter circle, -and therefore the buffered circle has thirty-two sides and -[`ST_NUMPOINTS`][st-numpoints] returns thirty-three vertices. 
+### `MODIFY_MAP` ```sql -SELECT - -- num_seg_quarter_circle=2 - ST_NUMPOINTS(ST_BUFFER(ST_GEOGFROMTEXT('POINT(1 2)'), 50, 2)) AS eight_sides, - -- num_seg_quarter_circle=8, since 8 is the default - ST_NUMPOINTS(ST_BUFFER(ST_GEOGFROMTEXT('POINT(100 2)'), 50)) AS thirty_two_sides; +MODIFY_MAP(proto_map_field_expression, key_value_pair[, ...]) -/*-------------+------------------* - | eight_sides | thirty_two_sides | - +-------------+------------------+ - | 9 | 33 | - *-------------+------------------*/ +key_value_pair: + key, value ``` -[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System +**Description** -[st-bufferwithtolerance]: #st_bufferwithtolerance +Modifies a [protocol buffer map field][proto-map] and returns the modified map +field. -[st-numpoints]: #st_numpoints +Input values: -### `ST_BUFFERWITHTOLERANCE` ++ `proto_map_field_expression`: A protocol buffer map field. ++ `key_value_pair`: A key-value pair in the protocol buffer map field. -```sql -ST_BUFFERWITHTOLERANCE( - geography, - buffer_radius, - tolerance_meters => tolerance - [, use_spheroid => boolean_expression] - [, endcap => endcap_style] - [, side => line_side]) -``` +Modification behavior: -Returns a `GEOGRAPHY` that represents the buffer around the input `GEOGRAPHY`. -This function is similar to [`ST_BUFFER`][st-buffer], -but you provide tolerance instead of segments to determine how much the -resulting geography can deviate from the ideal buffer radius. ++ If the key is not already in the map field, adds the key and its value to the + map field. ++ If the key is already in the map field, replaces its value. ++ If the key is in the map field and the value is `NULL`, removes the key and + its value from the map field. -+ `geography`: The input `GEOGRAPHY` to encircle with the buffer radius. -+ `buffer_radius`: `DOUBLE` that represents the radius of the buffer - around the input geography. The radius is in meters. 
Note that polygons
-  contract when buffered with a negative `buffer_radius`. Polygon shells
-  and holes that are contracted to a point are discarded.
-+ `tolerance_meters`: `DOUBLE` specifies a tolerance in meters with
-  which the shape is approximated. Tolerance determines how much a polygon can
-  deviate from the ideal radius. Naming this argument is optional.
-+ `endcap`: (Optional) `STRING` allows you to specify one of two endcap
-  styles: `ROUND` and `FLAT`. The default value is `ROUND`. This option only
-  affects the endcaps of buffered linestrings.
-+ `side`: (Optional) `STRING` allows you to specify one of three possible line
-  styles: `BOTH`, `LEFT`, and `RIGHT`. The default is `BOTH`. This option only
-  affects the endcaps of buffered linestrings.
-+ `use_spheroid`: (Optional) `BOOL` determines how this function measures
-  distance. If `use_spheroid` is `FALSE`, the function measures distance on
-  the surface of a perfect sphere. The `use_spheroid` parameter
-  currently only supports the value `FALSE`. The default value of
-  `use_spheroid` is `FALSE`.

+`NULL` handling:
-**Return type**
++ If `key` is `NULL`, produces an error.
++ If the same `key` appears more than once, produces an error.
++ If `map` is `NULL`, `map` is treated as empty.
-Polygon `GEOGRAPHY`

+**Return type**
-**Example**

+In the input protocol buffer map field, `V` as represented in `map<K, V>`.
-The following example shows the results of `ST_BUFFERWITHTOLERANCE` on a point,
-given two different values for tolerance but with the same buffer radius of
-`100`. A buffered point is an approximated circle. When `tolerance_meters=25`,
-the tolerance is a large percentage of the buffer radius, and therefore only
-five segments are used to approximate a circle around the input point. When
-`tolerance_meters=1`, the tolerance is a much smaller percentage of the buffer
-radius, and therefore twenty-four edges are used to approximate a circle around
-the input point.
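The add/replace/remove and `NULL`-handling rules above can be modeled with an
ordinary dictionary. The following Python sketch is illustrative only
(`modify_map` is a hypothetical helper, not a ZetaSQL API):

```python
def modify_map(proto_map, *pairs):
    """Illustrative model of MODIFY_MAP's add/replace/remove rules."""
    keys, values = pairs[0::2], pairs[1::2]
    if None in keys:
        raise ValueError("a NULL key produces an error")
    if len(set(keys)) != len(keys):
        raise ValueError("a repeated key produces an error")
    result = dict(proto_map or {})  # a NULL map is treated as empty
    for key, value in zip(keys, values):
        if value is None:
            result.pop(key, None)  # a NULL value removes the key if present
        else:
            result[key] = value    # add a new key, or replace an existing one
    return result

# Mirrors the SQL example that follows: delete 'A', replace 'B', add 'C'.
print(modify_map({"A": 2, "B": 3}, "A", None, "B", 4, "C", 6))
# {'B': 4, 'C': 6}
```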
+**Examples**

-```sql
-SELECT
-  -- tolerance_meters=25, or 25% of the buffer radius.
-  ST_NumPoints(ST_BUFFERWITHTOLERANCE(ST_GEOGFROMTEXT('POINT(1 2)'), 100, 25)) AS five_sides,
-  -- tolerance_meters=1, or 1% of the buffer radius.
-  st_NumPoints(ST_BUFFERWITHTOLERANCE(ST_GEOGFROMTEXT('POINT(100 2)'), 100, 1)) AS twenty_four_sides;
+To illustrate the use of this function, consider the protocol buffer message
+`Item`:

-/*------------+-------------------*
- | five_sides | twenty_four_sides |
- +------------+-------------------+
- | 6          | 24                |
- *------------+-------------------*/
+```proto
+message Item {
+  optional map<string, int64> purchased = 1;
+};
```
-[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System

-[st-buffer]: #st_buffer
+In the following example, the query deletes key `A`, replaces `B`, and adds
+`C` in a map field called `purchased`.
-### `ST_CENTROID`

```sql
-ST_CENTROID(geography_expression)
-```
-**Description**
-Returns the _centroid_ of the input `GEOGRAPHY` as a single point `GEOGRAPHY`.
-The _centroid_ of a `GEOGRAPHY` is the weighted average of the centroids of the
-highest-dimensional components in the `GEOGRAPHY`. The centroid for components
-in each dimension is defined as follows:
-+ The centroid of points is the arithmetic mean of the input coordinates.
-+ The centroid of linestrings is the centroid of all the edges weighted by
-  length. The centroid of each edge is the geodesic midpoint of the edge.
-+ The centroid of a polygon is its center of mass.
-If the input `GEOGRAPHY` is empty, an empty `GEOGRAPHY` is returned.
-**Constraints**
-In the unlikely event that the centroid of a `GEOGRAPHY` cannot be defined by a
-single point on the surface of the Earth, a deterministic but otherwise
-arbitrary point is returned. This can only happen if the centroid is exactly at
-the center of the Earth, such as the centroid for a pair of antipodal points,
-and the likelihood of this happening is vanishingly small.
+SELECT + MODIFY_MAP(m.purchased, 'A', NULL, 'B', 4, 'C', 6) AS result_map +FROM + (SELECT AS VALUE CAST("purchased { key: 'A' value: 2 } purchased { key: 'B' value: 3}" AS Item)) AS m; -**Return type** +/*---------------------------------------------* + | result_map | + +---------------------------------------------+ + | { key: 'B' value: 4 } { key: 'C' value: 6 } | + *---------------------------------------------*/ +``` -Point `GEOGRAPHY` +[proto-map]: https://developers.google.com/protocol-buffers/docs/proto3#maps -### `ST_CLOSESTPOINT` +### `PROTO_DEFAULT_IF_NULL` ```sql -ST_CLOSESTPOINT(geography_1, geography_2[, use_spheroid]) +PROTO_DEFAULT_IF_NULL(proto_field_expression) ``` **Description** -Returns a `GEOGRAPHY` containing a point on -`geography_1` with the smallest possible distance to `geography_2`. This implies -that the distance between the point returned by `ST_CLOSESTPOINT` and -`geography_2` is less than or equal to the distance between any other point on -`geography_1` and `geography_2`. +Evaluates any expression that results in a proto field access. +If the `proto_field_expression` evaluates to `NULL`, returns the default +value for the field. Otherwise, returns the field value. -If either of the input `GEOGRAPHY`s is empty, `ST_CLOSESTPOINT` returns `NULL`. +Stipulations: -The optional `use_spheroid` parameter determines how this function measures -distance. If `use_spheroid` is `FALSE`, the function measures distance on the -surface of a perfect sphere. ++ The expression cannot resolve to a required field. ++ The expression cannot resolve to a message field. ++ The expression must resolve to a regular proto field access, not + a virtual field. ++ The expression cannot access a field with + `zetasql.use_defaults=false`. -The `use_spheroid` parameter currently only supports -the value `FALSE`. The default value of `use_spheroid` is `FALSE`. +**Return Type** -**Return type** +Type of `proto_field_expression`. 
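The stipulations above reduce to a null-coalescing field access: if the field
access yields `NULL` (an unset field, or a `NULL` parent message), the field's
declared default is returned instead. A minimal Python sketch of that behavior
(the `(value, default)` pair is a hypothetical stand-in for a proto field
access, not a real API):

```python
def proto_default_if_null(field_value, declared_default):
    # A NULL field access (unset field, or NULL parent message) falls back
    # to the field's declared default value; otherwise the value is kept.
    return declared_default if field_value is None else field_value

# Mirrors the Book.country example below, where the default is 'Unknown'.
print(proto_default_if_null("Canada", "Unknown"))  # Canada
print(proto_default_if_null(None, "Unknown"))      # Unknown
```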
-Point `GEOGRAPHY` +**Example** -[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System +In the following example, each book in a library has a country of origin. If +the country is not set, the country defaults to unknown. -### `ST_CLUSTERDBSCAN` +In this statement, table `library_books` contains a column named `book`, +whose type is `Book`. ```sql -ST_CLUSTERDBSCAN(geography_column, epsilon, minimum_geographies) -OVER over_clause - -over_clause: - { named_window | ( [ window_specification ] ) } - -window_specification: - [ named_window ] - [ PARTITION BY partition_expression [, ...] ] - [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] - +SELECT PROTO_DEFAULT_IF_NULL(book.country) AS origin FROM library_books; ``` -Performs [DBSCAN clustering][dbscan-link] on a column of geographies. Returns a -0-based cluster number. - -To learn more about the `OVER` clause and how to use it, see -[Window function calls][window-function-calls]. - - - -[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md - - - -**Input parameters** - -+ `geography_column`: A column of `GEOGRAPHY`s that - is clustered. -+ `epsilon`: The epsilon that specifies the radius, measured in meters, around - a core value. Non-negative `DOUBLE` value. -+ `minimum_geographies`: Specifies the minimum number of geographies in a - single cluster. Only dense input forms a cluster, otherwise it is classified - as noise. Non-negative `INT64` value. - -**Geography types and the DBSCAN algorithm** - -The DBSCAN algorithm identifies high-density clusters of data and marks outliers -in low-density areas of noise. Geographies passed in through `geography_column` -are classified in one of three ways by the DBSCAN algorithm: - -+ Core value: A geography is a core value if it is within `epsilon` distance - of `minimum_geographies` geographies, including itself. 
The core value - starts a new cluster, or is added to the same cluster as a core value within - `epsilon` distance. Core values are grouped in a cluster together with all - other core and border values that are within `epsilon` distance. -+ Border value: A geography is a border value if it is within epsilon distance - of a core value. It is added to the same cluster as a core value within - `epsilon` distance. A border value may be within `epsilon` distance of more - than one cluster. In this case, it may be arbitrarily assigned to either - cluster and the function will produce the same result in subsequent calls. -+ Noise: A geography is noise if it is neither a core nor a border value. - Noise values are assigned to a `NULL` cluster. An empty - `GEOGRAPHY` is always classified as noise. - -**Constraints** - -+ The argument `minimum_geographies` is a non-negative - `INT64`and `epsilon` is a non-negative - `DOUBLE`. -+ An empty geography cannot join any cluster. -+ Multiple clustering assignments could be possible for a border value. If a - geography is a border value, `ST_CLUSTERDBSCAN` will assign it to an - arbitrary valid cluster. - -**Return type** - -`INT64` for each geography in the geography column. - -**Examples** - -This example performs DBSCAN clustering with a radius of 100,000 meters with a -`minimum_geographies` argument of 1. The geographies being analyzed are a -mixture of points, lines, and polygons. - -```sql -WITH Geos as - (SELECT 1 as row_id, ST_GEOGFROMTEXT('POINT EMPTY') as geo UNION ALL - SELECT 2, ST_GEOGFROMTEXT('MULTIPOINT(1 1, 2 2, 4 4, 5 2)') UNION ALL - SELECT 3, ST_GEOGFROMTEXT('POINT(14 15)') UNION ALL - SELECT 4, ST_GEOGFROMTEXT('LINESTRING(40 1, 42 34, 44 39)') UNION ALL - SELECT 5, ST_GEOGFROMTEXT('POLYGON((40 2, 40 1, 41 2, 40 2))')) -SELECT row_id, geo, ST_CLUSTERDBSCAN(geo, 1e5, 1) OVER () AS cluster_num FROM -Geos ORDER BY row_id +`Book` is a type that contains a field called `country`. 
-/*--------+-----------------------------------+-------------* - | row_id | geo | cluster_num | - +--------+-----------------------------------+-------------+ - | 1 | GEOMETRYCOLLECTION EMPTY | NULL | - | 2 | MULTIPOINT(1 1, 2 2, 5 2, 4 4) | 0 | - | 3 | POINT(14 15) | 1 | - | 4 | LINESTRING(40 1, 42 34, 44 39) | 2 | - | 5 | POLYGON((40 2, 40 1, 41 2, 40 2)) | 2 | - *--------+-----------------------------------+-------------*/ +```proto +message Book { + optional string country = 4 [default = 'Unknown']; +} ``` -[dbscan-link]: https://en.wikipedia.org/wiki/DBSCAN - -### `ST_CONTAINS` +This is the result if `book.country` evaluates to `Canada`. ```sql -ST_CONTAINS(geography_1, geography_2) +/*-----------------* + | origin | + +-----------------+ + | Canada | + *-----------------*/ ``` -**Description** - -Returns `TRUE` if no point of `geography_2` is outside `geography_1`, and -the interiors intersect; returns `FALSE` otherwise. - -NOTE: A `GEOGRAPHY` *does not* contain its own -boundary. Compare with [`ST_COVERS`][st_covers]. - -**Return type** - -`BOOL` - -**Example** - -The following query tests whether the polygon `POLYGON((1 1, 20 1, 10 20, 1 1))` -contains each of the three points `(0, 0)`, `(1, 1)`, and `(10, 10)`, which lie -on the exterior, the boundary, and the interior of the polygon respectively. +This is the result if `book` is `NULL`. Since `book` is `NULL`, +`book.country` evaluates to `NULL` and therefore the function result is the +default value for `country`. 
```sql
/*-----------------*
 | origin          |
 +-----------------+
 | Unknown         |
 *-----------------*/
```

### `REPLACE_FIELDS`

```sql
REPLACE_FIELDS(proto_expression, value AS field_path [, ... ])
```

**Description**

Returns a copy of a protocol buffer, replacing the values in one or more fields.
`field_path` is a delimited path to the protocol buffer field to be replaced.

++ If `value` is `NULL`, it un-sets `field_path` or returns an error if the last
+  component of `field_path` is a required field.
++ Replacing subfields will succeed only if the message containing the field is
+  set.
++ Replacing subfields of a repeated field is not allowed.
++ A repeated field can be replaced with an `ARRAY` value.
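The copy-and-replace semantics above can be sketched over plain nested
dictionaries standing in for messages. This is an illustrative model only
(`replace_fields` here is hypothetical, not the ZetaSQL implementation):

```python
import copy

def replace_fields(message, replacements):
    """Illustrative model: copy `message`, apply dot-delimited field paths."""
    result = copy.deepcopy(message)  # the original message is left untouched
    for path, value in replacements.items():
        *parents, leaf = path.split(".")
        target = result
        for name in parents:
            if target.get(name) is None:
                # Replacing a subfield requires its containing message to be set.
                raise ValueError(f"message field {name!r} is not set")
            target = target[name]
        if value is None:
            target.pop(leaf, None)  # a NULL value un-sets the field
        else:
            target[leaf] = value    # a scalar, or a list for a repeated field
    return result

book = {"title": "The Hummingbird", "details": {"chapters": 10}}
print(replace_fields(book, {"title": "The Hummingbird II",
                            "details.chapters": 11}))
# {'title': 'The Hummingbird II', 'details': {'chapters': 11}}
```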
**Return type** -`GEOGRAPHY` +Type of `proto_expression` **Examples** -The convex hull returned by `ST_CONVEXHULL` can be a point, linestring, or a -polygon, depending on the input. +To illustrate the usage of this function, we use protocol buffer messages +`Book` and `BookDetails`. -```sql -WITH Geographies AS - (SELECT ST_GEOGFROMTEXT('POINT(1 1)') AS g UNION ALL - SELECT ST_GEOGFROMTEXT('LINESTRING(1 1, 2 2)') AS g UNION ALL - SELECT ST_GEOGFROMTEXT('MULTIPOINT(2 11, 4 12, 0 15, 1 9, 1 12)') AS g) -SELECT - g AS input_geography, - ST_CONVEXHULL(g) AS convex_hull -FROM Geographies; +``` +message Book { + required string title = 1; + repeated string reviews = 2; + optional BookDetails details = 3; +}; -/*-----------------------------------------+--------------------------------------------------------* - | input_geography | convex_hull | - +-----------------------------------------+--------------------------------------------------------+ - | POINT(1 1) | POINT(0.999999999999943 1) | - | LINESTRING(1 1, 2 2) | LINESTRING(2 2, 1.49988573656168 1.5000570914792, 1 1) | - | MULTIPOINT(1 9, 4 12, 2 11, 1 12, 0 15) | POLYGON((1 9, 4 12, 0 15, 1 9)) | - *-----------------------------------------+--------------------------------------------------------*/ +message BookDetails { + optional string author = 1; + optional int32 chapters = 2; +}; ``` -### `ST_COVEREDBY` +This statement replaces value of field `title` and subfield `chapters` +of proto type `Book`. Note that field `details` must be set for the statement +to succeed. 
```sql
+SELECT REPLACE_FIELDS(
+  NEW Book(
+    "The Hummingbird" AS title,
+    NEW BookDetails(10 AS chapters) AS details),
+  "The Hummingbird II" AS title,
+  11 AS details.chapters)
+AS proto;
+
+/*-------------------------------------------------------*
+ | proto                                                 |
+ +-------------------------------------------------------+
+ | {title: "The Hummingbird II" details: {chapters: 11}} |
+ *-------------------------------------------------------*/
+```
+
+The function can also replace the values of repeated fields.
+
+```sql
+SELECT REPLACE_FIELDS(
+  NEW Book("The Hummingbird" AS title,
+    NEW BookDetails(10 AS chapters) AS details),
+  ["A good read!", "Highly recommended."] AS reviews)
+AS proto;
+
+/*---------------------------------------------------------------------------*
+ | proto                                                                     |
+ +---------------------------------------------------------------------------+
+ | {title: "The Hummingbird" reviews: "A good read!"                         |
+ |  reviews: "Highly recommended." details: {chapters: 10}}                  |
+ *---------------------------------------------------------------------------*/
+```
+
+It can also set a field to `NULL`.
-`BOOL` +```sql +SELECT REPLACE_FIELDS( + NEW Book("The Hummingbird" AS title, + NEW BookDetails(10 AS chapters) AS details), + NULL AS details) +AS proto; -[st-covers]: #st_covers +/*-----------------------------------------------------------------------------* + | proto | + +-----------------------------------------------------------------------------+ + |{title: "The Hummingbird" } | + *-----------------------------------------------------------------------------*/ +``` -### `ST_COVERS` +### `TO_PROTO` -```sql -ST_COVERS(geography_1, geography_2) +``` +TO_PROTO(expression) ``` **Description** -Returns `FALSE` if `geography_1` or `geography_2` is empty. -Returns `TRUE` if no points of `geography_2` lie in the exterior of -`geography_1`. +Returns a PROTO value. The valid `expression` types are defined in the +table below, along with the return types that they produce. Other input +`expression` types are invalid. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| `expression` type | Return type |
| ----------------- | ----------- |
| `INT32`, `google.protobuf.Int32Value` | `google.protobuf.Int32Value` |
| `UINT32`, `google.protobuf.UInt32Value` | `google.protobuf.UInt32Value` |
| `INT64`, `google.protobuf.Int64Value` | `google.protobuf.Int64Value` |
| `UINT64`, `google.protobuf.UInt64Value` | `google.protobuf.UInt64Value` |
| `FLOAT`, `google.protobuf.FloatValue` | `google.protobuf.FloatValue` |
| `DOUBLE`, `google.protobuf.DoubleValue` | `google.protobuf.DoubleValue` |
| `BOOL`, `google.protobuf.BoolValue` | `google.protobuf.BoolValue` |
| `STRING`, `google.protobuf.StringValue` | `google.protobuf.StringValue` |
| `BYTES`, `google.protobuf.BytesValue` | `google.protobuf.BytesValue` |
| `DATE`, `google.type.Date` | `google.type.Date` |
| `TIME`, `google.type.TimeOfDay` | `google.type.TimeOfDay` |
| `TIMESTAMP`, `google.protobuf.Timestamp` | `google.protobuf.Timestamp` |
+ +**Return Type** + +The return type depends upon the `expression` type. See the return types +in the table above. + +**Examples** + +Convert a `DATE` type into a `google.type.Date` type. + +```sql +SELECT TO_PROTO(DATE '2019-10-30') + +/*--------------------------------* + | $col1 | + +--------------------------------+ + | {year: 2019 month: 10 day: 30} | + *--------------------------------*/ +``` + +Pass in and return a `google.type.Date` type. + +```sql +SELECT TO_PROTO( + new google.type.Date( + 2019 as year, + 10 as month, + 30 as day + ) +) + +/*--------------------------------* + | $col1 | + +--------------------------------+ + | {year: 2019 month: 10 day: 30} | + *--------------------------------*/ +``` + +## Range functions + +ZetaSQL supports the following range functions. + +### Function list + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Name | Summary |
| ---- | ------- |
| `GENERATE_RANGE_ARRAY` | Splits a range into an array of subranges. |
| `RANGE` | Constructs a range of `DATE`, `DATETIME`, or `TIMESTAMP` values. |
| `RANGE_CONTAINS` | Signature 1: Checks if one range is in another range. Signature 2: Checks if a value is in a range. |
| `RANGE_END` | Gets the upper bound of a range. |
| `RANGE_INTERSECT` | Gets a segment of two ranges that intersect. |
| `RANGE_OVERLAPS` | Checks if two ranges overlap. |
| `RANGE_SESSIONIZE` | Produces a table of sessionized ranges. |
| `RANGE_START` | Gets the lower bound of a range. |
+
+### `GENERATE_RANGE_ARRAY`
+
+```sql
+GENERATE_RANGE_ARRAY(range_to_split, step_interval)
+```
+
+```sql
+GENERATE_RANGE_ARRAY(range_to_split, step_interval, include_last_partial_range)
+```
+
+**Description**
+
+Splits a range into an array of subranges.
+
+**Definitions**
+
++ `range_to_split`: The `RANGE<T>` value to split.
++ `step_interval`: The `INTERVAL` value, which determines the maximum size of
+  each subrange in the resulting array. An
+  [interval single date and time part][interval-single]
+  is supported, but an interval range of date and time parts is not.
+
+  + If `range_to_split` is `RANGE<DATE>`, these interval
+    date parts are supported: `YEAR` to `DAY`.
+
+  + If `range_to_split` is `RANGE<DATETIME>`, these interval
+    date and time parts are supported: `YEAR` to `SECOND`.
+
+  + If `range_to_split` is `RANGE<TIMESTAMP>`, these interval
+    date and time parts are supported: `DAY` to `SECOND`.
++ `include_last_partial_range`: A `BOOL` value, which determines whether or
+  not to include the last subrange if it's a partial subrange.
+  If this argument is not specified, the default value is `TRUE`.
+
+  + `TRUE` (default): The last subrange is included, even if it's
+    smaller than `step_interval`.
+
+  + `FALSE`: Exclude the last subrange if it's smaller than
+    `step_interval`.
+
+**Details**
+
+Returns `NULL` if any input is `NULL`.
+
+**Return type**
+
+`ARRAY<RANGE<T>>`
+
+**Examples**
+
+In the following example, a date range between `2020-01-01` and `2020-01-06`
+is split into an array of subranges that are one day long. There are
+no partial ranges.
+ +```sql +SELECT GENERATE_RANGE_ARRAY( + RANGE(DATE '2020-01-01', DATE '2020-01-06'), + INTERVAL 1 DAY) AS results; + +/*----------------------------+ + | results | + +----------------------------+ + | [ | + | [2020-01-01, 2020-01-02), | + | [2020-01-02, 2020-01-03), | + | [2020-01-03, 2020-01-04), | + | [2020-01-04, 2020-01-05), | + | [2020-01-05, 2020-01-06), | + | ] | + +----------------------------*/ +``` + +In the following examples, a date range between `2020-01-01` and `2020-01-06` +is split into an array of subranges that are two days long. The final subrange +is smaller than two days: + +```sql +SELECT GENERATE_RANGE_ARRAY( + RANGE(DATE '2020-01-01', DATE '2020-01-06'), + INTERVAL 2 DAY) AS results; + +/*----------------------------+ + | results | + +----------------------------+ + | [ | + | [2020-01-01, 2020-01-03), | + | [2020-01-03, 2020-01-05), | + | [2020-01-05, 2020-01-06) | + | ] | + +----------------------------*/ +``` + +```sql +SELECT GENERATE_RANGE_ARRAY( + RANGE(DATE '2020-01-01', DATE '2020-01-06'), + INTERVAL 2 DAY, + TRUE) AS results; + +/*----------------------------+ + | results | + +----------------------------+ + | [ | + | [2020-01-01, 2020-01-03), | + | [2020-01-03, 2020-01-05), | + | [2020-01-05, 2020-01-06) | + | ] | + +----------------------------*/ +``` + +In the following example, a date range between `2020-01-01` and `2020-01-06` +is split into an array of subranges that are two days long, but the final +subrange is excluded because it's smaller than two days: + +```sql +SELECT GENERATE_RANGE_ARRAY( + RANGE(DATE '2020-01-01', DATE '2020-01-06'), + INTERVAL 2 DAY, + FALSE) AS results; + +/*----------------------------+ + | results | + +----------------------------+ + | [ | + | [2020-01-01, 2020-01-03), | + | [2020-01-03, 2020-01-05) | + | ] | + +----------------------------*/ +``` + +[interval-single]: https://github.com/google/zetasql/blob/master/docs/data-types.md#single_datetime_part_interval + +### `RANGE` + +```sql 
+RANGE(lower_bound, upper_bound)
+```
+
+**Description**
+
+Constructs a range of [`DATE`][date-type], [`DATETIME`][datetime-type], or
+[`TIMESTAMP`][timestamp-type] values.
+
+**Definitions**
+
++ `lower_bound`: The range starts from this value. This can be a
+  `DATE`, `DATETIME`, or `TIMESTAMP` value. If this value is `NULL`, the range
+  doesn't include a lower bound.
++ `upper_bound`: The range ends before this value. This can be a
+  `DATE`, `DATETIME`, or `TIMESTAMP` value. If this value is `NULL`, the range
+  doesn't include an upper bound.
+
+**Details**
+
+`lower_bound` and `upper_bound` must be of the same data type.
+
+Produces an error if `lower_bound` is greater than or equal to `upper_bound`.
+To return `NULL` instead, add the `SAFE.` prefix to the function name.
+
+**Return type**
+
+`RANGE<T>`, where `T` is the same data type as the input.
+
+**Examples**
+
+The following query constructs a date range:
+
+```sql
+SELECT RANGE(DATE '2022-12-01', DATE '2022-12-31') AS results;
+
+/*--------------------------+
+ | results                  |
+ +--------------------------+
+ | [2022-12-01, 2022-12-31) |
+ +--------------------------*/
+```
+
+The following query constructs a datetime range:
+
+```sql
+SELECT RANGE(DATETIME '2022-10-01 14:53:27',
+             DATETIME '2022-10-01 16:00:00') AS results;
+
+/*--------------------------------------------+
+ | results                                    |
+ +--------------------------------------------+
+ | [2022-10-01 14:53:27, 2022-10-01 16:00:00) |
+ +--------------------------------------------*/
+```
+
+The following query constructs a timestamp range:
+
+```sql
+SELECT RANGE(TIMESTAMP '2022-10-01 14:53:27 America/Los_Angeles',
+             TIMESTAMP '2022-10-01 16:00:00 America/Los_Angeles') AS results;
+
+-- Results depend upon where this query was executed.
+/*-----------------------------------------------------------------+
+ | results                                                         |
+ +-----------------------------------------------------------------+
+ | [2022-10-01 21:53:27.000000+00, 2022-10-01 23:00:00.000000+00) |
+ +-----------------------------------------------------------------*/
+```
+
+The following query constructs a date range with no lower bound:
+
+```sql
+SELECT RANGE(NULL, DATE '2022-12-31') AS results;
+
+/*-------------------------+
+ | results                 |
+ +-------------------------+
+ | [UNBOUNDED, 2022-12-31) |
+ +-------------------------*/
+```
+
+The following query constructs a date range with no upper bound:
+
+```sql
+SELECT RANGE(DATE '2022-10-01', NULL) AS results;
+
+/*-------------------------+
+ | results                 |
+ +-------------------------+
+ | [2022-10-01, UNBOUNDED) |
+ +-------------------------*/
+```
+
+[timestamp-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#timestamp_type
+
+[date-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#date_type
+
+[datetime-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#datetime_type
+
+### `RANGE_CONTAINS`
+
++ [Signature 1][range_contains-sig1]: Checks if every value in one range is
+  in another range.
++ [Signature 2][range_contains-sig2]: Checks if a value is in a range.
+
+#### Signature 1
+
+```sql
+RANGE_CONTAINS(outer_range, inner_range)
+```
+
+**Description**
+
+Checks if the inner range is in the outer range.
+
+**Definitions**
+
++ `outer_range`: The `RANGE<T>` value to search within.
++ `inner_range`: The `RANGE<T>` value to search for in `outer_range`.
+
+**Details**
+
+Returns `TRUE` if `inner_range` exists in `outer_range`.
+Otherwise, returns `FALSE`.
+
+`T` must be of the same type for all inputs.
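Treating a range as a half-open `[start, end)` pair, the containment check is a
comparison of bounds, where an unbounded side of the outer range accepts
anything. The following Python sketch is illustrative only (`None` stands in
for `UNBOUNDED`; `range_contains` is a hypothetical helper, not ZetaSQL's
implementation):

```python
def range_contains(outer, inner):
    """Is half-open range `inner` entirely within half-open range `outer`?"""
    (o_start, o_end), (i_start, i_end) = outer, inner
    # An unbounded outer bound accepts anything on that side; an unbounded
    # inner bound is only contained by an unbounded outer bound.
    start_ok = o_start is None or (i_start is not None and i_start >= o_start)
    end_ok = o_end is None or (i_end is not None and i_end <= o_end)
    return start_ok and end_ok

# ISO date strings compare correctly as plain strings.
print(range_contains(("2022-01-01", "2023-01-01"),
                     ("2022-04-01", "2022-07-01")))  # True
print(range_contains(("2022-01-01", "2023-01-01"),
                     ("2023-01-01", "2023-04-01")))  # False
```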
+
+**Return type**
+
+`BOOL`
+
+**Examples**
+
+In the following query, the inner range is in the outer range:
+
+```sql
+SELECT RANGE_CONTAINS(
+  RANGE '[2022-01-01, 2023-01-01)',
+  RANGE '[2022-04-01, 2022-07-01)') AS results;
+
+/*---------+
+ | results |
+ +---------+
+ | TRUE    |
+ +---------*/
+```
+
+In the following query, the inner range is not in the outer range:
+
+```sql
+SELECT RANGE_CONTAINS(
+  RANGE '[2022-01-01, 2023-01-01)',
+  RANGE '[2023-01-01, 2023-04-01)') AS results;
+
+/*---------+
+ | results |
+ +---------+
+ | FALSE   |
+ +---------*/
+```
+
+#### Signature 2
+
+```sql
+RANGE_CONTAINS(range_to_search, value_to_find)
+```
+
+**Description**
+
+Checks if a value is in a range.
+
+**Definitions**
+
++ `range_to_search`: The `RANGE` value to search within.
++ `value_to_find`: The value to search for in `range_to_search`.
+
+**Details**
+
+Returns `TRUE` if `value_to_find` exists in `range_to_search`.
+Otherwise, returns `FALSE`.
+
+The data type for `value_to_find` must be the same data type as `T` in
+`range_to_search`.
+
+**Return type**
+
+`BOOL`
+
+**Examples**
+
+In the following query, the value `2022-04-01` is found in the range
+`[2022-01-01, 2023-01-01)`:
+
+```sql
+SELECT RANGE_CONTAINS(
+  RANGE '[2022-01-01, 2023-01-01)',
+  DATE '2022-04-01') AS results;
+
+/*---------+
+ | results |
+ +---------+
+ | TRUE    |
+ +---------*/
+```
+
+In the following query, the value `2023-04-01` is not found in the range
+`[2022-01-01, 2023-01-01)`:
+
+```sql
+SELECT RANGE_CONTAINS(
+  RANGE '[2022-01-01, 2023-01-01)',
+  DATE '2023-04-01') AS results;
+
+/*---------+
+ | results |
+ +---------+
+ | FALSE   |
+ +---------*/
+```
+
+[range_contains-sig1]: #signature_1
+
+[range_contains-sig2]: #signature_2
+
+### `RANGE_END`
+
+```sql
+RANGE_END(range_to_check)
+```
+
+**Description**
+
+Gets the upper bound of a range.
+
+**Definitions**
+
++ `range_to_check`: The `RANGE` value.
+
+**Details**
+
+Returns `NULL` if the upper bound in `range_to_check` is `UNBOUNDED`.
+
+Returns `NULL` if `range_to_check` is `NULL`.
+
+**Return type**
+
+`T` in `range_to_check`
+
+**Examples**
+
+In the following query, the upper bound of the range is retrieved:
+
+```sql
+SELECT RANGE_END(RANGE '[2022-12-01, 2022-12-31)') AS results;
+
+/*------------+
+ | results    |
+ +------------+
+ | 2022-12-31 |
+ +------------*/
+```
+
+In the following query, the upper bound of the range is unbounded, so
+`NULL` is returned:
+
+```sql
+SELECT RANGE_END(RANGE '[2022-12-01, UNBOUNDED)') AS results;
+
+/*------------+
+ | results    |
+ +------------+
+ | NULL       |
+ +------------*/
+```
+
+### `RANGE_INTERSECT`
+
+```sql
+RANGE_INTERSECT(range_a, range_b)
+```
+
+**Description**
+
+Gets a segment of two ranges that intersect.
+
+**Definitions**
+
++ `range_a`: The first `RANGE` value.
++ `range_b`: The second `RANGE` value.
+
+**Details**
+
+Returns `NULL` if any input is `NULL`.
+
+Produces an error if `range_a` and `range_b` don't overlap. To return
+`NULL` instead, add the `SAFE.` prefix to the function name.
+
+`T` must be of the same type for all inputs.
+
+**Return type**
+
+`RANGE<T>`
+
+**Examples**
+
+```sql
+SELECT RANGE_INTERSECT(
+  RANGE '[2022-02-01, 2022-09-01)',
+  RANGE '[2021-06-15, 2022-04-15)') AS results;
+
+/*--------------------------+
+ | results                  |
+ +--------------------------+
+ | [2022-02-01, 2022-04-15) |
+ +--------------------------*/
+```
+
+```sql
+SELECT RANGE_INTERSECT(
+  RANGE '[2022-02-01, UNBOUNDED)',
+  RANGE '[2021-06-15, 2022-04-15)') AS results;
+
+/*--------------------------+
+ | results                  |
+ +--------------------------+
+ | [2022-02-01, 2022-04-15) |
+ +--------------------------*/
+```
+
+```sql
+SELECT RANGE_INTERSECT(
+  RANGE '[2022-02-01, UNBOUNDED)',
+  RANGE '[2021-06-15, UNBOUNDED)') AS results;
+
+/*-------------------------+
+ | results                 |
+ +-------------------------+
+ | [2022-02-01, UNBOUNDED) |
+ +-------------------------*/
+```
+
+### `RANGE_OVERLAPS`
+
+```sql
+RANGE_OVERLAPS(range_a, range_b)
+```
+
+**Description**
+
+Checks if two ranges overlap.
+
+**Definitions**
+
++ `range_a`: The first `RANGE` value.
++ `range_b`: The second `RANGE` value.
+
+**Details**
+
+Returns `TRUE` if a part of `range_a` intersects with `range_b`, otherwise
+returns `FALSE`.
+
+`T` must be of the same type for all inputs.
+
+To get the part of the range that overlaps, use the
+[`RANGE_INTERSECT`][range-intersect] function.
+
+**Return type**
+
+`BOOL`
+
+**Examples**
+
+In the following query, the first and second ranges overlap between
+`2022-02-01` and `2022-04-15`:
+
+```sql
+SELECT RANGE_OVERLAPS(
+  RANGE '[2022-02-01, 2022-09-01)',
+  RANGE '[2021-06-15, 2022-04-15)') AS results;
+
+/*---------+
+ | results |
+ +---------+
+ | TRUE    |
+ +---------*/
+```
+
+In the following query, the first and second ranges don't overlap:
+
+```sql
+SELECT RANGE_OVERLAPS(
+  RANGE '[2020-02-01, 2020-09-01)',
+  RANGE '[2021-06-15, 2022-04-15)') AS results;
+
+/*---------+
+ | results |
+ +---------+
+ | FALSE   |
+ +---------*/
+```
+
+In the following query, the first and second ranges overlap between
+`2022-02-01` and `UNBOUNDED`:
+
+```sql
+SELECT RANGE_OVERLAPS(
+  RANGE '[2022-02-01, UNBOUNDED)',
+  RANGE '[2021-06-15, UNBOUNDED)') AS results;
+
+/*---------+
+ | results |
+ +---------+
+ | TRUE    |
+ +---------*/
+```
+
+[range-intersect]: #range_intersect
+
+### `RANGE_SESSIONIZE`
+
+```sql
+RANGE_SESSIONIZE(
+  TABLE table_name,
+  range_column,
+  partitioning_columns
+)
+```
+
+```sql
+RANGE_SESSIONIZE(
+  TABLE table_name,
+  range_column,
+  partitioning_columns,
+  sessionize_option
+)
+```
+
+**Description**
+
+Produces a table of sessionized ranges.
+
+**Definitions**
+
++ `table_name`: A table expression that represents the name of the table to
+  construct. This can represent any relation with `range_column`.
++ `range_column`: A `STRING` literal that indicates which `RANGE` column
+  in a table contains the data to sessionize.
++ `partitioning_columns`: An `ARRAY<STRING>` literal that indicates which
+  columns should partition the data before the data is sessionized.
++ `sessionize_option`: A `STRING` value that describes how order-adjacent
+  ranges are sessionized. Your choices are as follows:
+
+  + `MEETS` (default): Ranges that meet or overlap are sessionized.
+
+  + `OVERLAPS`: Only a range that is overlapped by another range is
+    sessionized.
+
+  If this argument is not provided, `MEETS` is used by default.
+
+**Details**
+
+This function produces a table that includes all columns in the
+input table and an additional `RANGE<T>` column called
+`session_range`, which indicates the start and end of a session. The
+start and end of each session is determined by the `sessionize_option`
+argument.
+
+**Return type**
+
+`TABLE`
+
+**Examples**
+
+The examples in this section reference the following table called
+`my_sessionized_range_table` in a dataset called `mydataset`:
+
+```sql
+INSERT mydataset.my_sessionized_range_table (emp_id, dept_id, duration)
+VALUES(10, 1000, RANGE '[2010-01-10, 2010-03-10)'),
+      (10, 2000, RANGE '[2010-03-10, 2010-07-15)'),
+      (10, 2000, RANGE '[2010-06-15, 2010-08-18)'),
+      (20, 2000, RANGE '[2010-03-10, 2010-07-20)'),
+      (20, 1000, RANGE '[2020-05-10, 2020-09-20)');
+
+SELECT * FROM mydataset.my_sessionized_range_table ORDER BY emp_id;
+
+/*--------+---------+--------------------------+
+ | emp_id | dept_id | duration                 |
+ +--------+---------+--------------------------+
+ | 10     | 1000    | [2010-01-10, 2010-03-10) |
+ | 10     | 2000    | [2010-03-10, 2010-07-15) |
+ | 10     | 2000    | [2010-06-15, 2010-08-18) |
+ | 20     | 2000    | [2010-03-10, 2010-07-20) |
+ | 20     | 1000    | [2020-05-10, 2020-09-20) |
+ +--------+---------+--------------------------*/
+```
+
+In the following query, a table of sessionized data is produced for
+`my_sessionized_range_table`, and only ranges that meet or overlap are
+sessionized:
+
+```sql
+SELECT
+  emp_id, duration, session_range
+FROM
+  RANGE_SESSIONIZE(
+    TABLE mydataset.my_sessionized_range_table,
+    'duration',
+    ['emp_id'])
+ORDER BY emp_id;
+
+/*--------+--------------------------+--------------------------+
+ | emp_id | duration                 | session_range            |
+ +--------+--------------------------+--------------------------+
+ | 10     | [2010-01-10, 2010-03-10) | [2010-01-10, 2010-08-18) |
+ | 10     | [2010-03-10, 2010-07-15) | [2010-01-10, 2010-08-18) |
+ | 10     | [2010-06-15, 2010-08-18) | [2010-01-10, 2010-08-18) |
+ | 20     | [2010-03-10, 2010-07-20) | [2010-03-10, 2010-07-20) |
+ | 20     | [2020-05-10, 2020-09-20) | [2020-05-10, 2020-09-20) |
+ +--------+--------------------------+--------------------------*/
+```
+
+In the following query, a table of sessionized data is produced for
+`my_sessionized_range_table`, and only a range that is overlapped by another
+range is sessionized:
+
+```sql
+SELECT
+  emp_id, duration, session_range
+FROM
+  RANGE_SESSIONIZE(
+    TABLE mydataset.my_sessionized_range_table,
+    'duration',
+    ['emp_id'],
+    'OVERLAPS')
+ORDER BY emp_id;
+
+/*--------+--------------------------+--------------------------+
+ | emp_id | duration                 | session_range            |
+ +--------+--------------------------+--------------------------+
+ | 10     | [2010-03-10, 2010-07-15) | [2010-03-10, 2010-08-18) |
+ | 10     | [2010-06-15, 2010-08-18) | [2010-03-10, 2010-08-18) |
+ | 10     | [2010-01-10, 2010-03-10) | [2010-01-10, 2010-03-10) |
+ | 20     | [2020-05-10, 2020-09-20) | [2020-05-10, 2020-09-20) |
+ | 20     | [2010-03-10, 2010-07-20) | [2010-03-10, 2010-07-20) |
+ +--------+--------------------------+--------------------------*/
+```
+
+If you need to normalize sessionized data, you can use a query similar to the
+following:
+
+```sql
+SELECT emp_id, session_range AS normalized FROM (
+  SELECT emp_id, session_range
+  FROM RANGE_SESSIONIZE(
+    TABLE mydataset.my_sessionized_range_table,
+    'duration',
+    ['emp_id'],
+    'MEETS')
+)
+GROUP BY emp_id, normalized;
+
+/*--------+--------------------------+
+ | emp_id | normalized               |
+ +--------+--------------------------+
+ | 20     | [2010-03-10, 2010-07-20) |
+ | 10     | [2010-01-10, 2010-08-18) |
+ | 20     | [2020-05-10, 2020-09-20) |
+ +--------+--------------------------*/
+```
+
+### `RANGE_START`
+
+```sql
+RANGE_START(range_to_check)
+```
+
+**Description**
+
+Gets the lower bound of a range.
+
+**Definitions**
+
++ `range_to_check`: The `RANGE` value.
+
+**Details**
+
+Returns `NULL` if the lower bound of `range_to_check` is `UNBOUNDED`.
+
+Returns `NULL` if `range_to_check` is `NULL`.
+
+**Return type**
+
+`T` in `range_to_check`
+
+**Examples**
+
+In the following query, the lower bound of the range is retrieved:
+
+```sql
+SELECT RANGE_START(RANGE '[2022-12-01, 2022-12-31)') AS results;
+
+/*------------+
+ | results    |
+ +------------+
+ | 2022-12-01 |
+ +------------*/
+```
+
+In the following query, the lower bound of the range is unbounded, so
+`NULL` is returned:
+
+```sql
+SELECT RANGE_START(RANGE '[UNBOUNDED, 2022-12-31)') AS results;
+
+/*------------+
+ | results    |
+ +------------+
+ | NULL       |
+ +------------*/
+```
+
+## Security functions
+
+ZetaSQL supports the following security functions.
+
+### Function list
+
+| Name | Summary |
+| --- | --- |
+| [`SESSION_USER`](#session_user) | Gets the email address or principal identifier of the user that is running the query. |
+
+### `SESSION_USER`
+
+```
+SESSION_USER()
+```
+
+**Description**
+
+For first-party users, returns the email address of the user that is running the
+query.
+For third-party users, returns the
+[principal identifier](https://cloud.google.com/iam/docs/principal-identifiers)
+of the user that is running the query.
+For more information about identities, see
+[Principals](https://cloud.google.com/docs/authentication#principal).
+
+**Return Data Type**
+
+`STRING`
+
+**Example**
+
+```sql
+SELECT SESSION_USER() as user;
+
+/*----------------------*
+ | user                 |
+ +----------------------+
+ | jdoe@example.com     |
+ *----------------------*/
+```
+
+## Statistical aggregate functions
+
+ZetaSQL supports statistical aggregate functions.
+To learn about the syntax for aggregate function calls, see
+[Aggregate function calls][agg-function-calls].
+
+### Function list
+
+| Name | Summary |
+| --- | --- |
+| [`CORR`](#corr) | Computes the Pearson coefficient of correlation of a set of number pairs. |
+| [`COVAR_POP`](#covar_pop) | Computes the population covariance of a set of number pairs. |
+| [`COVAR_SAMP`](#covar_samp) | Computes the sample covariance of a set of number pairs. |
+| [`STDDEV`](#stddev) | An alias of the `STDDEV_SAMP` function. |
+| [`STDDEV_POP`](#stddev_pop) | Computes the population (biased) standard deviation of the values. |
+| [`STDDEV_SAMP`](#stddev_samp) | Computes the sample (unbiased) standard deviation of the values. |
+| [`VAR_POP`](#var_pop) | Computes the population (biased) variance of the values. |
+| [`VAR_SAMP`](#var_samp) | Computes the sample (unbiased) variance of the values. |
+| [`VARIANCE`](#variance) | An alias of `VAR_SAMP`. |
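+
+The population ("biased") and sample ("unbiased") forms listed above differ
+only in their divisor: the population forms divide the summed squared
+deviations by the input count `n`, while the sample forms divide by `n - 1`.
+The following sketch contrasts the four functions on one input; the commented
+divisors are illustrative, and the resulting values match the per-function
+examples later in this section:
+
+```sql
+-- For x = [10, 14, 18]: mean = 14, summed squared deviations = 32.
+SELECT
+  VAR_POP(x) AS var_pop,          -- 32 / 3
+  VAR_SAMP(x) AS var_samp,        -- 32 / 2
+  STDDEV_POP(x) AS stddev_pop,    -- SQRT(32 / 3)
+  STDDEV_SAMP(x) AS stddev_samp   -- SQRT(32 / 2)
+FROM UNNEST([10, 14, 18]) AS x;
+```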
+ +### `CORR` + +```sql +CORR( + X1, X2 + [ HAVING { MAX | MIN } expression2 ] +) +[ OVER over_clause ] + +over_clause: + { named_window | ( [ window_specification ] ) } + +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] + [ window_frame_clause ] + +``` + +**Description** + +Returns the [Pearson coefficient][stat-agg-link-to-pearson-coefficient] +of correlation of a set of number pairs. For each number pair, the first number +is the dependent variable and the second number is the independent variable. +The return result is between `-1` and `1`. A result of `0` indicates no +correlation. + +All numeric types are supported. If the +input is `NUMERIC` or `BIGNUMERIC` then the internal aggregation is +stable with the final output converted to a `DOUBLE`. +Otherwise the input is converted to a `DOUBLE` +before aggregation, resulting in a potentially unstable result. + +This function ignores any input pairs that contain one or more `NULL` values. If +there are fewer than two input pairs without `NULL` values, this function +returns `NULL`. + +`NaN` is produced if: + ++ Any input value is `NaN` ++ Any input value is positive infinity or negative infinity. ++ The variance of `X1` or `X2` is `0`. ++ The covariance of `X1` and `X2` is `0`. + +To learn more about the optional aggregate clauses that you can pass +into this function, see +[Aggregate function calls][aggregate-function-calls]. + + + +[aggregate-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md + + + +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. 
+ + + +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md + + + +**Return Data Type** + +`DOUBLE` + +**Examples** + +```sql +SELECT CORR(y, x) AS results +FROM + UNNEST( + [ + STRUCT(1.0 AS y, 5.0 AS x), + (3.0, 9.0), + (4.0, 7.0)]); + +/*--------------------* + | results | + +--------------------+ + | 0.6546536707079772 | + *--------------------*/ +``` + +```sql +SELECT CORR(y, x) AS results +FROM + UNNEST( + [ + STRUCT(1.0 AS y, 5.0 AS x), + (3.0, 9.0), + (4.0, NULL)]); + +/*---------* + | results | + +---------+ + | 1 | + *---------*/ +``` + +```sql +SELECT CORR(y, x) AS results +FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, 3.0)]) + +/*---------* + | results | + +---------+ + | NULL | + *---------*/ +``` + +```sql +SELECT CORR(y, x) AS results +FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, NULL)]) + +/*---------* + | results | + +---------+ + | NULL | + *---------*/ +``` + +```sql +SELECT CORR(y, x) AS results +FROM + UNNEST( + [ + STRUCT(1.0 AS y, 5.0 AS x), + (3.0, 9.0), + (4.0, 7.0), + (5.0, 1.0), + (7.0, CAST('Infinity' as DOUBLE))]) + +/*---------* + | results | + +---------+ + | NaN | + *---------*/ +``` + +```sql +SELECT CORR(x, y) AS results +FROM + ( + SELECT 0 AS x, 0 AS y + UNION ALL + SELECT 0 AS x, 0 AS y + ) + +/*---------* + | results | + +---------+ + | NaN | + *---------*/ +``` + +[stat-agg-link-to-pearson-coefficient]: https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient + +### `COVAR_POP` + +```sql +COVAR_POP( + X1, X2 + [ HAVING { MAX | MIN } expression2 ] +) +[ OVER over_clause ] + +over_clause: + { named_window | ( [ window_specification ] ) } + +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] + [ window_frame_clause ] + +``` + +**Description** + +Returns the population [covariance][stat-agg-link-to-covariance] of +a set of number pairs. 
The first number is the dependent variable; the second +number is the independent variable. The return result is between `-Inf` and +`+Inf`. + +All numeric types are supported. If the +input is `NUMERIC` or `BIGNUMERIC` then the internal aggregation is +stable with the final output converted to a `DOUBLE`. +Otherwise the input is converted to a `DOUBLE` +before aggregation, resulting in a potentially unstable result. + +This function ignores any input pairs that contain one or more `NULL` values. If +there is no input pair without `NULL` values, this function returns `NULL`. +If there is exactly one input pair without `NULL` values, this function returns +`0`. + +`NaN` is produced if: + ++ Any input value is `NaN` ++ Any input value is positive infinity or negative infinity. + +To learn more about the optional aggregate clauses that you can pass +into this function, see +[Aggregate function calls][aggregate-function-calls]. + + + +[aggregate-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md + + + +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. 
+ + + +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md + + + +**Return Data Type** + +`DOUBLE` + +**Examples** + +```sql +SELECT COVAR_POP(y, x) AS results +FROM + UNNEST( + [ + STRUCT(1.0 AS y, 1.0 AS x), + (2.0, 6.0), + (9.0, 3.0), + (2.0, 6.0), + (9.0, 3.0)]) + +/*---------------------* + | results | + +---------------------+ + | -1.6800000000000002 | + *---------------------*/ +``` + +```sql +SELECT COVAR_POP(y, x) AS results +FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, 3.0)]) + +/*---------* + | results | + +---------+ + | 0 | + *---------*/ +``` + +```sql +SELECT COVAR_POP(y, x) AS results +FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, NULL)]) + +/*---------* + | results | + +---------+ + | NULL | + *---------*/ +``` + +```sql +SELECT COVAR_POP(y, x) AS results +FROM + UNNEST( + [ + STRUCT(1.0 AS y, 1.0 AS x), + (2.0, 6.0), + (9.0, 3.0), + (2.0, 6.0), + (NULL, 3.0)]) + +/*---------* + | results | + +---------+ + | -1 | + *---------*/ +``` + +```sql +SELECT COVAR_POP(y, x) AS results +FROM + UNNEST( + [ + STRUCT(1.0 AS y, 1.0 AS x), + (2.0, 6.0), + (9.0, 3.0), + (2.0, 6.0), + (CAST('Infinity' as DOUBLE), 3.0)]) + +/*---------* + | results | + +---------+ + | NaN | + *---------*/ +``` + +[stat-agg-link-to-covariance]: https://en.wikipedia.org/wiki/Covariance + +### `COVAR_SAMP` + +```sql +COVAR_SAMP( + X1, X2 + [ HAVING { MAX | MIN } expression2 ] +) +[ OVER over_clause ] + +over_clause: + { named_window | ( [ window_specification ] ) } + +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] + [ window_frame_clause ] + +``` + +**Description** + +Returns the sample [covariance][stat-agg-link-to-covariance] of a +set of number pairs. The first number is the dependent variable; the second +number is the independent variable. The return result is between `-Inf` and +`+Inf`. + +All numeric types are supported. 
If the
+input is `NUMERIC` or `BIGNUMERIC` then the internal aggregation is
+stable with the final output converted to a `DOUBLE`.
+Otherwise the input is converted to a `DOUBLE`
+before aggregation, resulting in a potentially unstable result.
+
+This function ignores any input pairs that contain one or more `NULL` values. If
+there are fewer than two input pairs without `NULL` values, this function
+returns `NULL`.
+
+`NaN` is produced if:
+
++ Any input value is `NaN`
++ Any input value is positive infinity or negative infinity.
+
+To learn more about the optional aggregate clauses that you can pass
+into this function, see
+[Aggregate function calls][aggregate-function-calls].
+
+
+
+[aggregate-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md
+
+
+
+To learn more about the `OVER` clause and how to use it, see
+[Window function calls][window-function-calls].
+
+
+
+[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md
+
+
+
+**Return Data Type**
+
+`DOUBLE`
+
+**Examples**
+
+```sql
+SELECT COVAR_SAMP(y, x) AS results
+FROM
+  UNNEST(
+    [
+      STRUCT(1.0 AS y, 1.0 AS x),
+      (2.0, 6.0),
+      (9.0, 3.0),
+      (2.0, 6.0),
+      (9.0, 3.0)])
+
+/*---------*
+ | results |
+ +---------+
+ | -2.1    |
+ *---------*/
+```
+
+```sql
+SELECT COVAR_SAMP(y, x) AS results
+FROM
+  UNNEST(
+    [
+      STRUCT(1.0 AS y, 1.0 AS x),
+      (2.0, 6.0),
+      (9.0, 3.0),
+      (2.0, 6.0),
+      (NULL, 3.0)])
+
+/*---------------------*
+ | results             |
+ +---------------------+
+ | -1.3333333333333333 |
+ *---------------------*/
+```
+
+```sql
+SELECT COVAR_SAMP(y, x) AS results
+FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, 3.0)])
+
+/*---------*
+ | results |
+ +---------+
+ | NULL    |
+ *---------*/
+```
+
+```sql
+SELECT COVAR_SAMP(y, x) AS results
+FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, NULL)])
+
+/*---------*
+ | results |
+ +---------+
+ | NULL    |
+ *---------*/
+```
+
+```sql
+SELECT COVAR_SAMP(y, x) AS results
+FROM + UNNEST( + [ + STRUCT(1.0 AS y, 1.0 AS x), + (2.0, 6.0), + (9.0, 3.0), + (2.0, 6.0), + (CAST('Infinity' as DOUBLE), 3.0)]) + +/*---------* + | results | + +---------+ + | NaN | + *---------*/ +``` + +[stat-agg-link-to-covariance]: https://en.wikipedia.org/wiki/Covariance + +### `STDDEV` + +```sql +STDDEV( + [ DISTINCT ] + expression + [ HAVING { MAX | MIN } expression2 ] +) +[ OVER over_clause ] + +over_clause: + { named_window | ( [ window_specification ] ) } + +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] + [ window_frame_clause ] + +``` + +**Description** + +An alias of [STDDEV_SAMP][stat-agg-link-to-stddev-samp]. + +[stat-agg-link-to-stddev-samp]: #stddev_samp + +### `STDDEV_POP` + +```sql +STDDEV_POP( + [ DISTINCT ] + expression + [ HAVING { MAX | MIN } expression2 ] +) +[ OVER over_clause ] + +over_clause: + { named_window | ( [ window_specification ] ) } + +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] + [ window_frame_clause ] + +``` + +**Description** + +Returns the population (biased) standard deviation of the values. The return +result is between `0` and `+Inf`. + +All numeric types are supported. If the +input is `NUMERIC` or `BIGNUMERIC` then the internal aggregation is +stable with the final output converted to a `DOUBLE`. +Otherwise the input is converted to a `DOUBLE` +before aggregation, resulting in a potentially unstable result. + +This function ignores any `NULL` inputs. If all inputs are ignored, this +function returns `NULL`. If this function receives a single non-`NULL` input, +it returns `0`. + +`NaN` is produced if: + ++ Any input value is `NaN` ++ Any input value is positive infinity or negative infinity. 
+ +To learn more about the optional aggregate clauses that you can pass +into this function, see +[Aggregate function calls][aggregate-function-calls]. + + + +[aggregate-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md + + + +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. + + + +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md + + + +`STDDEV_POP` can be used with differential privacy. To learn more, see +[Differentially private aggregate functions][dp-functions]. + +**Return Data Type** + +`DOUBLE` + +**Examples** + +```sql +SELECT STDDEV_POP(x) AS results FROM UNNEST([10, 14, 18]) AS x + +/*-------------------* + | results | + +-------------------+ + | 3.265986323710904 | + *-------------------*/ +``` + +```sql +SELECT STDDEV_POP(x) AS results FROM UNNEST([10, 14, NULL]) AS x + +/*---------* + | results | + +---------+ + | 2 | + *---------*/ +``` + +```sql +SELECT STDDEV_POP(x) AS results FROM UNNEST([10, NULL]) AS x + +/*---------* + | results | + +---------+ + | 0 | + *---------*/ +``` + +```sql +SELECT STDDEV_POP(x) AS results FROM UNNEST([NULL]) AS x + +/*---------* + | results | + +---------+ + | NULL | + *---------*/ +``` + +```sql +SELECT STDDEV_POP(x) AS results FROM UNNEST([10, 14, CAST('Infinity' as DOUBLE)]) AS x + +/*---------* + | results | + +---------+ + | NaN | + *---------*/ +``` + +[dp-functions]: #aggregate-dp-functions + +### `STDDEV_SAMP` + +```sql +STDDEV_SAMP( + [ DISTINCT ] + expression + [ HAVING { MAX | MIN } expression2 ] +) +[ OVER over_clause ] + +over_clause: + { named_window | ( [ window_specification ] ) } + +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] + [ window_frame_clause ] + +``` + +**Description** + +Returns the sample (unbiased) standard deviation of the values. 
The return +result is between `0` and `+Inf`. + +All numeric types are supported. If the +input is `NUMERIC` or `BIGNUMERIC` then the internal aggregation is +stable with the final output converted to a `DOUBLE`. +Otherwise the input is converted to a `DOUBLE` +before aggregation, resulting in a potentially unstable result. + +This function ignores any `NULL` inputs. If there are fewer than two non-`NULL` +inputs, this function returns `NULL`. + +`NaN` is produced if: + ++ Any input value is `NaN` ++ Any input value is positive infinity or negative infinity. + +To learn more about the optional aggregate clauses that you can pass +into this function, see +[Aggregate function calls][aggregate-function-calls]. + + + +[aggregate-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md + + + +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. + + + +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md + + + +**Return Data Type** + +`DOUBLE` + +**Examples** + +```sql +SELECT STDDEV_SAMP(x) AS results FROM UNNEST([10, 14, 18]) AS x + +/*---------* + | results | + +---------+ + | 4 | + *---------*/ +``` + +```sql +SELECT STDDEV_SAMP(x) AS results FROM UNNEST([10, 14, NULL]) AS x + +/*--------------------* + | results | + +--------------------+ + | 2.8284271247461903 | + *--------------------*/ +``` + +```sql +SELECT STDDEV_SAMP(x) AS results FROM UNNEST([10, NULL]) AS x + +/*---------* + | results | + +---------+ + | NULL | + *---------*/ +``` + +```sql +SELECT STDDEV_SAMP(x) AS results FROM UNNEST([NULL]) AS x + +/*---------* + | results | + +---------+ + | NULL | + *---------*/ +``` + +```sql +SELECT STDDEV_SAMP(x) AS results FROM UNNEST([10, 14, CAST('Infinity' as DOUBLE)]) AS x + +/*---------* + | results | + +---------+ + | NaN | + *---------*/ +``` + +### `VAR_POP` + +```sql +VAR_POP( + [ DISTINCT ] + expression + [ HAVING 
{ MAX | MIN } expression2 ] +) +[ OVER over_clause ] + +over_clause: + { named_window | ( [ window_specification ] ) } + +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] + [ window_frame_clause ] + +``` + +**Description** + +Returns the population (biased) variance of the values. The return result is +between `0` and `+Inf`. + +All numeric types are supported. If the +input is `NUMERIC` or `BIGNUMERIC` then the internal aggregation is +stable with the final output converted to a `DOUBLE`. +Otherwise the input is converted to a `DOUBLE` +before aggregation, resulting in a potentially unstable result. + +This function ignores any `NULL` inputs. If all inputs are ignored, this +function returns `NULL`. If this function receives a single non-`NULL` input, +it returns `0`. + +`NaN` is produced if: + ++ Any input value is `NaN` ++ Any input value is positive infinity or negative infinity. + +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. + + + +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md + + + +`VAR_POP` can be used with differential privacy. To learn more, see +[Differentially private aggregate functions][dp-functions]. 
+
+**Return Data Type**
+
+`DOUBLE`
+
+**Examples**
+
+```sql
+SELECT VAR_POP(x) AS results FROM UNNEST([10, 14, 18]) AS x
+
+/*--------------------*
+ | results            |
+ +--------------------+
+ | 10.666666666666666 |
+ *--------------------*/
+```
+
+```sql
+SELECT VAR_POP(x) AS results FROM UNNEST([10, 14, NULL]) AS x
+
+/*---------*
+ | results |
+ +---------+
+ | 4       |
+ *---------*/
+```
+
+```sql
+SELECT VAR_POP(x) AS results FROM UNNEST([10, NULL]) AS x
+
+/*---------*
+ | results |
+ +---------+
+ | 0       |
+ *---------*/
+```
+
+```sql
+SELECT VAR_POP(x) AS results FROM UNNEST([NULL]) AS x
+
+/*---------*
+ | results |
+ +---------+
+ | NULL    |
+ *---------*/
+```
+
+```sql
+SELECT VAR_POP(x) AS results FROM UNNEST([10, 14, CAST('Infinity' as DOUBLE)]) AS x
+
+/*---------*
+ | results |
+ +---------+
+ | NaN     |
+ *---------*/
+```
+
+[dp-functions]: #aggregate-dp-functions
+
+### `VAR_SAMP`
+
+```sql
+VAR_SAMP(
+  [ DISTINCT ]
+  expression
+  [ HAVING { MAX | MIN } expression2 ]
+)
+[ OVER over_clause ]
+
+over_clause:
+  { named_window | ( [ window_specification ] ) }
+
+window_specification:
+  [ named_window ]
+  [ PARTITION BY partition_expression [, ...] ]
+  [ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
+  [ window_frame_clause ]
+
+```
+
+**Description**
+
+Returns the sample (unbiased) variance of the values. The return result is
+between `0` and `+Inf`.
+
+All numeric types are supported. If the
+input is `NUMERIC` or `BIGNUMERIC` then the internal aggregation is
+stable with the final output converted to a `DOUBLE`.
+Otherwise the input is converted to a `DOUBLE`
+before aggregation, resulting in a potentially unstable result.
+
+This function ignores any `NULL` inputs. If there are fewer than two non-`NULL`
+inputs, this function returns `NULL`.
+
+`NaN` is produced if:
+
++ Any input value is `NaN`
++ Any input value is positive infinity or negative infinity.
+ +To learn more about the optional aggregate clauses that you can pass +into this function, see +[Aggregate function calls][aggregate-function-calls]. + + + +[aggregate-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md + + + +To learn more about the `OVER` clause and how to use it, see +[Window function calls][window-function-calls]. + + + +[window-function-calls]: https://github.com/google/zetasql/blob/master/docs/window-function-calls.md + + + +**Return Data Type** + +`DOUBLE` + +**Examples** + +```sql +SELECT VAR_SAMP(x) AS results FROM UNNEST([10, 14, 18]) AS x + +/*---------* + | results | + +---------+ + | 16 | + *---------*/ +``` + +```sql +SELECT VAR_SAMP(x) AS results FROM UNNEST([10, 14, NULL]) AS x + +/*---------* + | results | + +---------+ + | 8 | + *---------*/ +``` + +```sql +SELECT VAR_SAMP(x) AS results FROM UNNEST([10, NULL]) AS x + +/*---------* + | results | + +---------+ + | NULL | + *---------*/ +``` + +```sql +SELECT VAR_SAMP(x) AS results FROM UNNEST([NULL]) AS x + +/*---------* + | results | + +---------+ + | NULL | + *---------*/ +``` + +```sql +SELECT VAR_SAMP(x) AS results FROM UNNEST([10, 14, CAST('Infinity' as DOUBLE)]) AS x + +/*---------* + | results | + +---------+ + | NaN | + *---------*/ +``` + +### `VARIANCE` + +```sql +VARIANCE( + [ DISTINCT ] + expression + [ HAVING { MAX | MIN } expression2 ] +) +[ OVER over_clause ] + +over_clause: + { named_window | ( [ window_specification ] ) } + +window_specification: + [ named_window ] + [ PARTITION BY partition_expression [, ...] ] + [ ORDER BY expression [ { ASC | DESC } ] [, ...] ] + [ window_frame_clause ] + +``` + +**Description** + +An alias of [VAR_SAMP][stat-agg-link-to-var-samp]. + +[stat-agg-link-to-var-samp]: #var_samp + +[agg-function-calls]: https://github.com/google/zetasql/blob/master/docs/aggregate-function-calls.md + +## String functions + +ZetaSQL supports string functions. 
+These string functions work on two different values: +`STRING` and `BYTES` data types. `STRING` values must be well-formed UTF-8. + +Functions that return position values, such as [STRPOS][string-link-to-strpos], +encode those positions as `INT64`. The value `1` +refers to the first character (or byte), `2` refers to the second, and so on. +The value `0` indicates an invalid position. When working on `STRING` types, the +returned positions refer to character positions. + +All string comparisons are done byte-by-byte, without regard to Unicode +canonical equivalence. + +### Function list + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameSummary
ASCII + + + Gets the ASCII code for the first character or byte in a STRING + or BYTES value. +
BYTE_LENGTH + + + Gets the number of BYTES in a STRING or + BYTES value. +
CHAR_LENGTH + + + Gets the number of characters in a STRING value. +
CHARACTER_LENGTH + + + Synonym for CHAR_LENGTH. +
CHR + + + Converts a Unicode code point to a character. +
CODE_POINTS_TO_BYTES + + + Converts an array of extended ASCII code points to a + BYTES value. +
CODE_POINTS_TO_STRING + + + Converts an array of extended ASCII code points to a + STRING value. +
COLLATE + + + Combines a STRING value and a collation specification into a + collation specification-supported STRING value. +
CONCAT + + + Concatenates one or more STRING or BYTES + values into a single result. +
EDIT_DISTANCE + + + Computes the Levenshtein distance between two STRING + or BYTES values. +
ENDS_WITH + + + Checks if a STRING or BYTES value is the suffix + of another value. +
FORMAT + + + Formats data and produces the results as a STRING value. +
FROM_BASE32 + + + Converts a base32-encoded STRING value into a + BYTES value. +
FROM_BASE64 + + + Converts a base64-encoded STRING value into a + BYTES value. +
FROM_HEX + + + Converts a hexadecimal-encoded STRING value into a + BYTES value. +
INITCAP + + + Formats a STRING as proper case, which means that the first + character in each word is uppercase and all other characters are lowercase. +
INSTR + + + Finds the position of a subvalue inside another value, optionally starting + the search at a given offset or occurrence. +
LEFT + + + Gets the specified leftmost portion from a STRING or + BYTES value. +
LENGTH + + + Gets the length of a STRING or BYTES value. +
LOWER + + + Formats alphabetic characters in a STRING value as + lowercase. +

+ Formats ASCII characters in a BYTES value as + lowercase. +
LPAD + + + Prepends a STRING or BYTES value with a pattern. +
LTRIM + + + Identical to the TRIM function, but only removes leading + characters. +
NORMALIZE + + + Case-sensitively normalizes the characters in a STRING value. +
NORMALIZE_AND_CASEFOLD + + + Case-insensitively normalizes the characters in a STRING value. +
OCTET_LENGTH + + + Alias for BYTE_LENGTH. +
REGEXP_CONTAINS + + + Checks if a value is a partial match for a regular expression. +
REGEXP_EXTRACT + + + Produces a substring that matches a regular expression. +
REGEXP_EXTRACT_ALL + + + Produces an array of all substrings that match a + regular expression. +
REGEXP_INSTR + + + Finds the position of a regular expression match in a value, optionally + starting the search at a given offset or occurrence. +
REGEXP_MATCH + + + (Deprecated) Checks if a value is a full match for a regular expression. +
REGEXP_REPLACE + + + Produces a STRING value where all substrings that match a + regular expression are replaced with a specified value. +
REPEAT + + + Produces a STRING or BYTES value that consists of + an original value, repeated. +
REPLACE + + + Replaces all occurrences of a pattern with another pattern in a + STRING or BYTES value. +
REVERSE + + + Reverses a STRING or BYTES value. +
RIGHT + + + Gets the specified rightmost portion from a STRING or + BYTES value. +
RPAD + + + Appends a STRING or BYTES value with a pattern. +
RTRIM + + + Identical to the TRIM function, but only removes trailing + characters. +
SAFE_CONVERT_BYTES_TO_STRING + + + Converts a BYTES value to a STRING value and + replaces any invalid UTF-8 characters with the Unicode replacement character, + U+FFFD. +
SOUNDEX + + + Gets the Soundex codes for words in a STRING value. +
SPLIT + + + Splits a STRING or BYTES value, using a delimiter. +
STARTS_WITH + + + Checks if a STRING or BYTES value is a + prefix of another value. +
STRPOS + + + Finds the position of the first occurrence of a subvalue inside another + value. +
SUBSTR + + + Gets a portion of a STRING or BYTES value. +
SUBSTRING + +Alias for SUBSTR.
TO_BASE32 + + + Converts a BYTES value to a + base32-encoded STRING value. +
TO_BASE64 + + + Converts a BYTES value to a + base64-encoded STRING value. +
TO_CODE_POINTS + + + Converts a STRING or BYTES value into an array of + extended ASCII code points. +
TO_HEX + + + Converts a BYTES value to a + hexadecimal STRING value. +
TRANSLATE + + + Within a value, replaces each source character with the corresponding + target character. +
TRIM + + + Removes the specified leading and trailing Unicode code points or bytes + from a STRING or BYTES value. +
UNICODE + + + Gets the Unicode code point for the first character in a value. +
UPPER + + + Formats alphabetic characters in a STRING value as + uppercase. +

+ Formats ASCII characters in a BYTES value as + uppercase. +
+ +### `ASCII` + +```sql +ASCII(value) +``` + +**Description** + +Returns the ASCII code for the first character or byte in `value`. Returns +`0` if `value` is empty or the ASCII code is `0` for the first character +or byte. + +**Return type** + +`INT64` + +**Examples** + +```sql +SELECT ASCII('abcd') as A, ASCII('a') as B, ASCII('') as C, ASCII(NULL) as D; + +/*-------+-------+-------+-------* + | A | B | C | D | + +-------+-------+-------+-------+ + | 97 | 97 | 0 | NULL | + *-------+-------+-------+-------*/ +``` + +### `BYTE_LENGTH` + +```sql +BYTE_LENGTH(value) +``` + +**Description** + +Gets the number of `BYTES` in a `STRING` or `BYTES` value, +regardless of whether the value is a `STRING` or `BYTES` type. + +**Return type** + +`INT64` + +**Examples** + +```sql +WITH example AS + (SELECT 'абвгд' AS characters, b'абвгд' AS bytes) + +SELECT + characters, + BYTE_LENGTH(characters) AS string_example, + bytes, + BYTE_LENGTH(bytes) AS bytes_example +FROM example; + +/*------------+----------------+-------+---------------* + | characters | string_example | bytes | bytes_example | + +------------+----------------+-------+---------------+ + | абвгд | 10 | абвгд | 10 | + *------------+----------------+-------+---------------*/ +``` + +### `CHAR_LENGTH` + +```sql +CHAR_LENGTH(value) +``` + +**Description** + +Gets the number of characters in a `STRING` value. + +**Return type** + +`INT64` + +**Examples** + +```sql +WITH example AS + (SELECT 'абвгд' AS characters) + +SELECT + characters, + CHAR_LENGTH(characters) AS char_length_example +FROM example; + +/*------------+---------------------* + | characters | char_length_example | + +------------+---------------------+ + | абвгд | 5 | + *------------+---------------------*/ +``` + +### `CHARACTER_LENGTH` + +```sql +CHARACTER_LENGTH(value) +``` + +**Description** + +Synonym for [CHAR_LENGTH][string-link-to-char-length]. 
+ +**Return type** + +`INT64` + +**Examples** + +```sql +WITH example AS + (SELECT 'абвгд' AS characters) + +SELECT + characters, + CHARACTER_LENGTH(characters) AS char_length_example +FROM example; + +/*------------+---------------------* + | characters | char_length_example | + +------------+---------------------+ + | абвгд | 5 | + *------------+---------------------*/ +``` + +[string-link-to-char-length]: #char_length + +### `CHR` + +```sql +CHR(value) +``` + +**Description** + +Takes a Unicode [code point][string-link-to-code-points-wikipedia] and returns +the character that matches the code point. Each valid code point should fall +within the range of [0, 0xD7FF] and [0xE000, 0x10FFFF]. Returns an empty string +if the code point is `0`. If an invalid Unicode code point is specified, an +error is returned. + +To work with an array of Unicode code points, see +[`CODE_POINTS_TO_STRING`][string-link-to-codepoints-to-string] + +**Return type** + +`STRING` + +**Examples** + +```sql +SELECT CHR(65) AS A, CHR(255) AS B, CHR(513) AS C, CHR(1024) AS D; + +/*-------+-------+-------+-------* + | A | B | C | D | + +-------+-------+-------+-------+ + | A | ÿ | È | Ѐ | + *-------+-------+-------+-------*/ +``` + +```sql +SELECT CHR(97) AS A, CHR(0xF9B5) AS B, CHR(0) AS C, CHR(NULL) AS D; + +/*-------+-------+-------+-------* + | A | B | C | D | + +-------+-------+-------+-------+ + | a | 例 | | NULL | + *-------+-------+-------+-------*/ +``` + +[string-link-to-code-points-wikipedia]: https://en.wikipedia.org/wiki/Code_point + +[string-link-to-codepoints-to-string]: #code_points_to_string + +### `CODE_POINTS_TO_BYTES` + +```sql +CODE_POINTS_TO_BYTES(ascii_code_points) +``` + +**Description** + +Takes an array of extended ASCII +[code points][string-link-to-code-points-wikipedia] +as `ARRAY` and returns `BYTES`. + +To convert from `BYTES` to an array of code points, see +[TO_CODE_POINTS][string-link-to-code-points]. 
+ +**Return type** + +`BYTES` + +**Examples** + +The following is a basic example using `CODE_POINTS_TO_BYTES`. + +```sql +SELECT CODE_POINTS_TO_BYTES([65, 98, 67, 100]) AS bytes; + +/*----------* + | bytes | + +----------+ + | AbCd | + *----------*/ +``` + +The following example uses a rotate-by-13 places (ROT13) algorithm to encode a +string. + +```sql +SELECT CODE_POINTS_TO_BYTES(ARRAY_AGG( + (SELECT + CASE + WHEN chr BETWEEN b'a' and b'z' + THEN TO_CODE_POINTS(b'a')[offset(0)] + + MOD(code+13-TO_CODE_POINTS(b'a')[offset(0)],26) + WHEN chr BETWEEN b'A' and b'Z' + THEN TO_CODE_POINTS(b'A')[offset(0)] + + MOD(code+13-TO_CODE_POINTS(b'A')[offset(0)],26) + ELSE code + END + FROM + (SELECT code, CODE_POINTS_TO_BYTES([code]) chr) + ) ORDER BY OFFSET)) AS encoded_string +FROM UNNEST(TO_CODE_POINTS(b'Test String!')) code WITH OFFSET; + +/*------------------* + | encoded_string | + +------------------+ + | Grfg Fgevat! | + *------------------*/ +``` + +[string-link-to-code-points-wikipedia]: https://en.wikipedia.org/wiki/Code_point + +[string-link-to-code-points]: #to_code_points + +### `CODE_POINTS_TO_STRING` + +```sql +CODE_POINTS_TO_STRING(unicode_code_points) +``` + +**Description** + +Takes an array of Unicode [code points][string-link-to-code-points-wikipedia] +as `ARRAY` and returns a `STRING`. + +To convert from a string to an array of code points, see +[TO_CODE_POINTS][string-link-to-code-points]. + +**Return type** + +`STRING` + +**Examples** + +The following are basic examples using `CODE_POINTS_TO_STRING`. 
+ +```sql +SELECT CODE_POINTS_TO_STRING([65, 255, 513, 1024]) AS string; + +/*--------* + | string | + +--------+ + | AÿÈЀ | + *--------*/ +``` + +```sql +SELECT CODE_POINTS_TO_STRING([97, 0, 0xF9B5]) AS string; + +/*--------* + | string | + +--------+ + | a例 | + *--------*/ +``` + +```sql +SELECT CODE_POINTS_TO_STRING([65, 255, NULL, 1024]) AS string; + +/*--------* + | string | + +--------+ + | NULL | + *--------*/ +``` + +The following example computes the frequency of letters in a set of words. + +```sql +WITH Words AS ( + SELECT word + FROM UNNEST(['foo', 'bar', 'baz', 'giraffe', 'llama']) AS word +) +SELECT + CODE_POINTS_TO_STRING([code_point]) AS letter, + COUNT(*) AS letter_count +FROM Words, + UNNEST(TO_CODE_POINTS(word)) AS code_point +GROUP BY 1 +ORDER BY 2 DESC; + +/*--------+--------------* + | letter | letter_count | + +--------+--------------+ + | a | 5 | + | f | 3 | + | r | 2 | + | b | 2 | + | l | 2 | + | o | 2 | + | g | 1 | + | z | 1 | + | e | 1 | + | m | 1 | + | i | 1 | + *--------+--------------*/ +``` + +[string-link-to-code-points-wikipedia]: https://en.wikipedia.org/wiki/Code_point + +[string-link-to-code-points]: #to_code_points + +### `COLLATE` + +```sql +COLLATE(value, collate_specification) +``` + +Takes a `STRING` and a [collation specification][link-collation-spec]. Returns +a `STRING` with a collation specification. If `collate_specification` is empty, +returns a value with collation removed from the `STRING`. + +The collation specification defines how the resulting `STRING` can be compared +and sorted. To learn more, see +[Working with collation][link-collation-concepts]. + ++ `collation_specification` must be a string literal, otherwise an error is + thrown. ++ Returns `NULL` if `value` is `NULL`. + +**Return type** + +`STRING` + +**Examples** + +In this example, the weight of `a` is less than the weight of `Z`. This +is because the collate specification, `und:ci` assigns more weight to `Z`. 
+ +```sql +WITH Words AS ( + SELECT + COLLATE('a', 'und:ci') AS char1, + COLLATE('Z', 'und:ci') AS char2 +) +SELECT ( Words.char1 < Words.char2 ) AS a_less_than_Z +FROM Words; + +/*----------------* + | a_less_than_Z | + +----------------+ + | TRUE | + *----------------*/ +``` + +In this example, the weight of `a` is greater than the weight of `Z`. This +is because the default collate specification assigns more weight to `a`. + +```sql +WITH Words AS ( + SELECT + 'a' AS char1, + 'Z' AS char2 +) +SELECT ( Words.char1 < Words.char2 ) AS a_less_than_Z +FROM Words; + +/*----------------* + | a_less_than_Z | + +----------------+ + | FALSE | + *----------------*/ +``` + +[link-collation-spec]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_spec_details + +[link-collation-concepts]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#working_with_collation + +### `CONCAT` + +```sql +CONCAT(value1[, ...]) +``` + +**Description** + +Concatenates one or more values into a single result. All values must be +`BYTES` or data types that can be cast to `STRING`. + +The function returns `NULL` if any input argument is `NULL`. + +Note: You can also use the +[|| concatenation operator][string-link-to-operators] to concatenate +values into a string. + +**Return type** + +`STRING` or `BYTES` + +**Examples** + +```sql +SELECT CONCAT('T.P.', ' ', 'Bar') as author; + +/*---------------------* + | author | + +---------------------+ + | T.P. 
Bar | + *---------------------*/ +``` + +```sql +SELECT CONCAT('Summer', ' ', 1923) as release_date; + +/*---------------------* + | release_date | + +---------------------+ + | Summer 1923 | + *---------------------*/ +``` + +```sql + +With Employees AS + (SELECT + 'John' AS first_name, + 'Doe' AS last_name + UNION ALL + SELECT + 'Jane' AS first_name, + 'Smith' AS last_name + UNION ALL + SELECT + 'Joe' AS first_name, + 'Jackson' AS last_name) + +SELECT + CONCAT(first_name, ' ', last_name) + AS full_name +FROM Employees; + +/*---------------------* + | full_name | + +---------------------+ + | John Doe | + | Jane Smith | + | Joe Jackson | + *---------------------*/ +``` + +[string-link-to-operators]: #operators + +### `EDIT_DISTANCE` + +```sql +EDIT_DISTANCE(value1, value2, [max_distance => max_distance_value]) +``` + +**Description** + +Computes the [Levenshtein distance][l-distance] between two `STRING` or +`BYTES` values. + +**Definitions** + ++ `value1`: The first `STRING` or `BYTES` value to compare. ++ `value2`: The second `STRING` or `BYTES` value to compare. ++ `max_distance`: Optional mandatory-named argument. Takes a non-negative + `INT64` value that represents the maximum distance between the two values + to compute. + + If this distance is exceeded, the function returns this value. + The default value for this argument is the maximum size of + `value1` and `value2`. + +**Details** + +If `value1` or `value2` is `NULL`, `NULL` is returned. + +You can only compare values of the same type. Otherwise, an error is produced. 
+ +**Return type** + +`INT64` + +**Examples** + +In the following example, the first character in both strings is different: + +```sql +SELECT EDIT_DISTANCE('a', 'b') AS results; + +/*---------* + | results | + +---------+ + | 1 | + *---------*/ +``` + +In the following example, the first and second characters in both strings are +different: + +```sql +SELECT EDIT_DISTANCE('aa', 'b') AS results; + +/*---------* + | results | + +---------+ + | 2 | + *---------*/ +``` + +In the following example, only the first character in both strings is +different: + +```sql +SELECT EDIT_DISTANCE('aa', 'ba') AS results; + +/*---------* + | results | + +---------+ + | 1 | + *---------*/ +``` + +In the following example, the last six characters are different, but because +the maximum distance is `2`, this function exits early and returns `2`, the +maximum distance: + +```sql +SELECT EDIT_DISTANCE('abcdefg', 'a', max_distance => 2) AS results; + +/*---------* + | results | + +---------+ + | 2 | + *---------*/ +``` + +[l-distance]: https://en.wikipedia.org/wiki/Levenshtein_distance + +### `ENDS_WITH` + +```sql +ENDS_WITH(value, suffix) +``` + +**Description** + +Takes two `STRING` or `BYTES` values. Returns `TRUE` if `suffix` +is a suffix of `value`. + +This function supports specifying [collation][collation]. + +[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about + +**Return type** + +`BOOL` + +**Examples** + +```sql +WITH items AS + (SELECT 'apple' as item + UNION ALL + SELECT 'banana' as item + UNION ALL + SELECT 'orange' as item) + +SELECT + ENDS_WITH(item, 'e') as example +FROM items; + +/*---------* + | example | + +---------+ + | True | + | False | + | True | + *---------*/ +``` + +### `FORMAT` + + +```sql +FORMAT(format_string_expression, data_type_expression[, ...]) +``` + +**Description** + +`FORMAT` formats a data type expression as a string. 
+ ++ `format_string_expression`: Can contain zero or more + [format specifiers][format-specifiers]. Each format specifier is introduced + by the `%` symbol, and must map to one or more of the remaining arguments. + In general, this is a one-to-one mapping, except when the `*` specifier is + present. For example, `%.*i` maps to two arguments—a length argument + and a signed integer argument. If the number of arguments related to the + format specifiers is not the same as the number of arguments, an error occurs. ++ `data_type_expression`: The value to format as a string. This can be any + ZetaSQL data type. + +**Return type** + +`STRING` + +**Examples** + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
DescriptionStatementResult
Simple integerFORMAT('%d', 10)10
Integer with left blank paddingFORMAT('|%10d|', 11)|        11|
Integer with left zero paddingFORMAT('+%010d+', 12)+0000000012+
Integer with commasFORMAT("%'d", 123456789)123,456,789
STRINGFORMAT('-%s-', 'abcd efg')-abcd efg-
DOUBLEFORMAT('%f %E', 1.1, 2.2)1.100000 2.200000E+00
DATEFORMAT('%t', date '2015-09-01')2015-09-01
TIMESTAMPFORMAT('%t', timestamp '2015-09-01 12:34:56 America/Los_Angeles')2015‑09‑01 19:34:56+00
+ +The `FORMAT()` function does not provide fully customizable formatting for all +types and values, nor formatting that is sensitive to locale. + +If custom formatting is necessary for a type, you must first format it using +type-specific format functions, such as `FORMAT_DATE()` or `FORMAT_TIMESTAMP()`. +For example: + +```sql +SELECT FORMAT('date: %s!', FORMAT_DATE('%B %d, %Y', date '2015-01-02')); +``` + +Returns + +``` +date: January 02, 2015! +``` + +#### Supported format specifiers + + +``` +%[flags][width][.precision]specifier +``` + +A [format specifier][format-specifier-list] adds formatting when casting a +value to a string. It can optionally contain these sub-specifiers: + ++ [Flags][flags] ++ [Width][width] ++ [Precision][precision] + +Additional information about format specifiers: + ++ [%g and %G behavior][g-and-g-behavior] ++ [%p and %P behavior][p-and-p-behavior] ++ [%t and %T behavior][t-and-t-behavior] ++ [Error conditions][error-format-specifiers] ++ [NULL argument handling][null-format-specifiers] ++ [Additional semantic rules][rules-format-specifiers] + +##### Format specifiers + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
SpecifierDescriptionExamplesTypes
d or iDecimal integer392 + +INT32
INT64
UINT32
UINT64
+
uUnsigned integer7235 + +UINT32
UINT64
+
o + Octal +

+ Note: If an INT64 value is negative, an error is produced. +
610 + +INT32
INT64
UINT32
UINT64
+
x + Hexadecimal integer +

+ Note: If an INT64 value is negative, an error is produced. +
7fa + +INT32
INT64
UINT32
UINT64
+
X + Hexadecimal integer (uppercase) +

+ Note: If an INT64 value is negative, an error is produced. +
7FA + +INT32
INT64
UINT32
UINT64
+ +
fDecimal notation, in [-](integer part).(fractional part) for finite + values, and in lowercase for non-finite values392.650000
+ inf
+ nan
+ +NUMERIC
BIGNUMERIC
FLOAT
DOUBLE
+
FDecimal notation, in [-](integer part).(fractional part) for finite + values, and in uppercase for non-finite values392.650000
+ INF
+ NAN
+ +NUMERIC
BIGNUMERIC
FLOAT
DOUBLE
+
eScientific notation (mantissa/exponent), lowercase3.926500e+02
+ inf
+ nan
+ +NUMERIC
BIGNUMERIC
FLOAT
DOUBLE
+
EScientific notation (mantissa/exponent), uppercase3.926500E+02
+ INF
+ NAN
+ +NUMERIC
BIGNUMERIC
FLOAT
DOUBLE
+
gEither decimal notation or scientific notation, depending on the input + value's exponent and the specified precision. Lowercase. + See %g and %G behavior for details.392.65
+ 3.9265e+07
+ inf
+ nan
+ +NUMERIC
BIGNUMERIC
FLOAT
DOUBLE
+
G + Either decimal notation or scientific notation, depending on the input + value's exponent and the specified precision. Uppercase. + See %g and %G behavior for details. + + 392.65
+ 3.9265E+07
+ INF
+ NAN +
+ +NUMERIC
BIGNUMERIC
FLOAT
DOUBLE
+
p + + Produces a one-line printable string representing a protocol buffer + or JSON. + + See %p and %P behavior. + + +
year: 2019 month: 10
+ + +
{"month":10,"year":2019}
+ +
+ +JSON
PROTO
+
P + + Produces a multi-line printable string representing a protocol buffer + or JSON. + + See %p and %P behavior. + + +
+year: 2019
+month: 10
+
+ + +
+{
+  "month": 10,
+  "year": 2019
+}
+
+ +
+ +JSON
PROTO
+
sString of characterssample + +STRING
+
t + Returns a printable string representing the value. Often looks + similar to casting the argument to STRING. + See %t and %T behavior. + + sample
+ 2014‑01‑01 +
Any type
T + Produces a string that is a valid ZetaSQL constant with a + similar type to the value's type (maybe wider, or maybe string). + See %t and %T behavior. + + 'sample'
+ b'bytes sample'
+ 1234
+ 2.3
+ date '2014‑01‑01' +
Any type
%'%%' produces a single '%'%n/a
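
For example, several specifiers can appear in one format string, and the `%%` sequence emits a literal percent sign:

```sql
SELECT FORMAT('%x', 255) AS hex_value, FORMAT('%d%%', 50) AS percent;

/*-----------+---------*
 | hex_value | percent |
 +-----------+---------+
 | ff        | 50%     |
 *-----------+---------*/
```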
+ +The format specifier can optionally contain the sub-specifiers identified above +in the specifier prototype. + +These sub-specifiers must comply with the following specifications. + +##### Flags + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
FlagsDescription
-Left-justify within the given field width; right justification is the +default (see width sub-specifier)
+Forces to precede the result with a plus or minus sign (+ +or -) even for positive numbers. By default, only negative numbers +are preceded with a - sign
<space>If no sign is going to be written, a blank space is inserted before the +value
#
    +
  • For `%o`, `%x`, and `%X`, this flag means to precede the + value with 0, 0x or 0X respectively for values different than zero.
  • +
  • For `%f`, `%F`, `%e`, and `%E`, this flag means to add the decimal + point even when there is no fractional part, unless the value + is non-finite.
  • +
  • For `%g` and `%G`, this flag means to add the decimal point even + when there is no fractional part unless the value is non-finite, and + never remove the trailing zeros after the decimal point.
  • +
+
0 + Left-pads the number with zeroes (0) instead of spaces when padding is + specified (see width sub-specifier)
' +

Formats integers using the appropriate grouping character. + For example:

+
    +
  • FORMAT("%'d", 12345678) returns 12,345,678
  • +
  • FORMAT("%'x", 12345678) returns bc:614e
  • +
  • FORMAT("%'o", 55555) returns 15,4403
  • +

    This flag is only relevant for decimal, hex, and octal values.

    +
+
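
To illustrate, the `+`, `#`, and `0` flags from the table above:

```sql
SELECT
  FORMAT('%+d', 5) AS forced_sign,
  FORMAT('%#x', 255) AS with_prefix,
  FORMAT('%08.2f', 3.5) AS zero_padded;

/*-------------+-------------+-------------*
 | forced_sign | with_prefix | zero_padded |
 +-------------+-------------+-------------+
 | +5          | 0xff        | 00003.50    |
 *-------------+-------------+-------------*/
```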
+ +Flags may be specified in any order. Duplicate flags are not an error. When +flags are not relevant for some element type, they are ignored. + +##### Width + + + + + + + + + + + + + + + +
WidthDescription
<number> + Minimum number of characters to be printed. If the value to be printed + is shorter than this number, the result is padded with blank spaces. + The value is not truncated even if the result is larger +
* + The width is not specified in the format string, but as an additional + integer value argument preceding the argument that has to be formatted +
+ +##### Precision + + + + + + + + + + + + + + + +
PrecisionDescription
.<number> +
    +
  • For integer specifiers `%d`, `%i`, `%o`, `%u`, `%x`, and `%X`: + precision specifies the + minimum number of digits to be written. If the value to be written is + shorter than this number, the result is padded with leading zeros. + The value is not truncated even if the result is longer. A precision + of 0 means that no character is written for the value 0.
  • +
  • For specifiers `%a`, `%A`, `%e`, `%E`, `%f`, and `%F`: this is the + number of digits to be printed after the decimal point. The default + value is 6.
  • +
  • For specifiers `%g` and `%G`: this is the number of significant digits + to be printed, before the removal of the trailing zeros after the + decimal point. The default value is 6.
  • +
+
.* + The precision is not specified in the format string, but as an + additional integer value argument preceding the argument that has to be + formatted +
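
For example, precision on a floating point value controls the digits after the decimal point, on an integer it sets a minimum digit count, and `.*` reads the precision from an extra argument:

```sql
SELECT
  FORMAT('%.3f', 3.14159) AS fixed_precision,
  FORMAT('%.*f', 2, 3.14159) AS argument_precision,
  FORMAT('%.5d', 42) AS min_digits;

/*-----------------+--------------------+------------*
 | fixed_precision | argument_precision | min_digits |
 +-----------------+--------------------+------------+
 | 3.142           | 3.14               | 00042      |
 *-----------------+--------------------+------------*/
```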
+ +##### %g and %G behavior + +The `%g` and `%G` format specifiers choose either the decimal notation (like +the `%f` and `%F` specifiers) or the scientific notation (like the `%e` and `%E` +specifiers), depending on the input value's exponent and the specified +[precision](#precision). + +Let p stand for the specified [precision](#precision) (defaults to 6; 1 if the +specified precision is less than 1). The input value is first converted to +scientific notation with precision = (p - 1). If the resulting exponent part x +is less than -4 or no less than p, the scientific notation with precision = +(p - 1) is used; otherwise the decimal notation with precision = (p - 1 - x) is +used. + +Unless [`#` flag](#flags) is present, the trailing zeros after the decimal point +are removed, and the decimal point is also removed if there is no digit after +it. + +##### %p and %P behavior + + +The `%p` format specifier produces a one-line printable string. The `%P` +format specifier produces a multi-line printable string. You can use these +format specifiers with the following data types: + + + + + + + + + + + + + + + + + + + + + +
Type%p%P
PROTO +

PROTO input:

+
+message ReleaseDate {
+ required int32 year = 1 [default=2019];
+ required int32 month = 2 [default=10];
+}
+

Produces a one-line printable string representing a protocol buffer:

+
year: 2019 month: 10
+
+

PROTO input:

+
+message ReleaseDate {
+ required int32 year = 1 [default=2019];
+ required int32 month = 2 [default=10];
+}
+

Produces a multi-line printable string representing a protocol buffer:

+
+year: 2019
+month: 10
+
+
JSON +

JSON input:

+
+JSON '
+{
+  "month": 10,
+  "year": 2019
+}
+'
+

Produces a one-line printable string representing JSON:

+
{"month":10,"year":2019}
+
+

JSON input:

+
+JSON '
+{
+  "month": 10,
+  "year": 2019
+}
+'
+

Produces a multi-line printable string representing JSON:

+
+{
+  "month": 10,
+  "year": 2019
+}
+
+
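
For example, `%p` renders the JSON value from the table above on a single line:

```sql
SELECT FORMAT('%p', JSON '{"month": 10, "year": 2019}') AS one_line;

/*--------------------------*
 | one_line                 |
 +--------------------------+
 | {"month":10,"year":2019} |
 *--------------------------*/
```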
+ +##### %t and %T behavior + + +The `%t` and `%T` format specifiers are defined for all types. The +[width](#width), [precision](#precision), and [flags](#flags) act as they do +for `%s`: the [width](#width) is the minimum width and the `STRING` will be +padded to that size, and [precision](#precision) is the maximum width +of content to show and the `STRING` will be truncated to that size, prior to +padding to width. + +The `%t` specifier is always meant to be a readable form of the value. + +The `%T` specifier is always a valid SQL literal of a similar type, such as a +wider numeric type. +The literal will not include casts or a type name, except for the special case +of non-finite floating point values. + +The `STRING` is formatted as follows: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Type%t%T
NULL of any typeNULLNULL
+ +INT32
INT64
UINT32
UINT64
+
123123
NUMERIC123.0 (always with .0) + NUMERIC "123.0"
FLOAT, DOUBLE + 123.0 (always with .0)
+ 123e+10
inf
-inf
NaN +
+ 123.0 (always with .0)
+ 123e+10
+ CAST("inf" AS <type>)
+ CAST("-inf" AS <type>)
+ CAST("nan" AS <type>) +
STRINGunquoted string valuequoted string literal
BYTES + unquoted escaped bytes
+ e.g., abc\x01\x02 +
+ quoted bytes literal
+ e.g., b"abc\x01\x02" +
BOOLboolean valueboolean value
ENUMEnumName"EnumName"
DATE2011-02-03DATE "2011-02-03"
TIMESTAMP2011-02-03 04:05:06+00TIMESTAMP "2011-02-03 04:05:06+00"
INTERVAL1-2 3 4:5:6.789INTERVAL "1-2 3 4:5:6.789" YEAR TO SECOND
PROTO + one-line printable string representing a protocol buffer. + + quoted string literal with one-line printable string representing a + protocol buffer. +
ARRAY[value, value, ...]
+ where values are formatted with %t
[value, value, ...]
+ where values are formatted with %T
STRUCT(value, value, ...)
+ where fields are formatted with %t
(value, value, ...)
+ where fields are formatted with %T
+
+ Special cases:
+ Zero fields: STRUCT()
+ One field: STRUCT(value)
JSON + one-line printable string representing JSON.
+
{"name":"apple","stock":3}
+
+ one-line printable string representing a JSON literal.
+
JSON '{"name":"apple","stock":3}'
+
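
For example, `%t` produces the readable form of a `DATE`, while `%T` produces a SQL literal:

```sql
SELECT
  FORMAT('%t', DATE '2011-02-03') AS readable,
  FORMAT('%T', DATE '2011-02-03') AS literal;

/*------------+-------------------*
 | readable   | literal           |
 +------------+-------------------+
 | 2011-02-03 | DATE "2011-02-03" |
 *------------+-------------------*/
```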
+ +##### Error conditions + + +If a format specifier is invalid, or is not compatible with the related +argument type, or the wrong number or arguments are provided, then an error is +produced. For example, the following `` expressions are invalid: + +```sql +FORMAT('%s', 1) +``` -**Return type** +```sql +FORMAT('%') +``` -`BOOL` +##### NULL argument handling + -**Example** +A `NULL` format string results in a `NULL` output `STRING`. Any other arguments +are ignored in this case. -The following query tests whether the polygon `POLYGON((1 1, 20 1, 10 20, 1 1))` -covers each of the three points `(0, 0)`, `(1, 1)`, and `(10, 10)`, which lie -on the exterior, the boundary, and the interior of the polygon respectively. +The function generally produces a `NULL` value if a `NULL` argument is present. +For example, `FORMAT('%i', NULL_expression)` produces a `NULL STRING` as +output. -```sql -SELECT - ST_GEOGPOINT(i, i) AS p, - ST_COVERS(ST_GEOGFROMTEXT('POLYGON((1 1, 20 1, 10 20, 1 1))'), - ST_GEOGPOINT(i, i)) AS `covers` -FROM UNNEST([0, 1, 10]) AS i; +However, there are some exceptions: if the format specifier is %t or %T +(both of which produce `STRING`s that effectively match CAST and literal value +semantics), a `NULL` value produces 'NULL' (without the quotes) in the result +`STRING`. For example, the function: -/*--------------+--------* - | p | covers | - +--------------+--------+ - | POINT(0 0) | FALSE | - | POINT(1 1) | TRUE | - | POINT(10 10) | TRUE | - *--------------+--------*/ +```sql +FORMAT('00-%t-00', NULL_expression); ``` -### `ST_DIFFERENCE` +Returns ```sql -ST_DIFFERENCE(geography_1, geography_2) +00-NULL-00 ``` -**Description** +##### Additional semantic rules + -Returns a `GEOGRAPHY` that represents the point set -difference of `geography_1` and `geography_2`. Therefore, the result consists of -the part of `geography_1` that does not intersect with `geography_2`. +`DOUBLE` and +`FLOAT` values can be `+/-inf` or `NaN`. 
+When an argument has one of those values, the result of the format specifiers +`%f`, `%F`, `%e`, `%E`, `%g`, `%G`, and `%t` are `inf`, `-inf`, or `nan` +(or the same in uppercase) as appropriate. This is consistent with how +ZetaSQL casts these values to `STRING`. For `%T`, +ZetaSQL returns quoted strings for +`DOUBLE` values that don't have non-string literal +representations. -If `geometry_1` is completely contained in `geometry_2`, then `ST_DIFFERENCE` -returns an empty `GEOGRAPHY`. +[format-specifiers]: #format_specifiers -**Constraints** +[format-specifier-list]: #format_specifier_list -The underlying geometric objects that a ZetaSQL -`GEOGRAPHY` represents correspond to a *closed* point -set. Therefore, `ST_DIFFERENCE` is the closure of the point set difference of -`geography_1` and `geography_2`. This implies that if `geography_1` and -`geography_2` intersect, then a portion of the boundary of `geography_2` could -be in the difference. +[flags]: #flags -**Return type** +[width]: #width -`GEOGRAPHY` +[precision]: #precision -**Example** +[g-and-g-behavior]: #g_and_g_behavior -The following query illustrates the difference between `geog1`, a larger polygon -`POLYGON((0 0, 10 0, 10 10, 0 0))` and `geog1`, a smaller polygon -`POLYGON((4 2, 6 2, 8 6, 4 2))` that intersects with `geog1`. The result is -`geog1` with a hole where `geog2` intersects with it. 
+[p-and-p-behavior]: #p_and_p_behavior -```sql -SELECT - ST_DIFFERENCE( - ST_GEOGFROMTEXT('POLYGON((0 0, 10 0, 10 10, 0 0))'), - ST_GEOGFROMTEXT('POLYGON((4 2, 6 2, 8 6, 4 2))') - ); +[t-and-t-behavior]: #t_and_t_behavior -/*--------------------------------------------------------* - | difference_of_geog1_and_geog2 | - +--------------------------------------------------------+ - | POLYGON((0 0, 10 0, 10 10, 0 0), (8 6, 6 2, 4 2, 8 6)) | - *--------------------------------------------------------*/ -``` +[error-format-specifiers]: #error_format_specifiers -### `ST_DIMENSION` +[null-format-specifiers]: #null_format_specifiers + +[rules-format-specifiers]: #rules_format_specifiers + +### `FROM_BASE32` ```sql -ST_DIMENSION(geography_expression) +FROM_BASE32(string_expr) ``` **Description** -Returns the dimension of the highest-dimensional element in the input -`GEOGRAPHY`. +Converts the base32-encoded input `string_expr` into `BYTES` format. To convert +`BYTES` to a base32-encoded `STRING`, use [TO_BASE32][string-link-to-base32]. -The dimension of each possible element is as follows: +**Return type** -+ The dimension of a point is `0`. -+ The dimension of a linestring is `1`. -+ The dimension of a polygon is `2`. +`BYTES` -If the input `GEOGRAPHY` is empty, `ST_DIMENSION` -returns `-1`. +**Example** -**Return type** +```sql +SELECT FROM_BASE32('MFRGGZDF74======') AS byte_data; -`INT64` +/*-----------* + | byte_data | + +-----------+ + | abcde\xff | + *-----------*/ +``` -### `ST_DISJOINT` +[string-link-to-base32]: #to_base32 + +### `FROM_BASE64` ```sql -ST_DISJOINT(geography_1, geography_2) +FROM_BASE64(string_expr) ``` **Description** -Returns `TRUE` if the intersection of `geography_1` and `geography_2` is empty, -that is, no point in `geography_1` also appears in `geography_2`. +Converts the base64-encoded input `string_expr` into +`BYTES` format. To convert +`BYTES` to a base64-encoded `STRING`, +use [TO_BASE64][string-link-to-base64]. 
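As a quick cross-check outside SQL, the same RFC 4648 decodings can be reproduced with Python's standard `base64` module (a sketch for verification only, not part of ZetaSQL):

```python
import base64

# FROM_BASE32('MFRGGZDF74======') in the example above returns the bytes
# abcde\xff; Python's b32decode uses the same RFC 4648 base32 alphabet.
print(base64.b32decode('MFRGGZDF74======'))  # b'abcde\xff'

# FROM_BASE64 expects the [A-Za-z0-9+/=] alphabet, matching b64decode.
print(base64.b64decode('/+A='))  # b'\xff\xe0' (octal \377\340)
```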
-`ST_DISJOINT` is the logical negation of [`ST_INTERSECTS`][st-intersects]. +There are several base64 encodings in common use that vary in exactly which +alphabet of 65 ASCII characters are used to encode the 64 digits and padding. +See [RFC 4648][RFC-4648] for details. This +function expects the alphabet `[A-Za-z0-9+/=]`. **Return type** -`BOOL` +`BYTES` -[st-intersects]: #st_intersects +**Example** -### `ST_DISTANCE` +```sql +SELECT FROM_BASE64('/+A=') AS byte_data; +/*-----------* + | byte_data | + +-----------+ + | \377\340 | + *-----------*/ ``` -ST_DISTANCE(geography_1, geography_2[, use_spheroid]) -``` - -**Description** - -Returns the shortest distance in meters between two non-empty -`GEOGRAPHY`s. -If either of the input `GEOGRAPHY`s is empty, -`ST_DISTANCE` returns `NULL`. +To work with an encoding using a different base64 alphabet, you might need to +compose `FROM_BASE64` with the `REPLACE` function. For instance, the +`base64url` url-safe and filename-safe encoding commonly used in web programming +uses `-_=` as the last characters rather than `+/=`. To decode a +`base64url`-encoded string, replace `-` and `_` with `+` and `/` respectively. -The optional `use_spheroid` parameter determines how this function measures -distance. If `use_spheroid` is `FALSE`, the function measures distance on the -surface of a perfect sphere. If `use_spheroid` is `TRUE`, the function measures -distance on the surface of the [WGS84][wgs84-link] spheroid. The default value -of `use_spheroid` is `FALSE`. 
+```sql +SELECT FROM_BASE64(REPLACE(REPLACE('_-A=', '-', '+'), '_', '/')) AS binary; -**Return type** +/*-----------* + | binary | + +-----------+ + | \377\340 | + *-----------*/ +``` -`DOUBLE` +[RFC-4648]: https://tools.ietf.org/html/rfc4648#section-4 -[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System +[string-link-to-from-base64]: #from_base64 -### `ST_DUMP` +### `FROM_HEX` ```sql -ST_DUMP(geography[, dimension]) +FROM_HEX(string) ``` **Description** -Returns an `ARRAY` of simple -`GEOGRAPHY`s where each element is a component of -the input `GEOGRAPHY`. A simple -`GEOGRAPHY` consists of a single point, linestring, -or polygon. If the input `GEOGRAPHY` is simple, the -result is a single element. When the input -`GEOGRAPHY` is a collection, `ST_DUMP` returns an -`ARRAY` with one simple -`GEOGRAPHY` for each component in the collection. - -If `dimension` is provided, the function only returns -`GEOGRAPHY`s of the corresponding dimension. A -dimension of -1 is equivalent to omitting `dimension`. - -**Return Type** - -`ARRAY` - -**Examples** - -The following example shows how `ST_DUMP` returns the simple geographies within -a complex geography. +Converts a hexadecimal-encoded `STRING` into `BYTES` format. Returns an error +if the input `STRING` contains characters outside the range +`(0..9, A..F, a..f)`. The lettercase of the characters does not matter. If the +input `STRING` has an odd number of characters, the function acts as if the +input has an additional leading `0`. To convert `BYTES` to a hexadecimal-encoded +`STRING`, use [TO_HEX][string-link-to-to-hex]. 
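The odd-length rule can be mirrored in Python for a quick sanity check. Note that `bytes.fromhex` rejects odd-length input, so the implicit leading `0` has to be added explicitly (the `from_hex` helper below is illustrative, not part of any API):

```python
def from_hex(s: str) -> bytes:
    # Mirror FROM_HEX's rule: an odd number of hex digits is treated
    # as if the input had an additional leading 0.
    if len(s) % 2:
        s = '0' + s
    return bytes.fromhex(s)

print(from_hex('0AF'))           # b'\x00\xaf'
print(from_hex('666f6f626172'))  # b'foobar'
```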
-```sql -WITH example AS ( - SELECT ST_GEOGFROMTEXT('POINT(0 0)') AS geography - UNION ALL - SELECT ST_GEOGFROMTEXT('MULTIPOINT(0 0, 1 1)') AS geography - UNION ALL - SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1))')) -SELECT - geography AS original_geography, - ST_DUMP(geography) AS dumped_geographies -FROM example +**Return type** -/*-------------------------------------+------------------------------------* - | original_geographies | dumped_geographies | - +-------------------------------------+------------------------------------+ - | POINT(0 0) | [POINT(0 0)] | - | MULTIPOINT(0 0, 1 1) | [POINT(0 0), POINT(1 1)] | - | GEOMETRYCOLLECTION(POINT(0 0), | [POINT(0 0), LINESTRING(1 2, 2 1)] | - | LINESTRING(1 2, 2 1)) | | - *-------------------------------------+------------------------------------*/ - ``` +`BYTES` -The following example shows how `ST_DUMP` with the dimension argument only -returns simple geographies of the given dimension. +**Example** ```sql -WITH example AS ( - SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1))') AS geography) -SELECT - geography AS original_geography, - ST_DUMP(geography, 1) AS dumped_geographies -FROM example +WITH Input AS ( + SELECT '00010203aaeeefff' AS hex_str UNION ALL + SELECT '0AF' UNION ALL + SELECT '666f6f626172' +) +SELECT hex_str, FROM_HEX(hex_str) AS bytes_str +FROM Input; -/*-------------------------------------+------------------------------* - | original_geographies | dumped_geographies | - +-------------------------------------+------------------------------+ - | GEOMETRYCOLLECTION(POINT(0 0), | [LINESTRING(1 2, 2 1)] | - | LINESTRING(1 2, 2 1)) | | - *-------------------------------------+------------------------------*/ +/*------------------+----------------------------------* + | hex_str | bytes_str | + +------------------+----------------------------------+ + | 0AF | \x00\xaf | + | 00010203aaeeefff | \x00\x01\x02\x03\xaa\xee\xef\xff | + | 666f6f626172 | 
foobar | + *------------------+----------------------------------*/ ``` -### `ST_DUMPPOINTS` +[string-link-to-to-hex]: #to_hex + +### `INITCAP` ```sql -ST_DUMPPOINTS(geography) +INITCAP(value[, delimiters]) ``` **Description** -Takes an input geography and returns all of its points, line vertices, and -polygon vertices as an array of point geographies. +Takes a `STRING` and returns it with the first character in each word in +uppercase and all other characters in lowercase. Non-alphabetic characters +remain the same. -**Return Type** +`delimiters` is an optional string argument that is used to override the default +set of characters used to separate words. If `delimiters` is not specified, it +defaults to the following characters: \ +` [ ] ( ) { } / | \ < > ! ? @ " ^ # $ & ~ _ , . : ; * % + -` -`ARRAY` +If `value` or `delimiters` is `NULL`, the function returns `NULL`. + +**Return type** + +`STRING` **Examples** ```sql -WITH example AS ( - SELECT ST_GEOGFROMTEXT('POINT(0 0)') AS geography - UNION ALL - SELECT ST_GEOGFROMTEXT('MULTIPOINT(0 0, 1 1)') AS geography - UNION ALL - SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1))')) -SELECT - geography AS original_geography, - ST_DUMPPOINTS(geography) AS dumped_points_geographies -FROM example +WITH example AS +( + SELECT 'Hello World-everyone!' 
AS value UNION ALL
+  SELECT 'tHe dog BARKS loudly+friendly' AS value UNION ALL
+  SELECT 'apples&oranges;&pears' AS value UNION ALL
+  SELECT 'καθίσματα ταινιών' AS value
+)
+SELECT value, INITCAP(value) AS initcap_value FROM example
-/*-------------------------------------+------------------------------------*
- | original_geographies                | dumped_points_geographies          |
- +-------------------------------------+------------------------------------+
- | POINT(0 0)                          | [POINT(0 0)]                       |
- | MULTIPOINT(0 0, 1 1)                | [POINT(0 0),POINT(1 1)]            |
- | GEOMETRYCOLLECTION(POINT(0 0),      | [POINT(0 0),POINT(1 2),POINT(2 1)] |
- |   LINESTRING(1 2, 2 1))             |                                    |
- *-------------------------------------+------------------------------------*/
+/*-------------------------------+-------------------------------*
+ | value                         | initcap_value                 |
+ +-------------------------------+-------------------------------+
+ | Hello World-everyone!         | Hello World-Everyone!         |
+ | tHe dog BARKS loudly+friendly | The Dog Barks Loudly+Friendly |
+ | apples&oranges;&pears         | Apples&Oranges;&Pears         |
+ | καθίσματα ταινιών             | Καθίσματα Ταινιών             |
+ *-------------------------------+-------------------------------*/
+
+WITH example AS
+(
+  SELECT 'hello WORLD!' AS value, '' AS delimiters UNION ALL
+  SELECT 'καθίσματα ταιντιώ@ν' AS value, 'τ@' AS delimiters UNION ALL
+  SELECT 'Apples1oranges2pears' AS value, '12' AS delimiters UNION ALL
+  SELECT 'tHisEisEaESentence' AS value, 'E' AS delimiters
+)
+SELECT value, delimiters, INITCAP(value, delimiters) AS initcap_value FROM example;
+
+/*----------------------+------------+----------------------*
+ | value                | delimiters | initcap_value        |
+ +----------------------+------------+----------------------+
+ | hello WORLD!         |            | Hello world!         |
+ | καθίσματα ταιντιώ@ν  | τ@         | ΚαθίσματΑ τΑιντΙώ@Ν  |
+ | Apples1oranges2pears | 12         | Apples1Oranges2Pears |
+ | tHisEisEaESentence   | E          | ThisEIsEAESentence   |
+ *----------------------+------------+----------------------*/
```
-### `ST_DWITHIN`
+### `INSTR`
```sql
-ST_DWITHIN(geography_1, geography_2, distance[, use_spheroid])
+INSTR(value, subvalue[, position[, occurrence]])
```
**Description**
-Returns `TRUE` if the distance between at least one point in `geography_1` and
-one point in `geography_2` is less than or equal to the distance given by the
-`distance` argument; otherwise, returns `FALSE`. If either input
-`GEOGRAPHY` is empty, `ST_DWithin` returns `FALSE`. The
-given `distance` is in meters on the surface of the Earth.
+Returns the lowest 1-based position of `subvalue` in `value`.
+`value` and `subvalue` must be the same type, either
+`STRING` or `BYTES`.
-The optional `use_spheroid` parameter determines how this function measures
-distance. If `use_spheroid` is `FALSE`, the function measures distance on the
-surface of a perfect sphere.
+If `position` is specified, the search starts at this position in
+`value`, otherwise it starts at `1`, which is the beginning of
+`value`. If `position` is negative, the function searches backwards
+from the end of `value`, with `-1` indicating the last character.
+`position` is of type `INT64` and cannot be `0`.
-The `use_spheroid` parameter currently only supports
-the value `FALSE`. The default value of `use_spheroid` is `FALSE`.
+If `occurrence` is specified, the search returns the position of a specific
+instance of `subvalue` in `value`. If not specified, `occurrence`
+defaults to `1` and returns the position of the first occurrence.
+For `occurrence` > `1`, the function includes overlapping occurrences.
+`occurrence` is of type `INT64` and must be positive.
-**Return type**
+This function supports specifying [collation][collation].
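The position and occurrence rules above can be modeled outside SQL. The following Python sketch (the `instr` helper is illustrative, not part of ZetaSQL) reproduces the positive and negative `position` handling and the overlapping-occurrence behavior:

```python
def instr(value: str, subvalue: str, position: int = 1, occurrence: int = 1) -> int:
    if position == 0 or occurrence < 1:
        raise ValueError('position cannot be 0; occurrence must be positive')
    # 1-based start positions of all (possibly overlapping) matches.
    starts = [i + 1 for i in range(len(value)) if value.startswith(subvalue, i)]
    if position > 0:
        candidates = [s for s in starts if s >= position]
    else:
        # Negative position: search backwards, -1 being the last character;
        # matches must start at or before that point.
        limit = len(value) + position + 1
        candidates = [s for s in reversed(starts) if s <= limit]
    return candidates[occurrence - 1] if len(candidates) >= occurrence else 0

print(instr('banana', 'an', 1, 2))    # 4
print(instr('banana', 'an', -1, 1))   # 4
print(instr('helloooo', 'oo', 1, 2))  # 6 (overlapping occurrences count)
```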
-`BOOL` +[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about -[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System +Returns `0` if: -### `ST_ENDPOINT` ++ No match is found. ++ If `occurrence` is greater than the number of matches found. ++ If `position` is greater than the length of `value`. -```sql -ST_ENDPOINT(linestring_geography) -``` +Returns `NULL` if: -**Description** ++ Any input argument is `NULL`. -Returns the last point of a linestring geography as a point geography. Returns -an error if the input is not a linestring or if the input is empty. Use the -`SAFE` prefix to obtain `NULL` for invalid input instead of an error. +Returns an error if: -**Return Type** ++ `position` is `0`. ++ `occurrence` is `0` or negative. -Point `GEOGRAPHY` +**Return type** -**Example** +`INT64` + +**Examples** ```sql -SELECT ST_ENDPOINT(ST_GEOGFROMTEXT('LINESTRING(1 1, 2 1, 3 2, 3 3)')) last +WITH example AS +(SELECT 'banana' as value, 'an' as subvalue, 1 as position, 1 as +occurrence UNION ALL +SELECT 'banana' as value, 'an' as subvalue, 1 as position, 2 as +occurrence UNION ALL +SELECT 'banana' as value, 'an' as subvalue, 1 as position, 3 as +occurrence UNION ALL +SELECT 'banana' as value, 'an' as subvalue, 3 as position, 1 as +occurrence UNION ALL +SELECT 'banana' as value, 'an' as subvalue, -1 as position, 1 as +occurrence UNION ALL +SELECT 'banana' as value, 'an' as subvalue, -3 as position, 1 as +occurrence UNION ALL +SELECT 'banana' as value, 'ann' as subvalue, 1 as position, 1 as +occurrence UNION ALL +SELECT 'helloooo' as value, 'oo' as subvalue, 1 as position, 1 as +occurrence UNION ALL +SELECT 'helloooo' as value, 'oo' as subvalue, 1 as position, 2 as +occurrence +) +SELECT value, subvalue, position, occurrence, INSTR(value, +subvalue, position, occurrence) AS instr +FROM example; -/*--------------* - | last | - +--------------+ - | POINT(3 3) | - *--------------*/ 
+/*--------------+--------------+----------+------------+-------* + | value | subvalue | position | occurrence | instr | + +--------------+--------------+----------+------------+-------+ + | banana | an | 1 | 1 | 2 | + | banana | an | 1 | 2 | 4 | + | banana | an | 1 | 3 | 0 | + | banana | an | 3 | 1 | 4 | + | banana | an | -1 | 1 | 4 | + | banana | an | -3 | 1 | 4 | + | banana | ann | 1 | 1 | 0 | + | helloooo | oo | 1 | 1 | 5 | + | helloooo | oo | 1 | 2 | 6 | + *--------------+--------------+----------+------------+-------*/ ``` -### `ST_EQUALS` +### `LEFT` ```sql -ST_EQUALS(geography_1, geography_2) +LEFT(value, length) ``` **Description** -Returns `TRUE` if `geography_1` and `geography_2` represent the same - -`GEOGRAPHY` value. More precisely, this means that -one of the following conditions holds: -+ `ST_COVERS(geography_1, geography_2) = TRUE` and `ST_COVERS(geography_2, - geography_1) = TRUE` -+ Both `geography_1` and `geography_2` are empty. - -Therefore, two `GEOGRAPHY`s may be equal even if the -ordering of points or vertices differ, as long as they still represent the same -geometric structure. - -**Constraints** +Returns a `STRING` or `BYTES` value that consists of the specified +number of leftmost characters or bytes from `value`. The `length` is an +`INT64` that specifies the length of the returned +value. If `value` is of type `BYTES`, `length` is the number of leftmost bytes +to return. If `value` is `STRING`, `length` is the number of leftmost characters +to return. -`ST_EQUALS` is not guaranteed to be a transitive function. +If `length` is 0, an empty `STRING` or `BYTES` value will be +returned. If `length` is negative, an error will be returned. If `length` +exceeds the number of characters or bytes from `value`, the original `value` +will be returned. 
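As a mental model for these rules, Python slicing already clamps an oversized `length` and returns an empty value for `0`, so a hypothetical helper (not the actual implementation) only needs to add the negative-`length` error:

```python
def left(value, length):
    # LEFT semantics: length 0 -> empty value; length beyond the input
    # returns the original value; negative length is an error.
    if length < 0:
        raise ValueError('length cannot be negative')
    return value[:length]

print(left('абвгд', 3))                  # 'абв' (counts characters for str)
print(left(b'\xab\xcd\xef\xaa\xbb', 3))  # b'\xab\xcd\xef' (counts bytes)
print(left('apple', 99))                 # 'apple' (clamped to the original)
```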
**Return type** -`BOOL` +`STRING` or `BYTES` -### `ST_EXTENT` +**Examples** ```sql -ST_EXTENT(geography_expression) +WITH examples AS +(SELECT 'apple' as example +UNION ALL +SELECT 'banana' as example +UNION ALL +SELECT 'абвгд' as example +) +SELECT example, LEFT(example, 3) AS left_example +FROM examples; + +/*---------+--------------* + | example | left_example | + +---------+--------------+ + | apple | app | + | banana | ban | + | абвгд | абв | + *---------+--------------*/ ``` -**Description** +```sql +WITH examples AS +(SELECT b'apple' as example +UNION ALL +SELECT b'banana' as example +UNION ALL +SELECT b'\xab\xcd\xef\xaa\xbb' as example +) +SELECT example, LEFT(example, 3) AS left_example +FROM examples; -Returns a `STRUCT` that represents the bounding box for the set of input -`GEOGRAPHY` values. The bounding box is the minimal rectangle that encloses the -geography. The edges of the rectangle follow constant lines of longitude and -latitude. +/*----------------------+--------------* + | example | left_example | + +----------------------+--------------+ + | apple | app | + | banana | ban | + | \xab\xcd\xef\xaa\xbb | \xab\xcd\xef | + *----------------------+--------------*/ +``` -Caveats: +### `LENGTH` -+ Returns `NULL` if all the inputs are `NULL` or empty geographies. -+ The bounding box might cross the antimeridian if this allows for a smaller - rectangle. In this case, the bounding box has one of its longitudinal bounds - outside of the [-180, 180] range, so that `xmin` is smaller than the eastmost - value `xmax`. -+ If the longitude span of the bounding box is larger than or equal to 180 - degrees, the function returns the bounding box with the longitude range of - [-180, 180]. +```sql +LENGTH(value) +``` -**Return type** +**Description** -`STRUCT`. +Returns the length of the `STRING` or `BYTES` value. The returned +value is in characters for `STRING` arguments and in bytes for the `BYTES` +argument. 
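The character-versus-byte distinction is easy to confirm in Python, where `len` on a `str` counts code points and `len` on its encoded form counts bytes (assuming UTF-8, which matches the example above):

```python
s = 'абвгд'
print(len(s))                  # 5 characters
print(len(s.encode('utf-8')))  # 10 bytes (each Cyrillic letter is 2 bytes)
```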
-Bounding box parts: +**Return type** -+ `xmin`: The westmost constant longitude line that bounds the rectangle. -+ `xmax`: The eastmost constant longitude line that bounds the rectangle. -+ `ymin`: The minimum constant latitude line that bounds the rectangle. -+ `ymax`: The maximum constant latitude line that bounds the rectangle. +`INT64` -**Example** +**Examples** ```sql -WITH data AS ( - SELECT 1 id, ST_GEOGFROMTEXT('POLYGON((-125 48, -124 46, -117 46, -117 49, -125 48))') g - UNION ALL - SELECT 2 id, ST_GEOGFROMTEXT('POLYGON((172 53, -130 55, -141 70, 172 53))') g - UNION ALL - SELECT 3 id, ST_GEOGFROMTEXT('POINT EMPTY') g -) -SELECT ST_EXTENT(g) AS box -FROM data -/*----------------------------------------------* - | box | - +----------------------------------------------+ - | {xmin:172, ymin:46, xmax:243, ymax:70} | - *----------------------------------------------*/ -``` +WITH example AS + (SELECT 'абвгд' AS characters) -[`ST_BOUNDINGBOX`][st-boundingbox] for the non-aggregate version of `ST_EXTENT`. +SELECT + characters, + LENGTH(characters) AS string_example, + LENGTH(CAST(characters AS BYTES)) AS bytes_example +FROM example; -[st-boundingbox]: #st_boundingbox +/*------------+----------------+---------------* + | characters | string_example | bytes_example | + +------------+----------------+---------------+ + | абвгд | 5 | 10 | + *------------+----------------+---------------*/ +``` -### `ST_EXTERIORRING` +### `LOWER` ```sql -ST_EXTERIORRING(polygon_geography) +LOWER(value) ``` **Description** -Returns a linestring geography that corresponds to the outermost ring of a -polygon geography. - -+ If the input geography is a polygon, gets the outermost ring of the polygon - geography and returns the corresponding linestring. -+ If the input is the full `GEOGRAPHY`, returns an empty geography. -+ Returns an error if the input is not a single polygon. +For `STRING` arguments, returns the original string with all alphabetic +characters in lowercase. 
Mapping between lowercase and uppercase is done +according to the +[Unicode Character Database][string-link-to-unicode-character-definitions] +without taking into account language-specific mappings. -Use the `SAFE` prefix to return `NULL` for invalid input instead of an error. +For `BYTES` arguments, the argument is treated as ASCII text, with all bytes +greater than 127 left intact. **Return type** -+ Linestring `GEOGRAPHY` -+ Empty `GEOGRAPHY` +`STRING` or `BYTES` **Examples** ```sql -WITH geo as - (SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 1 4, 2 2, 0 0))') AS g UNION ALL - SELECT ST_GEOGFROMTEXT('''POLYGON((1 1, 1 10, 5 10, 5 1, 1 1), - (2 2, 3 4, 2 4, 2 2))''') as g) -SELECT ST_EXTERIORRING(g) AS ring FROM geo; -/*---------------------------------------* - | ring | - +---------------------------------------+ - | LINESTRING(2 2, 1 4, 0 0, 2 2) | - | LINESTRING(5 1, 5 10, 1 10, 1 1, 5 1) | - *---------------------------------------*/ +WITH items AS + (SELECT + 'FOO' as item + UNION ALL + SELECT + 'BAR' as item + UNION ALL + SELECT + 'BAZ' as item) + +SELECT + LOWER(item) AS example +FROM items; + +/*---------* + | example | + +---------+ + | foo | + | bar | + | baz | + *---------*/ ``` -### `ST_GEOGFROM` +[string-link-to-unicode-character-definitions]: http://unicode.org/ucd/ + +### `LPAD` ```sql -ST_GEOGFROM(expression) +LPAD(original_value, return_length[, pattern]) ``` **Description** -Converts an expression for a `STRING` or `BYTES` value into a -`GEOGRAPHY` value. +Returns a `STRING` or `BYTES` value that consists of `original_value` prepended +with `pattern`. The `return_length` is an `INT64` that +specifies the length of the returned value. If `original_value` is of type +`BYTES`, `return_length` is the number of bytes. If `original_value` is +of type `STRING`, `return_length` is the number of characters. 
-If `expression` represents a `STRING` value, it must be a valid -`GEOGRAPHY` representation in one of the following formats: +The default value of `pattern` is a blank space. -+ WKT format. To learn more about this format and the requirements to use it, - see [ST_GEOGFROMTEXT][st-geogfromtext]. -+ WKB in hexadecimal text format. To learn more about this format and the - requirements to use it, see [ST_GEOGFROMWKB][st-geogfromwkb]. -+ GeoJSON format. To learn more about this format and the - requirements to use it, see [ST_GEOGFROMGEOJSON][st-geogfromgeojson]. +Both `original_value` and `pattern` must be the same data type. -If `expression` represents a `BYTES` value, it must be a valid `GEOGRAPHY` -binary expression in WKB format. To learn more about this format and the -requirements to use it, see [ST_GEOGFROMWKB][st-geogfromwkb]. +If `return_length` is less than or equal to the `original_value` length, this +function returns the `original_value` value, truncated to the value of +`return_length`. For example, `LPAD('hello world', 7);` returns `'hello w'`. -If `expression` is `NULL`, the output is `NULL`. +If `original_value`, `return_length`, or `pattern` is `NULL`, this function +returns `NULL`. 
+
+This function returns an error if:
+
++ `return_length` is negative
++ `pattern` is empty
+
**Return type**
-`GEOGRAPHY`
+`STRING` or `BYTES`
**Examples**
-This takes a WKT-formatted string and returns a `GEOGRAPHY` polygon:
-
```sql
-SELECT ST_GEOGFROM('POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))') AS WKT_format
+SELECT t, len, FORMAT('%T', LPAD(t, len)) AS LPAD FROM UNNEST([
+  STRUCT('abc' AS t, 5 AS len),
+  ('abc', 2),
+  ('例子', 4)
+]);
-/*------------------------------------*
- | WKT_format                         |
- +------------------------------------+
- | POLYGON((2 0, 2 2, 0 2, 0 0, 2 0)) |
- *------------------------------------*/
+/*------+-----+----------*
+ | t    | len | LPAD     |
+ |------|-----|----------|
+ | abc  | 5   | "  abc"  |
+ | abc  | 2   | "ab"     |
+ | 例子 | 4   | "  例子" |
+ *------+-----+----------*/
```
-This takes a WKB-formatted hexadecimal-encoded string and returns a
-`GEOGRAPHY` point:
-
```sql
-SELECT ST_GEOGFROM(FROM_HEX('010100000000000000000000400000000000001040')) AS WKB_format
+SELECT t, len, pattern, FORMAT('%T', LPAD(t, len, pattern)) AS LPAD FROM UNNEST([
+  STRUCT('abc' AS t, 8 AS len, 'def' AS pattern),
+  ('abc', 5, '-'),
+  ('例子', 5, '中文')
+]);
-/*----------------*
- | WKB_format     |
- +----------------+
- | POINT(2 4)     |
- *----------------*/
+/*------+-----+---------+--------------*
+ | t    | len | pattern | LPAD         |
+ |------|-----|---------|--------------|
+ | abc  | 8   | def     | "defdeabc"   |
+ | abc  | 5   | -       | "--abc"      |
+ | 例子 | 5   | 中文    | "中文中例子" |
+ *------+-----+---------+--------------*/
```
-This takes WKB-formatted bytes and returns a `GEOGRAPHY` point:
-
```sql
-SELECT ST_GEOGFROM('010100000000000000000000400000000000001040')
-AS WKB_format
+SELECT FORMAT('%T', t) AS t, len, FORMAT('%T', LPAD(t, len)) AS LPAD FROM UNNEST([
+  STRUCT(b'abc' AS t, 5 AS len),
+  (b'abc', 2),
+  (b'\xab\xcd\xef', 4)
+]);
-/*----------------*
- | WKB_format     |
- +----------------+
- | POINT(2 4)     |
- *----------------*/
+/*-----------------+-----+------------------*
+ | t               | len | LPAD             |
+ 
|-----------------|-----|------------------| + | b"abc" | 5 | b" abc" | + | b"abc" | 2 | b"ab" | + | b"\xab\xcd\xef" | 4 | b" \xab\xcd\xef" | + *-----------------+-----+------------------*/ ``` -This takes a GeoJSON-formatted string and returns a `GEOGRAPHY` polygon: - ```sql -SELECT ST_GEOGFROM( - '{ "type": "Polygon", "coordinates": [ [ [2, 0], [2, 2], [1, 2], [0, 2], [0, 0], [2, 0] ] ] }' -) AS GEOJSON_format +SELECT + FORMAT('%T', t) AS t, + len, + FORMAT('%T', pattern) AS pattern, + FORMAT('%T', LPAD(t, len, pattern)) AS LPAD +FROM UNNEST([ + STRUCT(b'abc' AS t, 8 AS len, b'def' AS pattern), + (b'abc', 5, b'-'), + (b'\xab\xcd\xef', 5, b'\x00') +]); -/*-----------------------------------------* - | GEOJSON_format | - +-----------------------------------------+ - | POLYGON((2 0, 2 2, 1 2, 0 2, 0 0, 2 0)) | - *-----------------------------------------*/ +/*-----------------+-----+---------+-------------------------* + | t | len | pattern | LPAD | + |-----------------|-----|---------|-------------------------| + | b"abc" | 8 | b"def" | b"defdeabc" | + | b"abc" | 5 | b"-" | b"--abc" | + | b"\xab\xcd\xef" | 5 | b"\x00" | b"\x00\x00\xab\xcd\xef" | + *-----------------+-----+---------+-------------------------*/ ``` -[st-geogfromtext]: #st_geogfromtext - -[st-geogfromwkb]: #st_geogfromwkb - -[st-geogfromgeojson]: #st_geogfromgeojson - -### `ST_GEOGFROMGEOJSON` +### `LTRIM` ```sql -ST_GEOGFROMGEOJSON(geojson_string [, make_valid => constant_expression]) +LTRIM(value1[, value2]) ``` **Description** -Returns a `GEOGRAPHY` value that corresponds to the -input [GeoJSON][geojson-link] representation. - -`ST_GEOGFROMGEOJSON` accepts input that is [RFC 7946][geojson-spec-link] -compliant. - -If the parameter `make_valid` is set to `TRUE`, the function attempts to repair -polygons that don't conform to [Open Geospatial Consortium][ogc-link] semantics. -This parameter uses named argument syntax, and should be specified using -`make_valid => argument_value` syntax. 
- -A ZetaSQL `GEOGRAPHY` has spherical -geodesic edges, whereas a GeoJSON `Geometry` object explicitly has planar edges. -To convert between these two types of edges, ZetaSQL adds additional -points to the line where necessary so that the resulting sequence of edges -remains within 10 meters of the original edge. - -See [`ST_ASGEOJSON`][st-asgeojson] to format a -`GEOGRAPHY` as GeoJSON. - -**Constraints** +Identical to [TRIM][string-link-to-trim], but only removes leading characters. -The JSON input is subject to the following constraints: +**Return type** -+ `ST_GEOGFROMGEOJSON` only accepts JSON geometry fragments and cannot be used - to ingest a whole JSON document. -+ The input JSON fragment must consist of a GeoJSON geometry type, which - includes `Point`, `MultiPoint`, `LineString`, `MultiLineString`, `Polygon`, - `MultiPolygon`, and `GeometryCollection`. Any other GeoJSON type such as - `Feature` or `FeatureCollection` will result in an error. -+ A position in the `coordinates` member of a GeoJSON geometry type must - consist of exactly two elements. The first is the longitude and the second - is the latitude. Therefore, `ST_GEOGFROMGEOJSON` does not support the - optional third element for a position in the `coordinates` member. 
+`STRING` or `BYTES` -**Return type** +**Examples** -`GEOGRAPHY` +```sql +WITH items AS + (SELECT ' apple ' as item + UNION ALL + SELECT ' banana ' as item + UNION ALL + SELECT ' orange ' as item) -[geojson-link]: https://en.wikipedia.org/wiki/GeoJSON +SELECT + CONCAT('#', LTRIM(item), '#') as example +FROM items; -[geojson-spec-link]: https://tools.ietf.org/html/rfc7946 +/*-------------* + | example | + +-------------+ + | #apple # | + | #banana # | + | #orange # | + *-------------*/ +``` -[ogc-link]: https://www.ogc.org/standards/sfa +```sql +WITH items AS + (SELECT '***apple***' as item + UNION ALL + SELECT '***banana***' as item + UNION ALL + SELECT '***orange***' as item) -[st-asgeojson]: #st_asgeojson +SELECT + LTRIM(item, '*') as example +FROM items; -### `ST_GEOGFROMKML` +/*-----------* + | example | + +-----------+ + | apple*** | + | banana*** | + | orange*** | + *-----------*/ +``` ```sql -ST_GEOGFROMKML(kml_geometry) +WITH items AS + (SELECT 'xxxapplexxx' as item + UNION ALL + SELECT 'yyybananayyy' as item + UNION ALL + SELECT 'zzzorangezzz' as item + UNION ALL + SELECT 'xyzpearxyz' as item) ``` -Takes a `STRING` [KML geometry][kml-geometry-link] and returns a -`GEOGRAPHY`. 
The KML geomentry can include: - -+ Point with coordinates element only -+ Linestring with coordinates element only -+ Polygon with boundary elements only -+ Multigeometry - -[kml-geometry-link]: https://developers.google.com/kml/documentation/kmlreference#geometry +```sql +SELECT + LTRIM(item, 'xyz') as example +FROM items; -### `ST_GEOGFROMTEXT` +/*-----------* + | example | + +-----------+ + | applexxx | + | bananayyy | + | orangezzz | + | pearxyz | + *-----------*/ +``` -+ [Signature 1](#st_geogfromtext_signature1) -+ [Signature 2](#st_geogfromtext_signature2) +[string-link-to-trim]: #trim -#### Signature 1 - +### `NORMALIZE` ```sql -ST_GEOGFROMTEXT(wkt_string[, oriented]) +NORMALIZE(value[, normalization_mode]) ``` **Description** -Returns a `GEOGRAPHY` value that corresponds to the -input [WKT][wkt-link] representation. - -This function supports an optional parameter of type -`BOOL`, `oriented`. If this parameter is set to -`TRUE`, any polygons in the input are assumed to be oriented as follows: -if someone walks along the boundary of the polygon in the order of -the input vertices, the interior of the polygon is on the left. This allows -WKT to represent polygons larger than a hemisphere. If `oriented` is `FALSE` or -omitted, this function returns the polygon with the smaller area. -See also [`ST_MAKEPOLYGONORIENTED`][st-makepolygonoriented] which is similar -to `ST_GEOGFROMTEXT` with `oriented=TRUE`. +Takes a string value and returns it as a normalized string. If you do not +provide a normalization mode, `NFC` is used. -To format `GEOGRAPHY` as WKT, use -[`ST_ASTEXT`][st-astext]. +[Normalization][string-link-to-normalization-wikipedia] is used to ensure that +two strings are equivalent. Normalization is often used in situations in which +two strings render the same on the screen but have different Unicode code +points. 
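The same canonical-equivalence behavior can be reproduced with Python's standard `unicodedata` module (a cross-check of the concept, not the ZetaSQL implementation):

```python
import unicodedata

a = '\u00ea'        # 'ê' as a single precomposed code point
b = '\u0065\u0302'  # 'e' followed by a combining circumflex accent
print(a == b)  # False: different code points, same on-screen rendering
# NFC recomposes b into the single code point, so the values compare equal.
print(unicodedata.normalize('NFC', a) == unicodedata.normalize('NFC', b))  # True
```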
-**Constraints** +`NORMALIZE` supports four optional normalization modes: -* All input edges are assumed to be spherical geodesics, and *not* planar - straight lines. For reading data in a planar projection, consider using - [`ST_GEOGFROMGEOJSON`][st-geogfromgeojson]. -* The function does not support three-dimensional geometries that have a `Z` - suffix, nor does it support linear referencing system geometries with an `M` - suffix. -* The function only supports geometry primitives and multipart geometries. In - particular it supports only point, multipoint, linestring, multilinestring, - polygon, multipolygon, and geometry collection. +| Value | Name | Description| +|---------|------------------------------------------------|------------| +| `NFC` | Normalization Form Canonical Composition | Decomposes and recomposes characters by canonical equivalence.| +| `NFKC` | Normalization Form Compatibility Composition | Decomposes characters by compatibility, then recomposes them by canonical equivalence.| +| `NFD` | Normalization Form Canonical Decomposition | Decomposes characters by canonical equivalence, and multiple combining characters are arranged in a specific order.| +| `NFKD` | Normalization Form Compatibility Decomposition | Decomposes characters by compatibility, and multiple combining characters are arranged in a specific order.| **Return type** -`GEOGRAPHY` +`STRING` -**Example** +**Examples** -The following query reads the WKT string `POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))` -both as a non-oriented polygon and as an oriented polygon, and checks whether -each result contains the point `(1, 1)`. +```sql +SELECT a, b, a = b as normalized +FROM (SELECT NORMALIZE('\u00ea') as a, NORMALIZE('\u0065\u0302') as b); + +/*---+---+------------* + | a | b | normalized | + +---+---+------------+ + | ê | ê | true | + *---+---+------------*/ +``` +The following example normalizes different space characters. 
```sql -WITH polygon AS (SELECT 'POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))' AS p) +WITH EquivalentNames AS ( + SELECT name + FROM UNNEST([ + 'Jane\u2004Doe', + 'John\u2004Smith', + 'Jane\u2005Doe', + 'Jane\u2006Doe', + 'John Smith']) AS name +) SELECT - ST_CONTAINS(ST_GEOGFROMTEXT(p), ST_GEOGPOINT(1, 1)) AS fromtext_default, - ST_CONTAINS(ST_GEOGFROMTEXT(p, FALSE), ST_GEOGPOINT(1, 1)) AS non_oriented, - ST_CONTAINS(ST_GEOGFROMTEXT(p, TRUE), ST_GEOGPOINT(1, 1)) AS oriented -FROM polygon; + NORMALIZE(name, NFKC) AS normalized_name, + COUNT(*) AS name_count +FROM EquivalentNames +GROUP BY 1; -/*-------------------+---------------+-----------* - | fromtext_default | non_oriented | oriented | - +-------------------+---------------+-----------+ - | TRUE | TRUE | FALSE | - *-------------------+---------------+-----------*/ +/*-----------------+------------* + | normalized_name | name_count | + +-----------------+------------+ + | John Smith | 2 | + | Jane Doe | 3 | + *-----------------+------------*/ ``` -#### Signature 2 - +[string-link-to-normalization-wikipedia]: https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization + +### `NORMALIZE_AND_CASEFOLD` ```sql -ST_GEOGFROMTEXT(wkt_string[, oriented => boolean_constant_1] - [, planar => boolean_constant_2] [, make_valid => boolean_constant_3]) +NORMALIZE_AND_CASEFOLD(value[, normalization_mode]) ``` **Description** -Returns a `GEOGRAPHY` value that corresponds to the -input [WKT][wkt-link] representation. - -This function supports three optional parameters of type -`BOOL`: `oriented`, `planar`, and `make_valid`. -This signature uses named arguments syntax, and the parameters should be -specified using `parameter_name => parameter_value` syntax, in any order. - -If the `oriented` parameter is set to -`TRUE`, any polygons in the input are assumed to be oriented as follows: -if someone walks along the boundary of the polygon in the order of -the input vertices, the interior of the polygon is on the left. 
This allows -WKT to represent polygons larger than a hemisphere. If `oriented` is `FALSE` or -omitted, this function returns the polygon with the smaller area. -See also [`ST_MAKEPOLYGONORIENTED`][st-makepolygonoriented] which is similar -to `ST_GEOGFROMTEXT` with `oriented=TRUE`. - -If the parameter `planar` is set to `TRUE`, the edges of the line strings and -polygons are assumed to use planar map semantics, rather than ZetaSQL -default spherical geodesics semantics. +Takes a string value and returns it as a normalized string. If you do not +provide a normalization mode, `NFC` is used. -If the parameter `make_valid` is set to `TRUE`, the function attempts to repair -polygons that don't conform to [Open Geospatial Consortium][ogc-link] semantics. +[Normalization][string-link-to-normalization-wikipedia] is used to ensure that +two strings are equivalent. Normalization is often used in situations in which +two strings render the same on the screen but have different Unicode code +points. -To format `GEOGRAPHY` as WKT, use -[`ST_ASTEXT`][st-astext]. +[Case folding][string-link-to-case-folding-wikipedia] is used for the caseless +comparison of strings. If you need to compare strings and case should not be +considered, use `NORMALIZE_AND_CASEFOLD`, otherwise use +[`NORMALIZE`][string-link-to-normalize]. -**Constraints** +`NORMALIZE_AND_CASEFOLD` supports four optional normalization modes: -* All input edges are assumed to be spherical geodesics by default, and *not* - planar straight lines. For reading data in a planar projection, - pass `planar => TRUE` argument, or consider using - [`ST_GEOGFROMGEOJSON`][st-geogfromgeojson]. -* The function does not support three-dimensional geometries that have a `Z` - suffix, nor does it support linear referencing system geometries with an `M` - suffix. -* The function only supports geometry primitives and multipart geometries. 
In - particular it supports only point, multipoint, linestring, multilinestring, - polygon, multipolygon, and geometry collection. -* `oriented` and `planar` cannot be equal to `TRUE` at the same time. -* `oriented` and `make_valid` cannot be equal to `TRUE` at the same time. +| Value | Name | Description| +|---------|------------------------------------------------|------------| +| `NFC` | Normalization Form Canonical Composition | Decomposes and recomposes characters by canonical equivalence.| +| `NFKC` | Normalization Form Compatibility Composition | Decomposes characters by compatibility, then recomposes them by canonical equivalence.| +| `NFD` | Normalization Form Canonical Decomposition | Decomposes characters by canonical equivalence, and multiple combining characters are arranged in a specific order.| +| `NFKD` | Normalization Form Compatibility Decomposition | Decomposes characters by compatibility, and multiple combining characters are arranged in a specific order.| -**Example** +**Return type** -The following query reads the WKT string `POLYGON((0 0, 0 2, 2 2, 0 2, 0 0))` -both as a non-oriented polygon and as an oriented polygon, and checks whether -each result contains the point `(1, 1)`. 
+`STRING` + +**Examples** ```sql -WITH polygon AS (SELECT 'POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))' AS p) SELECT - ST_CONTAINS(ST_GEOGFROMTEXT(p), ST_GEOGPOINT(1, 1)) AS fromtext_default, - ST_CONTAINS(ST_GEOGFROMTEXT(p, oriented => FALSE), ST_GEOGPOINT(1, 1)) AS non_oriented, - ST_CONTAINS(ST_GEOGFROMTEXT(p, oriented => TRUE), ST_GEOGPOINT(1, 1)) AS oriented -FROM polygon; + a, b, + NORMALIZE(a) = NORMALIZE(b) as normalized, + NORMALIZE_AND_CASEFOLD(a) = NORMALIZE_AND_CASEFOLD(b) as normalized_with_case_folding +FROM (SELECT 'The red barn' AS a, 'The Red Barn' AS b); -/*-------------------+---------------+-----------* - | fromtext_default | non_oriented | oriented | - +-------------------+---------------+-----------+ - | TRUE | TRUE | FALSE | - *-------------------+---------------+-----------*/ +/*--------------+--------------+------------+------------------------------* + | a | b | normalized | normalized_with_case_folding | + +--------------+--------------+------------+------------------------------+ + | The red barn | The Red Barn | false | true | + *--------------+--------------+------------+------------------------------*/ ``` -The following query converts a WKT string with an invalid polygon to -`GEOGRAPHY`. The WKT string violates two properties -of a valid polygon - the loop describing the polygon is not closed, and it -contains self-intersection. With the `make_valid` option, `ST_GEOGFROMTEXT` -successfully converts it to a multipolygon shape. 
- ```sql -WITH data AS ( - SELECT 'POLYGON((0 -1, 2 1, 2 -1, 0 1))' wkt) -SELECT - SAFE.ST_GEOGFROMTEXT(wkt) as geom, - SAFE.ST_GEOGFROMTEXT(wkt, make_valid => TRUE) as valid_geom -FROM data +WITH Strings AS ( + SELECT '\u2168' AS a, 'IX' AS b UNION ALL + SELECT '\u0041\u030A', '\u00C5' +) +SELECT a, b, + NORMALIZE_AND_CASEFOLD(a, NFD)=NORMALIZE_AND_CASEFOLD(b, NFD) AS nfd, + NORMALIZE_AND_CASEFOLD(a, NFC)=NORMALIZE_AND_CASEFOLD(b, NFC) AS nfc, + NORMALIZE_AND_CASEFOLD(a, NFKD)=NORMALIZE_AND_CASEFOLD(b, NFKD) AS nkfd, + NORMALIZE_AND_CASEFOLD(a, NFKC)=NORMALIZE_AND_CASEFOLD(b, NFKC) AS nkfc +FROM Strings; -/*------+-----------------------------------------------------------------* - | geom | valid_geom | - +------+-----------------------------------------------------------------+ - | NULL | MULTIPOLYGON(((0 -1, 1 0, 0 1, 0 -1)), ((1 0, 2 -1, 2 1, 1 0))) | - *------+-----------------------------------------------------------------*/ +/*---+----+-------+-------+------+------* + | a | b | nfd | nfc | nkfd | nkfc | + +---+----+-------+-------+------+------+ + | â…¨ | IX | false | false | true | true | + | AÌŠ | Ã… | true | true | true | true | + *---+----+-------+-------+------+------*/ ``` -[ogc-link]: https://www.ogc.org/standards/sfa - -[wkt-link]: https://en.wikipedia.org/wiki/Well-known_text - -[st-makepolygonoriented]: #st_makepolygonoriented +[string-link-to-normalization-wikipedia]: https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization -[st-astext]: #st_astext +[string-link-to-case-folding-wikipedia]: https://en.wikipedia.org/wiki/Letter_case#Case_folding -[st-geogfromgeojson]: #st_geogfromgeojson +[string-link-to-normalize]: #normalize -### `ST_GEOGFROMWKB` +### `OCTET_LENGTH` ```sql -ST_GEOGFROMWKB(wkb_bytes_expression) +OCTET_LENGTH(value) ``` +Alias for [`BYTE_LENGTH`][byte-length]. 
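Because `OCTET_LENGTH` is an alias for `BYTE_LENGTH`, it counts bytes rather
than characters, so it diverges from `CHAR_LENGTH` on multi-byte UTF-8 input.
A minimal sketch (not part of the original reference):

```sql
SELECT
  OCTET_LENGTH('apple') AS ascii_example,  -- one byte per ASCII character
  OCTET_LENGTH('абвгд') AS utf8_example;   -- each Cyrillic letter is two bytes in UTF-8

/*---------------+--------------*
 | ascii_example | utf8_example |
 +---------------+--------------+
 | 5             | 10           |
 *---------------+--------------*/
```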
+ +[byte-length]: #byte_length + +### `REGEXP_CONTAINS` + ```sql -ST_GEOGFROMWKB(wkb_hex_string_expression) +REGEXP_CONTAINS(value, regexp) ``` **Description** -Converts an expression for a hexadecimal-text `STRING` or `BYTES` -value into a `GEOGRAPHY` value. The expression must be in -[WKB][wkb-link] format. +Returns `TRUE` if `value` is a partial match for the regular expression, +`regexp`. -To format `GEOGRAPHY` as WKB, use -[`ST_ASBINARY`][st-asbinary]. +If the `regexp` argument is invalid, the function returns an error. -**Constraints** +You can search for a full match by using `^` (beginning of text) and `$` (end of +text). Due to regular expression operator precedence, it is good practice to use +parentheses around everything between `^` and `$`. -All input edges are assumed to be spherical geodesics, and *not* planar straight -lines. For reading data in a planar projection, consider using -[`ST_GEOGFROMGEOJSON`][st-geogfromgeojson]. +Note: ZetaSQL provides regular expression support using the +[re2][string-link-to-re2] library; see that documentation for its +regular expression syntax. **Return type** -`GEOGRAPHY` - -[wkb-link]: https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary - -[st-asbinary]: #st_asbinary - -[st-geogfromgeojson]: #st_geogfromgeojson +`BOOL` -### `ST_GEOGPOINT` +**Examples** ```sql -ST_GEOGPOINT(longitude, latitude) -``` - -**Description** - -Creates a `GEOGRAPHY` with a single point. `ST_GEOGPOINT` creates a point from -the specified `DOUBLE` longitude (in degrees, -negative west of the Prime Meridian, positive east) and latitude (in degrees, -positive north of the Equator, negative south) parameters and returns that point -in a `GEOGRAPHY` value. - -NOTE: Some systems present latitude first; take care with argument order. 
+SELECT + email, + REGEXP_CONTAINS(email, r'@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+') AS is_valid +FROM + (SELECT + ['foo@example.com', 'bar@example.org', 'www.example.net'] + AS addresses), + UNNEST(addresses) AS email; -**Constraints** +/*-----------------+----------* + | email | is_valid | + +-----------------+----------+ + | foo@example.com | true | + | bar@example.org | true | + | www.example.net | false | + *-----------------+----------*/ -+ Longitudes outside the range \[-180, 180\] are allowed; `ST_GEOGPOINT` uses - the input longitude modulo 360 to obtain a longitude within \[-180, 180\]. -+ Latitudes must be in the range \[-90, 90\]. Latitudes outside this range - will result in an error. +-- Performs a full match, using ^ and $. Due to regular expression operator +-- precedence, it is good practice to use parentheses around everything between ^ +-- and $. +SELECT + email, + REGEXP_CONTAINS(email, r'^([\w.+-]+@foo\.com|[\w.+-]+@bar\.org)$') + AS valid_email_address, + REGEXP_CONTAINS(email, r'^[\w.+-]+@foo\.com|[\w.+-]+@bar\.org$') + AS without_parentheses +FROM + (SELECT + ['a@foo.com', 'a@foo.computer', 'b@bar.org', '!b@bar.org', 'c@buz.net'] + AS addresses), + UNNEST(addresses) AS email; -**Return type** +/*----------------+---------------------+---------------------* + | email | valid_email_address | without_parentheses | + +----------------+---------------------+---------------------+ + | a@foo.com | true | true | + | a@foo.computer | false | true | + | b@bar.org | true | true | + | !b@bar.org | false | true | + | c@buz.net | false | false | + *----------------+---------------------+---------------------*/ +``` -Point `GEOGRAPHY` +[string-link-to-re2]: https://github.com/google/re2/wiki/Syntax -### `ST_GEOGPOINTFROMGEOHASH` +### `REGEXP_EXTRACT` ```sql -ST_GEOGPOINTFROMGEOHASH(geohash) +REGEXP_EXTRACT(value, regexp) ``` **Description** -Returns a `GEOGRAPHY` value that corresponds to a -point in the middle of a bounding box defined in the 
[GeoHash][geohash-link]. +Returns the first substring in `value` that matches the +[re2 regular expression][string-link-to-re2], +`regexp`. Returns `NULL` if there is no match. -**Return type** +If the regular expression contains a capturing group (`(...)`), and there is a +match for that capturing group, that match is returned. If there +are multiple matches for a capturing group, the first match is returned. -Point `GEOGRAPHY` +Returns an error if: -[geohash-link]: https://en.wikipedia.org/wiki/Geohash ++ The regular expression is invalid ++ The regular expression has more than one capturing group -### `ST_GEOHASH` +**Return type** -```sql -ST_GEOHASH(geography_expression[, maxchars]) -``` +`STRING` or `BYTES` -**Description** +**Examples** -Takes a single-point `GEOGRAPHY` and returns a [GeoHash][geohash-link] -representation of that `GEOGRAPHY` object. +```sql +WITH email_addresses AS + (SELECT 'foo@example.com' as email + UNION ALL + SELECT 'bar@example.org' as email + UNION ALL + SELECT 'baz@example.net' as email) -+ `geography_expression`: Represents a `GEOGRAPHY` object. Only a `GEOGRAPHY` - object that represents a single point is supported. If `ST_GEOHASH` is used - over an empty `GEOGRAPHY` object, returns `NULL`. -+ `maxchars`: This optional `INT64` parameter specifies the maximum number of - characters the hash will contain. Fewer characters corresponds to lower - precision (or, described differently, to a bigger bounding box). `maxchars` - defaults to 20 if not explicitly specified. A valid `maxchars` value is 1 - to 20. Any value below or above is considered unspecified and the default of - 20 is used. 
+SELECT + REGEXP_EXTRACT(email, r'^[a-zA-Z0-9_.+-]+') + AS user_name +FROM email_addresses; -**Return type** +/*-----------* + | user_name | + +-----------+ + | foo | + | bar | + | baz | + *-----------*/ +``` -`STRING` +```sql +WITH email_addresses AS + (SELECT 'foo@example.com' as email + UNION ALL + SELECT 'bar@example.org' as email + UNION ALL + SELECT 'baz@example.net' as email) -**Example** +SELECT + REGEXP_EXTRACT(email, r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.([a-zA-Z0-9-.]+$)') + AS top_level_domain +FROM email_addresses; -Returns a GeoHash of the Seattle Center with 10 characters of precision. +/*------------------* + | top_level_domain | + +------------------+ + | com | + | org | + | net | + *------------------*/ +``` ```sql -SELECT ST_GEOHASH(ST_GEOGPOINT(-122.35, 47.62), 10) geohash +WITH + characters AS ( + SELECT 'ab' AS value, '.b' AS regex UNION ALL + SELECT 'ab' AS value, '(.)b' AS regex UNION ALL + SELECT 'xyztb' AS value, '(.)+b' AS regex UNION ALL + SELECT 'ab' AS value, '(z)?b' AS regex + ) +SELECT value, regex, REGEXP_EXTRACT(value, regex) AS result FROM characters; -/*--------------* - | geohash | - +--------------+ - | c22yzugqw7 | - *--------------*/ +/*-------+---------+----------* + | value | regex | result | + +-------+---------+----------+ + | ab | .b | ab | + | ab | (.)b | a | + | xyztb | (.)+b | t | + | ab | (z)?b | NULL | + *-------+---------+----------*/ ``` -[geohash-link]: https://en.wikipedia.org/wiki/Geohash +[string-link-to-re2]: https://github.com/google/re2/wiki/Syntax -### `ST_GEOMETRYTYPE` +### `REGEXP_EXTRACT_ALL` ```sql -ST_GEOMETRYTYPE(geography_expression) +REGEXP_EXTRACT_ALL(value, regexp) ``` **Description** -Returns the [Open Geospatial Consortium][ogc-link] (OGC) geometry type that -describes the input `GEOGRAPHY`. The OGC geometry type matches the -types that are used in [WKT][wkt-link] and [GeoJSON][geojson-link] formats and -printed for [ST_ASTEXT][st-astext] and [ST_ASGEOJSON][st-asgeojson]. 
-`ST_GEOMETRYTYPE` returns the OGC geometry type with the "ST_" prefix. +Returns an array of all substrings of `value` that match the +[re2 regular expression][string-link-to-re2], `regexp`. Returns an empty array +if there is no match. -`ST_GEOMETRYTYPE` returns the following given the type on the input: +If the regular expression contains a capturing group (`(...)`), and there is a +match for that capturing group, that match is added to the results. -+ Single point geography: Returns `ST_Point`. -+ Collection of only points: Returns `ST_MultiPoint`. -+ Single linestring geography: Returns `ST_LineString`. -+ Collection of only linestrings: Returns `ST_MultiLineString`. -+ Single polygon geography: Returns `ST_Polygon`. -+ Collection of only polygons: Returns `ST_MultiPolygon`. -+ Collection with elements of different dimensions, or the input is the empty - geography: Returns `ST_GeometryCollection`. +The `REGEXP_EXTRACT_ALL` function only returns non-overlapping matches. For +example, using this function to extract `ana` from `banana` returns only one +substring, not two. -**Return type** +Returns an error if: -`STRING` ++ The regular expression is invalid ++ The regular expression has more than one capturing group -**Example** +**Return type** -The following example shows how `ST_GEOMETRYTYPE` takes geographies and returns -the names of their OGC geometry types. 
+`ARRAY` or `ARRAY` + +**Examples** ```sql -WITH example AS( - SELECT ST_GEOGFROMTEXT('POINT(0 1)') AS geography - UNION ALL - SELECT ST_GEOGFROMTEXT('MULTILINESTRING((2 2, 3 4), (5 6, 7 7))') - UNION ALL - SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(MULTIPOINT(-1 2, 0 12), LINESTRING(-2 4, 0 6))') - UNION ALL - SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION EMPTY')) +WITH code_markdown AS + (SELECT 'Try `function(x)` or `function(y)`' as code) + SELECT - geography AS WKT, - ST_GEOMETRYTYPE(geography) AS geometry_type_name -FROM example; + REGEXP_EXTRACT_ALL(code, '`(.+?)`') AS example +FROM code_markdown; -/*-------------------------------------------------------------------+-----------------------* - | WKT | geometry_type_name | - +-------------------------------------------------------------------+-----------------------+ - | POINT(0 1) | ST_Point | - | MULTILINESTRING((2 2, 3 4), (5 6, 7 7)) | ST_MultiLineString | - | GEOMETRYCOLLECTION(MULTIPOINT(-1 2, 0 12), LINESTRING(-2 4, 0 6)) | ST_GeometryCollection | - | GEOMETRYCOLLECTION EMPTY | ST_GeometryCollection | - *-------------------------------------------------------------------+-----------------------*/ +/*----------------------------* + | example | + +----------------------------+ + | [function(x), function(y)] | + *----------------------------*/ ``` -[ogc-link]: https://www.ogc.org/standards/sfa - -[wkt-link]: https://en.wikipedia.org/wiki/Well-known_text +[string-link-to-re2]: https://github.com/google/re2/wiki/Syntax -[geojson-link]: https://en.wikipedia.org/wiki/GeoJSON +### `REGEXP_INSTR` -[st-astext]: #st_astext +```sql +REGEXP_INSTR(source_value, regexp [, position[, occurrence, [occurrence_position]]]) +``` -[st-asgeojson]: #st_asgeojson +**Description** -### `ST_HAUSDORFFDISTANCE` +Returns the lowest 1-based position of a regular expression, `regexp`, in +`source_value`. `source_value` and `regexp` must be the same type, either +`STRING` or `BYTES`. 
-```sql -ST_HAUSDORFFDISTANCE(geography_1, geography_2) -``` +If `position` is specified, the search starts at this position in +`source_value`, otherwise it starts at `1`, which is the beginning of +`source_value`. `position` is of type `INT64` and must be positive. -```sql -ST_HAUSDORFFDISTANCE(geography_1, geography_2, directed=>{ TRUE | FALSE }) -``` +If `occurrence` is specified, the search returns the position of a specific +instance of `regexp` in `source_value`. If not specified, `occurrence` defaults +to `1` and returns the position of the first occurrence. For `occurrence` > 1, +the function searches for the next, non-overlapping occurrence. +`occurrence` is of type `INT64` and must be positive. -**Description** +You can optionally use `occurrence_position` to specify where a position +in relation to an `occurrence` starts. Your choices are: -Gets the discrete [Hausdorff distance][h-distance], which is the greatest of all -the distances from a discrete point in one geography to the closest -discrete point in another geography. ++ `0`: Returns the start position of `occurrence`. ++ `1`: Returns the end position of `occurrence` + `1`. If the + end of the occurrence is at the end of `source_value `, + `LENGTH(source_value) + 1` is returned. -**Definitions** +Returns `0` if: -+ `geography_1`: A `GEOGRAPHY` value that represents the first geography. -+ `geography_2`: A `GEOGRAPHY` value that represents the second geography. -+ `directed`: Optional, required named argument that represents the type of - computation to use on the input geographies. If this argument is not - specified, `directed=>FALSE` is used by default. ++ No match is found. ++ If `occurrence` is greater than the number of matches found. ++ If `position` is greater than the length of `source_value`. ++ The regular expression is empty. - + `FALSE`: The largest Hausdorff distance found in - (`geography_1`, `geography_2`) and - (`geography_2`, `geography_1`). 
+Returns `NULL` if: - + `TRUE` (default): The Hausdorff distance for - (`geography_1`, `geography_2`). ++ `position` is `NULL`. ++ `occurrence` is `NULL`. -**Details** +Returns an error if: -If an input geography is `NULL`, the function returns `NULL`. ++ `position` is `0` or negative. ++ `occurrence` is `0` or negative. ++ `occurrence_position` is neither `0` nor `1`. ++ The regular expression is invalid. ++ The regular expression has more than one capturing group. **Return type** -`DOUBLE` - -**Example** +`INT64` -The following query gets the Hausdorff distance between `geo1` and `geo2`: +**Examples** ```sql -WITH data AS ( - SELECT - ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1, - ST_GEOGFROMTEXT('LINESTRING(20 90, 30 90, 60 10, 90 10)') AS geo2 -) -SELECT ST_HAUSDORFFDISTANCE(geo1, geo2, directed=>TRUE) AS distance -FROM data; +WITH example AS ( + SELECT 'ab@cd-ef' AS source_value, '@[^-]*' AS regexp UNION ALL + SELECT 'ab@d-ef', '@[^-]*' UNION ALL + SELECT 'abc@cd-ef', '@[^-]*' UNION ALL + SELECT 'abc-ef', '@[^-]*') +SELECT source_value, regexp, REGEXP_INSTR(source_value, regexp) AS instr +FROM example; -/*--------------------+ - | distance | - +--------------------+ - | 1688933.9832041925 | - +--------------------*/ +/*--------------+--------+-------* + | source_value | regexp | instr | + +--------------+--------+-------+ + | ab@cd-ef | @[^-]* | 3 | + | ab@d-ef | @[^-]* | 3 | + | abc@cd-ef | @[^-]* | 4 | + | abc-ef | @[^-]* | 0 | + *--------------+--------+-------*/ ``` -The following query gets the Hausdorff distance between `geo2` and `geo1`: - ```sql -WITH data AS ( - SELECT - ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1, - ST_GEOGFROMTEXT('LINESTRING(20 90, 30 90, 60 10, 90 10)') AS geo2 -) -SELECT ST_HAUSDORFFDISTANCE(geo2, geo1, directed=>TRUE) AS distance -FROM data; +WITH example AS ( + SELECT 'a@cd-ef b@cd-ef' AS source_value, '@[^-]*' AS regexp, 1 AS position UNION ALL + SELECT 'a@cd-ef b@cd-ef', '@[^-]*', 2 
UNION ALL + SELECT 'a@cd-ef b@cd-ef', '@[^-]*', 3 UNION ALL + SELECT 'a@cd-ef b@cd-ef', '@[^-]*', 4) +SELECT + source_value, regexp, position, + REGEXP_INSTR(source_value, regexp, position) AS instr +FROM example; -/*--------------------+ - | distance | - +--------------------+ - | 5802892.745488612 | - +--------------------*/ +/*-----------------+--------+----------+-------* + | source_value | regexp | position | instr | + +-----------------+--------+----------+-------+ + | a@cd-ef b@cd-ef | @[^-]* | 1 | 2 | + | a@cd-ef b@cd-ef | @[^-]* | 2 | 2 | + | a@cd-ef b@cd-ef | @[^-]* | 3 | 10 | + | a@cd-ef b@cd-ef | @[^-]* | 4 | 10 | + *-----------------+--------+----------+-------*/ ``` -The following query gets the largest Hausdorff distance between -(`geo1` and `geo2`) and (`geo2` and `geo1`): - ```sql -WITH data AS ( - SELECT - ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1, - ST_GEOGFROMTEXT('LINESTRING(20 90, 30 90, 60 10, 90 10)') AS geo2 -) -SELECT ST_HAUSDORFFDISTANCE(geo1, geo2, directed=>FALSE) AS distance -FROM data; +WITH example AS ( + SELECT 'a@cd-ef b@cd-ef c@cd-ef' AS source_value, + '@[^-]*' AS regexp, 1 AS position, 1 AS occurrence UNION ALL + SELECT 'a@cd-ef b@cd-ef c@cd-ef', '@[^-]*', 1, 2 UNION ALL + SELECT 'a@cd-ef b@cd-ef c@cd-ef', '@[^-]*', 1, 3) +SELECT + source_value, regexp, position, occurrence, + REGEXP_INSTR(source_value, regexp, position, occurrence) AS instr +FROM example; -/*--------------------+ - | distance | - +--------------------+ - | 5802892.745488612 | - +--------------------*/ +/*-------------------------+--------+----------+------------+-------* + | source_value | regexp | position | occurrence | instr | + +-------------------------+--------+----------+------------+-------+ + | a@cd-ef b@cd-ef c@cd-ef | @[^-]* | 1 | 1 | 2 | + | a@cd-ef b@cd-ef c@cd-ef | @[^-]* | 1 | 2 | 10 | + | a@cd-ef b@cd-ef c@cd-ef | @[^-]* | 1 | 3 | 18 | + *-------------------------+--------+----------+------------+-------*/ ``` -The 
following query produces the same results as the previous query because -`ST_HAUSDORFFDISTANCE` uses `directed=>FALSE` by default. +```sql +WITH example AS ( + SELECT 'a@cd-ef' AS source_value, '@[^-]*' AS regexp, + 1 AS position, 1 AS occurrence, 0 AS o_position UNION ALL + SELECT 'a@cd-ef', '@[^-]*', 1, 1, 1) +SELECT + source_value, regexp, position, occurrence, o_position, + REGEXP_INSTR(source_value, regexp, position, occurrence, o_position) AS instr +FROM example; -```sql -WITH data AS ( - SELECT - ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1, - ST_GEOGFROMTEXT('LINESTRING(20 90, 30 90, 60 10, 90 10)') AS geo2 -) -SELECT ST_HAUSDORFFDISTANCE(geo1, geo2) AS distance -FROM data; +/*--------------+--------+----------+------------+------------+-------* + | source_value | regexp | position | occurrence | o_position | instr | + +--------------+--------+----------+------------+------------+-------+ + | a@cd-ef | @[^-]* | 1 | 1 | 0 | 2 | + | a@cd-ef | @[^-]* | 1 | 1 | 1 | 5 | + *--------------+--------+----------+------------+------------+-------*/ ``` -[h-distance]: http://en.wikipedia.org/wiki/Hausdorff_distance - -### `ST_INTERIORRINGS` +### `REGEXP_MATCH` (Deprecated) + ```sql -ST_INTERIORRINGS(polygon_geography) +REGEXP_MATCH(value, regexp) ``` **Description** -Returns an array of linestring geographies that corresponds to the interior -rings of a polygon geography. Each interior ring is the border of a hole within -the input polygon. +Returns `TRUE` if `value` is a full match for the regular expression, `regexp`. -+ If the input geography is a polygon, excludes the outermost ring of the - polygon geography and returns the linestrings corresponding to the interior - rings. -+ If the input is the full `GEOGRAPHY`, returns an empty array. -+ If the input polygon has no holes, returns an empty array. -+ Returns an error if the input is not a single polygon. +If the `regexp` argument is invalid, the function returns an error. 
-Use the `SAFE` prefix to return `NULL` for invalid input instead of an error. +This function is deprecated. When possible, use +[`REGEXP_CONTAINS`][regexp-contains] to find a partial match for a +regular expression. + +Note: ZetaSQL provides regular expression support using the +[re2][string-link-to-re2] library; see that documentation for its +regular expression syntax. **Return type** -`ARRAY` +`BOOL` **Examples** ```sql -WITH geo AS ( - SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 1 1, 1 2, 0 0))') AS g UNION ALL - SELECT ST_GEOGFROMTEXT('POLYGON((1 1, 1 10, 5 10, 5 1, 1 1), (2 2, 3 4, 2 4, 2 2))') UNION ALL - SELECT ST_GEOGFROMTEXT('POLYGON((1 1, 1 10, 5 10, 5 1, 1 1), (2 2.5, 3.5 3, 2.5 2, 2 2.5), (3.5 7, 4 6, 3 3, 3.5 7))') UNION ALL - SELECT ST_GEOGFROMTEXT('fullglobe') UNION ALL - SELECT NULL) -SELECT ST_INTERIORRINGS(g) AS rings FROM geo; +WITH email_addresses AS + (SELECT 'foo@example.com' as email + UNION ALL + SELECT 'bar@example.org' as email + UNION ALL + SELECT 'notavalidemailaddress' as email) -/*----------------------------------------------------------------------------* - | rings | - +----------------------------------------------------------------------------+ - | [] | - | [LINESTRING(2 2, 3 4, 2 4, 2 2)] | - | [LINESTRING(2.5 2, 3.5 3, 2 2.5, 2.5 2), LINESTRING(3 3, 4 6, 3.5 7, 3 3)] | - | [] | - | NULL | - *----------------------------------------------------------------------------*/ +SELECT + email, + REGEXP_MATCH(email, + r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+') + AS valid_email_address +FROM email_addresses; + +/*-----------------------+---------------------* + | email | valid_email_address | + +-----------------------+---------------------+ + | foo@example.com | true | + | bar@example.org | true | + | notavalidemailaddress | false | + *-----------------------+---------------------*/ ``` -### `ST_INTERSECTION` +[string-link-to-re2]: https://github.com/google/re2/wiki/Syntax + +[regexp-contains]: #regexp_contains + +### `REGEXP_REPLACE` 
```sql -ST_INTERSECTION(geography_1, geography_2) +REGEXP_REPLACE(value, regexp, replacement) ``` **Description** -Returns a `GEOGRAPHY` that represents the point set -intersection of the two input `GEOGRAPHY`s. Thus, -every point in the intersection appears in both `geography_1` and `geography_2`. +Returns a `STRING` where all substrings of `value` that +match regular expression `regexp` are replaced with `replacement`. -If the two input `GEOGRAPHY`s are disjoint, that is, -there are no points that appear in both input `geometry_1` and `geometry_2`, -then an empty `GEOGRAPHY` is returned. +You can use backslashed-escaped digits (\1 to \9) within the `replacement` +argument to insert text matching the corresponding parenthesized group in the +`regexp` pattern. Use \0 to refer to the entire matching text. -See [ST_INTERSECTS][st-intersects], [ST_DISJOINT][st-disjoint] for related -predicate functions. +To add a backslash in your regular expression, you must first escape it. For +example, `SELECT REGEXP_REPLACE('abc', 'b(.)', 'X\\1');` returns `aXc`. You can +also use [raw strings][string-link-to-lexical-literals] to remove one layer of +escaping, for example `SELECT REGEXP_REPLACE('abc', 'b(.)', r'X\1');`. + +The `REGEXP_REPLACE` function only replaces non-overlapping matches. For +example, replacing `ana` within `banana` results in only one replacement, not +two. + +If the `regexp` argument is not a valid regular expression, this function +returns an error. + +Note: ZetaSQL provides regular expression support using the +[re2][string-link-to-re2] library; see that documentation for its +regular expression syntax. **Return type** -`GEOGRAPHY` +`STRING` or `BYTES` -[st-intersects]: #st_intersects +**Examples** -[st-disjoint]: #st_disjoint +```sql +WITH markdown AS + (SELECT '# Heading' as heading + UNION ALL + SELECT '# Another heading' as heading) -### `ST_INTERSECTS` +SELECT + REGEXP_REPLACE(heading, r'^# ([a-zA-Z0-9\s]+$)', '

<h1>\\1</h1>

') + AS html +FROM markdown; + +/*--------------------------* + | html | + +--------------------------+ + |

<h1>Heading</h1>

| + |

<h1>Another heading</h1>

| + *--------------------------*/ +``` + +[string-link-to-re2]: https://github.com/google/re2/wiki/Syntax + +[string-link-to-lexical-literals]: https://github.com/google/zetasql/blob/master/docs/lexical.md#string_and_bytes_literals + +### `REPEAT` ```sql -ST_INTERSECTS(geography_1, geography_2) +REPEAT(original_value, repetitions) ``` **Description** -Returns `TRUE` if the point set intersection of `geography_1` and `geography_2` -is non-empty. Thus, this function returns `TRUE` if there is at least one point -that appears in both input `GEOGRAPHY`s. +Returns a `STRING` or `BYTES` value that consists of `original_value`, repeated. +The `repetitions` parameter specifies the number of times to repeat +`original_value`. Returns `NULL` if either `original_value` or `repetitions` +are `NULL`. -If `ST_INTERSECTS` returns `TRUE`, it implies that [`ST_DISJOINT`][st-disjoint] -returns `FALSE`. +This function returns an error if the `repetitions` value is negative. **Return type** -`BOOL` - -[st-disjoint]: #st_disjoint +`STRING` or `BYTES` -### `ST_INTERSECTSBOX` +**Examples** ```sql -ST_INTERSECTSBOX(geography, lng1, lat1, lng2, lat2) +SELECT t, n, REPEAT(t, n) AS REPEAT FROM UNNEST([ + STRUCT('abc' AS t, 3 AS n), + ('例å­', 2), + ('abc', null), + (null, 3) +]); + +/*------+------+-----------* + | t | n | REPEAT | + |------|------|-----------| + | abc | 3 | abcabcabc | + | ä¾‹å­ | 2 | 例å­ä¾‹å­ | + | abc | NULL | NULL | + | NULL | 3 | NULL | + *------+------+-----------*/ ``` -**Description** +### `REPLACE` -Returns `TRUE` if `geography` intersects the rectangle between `[lng1, lng2]` -and `[lat1, lat2]`. The edges of the rectangle follow constant lines of -longitude and latitude. `lng1` and `lng2` specify the westmost and eastmost -constant longitude lines that bound the rectangle, and `lat1` and `lat2` specify -the minimum and maximum constant latitude lines that bound the rectangle. 
+```sql +REPLACE(original_value, from_pattern, to_pattern) +``` -Specify all longitude and latitude arguments in degrees. +**Description** -**Constraints** +Replaces all occurrences of `from_pattern` with `to_pattern` in +`original_value`. If `from_pattern` is empty, no replacement is made. -The input arguments are subject to the following constraints: +This function supports specifying [collation][collation]. -+ Latitudes should be in the `[-90, 90]` degree range. -+ Longitudes should follow either of the following rules: - + Both longitudes are in the `[-180, 180]` degree range. - + One of the longitudes is in the `[-180, 180]` degree range, and - `lng2 - lng1` is in the `[0, 360]` interval. +[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about **Return type** -`BOOL` +`STRING` or `BYTES` -**Example** +**Examples** ```sql -SELECT p, ST_INTERSECTSBOX(p, -90, 0, 90, 20) AS box1, - ST_INTERSECTSBOX(p, 90, 0, -90, 20) AS box2 -FROM UNNEST([ST_GEOGPOINT(10, 10), ST_GEOGPOINT(170, 10), - ST_GEOGPOINT(30, 30)]) p +WITH desserts AS + (SELECT 'apple pie' as dessert + UNION ALL + SELECT 'blackberry pie' as dessert + UNION ALL + SELECT 'cherry pie' as dessert) -/*----------------+--------------+--------------* - | p | box1 | box2 | - +----------------+--------------+--------------+ - | POINT(10 10) | TRUE | FALSE | - | POINT(170 10) | FALSE | TRUE | - | POINT(30 30) | FALSE | FALSE | - *----------------+--------------+--------------*/ +SELECT + REPLACE (dessert, 'pie', 'cobbler') as example +FROM desserts; + +/*--------------------* + | example | + +--------------------+ + | apple cobbler | + | blackberry cobbler | + | cherry cobbler | + *--------------------*/ ``` -### `ST_ISCLOSED` +### `REVERSE` ```sql -ST_ISCLOSED(geography_expression) +REVERSE(value) ``` **Description** -Returns `TRUE` for a non-empty Geography, where each element in the Geography -has an empty boundary. 
The boundary for each element can be defined with -[`ST_BOUNDARY`][st-boundary]. - -+ A point is closed. -+ A linestring is closed if the start and end points of the linestring are - the same. -+ A polygon is closed only if it is a full polygon. -+ A collection is closed if and only if every element in the collection is - closed. - -An empty `GEOGRAPHY` is not closed. +Returns the reverse of the input `STRING` or `BYTES`. **Return type** -`BOOL` +`STRING` or `BYTES` -**Example** +**Examples** ```sql -WITH example AS( - SELECT ST_GEOGFROMTEXT('POINT(5 0)') AS geography - UNION ALL - SELECT ST_GEOGFROMTEXT('LINESTRING(0 1, 4 3, 2 6, 0 1)') AS geography - UNION ALL - SELECT ST_GEOGFROMTEXT('LINESTRING(2 6, 1 3, 3 9)') AS geography - UNION ALL - SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1))') AS geography - UNION ALL - SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION EMPTY')) +WITH example AS ( + SELECT 'foo' AS sample_string, b'bar' AS sample_bytes UNION ALL + SELECT 'абвгд' AS sample_string, b'123' AS sample_bytes +) SELECT - geography, - ST_ISCLOSED(geography) AS is_closed, + sample_string, + REVERSE(sample_string) AS reverse_string, + sample_bytes, + REVERSE(sample_bytes) AS reverse_bytes FROM example; -/*------------------------------------------------------+-----------* - | geography | is_closed | - +------------------------------------------------------+-----------+ - | POINT(5 0) | TRUE | - | LINESTRING(0 1, 4 3, 2 6, 0 1) | TRUE | - | LINESTRING(2 6, 1 3, 3 9) | FALSE | - | GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1)) | FALSE | - | GEOMETRYCOLLECTION EMPTY | FALSE | - *------------------------------------------------------+-----------*/ +/*---------------+----------------+--------------+---------------* + | sample_string | reverse_string | sample_bytes | reverse_bytes | + +---------------+----------------+--------------+---------------+ + | foo | oof | bar | rab | + | абвгд | дгвба | 123 | 321 | + 
*---------------+----------------+--------------+---------------*/ ``` -[st-boundary]: #st_boundary - -### `ST_ISCOLLECTION` +### `RIGHT` ```sql -ST_ISCOLLECTION(geography_expression) +RIGHT(value, length) ``` **Description** -Returns `TRUE` if the total number of points, linestrings, and polygons is -greater than one. +Returns a `STRING` or `BYTES` value that consists of the specified +number of rightmost characters or bytes from `value`. The `length` is an +`INT64` that specifies the length of the returned +value. If `value` is `BYTES`, `length` is the number of rightmost bytes to +return. If `value` is `STRING`, `length` is the number of rightmost characters +to return. -An empty `GEOGRAPHY` is not a collection. +If `length` is 0, an empty `STRING` or `BYTES` value will be +returned. If `length` is negative, an error will be returned. If `length` +exceeds the number of characters or bytes from `value`, the original `value` +will be returned. **Return type** -`BOOL` +`STRING` or `BYTES` -### `ST_ISEMPTY` +**Examples** ```sql -ST_ISEMPTY(geography_expression) +WITH examples AS +(SELECT 'apple' as example +UNION ALL +SELECT 'banana' as example +UNION ALL +SELECT 'абвгд' as example +) +SELECT example, RIGHT(example, 3) AS right_example +FROM examples; + +/*---------+---------------* + | example | right_example | + +---------+---------------+ + | apple | ple | + | banana | ana | + | абвгд | вгд | + *---------+---------------*/ +``` + +```sql +WITH examples AS +(SELECT b'apple' as example +UNION ALL +SELECT b'banana' as example +UNION ALL +SELECT b'\xab\xcd\xef\xaa\xbb' as example +) +SELECT example, RIGHT(example, 3) AS right_example +FROM examples; + +/*----------------------+---------------* + | example | right_example | + +----------------------+---------------+ + | apple | ple | + | banana | ana | + | \xab\xcd\xef\xaa\xbb | \xef\xaa\xbb | + *----------------------+---------------* +``` + +### `RPAD` + +```sql +RPAD(original_value, return_length[, pattern]) ``` 
**Description** -Returns `TRUE` if the given `GEOGRAPHY` is empty; that is, the `GEOGRAPHY` does -not contain any points, lines, or polygons. +Returns a `STRING` or `BYTES` value that consists of `original_value` appended +with `pattern`. The `return_length` parameter is an +`INT64` that specifies the length of the +returned value. If `original_value` is `BYTES`, +`return_length` is the number of bytes. If `original_value` is `STRING`, +`return_length` is the number of characters. -NOTE: An empty `GEOGRAPHY` is not associated with a particular geometry shape. -For example, the results of expressions `ST_GEOGFROMTEXT('POINT EMPTY')` and -`ST_GEOGFROMTEXT('GEOMETRYCOLLECTION EMPTY')` are identical. +The default value of `pattern` is a blank space. + +Both `original_value` and `pattern` must be the same data type. + +If `return_length` is less than or equal to the `original_value` length, this +function returns the `original_value` value, truncated to the value of +`return_length`. For example, `RPAD('hello world', 7);` returns `'hello w'`. + +If `original_value`, `return_length`, or `pattern` is `NULL`, this function +returns `NULL`. 
+ +This function returns an error if: + ++ `return_length` is negative ++ `pattern` is empty **Return type** -`BOOL` +`STRING` or `BYTES` -### `ST_ISRING` +**Examples** ```sql -ST_ISRING(geography_expression) +SELECT t, len, FORMAT('%T', RPAD(t, len)) AS RPAD FROM UNNEST([ + STRUCT('abc' AS t, 5 AS len), + ('abc', 2), + ('例å­', 4) +]); + +/*------+-----+----------* + | t | len | RPAD | + +------+-----+----------+ + | abc | 5 | "abc " | + | abc | 2 | "ab" | + | ä¾‹å­ | 4 | "ä¾‹å­ " | + *------+-----+----------*/ ``` -**Description** +```sql +SELECT t, len, pattern, FORMAT('%T', RPAD(t, len, pattern)) AS RPAD FROM UNNEST([ + STRUCT('abc' AS t, 8 AS len, 'def' AS pattern), + ('abc', 5, '-'), + ('例å­', 5, '中文') +]); -Returns `TRUE` if the input `GEOGRAPHY` is a linestring and if the -linestring is both [`ST_ISCLOSED`][st-isclosed] and -simple. A linestring is considered simple if it does not pass through the -same point twice (with the exception of the start and endpoint, which may -overlap to form a ring). +/*------+-----+---------+--------------* + | t | len | pattern | RPAD | + +------+-----+---------+--------------+ + | abc | 8 | def | "abcdefde" | + | abc | 5 | - | "abc--" | + | ä¾‹å­ | 5 | 中文 | "例å­ä¸­æ–‡ä¸­" | + *------+-----+---------+--------------*/ +``` -An empty `GEOGRAPHY` is not a ring. 
+```sql +SELECT FORMAT('%T', t) AS t, len, FORMAT('%T', RPAD(t, len)) AS RPAD FROM UNNEST([ + STRUCT(b'abc' AS t, 5 AS len), + (b'abc', 2), + (b'\xab\xcd\xef', 4) +]); -**Return type** +/*-----------------+-----+------------------* + | t | len | RPAD | + +-----------------+-----+------------------+ + | b"abc" | 5 | b"abc " | + | b"abc" | 2 | b"ab" | + | b"\xab\xcd\xef" | 4 | b"\xab\xcd\xef " | + *-----------------+-----+------------------*/ +``` -`BOOL` +```sql +SELECT + FORMAT('%T', t) AS t, + len, + FORMAT('%T', pattern) AS pattern, + FORMAT('%T', RPAD(t, len, pattern)) AS RPAD +FROM UNNEST([ + STRUCT(b'abc' AS t, 8 AS len, b'def' AS pattern), + (b'abc', 5, b'-'), + (b'\xab\xcd\xef', 5, b'\x00') +]); -[st-isclosed]: #st_isclosed +/*-----------------+-----+---------+-------------------------* + | t | len | pattern | RPAD | + +-----------------+-----+---------+-------------------------+ + | b"abc" | 8 | b"def" | b"abcdefde" | + | b"abc" | 5 | b"-" | b"abc--" | + | b"\xab\xcd\xef" | 5 | b"\x00" | b"\xab\xcd\xef\x00\x00" | + *-----------------+-----+---------+-------------------------*/ +``` -### `ST_LENGTH` +### `RTRIM` ```sql -ST_LENGTH(geography_expression[, use_spheroid]) +RTRIM(value1[, value2]) ``` **Description** -Returns the total length in meters of the lines in the input -`GEOGRAPHY`. +Identical to [TRIM][string-link-to-trim], but only removes trailing characters. -If `geography_expression` is a point or a polygon, returns zero. If -`geography_expression` is a collection, returns the length of the lines in the -collection; if the collection does not contain lines, returns zero. +**Return type** -The optional `use_spheroid` parameter determines how this function measures -distance. If `use_spheroid` is `FALSE`, the function measures distance on the -surface of a perfect sphere. +`STRING` or `BYTES` -The `use_spheroid` parameter currently only supports -the value `FALSE`. The default value of `use_spheroid` is `FALSE`. 
+**Examples** -**Return type** +```sql +WITH items AS + (SELECT '***apple***' as item + UNION ALL + SELECT '***banana***' as item + UNION ALL + SELECT '***orange***' as item) -`DOUBLE` +SELECT + RTRIM(item, '*') as example +FROM items; + +/*-----------* + | example | + +-----------+ + | ***apple | + | ***banana | + | ***orange | + *-----------*/ +``` + +```sql +WITH items AS + (SELECT 'applexxx' as item + UNION ALL + SELECT 'bananayyy' as item + UNION ALL + SELECT 'orangezzz' as item + UNION ALL + SELECT 'pearxyz' as item) + +SELECT + RTRIM(item, 'xyz') as example +FROM items; + +/*---------* + | example | + +---------+ + | apple | + | banana | + | orange | + | pear | + *---------*/ +``` -[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System +[string-link-to-trim]: #trim -### `ST_LINELOCATEPOINT` +### `SAFE_CONVERT_BYTES_TO_STRING` ```sql -ST_LINELOCATEPOINT(linestring_geography, point_geography) +SAFE_CONVERT_BYTES_TO_STRING(value) ``` **Description** -Gets a section of a linestring between the start point and a selected point (a -point on the linestring closest to the `point_geography` argument). Returns the -percentage that this section represents in the linestring. - -Details: - -+ To select a point on the linestring `GEOGRAPHY` (`linestring_geography`), - this function takes a point `GEOGRAPHY` (`point_geography`) and finds the - [closest point][st-closestpoint] to it on the linestring. -+ If two points on `linestring_geography` are an equal distance away from - `point_geography`, it is not guaranteed which one will be selected. -+ The return value is an inclusive value between 0 and 1 (0-100%). -+ If the selected point is the start point on the linestring, function returns - 0 (0%). -+ If the selected point is the end point on the linestring, function returns 1 - (100%). - -`NULL` and error handling: - -+ Returns `NULL` if any input argument is `NULL`. 
-+ Returns an error if `linestring_geography` is not a linestring or if - `point_geography` is not a point. Use the `SAFE` prefix - to obtain `NULL` for invalid input instead of an error. +Converts a sequence of `BYTES` to a `STRING`. Any invalid UTF-8 characters are +replaced with the Unicode replacement character, `U+FFFD`. -**Return Type** +**Return type** -`DOUBLE` +`STRING` **Examples** -```sql -WITH geos AS ( - SELECT ST_GEOGPOINT(0, 0) AS point UNION ALL - SELECT ST_GEOGPOINT(1, 0) UNION ALL - SELECT ST_GEOGPOINT(1, 1) UNION ALL - SELECT ST_GEOGPOINT(2, 2) UNION ALL - SELECT ST_GEOGPOINT(3, 3) UNION ALL - SELECT ST_GEOGPOINT(4, 4) UNION ALL - SELECT ST_GEOGPOINT(5, 5) UNION ALL - SELECT ST_GEOGPOINT(6, 5) UNION ALL - SELECT NULL - ) -SELECT - point AS input_point, - ST_LINELOCATEPOINT(ST_GEOGFROMTEXT('LINESTRING(1 1, 5 5)'), point) - AS percentage_from_beginning -FROM geos +The following statement returns the Unicode replacement character, �. -/*-------------+---------------------------* - | input_point | percentage_from_beginning | - +-------------+---------------------------+ - | POINT(0 0) | 0 | - | POINT(1 0) | 0 | - | POINT(1 1) | 0 | - | POINT(2 2) | 0.25015214685147907 | - | POINT(3 3) | 0.5002284283637185 | - | POINT(4 4) | 0.7501905913884388 | - | POINT(5 5) | 1 | - | POINT(6 5) | 1 | - | NULL | NULL | - *-------------+---------------------------*/ +```sql +SELECT SAFE_CONVERT_BYTES_TO_STRING(b'\xc2') as safe_convert; ``` -[st-closestpoint]: #st_closestpoint - -### `ST_LINESUBSTRING` +### `SOUNDEX` ```sql -ST_LINESUBSTRING(linestring_geography, start_fraction, end_fraction); +SOUNDEX(value) ``` **Description** -Gets a segment of a linestring at a specific starting and ending fraction. - -**Definitions** - -+ `linestring_geography`: The LineString `GEOGRAPHY` value that represents the - linestring from which to extract a segment. -+ `start_fraction`: `DOUBLE` value that represents - the starting fraction of the total length of `linestring_geography`. 
- This must be an inclusive value between 0 and 1 (0-100%). -+ `end_fraction`: `DOUBLE` value that represents - the ending fraction of the total length of `linestring_geography`. - This must be an inclusive value between 0 and 1 (0-100%). - -**Details** +Returns a `STRING` that represents the +[Soundex][string-link-to-soundex-wikipedia] code for `value`. -`end_fraction` must be greater than or equal to `start_fraction`. +SOUNDEX produces a phonetic representation of a string. It indexes words by +sound, as pronounced in English. It is typically used to help determine whether +two strings, such as the family names _Levine_ and _Lavine_, or the words _to_ +and _too_, have similar English-language pronunciation. -If `start_fraction` and `end_fraction` are equal, a linestring with only -one point is produced. +The result of the SOUNDEX consists of a letter followed by 3 digits. Non-latin +characters are ignored. If the remaining string is empty after removing +non-Latin characters, an empty `STRING` is returned. **Return type** -+ LineString `GEOGRAPHY` if the resulting geography has more than one point. -+ Point `GEOGRAPHY` if the resulting geography has only one point. 
- -**Example** - -The following query returns the second half of the linestring: - -```sql -WITH data AS ( - SELECT ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1 -) -SELECT ST_LINESUBSTRING(geo1, 0.5, 1) AS segment -FROM data; - -/*-------------------------------------------------------------+ - | segment | - +-------------------------------------------------------------+ - | LINESTRING(49.4760661523471 67.2419539103851, 10 70, 70 70) | - +-------------------------------------------------------------*/ -``` +`STRING` -The following query returns a linestring that only contains one point: +**Examples** ```sql -WITH data AS ( - SELECT ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1 +WITH example AS ( + SELECT 'Ashcraft' AS value UNION ALL + SELECT 'Raven' AS value UNION ALL + SELECT 'Ribbon' AS value UNION ALL + SELECT 'apple' AS value UNION ALL + SELECT 'Hello world!' AS value UNION ALL + SELECT ' H3##!@llo w00orld!' AS value UNION ALL + SELECT '#1' AS value UNION ALL + SELECT NULL AS value ) -SELECT ST_LINESUBSTRING(geo1, 0.5, 0.5) AS segment -FROM data; +SELECT value, SOUNDEX(value) AS soundex +FROM example; -/*------------------------------------------+ - | segment | - +------------------------------------------+ - | POINT(49.4760661523471 67.2419539103851) | - +------------------------------------------*/ +/*----------------------+---------* + | value | soundex | + +----------------------+---------+ + | Ashcraft | A261 | + | Raven | R150 | + | Ribbon | R150 | + | apple | a140 | + | Hello world! | H464 | + | H3##!@llo w00orld! 
| H464 | + | #1 | | + | NULL | NULL | + *----------------------+---------*/ ``` -### `ST_MAKELINE` +[string-link-to-soundex-wikipedia]: https://en.wikipedia.org/wiki/Soundex -```sql -ST_MAKELINE(geography_1, geography_2) -``` +### `SPLIT` ```sql -ST_MAKELINE(array_of_geography) +SPLIT(value[, delimiter]) ``` **Description** -Creates a `GEOGRAPHY` with a single linestring by -concatenating the point or line vertices of each of the input -`GEOGRAPHY`s in the order they are given. +Splits `value` using the `delimiter` argument. -`ST_MAKELINE` comes in two variants. For the first variant, input must be two -`GEOGRAPHY`s. For the second, input must be an `ARRAY` of type `GEOGRAPHY`. In -either variant, each input `GEOGRAPHY` must consist of one of the following -values: +For `STRING`, the default delimiter is the comma `,`. -+ Exactly one point. -+ Exactly one linestring. +For `BYTES`, you must specify a delimiter. -For the first variant of `ST_MAKELINE`, if either input `GEOGRAPHY` is `NULL`, -`ST_MAKELINE` returns `NULL`. For the second variant, if input `ARRAY` or any -element in the input `ARRAY` is `NULL`, `ST_MAKELINE` returns `NULL`. +Splitting on an empty delimiter produces an array of UTF-8 characters for +`STRING` values, and an array of `BYTES` for `BYTES` values. -**Constraints** +Splitting an empty `STRING` returns an +`ARRAY` with a single empty +`STRING`. -Every edge must span strictly less than 180 degrees. +This function supports specifying [collation][collation]. -NOTE: The ZetaSQL snapping process may discard sufficiently short -edges and snap the two endpoints together. For instance, if two input -`GEOGRAPHY`s each contain a point and the two points are separated by a distance -less than the snap radius, the points will be snapped together. In such a case -the result will be a `GEOGRAPHY` with exactly one point. 
+[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about **Return type** -LineString `GEOGRAPHY` +`ARRAY` or `ARRAY` -### `ST_MAKEPOLYGON` +**Examples** ```sql -ST_MAKEPOLYGON(polygon_shell[, array_of_polygon_holes]) -``` - -**Description** - -Creates a `GEOGRAPHY` containing a single polygon -from linestring inputs, where each input linestring is used to construct a -polygon ring. +WITH letters AS + (SELECT '' as letter_group + UNION ALL + SELECT 'a' as letter_group + UNION ALL + SELECT 'b c d' as letter_group) -`ST_MAKEPOLYGON` comes in two variants. For the first variant, the input -linestring is provided by a single `GEOGRAPHY` containing exactly one -linestring. For the second variant, the input consists of a single `GEOGRAPHY` -and an array of `GEOGRAPHY`s, each containing exactly one linestring. +SELECT SPLIT(letter_group, ' ') as example +FROM letters; -The first `GEOGRAPHY` in either variant is used to construct the polygon shell. -Additional `GEOGRAPHY`s provided in the input `ARRAY` specify a polygon hole. -For every input `GEOGRAPHY` containing exactly one linestring, the following -must be true: +/*----------------------* + | example | + +----------------------+ + | [] | + | [a] | + | [b, c, d] | + *----------------------*/ +``` -+ The linestring must consist of at least three distinct vertices. -+ The linestring must be closed: that is, the first and last vertex have to be - the same. If the first and last vertex differ, the function constructs a - final edge from the first vertex to the last. +### `STARTS_WITH` -For the first variant of `ST_MAKEPOLYGON`, if either input `GEOGRAPHY` is -`NULL`, `ST_MAKEPOLYGON` returns `NULL`. For the second variant, if -input `ARRAY` or any element in the `ARRAY` is `NULL`, `ST_MAKEPOLYGON` returns -`NULL`. +```sql +STARTS_WITH(value, prefix) +``` -NOTE: `ST_MAKEPOLYGON` accepts an empty `GEOGRAPHY` as input. 
`ST_MAKEPOLYGON` -interprets an empty `GEOGRAPHY` as having an empty linestring, which will -create a full loop: that is, a polygon that covers the entire Earth. +**Description** -**Constraints** +Takes two `STRING` or `BYTES` values. Returns `TRUE` if `prefix` is a +prefix of `value`. -Together, the input rings must form a valid polygon: +This function supports specifying [collation][collation]. -+ The polygon shell must cover each of the polygon holes. -+ There can be only one polygon shell (which has to be the first input ring). - This implies that polygon holes cannot be nested. -+ Polygon rings may only intersect in a vertex on the boundary of both rings. +[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about -Every edge must span strictly less than 180 degrees. +**Return type** -Each polygon ring divides the sphere into two regions. The first input linesting -to `ST_MAKEPOLYGON` forms the polygon shell, and the interior is chosen to be -the smaller of the two regions. Each subsequent input linestring specifies a -polygon hole, so the interior of the polygon is already well-defined. In order -to define a polygon shell such that the interior of the polygon is the larger of -the two regions, see [`ST_MAKEPOLYGONORIENTED`][st-makepolygonoriented]. +`BOOL` -NOTE: The ZetaSQL snapping process may discard sufficiently -short edges and snap the two endpoints together. Hence, when vertices are -snapped together, it is possible that a polygon hole that is sufficiently small -may disappear, or the output `GEOGRAPHY` may contain only a line or a -point. 
+**Examples** -**Return type** +```sql +WITH items AS + (SELECT 'foo' as item + UNION ALL + SELECT 'bar' as item + UNION ALL + SELECT 'baz' as item) -`GEOGRAPHY` +SELECT + STARTS_WITH(item, 'b') as example +FROM items; -[st-makepolygonoriented]: #st_makepolygonoriented +/*---------* + | example | + +---------+ + | False | + | True | + | True | + *---------*/ +``` -### `ST_MAKEPOLYGONORIENTED` +### `STRPOS` ```sql -ST_MAKEPOLYGONORIENTED(array_of_geography) +STRPOS(value, subvalue) ``` **Description** -Like `ST_MAKEPOLYGON`, but the vertex ordering of each input linestring -determines the orientation of each polygon ring. The orientation of a polygon -ring defines the interior of the polygon as follows: if someone walks along the -boundary of the polygon in the order of the input vertices, the interior of the -polygon is on the left. This applies for each polygon ring provided. +Takes two `STRING` or `BYTES` values. Returns the 1-based position of the first +occurrence of `subvalue` inside `value`. Returns `0` if `subvalue` is not found. -This variant of the polygon constructor is more flexible since -`ST_MAKEPOLYGONORIENTED` can construct a polygon such that the interior is on -either side of the polygon ring. However, proper orientation of polygon rings is -critical in order to construct the desired polygon. +This function supports specifying [collation][collation]. -If the input `ARRAY` or any element in the `ARRAY` is `NULL`, -`ST_MAKEPOLYGONORIENTED` returns `NULL`. +[collation]: https://github.com/google/zetasql/blob/master/docs/collation-concepts.md#collate_about -NOTE: The input argument for `ST_MAKEPOLYGONORIENTED` may contain an empty -`GEOGRAPHY`. `ST_MAKEPOLYGONORIENTED` interprets an empty `GEOGRAPHY` as having -an empty linestring, which will create a full loop: that is, a polygon that -covers the entire Earth. 
+**Return type** -**Constraints** +`INT64` -Together, the input rings must form a valid polygon: +**Examples** -+ The polygon shell must cover each of the polygon holes. -+ There must be only one polygon shell, which must to be the first input ring. - This implies that polygon holes cannot be nested. -+ Polygon rings may only intersect in a vertex on the boundary of both rings. +```sql +WITH email_addresses AS + (SELECT + 'foo@example.com' AS email_address + UNION ALL + SELECT + 'foobar@example.com' AS email_address + UNION ALL + SELECT + 'foobarbaz@example.com' AS email_address + UNION ALL + SELECT + 'quxexample.com' AS email_address) -Every edge must span strictly less than 180 degrees. +SELECT + STRPOS(email_address, '@') AS example +FROM email_addresses; -`ST_MAKEPOLYGONORIENTED` relies on the ordering of the input vertices of each -linestring to determine the orientation of the polygon. This applies to the -polygon shell and any polygon holes. `ST_MAKEPOLYGONORIENTED` expects all -polygon holes to have the opposite orientation of the shell. See -[`ST_MAKEPOLYGON`][st-makepolygon] for an alternate polygon constructor, and -other constraints on building a valid polygon. +/*---------* + | example | + +---------+ + | 4 | + | 7 | + | 10 | + | 0 | + *---------*/ +``` -NOTE: Due to the ZetaSQL snapping process, edges with a sufficiently -short length will be discarded and the two endpoints will be snapped to a single -point. Therefore, it is possible that vertices in a linestring may be snapped -together such that one or more edge disappears. Hence, it is possible that a -polygon hole that is sufficiently small may disappear, or the resulting -`GEOGRAPHY` may contain only a line or a point. +### `SUBSTR` -**Return type** +```sql +SUBSTR(value, position[, length]) +``` -`GEOGRAPHY` +**Description** -[st-makepolygon]: #st_makepolygon +Gets a portion (substring) of the supplied `STRING` or `BYTES` value. 
-### `ST_MAXDISTANCE` +The `position` argument is an integer specifying the starting position of the +substring. -```sql -ST_MAXDISTANCE(geography_1, geography_2[, use_spheroid]) -``` ++ If `position` is `1`, the substring starts from the first character or byte. ++ If `position` is `0` or less than `-LENGTH(value)`, `position` is set to `1`, + and the substring starts from the first character or byte. ++ If `position` is greater than the length of `value`, the function produces + an empty substring. ++ If `position` is negative, the function counts from the end of `value`, + with `-1` indicating the last character or byte. -Returns the longest distance in meters between two non-empty -`GEOGRAPHY`s; that is, the distance between two -vertices where the first vertex is in the first -`GEOGRAPHY`, and the second vertex is in the second -`GEOGRAPHY`. If `geography_1` and `geography_2` are the -same `GEOGRAPHY`, the function returns the distance -between the two most distant vertices in that -`GEOGRAPHY`. +The `length` argument specifies the maximum number of characters or bytes to +return. -If either of the input `GEOGRAPHY`s is empty, -`ST_MAXDISTANCE` returns `NULL`. ++ If `length` is not specified, the function produces a substring that starts + at the specified position and ends at the last character or byte of `value`. ++ If `length` is `0`, the function produces an empty substring. ++ If `length` is negative, the function produces an error. ++ The returned substring may be shorter than `length`, for example, when + `length` exceeds the length of `value`, or when the starting position of the + substring plus `length` is greater than the length of `value`. -The optional `use_spheroid` parameter determines how this function measures -distance. If `use_spheroid` is `FALSE`, the function measures distance on the -surface of a perfect sphere. +**Return type** -The `use_spheroid` parameter currently only supports -the value `FALSE`. 
The default value of `use_spheroid` is `FALSE`. +`STRING` or `BYTES` -**Return type** +**Examples** -`DOUBLE` +```sql +WITH items AS + (SELECT 'apple' as item + UNION ALL + SELECT 'banana' as item + UNION ALL + SELECT 'orange' as item) -[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System +SELECT + SUBSTR(item, 2) as example +FROM items; -### `ST_NPOINTS` +/*---------* + | example | + +---------+ + | pple | + | anana | + | range | + *---------*/ +``` ```sql -ST_NPOINTS(geography_expression) -``` +WITH items AS + (SELECT 'apple' as item + UNION ALL + SELECT 'banana' as item + UNION ALL + SELECT 'orange' as item) -**Description** +SELECT + SUBSTR(item, 2, 2) as example +FROM items; -An alias of [ST_NUMPOINTS][st-numpoints]. +/*---------* + | example | + +---------+ + | pp | + | an | + | ra | + *---------*/ +``` -[st-numpoints]: #st_numpoints +```sql +WITH items AS + (SELECT 'apple' as item + UNION ALL + SELECT 'banana' as item + UNION ALL + SELECT 'orange' as item) -### `ST_NUMGEOMETRIES` +SELECT + SUBSTR(item, -2) as example +FROM items; -``` -ST_NUMGEOMETRIES(geography_expression) +/*---------* + | example | + +---------+ + | le | + | na | + | ge | + *---------*/ ``` -**Description** +```sql +WITH items AS + (SELECT 'apple' as item + UNION ALL + SELECT 'banana' as item + UNION ALL + SELECT 'orange' as item) -Returns the number of geometries in the input `GEOGRAPHY`. For a single point, -linestring, or polygon, `ST_NUMGEOMETRIES` returns `1`. For any collection of -geometries, `ST_NUMGEOMETRIES` returns the number of geometries making up the -collection. `ST_NUMGEOMETRIES` returns `0` if the input is the empty -`GEOGRAPHY`. 
+SELECT + SUBSTR(item, 1, 123) as example +FROM items; -**Return type** +/*---------* + | example | + +---------+ + | apple | + | banana | + | orange | + *---------*/ +``` -`INT64` +```sql +WITH items AS + (SELECT 'apple' as item + UNION ALL + SELECT 'banana' as item + UNION ALL + SELECT 'orange' as item) -**Example** +SELECT + SUBSTR(item, 123) as example +FROM items; -The following example computes `ST_NUMGEOMETRIES` for a single point geography, -two collections, and an empty geography. +/*---------* + | example | + +---------+ + | | + | | + | | + *---------*/ +``` ```sql -WITH example AS( - SELECT ST_GEOGFROMTEXT('POINT(5 0)') AS geography - UNION ALL - SELECT ST_GEOGFROMTEXT('MULTIPOINT(0 1, 4 3, 2 6)') AS geography +WITH items AS + (SELECT 'apple' as item UNION ALL - SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1))') AS geography + SELECT 'banana' as item UNION ALL - SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION EMPTY')) + SELECT 'orange' as item) + SELECT - geography, - ST_NUMGEOMETRIES(geography) AS num_geometries, -FROM example; + SUBSTR(item, 123, 5) as example +FROM items; -/*------------------------------------------------------+----------------* - | geography | num_geometries | - +------------------------------------------------------+----------------+ - | POINT(5 0) | 1 | - | MULTIPOINT(0 1, 4 3, 2 6) | 3 | - | GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1)) | 2 | - | GEOMETRYCOLLECTION EMPTY | 0 | - *------------------------------------------------------+----------------*/ +/*---------* + | example | + +---------+ + | | + | | + | | + *---------*/ ``` -### `ST_NUMPOINTS` +### `SUBSTRING` ```sql -ST_NUMPOINTS(geography_expression) +SUBSTRING(value, position[, length]) ``` -**Description** - -Returns the number of vertices in the input -`GEOGRAPHY`. This includes the number of points, the -number of linestring vertices, and the number of polygon vertices. 
- -NOTE: The first and last vertex of a polygon ring are counted as distinct -vertices. - -**Return type** +Alias for [`SUBSTR`][substr]. -`INT64` +[substr]: #substr -### `ST_PERIMETER` +### `TO_BASE32` ```sql -ST_PERIMETER(geography_expression[, use_spheroid]) +TO_BASE32(bytes_expr) ``` **Description** -Returns the length in meters of the boundary of the polygons in the input -`GEOGRAPHY`. +Converts a sequence of `BYTES` into a base32-encoded `STRING`. To convert a +base32-encoded `STRING` into `BYTES`, use [FROM_BASE32][string-link-to-from-base32]. -If `geography_expression` is a point or a line, returns zero. If -`geography_expression` is a collection, returns the perimeter of the polygons -in the collection; if the collection does not contain polygons, returns zero. +**Return type** -The optional `use_spheroid` parameter determines how this function measures -distance. If `use_spheroid` is `FALSE`, the function measures distance on the -surface of a perfect sphere. +`STRING` -The `use_spheroid` parameter currently only supports -the value `FALSE`. The default value of `use_spheroid` is `FALSE`. +**Example** -**Return type** +```sql +SELECT TO_BASE32(b'abcde\xFF') AS base32_string; -`DOUBLE` +/*------------------* + | base32_string | + +------------------+ + | MFRGGZDF74====== | + *------------------*/ +``` -[wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System +[string-link-to-from-base32]: #from_base32 -### `ST_POINTN` +### `TO_BASE64` ```sql -ST_POINTN(linestring_geography, index) +TO_BASE64(bytes_expr) ``` **Description** -Returns the Nth point of a linestring geography as a point geography, where N is -the index. The index is 1-based. Negative values are counted backwards from the -end of the linestring, so that -1 is the last point. Returns an error if the -input is not a linestring, if the input is empty, or if there is no vertex at -the given index. Use the `SAFE` prefix to obtain `NULL` for invalid input -instead of an error. 
+Converts a sequence of `BYTES` into a base64-encoded `STRING`. To convert a
+base64-encoded `STRING` into `BYTES`, use [FROM_BASE64][string-link-to-from-base64].

-**Return Type**
+There are several base64 encodings in common use that vary in exactly which
+alphabet of 65 ASCII characters is used to encode the 64 digits and padding.
+See [RFC 4648][RFC-4648] for details. This
+function adds padding and uses the alphabet `[A-Za-z0-9+/=]`.

-Point `GEOGRAPHY`
+**Return type**
+
+`STRING`

**Example**

-The following example uses `ST_POINTN`, [`ST_STARTPOINT`][st-startpoint] and
-[`ST_ENDPOINT`][st-endpoint] to extract points from a linestring.
+```sql
+SELECT TO_BASE64(b'\377\340') AS base64_string;
+
+/*---------------*
+ | base64_string |
+ +---------------+
+ | /+A=          |
+ *---------------*/
+```
+
+To work with an encoding using a different base64 alphabet, you might need to
+compose `TO_BASE64` with the `REPLACE` function. For instance, the
+`base64url` url-safe and filename-safe encoding commonly used in web programming
+uses `-_=` as the last characters rather than `+/=`. To produce a
+`base64url`-encoded string, replace `+` and `/` with `-` and `_` respectively.
```sql -WITH linestring AS ( - SELECT ST_GEOGFROMTEXT('LINESTRING(1 1, 2 1, 3 2, 3 3)') g -) -SELECT ST_POINTN(g, 1) AS first, ST_POINTN(g, -1) AS last, - ST_POINTN(g, 2) AS second, ST_POINTN(g, -2) AS second_to_last -FROM linestring; +SELECT REPLACE(REPLACE(TO_BASE64(b'\377\340'), '+', '-'), '/', '_') as websafe_base64; -/*--------------+--------------+--------------+----------------* - | first | last | second | second_to_last | - +--------------+--------------+--------------+----------------+ - | POINT(1 1) | POINT(3 3) | POINT(2 1) | POINT(3 2) | - *--------------+--------------+--------------+----------------*/ +/*----------------* + | websafe_base64 | + +----------------+ + | _-A= | + *----------------*/ ``` -[st-startpoint]: #st_startpoint +[string-link-to-from-base64]: #from_base64 -[st-endpoint]: #st_endpoint +[RFC-4648]: https://tools.ietf.org/html/rfc4648#section-4 -### `ST_SIMPLIFY` +### `TO_CODE_POINTS` ```sql -ST_SIMPLIFY(geography, tolerance_meters) +TO_CODE_POINTS(value) ``` **Description** -Returns a simplified version of `geography`, the given input -`GEOGRAPHY`. The input `GEOGRAPHY` is simplified by replacing nearly straight -chains of short edges with a single long edge. The input `geography` will not -change by more than the tolerance specified by `tolerance_meters`. Thus, -simplified edges are guaranteed to pass within `tolerance_meters` of the -*original* positions of all vertices that were removed from that edge. The given -`tolerance_meters` is in meters on the surface of the Earth. - -Note that `ST_SIMPLIFY` preserves topological relationships, which means that -no new crossing edges will be created and the output will be valid. For a large -enough tolerance, adjacent shapes may collapse into a single object, or a shape -could be simplified to a shape with a smaller dimension. - -**Constraints** - -For `ST_SIMPLIFY` to have any effect, `tolerance_meters` must be non-zero. 
+Takes a `STRING` or `BYTES` value and returns an array of `INT64` values that
+represent code points or extended ASCII character values.

-`ST_SIMPLIFY` returns an error if the tolerance specified by `tolerance_meters`
-is one of the following:
++ If `value` is a `STRING`, each element in the returned array represents a
+  [code point][string-link-to-code-points-wikipedia]. Each code point falls
+  within the range of [0, 0xD7FF] and [0xE000, 0x10FFFF].
++ If `value` is `BYTES`, each element in the array is an extended ASCII
+  character value in the range of [0, 255].

-+ A negative tolerance.
-+ Greater than ~7800 kilometers.
+To convert from an array of code points to a `STRING` or `BYTES`, see
+[CODE_POINTS_TO_STRING][string-link-to-codepoints-to-string] or
+[CODE_POINTS_TO_BYTES][string-link-to-codepoints-to-bytes].

**Return type**

-`GEOGRAPHY`
+`ARRAY<INT64>`

**Examples**

-The following example shows how `ST_SIMPLIFY` simplifies the input line
-`GEOGRAPHY` by removing intermediate vertices.
+The following example gets the code points for each element in an array of
+words.
```sql -WITH example AS - (SELECT ST_GEOGFROMTEXT('LINESTRING(0 0, 0.05 0, 0.1 0, 0.15 0, 2 0)') AS line) -SELECT - line AS original_line, - ST_SIMPLIFY(line, 1) AS simplified_line -FROM example; +SELECT word, TO_CODE_POINTS(word) AS code_points +FROM UNNEST(['foo', 'bar', 'baz', 'giraffe', 'llama']) AS word; -/*---------------------------------------------+----------------------* - | original_line | simplified_line | - +---------------------------------------------+----------------------+ - | LINESTRING(0 0, 0.05 0, 0.1 0, 0.15 0, 2 0) | LINESTRING(0 0, 2 0) | - *---------------------------------------------+----------------------*/ +/*---------+------------------------------------* + | word | code_points | + +---------+------------------------------------+ + | foo | [102, 111, 111] | + | bar | [98, 97, 114] | + | baz | [98, 97, 122] | + | giraffe | [103, 105, 114, 97, 102, 102, 101] | + | llama | [108, 108, 97, 109, 97] | + *---------+------------------------------------*/ ``` -The following example illustrates how the result of `ST_SIMPLIFY` can have a -lower dimension than the original shape. +The following example converts integer representations of `BYTES` to their +corresponding ASCII character values. 
```sql
-WITH example AS
-  (SELECT
-    ST_GEOGFROMTEXT('POLYGON((0 0, 0.1 0, 0.1 0.1, 0 0))') AS polygon,
-    t AS tolerance
-    FROM UNNEST([1000, 10000, 100000]) AS t)
-SELECT
-  polygon AS original_triangle,
-  tolerance AS tolerance_meters,
-  ST_SIMPLIFY(polygon, tolerance) AS simplified_result
-FROM example
+SELECT word, TO_CODE_POINTS(word) AS bytes_value_as_integer
+FROM UNNEST([b'\x00\x01\x10\xff', b'\x66\x6f\x6f']) AS word;

-/*-------------------------------------+------------------+-------------------------------------*
- | original_triangle                   | tolerance_meters | simplified_result                   |
- +-------------------------------------+------------------+-------------------------------------+
- | POLYGON((0 0, 0.1 0, 0.1 0.1, 0 0)) | 1000             | POLYGON((0 0, 0.1 0, 0.1 0.1, 0 0)) |
- | POLYGON((0 0, 0.1 0, 0.1 0.1, 0 0)) | 10000            | LINESTRING(0 0, 0.1 0.1)            |
- | POLYGON((0 0, 0.1 0, 0.1 0.1, 0 0)) | 100000           | POINT(0 0)                          |
- *-------------------------------------+------------------+-------------------------------------*/
+/*------------------+------------------------*
+ | word             | bytes_value_as_integer |
+ +------------------+------------------------+
+ | \x00\x01\x10\xff | [0, 1, 16, 255]        |
+ | foo              | [102, 111, 111]        |
+ *------------------+------------------------*/
```

-### `ST_SNAPTOGRID`
+The following example demonstrates the difference between a `BYTES` result and a
+`STRING` result.

```sql
-ST_SNAPTOGRID(geography_expression, grid_size)
-```
-
-**Description**
+SELECT TO_CODE_POINTS(b'Ā') AS b_result, TO_CODE_POINTS('Ā') AS s_result;

-Returns the input `GEOGRAPHY`, where each vertex has
-been snapped to a longitude/latitude grid. The grid size is determined by the
-`grid_size` parameter which is given in degrees.
+/*------------+----------*
+ | b_result   | s_result |
+ +------------+----------+
+ | [196, 128] | [256]    |
+ *------------+----------*/
+```

-**Constraints**
+Notice that the character, Ā, is represented as a two-byte Unicode sequence. As
+a result, the `BYTES` version of `TO_CODE_POINTS` returns an array with two
+elements, while the `STRING` version returns an array with a single element.

-Arbitrary grid sizes are not supported. The `grid_size` parameter is rounded so
-that it is of the form `10^n`, where `-10 < n < 0`.
+[string-link-to-code-points-wikipedia]: https://en.wikipedia.org/wiki/Code_point

-**Return type**
+[string-link-to-codepoints-to-string]: #code_points_to_string

-`GEOGRAPHY`
+[string-link-to-codepoints-to-bytes]: #code_points_to_bytes

-### `ST_STARTPOINT`
+### `TO_HEX`

```sql
-ST_STARTPOINT(linestring_geography)
+TO_HEX(bytes)
```

**Description**

-Returns the first point of a linestring geography as a point geography. Returns
-an error if the input is not a linestring or if the input is empty. Use the
-`SAFE` prefix to obtain `NULL` for invalid input instead of an error.
+Converts a sequence of `BYTES` into a hexadecimal `STRING`. Each byte in the
+`BYTES` value is converted to two hexadecimal characters in the range
+`(0..9, a..f)`. To convert a hexadecimal-encoded
+`STRING` to `BYTES`, use [FROM_HEX][string-link-to-from-hex].
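Because `FROM_HEX` (documented in its own section) inverts this conversion, the two functions round-trip; a quick illustrative sketch:

```sql
-- Hex-encoding BYTES and then decoding the result
-- returns the original value.
SELECT
  TO_HEX(b'foobar') AS hex_str,
  FROM_HEX(TO_HEX(b'foobar')) = b'foobar' AS round_trips;

/*--------------+-------------*
 | hex_str      | round_trips |
 +--------------+-------------+
 | 666f6f626172 | TRUE        |
 *--------------+-------------*/
```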
-**Return Type** +**Return type** -Point `GEOGRAPHY` +`STRING` **Example** ```sql -SELECT ST_STARTPOINT(ST_GEOGFROMTEXT('LINESTRING(1 1, 2 1, 3 2, 3 3)')) first +WITH Input AS ( + SELECT b'\x00\x01\x02\x03\xAA\xEE\xEF\xFF' AS byte_str UNION ALL + SELECT b'foobar' +) +SELECT byte_str, TO_HEX(byte_str) AS hex_str +FROM Input; -/*--------------* - | first | - +--------------+ - | POINT(1 1) | - *--------------*/ +/*----------------------------------+------------------* + | byte_string | hex_string | + +----------------------------------+------------------+ + | \x00\x01\x02\x03\xaa\xee\xef\xff | 00010203aaeeefff | + | foobar | 666f6f626172 | + *----------------------------------+------------------*/ ``` -### `ST_TOUCHES` +[string-link-to-from-hex]: #from_hex + +### `TRANSLATE` ```sql -ST_TOUCHES(geography_1, geography_2) +TRANSLATE(expression, source_characters, target_characters) ``` **Description** -Returns `TRUE` provided the following two conditions are satisfied: +In `expression`, replaces each character in `source_characters` with the +corresponding character in `target_characters`. All inputs must be the same +type, either `STRING` or `BYTES`. -1. `geography_1` intersects `geography_2`. -1. The interior of `geography_1` and the interior of `geography_2` are - disjoint. ++ Each character in `expression` is translated at most once. ++ A character in `expression` that is not present in `source_characters` is left + unchanged in `expression`. ++ A character in `source_characters` without a corresponding character in + `target_characters` is omitted from the result. ++ A duplicate character in `source_characters` results in an error. 
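The omission rule above can be illustrated with a source character that has no counterpart in `target_characters`; a hypothetical query whose result follows from the stated rules:

```sql
-- 'a' maps to 'o'; 'n' has no corresponding target character, so it
-- is omitted; 'b' is not in source_characters and passes through
-- unchanged.
SELECT TRANSLATE('banana', 'an', 'o') AS result;

/*--------*
 | result |
 +--------+
 | booo   |
 *--------*/
```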
**Return type** -`BOOL` +`STRING` or `BYTES` -### `ST_UNION` +**Examples** ```sql -ST_UNION(geography_1, geography_2) +WITH example AS ( + SELECT 'This is a cookie' AS expression, 'sco' AS source_characters, 'zku' AS + target_characters UNION ALL + SELECT 'A coaster' AS expression, 'co' AS source_characters, 'k' as + target_characters +) +SELECT expression, source_characters, target_characters, TRANSLATE(expression, +source_characters, target_characters) AS translate +FROM example; + +/*------------------+-------------------+-------------------+------------------* + | expression | source_characters | target_characters | translate | + +------------------+-------------------+-------------------+------------------+ + | This is a cookie | sco | zku | Thiz iz a kuukie | + | A coaster | co | k | A kaster | + *------------------+-------------------+-------------------+------------------*/ ``` +### `TRIM` + ```sql -ST_UNION(array_of_geography) +TRIM(value_to_trim[, set_of_characters_to_remove]) ``` **Description** -Returns a `GEOGRAPHY` that represents the point set -union of all input `GEOGRAPHY`s. - -`ST_UNION` comes in two variants. For the first variant, input must be two -`GEOGRAPHY`s. For the second, the input is an -`ARRAY` of type `GEOGRAPHY`. +Takes a `STRING` or `BYTES` value to trim. -For the first variant of `ST_UNION`, if an input -`GEOGRAPHY` is `NULL`, `ST_UNION` returns `NULL`. -For the second variant, if the input `ARRAY` value -is `NULL`, `ST_UNION` returns `NULL`. -For a non-`NULL` input `ARRAY`, the union is computed -and `NULL` elements are ignored so that they do not affect the output. +If the value to trim is a `STRING`, removes from this value all leading and +trailing Unicode code points in `set_of_characters_to_remove`. +The set of code points is optional. If it is not specified, all +whitespace characters are removed from the beginning and end of the +value to trim. -See [`ST_UNION_AGG`][st-union-agg] for the aggregate version of `ST_UNION`. 
+If the value to trim is `BYTES`, removes from this value all leading and +trailing bytes in `set_of_characters_to_remove`. The set of bytes is required. **Return type** -`GEOGRAPHY` ++ `STRING` if `value_to_trim` is a `STRING` value. ++ `BYTES` if `value_to_trim` is a `BYTES` value. -**Example** +**Examples** + +In the following example, all leading and trailing whitespace characters are +removed from `item` because `set_of_characters_to_remove` is not specified. ```sql -SELECT ST_UNION( - ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -122.19 47.69)'), - ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -100.19 47.69)') -) AS results +WITH items AS + (SELECT ' apple ' as item + UNION ALL + SELECT ' banana ' as item + UNION ALL + SELECT ' orange ' as item) -/*---------------------------------------------------------* - | results | - +---------------------------------------------------------+ - | LINESTRING(-100.19 47.69, -122.12 47.67, -122.19 47.69) | - *---------------------------------------------------------*/ -``` +SELECT + CONCAT('#', TRIM(item), '#') as example +FROM items; -[st-union-agg]: #st_union_agg +/*----------* + | example | + +----------+ + | #apple# | + | #banana# | + | #orange# | + *----------*/ +``` -### `ST_UNION_AGG` +In the following example, all leading and trailing `*` characters are removed +from `item`. ```sql -ST_UNION_AGG(geography) -``` +WITH items AS + (SELECT '***apple***' as item + UNION ALL + SELECT '***banana***' as item + UNION ALL + SELECT '***orange***' as item) -**Description** +SELECT + TRIM(item, '*') as example +FROM items; -Returns a `GEOGRAPHY` that represents the point set -union of all input `GEOGRAPHY`s. +/*---------* + | example | + +---------+ + | apple | + | banana | + | orange | + *---------*/ +``` -`ST_UNION_AGG` ignores `NULL` input `GEOGRAPHY` values. +In the following example, all leading and trailing `x`, `y`, and `z` characters +are removed from `item`. 
-See [`ST_UNION`][st-union] for the non-aggregate version of `ST_UNION_AGG`. +```sql +WITH items AS + (SELECT 'xxxapplexxx' as item + UNION ALL + SELECT 'yyybananayyy' as item + UNION ALL + SELECT 'zzzorangezzz' as item + UNION ALL + SELECT 'xyzpearxyz' as item) -**Return type** +SELECT + TRIM(item, 'xyz') as example +FROM items; -`GEOGRAPHY` +/*---------* + | example | + +---------+ + | apple | + | banana | + | orange | + | pear | + *---------*/ +``` -**Example** +In the following example, examine how `TRIM` interprets characters as +Unicode code-points. If your trailing character set contains a combining +diacritic mark over a particular letter, `TRIM` might strip the +same diacritic mark from a different letter. ```sql -SELECT ST_UNION_AGG(items) AS results -FROM UNNEST([ - ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -122.19 47.69)'), - ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -100.19 47.69)'), - ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -122.19 47.69)')]) as items; +SELECT + TRIM('abaWÌŠ', 'YÌŠ') AS a, + TRIM('WÌŠaba', 'YÌŠ') AS b, + TRIM('abaŪ̊', 'YÌŠ') AS c, + TRIM('Ū̊aba', 'YÌŠ') AS d; -/*---------------------------------------------------------* - | results | - +---------------------------------------------------------+ - | LINESTRING(-100.19 47.69, -122.12 47.67, -122.19 47.69) | - *---------------------------------------------------------*/ +/*------+------+------+------* + | a | b | c | d | + +------+------+------+------+ + | abaW | WÌŠaba | abaŪ | Ūaba | + *------+------+------+------*/ ``` -[st-union]: #st_union - -### `ST_WITHIN` +In the following example, all leading and trailing `b'n'`, `b'a'`, `b'\xab'` +bytes are removed from `item`. ```sql -ST_WITHIN(geography_1, geography_2) -``` - -**Description** - -Returns `TRUE` if no point of `geography_1` is outside of `geography_2` and -the interiors of `geography_1` and `geography_2` intersect. 
- -Given two geographies `a` and `b`, `ST_WITHIN(a, b)` returns the same result -as [`ST_CONTAINS`][st-contains]`(b, a)`. Note the opposite order of arguments. - -**Return type** - -`BOOL` +WITH items AS +( + SELECT b'apple' as item UNION ALL + SELECT b'banana' as item UNION ALL + SELECT b'\xab\xcd\xef\xaa\xbb' as item +) +SELECT item, TRIM(item, b'na\xab') AS examples +FROM items; -[st-contains]: #st_contains +/*----------------------+------------------* + | item | example | + +----------------------+------------------+ + | apple | pple | + | banana | b | + | \xab\xcd\xef\xaa\xbb | \xcd\xef\xaa\xbb | + *----------------------+------------------*/ +``` -### `ST_X` +### `UNICODE` ```sql -ST_X(point_geography_expression) +UNICODE(value) ``` **Description** -Returns the longitude in degrees of the single-point input -`GEOGRAPHY`. - -For any input `GEOGRAPHY` that is not a single point, -including an empty `GEOGRAPHY`, `ST_X` returns an -error. Use the `SAFE.` prefix to obtain `NULL`. +Returns the Unicode [code point][string-code-point] for the first character in +`value`. Returns `0` if `value` is empty, or if the resulting Unicode code +point is `0`. **Return type** -`DOUBLE` - -**Example** +`INT64` -The following example uses `ST_X` and `ST_Y` to extract coordinates from -single-point geographies. 
+**Examples** ```sql -WITH points AS - (SELECT ST_GEOGPOINT(i, i + 1) AS p FROM UNNEST([0, 5, 12]) AS i) - SELECT - p, - ST_X(p) as longitude, - ST_Y(p) as latitude -FROM points; +SELECT UNICODE('âbcd') as A, UNICODE('â') as B, UNICODE('') as C, UNICODE(NULL) as D; -/*--------------+-----------+----------* - | p | longitude | latitude | - +--------------+-----------+----------+ - | POINT(0 1) | 0.0 | 1.0 | - | POINT(5 6) | 5.0 | 6.0 | - | POINT(12 13) | 12.0 | 13.0 | - *--------------+-----------+----------*/ +/*-------+-------+-------+-------* + | A | B | C | D | + +-------+-------+-------+-------+ + | 226 | 226 | 0 | NULL | + *-------+-------+-------+-------*/ ``` -### `ST_Y` +[string-code-point]: https://en.wikipedia.org/wiki/Code_point + +### `UPPER` ```sql -ST_Y(point_geography_expression) +UPPER(value) ``` **Description** -Returns the latitude in degrees of the single-point input -`GEOGRAPHY`. +For `STRING` arguments, returns the original string with all alphabetic +characters in uppercase. Mapping between uppercase and lowercase is done +according to the +[Unicode Character Database][string-link-to-unicode-character-definitions] +without taking into account language-specific mappings. -For any input `GEOGRAPHY` that is not a single point, -including an empty `GEOGRAPHY`, `ST_Y` returns an -error. Use the `SAFE.` prefix to return `NULL` instead. +For `BYTES` arguments, the argument is treated as ASCII text, with all bytes +greater than 127 left intact. **Return type** -`DOUBLE` +`STRING` or `BYTES` -**Example** +**Examples** -See [`ST_X`][st-x] for example usage. +```sql +WITH items AS + (SELECT + 'foo' as item + UNION ALL + SELECT + 'bar' as item + UNION ALL + SELECT + 'baz' as item) -[st-x]: #st_x +SELECT + UPPER(item) AS example +FROM items; -## Protocol buffer functions +/*---------* + | example | + +---------+ + | FOO | + | BAR | + | BAZ | + *---------*/ +``` -ZetaSQL supports the following protocol buffer functions. 
+[string-link-to-unicode-character-definitions]: http://unicode.org/ucd/ + +[string-link-to-strpos]: #strpos + +## Time functions + +ZetaSQL supports the following time functions. ### Function list @@ -35021,1203 +37480,919 @@ ZetaSQL supports the following protocol buffer functions. - CONTAINS_KEY + CURRENT_TIME - Checks if a protocol buffer map field contains a given key. + Returns the current time as a TIME value. - EXTRACT + EXTRACT - Extracts a value or metadata from a protocol buffer. + Extracts part of a TIME value. - FILTER_FIELDS + FORMAT_TIME - Removed unwanted fields from a protocol buffer. + Formats a TIME value according to the specified format string. - FROM_PROTO + PARSE_TIME - Converts a protocol buffer value into ZetaSQL value. + Converts a STRING value to a TIME value. - MODIFY_MAP + TIME - Modifies a protocol buffer map field. + Constructs a TIME value. - PROTO_DEFAULT_IF_NULL + TIME_ADD - Produces the default protocol buffer field value if the - protocol buffer field is NULL. Otherwise, returns the - protocol buffer field value. + Adds a specified time interval to a TIME value. - REPLACE_FIELDS + TIME_DIFF - Replaces the values in one or more protocol buffer fields. + Gets the number of intervals between two TIME values. - TO_PROTO + TIME_SUB - Converts a ZetaSQL value into a protocol buffer value. + Subtracts a specified time interval from a TIME value. + + + + + TIME_TRUNC + + + + Truncates a TIME value. -### `CONTAINS_KEY` +### `CURRENT_TIME` ```sql -CONTAINS_KEY(proto_map_field_expression, key) +CURRENT_TIME([time_zone]) ``` -**Description** - -Returns whether a [protocol buffer map field][proto-map] contains a given key. - -Input values: - -+ `proto_map_field_expression`: A protocol buffer map field. -+ `key`: A key in the protocol buffer map field. - -`NULL` handling: +```sql +CURRENT_TIME +``` -+ If `map_field` is `NULL`, returns `NULL`. -+ If `key` is `NULL`, returns `FALSE`. 
+**Description** -**Return type** +Returns the current time as a `TIME` object. Parentheses are optional when +called with no arguments. -`BOOL` +This function supports an optional `time_zone` parameter. +See [Time zone definitions][time-link-to-timezone-definitions] for information +on how to specify a time zone. -**Examples** +The current time is recorded at the start of the query +statement which contains this function, not when this specific function is +evaluated. -To illustrate the use of this function, consider the protocol buffer message -`Item`: +**Return Data Type** -```proto -message Item { - optional map purchased = 1; -}; -``` +`TIME` -In the following example, the function returns `TRUE` when the key is -present, `FALSE` otherwise. +**Example** ```sql -SELECT - CONTAINS_KEY(m.purchased, 'A') AS contains_a, - CONTAINS_KEY(m.purchased, 'B') AS contains_b -FROM - (SELECT AS VALUE CAST("purchased { key: 'A' value: 2 }" AS Item)) AS m; +SELECT CURRENT_TIME() as now; -/*------------+------------* - | contains_a | contains_b | - +------------+------------+ - | TRUE | FALSE | - *------------+------------*/ +/*----------------------------* + | now | + +----------------------------+ + | 15:31:38.776361 | + *----------------------------*/ ``` -[proto-map]: https://developers.google.com/protocol-buffers/docs/proto3#maps - -### `EXTRACT` - +When a column named `current_time` is present, the column name and the function +call without parentheses are ambiguous. To ensure the function call, add +parentheses; to ensure the column name, qualify it with its +[range variable][time-functions-link-to-range-variables]. For example, the +following query will select the function in the `now` column and the table +column in the `current_time` column. 
```sql -EXTRACT( extraction_type (proto_field) FROM proto_expression ) +WITH t AS (SELECT 'column value' AS `current_time`) +SELECT current_time() as now, t.current_time FROM t; -extraction_type: - { FIELD | RAW | HAS | ONEOF_CASE } +/*-----------------+--------------* + | now | current_time | + +-----------------+--------------+ + | 15:31:38.776361 | column value | + *-----------------+--------------*/ ``` -**Description** - -Extracts a value from a protocol buffer. `proto_expression` represents the -expression that returns a protocol buffer, `proto_field` represents the field -of the protocol buffer to extract from, and `extraction_type` determines the -type of data to return. `EXTRACT` can be used to get values of ambiguous fields. -An alternative to `EXTRACT` is the [dot operator][querying-protocol-buffers]. - -**Extraction Types** - -You can choose the type of information to get with `EXTRACT`. Your choices are: - -+ `FIELD`: Extract a value from a protocol buffer field. -+ `RAW`: Extract an uninterpreted value from a - protocol buffer field. Raw values - ignore any ZetaSQL type annotations. -+ `HAS`: Returns `TRUE` if a protocol buffer field is set in a proto message; - otherwise, `FALSE`. Returns an error if this is used with a scalar proto3 - field. Alternatively, use [`has_x`][has-value], to perform this task. -+ `ONEOF_CASE`: Returns the name of the set protocol buffer field in a Oneof. - If no field is set, returns an empty string. - -**Return Type** - -The return type depends upon the extraction type in the query. - -+ `FIELD`: Protocol buffer field type. -+ `RAW`: Protocol buffer field - type. Format annotations are - ignored. -+ `HAS`: `BOOL` -+ `ONEOF_CASE`: `STRING` - -**Examples** - -The examples in this section reference two protocol buffers called `Album` and -`Chart`, and one table called `AlbumList`. 
+[time-functions-link-to-range-variables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#range_variables -```proto -message Album { - optional string album_name = 1; - repeated string song = 2; - oneof group_name { - string solo = 3; - string duet = 4; - string band = 5; - } -} -``` +[time-link-to-timezone-definitions]: #timezone_definitions -```proto -message Chart { - optional int64 date = 1 [(zetasql.format) = DATE]; - optional string chart_name = 2; - optional int64 rank = 3; -} -``` +### `EXTRACT` ```sql -WITH AlbumList AS ( - SELECT - NEW Album( - 'Alana Yah' AS solo, - 'New Moon' AS album_name, - ['Sandstorm','Wait'] AS song) AS album_col, - NEW Chart( - 'Billboard' AS chart_name, - '2016-04-23' AS date, - 1 AS rank) AS chart_col - UNION ALL - SELECT - NEW Album( - 'The Roadlands' AS band, - 'Grit' AS album_name, - ['The Way', 'Awake', 'Lost Things'] AS song) AS album_col, - NEW Chart( - 'Billboard' AS chart_name, - 1 as rank) AS chart_col -) -SELECT * FROM AlbumList +EXTRACT(part FROM time_expression) ``` -The following example extracts the album names from a table called `AlbumList` -that contains a proto-typed column called `Album`. +**Description** -```sql -SELECT EXTRACT(FIELD(album_name) FROM album_col) AS name_of_album -FROM AlbumList +Returns a value that corresponds to the specified `part` from +a supplied `time_expression`. -/*------------------* - | name_of_album | - +------------------+ - | New Moon | - | Grit | - *------------------*/ -``` +Allowed `part` values are: -A table called `AlbumList` contains a proto-typed column called `Album`. -`Album` contains a field called `date`, which can store an integer. The -`date` field has an annotated format called `DATE` assigned to it, which means -that when you extract the value in this field, it returns a `DATE`, not an -`INT64`. 
++ `NANOSECOND` + (if the SQL engine supports it) ++ `MICROSECOND` ++ `MILLISECOND` ++ `SECOND` ++ `MINUTE` ++ `HOUR` -If you would like to return the value for `date` as an `INT64`, not -as a `DATE`, use the `RAW` extraction type in your query. For example: +Returned values truncate lower order time periods. For example, when extracting +seconds, `EXTRACT` truncates the millisecond and microsecond values. -```sql -SELECT - EXTRACT(RAW(date) FROM chart_col) AS raw_date, - EXTRACT(FIELD(date) FROM chart_col) AS formatted_date -FROM AlbumList +**Return Data Type** -/*----------+----------------* - | raw_date | formatted_date | - +----------+----------------+ - | 16914 | 2016-04-23 | - | 0 | 1970-01-01 | - *----------+----------------*/ -``` +`INT64` -The following example checks to see if release dates exist in a table called -`AlbumList` that contains a protocol buffer called `Chart`. +**Example** + +In the following example, `EXTRACT` returns a value corresponding to the `HOUR` +time part. ```sql -SELECT EXTRACT(HAS(date) FROM chart_col) AS has_release_date -FROM AlbumList +SELECT EXTRACT(HOUR FROM TIME "15:30:00") as hour; /*------------------* - | has_release_date | + | hour | +------------------+ - | TRUE | - | FALSE | + | 15 | *------------------*/ ``` -The following example extracts the group name that is assigned to an artist in -a table called `AlbumList`. The group name is set for exactly one -protocol buffer field inside of the `group_name` Oneof. The `group_name` Oneof -exists inside the `Chart` protocol buffer. 
- -```sql -SELECT EXTRACT(ONEOF_CASE(group_name) FROM album_col) AS artist_type -FROM AlbumList; - -/*-------------* - | artist_type | - +-------------+ - | solo | - | band | - *-------------*/ -``` - -[querying-protocol-buffers]: https://github.com/google/zetasql/blob/master/docs/protocol-buffers.md#querying_protocol_buffers - -[has-value]: https://github.com/google/zetasql/blob/master/docs/protocol-buffers.md#checking_if_a_field_has_a_value - -### `FILTER_FIELDS` +### `FORMAT_TIME` ```sql -FILTER_FIELDS(proto_expression, proto_field_list [, reset_fields_named_arg]) - -proto_field_list: - {+|-}proto_field_path[, ...] - -reset_fields_named_arg: - RESET_CLEARED_REQUIRED_FIELDS => { TRUE | FALSE } -``` - -**Description** - -Takes a protocol buffer and a list of its fields to include or exclude. -Returns a version of that protocol buffer with unwanted fields removed. -Returns `NULL` if the protocol buffer is `NULL`. - -Input values: - -+ `proto_expression`: The protocol buffer to filter. -+ `proto_field_list`: The fields to exclude or include in the resulting - protocol buffer. -+ `+`: Include a protocol buffer field and its children in the results. -+ `-`: Exclude a protocol buffer field and its children in the results. -+ `proto_field_path`: The protocol buffer field to include or exclude. - If the field represents an [extension][querying-proto-extensions], you can use - syntax for that extension in the path. -+ `reset_fields_named_arg`: You can optionally add the - `RESET_CLEARED_REQUIRED_FIELDS` named argument. - If not explicitly set, `FALSE` is used implicitly. - If `FALSE`, you must include all protocol buffer `required` fields in the - `FILTER_FIELDS` function. 
If `TRUE`, you do not need to include all required - protocol buffer fields and the value of required fields - defaults to these values: - - Type | Default value - ----------------------- | -------- - Floating point | `0.0` - Integer | `0` - Boolean | `FALSE` - String, byte | `""` - Protocol buffer message | Empty message - -Protocol buffer field expression behavior: - -+ The first field in `proto_field_list` determines the default - inclusion/exclusion. By default, when you include the first field, all other - fields are excluded. Or by default, when you exclude the first field, all - other fields are included. -+ A required field in the protocol buffer cannot be excluded explicitly or - implicitly, unless you have the - `RESET_CLEARED_REQUIRED_FIELDS` named argument set as `TRUE`. -+ If a field is included, its child fields and descendants are implicitly - included in the results. -+ If a field is excluded, its child fields and descendants are - implicitly excluded in the results. -+ A child field must be listed after its parent field in the argument list, - but does not need to come right after the parent field. - -Caveats: - -+ If you attempt to exclude/include a field that already has been - implicitly excluded/included, an error is produced. -+ If you attempt to explicitly include/exclude a field that has already - implicitly been included/excluded, an error is produced. - -**Return type** - -Type of `proto_expression` - -**Examples** - -The examples in this section reference a protocol buffer called `Award` and -a table called `MusicAwards`. 
- -```proto -message Award { - required int32 year = 1; - optional int32 month = 2; - repeated Type type = 3; - - message Type { - optional string award_name = 1; - optional string category = 2; - } -} +FORMAT_TIME(format_string, time_object) ``` -```sql -WITH - MusicAwards AS ( - SELECT - CAST( - ''' - year: 2001 - month: 9 - type { award_name: 'Best Artist' category: 'Artist' } - type { award_name: 'Best Album' category: 'Album' } - ''' - AS zetasql.examples.music.Award) AS award_col - UNION ALL - SELECT - CAST( - ''' - year: 2001 - month: 12 - type { award_name: 'Best Song' category: 'Song' } - ''' - AS zetasql.examples.music.Award) AS award_col - ) -SELECT * -FROM MusicAwards - -/*---------------------------------------------------------* - | award_col | - +---------------------------------------------------------+ - | { | - | year: 2001 | - | month: 9 | - | type { award_name: "Best Artist" category: "Artist" } | - | type { award_name: "Best Album" category: "Album" } | - | } | - | { | - | year: 2001 | - | month: 12 | - | type { award_name: "Best Song" category: "Song" } | - | } | - *---------------------------------------------------------*/ -``` +**Description** +Formats a `TIME` object according to the specified `format_string`. See +[Supported Format Elements For TIME][time-format-elements] +for a list of format elements that this function supports. -The following example returns protocol buffers that only include the `year` -field. +**Return Data Type** + +`STRING` + +**Example** ```sql -SELECT FILTER_FIELDS(award_col, +year) AS filtered_fields -FROM MusicAwards +SELECT FORMAT_TIME("%R", TIME "15:30:00") as formatted_time; -/*-----------------* - | filtered_fields | - +-----------------+ - | {year: 2001} | - | {year: 2001} | - *-----------------*/ +/*----------------* + | formatted_time | + +----------------+ + | 15:30 | + *----------------*/ ``` -The following example returns protocol buffers that include all but the `type` -field. 
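The `FORMAT_TIME` format elements are C-style `strftime` codes, so the behavior can be sketched with Python's `strftime` (a rough analogy, not the ZetaSQL implementation; `%R` is shorthand for `%H:%M`, spelled out here for portability):

```python
from datetime import time

# Equivalent of FORMAT_TIME("%R", TIME "15:30:00"): format hour and minute.
t = time(15, 30, 0)
print(t.strftime("%H:%M"))  # 15:30
```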
+[time-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time -```sql -SELECT FILTER_FIELDS(award_col, -type) AS filtered_fields -FROM MusicAwards +### `PARSE_TIME` -/*------------------------* - | filtered_fields | - +------------------------+ - | {year: 2001 month: 9} | - | {year: 2001 month: 12} | - *------------------------*/ +```sql +PARSE_TIME(format_string, time_string) ``` -The following example returns protocol buffers that only include the `year` and -`type.award_name` fields. +**Description** + +Converts a [string representation of time][time-format] to a +`TIME` object. + +`format_string` contains the [format elements][time-format-elements] +that define how `time_string` is formatted. Each element in +`time_string` must have a corresponding element in `format_string`. The +location of each element in `format_string` must match the location of +each element in `time_string`. ```sql -SELECT FILTER_FIELDS(award_col, +year, +type.award_name) AS filtered_fields -FROM MusicAwards +-- This works because elements on both sides match. +SELECT PARSE_TIME("%I:%M:%S", "07:30:00") -/*--------------------------------------* - | filtered_fields | - +--------------------------------------+ - | { | - | year: 2001 | - | type { award_name: "Best Artist" } | - | type { award_name: "Best Album" } | - | } | - | { | - | year: 2001 | - | type { award_name: "Best Song" } | - | } | - *--------------------------------------*/ +-- This produces an error because the seconds element is in different locations. +SELECT PARSE_TIME("%S:%I:%M", "07:30:00") + +-- This produces an error because one of the seconds elements is missing. +SELECT PARSE_TIME("%I:%M", "07:30:00") + +-- This works because %T can find all matching elements in time_string. +SELECT PARSE_TIME("%T", "07:30:00") ``` -The following example returns the `year` and `type` fields, but excludes the -`award_name` field in the `type` field. 
+When using `PARSE_TIME`, keep the following in mind: + ++ **Unspecified fields.** Any unspecified field is initialized from + `00:00:00.0`. For instance, if `seconds` is unspecified then it + defaults to `00`, and so on. ++ **Whitespace.** One or more consecutive white spaces in the format string + matches zero or more consecutive white spaces in the `TIME` string. In + addition, leading and trailing white spaces in the `TIME` string are always + allowed, even if they are not in the format string. ++ **Format precedence.** When two (or more) format elements have overlapping + information, the last one generally overrides any earlier ones. ++ **Format divergence.** `%p` can be used with `am`, `AM`, `pm`, and `PM`. + +**Return Data Type** + +`TIME` + +**Example** ```sql -SELECT FILTER_FIELDS(award_col, +year, +type, -type.award_name) AS filtered_fields -FROM MusicAwards +SELECT PARSE_TIME("%H", "15") as parsed_time; -/*---------------------------------* - | filtered_fields | - +---------------------------------+ - | { | - | year: 2001 | - | type { category: "Artist" } | - | type { category: "Album" } | - | } | - | { | - | year: 2001 | - | type { category: "Song" } | - | } | - *---------------------------------*/ +/*-------------* + | parsed_time | + +-------------+ + | 15:00:00 | + *-------------*/ ``` -The following example produces an error because `year` is a required field -and cannot be excluded explicitly or implicitly from the results. - ```sql -SELECT FILTER_FIELDS(award_col, -year) AS filtered_fields -FROM MusicAwards +SELECT PARSE_TIME('%I:%M:%S %p', '2:23:38 pm') AS parsed_time --- Error +/*-------------* + | parsed_time | + +-------------+ + | 14:23:38 | + *-------------*/ ``` -The following example produces an error because when `year` was included, -`month` was implicitly excluded. You cannot explicitly exclude a field that -has already been implicitly excluded. 
+[time-format]: #format_time -```sql -SELECT FILTER_FIELDS(award_col, +year, -month) AS filtered_fields -FROM MusicAwards +[time-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time --- Error +### `TIME` + +```sql +1. TIME(hour, minute, second) +2. TIME(timestamp, [time_zone]) +3. TIME(datetime) ``` -When `RESET_CLEARED_REQUIRED_FIELDS` is set as `TRUE`, `FILTER_FIELDS` doesn't -need to include required fields. In the example below, `MusicAwards` has a -required field called `year`, but this is not added as an argument for -`FILTER_FIELDS`. `year` is added to the results with its default value, `0`. +**Description** + +1. Constructs a `TIME` object using `INT64` + values representing the hour, minute, and second. +2. Constructs a `TIME` object using a `TIMESTAMP` object. It supports an + optional + parameter to [specify a time zone][time-link-to-timezone-definitions]. If no + time zone is specified, the default time zone, which is implementation defined, is + used. +3. Constructs a `TIME` object using a + `DATETIME` object. 
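Signature 2 extracts the civil time that a timestamp corresponds to in the given zone. The documented example (`2008-12-25 15:30:00+08` viewed in `America/Los_Angeles`) can be sketched with Python's `zoneinfo` (an illustration of the semantics, not ZetaSQL itself):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# The instant 2008-12-25 15:30:00+08, rendered as a civil time in
# America/Los_Angeles (UTC-8 in December): 07:30 UTC -> 23:30 the day before.
ts = datetime.fromisoformat("2008-12-25T15:30:00+08:00")
local = ts.astimezone(ZoneInfo("America/Los_Angeles"))
print(local.time())  # 23:30:00
```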
+ +**Return Data Type** + +`TIME` + +**Example** ```sql -SELECT FILTER_FIELDS( - award_col, - +month, - RESET_CLEARED_REQUIRED_FIELDS => TRUE) AS filtered_fields -FROM MusicAwards; +SELECT + TIME(15, 30, 00) as time_hms, + TIME(TIMESTAMP "2008-12-25 15:30:00+08", "America/Los_Angeles") as time_tstz; -/*---------------------------------* - | filtered_fields | - +---------------------------------+ - | { | - | year: 0, | - | month: 9 | - | } | - | { | - | year: 0, | - | month: 12 | - | } | - *---------------------------------*/ +/*----------+-----------* + | time_hms | time_tstz | + +----------+-----------+ + | 15:30:00 | 23:30:00 | + *----------+-----------*/ ``` -[querying-proto-extensions]: https://github.com/google/zetasql/blob/master/docs/protocol-buffers.md#extensions +```sql +SELECT TIME(DATETIME "2008-12-25 15:30:00.000000") AS time_dt; -### `FROM_PROTO` +/*----------* + | time_dt | + +----------+ + | 15:30:00 | + *----------*/ +``` + +[time-link-to-timezone-definitions]: #timezone_definitions + +### `TIME_ADD` ```sql -FROM_PROTO(expression) +TIME_ADD(time_expression, INTERVAL int64_expression part) ``` **Description** -Returns a ZetaSQL value. The valid `expression` types are defined -in the table below, along with the return types that they produce. -Other input `expression` types are invalid. If `expression` cannot be converted -to a valid value, an error is returned. +Adds `int64_expression` units of `part` to the `TIME` object. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
expression typeReturn type
-
    -
  • INT32
  • -
  • google.protobuf.Int32Value
  • -
-
INT32
-
    -
  • UINT32
  • -
  • google.protobuf.UInt32Value
  • -
-
UINT32
-
    -
  • INT64
  • -
  • google.protobuf.Int64Value
  • -
-
INT64
-
    -
  • UINT64
  • -
  • google.protobuf.UInt64Value
  • -
-
UINT64
-
    -
  • FLOAT
  • -
  • google.protobuf.FloatValue
  • -
-
FLOAT
-
    -
  • DOUBLE
  • -
  • google.protobuf.DoubleValue
  • -
-
DOUBLE
-
    -
  • BOOL
  • -
  • google.protobuf.BoolValue
  • -
-
BOOL
-
    -
  • STRING
  • -
  • - google.protobuf.StringValue -

    - Note: The StringValue - value field must be - UTF-8 encoded. -

    -
  • -
-
STRING
-
    -
  • BYTES
  • -
  • google.protobuf.BytesValue
  • -
-
BYTES
-
    -
  • DATE
  • -
  • google.type.Date
  • -
-
DATE
-
    -
  • TIME
  • -
  • - google.type.TimeOfDay +`TIME_ADD` supports the following values for `part`: - ++ `NANOSECOND` + (if the SQL engine supports it) ++ `MICROSECOND` ++ `MILLISECOND` ++ `SECOND` ++ `MINUTE` ++ `HOUR` - +This function automatically adjusts when values fall outside of the 00:00:00 to +24:00:00 boundary. For example, if you add an hour to `23:30:00`, the returned +value is `00:30:00`. -
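The wraparound behavior above can be sketched in Python with a hypothetical `time_add` helper (not ZetaSQL's implementation): host the time-of-day on an arbitrary date, add the interval, and keep only the time component.

```python
from datetime import datetime, date, time, timedelta

def time_add(t, delta):
    # Host the time-of-day on an arbitrary date, add, and keep only the
    # time, so results wrap at the 24:00:00 boundary.
    return (datetime.combine(date(2000, 1, 1), t) + delta).time()

print(time_add(time(23, 30), timedelta(hours=1)))     # 00:30:00
print(time_add(time(15, 30), timedelta(minutes=10)))  # 15:40:00
```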
  • -
-
TIME
-
    -
  • TIMESTAMP
  • -
  • - google.protobuf.Timestamp +**Return Data Types** - +`TIME` - +**Example** -
  • -
-
TIMESTAMP
+```sql +SELECT + TIME "15:30:00" as original_time, + TIME_ADD(TIME "15:30:00", INTERVAL 10 MINUTE) as later; -**Return Type** +/*-----------------------------+------------------------* + | original_time | later | + +-----------------------------+------------------------+ + | 15:30:00 | 15:40:00 | + *-----------------------------+------------------------*/ +``` -The return type depends upon the `expression` type. See the return types -in the table above. +### `TIME_DIFF` -**Examples** +```sql +TIME_DIFF(time_expression_a, time_expression_b, part) +``` -Convert a `google.type.Date` type into a `DATE` type. +**Description** -```sql -SELECT FROM_PROTO( - new google.type.Date( - 2019 as year, - 10 as month, - 30 as day - ) -) +Returns the whole number of specified `part` intervals between two +`TIME` objects (`time_expression_a` - `time_expression_b`). If the first +`TIME` is earlier than the second one, the output is negative. Throws an error +if the computation overflows the result type, such as if the difference in +nanoseconds +between the two `TIME` objects would overflow an +`INT64` value. -/*------------* - | $col1 | - +------------+ - | 2019-10-30 | - *------------*/ -``` +`TIME_DIFF` supports the following values for `part`: -Pass in and return a `DATE` type. 
++ `NANOSECOND` + (if the SQL engine supports it) ++ `MICROSECOND` ++ `MILLISECOND` ++ `SECOND` ++ `MINUTE` ++ `HOUR` + +**Return Data Type** + +`INT64` + +**Example** ```sql -SELECT FROM_PROTO(DATE '2019-10-30') +SELECT + TIME "15:30:00" as first_time, + TIME "14:35:00" as second_time, + TIME_DIFF(TIME "15:30:00", TIME "14:35:00", MINUTE) as difference; -/*------------* - | $col1 | - +------------+ - | 2019-10-30 | - *------------*/ +/*----------------------------+------------------------+------------------------* + | first_time | second_time | difference | + +----------------------------+------------------------+------------------------+ + | 15:30:00 | 14:35:00 | 55 | + *----------------------------+------------------------+------------------------*/ ``` -### `MODIFY_MAP` +### `TIME_SUB` ```sql -MODIFY_MAP(proto_map_field_expression, key_value_pair[, ...]) - -key_value_pair: - key, value +TIME_SUB(time_expression, INTERVAL int64_expression part) ``` **Description** -Modifies a [protocol buffer map field][proto-map] and returns the modified map -field. - -Input values: +Subtracts `int64_expression` units of `part` from the `TIME` object. -+ `proto_map_field_expression`: A protocol buffer map field. -+ `key_value_pair`: A key-value pair in the protocol buffer map field. +`TIME_SUB` supports the following values for `part`: -Modification behavior: ++ `NANOSECOND` + (if the SQL engine supports it) ++ `MICROSECOND` ++ `MILLISECOND` ++ `SECOND` ++ `MINUTE` ++ `HOUR` -+ If the key is not already in the map field, adds the key and its value to the - map field. -+ If the key is already in the map field, replaces its value. -+ If the key is in the map field and the value is `NULL`, removes the key and - its value from the map field. +This function automatically adjusts when values fall outside of the 00:00:00 to +24:00:00 boundary. For example, if you subtract an hour from `00:30:00`, the +returned value is `23:30:00`. 
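Both behaviors above can be modeled with anchored datetime arithmetic (hypothetical helpers, not ZetaSQL's implementation): subtraction wraps below `00:00:00`, and a `TIME_DIFF`-style result is a signed whole number of intervals.

```python
from datetime import datetime, date, time, timedelta

ANCHOR = date(2000, 1, 2)  # arbitrary date used to host a time-of-day

def time_sub(t, delta):
    # Subtract and keep only the time-of-day: 00:30:00 - 1 hour wraps
    # around to 23:30:00.
    return (datetime.combine(ANCHOR, t) - delta).time()

def time_diff_minutes(a, b):
    # Signed whole number of minutes in a - b, negative when a is earlier.
    seconds = (datetime.combine(ANCHOR, a) - datetime.combine(ANCHOR, b)).total_seconds()
    return int(seconds // 60) if seconds >= 0 else -int(-seconds // 60)

print(time_sub(time(0, 30), timedelta(hours=1)))      # 23:30:00
print(time_diff_minutes(time(15, 30), time(14, 35)))  # 55
```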
-`NULL` handling: +**Return Data Type** -+ If `key` is `NULL`, produces an error. -+ If the same `key` appears more than once, produces an error. -+ If `map` is `NULL`, `map` is treated as empty. +`TIME` -**Return type** +**Example** -In the input protocol buffer map field, `V` as represented in `map`. +```sql +SELECT + TIME "15:30:00" as original_date, + TIME_SUB(TIME "15:30:00", INTERVAL 10 MINUTE) as earlier; -**Examples** +/*-----------------------------+------------------------* + | original_date | earlier | + +-----------------------------+------------------------+ + | 15:30:00 | 15:20:00 | + *-----------------------------+------------------------*/ +``` -To illustrate the use of this function, consider the protocol buffer message -`Item`: +### `TIME_TRUNC` -```proto -message Item { - optional map purchased = 1; -}; +```sql +TIME_TRUNC(time_expression, time_part) ``` -In the following example, the query deletes key `A`, replaces `B`, and adds -`C` in a map field called `purchased`. +**Description** -```sql -SELECT - MODIFY_MAP(m.purchased, 'A', NULL, 'B', 4, 'C', 6) AS result_map -FROM - (SELECT AS VALUE CAST("purchased { key: 'A' value: 2 } purchased { key: 'B' value: 3}" AS Item)) AS m; +Truncates a `TIME` value to the granularity of `time_part`. The `TIME` value +is always rounded to the beginning of `time_part`, which can be one of the +following: -/*---------------------------------------------* - | result_map | - +---------------------------------------------+ - | { key: 'B' value: 4 } { key: 'C' value: 6 } | - *---------------------------------------------*/ -``` ++ `NANOSECOND`: If used, nothing is truncated from the value. ++ `MICROSECOND`: The nearest lessor or equal microsecond. ++ `MILLISECOND`: The nearest lessor or equal millisecond. ++ `SECOND`: The nearest lessor or equal second. ++ `MINUTE`: The nearest lessor or equal minute. ++ `HOUR`: The nearest lessor or equal hour. 
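Truncation to the nearest lesser-or-equal boundary amounts to zeroing out every field finer than the chosen part. A minimal Python sketch of that rule, using a hypothetical `time_trunc` helper (not ZetaSQL's implementation), for the second-or-coarser parts:

```python
from datetime import time

def time_trunc(t, part):
    # Zero out all fields finer than `part`, which rounds the value down
    # to the beginning of that part.
    if part == "HOUR":
        return t.replace(minute=0, second=0, microsecond=0)
    if part == "MINUTE":
        return t.replace(second=0, microsecond=0)
    if part == "SECOND":
        return t.replace(microsecond=0)
    raise ValueError(f"unsupported part: {part}")

print(time_trunc(time(15, 30, 0), "HOUR"))  # 15:00:00
```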
-[proto-map]: https://developers.google.com/protocol-buffers/docs/proto3#maps +**Return Data Type** -### `PROTO_DEFAULT_IF_NULL` +`TIME` -```sql -PROTO_DEFAULT_IF_NULL(proto_field_expression) -``` +**Example** -**Description** +```sql +SELECT + TIME "15:30:00" as original, + TIME_TRUNC(TIME "15:30:00", HOUR) as truncated; -Evaluates any expression that results in a proto field access. -If the `proto_field_expression` evaluates to `NULL`, returns the default -value for the field. Otherwise, returns the field value. +/*----------------------------+------------------------* + | original | truncated | + +----------------------------+------------------------+ + | 15:30:00 | 15:00:00 | + *----------------------------+------------------------*/ +``` -Stipulations: +[time-to-string]: #cast -+ The expression cannot resolve to a required field. -+ The expression cannot resolve to a message field. -+ The expression must resolve to a regular proto field access, not - a virtual field. -+ The expression cannot access a field with - `zetasql.use_defaults=false`. +## Time series functions -**Return Type** +ZetaSQL supports the following time series functions. -Type of `proto_field_expression`. +### Function list -**Example** + + + + + + + + -In the following example, each book in a library has a country of origin. If -the country is not set, the country defaults to unknown. + + + + -```sql -SELECT PROTO_DEFAULT_IF_NULL(book.country) AS origin FROM library_books; -``` + + + + -```proto -message Book { - optional string country = 4 [default = 'Unknown']; -} -``` + + + + -```sql -/*-----------------* - | origin | - +-----------------+ - | Canada | - *-----------------*/ -``` + +
NameSummary
DATE_BUCKET -In this statement, table `library_books` contains a column named `book`, -whose type is `Book`. + + Gets the lower bound of the date bucket that contains a date. +
DATETIME_BUCKET -`Book` is a type that contains a field called `country`. + + Gets the lower bound of the datetime bucket that contains a datetime. +
TIMESTAMP_BUCKET -This is the result if `book.country` evaluates to `Canada`. + + Gets the lower bound of the timestamp bucket that contains a timestamp. +
-This is the result if `book` is `NULL`. Since `book` is `NULL`, -`book.country` evaluates to `NULL` and therefore the function result is the -default value for `country`. +### `DATE_BUCKET` ```sql -/*-----------------* - | origin | - +-----------------+ - | Unknown | - *-----------------*/ +DATE_BUCKET(date_in_bucket, bucket_width) ``` -### `REPLACE_FIELDS` - ```sql -REPLACE_FIELDS(proto_expression, value AS field_path [, ... ]) +DATE_BUCKET(date_in_bucket, bucket_width, bucket_origin_date) ``` **Description** -Returns a copy of a protocol buffer, replacing the values in one or more fields. -`field_path` is a delimited path to the protocol buffer field to be replaced. +Gets the lower bound of the date bucket that contains a date. -+ If `value` is `NULL`, it un-sets `field_path` or returns an error if the last - component of `field_path` is a required field. -+ Replacing subfields will succeed only if the message containing the field is - set. -+ Replacing subfields of repeated field is not allowed. -+ A repeated field can be replaced with an `ARRAY` value. +**Definitions** + ++ `date_in_bucket`: A `DATE` value that you can use to look up a date bucket. ++ `bucket_width`: An `INTERVAL` value that represents the width of + a date bucket. A [single interval][interval-single] with + [date parts][interval-parts] is supported. ++ `bucket_origin_date`: A `DATE` value that represents a point in time. All + buckets expand left and right from this point. If this argument is not set, + `1950-01-01` is used by default. **Return type** -Type of `proto_expression` +`DATE` **Examples** -To illustrate the usage of this function, we use protocol buffer messages -`Book` and `BookDetails`. +In the following example, the origin is omitted and the default origin, +`1950-01-01` is used. All buckets expand in both directions from the origin, +and the size of each bucket is two days. The lower bound of the bucket in +which `my_date` belongs is returned. 
-``` -message Book { - required string title = 1; - repeated string reviews = 2; - optional BookDetails details = 3; -}; +```sql +WITH some_dates AS ( + SELECT DATE '1949-12-29' AS my_date UNION ALL + SELECT DATE '1949-12-30' UNION ALL + SELECT DATE '1949-12-31' UNION ALL + SELECT DATE '1950-01-01' UNION ALL + SELECT DATE '1950-01-02' UNION ALL + SELECT DATE '1950-01-03' +) +SELECT DATE_BUCKET(my_date, INTERVAL 2 DAY) AS bucket_lower_bound +FROM some_dates; -message BookDetails { - optional string author = 1; - optional int32 chapters = 2; -}; -``` +/*--------------------+ + | bucket_lower_bound | + +--------------------+ + | 1949-12-28 | + | 1949-12-30 | + | 1949-12-30 | + | 1950-12-01 | + | 1950-12-01 | + | 1950-12-03 | + +--------------------*/ -This statement replaces value of field `title` and subfield `chapters` -of proto type `Book`. Note that field `details` must be set for the statement -to succeed. +-- Some date buckets that originate from 1950-01-01: +-- + Bucket: ... +-- + Bucket: [1949-12-28, 1949-12-30) +-- + Bucket: [1949-12-30, 1950-01-01) +-- + Origin: [1950-01-01] +-- + Bucket: [1950-01-01, 1950-01-03) +-- + Bucket: [1950-01-03, 1950-01-05) +-- + Bucket: ... +``` + +In the following example, the origin has been changed to `2000-12-24`, +and all buckets expand in both directions from this point. The size of each +bucket is seven days. 
The lower bound of the bucket in which `my_date` belongs +is returned: + +```sql +WITH some_dates AS ( + SELECT DATE '2000-12-20' AS my_date UNION ALL + SELECT DATE '2000-12-21' UNION ALL + SELECT DATE '2000-12-22' UNION ALL + SELECT DATE '2000-12-23' UNION ALL + SELECT DATE '2000-12-24' UNION ALL + SELECT DATE '2000-12-25' +) +SELECT DATE_BUCKET( + my_date, + INTERVAL 7 DAY, + DATE '2000-12-24') AS bucket_lower_bound +FROM some_dates; -```sql -SELECT REPLACE_FIELDS( - NEW Book( - "The Hummingbird" AS title, - NEW BookDetails(10 AS chapters) AS details), - "The Hummingbird II" AS title, - 11 AS details.chapters) -AS proto; +/*--------------------+ + | bucket_lower_bound | + +--------------------+ + | 2000-12-17 | + | 2000-12-17 | + | 2000-12-17 | + | 2000-12-17 | + | 2000-12-24 | + | 2000-12-24 | + +--------------------*/ -/*-----------------------------------------------------------------------------* - | proto | - +-----------------------------------------------------------------------------+ - |{title: "The Hummingbird II" details: {chapters: 11 }} | - *-----------------------------------------------------------------------------*/ +-- Some date buckets that originate from 2000-12-24: +-- + Bucket: ... +-- + Bucket: [2000-12-10, 2000-12-17) +-- + Bucket: [2000-12-17, 2000-12-24) +-- + Origin: [2000-12-24] +-- + Bucket: [2000-12-24, 2000-12-31) +-- + Bucket: [2000-12-31, 2000-01-07) +-- + Bucket: ... ``` -The function can replace value of repeated fields. 
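The bucket arithmetic in the examples above can be sketched outside of SQL. Assuming a day-granularity width, the lower bound is the origin plus a floored whole number of widths; a hypothetical `date_bucket` helper (not ZetaSQL's implementation) reproduces both documented cases:

```python
from datetime import date, timedelta

def date_bucket(d, width_days, origin=date(1950, 1, 1)):
    # Lower bound of the bucket containing d: offset from the origin,
    # floored to a whole number of widths. Python's // floors toward
    # negative infinity, which also handles dates before the origin.
    offset_days = (d - origin).days
    buckets = offset_days // width_days
    return origin + timedelta(days=buckets * width_days)

print(date_bucket(date(1949, 12, 29), 2))                      # 1949-12-28
print(date_bucket(date(2000, 12, 20), 7, date(2000, 12, 24)))  # 2000-12-17
```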
+[interval-single]: https://github.com/google/zetasql/blob/master/docs/data-types.md#single_datetime_part_interval -```sql -SELECT REPLACE_FIELDS( - NEW Book("The Hummingbird" AS title, - NEW BookDetails(10 AS chapters) AS details), - ["A good read!", "Highly recommended."] AS reviews) -AS proto; +[interval-range]: https://github.com/google/zetasql/blob/master/docs/data-types.md#range_datetime_part_interval -/*-----------------------------------------------------------------------------* - | proto | - +-----------------------------------------------------------------------------+ - |{title: "The Hummingbird" review: "A good read" review: "Highly recommended."| - | details: {chapters: 10 }} | - *-----------------------------------------------------------------------------*/ -``` +[interval-parts]: https://github.com/google/zetasql/blob/master/docs/data-types.md#interval_datetime_parts -It can set a field to `NULL`. +### `DATETIME_BUCKET` ```sql -SELECT REPLACE_FIELDS( - NEW Book("The Hummingbird" AS title, - NEW BookDetails(10 AS chapters) AS details), - NULL AS details) -AS proto; - -/*-----------------------------------------------------------------------------* - | proto | - +-----------------------------------------------------------------------------+ - |{title: "The Hummingbird" } | - *-----------------------------------------------------------------------------*/ +DATETIME_BUCKET(datetime_in_bucket, bucket_width) ``` -### `TO_PROTO` - -``` -TO_PROTO(expression) +```sql +DATETIME_BUCKET(datetime_in_bucket, bucket_width, bucket_origin_datetime) ``` **Description** -Returns a PROTO value. The valid `expression` types are defined in the -table below, along with the return types that they produce. Other input -`expression` types are invalid. +Gets the lower bound of the datetime bucket that contains a datetime. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
expression typeReturn type
-
    -
  • INT32
  • -
  • google.protobuf.Int32Value
  • -
-
google.protobuf.Int32Value
-
    -
  • UINT32
  • -
  • google.protobuf.UInt32Value
  • -
-
google.protobuf.UInt32Value
-
    -
  • INT64
  • -
  • google.protobuf.Int64Value
  • -
-
google.protobuf.Int64Value
-
    -
  • UINT64
  • -
  • google.protobuf.UInt64Value
  • -
-
google.protobuf.UInt64Value
-
    -
  • FLOAT
  • -
  • google.protobuf.FloatValue
  • -
-
google.protobuf.FloatValue
-
    -
  • DOUBLE
  • -
  • google.protobuf.DoubleValue
  • -
-
google.protobuf.DoubleValue
-
    -
  • BOOL
  • -
  • google.protobuf.BoolValue
  • -
-
google.protobuf.BoolValue
-
    -
  • STRING
  • -
  • google.protobuf.StringValue
  • -
-
google.protobuf.StringValue
-
    -
  • BYTES
  • -
  • google.protobuf.BytesValue
  • -
-
google.protobuf.BytesValue
-
    -
  • DATE
  • -
  • google.type.Date
  • -
-
google.type.Date
-
    -
  • TIME
  • -
  • google.type.TimeOfDay
  • -
-
google.type.TimeOfDay
-
    -
  • TIMESTAMP
  • -
  • google.protobuf.Timestamp
  • -
-
google.protobuf.Timestamp
+**Definitions** -**Return Type** ++ `datetime_in_bucket`: A `DATETIME` value that you can use to look up a + datetime bucket. ++ `bucket_width`: An `INTERVAL` value that represents the width of + a datetime bucket. A [single interval][interval-single] with + [date and time parts][interval-parts] is supported. ++ `bucket_origin_datetime`: A `DATETIME` value that represents a point in + time. All buckets expand left and right from this point. If this argument + is not set, `1950-01-01 00:00:00` is used by default. -The return type depends upon the `expression` type. See the return types -in the table above. +**Return type** + +`DATETIME` **Examples** -Convert a `DATE` type into a `google.type.Date` type. +In the following example, the origin is omitted and the default origin, +`1950-01-01 00:00:00` is used. All buckets expand in both directions from the +origin, and the size of each bucket is 12 hours. The lower bound of the bucket +in which `my_datetime` belongs is returned: ```sql -SELECT TO_PROTO(DATE '2019-10-30') +WITH some_datetimes AS ( + SELECT DATETIME '1949-12-30 13:00:00' AS my_datetime UNION ALL + SELECT DATETIME '1949-12-31 00:00:00' UNION ALL + SELECT DATETIME '1949-12-31 13:00:00' UNION ALL + SELECT DATETIME '1950-01-01 00:00:00' UNION ALL + SELECT DATETIME '1950-01-01 13:00:00' UNION ALL + SELECT DATETIME '1950-01-02 00:00:00' +) +SELECT DATETIME_BUCKET(my_datetime, INTERVAL 12 HOUR) AS bucket_lower_bound +FROM some_datetimes; -/*--------------------------------* - | $col1 | - +--------------------------------+ - | {year: 2019 month: 10 day: 30} | - *--------------------------------*/ +/*---------------------+ + | bucket_lower_bound | + +---------------------+ + | 1949-12-30 12:00:00 | + | 1949-12-31 00:00:00 | + | 1949-12-31 12:00:00 | + | 1950-01-01 00:00:00 | + | 1950-01-01 12:00:00 | + | 1950-01-02 00:00:00 | + +---------------------*/ + +-- Some datetime buckets that originate from 1950-01-01 00:00:00: +-- + Bucket: ... 
+-- + Bucket: [1949-12-30 00:00:00, 1949-12-30 12:00:00) +-- + Bucket: [1949-12-30 12:00:00, 1950-01-01 00:00:00) +-- + Origin: [1950-01-01 00:00:00] +-- + Bucket: [1950-01-01 00:00:00, 1950-01-01 12:00:00) +-- + Bucket: [1950-01-01 12:00:00, 1950-02-00 00:00:00) +-- + Bucket: ... +``` + +In the following example, the origin has been changed to `2000-12-24 12:00:00`, +and all buckets expand in both directions from this point. The size of each +bucket is seven days. The lower bound of the bucket in which `my_datetime` +belongs is returned: + +```sql +WITH some_datetimes AS ( + SELECT DATETIME '2000-12-20 00:00:00' AS my_datetime UNION ALL + SELECT DATETIME '2000-12-21 00:00:00' UNION ALL + SELECT DATETIME '2000-12-22 00:00:00' UNION ALL + SELECT DATETIME '2000-12-23 00:00:00' UNION ALL + SELECT DATETIME '2000-12-24 00:00:00' UNION ALL + SELECT DATETIME '2000-12-25 00:00:00' +) +SELECT DATETIME_BUCKET( + my_datetime, + INTERVAL 7 DAY, + DATETIME '2000-12-22 12:00:00') AS bucket_lower_bound +FROM some_datetimes; + +/*--------------------+ + | bucket_lower_bound | + +--------------------+ + | 2000-12-15 12:00:00 | + | 2000-12-15 12:00:00 | + | 2000-12-15 12:00:00 | + | 2000-12-22 12:00:00 | + | 2000-12-22 12:00:00 | + | 2000-12-22 12:00:00 | + +--------------------*/ + +-- Some datetime buckets that originate from 2000-12-22 12:00:00: +-- + Bucket: ... +-- + Bucket: [2000-12-08 12:00:00, 2000-12-15 12:00:00) +-- + Bucket: [2000-12-15 12:00:00, 2000-12-22 12:00:00) +-- + Origin: [2000-12-22 12:00:00] +-- + Bucket: [2000-12-22 12:00:00, 2000-12-29 12:00:00) +-- + Bucket: [2000-12-29 12:00:00, 2000-01-05 12:00:00) +-- + Bucket: ... ``` -Pass in and return a `google.type.Date` type. 
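The same floor-to-origin arithmetic extends to sub-day widths such as `INTERVAL 12 HOUR`, by working on a timeline of elapsed time rather than whole days. A hypothetical `datetime_bucket` helper (a sketch under that assumption, not ZetaSQL's implementation):

```python
from datetime import datetime, timedelta

def datetime_bucket(dt, width, origin=datetime(1950, 1, 1)):
    # Floor the offset from the origin to a whole number of widths;
    # timedelta // timedelta performs floor division, so values before
    # the origin fall into the correct earlier bucket.
    buckets = (dt - origin) // width
    return origin + buckets * width

print(datetime_bucket(datetime(1949, 12, 30, 13), timedelta(hours=12)))
# 1949-12-30 12:00:00
```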
+[interval-single]: https://github.com/google/zetasql/blob/master/docs/data-types.md#single_datetime_part_interval -```sql -SELECT TO_PROTO( - new google.type.Date( - 2019 as year, - 10 as month, - 30 as day - ) -) +[interval-parts]: https://github.com/google/zetasql/blob/master/docs/data-types.md#interval_datetime_parts -/*--------------------------------* - | $col1 | - +--------------------------------+ - | {year: 2019 month: 10 day: 30} | - *--------------------------------*/ +### `TIMESTAMP_BUCKET` + +```sql +TIMESTAMP_BUCKET(timestamp_in_bucket, bucket_width) ``` -## Security functions +```sql +TIMESTAMP_BUCKET(timestamp_in_bucket, bucket_width, bucket_origin_timestamp) +``` -ZetaSQL supports the following security functions. +**Description** -### Function list +Gets the lower bound of the timestamp bucket that contains a timestamp. - - - - - - - - +**Definitions** - - - - +**Return type** - -
NameSummary
SESSION_USER ++ `timestamp_in_bucket`: A `TIMESTAMP` value that you can use to look up a + timestamp bucket. ++ `bucket_width`: An `INTERVAL` value that represents the width of + a timestamp bucket. A [single interval][interval-single] with + [date and time parts][interval-parts] is supported. ++ `bucket_origin_timestamp`: A `TIMESTAMP` value that represents a point in + time. All buckets expand left and right from this point. If this argument + is not set, `1950-01-01 00:00:00` is used by default. - - Get the email address or principal identifier of the user that is running - the query. -
+`TIMESTAMP` -### `SESSION_USER` +**Examples** -``` -SESSION_USER() -``` +In the following example, the origin is omitted and the default origin, +`1950-01-01 00:00:00` is used. All buckets expand in both directions from the +origin, and the size of each bucket is 12 hours. The lower bound of the bucket +in which `my_timestamp` belongs is returned: -**Description** +```sql +WITH some_timestamps AS ( + SELECT TIMESTAMP '1949-12-30 13:00:00.00' AS my_timestamp UNION ALL + SELECT TIMESTAMP '1949-12-31 00:00:00.00' UNION ALL + SELECT TIMESTAMP '1949-12-31 13:00:00.00' UNION ALL + SELECT TIMESTAMP '1950-01-01 00:00:00.00' UNION ALL + SELECT TIMESTAMP '1950-01-01 13:00:00.00' UNION ALL + SELECT TIMESTAMP '1950-01-02 00:00:00.00' +) +SELECT TIMESTAMP_BUCKET(my_timestamp, INTERVAL 12 HOUR) AS bucket_lower_bound +FROM some_timestamps; -For first-party users, returns the email address of the user that is running the -query. -For third-party users, returns the -[principal identifier](https://cloud.google.com/iam/docs/principal-identifiers) -of the user that is running the query. -For more information about identities, see -[Principals](https://cloud.google.com/docs/authentication#principal). +-- Display of results may differ, depending upon the environment and +-- time zone where this query was executed. +/*---------------------------------------------+ + | bucket_lower_bound | + +---------------------------------------------+ + | 2000-12-30 12:00:00.000 America/Los_Angeles | + | 2000-12-31 00:00:00.000 America/Los_Angeles | + | 2000-12-31 12:00:00.000 America/Los_Angeles | + | 2000-01-01 00:00:00.000 America/Los_Angeles | + | 2000-01-01 12:00:00.000 America/Los_Angeles | + | 2000-01-01 00:00:00.000 America/Los_Angeles | + +---------------------------------------------*/ + +-- Some timestamp buckets that originate from 1950-01-01 00:00:00: +-- + Bucket: ... 
+-- + Bucket: [1949-12-31 00:00:00.00 UTC, 1949-12-31 12:00:00.00 UTC)
+-- + Bucket: [1949-12-31 12:00:00.00 UTC, 1950-01-01 00:00:00.00 UTC)
+-- + Origin: [1950-01-01 00:00:00.00 UTC]
+-- + Bucket: [1950-01-01 00:00:00.00 UTC, 1950-01-01 12:00:00.00 UTC)
+-- + Bucket: [1950-01-01 12:00:00.00 UTC, 1950-01-02 00:00:00.00 UTC)
+-- + Bucket: ...
+```
+
+In the following example, the origin has been changed to `2000-12-22 12:00:00`,
+and all buckets expand in both directions from this point. The size of each
+bucket is seven days. The lower bound of the bucket in which `my_timestamp`
+belongs is returned:
+
+```sql
+WITH some_timestamps AS (
+  SELECT TIMESTAMP '2000-12-20 00:00:00.00' AS my_timestamp UNION ALL
+  SELECT TIMESTAMP '2000-12-21 00:00:00.00' UNION ALL
+  SELECT TIMESTAMP '2000-12-22 00:00:00.00' UNION ALL
+  SELECT TIMESTAMP '2000-12-23 00:00:00.00' UNION ALL
+  SELECT TIMESTAMP '2000-12-24 00:00:00.00' UNION ALL
+  SELECT TIMESTAMP '2000-12-25 00:00:00.00'
+)
+SELECT TIMESTAMP_BUCKET(
+  my_timestamp,
+  INTERVAL 7 DAY,
+  TIMESTAMP '2000-12-22 12:00:00.00') AS bucket_lower_bound
+FROM some_timestamps;
+
+-- Display of results may differ, depending upon the environment and
+-- time zone where this query was executed.
+/*---------------------------------------------+
+ | bucket_lower_bound                          |
+ +---------------------------------------------+
+ | 2000-12-15 12:00:00.000 America/Los_Angeles |
+ | 2000-12-15 12:00:00.000 America/Los_Angeles |
+ | 2000-12-15 12:00:00.000 America/Los_Angeles |
+ | 2000-12-22 12:00:00.000 America/Los_Angeles |
+ | 2000-12-22 12:00:00.000 America/Los_Angeles |
+ | 2000-12-22 12:00:00.000 America/Los_Angeles |
+ +---------------------------------------------*/

-**Return Data Type**
+-- Some timestamp buckets that originate from 2000-12-22 12:00:00:
+-- + Bucket: ...
+-- + Bucket: [2000-12-08 12:00:00.00 UTC, 2000-12-15 12:00:00.00 UTC) +-- + Bucket: [2000-12-15 12:00:00.00 UTC, 2000-12-22 12:00:00.00 UTC) +-- + Origin: [2000-12-22 12:00:00.00 UTC] +-- + Bucket: [2000-12-22 12:00:00.00 UTC, 2000-12-29 12:00:00.00 UTC) +-- + Bucket: [2000-12-29 12:00:00.00 UTC, 2000-01-05 12:00:00.00 UTC) +-- + Bucket: ... +``` -`STRING` +[interval-single]: https://github.com/google/zetasql/blob/master/docs/data-types.md#single_datetime_part_interval -**Example** +[interval-parts]: https://github.com/google/zetasql/blob/master/docs/data-types.md#interval_datetime_parts -```sql -SELECT SESSION_USER() as user; +## Timestamp functions -/*----------------------* - | user | - +----------------------+ - | jdoe@example.com | - *----------------------*/ -``` +ZetaSQL supports the following timestamp functions. -## Net functions +IMPORTANT: Before working with these functions, you need to understand +the difference between the formats in which timestamps are stored and displayed, +and how time zones are used for the conversion between these formats. +To learn more, see +[How time zones work with timestamp functions][timestamp-link-to-timezone-definitions]. -ZetaSQL supports the following Net functions. +NOTE: These functions return a runtime error if overflow occurs; result +values are bounded by the defined [`DATE` range][data-types-link-to-date_type] +and [`TIMESTAMP` range][data-types-link-to-timestamp_type]. ### Function list @@ -36231,1302 +38406,1279 @@ ZetaSQL supports the following Net functions. - NET.FORMAT_IP + CURRENT_TIMESTAMP - (Deprecated) Converts an - IPv4 address from an INT64 value to a - STRING value. + Returns the current date and time as a TIMESTAMP object. - NET.FORMAT_PACKED_IP + EXTRACT - (Deprecated) Converts an - IPv4 or IPv6 address from a BYTES value to a - STRING value. + Extracts part of a TIMESTAMP value. - NET.HOST + FORMAT_TIMESTAMP - Gets the hostname from a URL. 
+ Formats a TIMESTAMP value according to the specified + format string. - NET.IP_FROM_STRING + PARSE_TIMESTAMP - Converts an IPv4 or IPv6 address from a STRING value to - a BYTES value in network byte order. + Converts a STRING value to a TIMESTAMP value. - NET.IP_IN_NET + STRING - Checks if an IP address is in a subnet. + Converts a TIMESTAMP value to a STRING value. - NET.IP_NET_MASK + TIMESTAMP - Gets a network mask. + Constructs a TIMESTAMP value. - NET.IP_TO_STRING + TIMESTAMP_ADD - Converts an IPv4 or IPv6 address from a BYTES value in - network byte order to a STRING value. + Adds a specified time interval to a TIMESTAMP value. - NET.IP_TRUNC + TIMESTAMP_DIFF - Converts a BYTES IPv4 or IPv6 address in - network byte order to a BYTES subnet address. + Gets the number of intervals between two TIMESTAMP values. - NET.IPV4_FROM_INT64 + TIMESTAMP_FROM_UNIX_MICROS - Converts an IPv4 address from an INT64 value to a - BYTES value in network byte order. + Similar to TIMESTAMP_MICROS, except that additionally, a + TIMESTAMP value can be passed in. - NET.IPV4_TO_INT64 + TIMESTAMP_FROM_UNIX_MILLIS - Converts an IPv4 address from a BYTES value in network - byte order to an INT64 value. + Similar to TIMESTAMP_MILLIS, except that additionally, a + TIMESTAMP value can be passed in. - NET.MAKE_NET + TIMESTAMP_FROM_UNIX_SECONDS - Takes a IPv4 or IPv6 address and the prefix length, and produces a - CIDR subnet. + Similar to TIMESTAMP_SECONDS, except that additionally, a + TIMESTAMP value can be passed in. - NET.PARSE_IP + TIMESTAMP_MICROS - (Deprecated) Converts an - IPv4 address from a STRING value to an - INT64 value. + Converts the number of microseconds since + 1970-01-01 00:00:00 UTC to a TIMESTAMP. - NET.PARSE_PACKED_IP + TIMESTAMP_MILLIS - (Deprecated) Converts an - IPv4 or IPv6 address from a STRING value to a - BYTES value. + Converts the number of milliseconds since + 1970-01-01 00:00:00 UTC to a TIMESTAMP. 
- NET.PUBLIC_SUFFIX + TIMESTAMP_SECONDS - Gets the public suffix from a URL. + Converts the number of seconds since + 1970-01-01 00:00:00 UTC to a TIMESTAMP. - NET.REG_DOMAIN + TIMESTAMP_SUB - Gets the registered or registrable domain from a URL. + Subtracts a specified time interval from a TIMESTAMP value. - NET.SAFE_IP_FROM_STRING + TIMESTAMP_TRUNC - Similar to the NET.IP_FROM_STRING, but returns - NULL instead of producing an error if the input is invalid. + Truncates a TIMESTAMP value. - - - -### `NET.FORMAT_IP` (DEPRECATED) - - -``` -NET.FORMAT_IP(integer) -``` + + UNIX_MICROS -**Description** + + + Converts a TIMESTAMP value to the number of microseconds since + 1970-01-01 00:00:00 UTC. + + -This function is deprecated. It is the same as -[`NET.IP_TO_STRING`][net-link-to-ip-to-string]`(`[`NET.IPV4_FROM_INT64`][net-link-to-ipv4-from-int64]`(integer))`, -except that this function does not allow negative input values. + + UNIX_MILLIS -**Return Data Type** + + + Converts a TIMESTAMP value to the number of milliseconds + since 1970-01-01 00:00:00 UTC. + + -STRING + + UNIX_SECONDS -[net-link-to-ip-to-string]: #netip_to_string + + + Converts a TIMESTAMP value to the number of seconds since + 1970-01-01 00:00:00 UTC. + + -[net-link-to-ipv4-from-int64]: #netipv4_from_int64 + + -### `NET.FORMAT_PACKED_IP` (DEPRECATED) - +### `CURRENT_TIMESTAMP` +```sql +CURRENT_TIMESTAMP() ``` -NET.FORMAT_PACKED_IP(bytes_value) + +```sql +CURRENT_TIMESTAMP ``` **Description** -This function is deprecated. It is the same as [`NET.IP_TO_STRING`][net-link-to-ip-to-string]. - -**Return Data Type** +Returns the current date and time as a timestamp object. The timestamp is +continuous, non-ambiguous, has exactly 60 seconds per minute and does not repeat +values over the leap second. Parentheses are optional. -STRING +This function handles leap seconds by smearing them across a window of 20 hours +around the inserted leap second. 
-[net-link-to-ip-to-string]: #netip_to_string +The current date and time is recorded at the start of the query +statement which contains this function, not when this specific function is +evaluated. -### `NET.HOST` +**Supported Input Types** -``` -NET.HOST(url) -``` +Not applicable -**Description** +**Result Data Type** -Takes a URL as a `STRING` value and returns the host. For best results, URL -values should comply with the format as defined by -[RFC 3986][net-link-to-rfc-3986-appendix-a]. If the URL value does not comply -with RFC 3986 formatting, this function makes a best effort to parse the input -and return a relevant result. If the function cannot parse the input, it -returns `NULL`. +`TIMESTAMP` -Note: The function does not perform any normalization. +**Examples** -**Return Data Type** +```sql +SELECT CURRENT_TIMESTAMP() AS now; -`STRING` +/*---------------------------------------------* + | now | + +---------------------------------------------+ + | 2020-06-02 17:00:53.110 America/Los_Angeles | + *---------------------------------------------*/ +``` -**Example** +When a column named `current_timestamp` is present, the column name and the +function call without parentheses are ambiguous. To ensure the function call, +add parentheses; to ensure the column name, qualify it with its +[range variable][timestamp-functions-link-to-range-variables]. For example, the +following query selects the function in the `now` column and the table +column in the `current_timestamp` column. 
```sql -SELECT - FORMAT("%T", input) AS input, - description, - FORMAT("%T", NET.HOST(input)) AS host, - FORMAT("%T", NET.PUBLIC_SUFFIX(input)) AS suffix, - FORMAT("%T", NET.REG_DOMAIN(input)) AS domain -FROM ( - SELECT "" AS input, "invalid input" AS description - UNION ALL SELECT "http://abc.xyz", "standard URL" - UNION ALL SELECT "//user:password@a.b:80/path?query", - "standard URL with relative scheme, port, path and query, but no public suffix" - UNION ALL SELECT "https://[::1]:80", "standard URL with IPv6 host" - UNION ALL SELECT "http://例å­.å·ç­’纸.中国", "standard URL with internationalized domain name" - UNION ALL SELECT " www.Example.Co.UK ", - "non-standard URL with spaces, upper case letters, and without scheme" - UNION ALL SELECT "mailto:?to=&subject=&body=", "URI rather than URL--unsupported" -); -``` +WITH t AS (SELECT 'column value' AS `current_timestamp`) +SELECT current_timestamp() AS now, t.current_timestamp FROM t; -| input | description | host | suffix | domain | -|---------------------------------------------------------------------|-------------------------------------------------------------------------------|--------------------|---------|----------------| -| "" | invalid input | NULL | NULL | NULL | -| "http://abc.xyz" | standard URL | "abc.xyz" | "xyz" | "abc.xyz" | -| "//user:password@a.b:80/path?query" | standard URL with relative scheme, port, path and query, but no public suffix | "a.b" | NULL | NULL | -| "https://[::1]:80" | standard URL with IPv6 host | "[::1]" | NULL | NULL | -| "http://例å­.å·ç­’纸.中国" | standard URL with internationalized domain name | "例å­.å·ç­’纸.中国" | "中国" | "å·ç­’纸.中国" | -| "    www.Example.Co.UK    " | non-standard URL with spaces, upper case letters, and without scheme | "www.Example.Co.UK"| "Co.UK" | "Example.Co.UK"| -| "mailto:?to=&subject=&body=" | URI rather than URL--unsupported | "mailto" | NULL | NULL | +/*---------------------------------------------+-------------------* + | now | current_timestamp | + 
+---------------------------------------------+-------------------+
+ | 2020-06-02 17:00:53.110 America/Los_Angeles | column value |
+ *---------------------------------------------+-------------------*/
+```

-[net-link-to-rfc-3986-appendix-a]: https://tools.ietf.org/html/rfc3986#appendix-A
+[timestamp-functions-link-to-range-variables]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#range_variables

-### `NET.IP_FROM_STRING`
+### `EXTRACT`

-```
-NET.IP_FROM_STRING(addr_str)
+```sql
+EXTRACT(part FROM timestamp_expression [AT TIME ZONE time_zone])
```

**Description**

-Converts an IPv4 or IPv6 address from text (STRING) format to binary (BYTES)
-format in network byte order.
-
-This function supports the following formats for `addr_str`:
+Returns a value that corresponds to the specified `part` from
+a supplied `timestamp_expression`. This function supports an optional
+`time_zone` parameter. See
+[Time zone definitions][timestamp-link-to-timezone-definitions] for information
+on how to specify a time zone.

-+ IPv4: Dotted-quad format. For example, `10.1.2.3`.
-+ IPv6: Colon-separated format. For example,
-  `1234:5678:90ab:cdef:1234:5678:90ab:cdef`. For more examples, see the
-  [IP Version 6 Addressing Architecture][net-link-to-ipv6-rfc].
+Allowed `part` values are:

-This function does not support [CIDR notation][net-link-to-cidr-notation], such as `10.1.2.3/32`.
++ `NANOSECOND`
+  (if the SQL engine supports it)
++ `MICROSECOND`
++ `MILLISECOND`
++ `SECOND`
++ `MINUTE`
++ `HOUR`
++ `DAYOFWEEK`: Returns values in the range [1,7] with Sunday as the first day
+  of the week.
++ `DAY`
++ `DAYOFYEAR`
++ `WEEK`: Returns the week number of the date in the range [0, 53]. Weeks begin
+  with Sunday, and dates prior to the first Sunday of the year are in week 0.
++ `WEEK(<WEEKDAY>)`: Returns the week number of `timestamp_expression` in the
+  range [0, 53]. Weeks begin on `WEEKDAY`. Timestamps prior to the first
+  `WEEKDAY` of the year are in week 0. Valid values for `WEEKDAY` are `SUNDAY`,
+  `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, `FRIDAY`, and `SATURDAY`.
++ `ISOWEEK`: Returns the [ISO 8601 week][ISO-8601-week]
+  number of the `timestamp_expression`. `ISOWEEK`s begin on Monday. Return values
+  are in the range [1, 53]. The first `ISOWEEK` of each ISO year begins on the
+  Monday before the first Thursday of the Gregorian calendar year.
++ `MONTH`
++ `QUARTER`
++ `YEAR`
++ `ISOYEAR`: Returns the [ISO 8601][ISO-8601]
+  week-numbering year, which is the Gregorian calendar year containing the
+  Thursday of the week to which `timestamp_expression` belongs.
++ `DATE`
++ `DATETIME`
++ `TIME`

-If this function receives a `NULL` input, it returns `NULL`. If the input is
-considered invalid, an `OUT_OF_RANGE` error occurs.
+Returned values truncate lower order time periods. For example, when extracting
+seconds, `EXTRACT` truncates the millisecond and microsecond values.

**Return Data Type**

-BYTES
+`INT64`, except in the following cases:

-**Example**
++ If `part` is `DATE`, the function returns a `DATE` object.
++ If `part` is `DATETIME`, the function returns a `DATETIME` object.
++ If `part` is `TIME`, the function returns a `TIME` object.
+
+**Examples**
+
+In the following example, `EXTRACT` returns a value corresponding to the `DAY`
+time part.
```sql +WITH Input AS (SELECT TIMESTAMP("2008-12-25 05:30:00+00") AS timestamp_value) SELECT - addr_str, FORMAT("%T", NET.IP_FROM_STRING(addr_str)) AS ip_from_string -FROM UNNEST([ - '48.49.50.51', - '::1', - '3031:3233:3435:3637:3839:4041:4243:4445', - '::ffff:192.0.2.128' -]) AS addr_str; + EXTRACT(DAY FROM timestamp_value AT TIME ZONE "UTC") AS the_day_utc, + EXTRACT(DAY FROM timestamp_value AT TIME ZONE "America/Los_Angeles") AS the_day_california +FROM Input -/*---------------------------------------------------------------------------------------------------------------* - | addr_str | ip_from_string | - +---------------------------------------------------------------------------------------------------------------+ - | 48.49.50.51 | b"0123" | - | ::1 | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01" | - | 3031:3233:3435:3637:3839:4041:4243:4445 | b"0123456789@ABCDE" | - | ::ffff:192.0.2.128 | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc0\x00\x02\x80" | - *---------------------------------------------------------------------------------------------------------------*/ +/*-------------+--------------------* + | the_day_utc | the_day_california | + +-------------+--------------------+ + | 25 | 24 | + *-------------+--------------------*/ ``` -[net-link-to-ipv6-rfc]: http://www.ietf.org/rfc/rfc2373.txt - -[net-link-to-cidr-notation]: https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing +In the following example, `EXTRACT` returns values corresponding to different +time parts from a column of type `TIMESTAMP`. 
-### `NET.IP_IN_NET` +```sql +WITH Timestamps AS ( + SELECT TIMESTAMP("2005-01-03 12:34:56+00") AS timestamp_value UNION ALL + SELECT TIMESTAMP("2007-12-31 12:00:00+00") UNION ALL + SELECT TIMESTAMP("2009-01-01 12:00:00+00") UNION ALL + SELECT TIMESTAMP("2009-12-31 12:00:00+00") UNION ALL + SELECT TIMESTAMP("2017-01-02 12:00:00+00") UNION ALL + SELECT TIMESTAMP("2017-05-26 12:00:00+00") +) +SELECT + timestamp_value, + EXTRACT(ISOYEAR FROM timestamp_value) AS isoyear, + EXTRACT(ISOWEEK FROM timestamp_value) AS isoweek, + EXTRACT(YEAR FROM timestamp_value) AS year, + EXTRACT(WEEK FROM timestamp_value) AS week +FROM Timestamps +ORDER BY timestamp_value; +-- Display of results may differ, depending upon the environment and time zone where this query was executed. +/*---------------------------------------------+---------+---------+------+------* + | timestamp_value | isoyear | isoweek | year | week | + +---------------------------------------------+---------+---------+------+------+ + | 2005-01-03 04:34:56.000 America/Los_Angeles | 2005 | 1 | 2005 | 1 | + | 2007-12-31 04:00:00.000 America/Los_Angeles | 2008 | 1 | 2007 | 52 | + | 2009-01-01 04:00:00.000 America/Los_Angeles | 2009 | 1 | 2009 | 0 | + | 2009-12-31 04:00:00.000 America/Los_Angeles | 2009 | 53 | 2009 | 52 | + | 2017-01-02 04:00:00.000 America/Los_Angeles | 2017 | 1 | 2017 | 1 | + | 2017-05-26 05:00:00.000 America/Los_Angeles | 2017 | 21 | 2017 | 21 | + *---------------------------------------------+---------+---------+------+------*/ ``` -NET.IP_IN_NET(address, subnet) -``` - -**Description** - -Takes an IP address and a subnet CIDR as STRING and returns true if the IP -address is contained in the subnet. -This function supports the following formats for `address` and `subnet`: +In the following example, `timestamp_expression` falls on a Monday. `EXTRACT` +calculates the first column using weeks that begin on Sunday, and it calculates +the second column using weeks that begin on Monday. 
-+ IPv4: Dotted-quad format. For example, `10.1.2.3`. -+ IPv6: Colon-separated format. For example, - `1234:5678:90ab:cdef:1234:5678:90ab:cdef`. For more examples, see the - [IP Version 6 Addressing Architecture][net-link-to-ipv6-rfc]. -+ CIDR (IPv4): Dotted-quad format. For example, `10.1.2.0/24` -+ CIDR (IPv6): Colon-separated format. For example, `1:2::/48`. +```sql +WITH table AS (SELECT TIMESTAMP("2017-11-06 00:00:00+00") AS timestamp_value) +SELECT + timestamp_value, + EXTRACT(WEEK(SUNDAY) FROM timestamp_value) AS week_sunday, + EXTRACT(WEEK(MONDAY) FROM timestamp_value) AS week_monday +FROM table; -If this function receives a `NULL` input, it returns `NULL`. If the input is -considered invalid, an `OUT_OF_RANGE` error occurs. +-- Display of results may differ, depending upon the environment and time zone where this query was executed. +/*---------------------------------------------+-------------+---------------* + | timestamp_value | week_sunday | week_monday | + +---------------------------------------------+-------------+---------------+ + | 2017-11-05 16:00:00.000 America/Los_Angeles | 45 | 44 | + *---------------------------------------------+-------------+---------------*/ +``` -**Return Data Type** +[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 -BOOL +[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date -[net-link-to-ipv6-rfc]: http://www.ietf.org/rfc/rfc2373.txt +[timestamp-link-to-timezone-definitions]: #timezone_definitions -### `NET.IP_NET_MASK` +### `FORMAT_TIMESTAMP` -``` -NET.IP_NET_MASK(num_output_bytes, prefix_length) +```sql +FORMAT_TIMESTAMP(format_string, timestamp[, time_zone]) ``` **Description** -Returns a network mask: a byte sequence with length equal to `num_output_bytes`, -where the first `prefix_length` bits are set to 1 and the other bits are set to -0. `num_output_bytes` and `prefix_length` are INT64. -This function throws an error if `num_output_bytes` is not 4 (for IPv4) or 16 -(for IPv6). 
It also throws an error if `prefix_length` is negative or greater -than `8 * num_output_bytes`. +Formats a timestamp according to the specified `format_string`. + +See [Format elements for date and time parts][timestamp-format-elements] +for a list of format elements that this function supports. **Return Data Type** -BYTES +`STRING` **Example** ```sql -SELECT x, y, FORMAT("%T", NET.IP_NET_MASK(x, y)) AS ip_net_mask -FROM UNNEST([ - STRUCT(4 as x, 0 as y), - (4, 20), - (4, 32), - (16, 0), - (16, 1), - (16, 128) -]); +SELECT FORMAT_TIMESTAMP("%c", TIMESTAMP "2050-12-25 15:30:55+00", "UTC") + AS formatted; -/*--------------------------------------------------------------------------------* - | x | y | ip_net_mask | - +--------------------------------------------------------------------------------+ - | 4 | 0 | b"\x00\x00\x00\x00" | - | 4 | 20 | b"\xff\xff\xf0\x00" | - | 4 | 32 | b"\xff\xff\xff\xff" | - | 16 | 0 | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" | - | 16 | 1 | b"\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" | - | 16 | 128 | b"\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff" | - *--------------------------------------------------------------------------------*/ +/*--------------------------* + | formatted | + +--------------------------+ + | Sun Dec 25 15:30:55 2050 | + *--------------------------*/ ``` -### `NET.IP_TO_STRING` +```sql +SELECT FORMAT_TIMESTAMP("%b-%d-%Y", TIMESTAMP "2050-12-25 15:30:55+00") + AS formatted; +/*-------------* + | formatted | + +-------------+ + | Dec-25-2050 | + *-------------*/ ``` -NET.IP_TO_STRING(addr_bin) -``` - -**Description** -Converts an IPv4 or IPv6 address from binary (BYTES) format in network byte -order to text (STRING) format. - -If the input is 4 bytes, this function returns an IPv4 address as a STRING. If -the input is 16 bytes, it returns an IPv6 address as a STRING. - -If this function receives a `NULL` input, it returns `NULL`. 
If the input has -a length different from 4 or 16, an `OUT_OF_RANGE` error occurs. - -**Return Data Type** -STRING +```sql +SELECT FORMAT_TIMESTAMP("%b %Y", TIMESTAMP "2050-12-25 15:30:55+00") + AS formatted; -**Example** +/*-------------* + | formatted | + +-------------+ + | Dec 2050 | + *-------------*/ +``` ```sql -SELECT FORMAT("%T", x) AS addr_bin, NET.IP_TO_STRING(x) AS ip_to_string -FROM UNNEST([ - b"0123", - b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01", - b"0123456789@ABCDE", - b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc0\x00\x02\x80" -]) AS x; +SELECT FORMAT_TIMESTAMP("%Y-%m-%dT%H:%M:%SZ", TIMESTAMP "2050-12-25 15:30:55", "UTC") + AS formatted; -/*---------------------------------------------------------------------------------------------------------------* - | addr_bin | ip_to_string | - +---------------------------------------------------------------------------------------------------------------+ - | b"0123" | 48.49.50.51 | - | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01" | ::1 | - | b"0123456789@ABCDE" | 3031:3233:3435:3637:3839:4041:4243:4445 | - | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc0\x00\x02\x80" | ::ffff:192.0.2.128 | - *---------------------------------------------------------------------------------------------------------------*/ +/*+---------------------* + | formatted | + +----------------------+ + | 2050-12-25T15:30:55Z | + *----------------------*/ ``` -### `NET.IP_TRUNC` +[timestamp-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time -``` -NET.IP_TRUNC(addr_bin, prefix_length) +### `PARSE_TIMESTAMP` + +```sql +PARSE_TIMESTAMP(format_string, timestamp_string[, time_zone]) ``` **Description** -Takes `addr_bin`, an IPv4 or IPv6 address in binary (BYTES) format in network -byte order, and returns a subnet address in the same format. 
The result has the -same length as `addr_bin`, where the first `prefix_length` bits are equal to -those in `addr_bin` and the remaining bits are 0. -This function throws an error if `LENGTH(addr_bin)` is not 4 or 16, or if -`prefix_len` is negative or greater than `LENGTH(addr_bin) * 8`. - -**Return Data Type** - -BYTES +Converts a [string representation of a timestamp][timestamp-format] to a +`TIMESTAMP` object. -**Example** +`format_string` contains the [format elements][timestamp-format-elements] +that define how `timestamp_string` is formatted. Each element in +`timestamp_string` must have a corresponding element in `format_string`. The +location of each element in `format_string` must match the location of +each element in `timestamp_string`. ```sql -SELECT - FORMAT("%T", x) as addr_bin, prefix_length, - FORMAT("%T", NET.IP_TRUNC(x, prefix_length)) AS ip_trunc -FROM UNNEST([ - STRUCT(b"\xAA\xBB\xCC\xDD" as x, 0 as prefix_length), - (b"\xAA\xBB\xCC\xDD", 11), (b"\xAA\xBB\xCC\xDD", 12), - (b"\xAA\xBB\xCC\xDD", 24), (b"\xAA\xBB\xCC\xDD", 32), - (b'0123456789@ABCDE', 80) -]); +-- This works because elements on both sides match. +SELECT PARSE_TIMESTAMP("%a %b %e %I:%M:%S %Y", "Thu Dec 25 07:30:00 2008") -/*-----------------------------------------------------------------------------* - | addr_bin | prefix_length | ip_trunc | - +-----------------------------------------------------------------------------+ - | b"\xaa\xbb\xcc\xdd" | 0 | b"\x00\x00\x00\x00" | - | b"\xaa\xbb\xcc\xdd" | 11 | b"\xaa\xa0\x00\x00" | - | b"\xaa\xbb\xcc\xdd" | 12 | b"\xaa\xb0\x00\x00" | - | b"\xaa\xbb\xcc\xdd" | 24 | b"\xaa\xbb\xcc\x00" | - | b"\xaa\xbb\xcc\xdd" | 32 | b"\xaa\xbb\xcc\xdd" | - | b"0123456789@ABCDE" | 80 | b"0123456789\x00\x00\x00\x00\x00\x00" | - *-----------------------------------------------------------------------------*/ -``` +-- This produces an error because the year element is in different locations. 
+SELECT PARSE_TIMESTAMP("%a %b %e %Y %I:%M:%S", "Thu Dec 25 07:30:00 2008") -### `NET.IPV4_FROM_INT64` +-- This produces an error because one of the year elements is missing. +SELECT PARSE_TIMESTAMP("%a %b %e %I:%M:%S", "Thu Dec 25 07:30:00 2008") +-- This works because %c can find all matching elements in timestamp_string. +SELECT PARSE_TIMESTAMP("%c", "Thu Dec 25 07:30:00 2008") ``` -NET.IPV4_FROM_INT64(integer_value) -``` - -**Description** - -Converts an IPv4 address from integer format to binary (BYTES) format in network -byte order. In the integer input, the least significant bit of the IP address is -stored in the least significant bit of the integer, regardless of host or client -architecture. For example, `1` means `0.0.0.1`, and `0x1FF` means `0.0.1.255`. -This function checks that either all the most significant 32 bits are 0, or all -the most significant 33 bits are 1 (sign-extended from a 32-bit integer). -In other words, the input should be in the range `[-0x80000000, 0xFFFFFFFF]`; -otherwise, this function throws an error. +When using `PARSE_TIMESTAMP`, keep the following in mind: -This function does not support IPv6. ++ **Unspecified fields.** Any unspecified field is initialized from `1970-01-01 + 00:00:00.0`. This initialization value uses the time zone specified by the + function's time zone argument, if present. If not, the initialization value + uses the default time zone, which is implementation defined. For instance, if the year + is unspecified then it defaults to `1970`, and so on. ++ **Case insensitivity.** Names, such as `Monday`, `February`, and so on, are + case insensitive. ++ **Whitespace.** One or more consecutive white spaces in the format string + matches zero or more consecutive white spaces in the timestamp string. In + addition, leading and trailing white spaces in the timestamp string are always + allowed, even if they are not in the format string. 
++ **Format precedence.** When two (or more) format elements have overlapping + information (for example both `%F` and `%Y` affect the year), the last one + generally overrides any earlier ones, with some exceptions (see the + descriptions of `%s`, `%C`, and `%y`). ++ **Format divergence.** `%p` can be used with `am`, `AM`, `pm`, and `PM`. **Return Data Type** -BYTES +`TIMESTAMP` **Example** ```sql -SELECT x, x_hex, FORMAT("%T", NET.IPV4_FROM_INT64(x)) AS ipv4_from_int64 -FROM ( - SELECT CAST(x_hex AS INT64) x, x_hex - FROM UNNEST(["0x0", "0xABCDEF", "0xFFFFFFFF", "-0x1", "-0x2"]) AS x_hex -); +SELECT PARSE_TIMESTAMP("%c", "Thu Dec 25 07:30:00 2008") AS parsed; -/*-----------------------------------------------* - | x | x_hex | ipv4_from_int64 | - +-----------------------------------------------+ - | 0 | 0x0 | b"\x00\x00\x00\x00" | - | 11259375 | 0xABCDEF | b"\x00\xab\xcd\xef" | - | 4294967295 | 0xFFFFFFFF | b"\xff\xff\xff\xff" | - | -1 | -0x1 | b"\xff\xff\xff\xff" | - | -2 | -0x2 | b"\xff\xff\xff\xfe" | - *-----------------------------------------------*/ +-- Display of results may differ, depending upon the environment and time zone where this query was executed. +/*---------------------------------------------* + | parsed | + +---------------------------------------------+ + | 2008-12-25 07:30:00.000 America/Los_Angeles | + *---------------------------------------------*/ ``` -### `NET.IPV4_TO_INT64` +[timestamp-format]: #format_timestamp -``` -NET.IPV4_TO_INT64(addr_bin) -``` +[timestamp-format-elements]: https://github.com/google/zetasql/blob/master/docs/format-elements.md#format_elements_date_time -**Description** +### `STRING` -Converts an IPv4 address from binary (BYTES) format in network byte order to -integer format. In the integer output, the least significant bit of the IP -address is stored in the least significant bit of the integer, regardless of -host or client architecture. For example, `1` means `0.0.0.1`, and `0x1FF` means -`0.0.1.255`. 
The output is in the range `[0, 0xFFFFFFFF]`. +```sql +STRING(timestamp_expression[, time_zone]) +``` -If the input length is not 4, this function throws an error. +**Description** -This function does not support IPv6. +Converts a timestamp to a string. Supports an optional +parameter to specify a time zone. See +[Time zone definitions][timestamp-link-to-timezone-definitions] for information +on how to specify a time zone. **Return Data Type** -INT64 +`STRING` **Example** -```sql -SELECT - FORMAT("%T", x) AS addr_bin, - FORMAT("0x%X", NET.IPV4_TO_INT64(x)) AS ipv4_to_int64 -FROM -UNNEST([b"\x00\x00\x00\x00", b"\x00\xab\xcd\xef", b"\xff\xff\xff\xff"]) AS x; +```sql +SELECT STRING(TIMESTAMP "2008-12-25 15:30:00+00", "UTC") AS string; -/*-------------------------------------* - | addr_bin | ipv4_to_int64 | - +-------------------------------------+ - | b"\x00\x00\x00\x00" | 0x0 | - | b"\x00\xab\xcd\xef" | 0xABCDEF | - | b"\xff\xff\xff\xff" | 0xFFFFFFFF | - *-------------------------------------*/ +/*-------------------------------* + | string | + +-------------------------------+ + | 2008-12-25 15:30:00+00 | + *-------------------------------*/ ``` -### `NET.MAKE_NET` +[timestamp-link-to-timezone-definitions]: #timezone_definitions -``` -NET.MAKE_NET(address, prefix_length) +### `TIMESTAMP` + +```sql +TIMESTAMP(string_expression[, time_zone]) +TIMESTAMP(date_expression[, time_zone]) +TIMESTAMP(datetime_expression[, time_zone]) ``` **Description** -Takes an IPv4 or IPv6 address as STRING and an integer representing the prefix -length (the number of leading 1-bits in the network mask). Returns a -STRING representing the [CIDR subnet][net-link-to-cidr-notation] with the given prefix length. - -The value of `prefix_length` must be greater than or equal to 0. A smaller value -means a bigger subnet, covering more IP addresses. 
The result CIDR subnet must -be no smaller than `address`, meaning that the value of `prefix_length` must be -less than or equal to the prefix length in `address`. See the effective upper -bound below. ++ `string_expression[, time_zone]`: Converts a string to a + timestamp. `string_expression` must include a + timestamp literal. + If `string_expression` includes a time zone in the timestamp literal, do + not include an explicit `time_zone` + argument. ++ `date_expression[, time_zone]`: Converts a date to a timestamp. + The value returned is the earliest timestamp that falls within + the given date. ++ `datetime_expression[, time_zone]`: Converts a + datetime to a timestamp. -This function supports the following formats for `address`: +This function supports an optional +parameter to [specify a time zone][timestamp-link-to-timezone-definitions]. If +no time zone is specified, the default time zone, which is implementation defined, +is used. -+ IPv4: Dotted-quad format, such as `10.1.2.3`. The value of `prefix_length` - must be less than or equal to 32. -+ IPv6: Colon-separated format, such as - `1234:5678:90ab:cdef:1234:5678:90ab:cdef`. The value of `prefix_length` must - be less than or equal to 128. -+ CIDR (IPv4): Dotted-quad format, such as `10.1.2.0/24`. - The value of `prefix_length` must be less than or equal to the number after - the slash in `address` (24 in the example), which must be less than or equal - to 32. -+ CIDR (IPv6): Colon-separated format, such as `1:2::/48`. - The value of `prefix_length` must be less than or equal to the number after - the slash in `address` (48 in the example), which must be less than or equal - to 128. +**Return Data Type** -If this function receives a `NULL` input, it returns `NULL`. If the input is -considered invalid, an `OUT_OF_RANGE` error occurs. 
+`TIMESTAMP` -**Return Data Type** +**Examples** -STRING +```sql +SELECT TIMESTAMP("2008-12-25 15:30:00+00") AS timestamp_str; -[net-link-to-cidr-notation]: https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing +-- Display of results may differ, depending upon the environment and time zone where this query was executed. +/*---------------------------------------------* + | timestamp_str | + +---------------------------------------------+ + | 2008-12-25 07:30:00.000 America/Los_Angeles | + *---------------------------------------------*/ +``` -### `NET.PARSE_IP` (DEPRECATED) - +```sql +SELECT TIMESTAMP("2008-12-25 15:30:00", "America/Los_Angeles") AS timestamp_str; +-- Display of results may differ, depending upon the environment and time zone where this query was executed. +/*---------------------------------------------* + | timestamp_str | + +---------------------------------------------+ + | 2008-12-25 15:30:00.000 America/Los_Angeles | + *---------------------------------------------*/ ``` -NET.PARSE_IP(addr_str) + +```sql +SELECT TIMESTAMP("2008-12-25 15:30:00 UTC") AS timestamp_str; + +-- Display of results may differ, depending upon the environment and time zone where this query was executed. +/*---------------------------------------------* + | timestamp_str | + +---------------------------------------------+ + | 2008-12-25 07:30:00.000 America/Los_Angeles | + *---------------------------------------------*/ ``` -**Description** +```sql +SELECT TIMESTAMP(DATETIME "2008-12-25 15:30:00") AS timestamp_datetime; -This function is deprecated. It is the same as -[`NET.IPV4_TO_INT64`][net-link-to-ipv4-to-int64]`(`[`NET.IP_FROM_STRING`][net-link-to-ip-from-string]`(addr_str))`, -except that this function truncates the input at the first `'\x00'` character, -if any, while `NET.IP_FROM_STRING` treats `'\x00'` as invalid. +-- Display of results may differ, depending upon the environment and time zone where this query was executed. 
+/*---------------------------------------------* + | timestamp_datetime | + +---------------------------------------------+ + | 2008-12-25 15:30:00.000 America/Los_Angeles | + *---------------------------------------------*/ +``` -**Return Data Type** +```sql +SELECT TIMESTAMP(DATE "2008-12-25") AS timestamp_date; -INT64 +-- Display of results may differ, depending upon the environment and time zone where this query was executed. +/*---------------------------------------------* + | timestamp_date | + +---------------------------------------------+ + | 2008-12-25 00:00:00.000 America/Los_Angeles | + *---------------------------------------------*/ +``` -[net-link-to-ip-to-string]: #netip_to_string +[timestamp-literals]: https://github.com/google/zetasql/blob/master/docs/lexical.md#timestamp_literals -[net-link-to-ipv4-to-int64]: #netipv4_to_int64 +[timestamp-link-to-timezone-definitions]: #timezone_definitions -### `NET.PARSE_PACKED_IP` (DEPRECATED) - +### `TIMESTAMP_ADD` -``` -NET.PARSE_PACKED_IP(addr_str) +```sql +TIMESTAMP_ADD(timestamp_expression, INTERVAL int64_expression date_part) ``` **Description** -This function is deprecated. It is the same as -[`NET.IP_FROM_STRING`][net-link-to-ip-from-string], except that this function truncates -the input at the first `'\x00'` character, if any, while `NET.IP_FROM_STRING` -treats `'\x00'` as invalid. +Adds `int64_expression` units of `date_part` to the timestamp, independent of +any time zone. -**Return Data Type** +`TIMESTAMP_ADD` supports the following values for `date_part`: -BYTES ++ `NANOSECOND` + (if the SQL engine supports it) ++ `MICROSECOND` ++ `MILLISECOND` ++ `SECOND` ++ `MINUTE` ++ `HOUR`. Equivalent to 60 `MINUTE` parts. ++ `DAY`. Equivalent to 24 `HOUR` parts. 
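+Note: Because `DAY` is defined as exactly 24 `HOUR` parts, adding a `DAY`
+interval across a daylight saving time transition can shift the civil time of
+day. The following sketch (dates chosen for illustration; display of results
+depends on the session time zone) adds exactly 24 hours:
+
+```sql
+-- 2021-03-13 09:00:00 America/Los_Angeles is the day before a DST transition.
+-- Adding INTERVAL 1 DAY adds exactly 24 hours, so the result is
+-- 10:00:00 local time on 2021-03-14, not 09:00:00.
+SELECT
+  TIMESTAMP_ADD(TIMESTAMP "2021-03-13 09:00:00 America/Los_Angeles", INTERVAL 1 DAY) AS plus_one_day;
+```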
-[net-link-to-ip-from-string]: #netip_from_string
+**Return Data Type**
-### `NET.PUBLIC_SUFFIX`
+
+`TIMESTAMP`
-```
-NET.PUBLIC_SUFFIX(url)
-```
+
+**Example**
-**Description**
+
+```sql
+SELECT
+  TIMESTAMP("2008-12-25 15:30:00+00") AS original,
+  TIMESTAMP_ADD(TIMESTAMP "2008-12-25 15:30:00+00", INTERVAL 10 MINUTE) AS later;
-Takes a URL as a `STRING` value and returns the public suffix (such as `com`,
-`org`, or `net`). A public suffix is an ICANN domain registered at
-[publicsuffix.org][net-link-to-public-suffix]. For best results, URL values
-should comply with the format as defined by
-[RFC 3986][net-link-to-rfc-3986-appendix-a]. If the URL value does not comply
-with RFC 3986 formatting, this function makes a best effort to parse the input
-and return a relevant result.
+
+-- Display of results may differ, depending upon the environment and time zone where this query was executed.
+/*---------------------------------------------+---------------------------------------------*
+ | original                                    | later                                       |
+ +---------------------------------------------+---------------------------------------------+
+ | 2008-12-25 07:30:00.000 America/Los_Angeles | 2008-12-25 07:40:00.000 America/Los_Angeles |
+ *---------------------------------------------+---------------------------------------------*/
+```
-This function returns `NULL` if any of the following is true:
+
+### `TIMESTAMP_DIFF`
-+ It cannot parse the host from the input;
-+ The parsed host contains adjacent dots in the middle
-  (not leading or trailing);
-+ The parsed host does not contain any public suffix.
+
+```sql
+TIMESTAMP_DIFF(timestamp_expression_a, timestamp_expression_b, date_part)
+```
-Before looking up the public suffix, this function temporarily normalizes the
-host by converting uppercase English letters to lowercase and encoding all
-non-ASCII characters with [Punycode][net-link-to-punycode].
-The function then returns the public suffix as part of the original host instead
-of the normalized host.
+**Description** -Note: The function does not perform -[Unicode normalization][unicode-normalization]. +Returns the whole number of specified `date_part` intervals between two +timestamps (`timestamp_expression_a` - `timestamp_expression_b`). +If the first timestamp is earlier than the second one, +the output is negative. Produces an error if the computation overflows the +result type, such as if the difference in +nanoseconds +between the two timestamps would overflow an +`INT64` value. -Note: The public suffix data at -[publicsuffix.org][net-link-to-public-suffix] also contains -private domains. This function ignores the private domains. +`TIMESTAMP_DIFF` supports the following values for `date_part`: -Note: The public suffix data may change over time. Consequently, input that -produces a `NULL` result now may produce a non-`NULL` value in the future. ++ `NANOSECOND` + (if the SQL engine supports it) ++ `MICROSECOND` ++ `MILLISECOND` ++ `SECOND` ++ `MINUTE` ++ `HOUR`. Equivalent to 60 `MINUTE`s. ++ `DAY`. Equivalent to 24 `HOUR`s. 
**Return Data Type** -`STRING` +`INT64` **Example** ```sql SELECT - FORMAT("%T", input) AS input, - description, - FORMAT("%T", NET.HOST(input)) AS host, - FORMAT("%T", NET.PUBLIC_SUFFIX(input)) AS suffix, - FORMAT("%T", NET.REG_DOMAIN(input)) AS domain -FROM ( - SELECT "" AS input, "invalid input" AS description - UNION ALL SELECT "http://abc.xyz", "standard URL" - UNION ALL SELECT "//user:password@a.b:80/path?query", - "standard URL with relative scheme, port, path and query, but no public suffix" - UNION ALL SELECT "https://[::1]:80", "standard URL with IPv6 host" - UNION ALL SELECT "http://例å­.å·ç­’纸.中国", "standard URL with internationalized domain name" - UNION ALL SELECT " www.Example.Co.UK ", - "non-standard URL with spaces, upper case letters, and without scheme" - UNION ALL SELECT "mailto:?to=&subject=&body=", "URI rather than URL--unsupported" -); + TIMESTAMP("2010-07-07 10:20:00+00") AS later_timestamp, + TIMESTAMP("2008-12-25 15:30:00+00") AS earlier_timestamp, + TIMESTAMP_DIFF(TIMESTAMP "2010-07-07 10:20:00+00", TIMESTAMP "2008-12-25 15:30:00+00", HOUR) AS hours; + +-- Display of results may differ, depending upon the environment and time zone where this query was executed. 
+/*---------------------------------------------+---------------------------------------------+-------* + | later_timestamp | earlier_timestamp | hours | + +---------------------------------------------+---------------------------------------------+-------+ + | 2010-07-07 03:20:00.000 America/Los_Angeles | 2008-12-25 07:30:00.000 America/Los_Angeles | 13410 | + *---------------------------------------------+---------------------------------------------+-------*/ ``` -| input | description | host | suffix | domain | -|--------------------------------------------------------------------|-------------------------------------------------------------------------------|--------------------|---------|----------------| -| "" | invalid input | NULL | NULL | NULL | -| "http://abc.xyz" | standard URL | "abc.xyz" | "xyz" | "abc.xyz" | -| "//user:password@a.b:80/path?query" | standard URL with relative scheme, port, path and query, but no public suffix | "a.b" | NULL | NULL | -| "https://[::1]:80" | standard URL with IPv6 host | "[::1]" | NULL | NULL | -| "http://例å­.å·ç­’纸.中国" | standard URL with internationalized domain name | "例å­.å·ç­’纸.中国" | "中国" | "å·ç­’纸.中国" | -| "    www.Example.Co.UK    "| non-standard URL with spaces, upper case letters, and without scheme | "www.Example.Co.UK"| "Co.UK" | "Example.Co.UK | -| "mailto:?to=&subject=&body=" | URI rather than URL--unsupported | "mailto" | NULL | NULL | +In the following example, the first timestamp occurs before the +second timestamp, resulting in a negative output. 
-[unicode-normalization]: https://en.wikipedia.org/wiki/Unicode_equivalence
+```sql
+SELECT TIMESTAMP_DIFF(TIMESTAMP "2018-08-14", TIMESTAMP "2018-10-14", DAY) AS negative_diff;
-[net-link-to-punycode]: https://en.wikipedia.org/wiki/Punycode
+
+/*---------------*
+ | negative_diff |
+ +---------------+
+ | -61           |
+ *---------------*/
+```
-[net-link-to-public-suffix]: https://publicsuffix.org/list/
+
+In this example, the result is 0 because only the number of whole specified
+`HOUR` intervals are included.
-[net-link-to-rfc-3986-appendix-a]: https://tools.ietf.org/html/rfc3986#appendix-A
+
+```sql
+SELECT TIMESTAMP_DIFF("2001-02-01 01:00:00", "2001-02-01 00:00:01", HOUR) AS diff;
-### `NET.REG_DOMAIN`
+
+/*---------------*
+ | diff          |
+ +---------------+
+ | 0             |
+ *---------------*/
+```
+
+### `TIMESTAMP_FROM_UNIX_MICROS`
+
+```sql
+TIMESTAMP_FROM_UNIX_MICROS(int64_expression)
 ```
-NET.REG_DOMAIN(url)
+
+```sql
+TIMESTAMP_FROM_UNIX_MICROS(timestamp_expression)
 ```
 
 **Description**
 
-Takes a URL as a string and returns the registered or registrable domain (the
-[public suffix](#netpublic_suffix) plus one preceding label), as a
-string. For best results, URL values should comply with the format as defined by
-[RFC 3986][net-link-to-rfc-3986-appendix-a]. If the URL value does not comply
-with RFC 3986 formatting, this function makes a best effort to parse the input
-and return a relevant result.
-
-This function returns `NULL` if any of the following is true:
-
-+ It cannot parse the host from the input;
-+ The parsed host contains adjacent dots in the middle
-  (not leading or trailing);
-+ The parsed host does not contain any public suffix;
-+ The parsed host contains only a public suffix without any preceding label.
+Interprets `int64_expression` as the number of microseconds since
+1970-01-01 00:00:00 UTC and returns a timestamp. If a timestamp is passed in,
+the same timestamp is returned.
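+For example, the `timestamp_expression` form is a no-op, which can be handy in
+generic queries where a value may already be a timestamp (a sketch for
+illustration):
+
+```sql
+-- The timestamp argument is returned unchanged.
+SELECT TIMESTAMP_FROM_UNIX_MICROS(TIMESTAMP "2008-12-25 15:30:00+00") AS same_timestamp;
+```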
-Before looking up the public suffix, this function temporarily normalizes the -host by converting uppercase English letters to lowercase and encoding all -non-ASCII characters with [Punycode][net-link-to-punycode]. The function then -returns the registered or registerable domain as part of the original host -instead of the normalized host. +**Return Data Type** -Note: The function does not perform -[Unicode normalization][unicode-normalization]. +`TIMESTAMP` -Note: The public suffix data at -[publicsuffix.org][net-link-to-public-suffix] also contains -private domains. This function does not treat a private domain as a public -suffix. For example, if `us.com` is a private domain in the public suffix data, -`NET.REG_DOMAIN("foo.us.com")` returns `us.com` (the public suffix `com` plus -the preceding label `us`) rather than `foo.us.com` (the private domain `us.com` -plus the preceding label `foo`). +**Example** -Note: The public suffix data may change over time. -Consequently, input that produces a `NULL` result now may produce a non-`NULL` -value in the future. +```sql +SELECT TIMESTAMP_FROM_UNIX_MICROS(1230219000000000) AS timestamp_value; -**Return Data Type** +-- Display of results may differ, depending upon the environment and time zone where this query was executed. 
+/*------------------------* + | timestamp_value | + +------------------------+ + | 2008-12-25 15:30:00+00 | + *------------------------*/ +``` -`STRING` +### `TIMESTAMP_FROM_UNIX_MILLIS` -**Example** +```sql +TIMESTAMP_FROM_UNIX_MILLIS(int64_expression) +``` ```sql -SELECT - FORMAT("%T", input) AS input, - description, - FORMAT("%T", NET.HOST(input)) AS host, - FORMAT("%T", NET.PUBLIC_SUFFIX(input)) AS suffix, - FORMAT("%T", NET.REG_DOMAIN(input)) AS domain -FROM ( - SELECT "" AS input, "invalid input" AS description - UNION ALL SELECT "http://abc.xyz", "standard URL" - UNION ALL SELECT "//user:password@a.b:80/path?query", - "standard URL with relative scheme, port, path and query, but no public suffix" - UNION ALL SELECT "https://[::1]:80", "standard URL with IPv6 host" - UNION ALL SELECT "http://例å­.å·ç­’纸.中国", "standard URL with internationalized domain name" - UNION ALL SELECT " www.Example.Co.UK ", - "non-standard URL with spaces, upper case letters, and without scheme" - UNION ALL SELECT "mailto:?to=&subject=&body=", "URI rather than URL--unsupported" -); +TIMESTAMP_FROM_UNIX_MILLIS(timestamp_expression) ``` -| input | description | host | suffix | domain | -|--------------------------------------------------------------------|-------------------------------------------------------------------------------|--------------------|---------|----------------| -| "" | invalid input | NULL | NULL | NULL | -| "http://abc.xyz" | standard URL | "abc.xyz" | "xyz" | "abc.xyz" | -| "//user:password@a.b:80/path?query" | standard URL with relative scheme, port, path and query, but no public suffix | "a.b" | NULL | NULL | -| "https://[::1]:80" | standard URL with IPv6 host | "[::1]" | NULL | NULL | -| "http://例å­.å·ç­’纸.中国" | standard URL with internationalized domain name | "例å­.å·ç­’纸.中国" | "中国" | "å·ç­’纸.中国" | -| "    www.Example.Co.UK    "| non-standard URL with spaces, upper case letters, and without scheme | "www.Example.Co.UK"| "Co.UK" | "Example.Co.UK"| -| 
"mailto:?to=&subject=&body=" | URI rather than URL--unsupported | "mailto" | NULL | NULL | +**Description** -[unicode-normalization]: https://en.wikipedia.org/wiki/Unicode_equivalence +Interprets `int64_expression` as the number of milliseconds since +1970-01-01 00:00:00 UTC and returns a timestamp. If a timestamp is passed in, +the same timestamp is returned. -[net-link-to-public-suffix]: https://publicsuffix.org/list/ +**Return Data Type** -[net-link-to-punycode]: https://en.wikipedia.org/wiki/Punycode +`TIMESTAMP` -[net-link-to-rfc-3986-appendix-a]: https://tools.ietf.org/html/rfc3986#appendix-A +**Example** -### `NET.SAFE_IP_FROM_STRING` +```sql +SELECT TIMESTAMP_FROM_UNIX_MILLIS(1230219000000) AS timestamp_value; + +-- Display of results may differ, depending upon the environment and time zone where this query was executed. +/*------------------------* + | timestamp_value | + +------------------------+ + | 2008-12-25 15:30:00+00 | + *------------------------*/ +``` + +### `TIMESTAMP_FROM_UNIX_SECONDS` +```sql +TIMESTAMP_FROM_UNIX_SECONDS(int64_expression) ``` -NET.SAFE_IP_FROM_STRING(addr_str) + +```sql +TIMESTAMP_FROM_UNIX_SECONDS(timestamp_expression) ``` **Description** -Similar to [`NET.IP_FROM_STRING`][net-link-to-ip-from-string], but returns `NULL` -instead of throwing an error if the input is invalid. +Interprets `int64_expression` as the number of seconds since +1970-01-01 00:00:00 UTC and returns a timestamp. If a timestamp is passed in, +the same timestamp is returned. 
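+For example, pairing this function with `UNIX_SECONDS` round-trips a timestamp
+at whole-second precision; any sub-second precision is lost in the
+`UNIX_SECONDS` step (a sketch for illustration):
+
+```sql
+-- UNIX_SECONDS rounds 15:30:00.5 down to 1230219000 seconds, so the
+-- round trip yields 2008-12-25 15:30:00+00.
+SELECT
+  TIMESTAMP_FROM_UNIX_SECONDS(UNIX_SECONDS(TIMESTAMP "2008-12-25 15:30:00.5+00")) AS whole_second;
+```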
**Return Data Type** -BYTES +`TIMESTAMP` **Example** ```sql -SELECT - addr_str, - FORMAT("%T", NET.SAFE_IP_FROM_STRING(addr_str)) AS safe_ip_from_string -FROM UNNEST([ - '48.49.50.51', - '::1', - '3031:3233:3435:3637:3839:4041:4243:4445', - '::ffff:192.0.2.128', - '48.49.50.51/32', - '48.49.50', - '::wxyz' -]) AS addr_str; +SELECT TIMESTAMP_FROM_UNIX_SECONDS(1230219000) AS timestamp_value; -/*---------------------------------------------------------------------------------------------------------------* - | addr_str | safe_ip_from_string | - +---------------------------------------------------------------------------------------------------------------+ - | 48.49.50.51 | b"0123" | - | ::1 | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01" | - | 3031:3233:3435:3637:3839:4041:4243:4445 | b"0123456789@ABCDE" | - | ::ffff:192.0.2.128 | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc0\x00\x02\x80" | - | 48.49.50.51/32 | NULL | - | 48.49.50 | NULL | - | ::wxyz | NULL | - *---------------------------------------------------------------------------------------------------------------*/ +-- Display of results may differ, depending upon the environment and time zone where this query was executed. +/*------------------------* + | timestamp_value | + +------------------------+ + | 2008-12-25 15:30:00+00 | + *------------------------*/ ``` -[net-link-to-ip-from-string]: #netip_from_string - -## Debugging functions - -ZetaSQL supports the following debugging functions. - -### Function list - - - - - - - - - - - - - - +```sql +TIMESTAMP_MICROS(int64_expression) +``` - - - - +Interprets `int64_expression` as the number of microseconds since 1970-01-01 +00:00:00 UTC and returns a timestamp. - - - - +`TIMESTAMP` - - - - +```sql +SELECT TIMESTAMP_MICROS(1230219000000000) AS timestamp_value; - -
NameSummary
ERROR +### `TIMESTAMP_MICROS` - - Produces an error with a custom error message. -
IFERROR +**Description** - - Evaluates a try expression, and if an evaluation error is produced, returns - the result of a catch expression. -
ISERROR +**Return Data Type** - - Evaluates a try expression, and if an evaluation error is produced, returns - TRUE. -
NULLIFERROR +**Example** - - Evaluates a try expression, and if an evaluation error is produced, returns - NULL. -
+-- Display of results may differ, depending upon the environment and time zone where this query was executed. +/*------------------------* + | timestamp_value | + +------------------------+ + | 2008-12-25 15:30:00+00 | + *------------------------*/ +``` -### `ERROR` +### `TIMESTAMP_MILLIS` ```sql -ERROR(error_message) +TIMESTAMP_MILLIS(int64_expression) ``` **Description** -Returns an error. The `error_message` argument is a `STRING`. - -ZetaSQL treats `ERROR` in the same way as any expression that may -result in an error: there is no special guarantee of evaluation order. +Interprets `int64_expression` as the number of milliseconds since 1970-01-01 +00:00:00 UTC and returns a timestamp. **Return Data Type** -ZetaSQL infers the return type in context. - -**Examples** +`TIMESTAMP` -In the following example, the query returns an error message if the value of the -row does not match one of two defined values. +**Example** ```sql -SELECT - CASE - WHEN value = 'foo' THEN 'Value is foo.' - WHEN value = 'bar' THEN 'Value is bar.' - ELSE ERROR(CONCAT('Found unexpected value: ', value)) - END AS new_value -FROM ( - SELECT 'foo' AS value UNION ALL - SELECT 'bar' AS value UNION ALL - SELECT 'baz' AS value); +SELECT TIMESTAMP_MILLIS(1230219000000) AS timestamp_value; --- Found unexpected value: baz +-- Display of results may differ, depending upon the environment and time zone where this query was executed. +/*------------------------* + | timestamp_value | + +------------------------+ + | 2008-12-25 15:30:00+00 | + *------------------------*/ ``` -In the following example, ZetaSQL may evaluate the `ERROR` function -before or after the `x > 0` condition, because ZetaSQL -generally provides no ordering guarantees between `WHERE` clause conditions and -there are no special guarantees for the `ERROR` function. 
+### `TIMESTAMP_SECONDS` ```sql -SELECT * -FROM (SELECT -1 AS x) -WHERE x > 0 AND ERROR('Example error'); +TIMESTAMP_SECONDS(int64_expression) ``` -In the next example, the `WHERE` clause evaluates an `IF` condition, which -ensures that ZetaSQL only evaluates the `ERROR` function if the -condition fails. +**Description** + +Interprets `int64_expression` as the number of seconds since 1970-01-01 00:00:00 +UTC and returns a timestamp. + +**Return Data Type** + +`TIMESTAMP` + +**Example** ```sql -SELECT * -FROM (SELECT -1 AS x) -WHERE IF(x > 0, true, ERROR(FORMAT('Error: x must be positive but is %t', x))); +SELECT TIMESTAMP_SECONDS(1230219000) AS timestamp_value; --- Error: x must be positive but is -1 +-- Display of results may differ, depending upon the environment and time zone where this query was executed. +/*------------------------* + | timestamp_value | + +------------------------+ + | 2008-12-25 15:30:00+00 | + *------------------------*/ ``` -### `IFERROR` +### `TIMESTAMP_SUB` ```sql -IFERROR(try_expression, catch_expression) +TIMESTAMP_SUB(timestamp_expression, INTERVAL int64_expression date_part) ``` **Description** -Evaluates `try_expression`. - -When `try_expression` is evaluated: - -+ If the evaluation of `try_expression` does not produce an error, then - `IFERROR` returns the result of `try_expression` without evaluating - `catch_expression`. -+ If the evaluation of `try_expression` produces a system error, then `IFERROR` - produces that system error. -+ If the evaluation of `try_expression` produces an evaluation error, then - `IFERROR` suppresses that evaluation error and evaluates `catch_expression`. - -If `catch_expression` is evaluated: - -+ If the evaluation of `catch_expression` does not produce an error, then - `IFERROR` returns the result of `catch_expression`. -+ If the evaluation of `catch_expression` produces any error, then `IFERROR` - produces that error. 
- -**Arguments** +Subtracts `int64_expression` units of `date_part` from the timestamp, +independent of any time zone. -+ `try_expression`: An expression that returns a scalar value. -+ `catch_expression`: An expression that returns a scalar value. +`TIMESTAMP_SUB` supports the following values for `date_part`: -The results of `try_expression` and `catch_expression` must share a -[supertype][supertype]. ++ `NANOSECOND` + (if the SQL engine supports it) ++ `MICROSECOND` ++ `MILLISECOND` ++ `SECOND` ++ `MINUTE` ++ `HOUR`. Equivalent to 60 `MINUTE` parts. ++ `DAY`. Equivalent to 24 `HOUR` parts. **Return Data Type** -The [supertype][supertype] for `try_expression` and -`catch_expression`. +`TIMESTAMP` **Example** -In the following examples, the query successfully evaluates `try_expression`. - ```sql -SELECT IFERROR('a', 'b') AS result +SELECT + TIMESTAMP("2008-12-25 15:30:00+00") AS original, + TIMESTAMP_SUB(TIMESTAMP "2008-12-25 15:30:00+00", INTERVAL 10 MINUTE) AS earlier; -/*--------* - | result | - +--------+ - | a | - *--------*/ +-- Display of results may differ, depending upon the environment and time zone where this query was executed. +/*---------------------------------------------+---------------------------------------------* + | original | earlier | + +---------------------------------------------+---------------------------------------------+ + | 2008-12-25 07:30:00.000 America/Los_Angeles | 2008-12-25 07:20:00.000 America/Los_Angeles | + *---------------------------------------------+---------------------------------------------*/ ``` -```sql -SELECT IFERROR((SELECT [1,2,3][OFFSET(0)]), -1) AS result +### `TIMESTAMP_TRUNC` -/*--------* - | result | - +--------+ - | 1 | - *--------*/ +```sql +TIMESTAMP_TRUNC(timestamp_expression, date_time_part[, time_zone]) ``` -In the following examples, `IFERROR` catches an evaluation error in the -`try_expression` and successfully evaluates `catch_expression`. 
+**Description** -```sql -SELECT IFERROR(ERROR('a'), 'b') AS result +Truncates a timestamp to the granularity of `date_time_part`. +The timestamp is always rounded to the beginning of `date_time_part`, +which can be one of the following: -/*--------* - | result | - +--------+ - | b | - *--------*/ -``` ++ `NANOSECOND`: If used, nothing is truncated from the value. ++ `MICROSECOND`: The nearest lessor or equal microsecond. ++ `MILLISECOND`: The nearest lessor or equal millisecond. ++ `SECOND`: The nearest lessor or equal second. ++ `MINUTE`: The nearest lessor or equal minute. ++ `HOUR`: The nearest lessor or equal hour. ++ `DAY`: The day in the Gregorian calendar year that contains the + `TIMESTAMP` value. ++ `WEEK`: The first day of the week in the week that contains the + `TIMESTAMP` value. Weeks begin on Sundays. `WEEK` is equivalent to + `WEEK(SUNDAY)`. ++ `WEEK(WEEKDAY)`: The first day of the week in the week that contains the + `TIMESTAMP` value. Weeks begin on `WEEKDAY`. `WEEKDAY` must be one of the + following: `SUNDAY`, `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, `FRIDAY`, + or `SATURDAY`. ++ `ISOWEEK`: The first day of the [ISO 8601 week][ISO-8601-week] in the + ISO week that contains the `TIMESTAMP` value. The ISO week begins on + Monday. The first ISO week of each ISO year contains the first Thursday of the + corresponding Gregorian calendar year. ++ `MONTH`: The first day of the month in the month that contains the + `TIMESTAMP` value. ++ `QUARTER`: The first day of the quarter in the quarter that contains the + `TIMESTAMP` value. ++ `YEAR`: The first day of the year in the year that contains the + `TIMESTAMP` value. ++ `ISOYEAR`: The first day of the [ISO 8601][ISO-8601] week-numbering year + in the ISO year that contains the `TIMESTAMP` value. The ISO year is the + Monday of the first week whose Thursday belongs to the corresponding + Gregorian calendar year. 
-```sql -SELECT IFERROR((SELECT [1,2,3][OFFSET(9)]), -1) AS result + -/*--------* - | result | - +--------+ - | -1 | - *--------*/ -``` +[ISO-8601]: https://en.wikipedia.org/wiki/ISO_8601 -In the following query, the error is handled by the innermost `IFERROR` -operation, `IFERROR(ERROR('a'), 'b')`. +[ISO-8601-week]: https://en.wikipedia.org/wiki/ISO_week_date -```sql -SELECT IFERROR(IFERROR(ERROR('a'), 'b'), 'c') AS result + -/*--------* - | result | - +--------+ - | b | - *--------*/ -``` +`TIMESTAMP_TRUNC` function supports an optional `time_zone` parameter. This +parameter applies to the following `date_time_part`: -In the following query, the error is handled by the outermost `IFERROR` -operation, `IFERROR(..., 'c')`. ++ `MINUTE` ++ `HOUR` ++ `DAY` ++ `WEEK` ++ `WEEK()` ++ `ISOWEEK` ++ `MONTH` ++ `QUARTER` ++ `YEAR` ++ `ISOYEAR` + +Use this parameter if you want to use a time zone other than the +default time zone, which is implementation defined, as part of the +truncate operation. + +When truncating a timestamp to `MINUTE` +or`HOUR` parts, `TIMESTAMP_TRUNC` determines the civil time of the +timestamp in the specified (or default) time zone +and subtracts the minutes and seconds (when truncating to `HOUR`) or the seconds +(when truncating to `MINUTE`) from that timestamp. +While this provides intuitive results in most cases, the result is +non-intuitive near daylight savings transitions that are not hour-aligned. + +**Return Data Type** + +`TIMESTAMP` + +**Examples** ```sql -SELECT IFERROR(IFERROR(ERROR('a'), ERROR('b')), 'c') AS result +SELECT + TIMESTAMP_TRUNC(TIMESTAMP "2008-12-25 15:30:00+00", DAY, "UTC") AS utc, + TIMESTAMP_TRUNC(TIMESTAMP "2008-12-25 15:30:00+00", DAY, "America/Los_Angeles") AS la; -/*--------* - | result | - +--------+ - | c | - *--------*/ +-- Display of results may differ, depending upon the environment and time zone where this query was executed. 
+/*---------------------------------------------+---------------------------------------------* + | utc | la | + +---------------------------------------------+---------------------------------------------+ + | 2008-12-24 16:00:00.000 America/Los_Angeles | 2008-12-25 00:00:00.000 America/Los_Angeles | + *---------------------------------------------+---------------------------------------------*/ ``` -In the following example, an evaluation error is produced because the subquery -passed in as the `try_expression` evaluates to a table, not a scalar value. +In the following example, `timestamp_expression` has a time zone offset of +12. +The first column shows the `timestamp_expression` in UTC time. The second +column shows the output of `TIMESTAMP_TRUNC` using weeks that start on Monday. +Because the `timestamp_expression` falls on a Sunday in UTC, `TIMESTAMP_TRUNC` +truncates it to the preceding Monday. The third column shows the same function +with the optional [Time zone definition][timestamp-link-to-timezone-definitions] +argument 'Pacific/Auckland'. Here, the function truncates the +`timestamp_expression` using New Zealand Daylight Time, where it falls on a +Monday. ```sql -SELECT IFERROR((SELECT e FROM UNNEST([1, 2]) AS e), 3) AS result +SELECT + timestamp_value AS timestamp_value, + TIMESTAMP_TRUNC(timestamp_value, WEEK(MONDAY), "UTC") AS utc_truncated, + TIMESTAMP_TRUNC(timestamp_value, WEEK(MONDAY), "Pacific/Auckland") AS nzdt_truncated +FROM (SELECT TIMESTAMP("2017-11-06 00:00:00+12") AS timestamp_value); -/*--------* - | result | - +--------+ - | 3 | - *--------*/ +-- Display of results may differ, depending upon the environment and time zone where this query was executed. 
+/*---------------------------------------------+---------------------------------------------+---------------------------------------------* + | timestamp_value | utc_truncated | nzdt_truncated | + +---------------------------------------------+---------------------------------------------+---------------------------------------------+ + | 2017-11-05 04:00:00.000 America/Los_Angeles | 2017-10-29 17:00:00.000 America/Los_Angeles | 2017-11-05 03:00:00.000 America/Los_Angeles | + *---------------------------------------------+---------------------------------------------+---------------------------------------------*/ ``` -In the following example, `IFERROR` catches an evaluation error in `ERROR('a')` -and then evaluates `ERROR('b')`. Because there is also an evaluation error in -`ERROR('b')`, `IFERROR` produces an evaluation error for `ERROR('b')`. +In the following example, the original `timestamp_expression` is in the +Gregorian calendar year 2015. However, `TIMESTAMP_TRUNC` with the `ISOYEAR` date +part truncates the `timestamp_expression` to the beginning of the ISO year, not +the Gregorian calendar year. The first Thursday of the 2015 calendar year was +2015-01-01, so the ISO year 2015 begins on the preceding Monday, 2014-12-29. +Therefore the ISO year boundary preceding the `timestamp_expression` +2015-06-15 00:00:00+00 is 2014-12-29. ```sql -SELECT IFERROR(ERROR('a'), ERROR('b')) AS result +SELECT + TIMESTAMP_TRUNC("2015-06-15 00:00:00+00", ISOYEAR) AS isoyear_boundary, + EXTRACT(ISOYEAR FROM TIMESTAMP "2015-06-15 00:00:00+00") AS isoyear_number; ---ERROR: OUT_OF_RANGE 'b' +-- Display of results may differ, depending upon the environment and time zone where this query was executed. 
+/*---------------------------------------------+----------------* + | isoyear_boundary | isoyear_number | + +---------------------------------------------+----------------+ + | 2014-12-29 00:00:00.000 America/Los_Angeles | 2015 | + *---------------------------------------------+----------------*/ ``` -[supertype]: https://github.com/google/zetasql/blob/master/docs/conversion_rules.md#supertypes +[timestamp-link-to-timezone-definitions]: #timezone_definitions -### `ISERROR` +### `UNIX_MICROS` ```sql -ISERROR(try_expression) +UNIX_MICROS(timestamp_expression) ``` **Description** -Evaluates `try_expression`. - -+ If the evaluation of `try_expression` does not produce an error, then - `ISERROR` returns `FALSE`. -+ If the evaluation of `try_expression` produces a system error, then `ISERROR` - produces that system error. -+ If the evaluation of `try_expression` produces an evaluation error, then - `ISERROR` returns `TRUE`. - -**Arguments** - -+ `try_expression`: An expression that returns a scalar value. +Returns the number of microseconds since `1970-01-01 00:00:00 UTC`. +Truncates higher levels of precision by +rounding down to the beginning of the microsecond. **Return Data Type** -`BOOL` - -**Example** +`INT64` -In the following examples, `ISERROR` successfully evaluates `try_expression`. 
+**Examples** ```sql -SELECT ISERROR('a') AS is_error +SELECT UNIX_MICROS(TIMESTAMP "2008-12-25 15:30:00+00") AS micros; -/*----------* - | is_error | - +----------+ - | false | - *----------*/ +/*------------------* + | micros | + +------------------+ + | 1230219000000000 | + *------------------*/ ``` ```sql -SELECT ISERROR(2/1) AS is_error +SELECT UNIX_MICROS(TIMESTAMP "1970-01-01 00:00:00.0000018+00") AS micros; -/*----------* - | is_error | - +----------+ - | false | - *----------*/ +/*------------------* + | micros | + +------------------+ + | 1 | + *------------------*/ ``` -```sql -SELECT ISERROR((SELECT [1,2,3][OFFSET(0)])) AS is_error +### `UNIX_MILLIS` -/*----------* - | is_error | - +----------+ - | false | - *----------*/ +```sql +UNIX_MILLIS(timestamp_expression) ``` -In the following examples, `ISERROR` catches an evaluation error in -`try_expression`. +**Description** -```sql -SELECT ISERROR(ERROR('a')) AS is_error +Returns the number of milliseconds since `1970-01-01 00:00:00 UTC`. Truncates +higher levels of precision by rounding down to the beginning of the millisecond. -/*----------* - | is_error | - +----------+ - | true | - *----------*/ -``` +**Return Data Type** -```sql -SELECT ISERROR(2/0) AS is_error +`INT64` -/*----------* - | is_error | - +----------+ - | true | - *----------*/ -``` +**Examples** ```sql -SELECT ISERROR((SELECT [1,2,3][OFFSET(9)])) AS is_error +SELECT UNIX_MILLIS(TIMESTAMP "2008-12-25 15:30:00+00") AS millis; -/*----------* - | is_error | - +----------+ - | true | - *----------*/ +/*---------------* + | millis | + +---------------+ + | 1230219000000 | + *---------------*/ ``` -In the following example, an evaluation error is produced because the subquery -passed in as `try_expression` evaluates to a table, not a scalar value. 
- ```sql -SELECT ISERROR((SELECT e FROM UNNEST([1, 2]) AS e)) AS is_error +SELECT UNIX_MILLIS(TIMESTAMP "1970-01-01 00:00:00.0018+00") AS millis; -/*----------* - | is_error | - +----------+ - | true | - *----------*/ +/*---------------* + | millis | + +---------------+ + | 1 | + *---------------*/ ``` -### `NULLIFERROR` +### `UNIX_SECONDS` ```sql -NULLIFERROR(try_expression) +UNIX_SECONDS(timestamp_expression) ``` -**Description** - -Evaluates `try_expression`. - -+ If the evaluation of `try_expression` does not produce an error, then - `NULLIFERROR` returns the result of `try_expression`. -+ If the evaluation of `try_expression` produces a system error, then - `NULLIFERROR` produces that system error. - -+ If the evaluation of `try_expression` produces an evaluation error, then - `NULLIFERROR` returns `NULL`. -**Arguments** +**Description** -+ `try_expression`: An expression that returns a scalar value. +Returns the number of seconds since `1970-01-01 00:00:00 UTC`. Truncates higher +levels of precision by rounding down to the beginning of the second. **Return Data Type** -The data type for `try_expression` or `NULL` - -**Example** +`INT64` -In the following examples, `NULLIFERROR` successfully evaluates -`try_expression`. +**Examples** ```sql -SELECT NULLIFERROR('a') AS result +SELECT UNIX_SECONDS(TIMESTAMP "2008-12-25 15:30:00+00") AS seconds; -/*--------* - | result | - +--------+ - | a | - *--------*/ +/*------------* + | seconds | + +------------+ + | 1230219000 | + *------------*/ ``` ```sql -SELECT NULLIFERROR((SELECT [1,2,3][OFFSET(0)])) AS result +SELECT UNIX_SECONDS(TIMESTAMP "1970-01-01 00:00:01.8+00") AS seconds; -/*--------* - | result | - +--------+ - | 1 | - *--------*/ +/*------------* + | seconds | + +------------+ + | 1 | + *------------*/ ``` -In the following examples, `NULLIFERROR` catches an evaluation error in -`try_expression`. 
+### How time zones work with timestamp functions + -```sql -SELECT NULLIFERROR(ERROR('a')) AS result +A timestamp represents an absolute point in time, independent of any time +zone. However, when a timestamp value is displayed, it is usually converted to +a human-readable format consisting of a civil date and time +(YYYY-MM-DD HH:MM:SS) +and a time zone. This is not the internal representation of the +`TIMESTAMP`; it is only a human-understandable way to describe the point in time +that the timestamp represents. -/*--------* - | result | - +--------+ - | NULL | - *--------*/ -``` +Some timestamp functions have a time zone argument. A time zone is needed to +convert between civil time (YYYY-MM-DD HH:MM:SS) and the absolute time +represented by a timestamp. +A function like `PARSE_TIMESTAMP` takes an input string that represents a +civil time and returns a timestamp that represents an absolute time. A +time zone is needed for this conversion. A function like `EXTRACT` takes an +input timestamp (absolute time) and converts it to civil time in order to +extract a part of that civil time. This conversion requires a time zone. +If no time zone is specified, the default time zone, which is implementation defined, +is used. -```sql -SELECT NULLIFERROR((SELECT [1,2,3][OFFSET(9)])) AS result +Certain date and timestamp functions allow you to override the default time zone +and specify a different one. You can specify a time zone by either supplying +the time zone name (for example, `America/Los_Angeles`) +or time zone offset from UTC (for example, -08). -/*--------* - | result | - +--------+ - | NULL | - *--------*/ -``` +To learn more about how time zones work with the `TIMESTAMP` type, see +[Time zones][data-types-timezones]. -In the following example, an evaluation error is produced because the subquery -passed in as `try_expression` evaluates to a table, not a scalar value. 
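As a brief illustration, extracting a civil-time part from the same absolute
point in time with two different time zones produces different results. In
December, `America/Los_Angeles` is 8 hours behind UTC:

```sql
SELECT
  EXTRACT(HOUR FROM TIMESTAMP "2008-12-25 15:30:00+00" AT TIME ZONE "UTC")
    AS hour_utc,
  EXTRACT(HOUR FROM TIMESTAMP "2008-12-25 15:30:00+00" AT TIME ZONE "America/Los_Angeles")
    AS hour_la;

/*----------+---------*
 | hour_utc | hour_la |
 +----------+---------+
 | 15       | 7       |
 *----------+---------*/
```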
+[timezone-by-name]: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones -```sql -SELECT NULLIFERROR((SELECT e FROM UNNEST([1, 2]) AS e)) AS result +[data-types-timezones]: https://github.com/google/zetasql/blob/master/docs/data-types.md#time_zones -/*--------* - | result | - +--------+ - | NULL | - *--------*/ -``` +[timestamp-link-to-timezone-definitions]: #timezone_definitions + +[data-types-link-to-date_type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#date_type + +[data-types-link-to-timestamp_type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#timestamp_type diff --git a/docs/geography_functions.md b/docs/geography_functions.md index 123f178eb..34f1ab625 100644 --- a/docs/geography_functions.md +++ b/docs/geography_functions.md @@ -89,6 +89,7 @@ behavior: ST_EXTERIORRING
ST_INTERIORRINGS
ST_INTERSECTION
+ ST_LINEINTERPOLATEPOINT
ST_LINESUBSTRING
ST_SIMPLIFY
ST_SNAPTOGRID
@@ -668,6 +669,16 @@ behavior: + + ST_LINEINTERPOLATEPOINT + + + + Gets a point at a specific fraction in a linestring GEOGRAPHY + value. + + + ST_LINELOCATEPOINT @@ -1225,13 +1236,13 @@ determine how much the resulting geography can deviate from the ideal buffer radius. + `geography`: The input `GEOGRAPHY` to encircle with the buffer radius. -+ `buffer_radius`: `DOUBLE` that represents the radius of the buffer - around the input geography. The radius is in meters. Note that polygons - contract when buffered with a negative `buffer_radius`. Polygon shells and - holes that are contracted to a point are discarded. -+ `num_seg_quarter_circle`: (Optional) `DOUBLE` specifies the number of - segments that are used to approximate a quarter circle. The default value is - `8.0`. Naming this argument is optional. ++ `buffer_radius`: `DOUBLE` that represents the radius of the + buffer around the input geography. The radius is in meters. Note that + polygons contract when buffered with a negative `buffer_radius`. Polygon + shells and holes that are contracted to a point are discarded. ++ `num_seg_quarter_circle`: (Optional) `DOUBLE` specifies the + number of segments that are used to approximate a quarter circle. The + default value is `8.0`. Naming this argument is optional. + `endcap`: (Optional) `STRING` allows you to specify one of two endcap styles: `ROUND` and `FLAT`. The default value is `ROUND`. This option only affects the endcaps of buffered linestrings. @@ -1296,13 +1307,13 @@ but you provide tolerance instead of segments to determine how much the resulting geography can deviate from the ideal buffer radius. + `geography`: The input `GEOGRAPHY` to encircle with the buffer radius. -+ `buffer_radius`: `DOUBLE` that represents the radius of the buffer - around the input geography. The radius is in meters. Note that polygons - contract when buffered with a negative `buffer_radius`. Polygon shells - and holes that are contracted to a point are discarded. 
-+ `tolerance_meters`: `DOUBLE` specifies a tolerance in meters with - which the shape is approximated. Tolerance determines how much a polygon can - deviate from the ideal radius. Naming this argument is optional. ++ `buffer_radius`: `DOUBLE` that represents the radius of the + buffer around the input geography. The radius is in meters. Note that + polygons contract when buffered with a negative `buffer_radius`. Polygon + shells and holes that are contracted to a point are discarded. ++ `tolerance_meters`: `DOUBLE` specifies a tolerance in + meters with which the shape is approximated. Tolerance determines how much a + polygon can deviate from the ideal radius. Naming this argument is optional. + `endcap`: (Optional) `STRING` allows you to specify one of two endcap styles: `ROUND` and `FLAT`. The default value is `ROUND`. This option only affects the endcaps of buffered linestrings. @@ -1834,7 +1845,7 @@ FROM example | GEOMETRYCOLLECTION(POINT(0 0), | [POINT(0 0), LINESTRING(1 2, 2 1)] | | LINESTRING(1 2, 2 1)) | | *-------------------------------------+------------------------------------*/ - ``` +``` The following example shows how `ST_DUMP` with the dimension argument only returns simple geographies of the given dimension. @@ -2991,6 +3002,66 @@ the value `FALSE`. The default value of `use_spheroid` is `FALSE`. [wgs84-link]: https://en.wikipedia.org/wiki/World_Geodetic_System +### `ST_LINEINTERPOLATEPOINT` + +```sql +ST_LINEINTERPOLATEPOINT(linestring_geography, fraction) +``` + +**Description** + +Gets a point at a specific fraction in a linestring GEOGRAPHY +value. + +**Definitions** + ++ `linestring_geography`: A linestring `GEOGRAPHY` on which the target point + is located. ++ `fraction`: A `DOUBLE` value that represents a fraction + along the linestring `GEOGRAPHY` where the target point is located. + This should be an inclusive value between `0` (start of the + linestring) and `1` (end of the linestring). 
+ +**Details** + ++ Returns `NULL` if any input argument is `NULL`. ++ Returns an empty geography if `linestring_geography` is an empty geography. ++ Returns an error if `linestring_geography` is not a linestring or an empty + geography, or if `fraction` is outside the `[0, 1]` range. + +**Return Type** + +`GEOGRAPHY` + +**Example** + +The following query returns a few points on a linestring. Notice that the + midpoint of the linestring `LINESTRING(1 1, 5 5)` is slightly different from + `POINT(3 3)` because the `GEOGRAPHY` type uses geodesic line segments. + +```sql +WITH fractions AS ( + SELECT 0 AS fraction UNION ALL + SELECT 0.5 UNION ALL + SELECT 1 UNION ALL + SELECT NULL + ) +SELECT + fraction, + ST_LINEINTERPOLATEPOINT(ST_GEOGFROMTEXT('LINESTRING(1 1, 5 5)'), fraction) + AS point +FROM fractions + +/*-------------+-------------------------------------------* + | fraction | point | + +-------------+-------------------------------------------+ + | 0 | POINT(1 1) | + | 0.5 | POINT(2.99633827268976 3.00182528336078) | + | 1 | POINT(5 5) | + | NULL | NULL | + *-------------+-------------------------------------------*/ +``` + ### `ST_LINELOCATEPOINT` ```sql diff --git a/docs/json_functions.md b/docs/json_functions.md index c98f2168f..7fd5a6f3a 100644 --- a/docs/json_functions.md +++ b/docs/json_functions.md @@ -2189,6 +2189,7 @@ SELECT JSON_REMOVE(JSON 'null', '$.a.b') AS json_data JSON_SET( json_expr, json_path_value_pair[, ...] + [, create_if_missing=> { TRUE | FALSE }] ) json_path_value_pair: @@ -2213,35 +2214,42 @@ Arguments: + `value`: A [JSON encoding-supported][json-encodings] value to insert. ++ `create_if_missing`: An optional, mandatory named argument. + + + If TRUE (default), replaces or inserts data if the path does not exist. + + + If FALSE, only _existing_ JSONPath values are replaced. If the path + doesn't exist, the set operation is ignored. Details: + Path value pairs are evaluated left to right. 
The JSON produced by evaluating one pair becomes the JSON against which the next pair is evaluated. -+ If a path doesn't exist, the remainder of the path is recursively created. + If a matched path has an existing value, it overwrites the existing data with `value`. ++ If `create_if_missing` is `TRUE`: + + + If a path doesn't exist, the remainder of the path is recursively + created. + + If the matched path prefix points to a JSON null, the remainder of the + path is recursively created, and `value` is inserted. + + If a path token points to a JSON array and the specified index is + _larger_ than the size of the array, pads the JSON array with JSON + nulls, recursively creates the remainder of the path at the specified + index, and inserts the path value pair. + This function applies all path value pair set operations even if an individual path value pair operation is invalid. For invalid operations, the operation is ignored and the function continues to process the rest of the path value pairs. + If the path exists but has an incompatible type at any given path token, no update happens for that specific path value pair. -+ If the matched path prefix points to a JSON null, the remainder of the - path is recursively created, and `value` is inserted. -+ If a path token points to a JSON array and the specified - index is _larger_ than the size of the array, pads the JSON array with - JSON nulls, recursively creates the remainder of the path at the specified - index, and inserts the path value pair. -+ If a matched path points to a JSON array and the specified index is - _less than_ the length of the array, replaces the existing JSON array value - at index with `value`. + If any `json_path` is an invalid [JSONPath][JSONPath-format], an error is produced. + If `json_expr` is SQL `NULL`, the function returns SQL `NULL`. + If `json_path` is SQL `NULL`, the `json_path_value_pair` operation is ignored. ++ If `create_if_missing` is SQL `NULL`, the set operation is ignored. 
**Return type**

@@ -2262,6 +2270,38 @@ SELECT JSON_SET(JSON '{"a": 1}', '$', JSON '{"b": 2, "c": 3}') AS json_data
 *---------------*/
 ```

+In the following example, `create_if_missing` is `FALSE` and the path `$.b`
+doesn't exist, so the set operation is ignored.
+
+```sql
+SELECT JSON_SET(
+  JSON '{"a": 1}',
+  "$.b", 999,
+  create_if_missing => false) AS json_data
+
+/*------------*
+ | json_data  |
+ +------------+
+ | '{"a": 1}' |
+ *------------*/
+```
+
+In the following example, `create_if_missing` is `FALSE`, but the path `$.a`
+already exists, so the value is replaced.
+
+```sql
+SELECT JSON_SET(
+  JSON '{"a": 1}',
+  "$.a", 999,
+  create_if_missing => false) AS json_data
+
+/*--------------*
+ | json_data    |
+ +--------------+
+ | '{"a": 999}' |
+ *--------------*/
+```
+
 In the following example, the path `$.a` is matched, but `$.a.b` does not
 exist, so the new path and the value are inserted.

@@ -3202,8 +3242,8 @@ Details:
    string
-    If the JSON string represents a JSON number, parses it as a
-    BIGNUMERIC value, and then safe casts the result as a
+    If the JSON string represents a JSON number, parses it as
+    a BIGNUMERIC value, and then safe casts the result as a
    DOUBLE value. If the JSON string can't be converted,
    returns NULL.
@@ -3425,8 +3465,8 @@ Details:
    string
-    If the JSON string represents a JSON number, parses it as a
-    BIGNUMERIC value, and then safe casts the results as an
+    If the JSON string represents a JSON number, parses it as
+    a BIGNUMERIC value, and then safe casts the results as an
    INT64 value. If the JSON string can't be converted,
    returns NULL.
@@ -3637,13 +3677,13 @@ Details:
    string
-    Returns the JSON string as a STRING value.
+    Returns the JSON string as a STRING value.
    number
-    Returns the JSON number as a STRING value.
+    Returns the JSON number as a STRING value.
diff --git a/docs/lexical.md b/docs/lexical.md index 1b9904dae..592a72b08 100644 --- a/docs/lexical.md +++ b/docs/lexical.md @@ -204,6 +204,170 @@ protocol buffer message, or JSON object. A literal represents a constant value of a built-in data type. Some, but not all, data types can be expressed as literals. +### Tokens in literals + +A literal can contain one or more tokens. For example: + + ```sql + -- This date literal has one token: '2014-01-31' + SELECT DATE '2014-01-31' + ``` + + ```sql + -- This date literal has three tokens: '2014', '-01', and '-31' + SELECT DATE '2014' '-01' '-31' + ``` + +When a literal contains multiple tokens, the tokens must be separated by +whitespace, comments, or both. For example, the following date literals +produce the same results: + + ```sql + SELECT DATE '2014-01-31' + ``` + + ```sql + SELECT DATE '2014' '-01' '-31' + ``` + + ```sql + SELECT DATE /* year */ '2014' /* month */ '-01' /* day */ '-31' + ``` + + ```sql + SELECT DATE /* year and month */ '2014' '-01' /* day */ '-31' + ``` + +A token can be a `STRING` type or a `BYTES` type. String tokens can only be +used with string tokens and bytes tokens can only be used with +bytes tokens. If you try to use them together in a literal, an error is +produced. For example: + + ```sql + -- The following string literal contains string tokens. + SELECT 'x' 'y' 'z' + ``` + + ```sql + -- The following bytes literal contains bytes tokens. + SELECT b'x' b'y' b'z' + ``` + + ```sql + -- Error: string and bytes tokens can't be used together in the same literal. + SELECT 'x' b'y' + ``` + +String tokens can be one of the following +[format types][quoted-literals] and used together: + + + Quoted string + + Triple-quoted string + + Raw string + + If a raw string is used, it's applied to the immediate token, but not + to the results. + + Examples: + + ```sql + -- Compatible format types can be used together in a string literal. 
+ SELECT 'abc' "d" '''ef'''
+
+ /*--------+
+ | abcdef |
+ +--------*/
+ ```
+
+ ```sql
+ -- \n is kept as a literal escape sequence in the raw string token, but is
+ -- interpreted as a newline in the quoted string token.
+ SELECT '\na' r"\n"
+
+ /*-----+
+ |     |
+ | a\n |
+ +-----*/
+ ```
+
+Bytes tokens can be one of the following
+[format types][quoted-literals] and used together:
+
+ + Bytes
+ + Raw bytes
+
+ If raw bytes are used, they're applied to the immediate token, but not to
+ the results.
+
+ Examples:
+
+ ```sql
+ -- Compatible format types can be used together in a bytes literal.
+ SELECT b'\x41' b'''\x42''' b"""\x41"""
+
+ /*-----+
+ | ABA |
+ +-----*/
+ ```
+
+ ```sql
+ -- Escape sequences are kept literal in the raw bytes tokens, but are
+ -- interpreted in the bytes token.
+ SELECT b'\x41' RB'\x42' br'\x41'
+
+ /*-------------+
+ | A\\x42\\x41 |
+ +-------------*/
+ ```
+
+Additional examples:
+
+```sql
+-- The following JSON literal is equivalent to: JSON '{"name":"my_file.md","regex":"\\d+"}'
+SELECT JSON '{"name": "my_file.md", "regex": ' /*start*/ r' "\\d+"' /*end*/ '}'
+
+/*--------------------------------------+
+ | {"name":"my_file.md","regex":"\\d+"} |
+ +--------------------------------------*/
+```
+
+```sql
+-- The following NUMERIC literal is equivalent to: NUMERIC '-1.2'
+SELECT NUMERIC '-' "1" '''.''' r'2'
+
+/*------+
+ | -1.2 |
+ +------*/
+```
+
+```sql
+-- The following NUMERIC literal is equivalent to: NUMERIC '1.23e-6'
+SELECT NUMERIC "1" '''.''' r'23' 'e-6'
+
+/*------------+
+ | 0.00000123 |
+ +------------*/
+```
+
+```sql
+-- The following DATE literal is equivalent to: DATE '2014-01-31'
+SELECT DATE /* year */ '2014' /* month and day */ "-01-31"
+
+/*------------+
+ | 2014-01-31 |
+ +------------*/
+```
+
+```sql
+-- Error: Illegal escape sequence found in '\def'.
+SELECT r'abc' '\def'
+```
+
+```sql
+-- Error: backticks are reserved for quoted identifiers and not a valid
+-- format type.
+SELECT `abc` `def` AS results; +``` + ### String and bytes literals @@ -720,6 +884,81 @@ TIMESTAMP '2014-09-27 12:30:00 America/Los_Angeles' TIMESTAMP '2014-09-27 12:30:00 America/Argentina/Buenos_Aires' ``` +### Range literals + +Syntax: + +```sql +RANGE '[lower_bound, upper_bound)' +``` + +A range literal contains a contiguous range between two +[dates][date-data-type], [datetimes][datetime-data-type], or +[timestamps][timestamp-data-type]. The lower or upper bound can be unbounded, +if desired. + +Example of a date range literal with a lower and upper bound: + +```sql +RANGE '[2020-01-01, 2020-12-31)' +``` + +Example of a datetime range literal with a lower and upper bound: + +```sql +RANGE '[2020-01-01 12:00:00, 2020-12-31 12:00:00)' +``` + +Example of a timestamp range literal with a lower and upper bound: + +```sql +RANGE '[2020-10-01 12:00:00+08, 2020-12-31 12:00:00+08)' +``` + +Examples of a range literal without a lower bound: + +```sql +RANGE '[UNBOUNDED, 2020-12-31)' +``` +```sql +RANGE '[NULL, 2020-12-31)' +``` + +Examples of a range literal without an upper bound: + +```sql +RANGE '[2020-01-01, UNBOUNDED)' +``` +```sql +RANGE '[2020-01-01, NULL)' +``` + +Examples of a range literal that includes all possible values: + +```sql +RANGE '[UNBOUNDED, UNBOUNDED)' +``` + +```sql +RANGE '[NULL, NULL)' +``` + +There must be a single whitespace after the comma in a range literal, otherwise +an error is produced. For example: + +```sql +-- This range literal is valid: +RANGE '[2020-01-01, 2020-12-31)' +``` + +```sql +-- This range literal produces an error: +RANGE '[2020-01-01,2020-12-31)' +``` + +A range literal represents a constant value of the +[range data type][range-data-type]. 
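To check how a range literal is parsed, you can select it directly. This is a
minimal sketch; the display format of `RANGE` values can vary by environment:

```sql
SELECT RANGE '[2020-01-01, 2020-12-31)' AS date_range;

/*--------------------------*
 | date_range               |
 +--------------------------+
 | [2020-01-01, 2020-12-31) |
 *--------------------------*/
```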
+ ### Interval literals An interval literal represents a constant value of the @@ -1346,6 +1585,8 @@ WHERE book = "Ulysses"; [floating-point-data-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#floating_point_types +[quoted-literals]: #quoted_literals + [decimal-data-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#decimal_types [date-data-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#date_type @@ -1380,6 +1621,8 @@ WHERE book = "Ulysses"; [construct-range-interval]: https://github.com/google/zetasql/blob/master/docs/data-types.md#range_datetime_part_interval +[range-data-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#range_type + [enum-data-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#enum_type [json-data-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#json_type diff --git a/docs/mathematical_functions.md b/docs/mathematical_functions.md index ee9ae9e84..2a6f336a5 100644 --- a/docs/mathematical_functions.md +++ b/docs/mathematical_functions.md @@ -10,6 +10,147 @@ All mathematical functions have the following behaviors: + They return `NULL` if any of the input parameters is `NULL`. + They return `NaN` if any of the arguments is `NaN`. +### Categories + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
CategoryFunctions
Trigonometric + ACOS   + ACOSH   + ASIN   + ASINH   + ATAN   + ATAN2   + ATANH   + COS   + COSH   + COT   + COTH   + CSC   + CSCH   + SEC   + SECH   + SIN   + SINH   + TAN   + TANH   +
+ Exponential and
+ logarithmic +
+ EXP   + LN   + LOG   + LOG10   +
+ Rounding and
+ truncation +
+ CEIL   + CEILING   + FLOOR   + ROUND   + TRUNC   +
+ Power and
+ root +
+ CBRT   + POW   + POWER   + SQRT   +
Sign + ABS   + SIGN   +
+ Distance + + COSINE_DISTANCE   + EUCLIDEAN_DISTANCE   +
+ Comparison + + GREATEST   + LEAST   +
Random number generator + RAND   +
Arithmetic and error handling + DIV   + IEEE_DIVIDE   + IS_INF   + IS_NAN   + MOD   + SAFE_ADD   + SAFE_DIVIDE   + SAFE_MULTIPLY   + SAFE_NEGATE   + SAFE_SUBTRACT   +
Bucket + RANGE_BUCKET   +
Numerical constants + PI   + PI_BIGNUMERIC   + PI_NUMERIC   +
+ ### Function list @@ -356,6 +497,16 @@ All mathematical functions have the following behaviors: + + + + +
RANGE_BUCKET + + + Scans through a sorted array and returns the 0-based position + of a point's upper bound. +
ROUND @@ -1093,100 +1244,103 @@ Generates an error if overflow occurs. ### `COSINE_DISTANCE` - - ```sql COSINE_DISTANCE(vector1, vector2) ``` **Description** -Computes the [cosine distance][cosine-distance] between two vectors. +Computes the [cosine distance][wiki-cosine-distance] between two vectors. **Definitions** -+ `vector1`: The first vector. -+ `vector2`: The second vector. ++ `vector1`: A vector that is represented by an + `ARRAY` value or a sparse vector that is + represented by an `ARRAY>` value. ++ `vector2`: A vector that is represented by an + `ARRAY` value or a sparse vector that is + represented by an `ARRAY>` value. **Details** -Each vector represents a quantity that includes magnitude and direction. -The following vector types are supported: ++ `ARRAY` can be used to represent a vector. Each zero-based index in this + array represents a dimension. The value for each element in this array + represents a magnitude. -+ Dense vector: `ARRAY` that represents - the vector and its numerical values. `value` is of type - `DOUBLE`. + `T` can represent the following and must be the same for both + vectors: - This is an example of a dense vector: + + + - ``` - [1.0, 0.0, 3.0] - ``` -+ Sparse vector: `ARRAY>`, where - `STRUCT` contains a dimension-value pair for each numerical value in the - vector. This information is used to generate a dense vector. - - + `dimension`: A `STRING` or `INT64` value that represents the - specific dimension for `value` in a vector. - - + `value`: A `DOUBLE` value that represents the - numerical value for `dimension`. + + `FLOAT` + + `DOUBLE` - A sparse vector contains mostly zeros, with only a few non-zero elements. - It's a useful data structure for representing data that is mostly empty or - has a lot of zeros. For example, if you have a vector of length 10,000 and - only 10 elements are non-zero, then it is a sparse vector. 
As a result, - it's more efficient to describe a sparse vector by only mentioning its - non-zero elements. If an element isn't present in the - sparse representation, its value can be implicitly understood to be zero. + + - The following `INT64` sparse vector + In the following example vector, there are four dimensions. The magnitude + is `10.0` for dimension `0`, `55.0` for dimension `1`, `40.0` for + dimension `2`, and `34.0` for dimension `3`: ``` - [(0, 1.0), (2, 3.0)] - ``` - - is converted to this dense vector: - + [10.0, 55.0, 40.0, 34.0] ``` - [1.0, 0.0, 3.0] ++ `ARRAY>` can be used to represent a + sparse vector. With a sparse vector, you only need to include + dimension-magnitude pairs for non-zero magnitudes. If a magnitude isn't + present in the sparse vector, the magnitude is implicitly understood to be + zero. + + For example, if you have a vector with 10,000 dimensions, but only 10 + dimensions have non-zero magnitudes, then the vector is a sparse vector. + As a result, it's more efficient to describe a sparse vector by only + mentioning its non-zero magnitudes. + + In `ARRAY>`, `STRUCT` + represents a dimension-magnitude pair for each non-zero magnitude in a + sparse vector. These parts need to be included for each dimension-magnitude + pair: + + + `dimension`: A `STRING` or `INT64` value that represents a + dimension in a vector. + + + `magnitude`: A `DOUBLE` value that represents a + non-zero magnitude for a specific dimension in a vector. + + You don't need to include empty dimension-magnitude pairs in a + sparse vector. 
For example, the following sparse vector and
+   non-sparse vector are equivalent:
+
+   ```sql
+   -- sparse vector ARRAY>
+   [(1, 10.0), (2, 30.0), (5, 40.0)]
    ```

-   The following `STRING` sparse vector
-
-   ```
-   [('d': 4.0), ('a', 1.0), ('b': 3.0)]
+   ```sql
+   -- vector ARRAY
+   [0.0, 10.0, 30.0, 0.0, 0.0, 40.0]
    ```

-   is converted to this dense vector:
+   In a sparse vector, dimension-magnitude pairs don't need to be in any
+   particular order. The following sparse vectors are equivalent:

+   ```sql
+   [('a', 10.0), ('b', 30.0), ('d', 40.0)]
    ```
-   [1.0, 3.0, 0.0, 4.0]
-   ```
-
-The ordering of numeric values in a vector doesn't impact the results
-produced by this function if the dimensions of the vectors are aligned.
-
-A vector can have one or more dimensions. Both vectors in this function must
-share these same dimensions, and if they don't, an error is produced.
-
-A vector can't be a zero vector. A vector is a zero vector if all elements in
-the vector are `0`. For example, `[0.0, 0.0]`. If a zero vector is encountered,
-an error is produced.
-An error is produced if an element or field in a vector is `NULL`.
-
-If `vector1` or `vector2` is `NULL`, `NULL` is returned.
+   ```sql
+   [('d', 40.0), ('a', 10.0), ('b', 30.0)]
+   ```
++ Both non-sparse vectors
+  in this function must share the same dimensions, and if they don't, an error
+  is produced.
++ A vector can't be a zero vector. A vector is a zero vector if it has
+  no dimensions or all dimensions have a magnitude of `0`, such as `[]` or
+  `[0.0, 0.0]`. If a zero vector is encountered, an error is produced.
++ An error is produced if a magnitude in a vector is `NULL`.
++ If a vector is `NULL`, `NULL` is returned.

 **Return type**

@@ -1194,8 +1348,8 @@ If `vector1` or `vector2` is `NULL`, `NULL` is returned.
**Examples**

-In the following example, dense vectors are used to compute the
-cosine distance:
+In the following example, non-sparse vectors
+are used to compute the cosine distance:

 ```sql
 SELECT COSINE_DISTANCE([1.0, 2.0], [3.0, 4.0]) AS results;
@@ -1228,11 +1382,17 @@ even though the numeric values in each vector is in a different order:

 ```sql
 SELECT COSINE_DISTANCE([1.0, 2.0], [3.0, 4.0]) AS results;
+```

+```sql
 SELECT COSINE_DISTANCE([2.0, 1.0], [4.0, 3.0]) AS results;
+```

+```sql
 SELECT COSINE_DISTANCE([(1, 1.0), (2, 2.0)], [(1, 3.0), (2, 4.0)]) AS results;
+```

+```sql
 /*----------*
  | results  |
  +----------+
  *----------*/

@@ -1248,9 +1408,14 @@ the first vector, which is a zero vector:

 SELECT COSINE_DISTANCE([0.0, 0.0], [3.0, 4.0]) AS results;
 ```

-Both dense vectors must have the same dimensions. If not, an error is produced.
-In the following examples, the first vector has two dimensions and the second
-vector has three:
+```sql
+-- ERROR
+SELECT COSINE_DISTANCE([(1, 0.0), (2, 0.0)], [(1, 3.0), (2, 4.0)]) AS results;
+```
+
+Both non-sparse vectors must have the same
+dimensions. If not, an error is produced. In the following example, the
+first vector has two dimensions and the second vector has three:

 ```sql
 -- ERROR
@@ -1266,7 +1431,7 @@ SELECT COSINE_DISTANCE(
   [(1, 9.0), (2, 7.0), (2, 8.0)],
   [(1, 8.0), (2, 4.0), (3, 5.0)]) AS results;
 ```

-[cosine-distance]: https://en.wikipedia.org/wiki/Cosine_similarity#Cosine_distance
+[wiki-cosine-distance]: https://en.wikipedia.org/wiki/Cosine_similarity#Cosine_distance

 ### `COT`

@@ -1638,99 +1803,103 @@ result overflows.

 ### `EUCLIDEAN_DISTANCE`

-
-
 ```sql
 EUCLIDEAN_DISTANCE(vector1, vector2)
 ```

 **Description**

-Computes the [Euclidean distance][euclidean-distance] between two vectors.
+Computes the [Euclidean distance][wiki-euclidean-distance] between two vectors.

 **Definitions**

-+ `vector1`: The first vector.
-+ `vector2`: The second vector.
++ `vector1`: A vector that is represented by an + `ARRAY` value or a sparse vector that is + represented by an `ARRAY>` value. ++ `vector2`: A vector that is represented by an + `ARRAY` value or a sparse vector that is + represented by an `ARRAY>` value. **Details** -Each vector represents a quantity that includes magnitude and direction. -The following vector types are supported: - -+ Dense vector: `ARRAY` that represents - the vector and its numerical values. `value` is of type - `DOUBLE`. - - This is an example of a dense vector: ++ `ARRAY` can be used to represent a vector. Each zero-based index in this + array represents a dimension. The value for each element in this array + represents a magnitude. - ``` - [1.0, 0.0, 3.0] - ``` -+ Sparse vector: `ARRAY>`, where - `STRUCT` contains a dimension-value pair for each numerical value in the - vector. This information is used to generate a dense vector. + `T` can represent the following and must be the same for both + vectors: - + `dimension`: A `STRING` or `INT64` value that represents the - specific dimension for `value` in a vector. + + + - + `value`: A `DOUBLE` value that represents a - numerical value for `dimension`. + + `FLOAT` + + `DOUBLE` - A sparse vector contains mostly zeros, with only a few non-zero elements. - It's a useful data structure for representing data that is mostly empty or - has a lot of zeros. For example, if you have a vector of length 10,000 and - only 10 elements are non-zero, then it is a sparse vector. As a result, - it's more efficient to describe a sparse vector by only mentioning its - non-zero elements. If an element isn't present in the - sparse representation, its value can be implicitly understood to be zero. + + - The following `INT64` sparse vector + In the following example vector, there are four dimensions. 
The magnitude + is `10.0` for dimension `0`, `55.0` for dimension `1`, `40.0` for + dimension `2`, and `34.0` for dimension `3`: ``` - [(0, 1.0), (2, 3.0)] + [10.0, 55.0, 40.0, 34.0] ``` - - is converted to this dense vector: - - ``` - [1.0, 0.0, 3.0] ++ `ARRAY<STRUCT<dimension, magnitude>>` can be used to represent a + sparse vector. With a sparse vector, you only need to include + dimension-magnitude pairs for non-zero magnitudes. If a magnitude isn't + present in the sparse vector, the magnitude is implicitly understood to be + zero. + + For example, if you have a vector with 10,000 dimensions, but only 10 + dimensions have non-zero magnitudes, then the vector is a sparse vector. + As a result, it's more efficient to describe a sparse vector by only + mentioning its non-zero magnitudes. + + In `ARRAY<STRUCT<dimension, magnitude>>`, `STRUCT` + represents a dimension-magnitude pair for each non-zero magnitude in a + sparse vector. These parts need to be included for each dimension-magnitude + pair: + + + `dimension`: A `STRING` or `INT64` value that represents a + dimension in a vector. + + + `magnitude`: A `DOUBLE` value that represents a + non-zero magnitude for a specific dimension in a vector. + + You don't need to include empty dimension-magnitude pairs in a + sparse vector. For example, the following sparse vector and + non-sparse vector are equivalent: + + ```sql + -- sparse vector ARRAY<STRUCT<INT64, DOUBLE>> + [(1, 10.0), (2, 30.0), (5, 40.0)] ``` - The following `STRING` sparse vector - - ``` - [('d': 4.0), ('a', 1.0), ('b': 3.0)] + ```sql + -- vector ARRAY<DOUBLE> + [0.0, 10.0, 30.0, 0.0, 0.0, 40.0] ``` - is converted to this dense vector: + In a sparse vector, dimension-magnitude pairs don't need to be in any + particular order. The following sparse vectors are equivalent: + ```sql + [('a', 10.0), ('b', 30.0), ('d', 40.0)] ``` - [1.0, 3.0, 0.0, 4.0] - ``` - -The ordering of numeric values in a vector doesn't impact the results -produced by this function if the dimensions of the vectors are aligned.
-A vector can have one or more dimensions. Both vectors in this function must -share these same dimensions, and if they don't, an error is produced. - -A vector can be a zero vector. A vector is a zero vector if all elements in -the vector are `0`. For example, `[0.0, 0.0]`. - -An error is produced if an element or field in a vector is `NULL`. - -If `vector1` or `vector2` is `NULL`, `NULL` is returned. + ```sql + [('d', 40.0), ('a', 10.0), ('b', 30.0)] ``` ++ Both non-sparse vectors + in this function must share the same dimensions, and if they don't, an error + is produced. ++ A vector can be a zero vector. A vector is a zero vector if it has + no dimensions or all dimensions have a magnitude of `0`, such as `[]` or + `[0.0, 0.0]`. ++ An error is produced if a magnitude in a vector is `NULL`. ++ If a vector is `NULL`, `NULL` is returned. **Return type** @@ -1738,8 +1907,8 @@ If `vector1` or `vector2` is `NULL`, `NULL` is returned. **Examples** -In the following example, dense vectors are used to compute the -Euclidean distance: +In the following example, non-sparse vectors +are used to compute the Euclidean distance: ```sql SELECT EUCLIDEAN_DISTANCE([1.0, 2.0], [3.0, 4.0]) AS results; @@ -1766,17 +1935,23 @@ SELECT EUCLIDEAN_DISTANCE( *----------*/ ``` -The ordering of numeric values in a vector doesn't impact the results +The ordering of magnitudes in a vector doesn't impact the results produced by this function.
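The sparse-to-vector expansion and the distance computation described above can be modeled outside SQL. The following Python sketch is only an illustration of the documented semantics (the function and variable names are ours, not ZetaSQL's); it expands dimension-magnitude pairs and shows that pair order doesn't affect the result:

```python
import math

def from_sparse(pairs, num_dims):
    # Expand (dimension, magnitude) pairs; absent dimensions default to 0.0.
    v = [0.0] * num_dims
    for dim, mag in pairs:
        v[dim] = mag
    return v

def euclidean(v1, v2):
    # Both vectors must share the same dimensions, per the rules above.
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

# Mirrors EUCLIDEAN_DISTANCE([1.0, 2.0], [3.0, 4.0]):
print(round(euclidean([1.0, 2.0], [3.0, 4.0]), 3))  # 2.828

# Sparse pairs listed in any order expand to the same vector.
a = from_sparse([(1, 1.0), (0, 0.5)], 2)
b = from_sparse([(0, 0.5), (1, 1.0)], 2)
print(euclidean(a, b))  # 0.0
```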
For example these queries produce the same results -even though the numeric values in each vector is in a different order: +even though the magnitudes in each vector are in a different order: ```sql SELECT EUCLIDEAN_DISTANCE([1.0, 2.0], [3.0, 4.0]); +``` +```sql SELECT EUCLIDEAN_DISTANCE([2.0, 1.0], [4.0, 3.0]); +``` +```sql SELECT EUCLIDEAN_DISTANCE([(1, 1.0), (2, 2.0)], [(1, 3.0), (2, 4.0)]) AS results; +``` +```sql /*----------* | results | +----------+ @@ -1784,9 +1959,9 @@ SELECT EUCLIDEAN_DISTANCE([(1, 1.0), (2, 2.0)], [(1, 3.0), (2, 4.0)]) AS results *----------*/ ``` -Both dense vectors must have the same dimensions. If not, an error is produced. -In the following examples, the first vector has two dimensions and the second -vector has three: +Both non-sparse vectors must have the same +dimensions. If not, an error is produced. In the following example, the first +vector has two dimensions and the second vector has three: ```sql -- ERROR @@ -1802,7 +1977,7 @@ SELECT EUCLIDEAN_DISTANCE( [(1, 9.0), (2, 7.0), (2, 8.0)], [(1, 8.0), (2, 4.0), (3, 5.0)]) AS results; ``` -[euclidean-distance]: https://en.wikipedia.org/wiki/Euclidean_distance +[wiki-euclidean-distance]: https://en.wikipedia.org/wiki/Euclidean_distance ### `FLOOR` @@ -2558,6 +2733,121 @@ RAND() Generates a pseudo-random value of type `DOUBLE` in the range of [0, 1), inclusive of 0 and exclusive of 1. +### `RANGE_BUCKET` + +```sql +RANGE_BUCKET(point, boundaries_array) +``` + +**Description** + +`RANGE_BUCKET` scans through a sorted array and returns the 0-based position +of the point's upper bound. This can be useful if you need to group your data to +build partitions, histograms, business-defined rules, and more. + +`RANGE_BUCKET` follows these rules: + ++ If the point exists in the array, returns the index of the next larger value.
+ + ```sql + RANGE_BUCKET(20, [0, 10, 20, 30, 40]) -- 3 is return value + RANGE_BUCKET(20, [0, 10, 20, 20, 40, 40]) -- 4 is return value + ``` ++ If the point does not exist in the array, but it falls between two values, + returns the index of the larger value. + + ```sql + RANGE_BUCKET(25, [0, 10, 20, 30, 40]) -- 3 is return value + ``` ++ If the point is smaller than the first value in the array, returns 0. + + ```sql + RANGE_BUCKET(-10, [5, 10, 20, 30, 40]) -- 0 is return value + ``` ++ If the point is greater than or equal to the last value in the array, + returns the length of the array. + + ```sql + RANGE_BUCKET(80, [0, 10, 20, 30, 40]) -- 5 is return value + ``` ++ If the array is empty, returns 0. + + ```sql + RANGE_BUCKET(80, []) -- 0 is return value + ``` ++ If the point is `NULL` or `NaN`, returns `NULL`. + + ```sql + RANGE_BUCKET(NULL, [0, 10, 20, 30, 40]) -- NULL is return value + ``` ++ The data type for the point and array must be compatible. + + ```sql + RANGE_BUCKET('a', ['a', 'b', 'c', 'd']) -- 1 is return value + RANGE_BUCKET(1.2, [1, 1.2, 1.4, 1.6]) -- 2 is return value + RANGE_BUCKET(1.2, [1, 2, 4, 6]) -- execution failure + ``` + +Execution failure occurs when: + ++ The array has a `NaN` or `NULL` value in it. + + ```sql + RANGE_BUCKET(80, [NULL, 10, 20, 30, 40]) -- execution failure + ``` ++ The array is not sorted in ascending order. + + ```sql + RANGE_BUCKET(30, [10, 30, 20, 40, 50]) -- execution failure + ``` + +**Parameters** + ++ `point`: A generic value. ++ `boundaries_array`: A generic array of values. + +Note: The data type for `point` and the element type of `boundaries_array` +must be equivalent. The data type must be [comparable][data-type-properties]. 
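The rule set above can be modeled with a binary search. The following Python sketch (an illustration of the documented semantics, not the engine's implementation) reproduces the return values shown in the rules:

```python
import bisect
import math

def range_bucket(point, boundaries):
    """Model of the documented RANGE_BUCKET rules."""
    # A NULL or NaN point returns NULL (None here).
    if point is None or (isinstance(point, float) and math.isnan(point)):
        return None
    # NULL/NaN boundaries or an unsorted array are execution failures.
    if any(b is None or (isinstance(b, float) and math.isnan(b)) for b in boundaries):
        raise ValueError("boundaries_array must not contain NULL or NaN")
    if any(boundaries[i] > boundaries[i + 1] for i in range(len(boundaries) - 1)):
        raise ValueError("boundaries_array must be sorted in ascending order")
    # bisect_right returns the index just past the last element <= point,
    # which matches every rule listed above (0 for an empty array, the
    # array length when the point is >= the last value, and so on).
    return bisect.bisect_right(boundaries, point)

print(range_bucket(20, [0, 10, 20, 30, 40]))       # 3
print(range_bucket(20, [0, 10, 20, 20, 40, 40]))   # 4
print(range_bucket(-10, [5, 10, 20, 30, 40]))      # 0
print(range_bucket(80, []))                        # 0
```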
+ +**Return Value** + +`INT64` + +**Examples** + +In a table called `students`, check to see how many records would +exist in each `age_group` bucket, based on a student's age: + ++ age_group 0 (age < 10) ++ age_group 1 (age >= 10, age < 20) ++ age_group 2 (age >= 20, age < 30) ++ age_group 3 (age >= 30) + +```sql +WITH students AS +( + SELECT 9 AS age UNION ALL + SELECT 20 AS age UNION ALL + SELECT 25 AS age UNION ALL + SELECT 31 AS age UNION ALL + SELECT 32 AS age UNION ALL + SELECT 33 AS age +) +SELECT RANGE_BUCKET(age, [10, 20, 30]) AS age_group, COUNT(*) AS count +FROM students +GROUP BY 1 + +/*--------------+-------* + | age_group | count | + +--------------+-------+ + | 0 | 1 | + | 2 | 2 | + | 3 | 3 | + *--------------+-------*/ +``` + +[data-type-properties]: https://github.com/google/zetasql/blob/master/docs/data-types.md#data_type_properties + ### `ROUND` ``` diff --git a/docs/navigation_functions.md b/docs/navigation_functions.md index 53489c436..67e4a463a 100644 --- a/docs/navigation_functions.md +++ b/docs/navigation_functions.md @@ -5,7 +5,7 @@ # Navigation functions ZetaSQL supports navigation functions. -Navigation functions are a subset window functions. To create a +Navigation functions are a subset of window functions. To create a window function call and learn about the syntax for window functions, see [Window function_calls][window-function-calls]. diff --git a/docs/operators.md b/docs/operators.md index 12c3a0c39..975150271 100644 --- a/docs/operators.md +++ b/docs/operators.md @@ -21,7 +21,7 @@ Common conventions: ### Operator precedence The following table lists all ZetaSQL operators from highest to -lowest precedence, i.e. the order in which they will be evaluated within a +lowest precedence, i.e., the order in which they will be evaluated within a statement. @@ -1465,7 +1465,7 @@ This operator throws an error if Y is negative.X: Integer or BYTES
Y: INT64 diff --git a/docs/user-defined-functions.md b/docs/user-defined-functions.md index 78a5b695f..0fdbd851c 100644 --- a/docs/user-defined-functions.md +++ b/docs/user-defined-functions.md @@ -281,13 +281,14 @@ function. For details, see [Function calls][function-calls]. ### SQL type encodings in JavaScript -ZetaSQL represents types in the following manner: +[ZetaSQL data types][data-types] are represented by +[JavaScript data types][javascript-types] in the following manner:
Shifts the first operand X to the right. This operator does not -do sign bit extension with a signed type (i.e. it fills vacant bits on the left +do sign bit extension with a signed type (i.e., it fills vacant bits on the left with 0). This operator returns 0 or a byte sequence of b'\x00' @@ -1564,187 +1564,190 @@ SELECT entry FROM entry_table WHERE entry IS NULL ### Comparison operators -Comparisons always return `BOOL`. Comparisons generally -require both operands to be of the same type. If operands are of different -types, and if ZetaSQL can convert the values of those types to a -common type without loss of precision, ZetaSQL will generally coerce -them to that common type for the comparison; ZetaSQL will generally -coerce literals to the type of non-literals, where -present. Comparable data types are defined in -[Data Types][operators-link-to-data-types]. -NOTE: ZetaSQL allows comparisons -between signed and unsigned integers. - -Structs support only these comparison operators: equal -(`=`), not equal (`!=` and `<>`), and `IN`. - -The comparison operators in this section cannot be used to compare -`JSON` ZetaSQL literals with other `JSON` ZetaSQL literals. -If you need to compare values inside of `JSON`, convert the values to -SQL values first. For more information, see [`JSON` functions][json-functions]. - -The following rules apply when comparing these data types: - -+ Floating point: - All comparisons with `NaN` return `FALSE`, - except for `!=` and `<>`, which return `TRUE`. -+ `BOOL`: `FALSE` is less than `TRUE`. -+ `STRING`: Strings are - compared codepoint-by-codepoint, which means that canonically equivalent - strings are only guaranteed to compare as equal if - they have been normalized first. -+ `NULL`: The convention holds here: any operation with a `NULL` input returns - `NULL`. +Compares operands and produces the results of the comparison as a `BOOL` +value. 
These comparison operators are available: - - - - - - - - - - - - + + + + + + + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - + for details. + + +
NameSyntaxDescription
Less ThanX < Y - Returns TRUE if X is less than Y. - +
NameSyntaxDescription
Less ThanX < Y + Returns TRUE if X is less than Y. + This operator supports specifying collation. -
Less Than or Equal ToX <= Y - Returns TRUE if X is less than or equal to - Y. - +
Less Than or Equal ToX <= Y + Returns TRUE if X is less than or equal to + Y. + This operator supports specifying collation. -
Greater ThanX > Y - Returns TRUE if X is greater than Y. - +
Greater ThanX > Y + Returns TRUE if X is greater than + Y. + This operator supports specifying collation. -
Greater Than or Equal ToX >= Y - Returns TRUE if X is greater than or equal to - Y. - +
Greater Than or Equal ToX >= Y + Returns TRUE if X is greater than or equal to + Y. + This operator supports specifying collation. -
EqualX = Y - Returns TRUE if X is equal to Y. - +
EqualX = Y + Returns TRUE if X is equal to Y. + This operator supports specifying collation. -
Not EqualX != Y
X <> Y
- Returns TRUE if X is not equal to Y. - +
Not EqualX != Y
X <> Y
+ Returns TRUE if X is not equal to + Y. + This operator supports specifying collation. -
BETWEENX [NOT] BETWEEN Y AND Z -

- Returns TRUE if X is [not] within the range - specified. The result of X BETWEEN Y AND Z is equivalent to - Y <= X AND X <= Z but X is evaluated only - once in the former. - +

BETWEENX [NOT] BETWEEN Y AND Z +

+ Returns TRUE if X is [not] within the range + specified. The result of X BETWEEN Y AND Z is equivalent + to Y <= X AND X <= Z but X is + evaluated only once in the former. + This operator supports specifying collation. -

-
LIKEX [NOT] LIKE Y - See the `LIKE` operator +

+
LIKEX [NOT] LIKE Y + See the `LIKE` operator - for details. -
INMultiple - See the `IN` operator + for details. +
INMultiple + See the `IN` operator - for details. -
-When testing values that have a struct data type for -equality, it's possible that one or more fields are `NULL`. In such cases: +The following rules apply to operands in a comparison operator: + ++ The operands must be [comparable][data-type-comparable]. ++ A comparison operator generally requires both operands to be of the + same type. ++ If the operands are of different types, and the values of those types can be + converted to a common type without loss of precision, + they are generally coerced to that common type for the comparison. ++ A literal operand is generally coerced to the data type of a + non-literal operand that is part of the comparison. ++ Comparisons between operands that are signed and unsigned integers are + allowed. ++ Struct operands support only these comparison operators: equal + (`=`), not equal (`!=` and `<>`), and `IN`. -+ If all non-`NULL` field values are equal, the comparison returns `NULL`. -+ If any non-`NULL` field values are not equal, the comparison returns `FALSE`. +The following rules apply when comparing these data types: -The following table demonstrates how struct data -types are compared when they have fields that are `NULL` valued. - - - - - - - - - - - - - - - - - - - - - - - - - - 
Struct1Struct2Struct1 = Struct2
STRUCT(1, NULL)STRUCT(1, NULL)NULL
STRUCT(1, NULL)STRUCT(2, NULL)FALSE
STRUCT(1,2)STRUCT(1, NULL)NULL
++ Floating point: + All comparisons with `NaN` return `FALSE`, + except for `!=` and `<>`, which return `TRUE`. ++ `BOOL`: `FALSE` is less than `TRUE`. ++ `STRING`: Strings are compared codepoint-by-codepoint, which means that + canonically equivalent strings are only guaranteed to compare as equal if + they have been normalized first. ++ `JSON`: You can't compare JSON, but you can compare + the values inside of JSON if you convert the values to + SQL values first. For more information, see + [`JSON` functions][json-functions]. ++ `NULL`: Any operation with a `NULL` input returns `NULL`. ++ `STRUCT`: When testing a struct for equality, it's possible that one or more + fields are `NULL`. In such cases: + + + If all non-`NULL` field values are equal, the comparison returns `NULL`. + + If any non-`NULL` field values are not equal, the comparison returns + `FALSE`. + + The following table demonstrates how `STRUCT` data types are compared when + they have fields that are `NULL` valued. + + + + + + + + + + + + + + + + + + + + + + + + + + +
Struct1Struct2Struct1 = Struct2
STRUCT(1, NULL)STRUCT(1, NULL)NULL
STRUCT(1, NULL)STRUCT(2, NULL)FALSE
STRUCT(1,2)STRUCT(1, NULL)NULL
### `EXISTS` operator @@ -3003,6 +3006,8 @@ FROM UNNEST([ [operators-link-to-data-types]: https://github.com/google/zetasql/blob/master/docs/data-types.md +[data-type-comparable]: https://github.com/google/zetasql/blob/master/docs/data-types.md#comparable_data_types + [operators-link-to-struct-type]: https://github.com/google/zetasql/blob/master/docs/data-types.md#struct_type [operators-link-to-from-clause]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#from_clause diff --git a/docs/query-syntax.md b/docs/query-syntax.md index 4327f5394..a228b596d 100644 --- a/docs/query-syntax.md +++ b/docs/query-syntax.md @@ -2807,6 +2807,7 @@ aggregated rows for each of them: + `(a, b, c)` + `(a, b)` ++ `(a, c)` + `(a)` + `(b, c)` + `(b)` @@ -3350,7 +3351,10 @@ WINDOW
 set_operation:
-  query_expr set_operator [ corresponding_specification ] query_expr
+  {
+    query_expr set_operator query_expr
+    | corresponding_set_operation
+  }
 
 set_operator:
   {
@@ -3358,9 +3362,6 @@ WINDOW
     INTERSECT { ALL | DISTINCT } |
     EXCEPT { ALL | DISTINCT }
   }
-
-corresponding_specification:
-  CORRESPONDING
 
Set operators combine results from two or @@ -3485,23 +3486,105 @@ EXCEPT DISTINCT SELECT 1; ### `CORRESPONDING` -Add the `CORRESPONDING` specification to match columns by name instead of -position in a set operation. +
+corresponding_set_operation:
+  query_expr corresponding_set_operator query_expr
+
+corresponding_set_operator:
+  [ set_op_outer_mode ] set_operator [ STRICT ] CORRESPONDING [ BY (column_list) ]
+
+column_list:
+  column_name[, ...]
+
+set_op_outer_mode:
+  { set_op_full_outer_mode | set_op_left_outer_mode }
+
+set_op_full_outer_mode:
+  { FULL OUTER | FULL | OUTER }
 
-General usage notes for the `CORRESPONDING` specification:
+set_op_left_outer_mode:
+  { LEFT OUTER | LEFT }
+
+ +Use the `CORRESPONDING` set operation to match columns by name instead of by +position. + +**Definitions** -+ In the set operation, only columns appearing in both input queries are - preserved and all other columns are dropped. -+ Both input queries in the set operation must produce at least - one column with the same name. -+ Columns in the set operation results appear in the order in which they occur - in the first input query. -+ Neither input query in the set operation can produce a value table. -+ Neither input query in the set operation can produce anonymous columns or - duplicate column names. -+ Pseudocolumns are ignored in the set operation. ++ `query_expr`: One of two input queries whose results are combined into a + single result set. ++ `set_operator`: The [set operator][set-operators] to include in the + operation. ++ `BY column_list`: If specified, the columns of an input query are preserved + if and only if they appear in `column_list`. ++ `set_op_left_outer_mode`: Applies left outer mode to the set operation. ++ `set_op_full_outer_mode`: Applies full outer mode to the set operation. ++ `STRICT`: Applies strict mode to the set operation. -#### Set operation results when matching by column position + + +**Rules** + +For all `CORRESPONDING` set operations: + ++ `set_op_outer_mode` and `STRICT` cannot be used together. ++ An input query that produces a [value table][value-tables] is not supported. + +If the `BY` clause is used: + ++ Columns in the result set are produced in the order in which they occur in + `column_list`. Any column names not in this list are excluded from the + results. ++ Regarding duplicate and anonymous columns: + + The `column_list` must not contain duplicate names. + + Input queries must have at most one column with each name from + `column_list`. + + Input queries may have anonymous or duplicate column names that are not + in `column_list`. 
++ If neither `STRICT` nor an outer mode is used (default mode): + + Both the left and right input query must produce all columns in the + `column_list`. ++ In strict mode: + + Both the left and the right input query must produce exactly the columns + in `column_list`; column order can be different. ++ In left outer mode: + + Each column name in `column_list` must exist in the left input query. + + If the right input query does not have a column in `column_list`, `NULL` + is used as the column values for rows originating from the right input + query. ++ In full outer mode: + + Each column name in `column_list` must exist in at least one input + query. + + If one input query does not have a column in `column_list`, `NULL` is + used as the column values for rows originating from that input. + +If the `BY` clause isn't used: + ++ Columns from the left query are produced first, followed by the right + query's unique columns. ++ Neither input query can produce duplicate or anonymous columns. ++ If neither `STRICT` nor an outer mode is used (default mode): + + Only columns appearing in both input queries are included in the result + set; all other columns are excluded. + + There must be at least one column name common to the left and right + input query. ++ In strict mode: + + Both input queries must have the same set of columns; column order can + be different. + + All columns from the input queries are included in the result set. ++ In left outer mode: + + All columns from the left input query are included in the result set. + + If a column of the left query does not appear in the right query, `NULL` + is used as its column value for the right query. + + There must be at least one column name common to the left and right + input query. ++ In full outer mode: + + All columns from the input queries are included in the result set. + + If a column of one input query does not appear in the other query, + `NULL` is used as its column value for the other query.
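The default matching behavior (no `BY`, `STRICT`, or outer mode) can be sketched outside SQL. In this Python model (ours, not ZetaSQL; it assumes no duplicate or anonymous columns, per the rules above), only common column names survive, ordered as in the left query:

```python
def union_all_corresponding(left_cols, left_rows, right_cols, right_rows):
    # Keep only column names common to both inputs, in left-query order.
    common = [c for c in left_cols if c in right_cols]
    if not common:
        raise ValueError("no column name common to both input queries")
    idx_l = [left_cols.index(c) for c in common]
    idx_r = [right_cols.index(c) for c in common]
    # Right-query rows are reordered so names line up with the left query.
    rows = [tuple(r[i] for i in idx_l) for r in left_rows]
    rows += [tuple(r[i] for i in idx_r) for r in right_rows]
    return common, rows

# Mirrors the Produce1/Produce2 example used in this section.
cols, rows = union_all_corresponding(
    ['fruit', 'vegetable'], [('apple', 'leek')],
    ['vegetable', 'fruit'], [('kale', 'orange')])
print(cols)  # ['fruit', 'vegetable']
print(rows)  # [('apple', 'leek'), ('orange', 'kale')]
```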
+ +#### Produce results by column name + ZetaSQL supports set operations, and by default, all of these set operations match columns by position. More specifically: @@ -3534,12 +3617,9 @@ SELECT * FROM Produce1 UNION ALL SELECT * FROM Produce2; In the proceeding example, an attempt was made to list of fruits in one column and vegetables in another, however the results don't reflect this. -To get the intended results, use the `CORRESPONDING` specification. - -#### Set operation results when matching by column name -To produce the intended results, match columns by name, using the -`CORRESPONDING` specification. In particular, this means: +To produce the intended results, use the `CORRESPONDING` set operation to match +the columns by name. In particular, this means: + When rows of the second input query contribute to the set operation, the columns are reordered so that the column names are in the same positions @@ -3566,10 +3646,32 @@ SELECT * FROM Produce1 UNION ALL CORRESPONDING SELECT * FROM Produce2; Notice that the order of the columns is determined by the first input query in this example. -#### Columns that don't appear in both result sets are dropped +#### Produce results for specific columns only + -With the `CORRESPONDING` specification, only columns appearing in both -input queries are preserved and all other columns are dropped. +To produce results for only specific columns, use the optional `BY` clause +with the `CORRESPONDING` set operation. 
In the following example, only results +for `fruit` are produced: + +```sql +WITH + Produce1 AS (SELECT 'apple' AS fruit, 'leek' AS vegetable), + Produce2 AS (SELECT 'kale' AS vegetable, 'orange' AS fruit) +SELECT * FROM Produce1 UNION ALL CORRESPONDING BY (fruit) SELECT * FROM Produce2; + +/*--------* + | fruit | + +--------+ + | apple | + | orange | + +--------*/ +``` + +#### Drop columns that don't appear in both result sets + + +With the `CORRESPONDING` set operation, only columns appearing in both +input queries are preserved and all other columns are removed from the results. In the following example, The columns `extra_col1` and `extra_col2` are dropped as they only occur in one input query. @@ -3588,32 +3690,112 @@ SELECT * FROM Produce1 UNION ALL CORRESPONDING SELECT * FROM Produce2; +--------+-----------*/ ``` +#### Inject `NULL` values into the results for missing columns + + +You can inject `NULL` values for missing columns by including the `FULL OUTER` +or `LEFT OUTER` syntax in the operation.
+ +In the following example, `FULL OUTER` is used to add `NULL` values for missing +columns in both input queries: + +```sql +WITH + Produce1 AS (SELECT 'apple' AS fruit, 'leek' AS vegetable, 1 AS extra_col1), + Produce2 AS (SELECT 'kale' AS vegetable, 'orange' AS fruit, 2 AS extra_col2) +SELECT * FROM Produce1 +FULL OUTER UNION ALL CORRESPONDING +SELECT * FROM Produce2; + +/*--------+-----------+------------+------------+ + | fruit | vegetable | extra_col1 | extra_col2 | + +--------+-----------+------------+------------+ + | apple | leek | 1 | NULL | + | orange | kale | NULL | 2 | + +--------+-----------+------------+------------*/ +``` + +In the following example, `LEFT OUTER` is used to add `NULL` values for missing +columns in the second input query: + +```sql +WITH + Produce1 AS (SELECT 'apple' AS fruit, 'leek' AS vegetable, 1 AS extra_col1), + Produce2 AS (SELECT 'kale' AS vegetable, 'orange' AS fruit, 2 AS extra_col2) +SELECT * FROM Produce1 +LEFT OUTER UNION ALL CORRESPONDING +SELECT * FROM Produce2; + +/*--------+-----------+------------+ + | fruit | vegetable | extra_col1 | + +--------+-----------+------------+ + | apple | leek | 1 | + | orange | kale | NULL | + +--------+-----------+------------*/ +``` + +You can include the `BY` clause to only include specific columns in the results. +For example: + +```sql +WITH + Produce1 AS (SELECT 'apple' AS fruit, 'leek' AS vegetable, 1 AS extra_col1), + Produce2 AS (SELECT 'kale' AS vegetable, 'orange' AS fruit, 2 AS extra_col2) +SELECT * FROM Produce1 +FULL OUTER UNION ALL CORRESPONDING BY (fruit, extra_col1) +SELECT * FROM Produce2; + +/*--------+------------+ + | fruit | extra_col1 | + +--------+------------+ + | apple | 1 | + | orange | NULL | + +--------+------------*/ +``` + ## `LIMIT` and `OFFSET` clauses -
+```sql
 LIMIT count [ OFFSET skip_rows ]
-
+``` -`LIMIT` specifies a non-negative `count` of type INT64, -and no more than `count` rows will be returned. `LIMIT` `0` returns 0 rows. +Limits the number of rows to return in a query. Optionally includes +the ability to skip over rows. -If there is a set operation, `LIMIT` is applied after the set operation is -evaluated. +**Definitions** -`OFFSET` specifies a non-negative number of rows to skip before applying -`LIMIT`. `skip_rows` is of type INT64. ++ `LIMIT`: Limits the number of rows to produce. -These clauses accept only literal or parameter values. The rows that are -returned by `LIMIT` and `OFFSET` are unspecified unless these -operators are used after `ORDER BY`. + `count` is an `INT64` constant expression that represents the + non-negative, non-`NULL` limit. No more than `count` rows are produced. + `LIMIT 0` returns 0 rows. -Examples: + + If there is a set operation, `LIMIT` is applied after the set operation is + evaluated. + + + ++ `OFFSET`: Skips a specific number of rows before applying `LIMIT`. + + `skip_rows` is an `INT64` constant expression that represents + the non-negative, non-`NULL` number of rows to skip. + +**Details** + +The rows that are returned by `LIMIT` and `OFFSET` have undefined order unless +these clauses are used after `ORDER BY`. + +A constant expression can be represented by a general expression, literal, or +parameter value. + +**Examples** ```sql SELECT * FROM UNNEST(ARRAY['a', 'b', 'c', 'd', 'e']) AS letter -ORDER BY letter ASC LIMIT 2 +ORDER BY letter ASC LIMIT 2; /*---------* | letter | @@ -3626,7 +3808,7 @@ ORDER BY letter ASC LIMIT 2 ```sql SELECT * FROM UNNEST(ARRAY['a', 'b', 'c', 'd', 'e']) AS letter -ORDER BY letter ASC LIMIT 3 OFFSET 1 +ORDER BY letter ASC LIMIT 3 OFFSET 1; /*---------* | letter | @@ -4265,18 +4447,25 @@ SELECT * FROM B ## Differential privacy clause -**Anonymization clause** +Warning: `ANONYMIZATION` has been deprecated. Use +`DIFFERENTIAL_PRIVACY` instead. + +Warning: `kappa` has been deprecated. 
Use +`max_groups_contributed` instead. + +Syntax for queries that support differential privacy with views:
-WITH ANONYMIZATION OPTIONS( privacy_parameters )
+WITH DIFFERENTIAL_PRIVACY OPTIONS( privacy_parameters )
 
 privacy_parameters:
   epsilon = expression,
   { delta = expression | k_threshold = expression },
-  [ { kappa = expression | max_groups_contributed = expression } ],
+  [ max_groups_contributed = expression ],
+  [ group_selection_strategy = { LAPLACE_THRESHOLD | PUBLIC_GROUPS } ]
 
-**Differential privacy clause** +Syntax for queries that support differential privacy without views:
 WITH DIFFERENTIAL_PRIVACY OPTIONS( privacy_parameters )
@@ -4305,17 +4494,16 @@ You can use the following syntax to build a differential privacy clause:
 +  [`k_threshold`][dp-k-threshold]: The number of entities that must
    contribute to a group in order for the group to be exposed in the results.
    `expression` must return an `INT64`.
-+  [`kappa` or `max_groups_contributed`][dp-kappa]: A positive integer identifying the limit on
-   the number of groups that an entity is allowed to contribute to. This number
-   is also used to scale the noise for each group. `expression` must be a
-   literal and return an `INT64`.
++  [`max_groups_contributed`][dp-max-groups]: A positive integer identifying the
+   limit on the number of groups that an entity is allowed to contribute to.
+   This number is also used to scale the noise for each group. `expression` must
+   be a literal and return an `INT64`.
 + [`privacy_unit_column`][dp-privacy-unit-id]: The column that represents the
   privacy unit column. Replace `column_name` with the path expression for the
   column. The first identifier in the path can start with either a table name
   or a column name that is visible in the `FROM` clause.
-
-Important: Avoid using `kappa` as it is soon to be depreciated. Instead, use
-`max_groups_contributed`.
++ [`group_selection_strategy`][dp-group-selection-strategy]: Determines how
+  differential privacy is applied to groups.
 
 Note: `delta` and `k_threshold` are mutually exclusive; `delta` is preferred
 over `k_threshold`.
@@ -4325,11 +4513,6 @@ more differentially private aggregate functions in the `SELECT` list.
 To learn more about the privacy parameters in this syntax,
 see [Privacy parameters][dp-privacy-parameters].
 
-Both the anonymization and differential privacy clause indicate that you are
-adding differential privacy to your query. If possible, use
-the differential privacy clause as the anonymization clause contains
-legacy syntax.
-
 ### Privacy parameters 
 
 
@@ -4360,11 +4543,12 @@ will also inject an implicit differentially private aggregate into the plan for
 removing small groups that computes a noisy entity count per group. If you have
 `n` explicit differentially private aggregate functions in your query, then each
 aggregate individually gets `epsilon/(n+1)` for its computation. If used with
-`kappa` or `max_groups_contributed`, the effective `epsilon` per function per groups is further
-split by `kappa` or `max_groups_contributed`. Additionally, if implicit clamping is used for an
-aggregate differentially private function, then half of the function's epsilon
-is applied towards computing implicit bounds, and half of the function's epsilon
-is applied towards the differentially private aggregation itself.
+`max_groups_contributed`, the effective `epsilon` per function per group is
+further split by `max_groups_contributed`. Additionally, if implicit clamping is
+used for an aggregate differentially private function, then half of the
+function's epsilon is applied towards computing implicit bounds, and half of the
+function's epsilon is applied towards the differentially private aggregation
+itself.
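The epsilon budgeting described above is plain arithmetic. The following Python sketch (our illustration; the function name and parameters are ours, not ZetaSQL's) computes the effective per-aggregate, per-group epsilon:

```python
def effective_epsilon(total_epsilon, num_aggregates,
                      max_groups_contributed=1, implicit_clamping=False):
    # One extra share is reserved for the implicit noisy entity count,
    # so each of the n explicit aggregates receives epsilon / (n + 1).
    eps = total_epsilon / (num_aggregates + 1)
    # The per-aggregate share is further split across the groups an
    # entity is allowed to contribute to.
    eps /= max_groups_contributed
    # With implicit clamping, half of the share goes to computing the
    # implicit bounds and half to the DP aggregation itself.
    if implicit_clamping:
        eps /= 2
    return eps

# Example: epsilon = 10, 4 explicit aggregates, max_groups_contributed = 2.
print(effective_epsilon(10, 4, max_groups_contributed=2))  # 1.0
```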
 
 #### `delta` 
 
@@ -4401,31 +4585,29 @@ for each group and eliminates groups with few entities from the output. Use
 this parameter to define how many unique entities must be included in the group
 for the value to be included in the output.
 
-#### `kappa` or `max_groups_contributed` 
-
+#### `max_groups_contributed` 
+
 
-Important: Avoid using `kappa` as it is soon to be depreciated. Instead, use
-`max_groups_contributed`.
-
-The `kappa` or `max_groups_contributed` differential privacy parameter is a positive integer that,
-if specified, scales the noise and limits the number of groups that each entity
-can contribute to.
+The `max_groups_contributed` differential privacy parameter is a
+positive integer that, if specified, scales the noise and limits the number of
+groups that each entity can contribute to.
 
-The default values for `kappa` and `max_groups_contributed` are determined by the
+The default value for `max_groups_contributed` is determined by the
 query engine.
 
-If `kappa` or `max_groups_contributed` is unspecified, then there is no limit to the number of
-groups that each entity can contribute to.
+If `max_groups_contributed` is unspecified, then there is no limit to the
+number of groups that each entity can contribute to.
 
-If `kappa` or `max_groups_contributed` is unspecified, the language can't guarantee that the
-results will be differentially private. We recommend that you specify
-`kappa` or `max_groups_contributed`. If you don't specify `kappa` or `max_groups_contributed`, the results might
-still be differentially private if certain preconditions are met. For example,
-if you know that the privacy unit column in a table or view is unique in the
-`FROM` clause, the entity can't contribute to more than one group and therefore
-the results will be the same regardless of whether `kappa` or `max_groups_contributed` is set.
+If `max_groups_contributed` is unspecified, the language can't guarantee that
+the results will be differentially private. We recommend that you specify
+`max_groups_contributed`. If you don't specify `max_groups_contributed`, the
+results might still be differentially private if certain preconditions are met.
+For example, if you know that the privacy unit column in a table or view is
+unique in the `FROM` clause, the entity can't contribute to more than one group
+and therefore the results will be the same regardless of whether
+`max_groups_contributed` is set.
 
-Tip: We recommend that engines require `kappa` or `max_groups_contributed` to be set.
+Tip: We recommend that engines require `max_groups_contributed` to be set.
 
 #### `privacy_unit_column` 
 
@@ -4433,6 +4615,59 @@ Tip: We recommend that engines require `kappa` or `max_groups_contributed` to be
 To learn about the privacy unit and how to define a privacy unit column, see
 [Define a privacy unit column][dp-define-privacy-unit-id].
 
+#### `group_selection_strategy` 
+
+
+Differential privacy queries can include a `GROUP BY` clause. By default,
+including groups in a differentially private query requires the
+differential privacy algorithm to protect the entities in these groups.
+
+The `group_selection_strategy` privacy parameter determines how groups are
+anonymized in a query. The choices are:
+
++   `LAPLACE_THRESHOLD` (default): Groups are anonymized by counting the
+    distinct privacy units that contributed to the group, adding Laplace noise
+    to that count, and then including the group in the results only if the
+    noised value is above a certain threshold. This process depends on epsilon
+    and delta.
+
+    If the `group_selection_strategy` parameter is not included in
+    the `WITH DIFFERENTIAL_PRIVACY` clause, this is the default setting.
++   `PUBLIC_GROUPS`: Groups won't be anonymized because they don't depend upon
+    protected data or they have already been anonymized. Use this option if the
+    groups in the `GROUP BY` clause don't depend on any private data, and no
+    further anonymization is needed for the groups. For example, if the groups
+    contain fixed data such as the days in a week or operating system types,
+    and this data doesn't need to be private, use this option.
+
+Semantic rules for `PUBLIC_GROUPS`:
+
++   In the query, there should be a _public table_, which is a table
+    without a privacy unit ID column. This table should contain
+    data that is common knowledge or data that has already been anonymized.
++   In the query, there must be a _private table_, which is a table with a
+    privacy unit ID column. This table might contain identifying data.
++   In the `GROUP BY` clause, all grouping items must come from a
+    public table.
++   You must join a public table and a private table with a _public group join_,
+    subject to the following conditions:
+
+    +   Only `LEFT OUTER JOIN` and `RIGHT OUTER JOIN` are supported.
+
+    +   You must include a `SELECT DISTINCT` subquery on the side of the join
+        with the public group table.
+
+    +   The public group join must join a public table with a private table.
+
+    +   The public group join must join on each grouping item.
+
+    +   If a join doesn't fulfill the previous requirements, it's treated as a
+        normal join in the query and not counted as a public group join for
+        any of the joined columns.
++   For every grouping item, there must be a public group join;
+    otherwise, an error is returned.
++   To obtain accurate results in a differentially private query with
+    public groups, `NULL` privacy units should not be present in the query.
+
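The default `LAPLACE_THRESHOLD` strategy described above can be sketched as
follows. This is an illustrative Python sketch only; the real thresholding and
the calibration of the threshold and noise scale from epsilon and delta are
implementation details of the engine:

```python
import math
import random

def keep_group(distinct_privacy_units, threshold, noise_scale):
    """Sketch of LAPLACE_THRESHOLD group selection (illustrative only)."""
    # Sample Laplace(0, noise_scale) noise via inverse-CDF sampling.
    u = random.random() - 0.5
    noise = -noise_scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    # Keep the group only if the noised distinct-entity count clears the
    # threshold; threshold and scale would be derived from epsilon and delta.
    return distinct_privacy_units + noise >= threshold

# With a small noise scale, a well-populated group survives and an
# empty group is dropped.
print(keep_group(100, threshold=5, noise_scale=1e-9))  # True
print(keep_group(0, threshold=5, noise_scale=1e-9))    # False
```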
 ### Differential privacy examples
 
 This section contains examples that illustrate how to work with
@@ -4471,13 +4706,13 @@ The examples in this section reference these views:
 ```sql
 CREATE OR REPLACE VIEW {{USERNAME}}.view_on_professors
 OPTIONS(anonymization_userid_column='id')
-AS (SELECT * FROM professors);
+AS (SELECT * FROM {{USERNAME}}.professors);
 ```
 
 ```sql
 CREATE OR REPLACE VIEW {{USERNAME}}.view_on_students
 OPTIONS(anonymization_userid_column='id')
-AS (SELECT * FROM students);
+AS (SELECT * FROM {{USERNAME}}.students);
 ```
 
 These views reference the [professors][dp-example-tables] and
@@ -4494,10 +4729,10 @@ privacy protection.
 -- This gets the average number of items requested per professor and adds
 -- noise to the results
 SELECT
-  WITH ANONYMIZATION
+  WITH DIFFERENTIAL_PRIVACY
     OPTIONS(epsilon=10, delta=.01, max_groups_contributed=2)
     item,
-    ANON_AVG(quantity CLAMPED BETWEEN 0 AND 100) average_quantity
+    AVG(quantity, contribution_bounds_per_group => (0,100)) AS average_quantity
 FROM {{USERNAME}}.view_on_professors
 GROUP BY item;
 
@@ -4519,7 +4754,7 @@ SELECT
   WITH DIFFERENTIAL_PRIVACY
     OPTIONS(epsilon=10, delta=.01, max_groups_contributed=2, privacy_unit_column=id)
     item,
-    AVG(quantity, contribution_bounds_per_group => (0,100)) average_quantity
+    AVG(quantity, contribution_bounds_per_group => (0,100)) AS average_quantity
 FROM professors
 GROUP BY item;
 
@@ -4545,10 +4780,10 @@ from the results.
 -- This gets the average number of items requested per professor and removes
 -- noise from the results
 SELECT
-  WITH ANONYMIZATION
+  WITH DIFFERENTIAL_PRIVACY
     OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=2)
     item,
-    ANON_AVG(quantity CLAMPED BETWEEN 0 AND 100) average_quantity
+    AVG(quantity, contribution_bounds_per_group => (0,100)) AS average_quantity
 FROM {{USERNAME}}.view_on_professors
 GROUP BY item;
 
@@ -4568,7 +4803,7 @@ SELECT
   WITH DIFFERENTIAL_PRIVACY
     OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=2, privacy_unit_column=id)
     item,
-    AVG(quantity, contribution_bounds_per_group => (0,100)) average_quantity
+    AVG(quantity, contribution_bounds_per_group => (0,100)) AS average_quantity
 FROM professors
 GROUP BY item;
 
@@ -4582,19 +4817,19 @@ GROUP BY item;
 ```
 
 #### Limit the groups in which a privacy unit ID can exist 
-
+
 
 A privacy unit column can exist within multiple groups. For example, in the
 `professors` table, the privacy unit column `123` exists in the `pencil` and
-`pen` group. You can set `kappa` or `max_groups_contributed` to different values to limit how many
+`pen` group. You can set `max_groups_contributed` to different values to limit how many
 groups each privacy unit column will be included in.
 
 ```sql
 SELECT
-  WITH ANONYMIZATION
+  WITH DIFFERENTIAL_PRIVACY
     OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1)
     item,
-    ANON_AVG(quantity CLAMPED BETWEEN 0 AND 100) average_quantity
+    AVG(quantity, contribution_bounds_per_group => (0,100)) AS average_quantity
 FROM {{USERNAME}}.view_on_professors
 GROUP BY item;
 
@@ -4614,7 +4849,7 @@ SELECT
   WITH DIFFERENTIAL_PRIVACY
     OPTIONS(epsilon=1e20, delta=.01, privacy_unit_column=id)
     item,
-    AVG(quantity, contribution_bounds_per_group => (0,100)) average_quantity
+    AVG(quantity, contribution_bounds_per_group => (0,100)) AS average_quantity
 FROM professors
 GROUP BY item;
 
@@ -4629,6 +4864,58 @@ GROUP BY item;
  *----------+------------------*/
 ```
 
+#### Use public groups in a differentially private query 
+
+
+In the following example, `UNNEST(["pen", "pencil", "book"])` is not
+anonymized because it's public knowledge and doesn't reveal any information
+about user data. In the results, `scissors` is excluded because it's not in the
+public table that is generated by the `UNNEST` operation.
+
+```sql
+-- Create the professors table (table to protect)
+CREATE OR REPLACE TABLE {{USERNAME}}.professors AS (
+  SELECT 101 AS id, "pencil" AS item, 24 AS quantity UNION ALL
+  SELECT 123, "pen", 16 UNION ALL
+  SELECT 123, "pencil", 10 UNION ALL
+  SELECT 123, "pencil", 38 UNION ALL
+  SELECT 101, "pen", 19 UNION ALL
+  SELECT 101, "pen", 23 UNION ALL
+  SELECT 130, "scissors", 8 UNION ALL
+  SELECT 150, "pencil", 72);
+
+-- Create the professors view
+CREATE OR REPLACE VIEW {{USERNAME}}.view_on_professors
+OPTIONS(anonymization_userid_column='id')
+AS (SELECT * FROM {{USERNAME}}.professors);
+
+-- Run the DIFFERENTIAL_PRIVACY query
+SELECT WITH DIFFERENTIAL_PRIVACY
+  OPTIONS (
+    epsilon = 1e20,
+    delta = .01,
+    max_groups_contributed = 1,
+    group_selection_strategy = PUBLIC_GROUPS
+  )
+  item,
+  AVG(quantity) AS average_quantity
+FROM {{USERNAME}}.view_on_professors
+RIGHT OUTER JOIN
+(SELECT DISTINCT item FROM UNNEST(['pen', 'pencil', 'book']))
+USING (item)
+GROUP BY item;
+
+-- The privacy unit ID 123 was only included in the pen group in this example.
+-- Noise was removed from this query for demonstration purposes only.
+/*----------+------------------*
+ | item     | average_quantity |
+ +----------+------------------+
+ | pencil   | 40               |
+ | pen      | 18.5             |
+ | book     | 0                |
+ *----------+------------------*/
+```
+
 ## Using aliases 
 
 
@@ -5469,12 +5756,14 @@ Results:
 
 [dp-epsilon]: #dp_epsilon
 
-[dp-kappa]: #dp_kappa
+[dp-max-groups]: #dp_max_groups
 
 [dp-delta]: #dp_delta
 
 [dp-privacy-unit-id]: #dp_privacy_unit_id
 
+[dp-group-selection-strategy]: #dp_group_selection_strategy
+
 [dp-define-privacy-unit-id]: https://github.com/google/zetasql/blob/master/docs/differential-privacy.md#dp_define_privacy_unit_id
 
 [dp-functions]: https://github.com/google/zetasql/blob/master/docs/aggregate-dp-functions.md
diff --git a/docs/resolved_ast.md b/docs/resolved_ast.md
index cd2b33e12..d01721b12 100755
--- a/docs/resolved_ast.md
+++ b/docs/resolved_ast.md
@@ -74,7 +74,10 @@ See that file for comments on specific nodes and fields.
     ResolvedColumnDefaultValue
     ResolvedColumnDefinition
     ResolvedColumnHolder
-    ResolvedComputedColumn
+    ResolvedComputedColumnBase
+      ResolvedComputedColumnImpl
+        ResolvedComputedColumn
+        ResolvedDeferredComputedColumn
     ResolvedConnection
     ResolvedConstraint
       ResolvedCheckConstraint
@@ -109,6 +112,7 @@ See that file for comments on specific nodes and fields.
     ResolvedOutputColumn
     ResolvedPivotColumn
     ResolvedPrivilege
+    ResolvedRecursionDepthModifier
     ResolvedReplaceFieldItem
     ResolvedReturningClause
     ResolvedSequence
@@ -186,6 +190,7 @@ See that file for comments on specific nodes and fields.
       ResolvedAlterApproxViewStmt
       ResolvedAlterDatabaseStmt
       ResolvedAlterEntityStmt
+      ResolvedAlterExternalSchemaStmt
       ResolvedAlterMaterializedViewStmt
       ResolvedAlterModelStmt
       ResolvedAlterPrivilegeRestrictionStmt
@@ -212,7 +217,9 @@ See that file for comments on specific nodes and fields.
       ResolvedCreateModelStmt
       ResolvedCreatePrivilegeRestrictionStmt
       ResolvedCreateProcedureStmt
-      ResolvedCreateSchemaStmt
+      ResolvedCreateSchemaStmtBase
+        ResolvedCreateExternalSchemaStmt
+        ResolvedCreateSchemaStmt
       ResolvedCreateSnapshotTableStmt
       ResolvedCreateTableFunctionStmt
       ResolvedCreateTableStmtBase
@@ -1440,9 +1447,8 @@ class ResolvedModel : public ResolvedArgument {
 
 
 


-// Represents a connection object as a TVF argument.
-// <connection> is the connection object encapsulated metadata to connect to
-// an external data source.
+// Represents a connection object, which encapsulates engine-specific
+// metadata used to connect to an external data source.
 class ResolvedConnection : public ResolvedArgument {
   static const ResolvedNodeKind TYPE = RESOLVED_CONNECTION;
 
@@ -1561,6 +1567,13 @@ class ResolvedJoinScan : public ResolvedScan {
   const ResolvedScan* right_scan() const;
 
   const ResolvedExpr* join_expr() const;
+
+  // This indicates this join was generated from syntax with USING.
+  // The sql_builder will use this field only as a suggestion.
+  // JOIN USING(...) syntax will be used if and only if
+  // `has_using` is True and `join_expr` has the correct shape.
+  // Otherwise the sql_builder will generate JOIN ON.
+  bool has_using() const;
 };
 

@@ -1816,9 +1829,9 @@ class ResolvedAggregateScanBase : public ResolvedScanResolvedCollation collation_list(int i) const; - const std::vector<std::unique_ptr<const ResolvedComputedColumn>>& aggregate_list() const; + const std::vector<std::unique_ptr<const ResolvedComputedColumnBase>>& aggregate_list() const; int aggregate_list_size() const; - const ResolvedComputedColumn* aggregate_list(int i) const; + const ResolvedComputedColumnBase* aggregate_list(int i) const; const std::vector<std::unique_ptr<const ResolvedGroupingSetBase>>& grouping_set_list() const; int grouping_set_list_size() const; @@ -2104,8 +2117,11 @@ class ResolvedWithRefScan : public ResolvedScan { // window ORDER BY. // // The output <column_list> contains all columns from <input_scan>, -// one column per analytic function. It may also conain partitioning/ordering -// expression columns if they reference to select columns. +// one column per analytic function. It may also contain partitioning/ordering +// expression columns if they reference to select columns. +// +// Currently, the analyzer combines equivalent OVER clauses into the same +// ResolvedAnalyticFunctionGroup only for OVER () or a named window. class ResolvedAnalyticScan : public ResolvedScan { static const ResolvedNodeKind TYPE = RESOLVED_ANALYTIC_SCAN; @@ -2170,16 +2186,66 @@ class ResolvedSampleScan : public ResolvedScan { };

-### ResolvedComputedColumn - +### ResolvedComputedColumnBase +


 // This is used when an expression is computed and given a name (a new
 // ResolvedColumn) that can be referenced elsewhere.  The new ResolvedColumn
 // can appear in a column_list or in ResolvedColumnRefs in other expressions,
 // when appropriate.  This node is not an expression itself - it is a
-// container that holds an expression.
-class ResolvedComputedColumn : public ResolvedArgument {
+// container that holds an expression.
+//
+// There are 2 concrete subclasses: ResolvedComputedColumn and
+// ResolvedDeferredComputedColumn.
+//
+// ResolvedDeferredComputedColumn has extra information about deferring
+// side effects like errors.  This can be used in cases like AggregateScans
+// before conditional expressions like IF(), where errors from the aggregate
+// function should only be exposed if the right IF branch is chosen.
+//
+// Nodes where deferred side effects are not possible (like GROUP BY
+// expressions) are declared as ResolvedComputedColumn directly.
+//
+// Nodes that might need to defer errors, such as AggregateScan's
+// aggregate_list(), are declared as ResolvedComputedColumnBase.
+// The runtime type will be either ResolvedComputedColumn or
+// ResolvedDeferredComputedColumn, depending on whether any side effects need
+// to be captured.
+//
+// If FEATURE_V_1_4_ENFORCE_CONDITIONAL_EVALUATION is not set, the runtime
+// type is always just ResolvedComputedColumn.
+//
+// See (broken link) for more details.
+class ResolvedComputedColumnBase : public ResolvedArgument {
+  // Virtual getter to avoid changing ResolvedComputedColumnProto
+  virtual const ResolvedColumn& column() const = 0;
+
+  // Virtual getter to avoid changing ResolvedComputedColumnProto
+  virtual const ResolvedExpr* expr() const = 0;
+
+};
+

+ +### ResolvedComputedColumnImpl + + +


+// An intermediate abstract superclass that holds common getters for
+// ResolvedComputedColumn and ResolvedDeferredComputedColumn. This class
+// exists to ensure that callers static_cast to the appropriate subclass,
+// rather than processing ResolvedComputedColumnBase directly.
+class ResolvedComputedColumnImpl : public ResolvedComputedColumnBase {
+};
+

+ +### ResolvedComputedColumn + + +


+// This is the usual ResolvedComputedColumn without deferred side effects.
+// See comments on ResolvedComputedColumnBase.
+class ResolvedComputedColumn : public ResolvedComputedColumnImpl {
   static const ResolvedNodeKind TYPE = RESOLVED_COMPUTED_COLUMN;
 
   const ResolvedColumn& column() const;
@@ -2188,6 +2254,46 @@ class ResolvedComputedColumn : public ResolvedArgume
 };
 

+### ResolvedDeferredComputedColumn + + +


+// This is a ResolvedComputedColumn variant that adds deferred side effect
+// capture.
+//
+// This is used for computations that get separated into multiple scans,
+// where side effects like errors in earlier scans need to be deferred
+// until conditional expressions in later scans are evaluated.
+// See (broken link) for details.
+// For example:
+//   SELECT IF(C, SUM(A/B), -1) FROM T
+// The division A/B could produce an error when B is 0, but errors should not
+// be exposed when C is false, due to IF's conditional evaluation semantics.
+//
+// `side_effect_column` is a new column (of type BYTES) created at the same
+// time as `column`, storing side effects like errors from the computation.
+// This column will store an implementation-specific representation of the
+// side effect (e.g. util::StatusProto) and will get a NULL value if there
+// are no captured side effects.
+//
+// Typically, this column will be passed to a call to the internal function
+// $with_side_effect() later to expose the side effects. The validator checks
+// that it is consumed downstream.
+class ResolvedDeferredComputedColumn : public ResolvedComputedColumnImpl {
+  static const ResolvedNodeKind TYPE = RESOLVED_DEFERRED_COMPUTED_COLUMN;
+
+  const ResolvedColumn& column() const;
+
+  const ResolvedExpr* expr() const;
+
+  // The companion side-effect column for this
+  // computation, of type BYTES. Instead of immediately exposing the
+  // side effect (e.g., an error), the side effect is captured in the
+  // side_effect_column.
+  const ResolvedColumn& side_effect_column() const;
+};
+

+ ### ResolvedOrderByItem @@ -3051,6 +3157,27 @@ class ResolvedCreateIndexStmt : public Resolv };

+### ResolvedCreateSchemaStmtBase +
+ +


+// A base for statements that create schemas, such as:
+//   CREATE [OR REPLACE] SCHEMA [IF NOT EXISTS] <name>
+//   [DEFAULT COLLATE <collation>]
+//   [OPTIONS (name=value, ...)]
+//
+//   CREATE [OR REPLACE] [TEMP|TEMPORARY|PUBLIC|PRIVATE] EXTERNAL SCHEMA
+//   [IF NOT EXISTS] <name> WITH CONNECTION <connection>
+//   OPTIONS (name=value, ...)
+//
+// <option_list> contains engine-specific options associated with the schema.
+class ResolvedCreateSchemaStmtBase : public ResolvedCreateStatement {
+  const std::vector<std::unique_ptr<const ResolvedOption>>& option_list() const;
+  int option_list_size() const;
+  const ResolvedOption* option_list(int i) const;
+};
+

+ ### ResolvedCreateSchemaStmt @@ -3059,8 +3186,6 @@ class ResolvedCreateIndexStmt : public Resolv // CREATE [OR REPLACE] SCHEMA [IF NOT EXISTS] <name> // [DEFAULT COLLATE <collation>] // [OPTIONS (name=value, ...)] -// -// <option_list> engine-specific options. // <collation_name> specifies the default collation specification for future // tables created in the dataset. If a table is created in this dataset // without specifying table-level default collation, it inherits the @@ -3071,14 +3196,31 @@ class ResolvedCreateIndexStmt : public Resolv // Note: If a table being created in this schema does not specify table // default collation, the engine should copy the dataset default collation // to the table as the table default collation. -class ResolvedCreateSchemaStmt : public ResolvedCreateStatement { +class ResolvedCreateSchemaStmt : public ResolvedCreateSchemaStmtBase { static const ResolvedNodeKind TYPE = RESOLVED_CREATE_SCHEMA_STMT; const ResolvedExpr* collation_name() const; +}; +

- const std::vector<std::unique_ptr<const ResolvedOption>>& option_list() const; - int option_list_size() const; - const ResolvedOption* option_list(int i) const; +### ResolvedCreateExternalSchemaStmt + + +


+// This statement:
+// CREATE [OR REPLACE] [TEMP|TEMPORARY|PUBLIC|PRIVATE] EXTERNAL SCHEMA
+// [IF NOT EXISTS] <name> WITH CONNECTION <connection>
+// OPTIONS (name=value, ...)
+//
+// <connection> encapsulates engine-specific metadata used to connect
+// to an external data source.
+//
+// Note: external schemas are pointers to schemas defined in an external
+// system. CREATE EXTERNAL SCHEMA does not actually build a new schema.
+class ResolvedCreateExternalSchemaStmt : public ResolvedCreateSchemaStmtBase {
+  static const ResolvedNodeKind TYPE = RESOLVED_CREATE_EXTERNAL_SCHEMA_STMT;
+
+  const ResolvedConnection* connection() const;
 };
 

@@ -3300,13 +3442,13 @@ class ResolvedCreateModelAliasedQuery : public Resol // * Trained: <has_query> // * External: !<has_query> // * Remote models <is_remote> = TRUE -// * Trained: <has_query> [Not supported yet] +// * Trained: <has_query> // * External: !<has_query> // // <option_list> has engine-specific directives for how to train this model. -// <query> is the AS SELECT statement. It can be only set when <is_remote> is -// false and all of <input_column_definition_list>, -// <output_column_definition_list> and <aliased_query_list> are empty. +// <query> is the AS SELECT statement. It can be only set when all of +// <input_column_definition_list>, <output_column_definition_list> and +// <aliased_query_list> are empty. // TODO: consider rename to <query_output_column_list>. // <output_column_list> matches 1:1 with the <query>'s column_list and // identifies the names and types of the columns output from the select @@ -3319,8 +3461,7 @@ class ResolvedCreateModelAliasedQuery : public Resol // columns. Cannot be set if <has_query> is true. Might be absent when // <is_remote> is true, meaning schema is read from the remote model // itself. -// <is_remote> is true if this is a remote model. Cannot be set when -// <has_query> is true. +// <is_remote> is true if this is a remote model. // <connection> is the identifier path of the connection object. It can be // only set when <is_remote> is true. // <transform_list> is the list of ResolvedComputedColumn in TRANSFORM @@ -3953,6 +4094,42 @@ class ResolvedRecursiveRefScan : public ResolvedScan };

+### ResolvedRecursionDepthModifier + + +


+// This represents a recursion depth modifier to a recursive CTE:
+//     WITH DEPTH [ AS <recursion_depth_column> ]
+//                [ BETWEEN <lower_bound> AND <upper_bound> ]
+//
+// <lower_bound> and <upper_bound> represent the range of iterations (both
+// sides included) whose results are part of the CTE's final output.
+//
+// lower_bound and upper_bound are two integer literals or
+// query parameters. Query parameter values must be checked at run-time by
+// ZetaSQL-compliant backend systems.
+// - both lower/upper_bound must be non-negative;
+// - lower_bound is by default zero if unspecified;
+// - upper_bound is by default infinity if unspecified;
+// - lower_bound must be less than or equal to upper_bound;
+//
+// <recursion_depth_column> is the column that represents the
+// recursion depth semantics: the iteration number that outputs this row;
+// it is part of ResolvedRecursiveScan's column list when specified, but
+// there is no corresponding column in the inputs of Recursive CTE.
+//
+// See (broken link):explicit-recursion-depth for details.
+class ResolvedRecursionDepthModifier : public ResolvedArgument {
+  static const ResolvedNodeKind TYPE = RESOLVED_RECURSION_DEPTH_MODIFIER;
+
+  const ResolvedExpr* lower_bound() const;
+
+  const ResolvedExpr* upper_bound() const;
+
+  const ResolvedColumnHolder* recursion_depth_column() const;
+};
+

+ ### ResolvedRecursiveScan @@ -3974,19 +4151,29 @@ class ResolvedRecursiveRefScan : public ResolvedScan // // At runtime, a recursive scan is evaluated using an iterative process: // -// Step 1: Evaluate the non-recursive term. If UNION DISTINCT +// Step 1 (iteration 0): Evaluate the non-recursive term. If UNION DISTINCT // is specified, discard duplicates. // -// Step 2: +// Step 2 (iteration k): // Repeat until step 2 produces an empty result: // Evaluate the recursive term, binding the recursive table to the -// new rows produced by previous step. If UNION DISTINCT is specified, -// discard duplicate rows, as well as any rows which match any -// previously-produced result. +// new rows produced by previous step (iteration k-1). +// If UNION DISTINCT is specified, discard duplicate rows, as well as any +// rows which match any previously-produced result. // // Step 3: // The final content of the recursive table is the UNION ALL of all results -// produced (step 1, plus all iterations of step 2). +// produced [lower_bound, upper_bound] iterations specified in the +// recursion depth modifier. (which are already DISTINCT because of step 2, +// if the query had UNION DISTINCT). The final content is augmented by the +// column specified in the recursion depth modifier (if specified) which +// represents the iteration number that the row is output. +// If UNION DISTINCT is specified, the depth column represents the first +// iteration that produces a given row. +// The depth column will be part of the output column list. +// +// When recursion_depth_modifier is unspecified, the lower bound is +// effectively zero, the upper bound is infinite. 
// // ResolvedRecursiveScan only supports a recursive WITH entry which // directly references itself; ZetaSQL does not support mutual recursion @@ -4005,6 +4192,8 @@ class ResolvedRecursiveScan : public ResolvedScan { const ResolvedSetOperationItem* non_recursive_term() const; const ResolvedSetOperationItem* recursive_term() const; + + const ResolvedRecursionDepthModifier* recursion_depth_modifier() const; };
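The iterative evaluation described in the comments above can be sketched over
plain Python lists. This is a simplified illustration, not engine code;
`recursive_term` stands in for re-evaluating the recursive query against the
rows produced by the previous iteration:

```python
def evaluate_recursive_scan(non_recursive_rows, recursive_term,
                            distinct=False, lower_bound=0, upper_bound=None):
    """Sketch of recursive scan evaluation with a depth modifier.

    Returns (depth, row) pairs for iterations in [lower_bound, upper_bound].
    """
    seen = set()
    # Step 1 (iteration 0): evaluate the non-recursive term; UNION DISTINCT
    # discards duplicates.
    frontier = list(dict.fromkeys(non_recursive_rows) if distinct
                    else non_recursive_rows)
    depth, results = 0, []
    while frontier:
        if distinct:
            seen.update(frontier)
        if lower_bound <= depth and (upper_bound is None or depth <= upper_bound):
            results += [(depth, row) for row in frontier]
        if upper_bound is not None and depth >= upper_bound:
            break
        # Step 2 (iteration k): bind the recursive table to the rows produced
        # by iteration k-1; UNION DISTINCT also drops previously seen rows.
        depth += 1
        new_rows = recursive_term(frontier)
        if distinct:
            new_rows = [r for r in dict.fromkeys(new_rows) if r not in seen]
        frontier = new_rows
    return results

# Count upward from 1 while rows stay below 4; each row is tagged with the
# iteration that first produced it.
steps = evaluate_recursive_scan(
    [1], lambda rows: [r + 1 for r in rows if r < 4], distinct=True)
print(steps)  # [(0, 1), (1, 2), (2, 3), (3, 4)]
```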

@@ -5181,6 +5370,18 @@ class ResolvedAlterSchemaStmt : public Resolv };

+### ResolvedAlterExternalSchemaStmt +
+ +


+// This statement:
+// ALTER EXTERNAL SCHEMA [IF EXISTS] <name_path> <alter_action_list>;
+class ResolvedAlterExternalSchemaStmt : public ResolvedAlterObjectStmt {
+  static const ResolvedNodeKind TYPE = RESOLVED_ALTER_EXTERNAL_SCHEMA_STMT;
+
+};
+

+ ### ResolvedAlterModelStmt @@ -7341,7 +7542,8 @@ class ResolvedAuxLoadDataStmt : public ResolvedStat


 // This statement:
 //   UNDROP <schema_object_kind> [IF NOT EXISTS] <name_path>
-//   FOR SYSTEM_TIME AS OF [<for_system_time_expr>];
+//   FOR SYSTEM_TIME AS OF [<for_system_time_expr>]
+//   [OPTIONS (name=value, ...)];
 //
 // <schema_object_kind> is a string identifier for the entity to be
 // undropped. Currently, only 'SCHEMA' object is supported.
@@ -7353,7 +7555,9 @@ class ResolvedAuxLoadDataStmt : public ResolvedStat
 // exists.
 //
 // <for_system_time_expr> specifies point in time from which entity is to
-// be undropped.
+// be undropped.
+//
+// <option_list> contains engine-specific options associated with the schema.
 class ResolvedUndropStmt : public ResolvedStatement {
   static const ResolvedNodeKind TYPE = RESOLVED_UNDROP_STMT;
 
@@ -7366,6 +7570,10 @@ class ResolvedUndropStmt : public ResolvedStatement
   std::string name_path(int i) const;
 
   const ResolvedExpr* for_system_time_expr() const;
+
+  const std::vector<std::unique_ptr<const ResolvedOption>>& option_list() const;
+  int option_list_size() const;
+  const ResolvedOption* option_list(int i) const;
 };
 

diff --git a/docs/string_functions.md b/docs/string_functions.md index 23b77b027..b0707a417 100644 --- a/docs/string_functions.md +++ b/docs/string_functions.md @@ -1007,18 +1007,6 @@ FROM Employees; ### `EDIT_DISTANCE` - - ```sql EDIT_DISTANCE(value1, value2, [max_distance => max_distance_value]) ``` @@ -1808,11 +1796,11 @@ The `STRING` is formatted as follows:
BYTES unquoted escaped bytes
- e.g. abc\x01\x02 + e.g., abc\x01\x02
quoted bytes literal
- e.g. b"abc\x01\x02" + e.g., b"abc\x01\x02"
- - + + @@ -296,7 +297,12 @@ ZetaSQL represents types in the following manner: - + @@ -375,14 +381,12 @@ ZetaSQL represents types in the following manner: @@ -392,12 +396,12 @@ ZetaSQL represents types in the following manner: UINT64 @@ -413,9 +417,8 @@ ZetaSQL represents types in the following manner: @@ -440,6 +443,26 @@ ZetaSQL represents types in the following manner:
SQL Data TypeJavaScript Data TypeZetaSQL
Data Type
JavaScript
Data Type
Notes
ARRAY Array + An array of arrays is not supported. To get around this + limitation, use + JavaScript Array<Object<Array>> and + ZetaSQL ARRAY<STRUCT<ARRAY>>. +
INT64 - - See notes - + N/A - - See the documentation for your database engine. - + INT64 is unsupported as an input type for JavaScript UDFs. Instead, + use DOUBLE to represent integer values as a + number, or STRING to represent integer values as a string.
- - See notes - + N/A - Same as INT64. + UINT64 is unsupported as an input type for JavaScript UDFs. Instead, + use DOUBLE to represent integer values as a + number, or STRING to represent integer values as a string.
STRUCT Object - - See the documentation for your database engine. - + Object where each STRUCT field is a named property in the Object. + Unnamed field in STRUCT is not supported.
+Some ZetaSQL types have a direct mapping to JavaScript types, but +others do not. + +For example, because JavaScript does not support a 64-bit integer type, +`INT64` is unsupported as an input type for JavaScript UDFs. Instead, +use `DOUBLE` to represent integer values as a number, +or `STRING` to represent integer values as a string. + +ZetaSQL does support `INT64` as a return type in JavaScript UDFs. +In this case, the JavaScript function body can return either a JavaScript +`Number` or a `String`. ZetaSQL then converts either of +these types to `INT64`. + +In addition, some ZetaSQL and JavaScript data types have different +rules. For example, in JavaScript, you can have an array of arrays +(`Array`), whereas in ZetaSQL, you can't. Before using +encodings, ensure they are compatible. To learn more about ZetaSQL +data types, see [ZetaSQL data types][data-types]. To learn more about +JavaScript data types, see [JavaScript data types][javascript-types]. + ### JavaScript UDF examples The following example illustrates a simple JavaScript UDF with a @@ -1071,6 +1094,8 @@ valid for that argument type. +[javascript-types]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects + [templated-parameters]: #templated_function_parameters [javascript-data-types]: #javascript_udf_data_types diff --git a/docs/variables.md b/docs/variables.md index ca654bc62..6ac7a1103 100644 --- a/docs/variables.md +++ b/docs/variables.md @@ -4,40 +4,121 @@ # Variables -ZetaSQL supports variables. +ZetaSQL supports three types of variables: -## Variable types ++ System variables: Defined by a client or an engine to expose configuration. ++ Query parameters: Defined by a user to bind typed values into a query. ++ Runtime variables: Defined by a user to track state in a script. -There are three primary variable types: runtime variables, query parameters, and -system variables. Each can be set and unset with the `SET` and `UNSET` commands. 
-Additionally, query parameters are prefixed with a single @ symbol and system -variables are prefixed with @@. +The implementation that you use determines which variable types are supported +and the way you set the variables. -The variable names must be valid sql [identifiers][link-to-sql-identifiers]. +## System variables -## `SET` +System variables are defined by a client or an engine to expose some state or +configuration. System variables are prefixed with a double `@@` symbol, and must +be one or more SQL [identifiers][sql-identifiers] separated by periods. -Sets a variable. +Because system variables are defined by each implementation and not by the +ZetaSQL language, see the documentation for your implementation to +determine the names, types, and behavior of available system variables. +**Syntax** + +```sql +SET @@system_variable = expression; ``` -SET runtime_variable = constant_value; -SET @query_parameter = constant_value; -SET @@system_variable = constant_value; + +**Examples** + +```sql +-- Set the system variable `@@system_var_a` to have the literal STRING value +-- `"TEST"`. +SET @@system_var_a = "TEST"; + +-- Set the system variable `@@Request.system_var_b` to have the value of the +-- expression `1+2+3`. +SET @@Request.system_var_b = 1+2+3; + +-- Reference the system variable `@@system_var_c` from a query. Whether system +-- variables can be read in this way depends on the implementation you use. +SELECT @@system_var_c; ``` -## `UNSET` +## Query parameters -Unsets a variable. +Query parameters are defined by a user as part of a query or request, and are +used to bind typed values into a query. Query parameters are prefixed with a +single `@` symbol. +**Syntax** + +```sql +SET @query_parameter = expression; ``` -UNSET runtime_variable; -UNSET @query_parameter; -UNSET @@system_variable; + +**Examples** + +```sql +-- Set the query parameter `@query_parameter_a` to have the value of the +-- expression `1`. 
+SET @query_parameter_a = 1; + +-- Set the query parameter `@query_parameter_b` to have the value of an array +-- result from a scalar subquery. +SET @query_parameter_b = (SELECT ARRAY_AGG(country) FROM countries_table); + +-- Reference the query parameters in a subsequent query. +SELECT * +FROM my_table +WHERE + total_count > @query_parameter_a + AND country IN UNNEST(@query_parameter_b) +``` + +## Runtime variables + +Runtime variables are defined and set by a user to track state in a +ZetaSQL [procedural language][procedural-language] script. You must +first declare a runtime variable using a [`DECLARE`][declare] statement before +you can set the variable. + +**Syntax** + +```sql +DECLARE runtime_variable [variable_type] [DEFAULT expression]; +SET runtime_variable = expression; +SET (variable1, variable2, ...) = struct_expression; +``` + +**Examples** + +```sql +-- Declare three runtime variables: `target_word`, `corpus_count`, and +-- `word_count`. +DECLARE target_word STRING DEFAULT 'methinks'; +DECLARE corpus_count, word_count INT64; + +-- Set the variables by assigning the results of a `SELECT AS STRUCT` query to +-- the two variables. +SET (corpus_count, word_count) = ( + SELECT AS STRUCT COUNT(DISTINCT corpus), SUM(word_count) + FROM shakespeare + WHERE LOWER(word) = target_word +); + +-- Reference the runtime variables in a subsequent query. 
+SELECT + FORMAT('Found %d occurrences of "%s" across %d Shakespeare works', + word_count, target_word, corpus_count) AS result; ``` -[link-to-sql-identifiers]: https://github.com/google/zetasql/blob/master/docs/lexical.md#identifiers +[sql-identifiers]: https://github.com/google/zetasql/blob/master/docs/lexical.md#identifiers + +[procedural-language]: https://github.com/google/zetasql/blob/master/docs/procedural-language.md + +[declare]: https://github.com/google/zetasql/blob/master/docs/procedural-language.md#declare diff --git a/docs/window-function-calls.md b/docs/window-function-calls.md index 3daa0bcfe..29588d258 100644 --- a/docs/window-function-calls.md +++ b/docs/window-function-calls.md @@ -62,14 +62,15 @@ following syntax to build a window function: **Notes** A window function can appear as a scalar expression operand in -two places in the query: - - + The `SELECT` list. If the window function appears in the `SELECT` list, - its argument list and `OVER` clause can't refer to aliases introduced - in the same SELECT list. - + The `ORDER BY` clause. If the window function appears in the `ORDER BY` - clause of the query, its argument list can refer to `SELECT` - list aliases. +the following places in the query: + ++ The `SELECT` list. If the window function appears in the `SELECT` list, + its argument list and `OVER` clause can't refer to aliases introduced + in the same `SELECT` list. ++ The `ORDER BY` clause. If the window function appears in the `ORDER BY` + clause of the query, its argument list can refer to `SELECT` + list aliases. ++ The `QUALIFY` clause. A window function can't refer to another window function in its argument list or its `OVER` clause, even indirectly through an alias. 
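The `QUALIFY` bullet added in the window-function-calls.md hunk above can be illustrated with a short example in the docs' own style. This is a sketch with hypothetical table and column names (`produce`, `item`, `category`, `purchases`), not taken from the surrounding document:

```sql
-- Filter rows on a window function result directly in QUALIFY,
-- without repeating the window function in the SELECT list.
SELECT item, purchases, category
FROM produce
WHERE category IN ('vegetable', 'fruit')
QUALIFY RANK() OVER (PARTITION BY category ORDER BY purchases DESC) <= 3;
```

`QUALIFY` filters rows after window functions are evaluated, analogous to the way `HAVING` filters after aggregation.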
diff --git a/java/com/google/zetasql/BUILD b/java/com/google/zetasql/BUILD index dec4bef1c..b0aa41c81 100644 --- a/java/com/google/zetasql/BUILD +++ b/java/com/google/zetasql/BUILD @@ -35,6 +35,7 @@ TYPES_SRCS = [ "ZetaSQLDescriptorPool.java", "ZetaSQLStrings.java", "IntervalValue.java", + "MapType.java", "ProtoType.java", "RangeType.java", "SimpleType.java", diff --git a/java/com/google/zetasql/Catalog.java b/java/com/google/zetasql/Catalog.java index 49cb06dae..2c13207f1 100644 --- a/java/com/google/zetasql/Catalog.java +++ b/java/com/google/zetasql/Catalog.java @@ -665,8 +665,7 @@ protected Type getType( } /** - * Get an object of Catalog from this Catalog, without - * looking at any nested Catalogs. + * Get an object of Catalog from this Catalog, without looking at any nested Catalogs. * *

A NULL pointer should be returned if the object doesn't exist. * @@ -674,7 +673,7 @@ protected Type getType( * * @return Catalog object if found, NULL if not found */ - protected final Catalog getCatalog(String name) { + public Catalog getCatalog(String name) { return getCatalog(name, new FindOptions()); } diff --git a/java/com/google/zetasql/DebugPrintableNode.java b/java/com/google/zetasql/DebugPrintableNode.java index 6f47bdc1a..3df44e1c0 100644 --- a/java/com/google/zetasql/DebugPrintableNode.java +++ b/java/com/google/zetasql/DebugPrintableNode.java @@ -98,7 +98,8 @@ default void debugStringImpl(String prefix1, String prefix2, StringBuilder sb) { sb.append("\n"); for (DebugStringField field : fields) { boolean printFieldName = !field.name.isEmpty(); - boolean printOneLine = field.nodes.isEmpty(); + boolean hasNewlines = field.value != null && field.value.contains("\n"); + boolean printOneLine = field.nodes.isEmpty() && !hasNewlines; if (printFieldName) { sb.append(prefix1).append("+-").append(field.name).append("="); @@ -111,6 +112,13 @@ default void debugStringImpl(String prefix1, String prefix2, StringBuilder sb) { } if (!printOneLine) { + if (hasNewlines) { + sb.append(prefix1).append("| \"\"\"\n"); + for (String line : field.value.split("\n", /*limit=*/-1)) { + sb.append(prefix1).append("| ").append(line).append("\n"); + } + sb.append(prefix1).append("| \"\"\"\n"); + } for (DebugPrintableNode node : field.nodes) { Preconditions.checkState(node != null); String fieldNameIndent = diff --git a/java/com/google/zetasql/MapType.java b/java/com/google/zetasql/MapType.java new file mode 100644 index 000000000..1354c7d40 --- /dev/null +++ b/java/com/google/zetasql/MapType.java @@ -0,0 +1,81 @@ +/* + * Copyright 2019 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package com.google.zetasql; + +import com.google.zetasql.ZetaSQLOptions.ProductMode; +import com.google.zetasql.ZetaSQLType.MapTypeProto; +import com.google.zetasql.ZetaSQLType.TypeKind; +import com.google.zetasql.ZetaSQLType.TypeProto; +import java.util.Objects; + +/** Represents a MAP type, where K is the key and V is the value. */ +public class MapType extends Type { + static boolean equalsImpl(MapType type1, MapType type2, boolean equivalent) { + return type1.keyType.equalsInternal(type2.keyType, equivalent) + && type1.valueType.equalsInternal(type2.valueType, equivalent); + } + + private final Type keyType; + private final Type valueType; + + /** Private constructor, instances must be created with {@link TypeFactory} */ + MapType(Type keyType, Type valueType) { + super(TypeKind.TYPE_MAP); + this.keyType = keyType; + this.valueType = valueType; + } + + public Type getKeyType() { + return keyType; + } + + public Type getValueType() { + return valueType; + } + + @Override + public void serialize( + TypeProto.Builder typeProtoBuilder, FileDescriptorSetsBuilder fileDescriptorSetsBuilder) { + typeProtoBuilder.setTypeKind(getKind()); + MapTypeProto.Builder map = typeProtoBuilder.getMapTypeBuilder(); + keyType.serialize(map.getKeyTypeBuilder(), fileDescriptorSetsBuilder); + valueType.serialize(map.getValueTypeBuilder(), fileDescriptorSetsBuilder); + } + + @Override + public int hashCode() { + return Objects.hash(keyType, valueType, getKind()); + } + + @Override + public String typeName(ProductMode productMode) { + return String.format( + 
"MAP<%s, %s>", keyType.typeName(productMode), valueType.typeName(productMode)); + } + + @Override + public String debugString(boolean details) { + return String.format( + "MAP<%s, %s>", keyType.debugString(details), valueType.debugString(details)); + } + + @Override + public MapType asMap() { + return this; + } +} diff --git a/java/com/google/zetasql/SimpleCatalog.java b/java/com/google/zetasql/SimpleCatalog.java index d17ad5f78..d8c096872 100644 --- a/java/com/google/zetasql/SimpleCatalog.java +++ b/java/com/google/zetasql/SimpleCatalog.java @@ -721,6 +721,11 @@ public SimpleCatalog getCatalog(String name, FindOptions options) { return catalogs.get(Ascii.toLowerCase(name)); } + @Override + public SimpleCatalog getCatalog(String name) { + return getCatalog(name, new FindOptions()); + } + @Override protected Function getFunction(String name, FindOptions options) { return customFunctions.get(Ascii.toLowerCase(name)); diff --git a/java/com/google/zetasql/Type.java b/java/com/google/zetasql/Type.java index 665a9e9b1..286654d65 100644 --- a/java/com/google/zetasql/Type.java +++ b/java/com/google/zetasql/Type.java @@ -78,6 +78,7 @@ public abstract class Type implements Serializable { "JSON", "INTERVAL", "RANGE", + "MAP", }; /** Returns {@code true} if the given {@code date} value is within valid range. */ @@ -204,6 +205,10 @@ public boolean isRange() { return kind == TypeKind.TYPE_RANGE; } + public boolean isMap() { + return kind == TypeKind.TYPE_MAP; + } + public boolean isStructOrProto() { return isStruct() || isProto(); } @@ -410,6 +415,11 @@ public RangeType asRange() { return null; } + /** Returns {@code this} cast to MapType or null for other types. 
*/ + public MapType asMap() { + return null; + } + @SuppressWarnings("ReferenceEquality") protected boolean equalsInternal(Type other, boolean equivalent) { if (other == this) { @@ -439,6 +449,8 @@ protected boolean equalsInternal(Type other, boolean equivalent) { return ProtoType.equalsImpl(this.asProto(), other.asProto(), equivalent); case TYPE_RANGE: return RangeType.equalsImpl(this.asRange(), other.asRange(), equivalent); + case TYPE_MAP: + return MapType.equalsImpl(this.asMap(), other.asMap(), equivalent); default: throw new IllegalArgumentException("Shouldn't happen: unsupported type " + other); } diff --git a/java/com/google/zetasql/TypeFactory.java b/java/com/google/zetasql/TypeFactory.java index 96ca1f52a..c888e7dc2 100644 --- a/java/com/google/zetasql/TypeFactory.java +++ b/java/com/google/zetasql/TypeFactory.java @@ -38,6 +38,7 @@ import com.google.zetasql.ZetaSQLOptions.ProductMode; import com.google.zetasql.ZetaSQLType.ArrayTypeProto; import com.google.zetasql.ZetaSQLType.EnumTypeProto; +import com.google.zetasql.ZetaSQLType.MapTypeProto; import com.google.zetasql.ZetaSQLType.ProtoTypeProto; import com.google.zetasql.ZetaSQLType.RangeTypeProto; import com.google.zetasql.ZetaSQLType.StructFieldProto; @@ -183,6 +184,11 @@ public static RangeType createRangeType(Type elementType) { return new RangeType(elementType); } + /** Returns a MapType that contains the given {@code keyType} and {@code valueType}. */ + public static MapType createMapType(Type keyType, Type valueType) { + return new MapType(keyType, valueType); + } + /** * Returns a ProtoType with a proto message descriptor that is loaded from FileDescriptorSet with * {@link DescriptorPool}. 
@@ -318,6 +324,8 @@ public final Type deserialize(TypeProto proto, List po case TYPE_RANGE: return deserializeRangeType(proto, pools); + case TYPE_MAP: + return deserializeMapType(proto, pools); default: throw new IllegalArgumentException( String.format("proto.type_kind: %s", proto.getTypeKind())); @@ -409,6 +417,12 @@ private RangeType deserializeRangeType(TypeProto proto, List pools) { + MapTypeProto mapType = proto.getMapType(); + return createMapType( + deserialize(mapType.getKeyType(), pools), deserialize(mapType.getValueType(), pools)); + } } private static Descriptor getDescriptor(Class type) { diff --git a/java/com/google/zetasql/parser/ASTNodes.java.template b/java/com/google/zetasql/parser/ASTNodes.java.template index 4a37ddf63..0e6f3d8d5 100644 --- a/java/com/google/zetasql/parser/ASTNodes.java.template +++ b/java/com/google/zetasql/parser/ASTNodes.java.template @@ -142,6 +142,13 @@ public class ASTNodes { new DebugStringField("{{field.name}}", {{field.name|lower_camel_case}})); } + # elif not field.serialize_default_value + {# Reusing serialization option to decide not to print default values. 
#} + if ({{field.name|lower_camel_case}}) { + debugStringfields.add( + new DebugStringField("{{field.name}}", + {{field.name|lower_camel_case}})); + } # else debugStringfields.add( new DebugStringField("{{field.name}}", diff --git a/java/com/google/zetasql/resolvedast/DebugStrings.java b/java/com/google/zetasql/resolvedast/DebugStrings.java index 14a6a2d26..40bbaf39a 100644 --- a/java/com/google/zetasql/resolvedast/DebugStrings.java +++ b/java/com/google/zetasql/resolvedast/DebugStrings.java @@ -48,6 +48,7 @@ import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedCast; import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedComputedColumn; import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedConstant; +import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedDeferredComputedColumn; import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedExtendedCastElement; import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedFunctionCallBase; import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedMakeProtoField; @@ -328,6 +329,17 @@ static String toStringCommaSeparated(ImmutableList resolvedCo * children. */ static void collectDebugStringFields(ResolvedComputedColumn node, List fields) { + fields.clear(); + node.getExpr().collectDebugStringFieldsWithNameFormat(fields); + } + + /** + * ResolvedDeferredComputedColumn gets formatted as "name := expr" with expr's children printed as + * its own children. 
+ */ + static void collectDebugStringFields( + ResolvedDeferredComputedColumn node, List fields) { + fields.clear(); node.getExpr().collectDebugStringFieldsWithNameFormat(fields); } @@ -457,7 +469,18 @@ static void collectDebugStringFields(ResolvedSystemVariable node, List(), options.build()); - assertThat(fn2.getSignatureList()).hasSize(0); + assertThat(fn2.getSignatureList()).isEmpty(); assertThat(fn2.debugString(true)).isEqualTo("ZetaSQLTest:test_function_2"); assertThat(fn2.getMode()).isEqualTo(Mode.AGGREGATE); diff --git a/javatests/com/google/zetasql/MapTypeTest.java b/javatests/com/google/zetasql/MapTypeTest.java new file mode 100644 index 000000000..3a4eba2b9 --- /dev/null +++ b/javatests/com/google/zetasql/MapTypeTest.java @@ -0,0 +1,261 @@ +/* + * Copyright 2019 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + */ + +package com.google.zetasql; + +import static com.google.common.truth.Truth.assertThat; +import static com.google.common.truth.Truth.assertWithMessage; +import static com.google.zetasql.TypeTestBase.checkSerializable; +import static com.google.zetasql.TypeTestBase.checkTypeSerializationAndDeserialization; + +import com.google.common.testing.EqualsTester; +import com.google.zetasql.ZetaSQLDescriptorPool.GeneratedDescriptorPool; +import com.google.zetasql.ZetaSQLOptions.ProductMode; +import com.google.zetasql.ZetaSQLType.MapTypeProto; +import com.google.zetasql.ZetaSQLType.TypeKind; +import com.google.zetasql.ZetaSQLType.TypeProto; +import java.util.ArrayList; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class MapTypeTest { + + @Test + public void testSerializationAndDeserialization() { + TypeFactory factory = TypeFactory.nonUniqueNames(); + checkTypeSerializationAndDeserialization( + TypeFactory.createMapType( + TypeFactory.createSimpleType(TypeKind.TYPE_INT32), + TypeFactory.createSimpleType(TypeKind.TYPE_BOOL))); + checkTypeSerializationAndDeserialization( + TypeFactory.createMapType( + TypeFactory.createSimpleType(TypeKind.TYPE_STRING), + TypeFactory.createSimpleType(TypeKind.TYPE_STRING))); + ProtoType protoType = factory.createProtoType(TypeProto.class); + checkTypeSerializationAndDeserialization( + TypeFactory.createMapType(protoType, TypeFactory.createSimpleType(TypeKind.TYPE_STRING))); + + Type arrayType = + TypeFactory.createArrayType(TypeFactory.createSimpleType(TypeKind.TYPE_STRING)); + Type mapType = + TypeFactory.createMapType(arrayType, TypeFactory.createSimpleType(TypeKind.TYPE_BOOL)); + checkTypeSerializationAndDeserialization(TypeFactory.createMapType(mapType, arrayType)); + } + + @Test + public void testSerializable() { + TypeFactory factory = TypeFactory.nonUniqueNames(); + checkSerializable( + TypeFactory.createMapType( + 
TypeFactory.createSimpleType(TypeKind.TYPE_INT32), + TypeFactory.createSimpleType(TypeKind.TYPE_BOOL))); + ProtoType protoType = factory.createProtoType(TypeProto.class); + assertThat( + GeneratedDescriptorPool.getGeneratedPool() + .findMessageTypeByName(TypeProto.getDescriptor().getFullName())) + .isNotNull(); + + checkSerializable( + TypeFactory.createMapType(protoType, TypeFactory.createSimpleType(TypeKind.TYPE_STRING))); + } + + @Test + public void testEquivalent() { + MapType map1 = + TypeFactory.createMapType( + TypeFactory.createSimpleType(TypeKind.TYPE_INT32), + TypeFactory.createSimpleType(TypeKind.TYPE_BOOL)); + MapType map2 = + TypeFactory.createMapType( + TypeFactory.createSimpleType(TypeKind.TYPE_INT32), + TypeFactory.createSimpleType(TypeKind.TYPE_BOOL)); + MapType map3 = + TypeFactory.createMapType( + TypeFactory.createSimpleType(TypeKind.TYPE_INT64), + TypeFactory.createSimpleType(TypeKind.TYPE_INT64)); + + assertThat(map1.equivalent(map1)).isTrue(); + assertThat(map1.equivalent(map2)).isTrue(); + assertThat(map1.equivalent(map3)).isFalse(); + assertThat(map2.equivalent(map1)).isTrue(); + assertThat(map2.equivalent(map2)).isTrue(); + assertThat(map2.equivalent(map3)).isFalse(); + assertThat(map3.equivalent(map1)).isFalse(); + assertThat(map3.equivalent(map2)).isFalse(); + assertThat(map3.equivalent(map3)).isTrue(); + + assertThat(map1.equivalent(map1.getKeyType())).isFalse(); + assertThat(map1.equivalent(map1.getValueType())).isFalse(); + } + + @Test + public void testEquals() { + MapType mapSimple1 = + TypeFactory.createMapType( + TypeFactory.createSimpleType(TypeKind.TYPE_INT32), + TypeFactory.createSimpleType(TypeKind.TYPE_BOOL)); + MapType mapSimple2 = + TypeFactory.createMapType( + TypeFactory.createSimpleType(TypeKind.TYPE_INT32), + TypeFactory.createSimpleType(TypeKind.TYPE_BOOL)); + + Type arrayType = + TypeFactory.createArrayType(TypeFactory.createSimpleType(TypeKind.TYPE_STRING)); + Type mapType = + TypeFactory.createMapType(arrayType, 
TypeFactory.createSimpleType(TypeKind.TYPE_BOOL)); + MapType mapMapArray1 = TypeFactory.createMapType(mapType, arrayType); + MapType mapMapArray2 = TypeFactory.createMapType(mapType, arrayType); + + new EqualsTester() + .addEqualityGroup(mapSimple1, mapSimple2) + .addEqualityGroup(mapSimple1.getKeyType()) + .addEqualityGroup(mapSimple1.getValueType()) + .addEqualityGroup(mapMapArray1, mapMapArray2) + .addEqualityGroup(mapMapArray1.getKeyType()) + .addEqualityGroup(mapMapArray1.getValueType()) + .testEquals(); + } + + @Test + public void testDebugString() { + TypeFactory factory = TypeFactory.nonUniqueNames(); + + MapType map1 = + TypeFactory.createMapType( + TypeFactory.createSimpleType(TypeKind.TYPE_DOUBLE), + TypeFactory.createSimpleType(TypeKind.TYPE_DOUBLE)); + MapType map2 = + TypeFactory.createMapType( + TypeFactory.createSimpleType(TypeKind.TYPE_INT64), + TypeFactory.createSimpleType(TypeKind.TYPE_INT64)); + MapType map3 = + TypeFactory.createMapType( + factory.createEnumType(TypeKind.class), + TypeFactory.createSimpleType(TypeKind.TYPE_INT64)); + MapType map4 = + TypeFactory.createMapType( + TypeFactory.createArrayType(factory.createProtoType(TypeProto.class)), + TypeFactory.createSimpleType(TypeKind.TYPE_INT64)); + + assertThat(map1.debugString(false)).isEqualTo("MAP"); + assertThat(map2.debugString(false)).isEqualTo("MAP"); + assertThat(map3.debugString(false)).isEqualTo("MAP, INT64>"); + assertThat(map4.debugString(false)).isEqualTo("MAP>, INT64>"); + + assertThat(map1.debugString(true)).isEqualTo("MAP"); + assertThat(map2.debugString(true)).isEqualTo("MAP"); + + String typeProtoPath = "zetasql/public/type.proto"; + assertThat(map3.debugString(true)) + .isEqualTo( + "MAP>, INT64>"); + assertThat(map4.debugString(true)).isEqualTo("MAP>>, INT64>"); + } + + @Test + public void testToString() { + TypeFactory factory = TypeFactory.nonUniqueNames(); + + MapType map1 = + TypeFactory.createMapType( + TypeFactory.createSimpleType(TypeKind.TYPE_DOUBLE), + 
TypeFactory.createSimpleType(TypeKind.TYPE_DOUBLE)); + MapType map2 = + TypeFactory.createMapType( + TypeFactory.createSimpleType(TypeKind.TYPE_INT64), + TypeFactory.createSimpleType(TypeKind.TYPE_INT64)); + MapType map3 = + TypeFactory.createMapType( + factory.createEnumType(TypeKind.class), + TypeFactory.createSimpleType(TypeKind.TYPE_INT64)); + MapType map4 = + TypeFactory.createMapType( + TypeFactory.createArrayType(factory.createProtoType(TypeProto.class)), + TypeFactory.createSimpleType(TypeKind.TYPE_INT64)); + + assertThat(map1.toString()).isEqualTo("MAP"); + assertThat(map2.toString()).isEqualTo("MAP"); + assertThat(map3.toString()).isEqualTo("MAP, INT64>"); + assertThat(map4.toString()).isEqualTo("MAP>, INT64>"); + } + + @Test + public void testAsMap() { + TypeFactory factory = TypeFactory.nonUniqueNames(); + + MapType map = + TypeFactory.createMapType( + TypeFactory.createSimpleType(TypeKind.TYPE_DOUBLE), + TypeFactory.createSimpleType(TypeKind.TYPE_DOUBLE)); + ArrayType array = + TypeFactory.createArrayType(TypeFactory.createSimpleType(TypeKind.TYPE_INT32)); + EnumType enumType = factory.createEnumType(TypeKind.class); + ProtoType proto = factory.createProtoType(TypeProto.class); + ArrayList fields = new ArrayList<>(); + fields.add(new StructType.StructField("", TypeFactory.createSimpleType(TypeKind.TYPE_STRING))); + fields.add(new StructType.StructField("a", TypeFactory.createSimpleType(TypeKind.TYPE_INT32))); + StructType struct = TypeFactory.createStructType(fields); + + assertThat(map.asMap()).isEqualTo(map); + assertThat(array.asMap()).isNull(); + assertThat(enumType.asMap()).isNull(); + assertThat(proto.asMap()).isNull(); + assertThat(struct.asMap()).isNull(); + assertThat(TypeFactory.createSimpleType(TypeKind.TYPE_INT32).asMap()).isNull(); + } + + @Test + public void testIsMap() { + TypeFactory factory = TypeFactory.nonUniqueNames(); + + MapType map = + TypeFactory.createMapType( + TypeFactory.createSimpleType(TypeKind.TYPE_DOUBLE), + 
TypeFactory.createSimpleType(TypeKind.TYPE_DOUBLE)); + ArrayType array = + TypeFactory.createArrayType(TypeFactory.createSimpleType(TypeKind.TYPE_INT32)); + EnumType enumType = factory.createEnumType(TypeKind.class); + ProtoType proto = factory.createProtoType(TypeProto.class); + ArrayList fields = new ArrayList<>(); + fields.add(new StructType.StructField("", TypeFactory.createSimpleType(TypeKind.TYPE_STRING))); + fields.add(new StructType.StructField("a", TypeFactory.createSimpleType(TypeKind.TYPE_INT32))); + StructType struct = TypeFactory.createStructType(fields); + + assertThat(map.isMap()).isTrue(); + assertThat(array.isMap()).isFalse(); + assertThat(enumType.isMap()).isFalse(); + assertThat(proto.isMap()).isFalse(); + assertThat(struct.isMap()).isFalse(); + assertThat(TypeFactory.createSimpleType(TypeKind.TYPE_INT32).isMap()).isFalse(); + } + + @Test + public void testClassAndProtoSize() { + assertWithMessage( + "The number of fields of MapTypeProto has changed, please also update the " + + "serialization code accordingly.") + .that(MapTypeProto.getDescriptor().getFields()) + .hasSize(2); + assertWithMessage( + "The number of fields in MapType class has changed, please also update the proto and " + + "serialization code accordingly.") + .that(TestUtil.getNonStaticFieldCount(MapType.class)) + .isEqualTo(2); + } +} diff --git a/javatests/com/google/zetasql/TypeTest.java b/javatests/com/google/zetasql/TypeTest.java index af53072ff..9189aeffe 100644 --- a/javatests/com/google/zetasql/TypeTest.java +++ b/javatests/com/google/zetasql/TypeTest.java @@ -154,7 +154,7 @@ public void classAndProtoSize() { "The number of fields of TypeProto has changed, " + "please also update the serialization code accordingly.") .that(TypeProto.getDescriptor().getFields()) - .hasSize(8); + .hasSize(9); assertWithMessage( "The number of fields in Type class has changed, " + "please also update the proto and serialization code accordingly.") diff --git 
a/javatests/com/google/zetasql/ValueTest.java b/javatests/com/google/zetasql/ValueTest.java index 35a5e5b3e..b4f7f7841 100644 --- a/javatests/com/google/zetasql/ValueTest.java +++ b/javatests/com/google/zetasql/ValueTest.java @@ -3008,7 +3008,7 @@ public void testClassAndProtoSize() { "The number of fields of ValueProto has changed, " + "please also update the serialization code accordingly.") .that(ValueProto.getDescriptor().getFields()) - .hasSize(24); + .hasSize(25); assertWithMessage( "The number of fields of ValueProto::Array has changed, " + "please also update the serialization code accordingly.") diff --git a/javatests/com/google/zetasql/parser/EndToEndTest.java b/javatests/com/google/zetasql/parser/EndToEndTest.java index 55c3cd2ad..0787df033 100644 --- a/javatests/com/google/zetasql/parser/EndToEndTest.java +++ b/javatests/com/google/zetasql/parser/EndToEndTest.java @@ -379,8 +379,12 @@ public void testQueryStatement2() { + " | | | | +-Identifier(parenthesized=false, id_string=t1)\n" + " | | | | +-Identifier(parenthesized=false, id_string=col2)\n" + " | | | +-rhs=\n" - + " | | | +-StringLiteral(parenthesized=false, image='foo'," + + " | | | +-StringLiteral\n" + + " | | | +-parenthesized=false\n" + + " | | | +-components=\n" + + " | | | | +-StringLiteralComponent(parenthesized=false, image='foo'," + " string_value=foo)\n" + + " | | | +-string_value=foo\n" + " | | +-BinaryExpression\n" + " | | | +-parenthesized=false\n" + " | | | +-op=EQ\n" diff --git a/javatests/com/google/zetasql/parser/protos/query_statement1.textproto b/javatests/com/google/zetasql/parser/protos/query_statement1.textproto index 20e9de04d..18aa358b9 100644 --- a/javatests/com/google/zetasql/parser/protos/query_statement1.textproto +++ b/javatests/com/google/zetasql/parser/protos/query_statement1.textproto @@ -50,19 +50,23 @@ ast_query_statement_node { } expression { ast_leaf_node { - ast_int_literal_node { - parent { + ast_printable_leaf_node { + ast_int_literal_node { parent { parent { 
- parse_location_range { - filename: "" - start: 7 - end: 10 + parent { + parent { + parse_location_range { + filename: "" + start: 7 + end: 10 + } + } + parenthesized: false } } - parenthesized: false + image: "123" } - image: "123" } } } diff --git a/zetasql/analyzer/BUILD b/zetasql/analyzer/BUILD index 5d20cbfb2..290eb542f 100644 --- a/zetasql/analyzer/BUILD +++ b/zetasql/analyzer/BUILD @@ -102,6 +102,7 @@ cc_library( "@com_google_absl//absl/hash", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", ], ) @@ -145,6 +146,7 @@ cc_library( "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", "@com_google_googletest//:gtest", ], ) @@ -237,6 +239,7 @@ cc_library( "//zetasql/public:type_cc_proto", "//zetasql/public:value", "//zetasql/public/annotation:collation", + "//zetasql/public/functions:array_zip_mode_cc_proto", "//zetasql/public/functions:convert_string", "//zetasql/public/functions:date_time_util", "//zetasql/public/functions:datetime_cc_proto", @@ -289,13 +292,13 @@ cc_library( ], deps = [ "//zetasql/base", + "//zetasql/base:check", "//zetasql/parser", "//zetasql/public:function", "//zetasql/public:value", "//zetasql/public/types", "//zetasql/resolved_ast", "//zetasql/resolved_ast:resolved_node_kind_cc_proto", - "@com_google_absl//absl/types:optional", "@com_google_absl//absl/types:span", ], ) @@ -311,17 +314,19 @@ cc_library( srcs = ["all_rewriters.cc"], hdrs = ["all_rewriters.h"], deps = [ - ":anonymization_rewriter", + "//zetasql/analyzer/rewriters:aggregation_threshold_rewriter", + "//zetasql/analyzer/rewriters:anonymization_rewriter", "//zetasql/analyzer/rewriters:array_functions_rewriter", "//zetasql/analyzer/rewriters:builtin_function_inliner", "//zetasql/analyzer/rewriters:flatten_rewriter", "//zetasql/analyzer/rewriters:grouping_set_rewriter", + 
"//zetasql/analyzer/rewriters:insert_dml_values_rewriter", "//zetasql/analyzer/rewriters:like_any_all_rewriter", "//zetasql/analyzer/rewriters:map_function_rewriter", + "//zetasql/analyzer/rewriters:multiway_unnest_rewriter", "//zetasql/analyzer/rewriters:nulliferror_function_rewriter", "//zetasql/analyzer/rewriters:pivot_rewriter", "//zetasql/analyzer/rewriters:registration", - "//zetasql/analyzer/rewriters:set_operation_corresponding_rewriter", "//zetasql/analyzer/rewriters:sql_function_inliner", "//zetasql/analyzer/rewriters:sql_view_inliner", "//zetasql/analyzer/rewriters:typeof_function_rewriter", @@ -383,60 +388,6 @@ cc_library( ], ) -cc_library( - name = "anonymization_rewriter", - srcs = ["anonymization_rewriter.cc"], - hdrs = ["anonymization_rewriter.h"], - deps = [ - ":expr_matching_helpers", - ":name_scope", - ":resolver", - "//zetasql/base:ret_check", - "//zetasql/base:source_location", - "//zetasql/base:status", - "//zetasql/common:errors", - "//zetasql/common:status_payload_utils", - "//zetasql/parser", - "//zetasql/proto:anon_output_with_report_cc_proto", - "//zetasql/proto:internal_error_location_cc_proto", - "//zetasql/public:analyzer_options", - "//zetasql/public:analyzer_output_properties", - "//zetasql/public:anon_function", - "//zetasql/public:anonymization_utils", - "//zetasql/public:builtin_function", - "//zetasql/public:builtin_function_cc_proto", - "//zetasql/public:catalog", - "//zetasql/public:function", - "//zetasql/public:function_cc_proto", - "//zetasql/public:id_string", - "//zetasql/public:language_options", - "//zetasql/public:options_cc_proto", - "//zetasql/public:parse_location", - "//zetasql/public:rewriter_interface", - "//zetasql/public:select_with_mode", - "//zetasql/public:strings", - "//zetasql/public:type", - "//zetasql/public:type_cc_proto", - "//zetasql/public:value", - "//zetasql/public/functions:differential_privacy_cc_proto", - "//zetasql/public/types", - "//zetasql/resolved_ast", - 
"//zetasql/resolved_ast:make_node_vector", - "//zetasql/resolved_ast:resolved_ast_enums_cc_proto", - "//zetasql/resolved_ast:resolved_node_kind_cc_proto", - "//zetasql/resolved_ast:rewrite_utils", - "@com_google_absl//absl/base:core_headers", - "@com_google_absl//absl/container:flat_hash_map", - "@com_google_absl//absl/container:flat_hash_set", - "@com_google_absl//absl/memory", - "@com_google_absl//absl/status", - "@com_google_absl//absl/status:statusor", - "@com_google_absl//absl/strings", - "@com_google_absl//absl/strings:str_format", - "@com_google_absl//absl/types:span", - ], -) - cc_library( name = "filter_fields_path_validator", srcs = ["filter_fields_path_validator.cc"], @@ -560,10 +511,17 @@ cc_test( "//zetasql/base/testing:zetasql_gtest_main", "//zetasql/parser:parse_tree", "//zetasql/parser:parse_tree_serializer", + "//zetasql/public:analyzer", + "//zetasql/public:analyzer_options", + "//zetasql/public:analyzer_output", "//zetasql/public:id_string", "//zetasql/public:language_options", "//zetasql/public:templated_sql_tvf", + "//zetasql/public/types", + "//zetasql/resolved_ast", + "//zetasql/testdata:sample_catalog", "@com_google_absl//absl/status", + "@com_google_absl//absl/strings", ], ) @@ -882,6 +840,7 @@ cc_test( "//zetasql/public/types", "//zetasql/resolved_ast", "@com_google_absl//absl/status", + "@com_google_absl//absl/strings", ], ) diff --git a/zetasql/analyzer/all_rewriters.cc b/zetasql/analyzer/all_rewriters.cc index 03dde8ef4..34d42f7db 100644 --- a/zetasql/analyzer/all_rewriters.cc +++ b/zetasql/analyzer/all_rewriters.cc @@ -18,14 +18,16 @@ #include -#include "zetasql/analyzer/anonymization_rewriter.h" +#include "zetasql/analyzer/rewriters/aggregation_threshold_rewriter.h" +#include "zetasql/analyzer/rewriters/anonymization_rewriter.h" #include "zetasql/analyzer/rewriters/array_functions_rewriter.h" #include "zetasql/analyzer/rewriters/builtin_function_inliner.h" #include "zetasql/analyzer/rewriters/flatten_rewriter.h" -#include 
"zetasql/analyzer/rewriters/set_operation_corresponding_rewriter.h" #include "zetasql/analyzer/rewriters/grouping_set_rewriter.h" +#include "zetasql/analyzer/rewriters/insert_dml_values_rewriter.h" #include "zetasql/analyzer/rewriters/like_any_all_rewriter.h" #include "zetasql/analyzer/rewriters/map_function_rewriter.h" +#include "zetasql/analyzer/rewriters/multiway_unnest_rewriter.h" #include "zetasql/analyzer/rewriters/nulliferror_function_rewriter.h" #include "zetasql/analyzer/rewriters/pivot_rewriter.h" #include "zetasql/analyzer/rewriters/registration.h" @@ -63,6 +65,8 @@ void RegisterBuiltinRewriters() { r.Register(ResolvedASTRewrite::REWRITE_FLATTEN, GetFlattenRewriter()); r.Register(ResolvedASTRewrite::REWRITE_ANONYMIZATION, GetAnonymizationRewriter()); + r.Register(ResolvedASTRewrite::REWRITE_AGGREGATION_THRESHOLD, + GetAggregationThresholdRewriter()); r.Register(ResolvedASTRewrite::REWRITE_PROTO_MAP_FNS, GetMapFunctionRewriter()); r.Register(ResolvedASTRewrite::REWRITE_ARRAY_FILTER_TRANSFORM, @@ -80,11 +84,12 @@ void RegisterBuiltinRewriters() { r.Register(ResolvedASTRewrite::REWRITE_LIKE_ANY_ALL, GetLikeAnyAllRewriter()); - r.Register(ResolvedASTRewrite::REWRITE_SET_OPERATION_CORRESPONDING, - GetSetOperationCorrespondingRewriter()); - r.Register(ResolvedASTRewrite::REWRITE_GROUPING_SET, GetGroupingSetRewriter()); + r.Register(ResolvedASTRewrite::REWRITE_INSERT_DML_VALUES, + GetInsertDmlValuesRewriter()); + r.Register(ResolvedASTRewrite::REWRITE_MULTIWAY_UNNEST, + GetMultiwayUnnestRewriter()); // This rewriter should typically be the last in the rewrite sequence // because it cleans up after several other rewriters add ResolvedWithExprs. 
diff --git a/zetasql/analyzer/analytic_function_resolver.cc b/zetasql/analyzer/analytic_function_resolver.cc index c88cad10b..38c15aa98 100644 --- a/zetasql/analyzer/analytic_function_resolver.cc +++ b/zetasql/analyzer/analytic_function_resolver.cc @@ -541,47 +541,14 @@ absl::Status AnalyticFunctionResolver::ResolveWindowExpression( const Type** expr_type_out) { RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); - // This is NULL if this analytic function call is in the SELECT list, which - // cannot reference a column in the SELECT list. - const SelectColumnState* select_column_state = nullptr; - - // Identify whether the expression is an alias reference. Alias references - // are only allowed in the ORDER BY. We know whether or not we are in the - // ORDER BY based on whether or not named window references are allowed. - if (named_window_not_allowed_here_name_ != nullptr && - ast_expr->node_kind() == AST_PATH_EXPRESSION) { - const IdString alias = - ast_expr->GetAs()->first_name()->GetAsIdString(); - const SelectColumnStateList* select_column_state_list = - expr_resolution_info->query_resolution_info->select_column_state_list(); - ZETASQL_RETURN_IF_ERROR( - select_column_state_list->FindAndValidateSelectColumnStateByAlias( - clause_name, ast_expr, alias, expr_resolution_info, - &select_column_state)); - } - - // The ResolvedExpr of the SELECT-list column that this window expression - // references. 
std::unique_ptr tmp_resolved_expr; - if (select_column_state == nullptr) { - ZETASQL_RETURN_IF_ERROR(resolver_->ResolveExpr(ast_expr, expr_resolution_info, - &tmp_resolved_expr)); - } + ZETASQL_RETURN_IF_ERROR(resolver_->ResolveExpr(ast_expr, expr_resolution_info, + &tmp_resolved_expr)); + + *expr_type_out = tmp_resolved_expr->type(); + *resolved_item_out = + std::make_unique(ast_expr, tmp_resolved_expr.release()); - if (select_column_state != nullptr) { - expr_resolution_info->has_aggregation = - select_column_state->has_aggregation; - expr_resolution_info->has_analytic = select_column_state->has_analytic; - *expr_type_out = select_column_state->GetType(); - *resolved_item_out = std::make_unique( - ast_expr, select_column_state->select_list_position, - select_column_state->GetType()); - } else { - ZETASQL_RET_CHECK(tmp_resolved_expr != nullptr); - *expr_type_out = tmp_resolved_expr->type(); - *resolved_item_out = - std::make_unique(ast_expr, tmp_resolved_expr.release()); - } return absl::OkStatus(); } @@ -1052,19 +1019,9 @@ absl::Status AnalyticFunctionResolver::ResolveWindowOrderByPostAggregation( ast_order_by->ordering_expressions(); ZETASQL_RET_CHECK_EQ(ast_ordering_exprs.size(), order_by_info->size()); for (int i = 0; i < order_by_info->size(); ++i) { - ResolvedOrderByItemEnums::NullOrderMode null_order = - ResolvedOrderByItemEnums::ORDER_UNSPECIFIED; - if (ast_ordering_exprs[i]->null_order()) { - if (!resolver_->language().LanguageFeatureEnabled( - FEATURE_V_1_3_NULLS_FIRST_LAST_IN_ORDER_BY)) { - return MakeSqlErrorAt(ast_ordering_exprs[i]->null_order()) - << "NULLS FIRST and NULLS LAST are not supported"; - } else { - null_order = ast_ordering_exprs[i]->null_order()->nulls_first() - ? 
ResolvedOrderByItemEnums::NULLS_FIRST - : ResolvedOrderByItemEnums::NULLS_LAST; - } - } + ZETASQL_ASSIGN_OR_RETURN( + ResolvedOrderByItemEnums::NullOrderMode null_order, + resolver_->ResolveNullOrderMode(ast_ordering_exprs[i]->null_order())); // Since a window ORDER BY may be shared by multiple analytic functions, // do not create a new column if we have created one for this ordering diff --git a/zetasql/analyzer/analytic_function_resolver.h b/zetasql/analyzer/analytic_function_resolver.h index 6072e3a54..e4296f99b 100644 --- a/zetasql/analyzer/analytic_function_resolver.h +++ b/zetasql/analyzer/analytic_function_resolver.h @@ -205,7 +205,7 @@ class AnalyticFunctionResolver { analytic_function_groups_; // Map from grouping windows to their related analytic function group info. - // An analytic function group is uniquely identified by a grouping window. + // An analytic function group is uniquely identified by a grouping window. // This map is used to find the group that an analytic function belongs to // according to the grouping window of the analytic function. std::map @@ -309,8 +309,7 @@ class AnalyticFunctionResolver { ExprResolutionInfo* expr_resolution_info, WindowExprInfoList** order_by_info_out); - // Resolves a window (partitioning/ordering expression) expression and - // identifies whether it is an alias reference to a SELECT-list column. + // Resolves a window (partitioning/ordering expression) expression. 
absl::Status ResolveWindowExpression( const char* clause_name, const ASTExpression* ast_expr, ExprResolutionInfo* expr_resolution_info, diff --git a/zetasql/analyzer/analyzer_impl.cc b/zetasql/analyzer/analyzer_impl.cc index 6ef354282..3a731bda7 100644 --- a/zetasql/analyzer/analyzer_impl.cc +++ b/zetasql/analyzer/analyzer_impl.cc @@ -162,8 +162,8 @@ absl::Status InternalAnalyzeExpressionFromParserAST( if (absl::GetFlag(FLAGS_zetasql_print_resolved_ast)) { std::cout << "Resolved AST from thread " << std::this_thread::get_id() - << ":" << std::endl - << resolved_expr->DebugString() << std::endl; + << ":" << '\n' + << resolved_expr->DebugString() << '\n'; } if (options.language().error_on_deprecated_syntax() && diff --git a/zetasql/analyzer/analyzer_test.cc b/zetasql/analyzer/analyzer_test.cc index a25bb9a62..34cd0d713 100644 --- a/zetasql/analyzer/analyzer_test.cc +++ b/zetasql/analyzer/analyzer_test.cc @@ -653,6 +653,65 @@ TEST_F(AnalyzerOptionsTest, ErrorMessageFormat) { " ^")); } +TEST_F(AnalyzerOptionsTest, ErrorMessageStability_ResolutionError) { + std::unique_ptr output; + + const std::string query = "select *\nfrom BadTable"; + const std::string expr = "1 +\n2 + BadCol +\n3"; + + EXPECT_EQ(ErrorMessageStability::ERROR_MESSAGE_STABILITY_UNSPECIFIED, + options_.error_message_stability()); + EXPECT_EQ(ErrorMessageStability::ERROR_MESSAGE_STABILITY_UNSPECIFIED, + options_.error_message_options().stability); + + EXPECT_THAT( + AnalyzeStatement(query, options_, catalog(), &type_factory_, &output), + HasInvalidArgumentError( + "Table not found: BadTable; Did you mean abTable? 
[at 2:6]")); + EXPECT_THAT( + AnalyzeExpression(expr, options_, catalog(), &type_factory_, &output), + HasInvalidArgumentError("Unrecognized name: BadCol [at 2:5]")); + + options_.set_error_message_stability(ERROR_MESSAGE_STABILITY_TEST_REDACTED); + EXPECT_THAT( + AnalyzeStatement(query, options_, catalog(), &type_factory_, &output), + HasInvalidArgumentError("SQL ERROR")); + + EXPECT_THAT( + AnalyzeExpression(expr, options_, catalog(), &type_factory_, &output), + HasInvalidArgumentError("SQL ERROR")); +} + +TEST_F(AnalyzerOptionsTest, ErrorMessageStability_SyntaxError) { + std::unique_ptr output; + + const std::string query = "select 1 1 1"; + const std::string expr = "1 + + + "; + + EXPECT_EQ(ErrorMessageStability::ERROR_MESSAGE_STABILITY_UNSPECIFIED, + options_.error_message_stability()); + EXPECT_EQ(ErrorMessageStability::ERROR_MESSAGE_STABILITY_UNSPECIFIED, + options_.error_message_options().stability); + + EXPECT_THAT( + AnalyzeStatement(query, options_, catalog(), &type_factory_, &output), + HasInvalidArgumentError( + R"(Syntax error: Expected end of input but got integer literal "1" [at 1:10])")); + EXPECT_THAT( + AnalyzeExpression(expr, options_, catalog(), &type_factory_, &output), + HasInvalidArgumentError( + "Syntax error: Unexpected end of expression [at 1:8]")); + + options_.set_error_message_stability(ERROR_MESSAGE_STABILITY_TEST_REDACTED); + EXPECT_THAT( + AnalyzeStatement(query, options_, catalog(), &type_factory_, &output), + HasInvalidArgumentError("SQL ERROR")); + + EXPECT_THAT( + AnalyzeExpression(expr, options_, catalog(), &type_factory_, &output), + HasInvalidArgumentError("SQL ERROR")); +} + TEST_F(AnalyzerOptionsTest, NestedCatalogTypesErrorMessageFormat) { std::unique_ptr output; @@ -1771,8 +1830,10 @@ TEST(SQLBuilderTest, Int32ParameterForLimit) { ZETASQL_ASSERT_OK(sql_builder.Process(*limit_offset_scan)); std::string formatted_sql; ZETASQL_ASSERT_OK(FormatSql(sql_builder.sql(), &formatted_sql)); - EXPECT_EQ("SELECT\n 1\nLIMIT CAST(2 AS 
INT32) OFFSET CAST(1 AS INT32);", - formatted_sql); + EXPECT_EQ( + "SELECT\n 1\nLIMIT CAST(CAST(2 AS INT32) AS INT64) OFFSET CAST(CAST(1 " + "AS INT32) AS INT64);", + formatted_sql); } // Adding specific unit test to input provided by Random Query Generator tree. @@ -1943,10 +2004,17 @@ TEST(SQLBuilderTest, WithScanWithArrayScan) { type_factory.get_bool(), ResolvedSubqueryExpr::ARRAY, /*parameter_list=*/{}, /*in_expr=*/nullptr, /*subquery=*/MakeResolvedSingleRowScan()); - auto array_scan = MakeResolvedArrayScan( - {scan_column}, std::move(table_scan), std::move(array_expr), array_column, - /*array_offset_column=*/nullptr, - /*join_expr=*/nullptr, /*is_outer=*/true); + std::vector> array_expr_list; + array_expr_list.push_back(std::move(array_expr)); + std::vector element_column_list; + element_column_list.push_back(array_column); + auto array_scan = MakeResolvedArrayScan({scan_column}, std::move(table_scan), + std::move(array_expr_list), + std::move(element_column_list), + /*array_offset_column=*/nullptr, + /*join_expr=*/nullptr, + /*is_outer=*/true, + /*array_zip_mode=*/nullptr); std::vector> with_entry_list; with_entry_list.emplace_back( MakeResolvedWithEntry(with_query_name, std::move(array_scan))); diff --git a/zetasql/analyzer/analyzer_test_options.cc b/zetasql/analyzer/analyzer_test_options.cc index e160fb2dc..40d15e310 100644 --- a/zetasql/analyzer/analyzer_test_options.cc +++ b/zetasql/analyzer/analyzer_test_options.cc @@ -101,6 +101,8 @@ const char* const kIdStringAllowUnicodeCharacters = "zetasql_idstring_allow_unicode_characters"; const char* const kDisallowDuplicateOptions = "disallow_duplicate_options"; const char* const kRewriteOptions = "rewrite_options"; +const char* const kShowReferencedPropertyGraphs = + "show_referenced_property_graphs"; void RegisterAnalyzerTestOptions( file_based_test_driver::TestCaseOptions* test_case_options) { @@ -165,6 +167,7 @@ void RegisterAnalyzerTestOptions( test_case_options->RegisterBool(kDisallowDuplicateOptions, 
false); test_case_options->RegisterString( kRewriteOptions, RewriteOptions::default_instance().DebugString()); + test_case_options->RegisterBool(kShowReferencedPropertyGraphs, false); } std::vector> GetQueryParameters( @@ -249,22 +252,6 @@ std::vector> GetQueryParameters( }; } -static AnalyzerOptions::ASTRewriteSet GetAllRewrites() { - AnalyzerOptions::ASTRewriteSet enabled_set; - const google::protobuf::EnumDescriptor* descriptor = - google::protobuf::GetEnumDescriptor(); - for (int i = 0; i < descriptor->value_count(); ++i) { - const google::protobuf::EnumValueDescriptor* value_descriptor = descriptor->value(i); - if (value_descriptor->number() == 0) { - // This is the "INVALID" entry. Skip this case. - continue; - } - enabled_set.insert( - static_cast(value_descriptor->number())); - } - return enabled_set; -} - absl::StatusOr GetEnabledRewrites( const file_based_test_driver::TestCaseOptions& test_case_options) { AnalyzerTestRewriteGroups rewrite_groups; diff --git a/zetasql/analyzer/analyzer_test_options.h b/zetasql/analyzer/analyzer_test_options.h index ebe50cd20..f4859490f 100644 --- a/zetasql/analyzer/analyzer_test_options.h +++ b/zetasql/analyzer/analyzer_test_options.h @@ -172,6 +172,9 @@ class AnalyzerTestCase; // - A text proto string for RewriteOptions proto message, the default // value is an empty RewriteOptions string. The parsed RewriteOptions // is used for zetasql resolved ast rewriters. +// kShowReferencedPropertyGraphs - if true (default is false) show the +// PropertyGraphs referenced in the original +// query before pruning or rewrite occur. 
extern const char* const kAllowInternalError; extern const char* const kAllowUndeclaredParameters; extern const char* const kDefaultAnonKappaValue; @@ -225,6 +228,7 @@ extern const char* const kReplaceTableNotFoundErrorWithTvfErrorIfApplicable; extern const char* const kIdStringAllowUnicodeCharacters; extern const char* const kDisallowDuplicateOptions; extern const char* const kRewriteOptions; +extern const char* const kShowReferencedPropertyGraphs; // set_flag // Causes a command line flag to be set to a particular value during the run @@ -238,18 +242,10 @@ extern const char* const kSetFlag; void RegisterAnalyzerTestOptions( file_based_test_driver::TestCaseOptions* test_case_options); -void SerializeAnalyzerTestOptions( - const file_based_test_driver::TestCaseOptions* options, - AnalyzerTestCase* proto); - // Return a set of known parameters used in the analyzer tests. std::vector> GetQueryParameters( TypeFactory* type_factory); -// Returns a collection of positional parameters used in the analyzer tests. -std::vector GetPositionalQueryParameters( - TypeFactory* type_factory); - // A map-like type where keys are a canonicalized version of the string that // appears in the kEnabledASTRewrites and ASTRewriteSet is the set of rewrites // implied by that string. We use a vector to preserve insertion order. A diff --git a/zetasql/analyzer/expr_matching_helpers.cc b/zetasql/analyzer/expr_matching_helpers.cc index f6b8621e2..2acb0de54 100644 --- a/zetasql/analyzer/expr_matching_helpers.cc +++ b/zetasql/analyzer/expr_matching_helpers.cc @@ -39,6 +39,7 @@ #include "absl/hash/hash.h" #include "absl/status/statusor.h" #include "absl/strings/str_cat.h" +#include "absl/types/span.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status_macros.h" @@ -329,8 +330,7 @@ static bool IsProtoOrStructFieldAccess(const ResolvedNode* node) { // Returns true if the `column_ref_list` contains an equal pointer to // `column_ref`.
static bool ContainsColumnReference( - const std::vector>& - column_ref_list, + absl::Span> column_ref_list, const ResolvedColumnRef* column_ref) { for (const auto& param : column_ref_list) { if (param.get() == column_ref) { diff --git a/zetasql/analyzer/expr_resolver_helper.cc b/zetasql/analyzer/expr_resolver_helper.cc index 554615175..ce8788010 100644 --- a/zetasql/analyzer/expr_resolver_helper.cc +++ b/zetasql/analyzer/expr_resolver_helper.cc @@ -212,6 +212,47 @@ absl::StatusOr IsConstantExpression(const ResolvedExpr* expr) { } } +// The requirements for IsConstantFunctionArg or IsNonAggregateFunctionArg +// ignore any number of wrapping cast operations. This helper removes the casts +// and returns the first interesting expression. +static const ResolvedExpr* RemoveWrappingCasts(const ResolvedExpr* expr) { + // TODO: b/323409001 - Either do not remove casts with format, or only remove + // in case the format expression satisfies the definition of constant that + // the caller is interested in understanding.
+ while (expr->Is()) { + expr = expr->GetAs()->expr(); + } + return expr; +} + +absl::StatusOr IsConstantFunctionArg(const ResolvedExpr* expr) { + switch (RemoveWrappingCasts(expr)->node_kind()) { + case RESOLVED_PARAMETER: + case RESOLVED_LITERAL: + case RESOLVED_CONSTANT: + return true; + default: + return false; + } +} + +absl::StatusOr IsNonAggregateFunctionArg(const ResolvedExpr* expr) { + // LINT.IfChange(non_aggregate_args_def) + const ResolvedExpr* uncast_expr = RemoveWrappingCasts(expr); + switch (uncast_expr->node_kind()) { + case RESOLVED_PARAMETER: + case RESOLVED_LITERAL: + case RESOLVED_CONSTANT: + return true; + case RESOLVED_ARGUMENT_REF: + return uncast_expr->GetAs()->argument_kind() == + ResolvedArgumentRef::NOT_AGGREGATE; + default: + return false; + } + // LINT.ThenChange(./rewriters/sql_function_inliner.cc:non_aggregate_args_def) +} + ExprResolutionInfo::ExprResolutionInfo( const NameScope* name_scope_in, const NameScope* aggregate_name_scope_in, const NameScope* analytic_name_scope_in, bool allows_aggregation_in, @@ -232,11 +273,13 @@ ExprResolutionInfo::ExprResolutionInfo( ExprResolutionInfo::ExprResolutionInfo( const NameScope* name_scope_in, QueryResolutionInfo* query_resolution_info_in, - const ASTExpression* top_level_ast_expr_in, IdString column_alias_in) + const ASTExpression* top_level_ast_expr_in, IdString column_alias_in, + const char* clause_name_in) : ExprResolutionInfo( name_scope_in, name_scope_in, name_scope_in, - true /* allows_aggregation */, true /* allows_analytic */, - false /* use_post_grouping_columns */, "" /* clause_name */, + /*allows_aggregation_in=*/(clause_name_in == nullptr), + true /* allows_analytic */, false /* use_post_grouping_columns */, + (clause_name_in == nullptr ? 
"" : clause_name_in), query_resolution_info_in, top_level_ast_expr_in, column_alias_in) {} ExprResolutionInfo::ExprResolutionInfo(const NameScope* name_scope_in, diff --git a/zetasql/analyzer/expr_resolver_helper.h b/zetasql/analyzer/expr_resolver_helper.h index a4965fca9..03e0d4e53 100644 --- a/zetasql/analyzer/expr_resolver_helper.h +++ b/zetasql/analyzer/expr_resolver_helper.h @@ -60,6 +60,23 @@ struct ExprResolutionInfo; // constant if all arguments are constant absl::StatusOr IsConstantExpression(const ResolvedExpr* expr); +// Return true if `expr` is an appropriate argument for a function argument +// marked 'must_be_constant'. +// +// The current definition uses these rules. +// - literals, parameters, and CONSTANT references are appropriate. +// - a cast with an input expression that satisfies one of the above +absl::StatusOr IsConstantFunctionArg(const ResolvedExpr* expr); + +// Return true if `expr` is an appropriate argument for an aggregate function +// argument that is labeled "NON AGGREGATE". +// +// The current definition uses these rules. +// - literals, parameters, and CONSTANT references are appropriate. +// - a reference to a non-aggregate arg from a containing function. +// - a cast with an input expression that satisfies one of the above +absl::StatusOr IsNonAggregateFunctionArg(const ResolvedExpr* expr); + // Helper for representing if we're allowed to flatten (ie: allowed to dot into // the fields of a proto/struct/json array), and if so, if we're already in a // ResolvedFlatten from having previously done so. @@ -155,8 +172,8 @@ struct ExprResolutionInfo { const ASTExpression* top_level_ast_expr_in = nullptr, IdString column_alias_in = IdString()); - // Construct an ExprResolutionInfo that allows both aggregation and - // analytic expressions. + // Construct an ExprResolutionInfo that allows analytic expressions. + // Aggregation is allowed unless <clause_name_in> is passed in. // Does not take ownership of <query_resolution_info_in>.
// Currently used for initially resolving select list columns, and // resolving LIMIT with an empty NameScope, so never resolves against @@ -164,7 +181,8 @@ struct ExprResolutionInfo { ExprResolutionInfo(const NameScope* name_scope_in, QueryResolutionInfo* query_resolution_info_in, const ASTExpression* top_level_ast_expr_in = nullptr, - IdString column_alias_in = IdString()); + IdString column_alias_in = IdString(), + const char* clause_name_in = nullptr); // Construct an ExprResolutionInfo that disallows aggregation and analytic // expressions. @@ -224,12 +242,12 @@ struct ExprResolutionInfo { // functions are not allowed in this clause, e.g. "WHERE clause". It is // also used in error messages related to path expression resolution // after GROUP BY. - // This can be empty if both aggregations and analytic functions are + // This can be "" (not null) if both aggregations and analytic functions are // allowed, or if there is no clear clause name to use in error messages // (for instance when resolving correlated path expressions that are in // a subquery's SELECT list but the subquery itself is in the outer // query's ORDER BY clause). - const char* const clause_name; + const char* const clause_name = ""; // Mutable info. @@ -259,10 +277,10 @@ struct ExprResolutionInfo { // field is set only when resolving SELECT columns. Not owned. const ASTExpression* const top_level_ast_expr = nullptr; - // The column alias of the top-level AST expression in SELECT list, which will - // be used as the name of the resolved column when the top-level AST - // expression being resolved is an aggregate or an analytic function. This - // field is set only when resolving SELECT columns. + // The alias for `top_level_ast_expr` in the SELECT list. This will + // be used as the ResolvedColumn name if a column is created for that + // top-level AST expression as output of an aggregate or an analytic function. + // This field is set only when resolving SELECT columns. 
const IdString column_alias = IdString(); // Context around if we can flatten and if we're currently actively doing so. diff --git a/zetasql/analyzer/expr_resolver_helper_test.cc b/zetasql/analyzer/expr_resolver_helper_test.cc index 8f65bd4b0..138a7a5ff 100644 --- a/zetasql/analyzer/expr_resolver_helper_test.cc +++ b/zetasql/analyzer/expr_resolver_helper_test.cc @@ -21,15 +21,23 @@ #include "zetasql/base/testing/status_matchers.h" #include "zetasql/parser/parse_tree.h" #include "zetasql/parser/parser.h" +#include "zetasql/public/analyzer.h" +#include "zetasql/public/analyzer_options.h" +#include "zetasql/public/analyzer_output.h" #include "zetasql/public/id_string.h" #include "zetasql/public/language_options.h" #include "zetasql/public/templated_sql_tvf.h" +#include "zetasql/public/types/type_factory.h" +#include "zetasql/resolved_ast/resolved_ast.h" +#include "zetasql/testdata/sample_catalog.h" #include "gmock/gmock.h" #include "gtest/gtest.h" #include "absl/status/status.h" +#include "absl/strings/string_view.h" namespace zetasql { +using ::zetasql_base::testing::IsOkAndHolds; using ::zetasql_base::testing::StatusIs; TEST(ResolvedTVFArgTest, GetScan) { @@ -72,4 +80,49 @@ TEST(GetAliasForExpression, OtherASTNodeTypes) { EXPECT_EQ(GetAliasForExpression(parser_output->expression()).ToString(), ""); } +class IsConstantTest : public ::testing::Test { + protected: + IsConstantTest() { + options_.mutable_language()->EnableMaximumLanguageFeaturesForDevelopment(); + } + + absl::Status Analyze(absl::string_view sql) { + return AnalyzeExpression(sql, options_, catalog_.catalog(), &type_factory_, + &output_); + } + + const ResolvedExpr* result() { return output_->resolved_expr(); } + + std::unique_ptr output_; + TypeFactory type_factory_; + SampleCatalog catalog_; + AnalyzerOptions options_; +}; + +TEST_F(IsConstantTest, StackOCasts) { + ZETASQL_ASSERT_OK(Analyze( + "CAST(CAST(CAST(CAST(CAST('apples' AS BYTES) AS STRING) AS BYTES) AS " + "STRING) AS BYTES)")); + 
EXPECT_THAT(IsConstantExpression(result()), IsOkAndHolds(true)); + EXPECT_THAT(IsConstantFunctionArg(result()), IsOkAndHolds(true)); + EXPECT_THAT(IsNonAggregateFunctionArg(result()), IsOkAndHolds(true)); +} + +TEST_F(IsConstantTest, VolatileExpr) { + ZETASQL_ASSERT_OK(Analyze("IF(RAND() > 0.5, 'bananas', 'avocados')")); + EXPECT_THAT(IsConstantExpression(result()), IsOkAndHolds(false)); + EXPECT_THAT(IsConstantFunctionArg(result()), IsOkAndHolds(false)); + EXPECT_THAT(IsNonAggregateFunctionArg(result()), IsOkAndHolds(false)); +} + +TEST_F(IsConstantTest, CastWithFormat) { + ZETASQL_ASSERT_OK( + Analyze("CAST('apples' AS BYTES " + "FORMAT IF(RAND() > 0.5, 'bananas', 'avocados'))")); + // TODO: These results should be 'false'. + EXPECT_THAT(IsConstantExpression(result()), IsOkAndHolds(true)); + EXPECT_THAT(IsConstantFunctionArg(result()), IsOkAndHolds(true)); + EXPECT_THAT(IsNonAggregateFunctionArg(result()), IsOkAndHolds(true)); +} + } // namespace zetasql diff --git a/zetasql/analyzer/function_resolver.cc b/zetasql/analyzer/function_resolver.cc index cd201251f..5fc3bc19e 100644 --- a/zetasql/analyzer/function_resolver.cc +++ b/zetasql/analyzer/function_resolver.cc @@ -668,7 +668,7 @@ MakeResolvedLiteralForInjectedArgument(const InputArgumentType& input_arg_type, absl::Status FunctionResolver::ReorderArgumentExpressionsPerIndexMapping( absl::string_view function_name, const FunctionSignature& signature, absl::Span index_mapping, const ASTNode* ast_location, - const std::vector& input_argument_types, + absl::Span input_argument_types, std::vector* arg_locations, std::vector>* resolved_args, std::vector* resolved_tvf_args) { @@ -761,14 +761,24 @@ static absl::Status AppendMismatchReasonWithIndent(std::string* message, } absl::StatusOr FunctionResolver::GetSupportedSignaturesWithMessage( - const Function* function, const std::vector& mismatch_errors, + const Function* function, absl::Span mismatch_errors, FunctionArgumentType::NamePrintingStyle print_style, int* 
num_signatures) const { ZETASQL_RET_CHECK_EQ(mismatch_errors.size(), function->signatures().size()); + // We don't show a detailed error message when HideSupportedSignatures is true. + // ABSL_DCHECK for sanity. + ABSL_DCHECK(!function->HideSupportedSignatures()); + if (function->HideSupportedSignatures()) { + return ""; + } + // Use the customized signatures callback if set. const LanguageOptions& language_options = resolver_->language(); - if (function->GetSupportedSignaturesCallback() != nullptr) { + if (function->GetSupportedSignaturesCallback() != nullptr && + // When we have a per-signature callback, we have the opportunity for per + // signature mismatch errors. + !function->HasSignatureTextCallback()) { *num_signatures = function->NumSignatures(); return function->GetSupportedSignaturesCallback()(language_options, *function); @@ -783,14 +793,19 @@ absl::StatusOr FunctionResolver::GetSupportedSignaturesWithMessage( continue; } (*num_signatures)++; - std::vector argument_texts = - signature.GetArgumentsUserFacingTextWithCardinality( - language_options, print_style, /*print_template_details=*/true); if (!result.empty()) { absl::StrAppend(&result, "\n"); } absl::StrAppend(&result, " Signature: "); - absl::StrAppend(&result, function->GetSQL(argument_texts)); + if (function->HasSignatureTextCallback()) { + absl::StrAppend(&result, function->GetSignatureTextCallback()( + language_options, *function, signature)); + } else { + std::vector argument_texts = + signature.GetArgumentsUserFacingTextWithCardinality( + language_options, print_style, /*print_template_details=*/true); + absl::StrAppend(&result, function->GetSQL(argument_texts)); + } ZETASQL_RETURN_IF_ERROR( AppendMismatchReasonWithIndent(&result, mismatch_errors[sig_idx])); } @@ -826,7 +841,8 @@ FunctionResolver::GenerateErrorMessageWithSupportedSignatures( (num_signatures > 1 ? 
"s" : ""), ": ", supported_signatures); } else { - if (function->GetSupportedSignaturesCallback() == nullptr) { + if (function->GetSupportedSignaturesCallback() == nullptr && + !function->HideSupportedSignatures()) { // If we do not have any supported signatures and there is // no custom callback for producing the signature messages, // then we provide an error message as if the function did @@ -1131,6 +1147,15 @@ absl::Status ExtractStructFieldLocations( } break; } + case AST_BRACED_CONSTRUCTOR: { + const ASTBracedConstructor* ast_braced = + cast_free_ast_location->GetAsOrDie(); + ABSL_DCHECK_EQ(ast_braced->fields().size(), to_struct_type->num_fields()); + for (const ASTBracedConstructorField* field : ast_braced->fields()) { + field_arg_locations->push_back(field->value()); + } + break; + } default: { ZETASQL_RET_CHECK_FAIL() << "Cannot obtain the AST expressions for field " << "arguments of struct constructor:\n" @@ -1795,8 +1820,11 @@ absl::Status FunctionResolver::ResolveGeneralFunctionCall( } std::vector input_argument_types; - GetInputArgumentTypesForGenericArgumentList(arg_locations, arguments, - &input_argument_types); + // We have not determined the actual signature and its argument types yet, so + // leave NULL arguments as untyped. + GetInputArgumentTypesForGenericArgumentList( + arg_locations, arguments, + /*pick_default_type_for_untyped_expr=*/false, &input_argument_types); // Check initial argument constraints, if any. if (function->PreResolutionConstraints() != nullptr) { @@ -1812,9 +1840,14 @@ absl::Status FunctionResolver::ResolveGeneralFunctionCall( // When function has SupportedSignaturesCallback which returns list of // signatures in one string, we cannot interleave signatures and mismatch // errors. - // TODO: better support SupportedSignaturesCallback. + // When the function has a SignatureTextCallback, we have the opportunity for + // per-signature mismatch errors.
+ // Skip mismatch details when we don't even want to show list of signatures + // with HideSupportedSignatures. bool show_mismatch_details = - function->GetSupportedSignaturesCallback() == nullptr && + (function->GetSupportedSignaturesCallback() == nullptr || + function->HasSignatureTextCallback()) && + !function->HideSupportedSignatures() && resolver_->analyzer_options().show_function_signature_mismatch_details(); auto mismatch_errors = show_mismatch_details ? std::make_unique>() @@ -2060,38 +2093,22 @@ absl::Status FunctionResolver::ResolveGeneralFunctionCall( } } - // If we have a cast of a parameter, we want to check the expression inside - // the cast. Even if the query just has a parameter, when we unparse, we - // may get a cast of a parameter, and that should be legal too. - const ResolvedExpr* unwrapped_argument = arguments[idx].get(); - while (unwrapped_argument->node_kind() == RESOLVED_CAST) { - unwrapped_argument = unwrapped_argument->GetAs()->expr(); - } - // We currently use the same validation for must_be_constant and - // is_not_aggregate, except that is_not_aggregate also allows - // ResolvedArgumentRefs with kind NOT_AGGREGATE, so that we can have SQL - // UDF bodies that wrap calls with NOT_AGGREGATE arguments. - if (concrete_argument.must_be_constant() || - concrete_argument.options().is_not_aggregate()) { - switch (unwrapped_argument->node_kind()) { - case RESOLVED_PARAMETER: - case RESOLVED_LITERAL: - case RESOLVED_CONSTANT: - break; - case RESOLVED_ARGUMENT_REF: - // A NOT_AGGREGATE argument is allowed (for is_not_aggregate mode), - // but any other argument type should fall through to the error case. 
- if (!concrete_argument.must_be_constant() && - unwrapped_argument->GetAs<ResolvedArgumentRef>() - ->argument_kind() == ResolvedArgumentRef::NOT_AGGREGATE) { - break; - } - ABSL_FALLTHROUGH_INTENDED; - default: - return MakeSqlErrorAt(arg_locations[idx]) - << BadArgErrorPrefix(idx) - << " must be a literal or query parameter"; - } + bool satisfies_non_aggregate_requirement = true; + if (concrete_argument.options().is_not_aggregate()) { + ZETASQL_ASSIGN_OR_RETURN(satisfies_non_aggregate_requirement, + IsNonAggregateFunctionArg(arguments[idx].get())); + } + bool satisfies_constant_requirement = true; + if (concrete_argument.must_be_constant()) { + ZETASQL_ASSIGN_OR_RETURN(satisfies_constant_requirement, + IsConstantFunctionArg(arguments[idx].get())); + } + // TODO: b/323602106 - Improve correctness of error message + if (!satisfies_constant_requirement || + !satisfies_non_aggregate_requirement) { + return MakeSqlErrorAt(arg_locations[idx]) + << BadArgErrorPrefix(idx) + << " must be a literal or query parameter"; } const Type* target_type = concrete_argument.type(); @@ -2107,8 +2124,11 @@ absl::Status FunctionResolver::ResolveGeneralFunctionCall( // Update the argument type with the casted one, so that the // PostResolutionArgumentConstraintsCallback and the // ComputeResultTypeCallback can get the exact types passed to function. 
- input_argument_types[idx] = - GetInputArgumentTypeForExpr(arguments[idx].get()); + input_argument_types[idx] = GetInputArgumentTypeForExpr( + arguments[idx].get(), + /*pick_default_type_for_untyped_expr=*/ + resolver_->language().LanguageFeatureEnabled( + FEATURE_TEMPLATED_SQL_FUNCTION_RESOLVE_WITH_TYPED_ARGS)); } // If we have a literal argument value, check it against the value @@ -2398,7 +2418,7 @@ absl::Status FunctionResolver::ForwardNestedResolutionAnalysisError( absl::Status FunctionResolver::ResolveTemplatedSQLFunctionCall( const ASTNode* ast_location, const TemplatedSQLFunction& function, const AnalyzerOptions& analyzer_options, - const std::vector& actual_arguments, + absl::Span actual_arguments, std::shared_ptr* function_call_info_out) { // Check if this function calls itself. If so, return an error. Otherwise, add // a pointer to this class to the cycle detector in the analyzer options. diff --git a/zetasql/analyzer/function_resolver.h b/zetasql/analyzer/function_resolver.h index 43f80568d..3bb3e780e 100644 --- a/zetasql/analyzer/function_resolver.h +++ b/zetasql/analyzer/function_resolver.h @@ -129,7 +129,7 @@ class FunctionResolver { absl::Status ResolveTemplatedSQLFunctionCall( const ASTNode* ast_location, const TemplatedSQLFunction& function, const AnalyzerOptions& analyzer_options, - const std::vector& actual_arguments, + absl::Span actual_arguments, std::shared_ptr* function_call_info_out); // This is a helper method when parsing or analyzing the function's SQL @@ -344,7 +344,7 @@ class FunctionResolver { absl::string_view function_name, const FunctionSignature& signature, absl::Span index_mapping, const ASTNode* ast_location, - const std::vector& input_argument_types, + absl::Span input_argument_types, std::vector* arg_locations, std::vector>* resolved_args, std::vector* resolved_tvf_args); @@ -400,7 +400,7 @@ class FunctionResolver { // // . 
absl::StatusOr GetSupportedSignaturesWithMessage( - const Function* function, const std::vector& mismatch_errors, + const Function* function, absl::Span mismatch_errors, FunctionArgumentType::NamePrintingStyle print_style, int* num_signatures) const; diff --git a/zetasql/analyzer/function_signature_matcher.cc b/zetasql/analyzer/function_signature_matcher.cc index c2423327d..131b949d9 100644 --- a/zetasql/analyzer/function_signature_matcher.cc +++ b/zetasql/analyzer/function_signature_matcher.cc @@ -51,6 +51,7 @@ #include "absl/strings/str_cat.h" #include "absl/strings/str_format.h" #include "absl/strings/substitute.h" +#include "absl/types/span.h" #include "zetasql/base/map_util.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status_macros.h" @@ -205,7 +206,7 @@ class FunctionSignatureMatcher { // counts. // Returns a non-OK status for any internal error. absl::StatusOr GetConcreteArguments( - const std::vector& input_arguments, + absl::Span input_arguments, const FunctionSignature& signature, int repetitions, int optionals, const ArgKindToResolvedTypeMap& templated_argument_map, std::vector* arg_index_mapping) const; @@ -239,7 +240,7 @@ class FunctionSignatureMatcher { // signature matches. is undefined otherwise. 
absl::StatusOr CheckArgumentTypesAndCollectTemplatedArguments( const std::vector& arg_ast_nodes, - const std::vector& input_arguments, + absl::Span input_arguments, const FunctionSignature& signature, int repetitions, const ResolveLambdaCallback* resolve_lambda_callback, ArgKindToInputTypesMap* templated_argument_map, @@ -470,7 +471,7 @@ absl::StatusOr FunctionSignatureMatcher::GetConcreteArgument( absl::StatusOr FunctionSignatureMatcher::GetConcreteArguments( - const std::vector& input_arguments, + absl::Span input_arguments, const FunctionSignature& signature, int repetitions, int optionals, const ArgKindToResolvedTypeMap& templated_argument_map, std::vector* arg_index_mapping) const { @@ -878,7 +879,7 @@ bool FunctionSignatureMatcher:: absl::StatusOr FunctionSignatureMatcher::CheckArgumentTypesAndCollectTemplatedArguments( const std::vector& arg_ast_nodes, - const std::vector& input_arguments, + absl::Span input_arguments, const FunctionSignature& signature, int repetitions, const ResolveLambdaCallback* resolve_lambda_callback, ArgKindToInputTypesMap* templated_argument_map, @@ -908,8 +909,7 @@ FunctionSignatureMatcher::CheckArgumentTypesAndCollectTemplatedArguments( resolve_lambda_callback, templated_argument_map, signature_match_result, arg_overrides)) { ZETASQL_RET_CHECK(!signature_match_result->allow_mismatch_message() || - !signature_match_result->mismatch_message().empty() || - !signature_match_result->tvf_mismatch_message().empty()) + !signature_match_result->mismatch_message().empty()) << "Mismatch error message should have been set."; return false; } @@ -1248,11 +1248,11 @@ absl::Status FunctionSignatureMatcher::CheckRelationArgumentTypes( .second) { // There was a duplicate column name in the input relation. This is // invalid. 
- signature_match_result->set_tvf_mismatch_message(absl::StrCat( + signature_match_result->set_mismatch_message(absl::StrCat( "Table-valued function does not allow duplicate input ", "columns named \"", provided_col_name, "\" for argument ", arg_idx + 1)); - signature_match_result->set_tvf_bad_argument_index(arg_idx); + signature_match_result->set_bad_argument_index(arg_idx); *signature_matches = false; return absl::OkStatus(); } @@ -1262,10 +1262,10 @@ absl::Status FunctionSignatureMatcher::CheckRelationArgumentTypes( !provided_schema.is_value_table()) { // There was a column name in the input relation not specified in the // required output schema, and the signature does not allow this. - signature_match_result->set_tvf_mismatch_message( + signature_match_result->set_mismatch_message( absl::StrCat("Function does not allow extra input column named \"", provided_col_name, "\" for argument ", arg_idx + 1)); - signature_match_result->set_tvf_bad_argument_index(arg_idx); + signature_match_result->set_bad_argument_index(arg_idx); *signature_matches = false; return absl::OkStatus(); } @@ -1289,12 +1289,12 @@ absl::Status FunctionSignatureMatcher::CheckRelationArgumentTypes( // The required value table was not found in the provided input // relation. Generate a descriptive error message. ZETASQL_RET_CHECK_EQ(1, required_schema.num_columns()); - signature_match_result->set_tvf_mismatch_message( + signature_match_result->set_mismatch_message( absl::StrCat("Expected value table of type ", required_schema.column(0).type->ShortTypeName( language_.product_mode()), " for argument ", arg_idx + 1)); - signature_match_result->set_tvf_bad_argument_index(arg_idx); + signature_match_result->set_bad_argument_index(arg_idx); *signature_matches = false; return absl::OkStatus(); } @@ -1304,10 +1304,10 @@ absl::Status FunctionSignatureMatcher::CheckRelationArgumentTypes( if (lookup == nullptr) { // The required column name was not found in the provided input // relation. 
Generate a descriptive error message. - signature_match_result->set_tvf_mismatch_message(absl::StrCat( + signature_match_result->set_mismatch_message(absl::StrCat( "Required column \"", required_col_name, "\" not found in table passed as argument ", arg_idx + 1)); - signature_match_result->set_tvf_bad_argument_index(arg_idx); + signature_match_result->set_bad_argument_index(arg_idx); *signature_matches = false; return absl::OkStatus(); } @@ -1330,7 +1330,7 @@ absl::Status FunctionSignatureMatcher::CheckRelationArgumentTypes( } else { // The provided column type is invalid. Mark the argument index and // column name to return a descriptive error later. - signature_match_result->set_tvf_mismatch_message(absl::StrCat( + signature_match_result->set_mismatch_message(absl::StrCat( "Invalid type ", provided_col_type->ShortTypeName(language_.product_mode()), (required_schema.is_value_table() @@ -1338,7 +1338,7 @@ absl::Status FunctionSignatureMatcher::CheckRelationArgumentTypes( : absl::StrCat(" for column \"", required_col_name, " ")), required_col_type->ShortTypeName(language_.product_mode()), "\" of argument ", arg_idx + 1)); - signature_match_result->set_tvf_bad_argument_index(arg_idx); + signature_match_result->set_bad_argument_index(arg_idx); *signature_matches = false; return absl::OkStatus(); } @@ -1570,8 +1570,7 @@ absl::StatusOr FunctionSignatureMatcher::SignatureMatches( if (!match) { signature_match_result->UpdateFromResult(local_signature_match_result); ZETASQL_RET_CHECK(!signature_match_result->allow_mismatch_message() || - !signature_match_result->mismatch_message().empty() || - !signature_match_result->tvf_mismatch_message().empty()); + !signature_match_result->mismatch_message().empty()); return false; } diff --git a/zetasql/analyzer/input_argument_type_resolver_helper.cc b/zetasql/analyzer/input_argument_type_resolver_helper.cc index c5f5102dc..8e1700d6a 100644 --- a/zetasql/analyzer/input_argument_type_resolver_helper.cc +++ 
b/zetasql/analyzer/input_argument_type_resolver_helper.cc @@ -17,21 +17,23 @@ #include "zetasql/analyzer/input_argument_type_resolver_helper.h" #include -#include #include #include "zetasql/base/logging.h" #include "zetasql/parser/parse_tree.h" #include "zetasql/public/function.h" +#include "zetasql/public/input_argument_type.h" #include "zetasql/public/types/type.h" #include "zetasql/public/value.h" #include "zetasql/resolved_ast/resolved_ast.h" #include "zetasql/resolved_ast/resolved_node_kind.pb.h" +#include "zetasql/base/check.h" #include "absl/types/span.h" namespace zetasql { -InputArgumentType GetInputArgumentTypeForExpr(const ResolvedExpr* expr) { +InputArgumentType GetInputArgumentTypeForExpr( + const ResolvedExpr* expr, bool pick_default_type_for_untyped_expr) { ABSL_DCHECK(expr != nullptr); if (expr->type()->IsStruct() && expr->node_kind() == RESOLVED_MAKE_STRUCT) { const ResolvedMakeStruct* struct_expr = expr->GetAs(); @@ -39,7 +41,8 @@ InputArgumentType GetInputArgumentTypeForExpr(const ResolvedExpr* expr) { field_types.reserve(struct_expr->field_list_size()); for (const std::unique_ptr& argument : struct_expr->field_list()) { - field_types.push_back(GetInputArgumentTypeForExpr(argument.get())); + field_types.push_back(GetInputArgumentTypeForExpr( + argument.get(), pick_default_type_for_untyped_expr)); } // We construct a custom InputArgumentType for structs that may have // some literal and some non-literal fields. @@ -51,15 +54,17 @@ InputArgumentType GetInputArgumentTypeForExpr(const ResolvedExpr* expr) { // respect to subsequent coercion. if (expr->node_kind() == RESOLVED_LITERAL && !expr->GetAs()->has_explicit_type()) { - if (expr->GetAs()->value().is_null()) { - // This is a literal NULL that does not have an explicit type, so - // it can coerce to anything. - return InputArgumentType::UntypedNull(); - } - // This is a literal empty array that does not have an explicit type, - // so it can coerce to any array type. 
- if (expr->GetAs()->value().is_empty_array()) { - return InputArgumentType::UntypedEmptyArray(); + if (!pick_default_type_for_untyped_expr) { + if (expr->GetAs()->value().is_null()) { + // This is a literal NULL that does not have an explicit type, so + // it can coerce to anything. + return InputArgumentType::UntypedNull(); + } + // This is a literal empty array that does not have an explicit type, + // so it can coerce to any array type. + if (expr->GetAs()->value().is_empty_array()) { + return InputArgumentType::UntypedEmptyArray(); + } } return InputArgumentType(expr->GetAs()->value()); } @@ -90,7 +95,8 @@ InputArgumentType GetInputArgumentTypeForExpr(const ResolvedExpr* expr) { } static InputArgumentType GetInputArgumentTypeForGenericArgument( - const ASTNode* argument_ast_node, const ResolvedExpr* expr) { + const ASTNode* argument_ast_node, const ResolvedExpr* expr, + bool pick_default_type_for_untyped_expr) { ABSL_DCHECK(argument_ast_node != nullptr); bool expects_null_expr = argument_ast_node->Is() || @@ -106,19 +112,21 @@ static InputArgumentType GetInputArgumentTypeForGenericArgument( "sequence argument"; } ABSL_DCHECK(!expects_null_expr); - return GetInputArgumentTypeForExpr(expr); + return GetInputArgumentTypeForExpr(expr, pick_default_type_for_untyped_expr); } void GetInputArgumentTypesForGenericArgumentList( const std::vector& argument_ast_nodes, absl::Span> arguments, + bool pick_default_type_for_untyped_expr, std::vector* input_arguments) { ABSL_DCHECK_EQ(argument_ast_nodes.size(), arguments.size()); input_arguments->clear(); input_arguments->reserve(arguments.size()); for (int i = 0; i < argument_ast_nodes.size(); i++) { input_arguments->push_back(GetInputArgumentTypeForGenericArgument( - argument_ast_nodes[i], arguments[i].get())); + argument_ast_nodes[i], arguments[i].get(), + pick_default_type_for_untyped_expr)); } } diff --git a/zetasql/analyzer/input_argument_type_resolver_helper.h b/zetasql/analyzer/input_argument_type_resolver_helper.h 
index 1995a456a..6435a837d 100644 --- a/zetasql/analyzer/input_argument_type_resolver_helper.h +++ b/zetasql/analyzer/input_argument_type_resolver_helper.h @@ -21,9 +21,8 @@ #include #include "zetasql/parser/parse_tree.h" -#include "zetasql/public/function.h" +#include "zetasql/public/input_argument_type.h" #include "zetasql/resolved_ast/resolved_ast.h" -#include "absl/types/optional.h" #include "absl/types/span.h" namespace zetasql { @@ -31,7 +30,15 @@ namespace zetasql { // Get an InputArgumentType for a ResolvedExpr, identifying whether or not it // is a parameter and pointing at the literal value inside if // appropriate. must outlive the returned object. -InputArgumentType GetInputArgumentTypeForExpr(const ResolvedExpr* expr); +// +// The `pick_default_type_for_untyped_expr` argument controls how to deal +// with an untyped input argument like a NULL or empty array literal. +// - When it is false, this function will return an untyped InputArgumentType +// when `expr` is NULL or empty array without an explicit type. +// - When it is true, an InputArgumentType with the default type for NULL or +// empty array will be returned. +InputArgumentType GetInputArgumentTypeForExpr( + const ResolvedExpr* expr, bool pick_default_type_for_untyped_expr); // Get a list of from a list of and // , invoking GetInputArgumentTypeForExpr() on each of the @@ -39,9 +46,17 @@ InputArgumentType GetInputArgumentTypeForExpr(const ResolvedExpr* expr); // This method is called before signature matching. Lambdas are not resolved // yet. are used to determine InputArgumentType for lambda // arguments. +// +// The `pick_default_type_for_untyped_expr` argument controls how to deal +// with an untyped input argument like a NULL or empty array literal. +// - When it is false, this function will return an untyped InputArgumentType +// when `expr` is NULL or empty array without an explicit type. 
+// - When it is true, an InputArgumentType with the default type for NULL or +// empty array will be returned. void GetInputArgumentTypesForGenericArgumentList( const std::vector& argument_ast_nodes, absl::Span> arguments, + bool pick_default_type_for_untyped_expr, std::vector* input_arguments); } // namespace zetasql diff --git a/zetasql/analyzer/input_argument_type_resolver_helper_test.cc b/zetasql/analyzer/input_argument_type_resolver_helper_test.cc index c34ad4f43..48839d1e2 100644 --- a/zetasql/analyzer/input_argument_type_resolver_helper_test.cc +++ b/zetasql/analyzer/input_argument_type_resolver_helper_test.cc @@ -38,13 +38,66 @@ TEST(ExprResolverHelperTest, LambdaInputArgumentType) { arg_ast_nodes.push_back(&int_literal); arguments.push_back(MakeResolvedLiteral(Value::Int64(1))); - GetInputArgumentTypesForGenericArgumentList(arg_ast_nodes, arguments, - &input_arguments); + GetInputArgumentTypesForGenericArgumentList( + arg_ast_nodes, arguments, + /*pick_default_type_for_untyped_expr=*/false, &input_arguments); ASSERT_EQ(input_arguments.size(), 2); ASSERT_TRUE(input_arguments[0].is_lambda()); ASSERT_EQ(input_arguments[0].type(), nullptr); ASSERT_FALSE(input_arguments[1].is_lambda()); + ASSERT_TRUE(input_arguments[1].is_literal()); + EXPECT_TRUE(input_arguments[1].type()->IsInt64()); +} + +TEST(ExprResolverHelperTest, LambdaInputArgumentTypeWithNull) { + ASTLambda astLambda; + std::vector arg_ast_nodes; + arg_ast_nodes.push_back(&astLambda); + std::vector> arguments; + arguments.push_back(nullptr); + std::vector input_arguments; + ASTIntLiteral int_literal; + arg_ast_nodes.push_back(&int_literal); + ASTNullLiteral null_literal; + arg_ast_nodes.push_back(&null_literal); + arguments.push_back(MakeResolvedLiteral(Value::Int64(1))); + arguments.push_back(MakeResolvedLiteral(Value::NullFloat())); + + { + GetInputArgumentTypesForGenericArgumentList( + arg_ast_nodes, arguments, + /*pick_default_type_for_untyped_expr=*/false, &input_arguments); + 
ASSERT_EQ(input_arguments.size(), 3); + ASSERT_TRUE(input_arguments[0].is_lambda()); + ASSERT_EQ(input_arguments[0].type(), nullptr); + + ASSERT_FALSE(input_arguments[1].is_lambda()); + ASSERT_TRUE(input_arguments[1].is_literal()); + EXPECT_TRUE(input_arguments[1].type()->IsInt64()); + + ASSERT_FALSE(input_arguments[2].is_lambda()); + ASSERT_TRUE(input_arguments[2].is_untyped_null()); + EXPECT_TRUE(input_arguments[2].type()->IsInt64()); + } + + { + GetInputArgumentTypesForGenericArgumentList( + arg_ast_nodes, arguments, + /*pick_default_type_for_untyped_expr=*/true, &input_arguments); + ASSERT_EQ(input_arguments.size(), 3); + ASSERT_TRUE(input_arguments[0].is_lambda()); + ASSERT_EQ(input_arguments[0].type(), nullptr); + + ASSERT_FALSE(input_arguments[1].is_lambda()); + ASSERT_TRUE(input_arguments[1].is_literal()); + EXPECT_TRUE(input_arguments[1].type()->IsInt64()); + + ASSERT_FALSE(input_arguments[2].is_lambda()); + ASSERT_FALSE(input_arguments[2].is_untyped_null()); + ASSERT_TRUE(input_arguments[2].is_literal()); + EXPECT_TRUE(input_arguments[2].type()->IsFloat()); + } } } // namespace zetasql diff --git a/zetasql/analyzer/lookup_catalog_column_callback_test.cc b/zetasql/analyzer/lookup_catalog_column_callback_test.cc index 5bafa30e3..054d74a9c 100644 --- a/zetasql/analyzer/lookup_catalog_column_callback_test.cc +++ b/zetasql/analyzer/lookup_catalog_column_callback_test.cc @@ -26,6 +26,7 @@ #include "gmock/gmock.h" #include "gtest/gtest.h" #include "absl/status/status.h" +#include "absl/strings/string_view.h" namespace zetasql { namespace { @@ -71,7 +72,7 @@ TEST_F(LookupCatalogColumnCallbackTest, BaselineErrorWhenNoColumnDefined) { TEST_F(LookupCatalogColumnCallbackTest, SameErrorAsBaselineWhenCatalogColumnCallbackReturnsNullptr) { options_.SetLookupCatalogColumnCallback( - [](const std::string& column) -> absl::StatusOr { + [](absl::string_view column) -> absl::StatusOr { return nullptr; }); EXPECT_THAT(Analyze("mycolumn + 1"), @@ -82,7 +83,7 @@ 
TEST_F(LookupCatalogColumnCallbackTest, TEST_F(LookupCatalogColumnCallbackTest, ErrorWhenLookupCatalogColumnReturnsError) { options_.SetLookupCatalogColumnCallback( - [](const std::string& column) -> absl::StatusOr { + [](absl::string_view column) -> absl::StatusOr { return absl::NotFoundError("error column-not-found: mycolumn"); }); EXPECT_THAT(Analyze("mycolumn + 1"), @@ -92,7 +93,7 @@ TEST_F(LookupCatalogColumnCallbackTest, TEST_F(LookupCatalogColumnCallbackTest, SuccessfulLookupTest) { options_.SetLookupCatalogColumnCallback( - [&](const std::string& column) -> absl::StatusOr { + [&](absl::string_view column) -> absl::StatusOr { EXPECT_THAT(column, testing::StrCaseEq("mycolumn")); return &column_; }); diff --git a/zetasql/analyzer/name_scope.cc b/zetasql/analyzer/name_scope.cc index c5e9d4b5f..b9a3176fb 100644 --- a/zetasql/analyzer/name_scope.cc +++ b/zetasql/analyzer/name_scope.cc @@ -18,6 +18,7 @@ #include +#include #include #include #include @@ -41,6 +42,7 @@ #include "absl/strings/str_cat.h" #include "absl/strings/str_join.h" #include "absl/strings/string_view.h" +#include "absl/types/span.h" #include "zetasql/base/map_util.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status_macros.h" @@ -63,11 +65,12 @@ std::string ValidNamePath::DebugString() const { target_column_.DebugString()); } +// Includes a leading space if non-empty. 
static std::string ValidNamePathListDebugString( const ValidNamePathList& valid_name_path_list) { std::string debug_string; if (!valid_name_path_list.empty()) { - absl::StrAppend(&debug_string, "("); + absl::StrAppend(&debug_string, " ("); bool first = true; for (const ValidNamePath& valid_name_path : valid_name_path_list) { if (first) { @@ -119,12 +122,13 @@ std::string ValidFieldInfoMap::DebugString(absl::string_view indent) const { return debug_string; } -std::string NamedColumn::DebugString(const absl::string_view prefix) const { +std::string NamedColumn::DebugString() const { return absl::StrCat( - prefix, "Column: ", ToIdentifierLiteral(name_), - (is_explicit_ ? " explicit" : " implicit"), + IsInternalAlias(name_) ? "" : ToIdentifierLiteral(name_), " ", + column_.type()->ShortTypeName(ProductMode::PRODUCT_INTERNAL), " ", + column_.DebugString(), (is_value_table_column_ - ? absl::StrCat(" value_table", + ? absl::StrCat(" (value table)", ExclusionsDebugString(excluded_field_names_)) : "")); } @@ -283,7 +287,7 @@ bool NameScope::LookupName( bool ValidFieldInfoMap::FindLongestMatchingPathIfAny( const ValidNamePathList& name_path_list, - const std::vector& path_names, ResolvedColumn* resolved_column, + absl::Span path_names, ResolvedColumn* resolved_column, int* name_at) { bool found = false; *name_at = 0; @@ -654,27 +658,62 @@ bool NameScope::HasLocalRangeVariables() const { std::string NameScope::ValueTableColumn::DebugString() const { std::string out; - absl::StrAppend(&out, "value_table_column: ", column_.DebugString(), + absl::StrAppend(&out, column_.DebugString(), ExclusionsDebugString(excluded_field_names_), - (is_valid_to_access_ ? "" : " ACCESS_INVALID"), " ", + (is_valid_to_access_ ? "" : " ACCESS_INVALID"), ValidNamePathListDebugString(valid_name_path_list_)); return out; } -std::string NameScope::DebugString(absl::string_view indent) const { - std::string out; +// `name_list_columns` is the set of columns present in the NameList. 
+// These are assumed to not be pseudo-columns. +std::string NameScope::DebugString( + absl::string_view indent, const IdStringSetCase* name_list_columns) const { + // Sort the output lines for the hash_set to make the output deterministic, + // and split a separate list for range variables. + std::vector name_lines; + std::vector rv_name_lines; for (const auto& name : names()) { + std::string line = absl::StrCat(name.first.ToStringView(), " -> ", + name.second.DebugString()); + if (name.second.IsRangeVariable()) { + rv_name_lines.push_back(line); + } else { + if (name_list_columns != nullptr && name.second.IsColumn() && + !zetasql_base::ContainsKey(*name_list_columns, name.first)) { + absl::StrAppend(&line, " (pseudo-column)"); + } + name_lines.push_back(line); + } + } + std::sort(name_lines.begin(), name_lines.end()); + std::sort(rv_name_lines.begin(), rv_name_lines.end()); + + std::string out; + if (!name_lines.empty()) { + absl::StrAppend(&out, indent, "Names:"); + for (const auto& name_line : name_lines) { + absl::StrAppend(&out, "\n", indent, " ", name_line); + } + } + if (!rv_name_lines.empty()) { if (!out.empty()) out += "\n"; - absl::StrAppend(&out, indent, name.first.ToStringView(), " -> ", - name.second.DebugString()); + absl::StrAppend(&out, indent, "Range variables:"); + for (const auto& name_line : rv_name_lines) { + absl::StrAppend(&out, "\n", indent, " ", name_line); + } } - for (const ValueTableColumn& value_table_column : value_table_columns()) { + if (!value_table_columns().empty()) { if (!out.empty()) out += "\n"; - absl::StrAppend(&out, indent, value_table_column.DebugString()); + absl::StrAppend(&out, indent, "Value table columns:"); + for (const ValueTableColumn& value_table_column : value_table_columns()) { + absl::StrAppend(&out, "\n", indent, " ", + value_table_column.DebugString()); + } } if (previous_scope_ != nullptr) { if (!out.empty()) out += "\n"; - absl::StrAppend(&out, indent, " previous_scope:\n", + absl::StrAppend(&out, indent, 
"Parent scope:\n", previous_scope_->DebugString(absl::StrCat(indent, " "))); } return out; @@ -1087,7 +1126,7 @@ absl::Status NameScope::CreateNameScopeGivenValidNamePaths( } void NameTarget::SetAccessError(const Kind original_kind, - const std::string& access_error_message) { + absl::string_view access_error_message) { // Initialize fields. kind_ = ACCESS_ERROR; access_error_message_ = access_error_message; @@ -1138,8 +1177,10 @@ std::string NameTarget::DebugString() const { ">"); case IMPLICIT_COLUMN: case EXPLICIT_COLUMN: - return absl::StrCat(column_.DebugString(), - (kind_ == IMPLICIT_COLUMN ? " (implicit)" : "")); + return absl::StrCat( + column_.type()->ShortTypeName(ProductMode::PRODUCT_INTERNAL), " (", + column_.DebugString(), ")", + (kind_ == IMPLICIT_COLUMN ? " (implicit)" : "")); case FIELD_OF: return absl::StrCat("FIELD_OF<", column_.DebugString(), "> (id: ", field_id_, ")"); @@ -1371,15 +1412,27 @@ absl::Status NameList::MergeFrom(const NameList& other, ZETASQL_RET_CHECK(!options.flatten_to_table); ZETASQL_RET_CHECK(other.is_value_table()); } + ZETASQL_RET_CHECK(!options.columns_to_replace || !options.excluded_field_names); const IdStringSetCase* excluded_field_names = options.excluded_field_names; + const MergeOptions::ColumnsToReplaceMap* columns_to_replace = + options.columns_to_replace; // Copy the columns vector, with exclusions. // We're not using AddColumn because we're going to copy the NameScope // maps directly below. 
for (const NamedColumn& named_column : other.columns()) { - if (excluded_field_names == nullptr || - !zetasql_base::ContainsKey(*excluded_field_names, named_column.name())) { + const ResolvedColumn* replacement_column = nullptr; + if (columns_to_replace != nullptr) { + replacement_column = + zetasql_base::FindOrNull(*columns_to_replace, named_column.name()); + } + + if (replacement_column != nullptr) { + columns_.push_back(NamedColumn(named_column.name(), *replacement_column, + /*is_explicit=*/true)); + } else if (excluded_field_names == nullptr || + !zetasql_base::ContainsKey(*excluded_field_names, named_column.name())) { // For value table columns, we add new excluded_field_names so fields // with those names won't show up in SELECT *. // With `flatten_to_table`, we skip handling the column as @@ -1387,10 +1440,15 @@ absl::Status NameList::MergeFrom(const NameList& other, if (named_column.is_value_table_column() && !options.flatten_to_table) { // Compute the union of the existing excluded_field_names and the // newly added excluded_field_names. + // Columns from `columns_to_replace` are also excluded. IdStringSetCase new_excluded_field_names; if (excluded_field_names != nullptr) { InsertFrom(*excluded_field_names, &new_excluded_field_names); } + if (columns_to_replace != nullptr) { + zetasql_base::InsertKeysFromMap(*columns_to_replace, + &new_excluded_field_names); + } InsertFrom(named_column.excluded_field_names(), &new_excluded_field_names); @@ -1431,6 +1489,18 @@ absl::Status NameList::MergeFrom(const NameList& other, continue; } + // If we have a replacement column, replace whatever we have in the scope + // with that column. Even range variables, error targets, etc. 
+ if (columns_to_replace != nullptr) { + const ResolvedColumn* replacement_column = + zetasql_base::FindOrNull(*columns_to_replace, name); + if (replacement_column != nullptr) { + NameTarget new_target(*replacement_column, /*is_explicit=*/true); + name_scope_.AddNameTarget(name, new_target); + continue; + } + } + if (target.IsRangeVariable()) { // With `flatten_to_table`, range variables are dropped, except for // value table range variables, which will be converted to columns. @@ -1602,16 +1672,33 @@ Type::HasFieldResult NameList::SelectStarHasColumn(IdString name) const { std::string NameList::DebugString(absl::string_view indent) const { std::string out; - if (is_value_table()) { - absl::StrAppend(&out, indent, "is_value_table = true"); + if (!is_value_table()) { + absl::StrAppend(&out, indent, "NameList:"); + } else { + absl::StrAppend(&out, indent, "NameList (is_value_table = true):"); } + + // We don't track whether names in the NameScope are pseudo-columns, so we + // reverse engineer this by checking if the names match any name from the + // NameList. If not, they must have been pseudo-columns. 
+ IdStringSetCase name_list_columns; + for (const NamedColumn& named_column : columns_) { if (!out.empty()) out += "\n"; absl::StrAppend(&out, indent, " ", named_column.DebugString()); + + if (!IsInternalAlias(named_column.name())) { + name_list_columns.insert(named_column.name()); + } } + + const std::string name_scope_contents = + name_scope_.DebugString(absl::StrCat(indent, " "), &name_list_columns); if (!out.empty()) out += "\n"; - absl::StrAppend(&out, indent, "Inline NameScope:\n", - name_scope_.DebugString(absl::StrCat(indent, " "))); + absl::StrAppend(&out, indent, "NameScope:"); + if (!name_scope_contents.empty()) { + absl::StrAppend(&out, "\n", name_scope_contents); + } return out; } diff --git a/zetasql/analyzer/name_scope.h b/zetasql/analyzer/name_scope.h index a4490bc3d..97b69e80b 100644 --- a/zetasql/analyzer/name_scope.h +++ b/zetasql/analyzer/name_scope.h @@ -33,6 +33,7 @@ #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/string_view.h" +#include "absl/types/span.h" namespace zetasql { @@ -142,7 +143,7 @@ class ValidFieldInfoMap { // size in `name_at`. static bool FindLongestMatchingPathIfAny( const ValidNamePathList& name_path_list, - const std::vector& path_names, ResolvedColumn* resolved_column, + absl::Span path_names, ResolvedColumn* resolved_column, int* name_at); const ResolvedColumnToValidNamePathsMap& map() const { @@ -182,7 +183,7 @@ class NamedColumn { NamedColumn(const NamedColumn& other) = default; NamedColumn& operator=(const NamedColumn& other) = default; - std::string DebugString(absl::string_view prefix = "") const; + std::string DebugString() const; IdString name() const { return name_; } const ResolvedColumn& column() const { return column_; } @@ -323,7 +324,7 @@ class NameTarget { // If non-empty, 'access_error_message' indicates the error message // associated with this NameTarget. 
void SetAccessError(Kind original_kind, - const std::string& access_error_message = ""); + absl::string_view access_error_message = ""); Kind kind() const { return kind_; } @@ -627,7 +628,9 @@ class NameScope { // are range variables. bool HasLocalRangeVariables() const; - std::string DebugString(absl::string_view indent = "") const; + std::string DebugString( + absl::string_view indent = "", + const IdStringSetCase* name_list_columns = nullptr) const; const NameScope* previous_scope() const { return previous_scope_; } @@ -952,8 +955,22 @@ class NameList { struct MergeOptions { // If non-NULL, names in this list will be excluded. // Range variables with matching names are also excluded. + // For value tables, this name gets added to their excluded_field_names, + // so that the field cannot be read implicitly. const IdStringSetCase* excluded_field_names = nullptr; + // If non-NULL, names in this map will be replaced with the new + // ResolvedColumn. All matching names in the NameList will be replaced + // with the new column. All existing names in the scope (including + // columns, pseudo-columns, range variables, ambiguous names, etc.) + // will be removed, and replaced by one new entry pointing at the column. + // This also acts like `excluded_field_names` for other occurrences of + // the replaced name. `excluded_field_names` cannot be set at the same time. + typedef absl::flat_hash_map + ColumnsToReplaceMap; + ColumnsToReplaceMap* columns_to_replace = nullptr; + // If true, the copied names are converted to be just a flat table. // Range variables are dropped, and value tables are converted to // regular columns. 
diff --git a/zetasql/analyzer/query_resolver_helper.cc b/zetasql/analyzer/query_resolver_helper.cc index 487bb82db..462983148 100644 --- a/zetasql/analyzer/query_resolver_helper.cc +++ b/zetasql/analyzer/query_resolver_helper.cc @@ -149,10 +149,11 @@ absl::Status ReleaseLegacyRollupColumnList( void QueryGroupByAndAggregateInfo::Reset() { has_group_by = false; has_aggregation = false; + is_group_by_all = false; aggregate_expr_map.clear(); group_by_columns_to_compute.clear(); group_by_expr_map.clear(); - grouping_list.clear(); + grouping_call_list.clear(); grouping_output_columns.clear(); grouping_set_list.clear(); aggregate_columns_to_compute.clear(); @@ -206,12 +207,12 @@ std::string SelectColumnState::DebugString(absl::string_view indent) const { } void SelectColumnStateList::AddSelectColumn( - const ASTExpression* ast_expr, IdString alias, bool is_explicit, + const ASTSelectColumn* ast_select_column, IdString alias, bool is_explicit, bool has_aggregation, bool has_analytic, bool has_volatile, std::unique_ptr resolved_expr) { AddSelectColumn(std::make_unique( - ast_expr, alias, is_explicit, has_aggregation, has_analytic, has_volatile, - std::move(resolved_expr))); + ast_select_column, alias, is_explicit, has_aggregation, has_analytic, + has_volatile, std::move(resolved_expr))); } void SelectColumnStateList::AddSelectColumn( @@ -379,7 +380,7 @@ void QueryResolutionInfo::AddGroupingSet(const GroupingSetInfo& grouping_set) { void QueryResolutionInfo::AddGroupingColumn( std::unique_ptr column) { - group_by_info_.grouping_list.push_back(std::move(column)); + group_by_info_.grouping_call_list.push_back(std::move(column)); } // Add the grouping column to the expr map, but since at this point it's an @@ -524,6 +525,14 @@ bool QueryResolutionInfo::HasAnalytic() const { return analytic_resolver_->HasAnalytic(); } +bool QueryResolutionInfo::HasAggregation() const { + return group_by_info_.has_aggregation; +} + +void QueryResolutionInfo::SetHasAggregation(bool value) { + 
group_by_info_.has_aggregation = value; +} + bool QueryResolutionInfo::SelectFormAllowsSelectStar() const { switch (select_form_) { case SelectForm::kClassic: @@ -565,10 +574,7 @@ void QueryResolutionInfo::ResetAnalyticResolver(Resolver* resolver) { resolver, analytic_resolver_->ReleaseNamedWindowInfoMap()); } -absl::Status QueryResolutionInfo::CheckComputedColumnListsAreEmpty() { - // grouping columns to compute are not used for any computation, so can clear - // when expecting lists to be empty. - group_by_info_.grouping_output_columns.clear(); +absl::Status QueryResolutionInfo::CheckComputedColumnListsAreEmpty() const { ZETASQL_RET_CHECK(select_list_columns_to_compute_before_aggregation_.empty()); ZETASQL_RET_CHECK(select_list_columns_to_compute_.empty()); ZETASQL_RET_CHECK(group_by_info_.group_by_columns_to_compute.empty()); @@ -595,6 +601,8 @@ std::string QueryResolutionInfo::DebugString() const { "\n"); absl::StrAppend(&debug_string, "has_aggregation: ", group_by_info_.has_aggregation, "\n"); + absl::StrAppend(&debug_string, + "is_group_by_all: ", group_by_info_.is_group_by_all, "\n"); const absl::string_view select_with_mode_str = [&] { switch (select_with_mode_) { @@ -620,6 +628,16 @@ std::string QueryResolutionInfo::DebugString() const { for (const auto& column : group_by_info_.aggregate_columns_to_compute) { absl::StrAppend(&debug_string, " ", column->DebugString(), "\n"); } + absl::StrAppend(&debug_string, "grouping_call_list(size ", + group_by_info_.grouping_call_list.size(), "):\n"); + for (const auto& grouping_call : group_by_info_.grouping_call_list) { + absl::StrAppend(&debug_string, " ", grouping_call->DebugString(), "\n"); + } + absl::StrAppend(&debug_string, "grouping_output_columns(size ", + group_by_info_.grouping_output_columns.size(), "):\n"); + for (const auto& column : group_by_info_.grouping_output_columns) { + absl::StrAppend(&debug_string, " ", column->DebugString(), "\n"); + } absl::StrAppend(&debug_string, "aggregate_expr_map size: ", 
group_by_info_.aggregate_expr_map.size(), "\n"); absl::StrAppend( diff --git a/zetasql/analyzer/query_resolver_helper.h b/zetasql/analyzer/query_resolver_helper.h index 152d0701a..6c19ea043 100644 --- a/zetasql/analyzer/query_resolver_helper.h +++ b/zetasql/analyzer/query_resolver_helper.h @@ -121,27 +121,46 @@ struct GroupingSetIds { GroupingSetKind kind; }; +// This stores a column to order by in the final ResolvedOrderByScan. struct OrderByItemInfo { - OrderByItemInfo(const ASTNode* ast_location_in, int64_t index, + // Constructor for ordering by an ordinal (column number). + OrderByItemInfo(const ASTNode* ast_location_in, + const ASTCollate* ast_collate_in, int64_t index, bool descending, ResolvedOrderByItemEnums::NullOrderMode null_order) : ast_location(ast_location_in), + ast_collate(ast_collate_in), select_list_index(index), is_descending(descending), null_order(null_order) {} + // Constructor for ordering by an expression. OrderByItemInfo(const ASTNode* ast_location_in, + const ASTCollate* ast_collate_in, std::unique_ptr expr, bool descending, ResolvedOrderByItemEnums::NullOrderMode null_order) : ast_location(ast_location_in), + ast_collate(ast_collate_in), order_expression(std::move(expr)), is_descending(descending), null_order(null_order) {} + // Constructor for ordering by a ResolvedColumn (that will exist when + // we get to MakeResolvedOrderByScan). + OrderByItemInfo(const ASTNode* ast_location_in, + const ASTCollate* ast_collate_in, + const ResolvedColumn& order_column_in, bool descending, + ResolvedOrderByItemEnums::NullOrderMode null_order) + : ast_location(ast_location_in), + ast_collate(ast_collate_in), + order_column(order_column_in), + is_descending(descending), + null_order(null_order) {} // This value is not valid as a 0-based select list index. static constexpr int64_t kInvalidSelectListIndex = std::numeric_limits::max(); - const ASTNode* ast_location; + const ASTNode* ast_location; // Expression being ordered by. 
+ const ASTCollate* ast_collate; // Collate clause, if present. bool is_select_list_index() const { return select_list_index != kInvalidSelectListIndex; @@ -153,7 +172,12 @@ struct OrderByItemInfo { // be populated. int64_t select_list_index = kInvalidSelectListIndex; - // Only populated if <select_list_index> == -1; + // Expression or ResolvedColumn to order by. + // Not populated if selecting by ordinal, i.e. <select_list_index> != -1 + // The <order_expression> is originally filled in by ResolveOrderingExprs. + // AddColumnsForOrderByExprs resolves those expressions to a specific + // <order_column>, which is then referenced in MakeResolvedOrderByScan. + // std::unique_ptr order_expression; ResolvedColumn order_column; @@ -203,7 +227,7 @@ struct QueryGroupByAndAggregateInfo { // Columns referenced by GROUPING function calls. A GROUPING function call // has a single ResolvedComputedColumn argument per call as well as an // output column to be referenced in column lists. - std::vector> grouping_list; + std::vector> grouping_call_list; // Aggregate function calls that must be computed. // This is built up as expressions are resolved. During expression @@ -254,10 +278,11 @@ struct QueryGroupByAndAggregateInfo { // TODO: Convert this to an encapsulated class. struct SelectColumnState { explicit SelectColumnState( - const ASTExpression* ast_expr_in, IdString alias_in, bool is_explicit_in, - bool has_aggregation_in, bool has_analytic_in, bool has_volatile_in, + const ASTSelectColumn* ast_select_column, IdString alias_in, + bool is_explicit_in, bool has_aggregation_in, bool has_analytic_in, + bool has_volatile_in, std::unique_ptr resolved_expr_in) - : ast_expr(ast_expr_in), + : ast_expr(ast_select_column->expression()), alias(alias_in), is_explicit(is_explicit_in), select_list_position(-1), @@ -282,6 +307,7 @@ struct SelectColumnState { // Returns a multi-line debug string, where each line is prefixed by <indent>. std::string DebugString(absl::string_view indent = "") const; + // The expression for this selected column.
// Points at the * if this came from SELECT *. const ASTExpression* ast_expr; @@ -308,6 +334,12 @@ struct SelectColumnState { // and will be set to NULL. std::unique_ptr resolved_expr; + // Unowned ResolvedExpr for this SELECT list item's original expression before + // it's updated to be a reference to a computed expression. + // It's only used if the `resolved_expr` is updated to a computed column. + // TODO: b/325532418 - propagate this as `resolved_expr` unconditionally + const ResolvedExpr* original_resolved_expr = nullptr; + // References the related ResolvedComputedColumn for this SELECT list column, // if one is needed. Otherwise it is NULL. The referenced // ResolvedComputedColumn is owned by a column list in QueryResolutionInfo. @@ -359,7 +391,7 @@ class SelectColumnStateList { // not change any scoping behavior except for the final check in strict mode // that may raise an error. For more information, please see the beginning of // (broken link). - void AddSelectColumn(const ASTExpression* ast_expr, IdString alias, + void AddSelectColumn(const ASTSelectColumn* ast_select_column, IdString alias, bool is_explicit, bool has_aggregation, bool has_analytic, bool has_volatile, std::unique_ptr resolved_expr); @@ -517,10 +549,24 @@ class QueryResolutionInfo { return !group_by_info_.grouping_set_list.empty(); } + // Returns whether the query contains GROUPING function calls. + // grouping_call_list and grouping_output_columns are populated at different + // stages, so we check them both. + bool HasGroupingCall() const { + return !group_by_info_.grouping_call_list.empty() || + !group_by_info_.grouping_output_columns.empty(); + } + // Returns whether or not the query includes analytic functions. bool HasAnalytic() const; - absl::Status CheckComputedColumnListsAreEmpty(); + // Returns whether or not the query includes aggregate functions. + bool HasAggregation() const; + + // Sets the boolean indicating the query contains an aggregation function.
+ void SetHasAggregation(bool value); + + absl::Status CheckComputedColumnListsAreEmpty() const; void set_is_post_distinct(bool is_post_distinct) { group_by_info_.is_post_distinct = is_post_distinct; @@ -547,15 +593,15 @@ class QueryResolutionInfo { } std::vector> - release_grouping_columns_list() { + release_grouping_call_list() { std::vector> tmp; - group_by_info_.grouping_list.swap(tmp); + group_by_info_.grouping_call_list.swap(tmp); return tmp; } const std::vector>& grouping_columns_list() const { - return group_by_info_.grouping_list; + return group_by_info_.grouping_call_list; } const std::vector>& @@ -596,7 +642,6 @@ class QueryResolutionInfo { const std::vector& order_by_item_info() { return order_by_item_info_; } - std::vector* mutable_order_by_item_info() { return &order_by_item_info_; } @@ -766,6 +811,8 @@ class QueryResolutionInfo { // columns if the query has GROUP BY or aggregation, and either HAVING or // ORDER BY is present in the query. This list only contains SELECT columns // that do not themselves include aggregation. + // Note that the computed columns might or might not be used by other parts + // of the query like GROUP BY, HAVING, ORDER BY, etc. std::vector> select_list_columns_to_compute_before_aggregation_; @@ -832,7 +879,7 @@ class QueryResolutionInfo { // ORDER BY information. bool has_order_by_ = false; - // List of ORDER BY information. + // List of items from ORDER BY. std::vector order_by_item_info_; // DML THEN RETURN information, where it also uses the select list.
diff --git a/zetasql/analyzer/resolver.cc b/zetasql/analyzer/resolver.cc index 83542a157..2daac6045 100644 --- a/zetasql/analyzer/resolver.cc +++ b/zetasql/analyzer/resolver.cc @@ -237,7 +237,7 @@ Resolver::MakeResolvedLiteralWithoutLocation(const Value& value) { absl::Status Resolver::AddAdditionalDeprecationWarningsForCalledFunction( const ASTNode* ast_location, const FunctionSignature& signature, - const std::string& function_name, bool is_tvf) { + absl::string_view function_name, bool is_tvf) { std::set warning_kinds_seen; for (const FreestandingDeprecationWarning& warning : signature.AdditionalDeprecationWarnings()) { @@ -466,7 +466,7 @@ ResolvedColumnList Resolver::ConcatColumnLists( ResolvedColumnList Resolver::ConcatColumnListWithComputedColumnsAndSort( const ResolvedColumnList& column_list, - const std::vector>& + absl::Span> computed_columns) { ResolvedColumnList out = column_list; for (const std::unique_ptr& computed_column : @@ -818,6 +818,7 @@ absl::Status Resolver::ResolveHintOrOptionAndAppend( const ASTIdentifier* ast_name, HintOrOptionType hint_or_option_type, const AllowedHintsAndOptions& allowed, const NameScope* from_name_scope, ASTOptionsEntry::AssignmentOp option_assignment_op, + bool allow_alter_array_operators, std::vector>* option_list) { RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); ZETASQL_RET_CHECK(ast_name != nullptr); @@ -843,7 +844,7 @@ absl::Status Resolver::ResolveHintOrOptionAndAppend( ZETASQL_ASSIGN_OR_RETURN(auto option_properties, GetHintOrOptionProperties(allowed, qualifier, ast_name, name, hint_or_option_type)); - auto [expected_type, resolving_kind, allow_alter_array] = + auto [expected_type, resolving_kind, option_allow_alter_array] = std::move(option_properties); switch (resolving_kind) { case AllowedHintsAndOptionsProto::OptionProto:: @@ -928,7 +929,7 @@ absl::Status Resolver::ResolveHintOrOptionAndAppend( if (hint_or_option_type == HintOrOptionType::Option && option_assignment_op != ASTOptionsEntry::AssignmentOp::ASSIGN) { - if 
(!allow_alter_array) { + if (!allow_alter_array_operators || !option_allow_alter_array) { return MakeSqlErrorAt(ast_name) << "Operators '+=' and '-=' are not allowed for option " << name; } @@ -990,7 +991,8 @@ absl::Status Resolver::ResolveHintAndAppend( ast_hint_entry->value(), ast_hint_entry->qualifier(), ast_hint_entry->name(), HintOrOptionType::Hint, analyzer_options_.allowed_hints_and_options(), - /*from_name_scope=*/nullptr, ASTOptionsEntry::ASSIGN, hints)); + /*from_name_scope=*/nullptr, ASTOptionsEntry::ASSIGN, + /*allow_alter_array_operators=*/false, hints)); } return absl::OkStatus(); @@ -1098,7 +1100,7 @@ absl::Status Resolver::ResolveTableAndColumnInfoList( } absl::Status Resolver::ResolveOptionsList( - const ASTOptionsList* options_list, + const ASTOptionsList* options_list, bool allow_alter_array_operators, std::vector>* resolved_options) { // Function arguments are never resolved inside options. Sanity check to make // sure none are accidentally in scope. @@ -1122,7 +1124,7 @@ absl::Status Resolver::ResolveOptionsList( options_entry->name(), HintOrOptionType::Option, analyzer_options_.allowed_hints_and_options(), /*from_name_scope=*/nullptr, options_entry->assignment_op(), - resolved_options)); + allow_alter_array_operators, resolved_options)); } } return absl::OkStatus(); @@ -1163,7 +1165,8 @@ absl::Status Resolver::ResolveAnonymizationOptionsList( options_entry->value(), /*ast_qualifier=*/nullptr, options_entry->name(), option_type, analyzer_options_.allowed_hints_and_options(), &from_name_scope, - options_entry->assignment_op(), resolved_options)); + options_entry->assignment_op(), + /*allow_alter_array_operators=*/false, resolved_options)); } // Validate that if epsilon is specified, then only at most one of delta or @@ -1282,7 +1285,8 @@ absl::Status Resolver::ResolveAggregationThresholdOptionsList( options_entry->value(), /*ast_qualifier=*/nullptr, options_entry->name(), HintOrOptionType::AggregationThresholdOption, 
analyzer_options_.allowed_hints_and_options(), &from_name_scope, - options_entry->assignment_op(), resolved_options)); + options_entry->assignment_op(), + /*allow_alter_array_operators=*/false, resolved_options)); } // Validate that at most one of the options max_groups_contributed, @@ -1327,7 +1331,7 @@ absl::Status Resolver::ResolveAnonWithReportOptionsList( options_entry->value(), /*ast_qualifier=*/nullptr, options_entry->name(), HintOrOptionType::Option, allowed_report_options, /*from_name_scope=*/nullptr, options_entry->assignment_op(), - resolved_options)); + /*allow_alter_array_operators=*/false, resolved_options)); ZETASQL_RET_CHECK_EQ(resolved_options->size(), 1); ZETASQL_RET_CHECK( @@ -1880,7 +1884,7 @@ void Resolver::RecordColumnAccess( } void Resolver::RecordColumnAccess( - const std::vector& columns, + absl::Span columns, ResolvedStatement::ObjectAccess access_flags) { for (const ResolvedColumn& column : columns) { RecordColumnAccess(column, access_flags); diff --git a/zetasql/analyzer/resolver.h b/zetasql/analyzer/resolver.h index f9e8f9ad6..a6c9e8972 100644 --- a/zetasql/analyzer/resolver.h +++ b/zetasql/analyzer/resolver.h @@ -121,6 +121,10 @@ class Resolver { AggregationThresholdOption, }; + enum class OrderBySimpleMode { + kNormal, + }; + struct GeneratedColumnIndexAndResolvedId { int column_index; int resolved_column_id; @@ -212,8 +216,7 @@ class Resolver { // checks that the query in the SQL body returns one column of a type that is // equal or implicitly coercible to the value-table type. 
absl::Status CheckSQLBodyReturnTypesAndCoerceIfNeeded( - const ASTNode* statement_location, - const TVFRelation& return_tvf_relation, + const ASTNode* statement_location, const TVFRelation& return_tvf_relation, const NameList* tvf_body_name_list, std::unique_ptr* resolved_query, std::vector>* @@ -288,7 +291,7 @@ class Resolver { ABSL_DEPRECATED( "Use CoerceExprToType function with argument.") absl::Status CoerceExprToType( - const ASTNode* ast_location, const Type* target_type, CoercionMode mode, + const ASTNode* ast_location, const Type* target_type, CoercionMode mode, std::unique_ptr* resolved_expr) const; // Same as the previous method but is used to contain @@ -445,6 +448,8 @@ class Resolver { static const IdString& kArrayOffsetId; static const IdString& kLambdaArgId; static const IdString& kWithActionId; + static const IdString& kRecursionDepthAlias; + static const IdString& kRecursionDepthId; // Input SQL query text. Set before resolving a statement, expression or // type. @@ -740,7 +745,7 @@ class Resolver { // corresponding to 'signature'. absl::Status AddAdditionalDeprecationWarningsForCalledFunction( const ASTNode* ast_location, const FunctionSignature& signature, - const std::string& function_name, bool is_tvf); + absl::string_view function_name, bool is_tvf); // Adds a deprecation warning pointing at . If // is non-NULL, it is added to the new deprecation warning as an ErrorSource. @@ -751,8 +756,8 @@ class Resolver { const std::string& message, const FreestandingDeprecationWarning* source_warning = nullptr); - static ResolvedColumnList ConcatColumnLists( - const ResolvedColumnList& left, const ResolvedColumnList& right); + static ResolvedColumnList ConcatColumnLists(const ResolvedColumnList& left, + const ResolvedColumnList& right); // Appends the ResolvedColumns in to those in // , returning a new ResolvedColumnList. The returned @@ -761,7 +766,7 @@ class Resolver { // the result plan better against the pre-refactoring plans. 
static ResolvedColumnList ConcatColumnListWithComputedColumnsAndSort( const ResolvedColumnList& column_list, - const std::vector>& + absl::Span> computed_columns); // Returns the alias of the given column (if not internal). Otherwise returns @@ -1025,7 +1030,7 @@ class Resolver { absl::Status ResolveForeignKeys( absl::Span ast_table_elements, const ColumnIndexMap& column_indexes, - const std::vector>& + absl::Span> column_definitions, std::set* constraint_names, @@ -1081,8 +1086,8 @@ class Resolver { // Resolve a CREATE INDEX statement. absl::Status ResolveCreateIndexStatement( - const ASTCreateIndexStatement* ast_statement, - std::unique_ptr* output); + const ASTCreateIndexStatement* ast_statement, + std::unique_ptr* output); // Validates 'resolved_expr' on an index key or storing clause of an index. // @@ -1145,6 +1150,11 @@ class Resolver { const ASTCreateSchemaStatement* ast_statement, std::unique_ptr* output); + // Resolves a CREATE EXTERNAL SCHEMA statement. + absl::Status ResolveCreateExternalSchemaStatement( + const ASTCreateExternalSchemaStatement* ast_statement, + std::unique_ptr* output); + // Resolves a CREATE VIEW statement absl::Status ResolveCreateViewStatement( const ASTCreateViewStatement* ast_statement, @@ -1570,6 +1580,10 @@ class Resolver { const ASTAlterSchemaStatement* ast_statement, std::unique_ptr* output); + absl::Status ResolveAlterExternalSchemaStatement( + const ASTAlterExternalSchemaStatement* ast_statement, + std::unique_ptr* output); + absl::Status ResolveAlterTableStatement( const ASTAlterTableStatement* ast_statement, std::unique_ptr* output); @@ -1606,7 +1620,8 @@ class Resolver { const ASTAnalyzeStatement* ast_statement, std::unique_ptr* output); - absl::Status ResolveAssertStatement(const ASTAssertStatement* ast_statement, + absl::Status ResolveAssertStatement( + const ASTAssertStatement* ast_statement, std::unique_ptr* output); // Resolve an ASTQuery ignoring its ASTWithClause. 
This is only called from @@ -1653,6 +1668,18 @@ class Resolver { absl::StatusOr> ResolveAliasedQuery( const ASTAliasedQuery* with_entry, bool recursive); + // Validates the to the with entry with alias . + // is true only when a WITH entry is actually recursive, as + // opposed to merely belonging to a WITH clause with the RECURSIVE keyword. + absl::Status ValidateAliasedQueryModifiers( + IdString query_alias, const ASTAliasedQueryModifiers* modifiers, + bool is_recursive); + + // Resolves the depth modifier to a recursive WITH entry. + absl::StatusOr> + ResolveRecursionDepthModifier( + const ASTRecursionDepthModifier* recursion_depth_modifier); + // Called only for the query associated with an actually-recursive WITH // entry. Verifies that the query is a UNION and returns the ASTSetOperation // node representing that UNION. @@ -1729,6 +1756,14 @@ class Resolver { const std::shared_ptr& from_clause_name_list, std::shared_ptr* output_name_list); + // Check that `select` has no child nodes present other than those + // in `allowed_children`. + // `node_context` is a name for the error message. + absl::Status CheckForUnwantedSelectClauseChildNodes( + const ASTSelect* select, + absl::flat_hash_set allowed_children, + const char* node_context); + // Resolves TableDataSource to a ResolvedScan for copy or clone operation. absl::Status ResolveDataSourceForCopyOrClone( const ASTTableDataSource* data_source, @@ -1817,9 +1852,8 @@ class Resolver { // generate a ResolvedFilterScan for it. The will be // wrapped with this new ResolvedFilterScan. absl::Status ResolveWhereClauseAndCreateScan( - const ASTWhereClause* where_clause, - const NameScope* from_scan_scope, - std::unique_ptr* current_scan); + const ASTWhereClause* where_clause, const NameScope* from_scan_scope, + std::unique_ptr* current_scan); // Performs first pass analysis on the SELECT list expressions. 
This // pass includes star and dot-star expansion, and resolves expressions @@ -1863,16 +1897,14 @@ class Resolver { // expressions against GROUP BY scope if necessary. After this pass, each // SelectColumnState has an initialized output ResolvedColumn. absl::Status ResolveSelectListExprsSecondPass( - IdString query_alias, - const NameScope* group_by_scope, + IdString query_alias, const NameScope* group_by_scope, std::shared_ptr* final_project_name_list, QueryResolutionInfo* query_resolution_info); // Performs second pass analysis on a SELECT list expression, as indicated // by . absl::Status ResolveSelectColumnSecondPass( - IdString query_alias, - const NameScope* group_by_scope, + IdString query_alias, const NameScope* group_by_scope, SelectColumnState* select_column_state, std::shared_ptr* final_project_name_list, QueryResolutionInfo* query_resolution_info); @@ -1891,11 +1923,9 @@ class Resolver { // and is used to check that excluded names actually exist. // is the scope for resolving full expressions in REPLACE. absl::Status ResolveSelectStarModifiers( - const ASTNode* ast_location, - const ASTStarModifiers* modifiers, - const NameList* name_list_for_star, - const Type* type_for_star, - const NameScope* scope, + const ASTSelectColumn* ast_select_column, + const ASTStarModifiers* modifiers, const NameList* name_list_for_star, + const Type* type_for_star, const NameScope* scope, QueryResolutionInfo* query_resolution_info, ColumnReplacements* column_replacements); @@ -1904,7 +1934,7 @@ class Resolver { // . // can be ASTStar or ASTStarWithModifiers. absl::Status ResolveSelectStar( - const ASTExpression* ast_select_expr, + const ASTSelectColumn* ast_select_column, const std::shared_ptr& from_clause_name_list, const NameScope* from_scan_scope, QueryResolutionInfo* query_resolution_info); @@ -1919,10 +1949,9 @@ class Resolver { // be added to to materialize the struct/proto before // extracting its fields. // can be ASTStar or ASTStarWithModifiers. 
- absl::Status ResolveSelectDotStar( - const ASTExpression* ast_dotstar, - const NameScope* from_scan_scope, - QueryResolutionInfo* query_resolution_info); + absl::Status ResolveSelectDotStar(const ASTSelectColumn* ast_select_column, + const NameScope* from_scan_scope, + QueryResolutionInfo* query_resolution_info); // Adds all fields of the column referenced by `src_column_ref` to // `select_column_state_list`, like we do for 'SELECT column.*'. @@ -1934,7 +1963,7 @@ class Resolver { // then if `column_alias_if_no_fields` is non-empty, emits the column itself, // and otherwise returns an error. absl::Status AddColumnFieldsToSelectList( - const ASTExpression* ast_expression, + const ASTSelectColumn* ast_select_column, const ResolvedColumnRef* src_column_ref, bool src_column_has_aggregation, bool src_column_has_analytic, bool src_column_has_volatile, IdString column_alias_if_no_fields, @@ -1945,7 +1974,7 @@ class Resolver { // Add all columns in into , optionally // excluding value table fields that have been marked as excluded. absl::Status AddNameListToSelectList( - const ASTExpression* ast_expression, + const ASTSelectColumn* ast_select_column, const std::shared_ptr& name_list, const CorrelatedColumnsSetList& correlated_columns_set_list, bool ignore_excluded_value_table_fields, @@ -1970,8 +1999,7 @@ class Resolver { // Updates with the mapping between pre-distinct and // post-distinct versions of columns. absl::Status ResolveSelectDistinct( - const ASTSelect* select, - SelectColumnStateList* select_column_state_list, + const ASTSelect* select, SelectColumnStateList* select_column_state_list, const NameList* input_name_list, std::unique_ptr* current_scan, QueryResolutionInfo* query_resolution_info, @@ -2058,8 +2086,8 @@ class Resolver { // will be updated to point at the wrapper // ResolvedAnalyticScan. 
absl::Status AddAnalyticScan( - QueryResolutionInfo* query_resolution_info, - std::unique_ptr* current_scan); + QueryResolutionInfo* query_resolution_info, + std::unique_ptr* current_scan); // Create a new scan wrapping converting it to a struct type. // If is NULL, convert to a new anonymous struct type. @@ -2384,6 +2412,25 @@ std::unique_ptr* output, std::shared_ptr* output_name_list); + // Handles the recursion depth modifier for a recursive scan. + // - <ast_location>: the corresponding AST of the depth modifier; + // - <recursive_alias>: the name of the alias used in the query to + // refer to the recursive table reference. + // - <depth_modifier>: an optional recursion depth modifier; + // - <output>: Receives a scan containing the result. + // - <output_name_list>: Receives a NameList containing the columns of the + // result. + // + // Note that we post-process the recursion depth modifier because the + // intermediate resolution result is also used to resolve the recursive + // reference, to which the recursion depth column is not visible. + absl::Status FinishResolveRecursionWithModifier( + const ASTNode* ast_location, + const std::vector& recursive_alias, + std::unique_ptr depth_modifier, + std::unique_ptr* output, + std::shared_ptr* output_name_list); + private: // Represents the result of resolving one input to the set operation. struct ResolvedInputResult { @@ -2731,10 +2778,11 @@ class Resolver { // Resolves a standalone ORDER BY outside the context of a SELECT. // A ResolvedOrderByScan will be added to `scan`. - absl::Status ResolveOrderBySimple( - const ASTOrderBy* order_by, const NameScope* scope, - const char* clause_name, - std::unique_ptr* scan); + absl::Status ResolveOrderBySimple(const ASTOrderBy* order_by, + const NameScope* scope, + const char* clause_name, + OrderBySimpleMode mode, + std::unique_ptr* scan); // Resolves the table name and predicate expression in an ALTER ROW POLICY // or CREATE ROW POLICY statement.
@@ -2752,7 +2800,7 @@ class Resolver { // If the ORDER BY expression is not a column reference or is an outer // reference, then create a ResolvedComputedColumn and insert it into // . - void AddColumnsForOrderByExprs( + absl::Status AddColumnsForOrderByExprs( IdString query_alias, std::vector* order_by_info, std::vector>* computed_columns); @@ -2770,16 +2818,23 @@ class Resolver { // Resolves the given LIMIT or OFFSET clause and stores the // resolved expression in . absl::Status ResolveLimitOrOffsetExpr( - const ASTExpression* ast_expr, - const char* clause_name, + const ASTExpression* ast_expr, const char* clause_name, ExprResolutionInfo* expr_resolution_info, - std::unique_ptr* resolved_expr); + std::unique_ptr* expr); - // Resolve LIMIT and OFFSET clause and add a ResolvedFilterScan onto `scan`. + // Resolves LIMIT and OFFSET clause and adds a ResolvedFilterScan onto `scan`. + // Accepts clauses from ASTLimitOffset. Requires LIMIT to be present. absl::Status ResolveLimitOffsetScan( - const ASTLimitOffset* limit_offset, + const ASTLimitOffset* limit_offset, const NameScope* name_scope, std::unique_ptr* scan); + // Resolves LIMIT and OFFSET clause and adds a ResolvedFilterScan onto `scan`. + // Accepts LIMIT and OFFSET clauses separately. Does not require LIMIT to be + // present. + absl::Status ResolveLimitOffsetScan( + const ASTExpression* limit, const ASTExpression* offset, + const NameScope* name_scope, std::unique_ptr* scan); + // Translates the enum representing an IGNORE NULLS or RESPECT NULLS modifier. ResolvedNonScalarFunctionCallBase::NullHandlingModifier ResolveNullHandlingModifier( @@ -2910,10 +2965,8 @@ class Resolver { // includes all names visible in plus // names earlier in the same FROM clause that are visible. 
absl::Status ResolveTableExpression( - const ASTTableExpression* table_expr, - const NameScope* external_scope, - const NameScope* local_scope, - std::unique_ptr* output, + const ASTTableExpression* table_expr, const NameScope* external_scope, + const NameScope* local_scope, std::unique_ptr* output, std::shared_ptr* output_name_list); // Table referenced through a path expression. @@ -2933,17 +2986,13 @@ class Resolver { // comprise a single-part name with exactly one element. The is // optional and may be NULL. absl::Status ResolvePathExpressionAsFunctionTableArgument( - const ASTPathExpression* path_expr, - const ASTHint* hint, - IdString alias, - const ASTNode* ast_location, - std::unique_ptr* output, + const ASTPathExpression* path_expr, const ASTHint* hint, IdString alias, + const ASTNode* ast_location, std::unique_ptr* output, std::shared_ptr* output_name_list); // Table referenced through a subquery. absl::Status ResolveTableSubquery( - const ASTTableSubquery* table_ref, - const NameScope* scope, + const ASTTableSubquery* table_ref, const NameScope* scope, std::unique_ptr* output, std::shared_ptr* output_name_list); @@ -2977,12 +3026,10 @@ class Resolver { NameList* output_name_list, std::unique_ptr* join_condition); - absl::Status ResolveJoin( - const ASTJoin* join, - const NameScope* external_scope, - const NameScope* local_scope, - std::unique_ptr* output, - std::shared_ptr* output_name_list); + absl::Status ResolveJoin(const ASTJoin* join, const NameScope* external_scope, + const NameScope* local_scope, + std::unique_ptr* output, + std::shared_ptr* output_name_list); absl::Status ResolveJoinRhs( const ASTJoin* join, const NameScope* external_scope, @@ -2995,7 +3042,7 @@ class Resolver { absl::Status AddScansForJoin( const ASTJoin* join, std::unique_ptr resolved_lhs, std::unique_ptr resolved_rhs, - ResolvedJoinScan::JoinType resolved_join_type, + ResolvedJoinScan::JoinType resolved_join_type, bool has_using, std::unique_ptr join_condition, std::vector> 
computed_columns, @@ -3003,8 +3050,7 @@ class Resolver { absl::Status ResolveParenthesizedJoin( const ASTParenthesizedJoin* parenthesized_join, - const NameScope* external_scope, - const NameScope* local_scope, + const NameScope* external_scope, const NameScope* local_scope, std::unique_ptr* output, std::shared_ptr* output_name_list); @@ -3105,8 +3151,7 @@ class Resolver { absl::Status CoerceOrRearrangeTVFRelationArgColumns( const FunctionArgumentType& tvf_signature_arg, int arg_idx, const SignatureMatchResult& signature_match_result, - const ASTNode* ast_location, - ResolvedTVFArg* resolved_tvf_arg); + const ASTNode* ast_location, ResolvedTVFArg* resolved_tvf_arg); // Resolve a column in the USING clause on one side of the join. // is "left" or "right", for error messages. @@ -3162,6 +3207,10 @@ class Resolver { const NameScope* scope, std::unique_ptr* output, std::shared_ptr* output_name_list); + // Resolve ASTNullOrder to the enum, checking LanguageFeatures. + absl::StatusOr ResolveNullOrderMode( + const ASTNullOrder* null_order); + // Performs initial resolution of ordering expressions, and distinguishes // between select list ordinals and other resolved expressions. // The OrderByInfo in ->query_resolution_info is @@ -3171,27 +3220,35 @@ class Resolver { ExprResolutionInfo* expr_resolution_info, std::vector* order_by_info); - // Resolves the into , which is - // used for resolving both select ORDER BY clause and ORDER BY arguments - // in the aggregate functions. + // A list of `const vector&`. Used because the object is + // not copyable, for passing lists of one or more of those vectors using + // {vector1, vector2} syntax. + typedef const std::vector< + std::reference_wrapper>> + OrderByItemInfoVectorList; + + // Resolves into , which is + // used for resolving both ORDER BY clauses and ORDER BY arguments in + // aggregate functions. + // // Validation is performed to ensure that the ORDER BY expression result // types support ordering. 
For resolving select ORDER BY clause, ensures // that the select list ordinal references are within bounds. // The returned ResolvedOrderByItem objects are stored in // . absl::Status ResolveOrderByItems( - const ASTOrderBy* order_by, const std::vector& output_column_list, - const std::vector& order_by_info, + const OrderByItemInfoVectorList& order_by_info_lists, std::vector>* resolved_order_by_items); - // Make a ResolvedOrderByScan from the , adding onto . - // Any hints associated with are resolved. + // Make a ResolvedOrderByScan from the , adding onto + // . + // Hints are included if is non-null. absl::Status MakeResolvedOrderByScan( - const ASTOrderBy* order_by, + const ASTHint* order_by_hint, const std::vector& output_column_list, - const std::vector& order_by_info, + const OrderByItemInfoVectorList& order_by_info_lists, std::unique_ptr* scan); // Make a ResolvedColumnRef for . Caller owns the returned object. @@ -3493,8 +3550,7 @@ class Resolver { absl::Status ResolveExtensionFieldAccess( std::unique_ptr resolved_lhs, const ResolveExtensionFieldOptions& options, - const ASTPathExpression* ast_path_expr, - FlattenState* flatten_state, + const ASTPathExpression* ast_path_expr, FlattenState* flatten_state, std::unique_ptr* resolved_expr_out); absl::Status ResolveOneofCase( @@ -3551,6 +3607,11 @@ class Resolver { ExprResolutionInfo* expr_resolution_info, std::unique_ptr* resolved_expr_out); + absl::Status ResolveLikeExprList( + const ASTLikeExpression* like_expr, + ExprResolutionInfo* expr_resolution_info, + std::unique_ptr* resolved_expr_out); + absl::Status ResolveLikeExprSubquery( const ASTLikeExpression* like_subquery_expr, ExprResolutionInfo* expr_resolution_info, @@ -3680,8 +3741,7 @@ class Resolver { // string, returns an error. is used for formatting error // messages. 
absl::Status ResolveFormatOrTimeZoneExpr( - const ASTExpression* expr, - ExprResolutionInfo* expr_resolution_info, + const ASTExpression* expr, ExprResolutionInfo* expr_resolution_info, const char* clause_name, std::unique_ptr* resolved_expr); @@ -3815,9 +3875,9 @@ class Resolver { std::unique_ptr* resolved_expr_out); absl::Status ResolveExtractExpression( - const ASTExtractExpression* extract_expression, - ExprResolutionInfo* expr_resolution_info, - std::unique_ptr* resolved_expr_out); + const ASTExtractExpression* extract_expression, + ExprResolutionInfo* expr_resolution_info, + std::unique_ptr* resolved_expr_out); absl::Status ResolveNewConstructor( const ASTNewConstructor* ast_new_constructor, @@ -3844,6 +3904,23 @@ class Resolver { ExprResolutionInfo* expr_resolution_info, std::unique_ptr* resolved_expr_out); + absl::Status ResolveStructBracedConstructor( + const ASTStructBracedConstructor* ast_struct_braced_constructor, + const Type* inferred_type, ExprResolutionInfo* expr_resolution_info, + std::unique_ptr* resolved_expr_out); + + absl::Status ResolveBracedConstructorForStruct( + const ASTBracedConstructor* ast_braced_constructor, bool is_bare_struct, + const ASTNode* expression_location_node, + const ASTStructType* ast_struct_type, const Type* inferred_type, + ExprResolutionInfo* expr_resolution_info, + std::unique_ptr* resolved_expr_out); + + absl::Status ResolveBracedConstructorForProto( + const ASTBracedConstructor* ast_braced_constructor, + const Type* inferred_type, ExprResolutionInfo* expr_resolution_info, + std::unique_ptr* resolved_expr_out); + absl::Status ResolveArrayConstructor( const ASTArrayConstructor* ast_array_constructor, const Type* inferred_type, ExprResolutionInfo* expr_resolution_info, @@ -3884,8 +3961,9 @@ class Resolver { absl::Status ResolveStructConstructorImpl( const ASTNode* ast_location, const ASTStructType* ast_struct_type, absl::Span ast_field_expressions, - absl::Span ast_field_aliases, - const Type* inferred_type, 
ExprResolutionInfo* expr_resolution_info, + absl::Span ast_field_identifiers, + const Type* inferred_type, bool require_name_match, + ExprResolutionInfo* expr_resolution_info, std::unique_ptr* resolved_expr_out); // If is not null, sets it to the resolved date part. @@ -4136,8 +4214,7 @@ class Resolver { // Common implementation for resolving a single argument of all expressions. // Pushes the related ResolvedExpr onto . absl::Status ResolveExpressionArgument( - const ASTExpression* arg, - ExprResolutionInfo* expr_resolution_info, + const ASTExpression* arg, ExprResolutionInfo* expr_resolution_info, std::vector>* resolved_arguments); // Common implementation for resolving the children of all expressions. @@ -4318,11 +4395,16 @@ class Resolver { // FROM_NAME_SCOPE_IDENTIFIER. It cannot be null when the option resolved has // AllowedOptionProperties::resolving_kind == FROM_NAME_SCOPE_IDENTIFIER in // AllowedHintsAndOptions. Otherwise it can be null. + // indicates whether the statement allows using the "+=" and "-=" + // operators on array-typed options. For example, + // using those operators on an option makes sense in ALTER statements, + // but not in CREATE statements. absl::Status ResolveHintOrOptionAndAppend( const ASTExpression* ast_value, const ASTIdentifier* ast_qualifier, const ASTIdentifier* ast_name, HintOrOptionType hint_or_option_type, const AllowedHintsAndOptions& allowed, const NameScope* from_name_scope, ASTOptionsEntry::AssignmentOp option_assignment_op, + bool allow_alter_array_operators, std::vector>* option_list); // Resolve and add entries into . @@ -4338,8 +4420,10 @@ class Resolver { // Resolve and add the options onto // as ResolvedHints. + indicates whether the += and -= operators + should be allowed.
absl::Status ResolveOptionsList( - const ASTOptionsList* options_list, + const ASTOptionsList* options_list, bool allow_alter_array_operators, std::vector>* resolved_options); // Resolve and add the entry into @@ -4530,6 +4614,14 @@ class Resolver { absl::Status MaybeResolveCollationForSubqueryExpr( const ASTNode* error_location, ResolvedSubqueryExpr* subquery_expr); + // Helper to resolve [NOT] LIKE ANY|SOME|ALL expressions when used with IN + // list or UNNEST array. + absl::Status ResolveLikeAnyAllExpressionHelper( + const ASTLikeExpression* like_expr, + absl::Span arguments, + ExprResolutionInfo* expr_resolution_info, + std::unique_ptr* resolved_expr_out); + void FetchCorrelatedSubqueryParameters( const CorrelatedColumnsSet& correlated_columns_set, std::vector>* parameters); @@ -4586,12 +4678,12 @@ class Resolver { // with any existing access. If analyzer_options_.prune_unused_columns is // true, columns without any recorded access will be removed from the // table_scan(). - void RecordColumnAccess(const ResolvedColumn& column, - ResolvedStatement::ObjectAccess access_flags = - ResolvedStatement::READ); - void RecordColumnAccess(const std::vector& columns, - ResolvedStatement::ObjectAccess access_flags = - ResolvedStatement::READ); + void RecordColumnAccess( + const ResolvedColumn& column, + ResolvedStatement::ObjectAccess access_flags = ResolvedStatement::READ); + void RecordColumnAccess( + absl::Span columns, + ResolvedStatement::ObjectAccess access_flags = ResolvedStatement::READ); // For all ResolvedScan nodes under , prune the column_lists to remove // any columns not included in referenced_columns_. This removes any columns @@ -4913,13 +5005,48 @@ class Resolver { const ASTUnnestExpression* unnest_expr, absl::string_view expression_type) const; - // Validates the given named argument `array_zip_mode`: - // - If the given `array_zip_mode` is nullptr, returns OK. - // - Otherwise, returns an SQL error that `array_zip_mode` is not implemented. 
- // TODO: Update this function with checks on the `array_zip_mode` - // once we are ready to implement this functionality. - absl::Status ValidateArrayZipMode( - const ASTNamedArgument* array_zip_mode) const; + // Information about an array element column's alias name and parse location. + struct UnnestArrayColumnAlias { + IdString alias; + const ASTNode* alias_location; + }; + + // Enforce correct application of table and column aliases of the given + // explicit UNNEST expression `table_ref`. + absl::Status ValidateUnnestAliases(const ASTTablePathExpression* table_ref); + + // Resolve the `mode` argument for the given UNNEST expression and coerce to + // ARRAY_ZIP_MODE enum type if necessary. + absl::StatusOr> ResolveArrayZipMode( + const ASTUnnestExpression* unnest, ExprResolutionInfo* info); + + // Obtain alias name and location for a given array argument in multiway + // UNNEST, whose number of arguments should be greater than 1. + // Returns empty alias if no explicit alias is provided or it's impossible to + // infer alias. The returned alias_location should always be valid. + UnnestArrayColumnAlias GetArrayElementColumnAlias( + const ASTExpressionWithOptAlias* argument); + + // Resolve a single array argument for explicit UNNEST. + absl::Status ResolveArrayArgumentForExplicitUnnest( + const ASTExpressionWithOptAlias* argument, + UnnestArrayColumnAlias& arg_alias, ExprResolutionInfo* info, + std::vector& output_alias_list, + ResolvedColumnList& output_column_list, + std::shared_ptr& output_name_list, + std::vector>& + resolved_array_expr_list, + std::vector& resolved_element_column_list); + + // Resolve N-ary UNNEST operator with N >= 1. 
+ absl::Status ResolveUnnest( + const ASTTablePathExpression* table_ref, ExprResolutionInfo* info, + std::vector& output_alias_list, + ResolvedColumnList& output_column_list, + std::shared_ptr& output_name_list, + std::vector>& + resolved_array_expr_list, + std::vector& resolved_element_column_list); friend class AnalyticFunctionResolver; friend class FunctionResolver; diff --git a/zetasql/analyzer/resolver_alter_stmt.cc b/zetasql/analyzer/resolver_alter_stmt.cc index d4e37f71d..9c16f8239 100644 --- a/zetasql/analyzer/resolver_alter_stmt.cc +++ b/zetasql/analyzer/resolver_alter_stmt.cc @@ -215,7 +215,7 @@ absl::Status Resolver::ResolveAlterActions( std::vector> resolved_options; ZETASQL_RETURN_IF_ERROR(ResolveOptionsList( action->GetAsOrDie()->options_list(), - &resolved_options)); + /*allow_alter_array_operators=*/false, &resolved_options)); alter_actions->push_back( MakeResolvedSetOptionsAction(std::move(resolved_options))); } break; @@ -519,7 +519,8 @@ absl::Status Resolver::ResolveAlterActions( std::vector> resolved_options; ZETASQL_RETURN_IF_ERROR(ResolveOptionsList( - ast_add_sub_entity_action->options_list(), &resolved_options)); + ast_add_sub_entity_action->options_list(), + /*allow_alter_array_operators=*/false, &resolved_options)); auto resolved_add_sub_entity_action = MakeResolvedAddSubEntityAction( ast_add_sub_entity_action->type()->GetAsString(), @@ -584,6 +585,25 @@ absl::Status Resolver::ResolveAlterSchemaStatement( return absl::OkStatus(); } +absl::Status Resolver::ResolveAlterExternalSchemaStatement( + const ASTAlterExternalSchemaStatement* ast_statement, + std::unique_ptr* output) { + bool has_only_set_options_action = true; + std::vector> + resolved_alter_actions; + // path() should never be null here because EXTERNAL SCHEMA is a + // schema_object_kind not a generic_entity_type. 
+ ZETASQL_RET_CHECK(ast_statement->path() != nullptr); + ZETASQL_RETURN_IF_ERROR(ResolveAlterActions(ast_statement, "EXTERNAL SCHEMA", output, + &has_only_set_options_action, + &resolved_alter_actions)); + + *output = MakeResolvedAlterExternalSchemaStmt( + ast_statement->path()->ToIdentifierVector(), + std::move(resolved_alter_actions), ast_statement->is_if_exists()); + return absl::OkStatus(); +} + absl::Status Resolver::ResolveAlterTableStatement( const ASTAlterTableStatement* ast_statement, std::unique_ptr* output) { @@ -617,7 +637,7 @@ for (const ASTAlterAction* const action : action_list->actions()) { ZETASQL_RETURN_IF_ERROR(ResolveOptionsList( action->GetAsOrDie()->options_list(), - &resolved_options)); + /*allow_alter_array_operators=*/false, &resolved_options)); } *output = MakeResolvedAlterTableSetOptionsStmt( alter_statement->name_path(), std::move(resolved_options), @@ -946,8 +966,9 @@ absl::Status Resolver::ResolveAlterColumnOptionsAction( } } std::vector> resolved_options; - ZETASQL_RETURN_IF_ERROR( - ResolveOptionsList(action->options_list(), &resolved_options)); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(action->options_list(), + /*allow_alter_array_operators=*/true, + &resolved_options)); *alter_action = MakeResolvedAlterColumnOptionsAction( action->is_if_exists(), column_name.ToString(), std::move(resolved_options)); diff --git a/zetasql/analyzer/resolver_dml.cc b/zetasql/analyzer/resolver_dml.cc index f0e7bac5b..8866e5bc8 100644 --- a/zetasql/analyzer/resolver_dml.cc +++ b/zetasql/analyzer/resolver_dml.cc @@ -806,6 +806,10 @@ absl::Status Resolver::ResolveInsertStatementImpl( resolved_columns_to_catalog_columns_for_target_scan, &out_topologically_sorted_generated_column_ids, &out_generated_column_expr_list)); + // Mark the REWRITE_INSERT_DML_VALUES rewriter as relevant when the
+ // INSERT statement supplies explicit VALUES rows. + if (!row_list.empty()) { + analyzer_output_properties_.MarkRelevant(REWRITE_INSERT_DML_VALUES); + } } *output = MakeResolvedInsertStmt( diff --git a/zetasql/analyzer/resolver_expr.cc b/zetasql/analyzer/resolver_expr.cc index 78c55ae82..46eee99f8 100644 --- a/zetasql/analyzer/resolver_expr.cc +++ b/zetasql/analyzer/resolver_expr.cc @@ -19,6 +19,7 @@ #include #include +#include #include #include #include @@ -75,6 +76,7 @@ #include "zetasql/public/functions/normalize_mode.pb.h" #include "zetasql/public/functions/range.h" #include "zetasql/public/id_string.h" +#include "zetasql/public/input_argument_type.h" #include "zetasql/public/interval_value.h" #include "zetasql/public/json_value.h" #include "zetasql/public/language_options.h" @@ -83,6 +85,7 @@ #include "zetasql/public/parse_location.h" #include "zetasql/public/proto/type_annotation.pb.h" #include "zetasql/public/proto_util.h" +#include "zetasql/public/select_with_mode.h" #include "zetasql/public/signature_match_result.h" #include "zetasql/public/simple_catalog.h" #include "zetasql/public/sql_function.h" @@ -109,11 +112,13 @@ #include "absl/container/flat_hash_map.h" #include "absl/container/flat_hash_set.h" #include "absl/flags/flag.h" +#include "zetasql/base/check.h" #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/ascii.h" #include "absl/strings/match.h" #include "absl/strings/str_cat.h" +#include "absl/strings/str_format.h" #include "absl/strings/str_join.h" #include "absl/strings/str_replace.h" #include "absl/strings/str_split.h" @@ -201,16 +206,15 @@ absl::Status AddGetFieldToFlatten(std::unique_ptr expr, return absl::OkStatus(); } -std::string GetNodePrefixToken(absl::string_view sql, - const ASTLeaf& leaf_node) { - const ParseLocationRange& parse_location_range = - leaf_node.GetParseLocationRange(); - absl::string_view type_token = absl::StripAsciiWhitespace( - absl::ClippedSubstr(sql, parse_location_range.start().GetByteOffset(),
parse_location_range.end().GetByteOffset() - - parse_location_range.start().GetByteOffset() - - leaf_node.image().size())); - return absl::AsciiStrToUpper(type_token); +static std::string GetTypeNameForPrefixedLiteral(absl::string_view sql, + const ASTLeaf& leaf_node) { + int start_offset = leaf_node.GetParseLocationRange().start().GetByteOffset(); + int end_offset = start_offset; + while (end_offset < sql.size() && std::isalpha(sql[end_offset])) { + end_offset++; + } + return absl::AsciiStrToUpper( + absl::ClippedSubstr(sql, start_offset, end_offset - start_offset)); } inline std::unique_ptr MakeResolvedCast( @@ -252,6 +256,29 @@ absl::Span GetTypeCatalogNamePath(const Type* type) { } return {}; } + +absl::Status MakeUnsupportedGroupingFunctionError( + const ASTFunctionCall* func_call, SelectWithMode mode) { + ABSL_DCHECK(func_call != nullptr); + std::string query_kind; + switch (mode) { + case SelectWithMode::ANONYMIZATION: + query_kind = "anonymization"; + break; + case SelectWithMode::DIFFERENTIAL_PRIVACY: + query_kind = "differential privacy"; + break; + case SelectWithMode::AGGREGATION_THRESHOLD: + query_kind = "aggregation threshold"; + break; + case SelectWithMode::NONE: + default: + return absl::OkStatus(); + } + return MakeSqlErrorAt(func_call) << absl::StrFormat( + "GROUPING function is not supported in %s queries", query_kind); +} + } // namespace absl::Status Resolver::ResolveBuildProto( @@ -356,7 +383,9 @@ absl::Status Resolver::ResolveBuildProto( } return MakeSqlErrorAt(argument.ast_location) << "Could not store value with type " - << GetInputArgumentTypeForExpr(expr.get()) + << GetInputArgumentTypeForExpr( + expr.get(), + /*pick_default_type_for_untyped_expr=*/false) .UserFacingName(product_mode()) << " into proto field " << field->full_name() << " which has SQL type " @@ -718,10 +747,8 @@ absl::Status Resolver::ResolveScalarExpr( absl::StatusOr> Resolver::ResolveJsonLiteral(const ASTJSONLiteral* json_literal) { - std::string unquoted_image; - 
ZETASQL_RETURN_IF_ERROR(ParseStringLiteral(json_literal->image(), &unquoted_image)); auto status_or_value = JSONValue::ParseJSONString( - unquoted_image, + json_literal->string_literal()->string_value(), JSONParsingOptions{ .wide_number_mode = (language().LanguageFeatureEnabled( @@ -1016,6 +1043,12 @@ absl::Status Resolver::ResolveExpr( expr_resolution_info.get(), resolved_expr_out)); break; + case AST_STRUCT_BRACED_CONSTRUCTOR: + ZETASQL_RETURN_IF_ERROR(ResolveStructBracedConstructor( + ast_expr->GetAsOrDie(), inferred_type, + expr_resolution_info.get(), resolved_expr_out)); + break; + case AST_BRACED_CONSTRUCTOR: ZETASQL_RETURN_IF_ERROR(ResolveBracedConstructor( ast_expr->GetAsOrDie(), inferred_type, @@ -1160,6 +1193,13 @@ absl::Status Resolver::ResolveLiteralExpr( case AST_STRING_LITERAL: { const ASTStringLiteral* literal = ast_expr->GetAsOrDie(); + if (literal->components().size() > 1 && + !language().LanguageFeatureEnabled( + FEATURE_V_1_4_LITERAL_CONCATENATION)) { + return MakeSqlErrorAt(literal->components()[1]) + << "Concatenation of subsequent string literals is not " + "supported. Did you mean to use the || operator?"; + } *resolved_expr_out = MakeResolvedLiteral(ast_expr, Value::String(literal->string_value())); return absl::OkStatus(); @@ -1167,6 +1207,13 @@ absl::Status Resolver::ResolveLiteralExpr( case AST_BYTES_LITERAL: { const ASTBytesLiteral* literal = ast_expr->GetAsOrDie(); + if (literal->components().size() > 1 && + !language().LanguageFeatureEnabled( + FEATURE_V_1_4_LITERAL_CONCATENATION)) { + return MakeSqlErrorAt(literal->components()[1]) + << "Concatenation of subsequent bytes literals is not " + "supported. 
Did you mean to use the || operator?"; + } *resolved_expr_out = MakeResolvedLiteral(ast_expr, Value::Bytes(literal->bytes_value())); return absl::OkStatus(); @@ -1200,18 +1247,21 @@ absl::Status Resolver::ResolveLiteralExpr( const ASTNumericLiteral* literal = ast_expr->GetAsOrDie(); if (!language().LanguageFeatureEnabled(FEATURE_NUMERIC_TYPE)) { - std::string error_type_token = GetNodePrefixToken(sql_, *literal); + std::string error_type_token = + GetTypeNameForPrefixedLiteral(sql_, *literal); return MakeSqlErrorAt(literal) << error_type_token << " literals are not supported"; } - std::string unquoted_image; - ZETASQL_RETURN_IF_ERROR(ParseStringLiteral(literal->image(), &unquoted_image)); - auto value_or_status = NumericValue::FromStringStrict(unquoted_image); + + absl::string_view string_value = + literal->string_literal()->string_value(); + auto value_or_status = NumericValue::FromStringStrict(string_value); if (!value_or_status.status().ok()) { - std::string error_type_token = GetNodePrefixToken(sql_, *literal); + std::string error_type_token = + GetTypeNameForPrefixedLiteral(sql_, *literal); return MakeSqlErrorAt(literal) << "Invalid " << error_type_token - << " literal: " << ToStringLiteral(unquoted_image); + << " literal: " << ToStringLiteral(string_value); } *resolved_expr_out = MakeResolvedLiteral( ast_expr, {types::NumericType(), /*annotation_map=*/nullptr}, @@ -1223,18 +1273,20 @@ absl::Status Resolver::ResolveLiteralExpr( const ASTBigNumericLiteral* literal = ast_expr->GetAsOrDie(); if (!language().LanguageFeatureEnabled(FEATURE_BIGNUMERIC_TYPE)) { - std::string error_type_token = GetNodePrefixToken(sql_, *literal); + std::string error_type_token = + GetTypeNameForPrefixedLiteral(sql_, *literal); return MakeSqlErrorAt(literal) << error_type_token << " literals are not supported"; } - std::string unquoted_image; - ZETASQL_RETURN_IF_ERROR(ParseStringLiteral(literal->image(), &unquoted_image)); - auto value_or_status = 
BigNumericValue::FromStringStrict(unquoted_image); + absl::string_view string_value = + literal->string_literal()->string_value(); + auto value_or_status = BigNumericValue::FromStringStrict(string_value); if (!value_or_status.ok()) { - std::string error_type_token = GetNodePrefixToken(sql_, *literal); + std::string error_type_token = + GetTypeNameForPrefixedLiteral(sql_, *literal); return MakeSqlErrorAt(literal) << "Invalid " << error_type_token - << " literal: " << ToStringLiteral(unquoted_image); + << " literal: " << ToStringLiteral(string_value); } *resolved_expr_out = MakeResolvedLiteral( ast_expr, {types::BigNumericType(), /*annotation_map=*/nullptr}, @@ -3335,7 +3387,9 @@ absl::Status Resolver::ResolveInSubquery( // test since if we have two equivalent but different field types // (such as two enums with the same name) we must coerce one to the other. InputArgumentTypeSet type_set; - type_set.Insert(GetInputArgumentTypeForExpr(resolved_in_expr.get())); + type_set.Insert(GetInputArgumentTypeForExpr( + resolved_in_expr.get(), + /*pick_default_type_for_untyped_expr=*/false)); // The output column from the subquery column is non-literal, non-parameter. type_set.Insert(InputArgumentType(in_subquery_type)); const Type* supertype = nullptr; @@ -3440,6 +3494,64 @@ static absl::StatusOr GetLikeAnySomeAllOpTypeString( ZETASQL_RET_CHECK_FAIL() << "Operation type for LIKE must be either ANY, SOME or ALL"; } +static absl::StatusOr GetLikeAnyAllFunctionName( + const ASTLikeExpression* like_expr, bool is_not) { + const bool is_list = like_expr->in_list() != nullptr; + const bool is_array = like_expr->unnest_expr() != nullptr; + switch (like_expr->op()->op()) { + // ANY and SOME are synonyms. + case ASTAnySomeAllOp::kAny: + case ASTAnySomeAllOp::kSome: + if (is_list) { + return is_not ? "$not_like_any" : "$like_any"; + } else if (is_array) { + return is_not ? 
"$not_like_any_array" : "$like_any_array"; } + break; + case ASTAnySomeAllOp::kAll: + if (is_list) { + return is_not ? "$not_like_all" : "$like_all"; + } else if (is_array) { + return is_not ? "$not_like_all_array" : "$like_all_array"; + } + break; + default: + break; + } + + ZETASQL_RET_CHECK_FAIL() << "Unsupported LIKE expression operation." + " Operation must be [NOT] ANY|SOME|ALL."; +} + +absl::Status Resolver::ResolveLikeAnyAllExpressionHelper( + const ASTLikeExpression* like_expr, + const absl::Span arguments, + ExprResolutionInfo* expr_resolution_info, + std::unique_ptr* resolved_expr_out) { + // NOT LIKE ANY/ALL behavior is documented here (broken link) + const bool new_behavior = language().LanguageFeatureEnabled( + FEATURE_V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL); + ZETASQL_ASSIGN_OR_RETURN(std::string function_name, + GetLikeAnyAllFunctionName( + like_expr, like_expr->is_not() && new_behavior)); + std::unique_ptr resolved_like_expr; + ZETASQL_RETURN_IF_ERROR(ResolveFunctionCallByNameWithoutAggregatePropertyCheck( + like_expr->like_location(), function_name, arguments, + *kEmptyArgumentOptionMap, expr_resolution_info, &resolved_like_expr)); + + // Do not wrap like_expr inside `NOT` if the resolved function is a + // $not_like_any|all variant. That's because like_expr's NOT operator is + // already accounted for when constructing the $not_like_any|all resolved + // function node.
+ if (!new_behavior && like_expr->is_not()) { + return MakeNotExpr(like_expr->like_location(), + std::move(resolved_like_expr), expr_resolution_info, + resolved_expr_out); + } + + *resolved_expr_out = std::move(resolved_like_expr); + return absl::OkStatus(); +} + absl::Status Resolver::ResolveLikeExprArray( const ASTLikeExpression* like_expr, ExprResolutionInfo* expr_resolution_info, @@ -3454,29 +3566,27 @@ absl::Status Resolver::ResolveLikeExprArray( op_type, " (pattern1, pattern2, ...)?"); } - std::string function_type = ""; - switch (like_expr->op()->op()) { - case ASTAnySomeAllOp::kAny: - function_type = "$like_any_array"; - break; - case ASTAnySomeAllOp::kSome: - function_type = "$like_any_array"; - break; - case ASTAnySomeAllOp::kAll: - function_type = "$like_all_array"; - break; - default: - ZETASQL_RET_CHECK_FAIL() << "Unsupported LIKE expression operation." - " Operation must be of type ANY|SOME|ALL."; - } - FlattenState::Restorer restorer; expr_resolution_info->flatten_state.set_can_flatten(true, &restorer); - return ResolveFunctionCallByNameWithoutAggregatePropertyCheck( - like_expr->like_location(), function_type, + return ResolveLikeAnyAllExpressionHelper( + like_expr, {like_expr->lhs(), like_expr->unnest_expr()->expressions()[0]->expression()}, - *kEmptyArgumentOptionMap, expr_resolution_info, resolved_expr_out); + expr_resolution_info, resolved_expr_out); +} + +absl::Status Resolver::ResolveLikeExprList( + const ASTLikeExpression* like_expr, + ExprResolutionInfo* expr_resolution_info, + std::unique_ptr* resolved_expr_out) { + std::vector like_arguments; + like_arguments.reserve(1 + like_expr->in_list()->list().size()); + like_arguments.push_back(like_expr->lhs()); + for (const ASTExpression* expr : like_expr->in_list()->list()) { + like_arguments.push_back(expr); + } + return ResolveLikeAnyAllExpressionHelper( + like_expr, like_arguments, expr_resolution_info, resolved_expr_out); } // TODO: The noinline attribute is to prevent the stack usage @@ 
-3497,28 +3607,8 @@ absl::Status Resolver::ResolveLikeExpr( ZETASQL_RETURN_IF_ERROR(ResolveLikeExprSubquery(like_expr, expr_resolution_info, &resolved_like_expr)); } else if (like_expr->in_list() != nullptr) { - std::vector like_arguments; - like_arguments.reserve(1 + like_expr->in_list()->list().size()); - like_arguments.push_back(like_expr->lhs()); - for (const ASTExpression* expr : like_expr->in_list()->list()) { - like_arguments.push_back(expr); - } - std::string function_type = ""; - switch (like_expr->op()->op()) { - case ASTAnySomeAllOp::kAny: - case ASTAnySomeAllOp::kSome: - function_type = "$like_any"; - break; - case ASTAnySomeAllOp::kAll: - function_type = "$like_all"; - break; - default: - ZETASQL_RET_CHECK_FAIL() << "Unsupported LIKE expression operation." - " Operation must be of type ANY|SOME|ALL."; - } - ZETASQL_RETURN_IF_ERROR(ResolveFunctionCallByNameWithoutAggregatePropertyCheck( - like_expr->like_location(), function_type, like_arguments, - *kEmptyArgumentOptionMap, expr_resolution_info, &resolved_like_expr)); + ZETASQL_RETURN_IF_ERROR(ResolveLikeExprList(like_expr, expr_resolution_info, + &resolved_like_expr)); } else if (like_expr->unnest_expr() != nullptr) { ZETASQL_RETURN_IF_ERROR(ResolveLikeExprArray(like_expr, expr_resolution_info, &resolved_like_expr)); @@ -3527,11 +3617,6 @@ absl::Status Resolver::ResolveLikeExpr( << "Internal: Unsupported LIKE expression."; } - if (like_expr->is_not()) { - return MakeNotExpr(like_expr->like_location(), - std::move(resolved_like_expr), expr_resolution_info, - resolved_expr_out); - } *resolved_expr_out = std::move(resolved_like_expr); return absl::OkStatus(); } @@ -4257,9 +4342,11 @@ absl::Status Resolver::ResolveIntervalArgument( if (((resolved_interval_value_arg->node_kind() == RESOLVED_LITERAL || resolved_interval_value_arg->node_kind() == RESOLVED_PARAMETER) && resolved_interval_value_arg->type()->IsString()) || - coercer_.CoercesTo( - GetInputArgumentTypeForExpr(resolved_interval_value_arg.get()), - 
type_factory_->get_int64(), /*is_explicit=*/false, &result)) { + coercer_.CoercesTo(GetInputArgumentTypeForExpr( + resolved_interval_value_arg.get(), + /*pick_default_type_for_untyped_expr=*/false), + type_factory_->get_int64(), /*is_explicit=*/false, + &result)) { ZETASQL_RETURN_IF_ERROR(CoerceExprToType( interval_expr->interval_value(), type_factory_->get_int64(), kExplicitCoercion, &resolved_interval_value_arg)); @@ -4996,16 +5083,21 @@ absl::Status Resolver::AddColumnToGroupingListSecondPass( const ResolvedAggregateFunctionCall* agg_function_call, ExprResolutionInfo* expr_resolution_info, std::unique_ptr* resolved_column_out) { + QueryResolutionInfo* query_resolution_info = + expr_resolution_info->query_resolution_info; if (agg_function_call->argument_list().size() != 1) { return MakeSqlErrorAt(ast_function) << "GROUPING can only have a single expression argument."; } + if (query_resolution_info->select_with_mode() != SelectWithMode::NONE) { + return MakeUnsupportedGroupingFunctionError( + ast_function, query_resolution_info->select_with_mode()); + } const ResolvedExpr* argument = agg_function_call->argument_list().front().get(); for (const std::unique_ptr& resolved_computed_column : - expr_resolution_info->query_resolution_info - ->group_by_columns_to_compute()) { + query_resolution_info->group_by_columns_to_compute()) { ZETASQL_ASSIGN_OR_RETURN( bool expression_match, IsSameExpressionForGroupBy(argument, resolved_computed_column->expr())); @@ -5024,8 +5116,7 @@ absl::Status Resolver::AddColumnToGroupingListSecondPass( *resolved_column_out = std::make_unique(grouping_output_column); - expr_resolution_info->query_resolution_info->AddGroupingColumn( - std::move(grouping_call)); + query_resolution_info->AddGroupingColumn(std::move(grouping_call)); return absl::OkStatus(); } } @@ -5328,9 +5419,10 @@ absl::StatusOr Resolver::CheckExplicitCast( const ResolvedExpr* resolved_argument, const Type* to_type, ExtendedCompositeCastEvaluator* extended_conversion_evaluator) 
{ SignatureMatchResult result; - return coercer_.CoercesTo(GetInputArgumentTypeForExpr(resolved_argument), - to_type, /*is_explicit=*/true, &result, - extended_conversion_evaluator); + return coercer_.CoercesTo( + GetInputArgumentTypeForExpr(resolved_argument, + /*pick_default_type_for_untyped_expr=*/false), + to_type, /*is_explicit=*/true, &result, extended_conversion_evaluator); } static absl::Status CastResolutionError(const ASTNode* ast_location, @@ -5354,7 +5446,6 @@ absl::Status Resolver::ResolveFormatOrTimeZoneExpr( std::unique_ptr* resolved_expr) { ZETASQL_RETURN_IF_ERROR(ResolveExpr(expr, expr_resolution_info, resolved_expr)); - auto expr_type = GetInputArgumentTypeForExpr(resolved_expr->get()); auto make_error_msg = [clause_name](absl::string_view target_type_name, absl::string_view actual_type_name) { return absl::Substitute("$2 should return type $0, but returns $1", @@ -5561,7 +5652,7 @@ absl::Status Resolver::ResolveExplicitCast( folding_result.code() != absl::StatusCode::kOutOfRange) { // Only kInvalidArgument and kOutOfRange indicate a bad value when // folding. Anything else is a bigger problem that we need to bubble - // up. This logic is not to be used elsehwere as there are exceptional + // up. This logic is not to be used elsewhere as there are exceptional // circumstances here: // 1. kOutOfRange generally is a runtime error, but it may arise during // folding, which we are absorbing and deferring to runtime. @@ -6530,19 +6621,83 @@ absl::Status Resolver::ResolveBracedConstructor( const Type* inferred_type, ExprResolutionInfo* expr_resolution_info, std::unique_ptr* resolved_expr_out) { RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); - + // This function is called for bare brace, we don't know whether the brace is + // struct or proto, so we need to derive from the infer_type. 
if (inferred_type == nullptr) { return MakeSqlErrorAt(ast_braced_constructor) << "Unable to infer a type for braced constructor"; } - if (!inferred_type->IsProto()) { - return MakeSqlErrorAt(ast_braced_constructor) - << "Braced constructors are not allowed for type " - << inferred_type->ShortTypeName(product_mode()); + if (inferred_type->IsStruct()) { // Bare braces with an inferred STRUCT type. + return ResolveBracedConstructorForStruct( + ast_braced_constructor, /*is_bare_struct*/ false, + /*expression_location_node*/ ast_braced_constructor, + /*ast_struct_type*/ nullptr, inferred_type, expr_resolution_info, + resolved_expr_out); + } else if (inferred_type->IsProto()) { + return ResolveBracedConstructorForProto(ast_braced_constructor, + inferred_type, expr_resolution_info, + resolved_expr_out); } + // Neither proto nor struct. + return MakeSqlErrorAt(ast_braced_constructor) + << "Braced constructors are not allowed for type " + << inferred_type->ShortTypeName(product_mode()); +} - std::vector arguments; +absl::Status Resolver::ResolveBracedConstructorForStruct( + const ASTBracedConstructor* ast_braced_constructor, bool is_bare_struct, + const ASTNode* expression_location_node, + const ASTStructType* ast_struct_type, const Type* inferred_type, + ExprResolutionInfo* expr_resolution_info, + std::unique_ptr* resolved_expr_out) { + std::vector identifiers; + identifiers.reserve(ast_braced_constructor->fields().size()); + std::vector field_expressions; + field_expressions.reserve(ast_braced_constructor->fields().size()); + bool is_first_field = true; + for (const ASTBracedConstructorField* ast_field : + ast_braced_constructor->fields()) { + // Pure whitespace separation between fields is not allowed in STRUCT. + // comma_separated() records whether this field is separated from the + // previous field by a comma or by pure whitespace. The first field has no + // preceding separator (a leading comma is not allowed), so its + // comma_separated() is always false and it is accepted.
+ if (!is_first_field && !ast_field->comma_separated()) { + return MakeSqlErrorAt(ast_field) + << "STRUCT braced constructors do not allow pure whitespace " + "separation between fields; use a comma instead"; + } + if (is_first_field) { + is_first_field = false; + } + // A proto extension is parsed as a parenthesized_path; otherwise an + // identifier. + if (ast_field->parenthesized_path() != nullptr) { + return MakeSqlErrorAt(ast_field) + << "STRUCT braced constructors do not allow proto " + "extensions."; + } + ZETASQL_RET_CHECK_NE(ast_field->identifier(), nullptr) + << "Fields in a STRUCT braced constructor should always have a " + "single identifier specified, not a path expression"; + identifiers.push_back(ast_field->identifier()); + field_expressions.push_back(ast_field->value()->expression()); + } + // Skip name matching for all bare structs. + // Discussion: http://shortn/_XFezKJYuR7 + ZETASQL_RETURN_IF_ERROR(ResolveStructConstructorImpl( + expression_location_node, ast_struct_type, field_expressions, identifiers, + inferred_type, /*require_name_match=*/!is_bare_struct, + expr_resolution_info, resolved_expr_out)); + return absl::OkStatus(); +} + +absl::Status Resolver::ResolveBracedConstructorForProto( + const ASTBracedConstructor* ast_braced_constructor, + const Type* inferred_type, ExprResolutionInfo* expr_resolution_info, + std::unique_ptr* resolved_expr_out) { + // This is a proto braced constructor.
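Reviewer note: the separator rule enforced here (every field after the first must be comma-separated; the first field's `comma_separated()` is always false and is accepted) can be sketched outside the resolver. A minimal, hypothetical illustration in Python — `fields` is a list of `(name, comma_separated)` pairs, not a real ZetaSQL structure:

```python
# Sketch (not ZetaSQL code): validate that every field after the first in a
# STRUCT braced constructor is comma-separated, mirroring the check above.
def validate_struct_braces(fields):
    errors = []
    for i, (name, comma_separated) in enumerate(fields):
        # The first field has no preceding separator, so its flag is always
        # False and it is accepted without complaint.
        if i > 0 and not comma_separated:
            errors.append(f"field '{name}' must be comma-separated")
    return errors

# A comma-separated field list passes; whitespace separation is rejected.
assert validate_struct_braces([("a", False), ("b", True)]) == []
assert validate_struct_braces([("a", False), ("b", False)]) == [
    "field 'b' must be comma-separated"
]
```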
+ std::vector resolved_build_proto_args; for (int i = 0; i < ast_braced_constructor->fields().size(); ++i) { const ASTBracedConstructorField* ast_field = ast_braced_constructor->fields(i); @@ -6551,13 +6706,13 @@ absl::Status Resolver::ResolveBracedConstructor( ResolvedBuildProtoArg arg, ResolveBracedConstructorField(ast_field, inferred_type->AsProto(), i, expr_resolution_info)); - arguments.emplace_back(std::move(arg)); + resolved_build_proto_args.emplace_back(std::move(arg)); } ZETASQL_RETURN_IF_ERROR( ResolveBuildProto(ast_braced_constructor, inferred_type->AsProto(), /*input_scan=*/nullptr, "Argument", "Constructor", - &arguments, resolved_expr_out)); + &resolved_build_proto_args, resolved_expr_out)); return absl::OkStatus(); } @@ -6591,9 +6746,42 @@ absl::Status Resolver::ResolveBracedNewConstructor( << resolved_type->ShortTypeName(product_mode()); } - ZETASQL_RETURN_IF_ERROR(ResolveExpr(ast_braced_new_constructor->braced_constructor(), - expr_resolution_info, resolved_expr_out, - resolved_type)); + ZETASQL_RETURN_IF_ERROR(ResolveBracedConstructorForProto( + ast_braced_new_constructor->braced_constructor(), resolved_type, + expr_resolution_info, resolved_expr_out)); + + return absl::OkStatus(); +} + +absl::Status Resolver::ResolveStructBracedConstructor( + const ASTStructBracedConstructor* ast_struct_braced_constructor, + const Type* inferred_type, ExprResolutionInfo* expr_resolution_info, + std::unique_ptr* resolved_expr_out) { + RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); + + const ASTType* type_name = ast_struct_braced_constructor->type_name(); + const ASTStructType* ast_struct_type = nullptr; + bool is_bare_struct = true; + if (type_name) { + is_bare_struct = false; + // STRUCT<...> { ... } with explicit type, use the explicit type. 
+      ast_struct_type = type_name->GetAsOrNull(); + ZETASQL_RET_CHECK_NE(ast_struct_type, nullptr) + << "STRUCT braced constructors are not allowed for type " + << type_name->DebugString() << ", which should be a struct."; + ZETASQL_RET_CHECK(type_name->type_parameters() == nullptr) + << "The parser does not support type parameters in braced STRUCT " + "constructor syntax"; + ZETASQL_RET_CHECK(type_name->collate() == nullptr) + << "The parser does not support a type with a collation name in braced " + "STRUCT constructor syntax"; + } + + ZETASQL_RETURN_IF_ERROR(ResolveBracedConstructorForStruct( + ast_struct_braced_constructor->braced_constructor(), is_bare_struct, + ast_struct_braced_constructor, ast_struct_type, inferred_type, + expr_resolution_info, resolved_expr_out)); + return absl::OkStatus(); } @@ -6658,7 +6846,8 @@ absl::Status Resolver::ResolveArrayConstructor( has_explicit_type = true; } - element_type_set.Insert(GetInputArgumentTypeForExpr(resolved_expr.get())); + element_type_set.Insert(GetInputArgumentTypeForExpr( + resolved_expr.get(), /*pick_default_type_for_untyped_expr=*/false)); resolved_elements.push_back(std::move(resolved_expr)); } @@ -6850,8 +7039,8 @@ absl::Status Resolver::ResolveStructConstructorWithParens( return ResolveStructConstructorImpl( ast_struct_constructor, /*ast_struct_type=*/nullptr, ast_struct_constructor->field_expressions(), - /*ast_field_aliases=*/{}, inferred_type, expr_resolution_info, - resolved_expr_out); + /*ast_field_names*/ {}, inferred_type, /*require_name_match*/ false, + expr_resolution_info, resolved_expr_out); } // TODO: The noinline attribute is to prevent the stack usage @@ -6866,27 +7055,29 @@ absl::Status Resolver::ResolveStructConstructorWithKeyword( RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); std::vector ast_field_expressions; - std::vector ast_field_aliases; + std::vector ast_field_identifiers; for (const ASTStructConstructorArg* arg : ast_struct_constructor->fields()) {
ast_field_expressions.push_back(arg->expression()); - ast_field_aliases.push_back(arg->alias()); + ast_field_identifiers.push_back(arg->alias() ? arg->alias()->identifier() + : nullptr); } return ResolveStructConstructorImpl( ast_struct_constructor, ast_struct_constructor->struct_type(), - ast_field_expressions, ast_field_aliases, inferred_type, - expr_resolution_info, resolved_expr_out); + ast_field_expressions, ast_field_identifiers, inferred_type, + /*require_name_match=*/false, expr_resolution_info, resolved_expr_out); } absl::Status Resolver::ResolveStructConstructorImpl( const ASTNode* ast_location, const ASTStructType* ast_struct_type, const absl::Span ast_field_expressions, - const absl::Span ast_field_aliases, - const Type* inferred_type, ExprResolutionInfo* expr_resolution_info, + const absl::Span ast_field_identifiers, + const Type* inferred_type, bool require_name_match, + ExprResolutionInfo* expr_resolution_info, std::unique_ptr* resolved_expr_out) { RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); - if (!ast_field_aliases.empty()) { - ZETASQL_RET_CHECK_EQ(ast_field_expressions.size(), ast_field_aliases.size()); + if (!ast_field_identifiers.empty()) { + ZETASQL_RET_CHECK_EQ(ast_field_expressions.size(), ast_field_identifiers.size()); } // If we have a type from the AST, use it. Otherwise, we'll collect field @@ -6926,6 +7117,27 @@ absl::Status Resolver::ResolveStructConstructorImpl( inferred_struct_type = inferred_type->AsStruct(); } + if (require_name_match && inferred_struct_type != nullptr) { + // Need to additionally check field name matching. 
+    if (inferred_struct_type->num_fields() != ast_field_identifiers.size()) { + return MakeSqlErrorAt(ast_location) + << "Field name matching is required, but the number of fields " + << "does not match; expected: " << inferred_struct_type->num_fields() + << ", actual: " << ast_field_identifiers.size(); + } + for (int i = 0; i < ast_field_identifiers.size(); ++i) { + const ASTIdentifier* ast_identifier = ast_field_identifiers[i]; + if (!zetasql_base::CaseEqual(inferred_struct_type->field(i).name, + ast_identifier->GetAsIdString().ToString())) { + return MakeSqlErrorAt(ast_identifier) + << "Field name matching is required, but the field name does " + "not match at position " + << i << ": '" << inferred_struct_type->field(i).name << "' vs '" + << ast_identifier->GetAsIdString().ToString() << "'"; + } + } + } + // Resolve all the field expressions. std::vector> resolved_field_expressions; std::vector struct_fields; @@ -6953,6 +7165,16 @@ absl::Status Resolver::ResolveStructConstructorImpl( for (int i = 0; i < ast_field_expressions.size(); ++i) { const ASTExpression* ast_expression = ast_field_expressions[i]; + const ASTNode* ast_parent = ast_expression->parent(); + if (ast_parent->Is()) { + const auto* field_value = + ast_parent->GetAsOrDie(); + if (!field_value->colon_prefixed()) { + return MakeSqlErrorAt(field_value) + << "Struct field " << (i + 1) + << " should use a colon (:) to separate the field name and " + "value"; + } + } std::unique_ptr resolved_expr; const Type* inferred_field_type = nullptr; if (inferred_struct_type && i < inferred_struct_type->num_fields()) { @@ -6971,7 +7193,9 @@ absl::Status Resolver::ResolveStructConstructorImpl( CollationAnnotation::ExistsIn(resolved_expr->type_annotation_map())) { SignatureMatchResult result; const InputArgumentType input_argument_type = - GetInputArgumentTypeForExpr(resolved_expr.get()); + GetInputArgumentTypeForExpr( + resolved_expr.get(), + /*pick_default_type_for_untyped_expr=*/false); if (!coercer_.CoercesTo(input_argument_type, target_field_type,
/*is_explicit=*/false, &result)) { return MakeSqlErrorAt(ast_expression) @@ -7001,23 +7225,24 @@ absl::Status Resolver::ResolveStructConstructorImpl( struct_has_explicit_type); } - if (!ast_field_aliases.empty() && ast_field_aliases[i] != nullptr) { - return MakeSqlErrorAt(ast_field_aliases[i]) + if (!require_name_match && !ast_field_identifiers.empty() && + ast_field_identifiers[i] != nullptr) { + return MakeSqlErrorAt(ast_field_identifiers[i]) << "STRUCT constructors cannot specify both an explicit " "type and field names with AS"; } } else { // Otherwise, we need to compute the struct field type to create. IdString alias; - if (ast_field_aliases.empty()) { + if (ast_field_identifiers.empty()) { // We are in the (...) construction syntax and will always // generate anonymous fields. } else { // We are in the STRUCT(...) constructor syntax and will use the // explicitly provided aliases, or try to infer aliases from the // expression. - if (ast_field_aliases[i] != nullptr) { - alias = ast_field_aliases[i]->GetAsIdString(); + if (ast_field_identifiers[i] != nullptr) { + alias = ast_field_identifiers[i]->GetAsIdString(); } else { alias = GetAliasForExpression(ast_field_expressions[i]); } @@ -7199,9 +7424,9 @@ absl::Status Resolver::FinishResolvingAggregateFunction( // Resolves the ordering expression in arguments. std::vector order_by_info; - ZETASQL_RETURN_IF_ERROR( - ResolveOrderingExprs(order_by_arguments->ordering_expressions(), - expr_resolution_info, &order_by_info)); + ZETASQL_RETURN_IF_ERROR(ResolveOrderingExprs( + order_by_arguments->ordering_expressions(), expr_resolution_info, + &order_by_info)); // Checks if there is any order by index. 
// Supporting order by index here makes little sense as the function @@ -7239,10 +7464,10 @@ absl::Status Resolver::FinishResolvingAggregateFunction( } } - AddColumnsForOrderByExprs( + ZETASQL_RETURN_IF_ERROR(AddColumnsForOrderByExprs( kOrderById, &order_by_info, query_resolution_info - ->select_list_columns_to_compute_before_aggregation()); + ->select_list_columns_to_compute_before_aggregation())); // We may have precomputed some ORDER BY expression columns before // aggregation. If any aggregate function arguments match those @@ -7271,7 +7496,7 @@ absl::Status Resolver::FinishResolvingAggregateFunction( (*resolved_function_call)->set_argument_list(std::move(updated_args)); } ZETASQL_RETURN_IF_ERROR(ResolveOrderByItems( - order_by_arguments, /*output_column_list=*/{}, order_by_info, + /*output_column_list=*/{}, {order_by_info}, &resolved_order_by_items)); } const ASTLimitOffset* limit_offset = ast_function_call->limit_offset(); @@ -7381,7 +7606,11 @@ absl::Status Resolver::FinishResolvingAggregateFunction( input_arguments.reserve(arg_list.size()); for (const std::unique_ptr& arg : (*resolved_function_call)->argument_list()) { - input_arguments.push_back(GetInputArgumentTypeForExpr(arg.get())); + input_arguments.push_back(GetInputArgumentTypeForExpr( + arg.get(), + /*pick_default_type_for_untyped_expr=*/ + language().LanguageFeatureEnabled( + FEATURE_TEMPLATED_SQL_FUNCTION_RESOLVE_WITH_TYPED_ARGS))); } // Call the TemplatedSQLFunction::Resolve() method to get the output type. @@ -7484,7 +7713,8 @@ absl::Status Resolver::FinishResolvingAggregateFunction( // it out into aggregate_columns_to_compute(). // The actual ResolvedExpr we return is a ColumnRef pointing to that // function call. 
- ResolvedColumn aggregate_column(AllocateColumnId(), kAggregateId, alias, + const IdString* query_alias = &kAggregateId; + ResolvedColumn aggregate_column(AllocateColumnId(), *query_alias, alias, resolved_agg_call->annotated_type()); query_resolution_info->AddAggregateComputedColumn( @@ -7850,6 +8080,14 @@ absl::Status Resolver::ResolveFunctionCallWithResolvedArguments( case FN_BYTE_ARRAY_LIKE_ALL: case FN_STRING_LIKE_ALL: case FN_BYTE_LIKE_ALL: + case FN_STRING_NOT_LIKE_ANY: + case FN_BYTE_NOT_LIKE_ANY: + case FN_STRING_ARRAY_NOT_LIKE_ANY: + case FN_BYTE_ARRAY_NOT_LIKE_ANY: + case FN_STRING_NOT_LIKE_ALL: + case FN_BYTE_NOT_LIKE_ALL: + case FN_STRING_ARRAY_NOT_LIKE_ALL: + case FN_BYTE_ARRAY_NOT_LIKE_ALL: analyzer_output_properties_.MarkRelevant(REWRITE_LIKE_ANY_ALL); break; default: @@ -8142,7 +8380,7 @@ IdString Resolver::GetColumnAliasForTopLevelExpression( ExprResolutionInfo* expr_resolution_info, const ASTExpression* ast_expr) { const IdString alias = expr_resolution_info->column_alias; if (expr_resolution_info->top_level_ast_expr == ast_expr && - !IsInternalAlias(expr_resolution_info->column_alias)) { + !IsInternalAlias(alias)) { return alias; } return IdString(); @@ -8185,8 +8423,9 @@ absl::Status Resolver::CoerceExprToType( CollationAnnotation::GetId())) { return absl::OkStatus(); } - InputArgumentType expr_arg_type = - GetInputArgumentTypeForExpr(resolved_expr->get()); + // Untyped NULL can make the Coerce more flexible. 
+ InputArgumentType expr_arg_type = GetInputArgumentTypeForExpr( + resolved_expr->get(), /*pick_default_type_for_untyped_expr=*/false); SignatureMatchResult sig_match_result; Coercer coercer(type_factory_, &language(), catalog_); bool success; diff --git a/zetasql/analyzer/resolver_query.cc b/zetasql/analyzer/resolver_query.cc index f124ed845..168ee56ff 100644 --- a/zetasql/analyzer/resolver_query.cc +++ b/zetasql/analyzer/resolver_query.cc @@ -52,6 +52,7 @@ #include "zetasql/analyzer/resolver_common_inl.h" #include "zetasql/common/errors.h" #include "zetasql/public/function_signature.h" +#include "zetasql/public/functions/array_zip_mode.pb.h" #include "zetasql/public/input_argument_type.h" #include "zetasql/public/sql_function.h" #include "zetasql/public/templated_sql_function.h" @@ -112,6 +113,7 @@ #include "absl/container/flat_hash_map.h" #include "absl/container/flat_hash_set.h" #include "zetasql/base/check.h" +#include "absl/memory/memory.h" #include "absl/meta/type_traits.h" #include "absl/status/status.h" #include "absl/status/statusor.h" @@ -122,6 +124,7 @@ #include "absl/strings/str_join.h" #include "absl/strings/str_replace.h" #include "absl/strings/string_view.h" +#include "absl/strings/substitute.h" #include "absl/types/optional.h" #include "absl/types/span.h" #include "zetasql/base/map_util.h" @@ -280,71 +283,6 @@ absl::Status CreateUnsupportedGroupingSetsError( grouping_caluse_name, clause_name); } -// Check that `select` has no child nodes present other than those -// in `allowed_children`. -// `node_context` is a name for the error message. 
-absl::Status CheckForUnwantedSelectClauses( - const ASTSelect* select, - absl::flat_hash_set allowed_children, - const char* node_context) { - for (int i = 0; i < select->num_children(); ++i) { - const ASTNode* child = select->child(i); - ZETASQL_RET_CHECK(child != nullptr); - if (!allowed_children.contains(child)) { - ZETASQL_RET_CHECK_FAIL() << node_context << " has unexpected child of type " - << child->GetNodeKindString(); - } - } - return absl::OkStatus(); -} - -// Check that expected scan nodes were added on top of a previous scan. -// Starting from `outer_scan` and recursively traversing input scans, expect -// to find `inner_scan`. In between, expect to find a node of type -// `expected_scan_kind`. If `allow_project_scans` is true, additional -// ResolvedProjectScans are allowed. -// `node_context` is a name for the error message. -// -// Note: This only works for the node types listed in the switch. -absl::Status CheckForExpectedScans(const ResolvedScan* outer_scan, - const ResolvedScan* inner_scan, - ResolvedNodeKind expected_scan_kind, - bool allow_project_scans, - const char* node_context) { - const ResolvedScan* next_scan = outer_scan; - ZETASQL_RET_CHECK_NE(next_scan, inner_scan); - bool found_expected = false; - do { - const ResolvedNodeKind kind = next_scan->node_kind(); - if (!found_expected && kind == expected_scan_kind) { - found_expected = true; - } else if (!(allow_project_scans && kind == RESOLVED_PROJECT_SCAN)) { - ZETASQL_RET_CHECK_FAIL() << node_context << " produced unexpected scan of type " - << ResolvedNodeKindToString(kind); - } - switch (kind) { - case RESOLVED_PROJECT_SCAN: - next_scan = next_scan->GetAs()->input_scan(); - break; - case RESOLVED_AGGREGATE_SCAN: - next_scan = next_scan->GetAs()->input_scan(); - break; - case RESOLVED_ANALYTIC_SCAN: - next_scan = next_scan->GetAs()->input_scan(); - break; - default: - ZETASQL_RET_CHECK_FAIL() << node_context << " produced unhandled scan of type " - << ResolvedNodeKindToString(kind); - 
} - ZETASQL_RET_CHECK(next_scan != nullptr); - } while (next_scan != inner_scan); - - ZETASQL_RET_CHECK(found_expected) - << node_context << " failed to produce a scan with expected type " - << ResolvedNodeKindToString(expected_scan_kind); - return absl::OkStatus(); -} - // `PartitionedComputedColumns` represents a partitioning of computed columns // such that: // - Each 'dependee_column' is depended upon by at least 1 other column in @@ -421,6 +359,10 @@ const IdString& Resolver::kLambdaArgId = *new IdString(IdString::MakeGlobal("$lambda_arg")); const IdString& Resolver::kWithActionId = *new IdString(IdString::MakeGlobal("$with_action")); +const IdString& Resolver::kRecursionDepthAlias = + *new IdString(IdString::MakeGlobal("depth")); +const IdString& Resolver::kRecursionDepthId = + *new IdString(IdString::MakeGlobal("$recursion_depth")); STATIC_IDSTRING(kDistinctId, "$distinct"); STATIC_IDSTRING(kFullJoinId, "$full_join"); @@ -439,6 +381,21 @@ STATIC_IDSTRING(kDummyTableId, "$dummy_table"); STATIC_IDSTRING(kPivotId, "$pivot"); STATIC_IDSTRING(kUnpivotColumnId, "$unpivot"); +absl::Status Resolver::CheckForUnwantedSelectClauseChildNodes( + const ASTSelect* select, + absl::flat_hash_set allowed_children, + const char* node_context) { + for (int i = 0; i < select->num_children(); ++i) { + const ASTNode* child = select->child(i); + ZETASQL_RET_CHECK(child != nullptr); + if (!allowed_children.contains(child)) { + ZETASQL_RET_CHECK_FAIL() << node_context << " has unexpected child node of type " + << child->GetNodeKindString(); + } + } + return absl::OkStatus(); +} + absl::Status Resolver::ResolveQueryAfterWith( const ASTQuery* query, const NameScope* scope, IdString query_alias, const Type* inferred_type_for_query, @@ -476,14 +433,15 @@ absl::Status Resolver::ResolveQueryAfterWith( if (query->order_by() != nullptr) { const std::unique_ptr query_expression_name_scope( new NameScope(scope, *output_name_list)); - ZETASQL_RETURN_IF_ERROR(ResolveOrderBySimple( - 
query->order_by(), query_expression_name_scope.get(), - "ORDER BY clause after set operation", - output)); + ZETASQL_RETURN_IF_ERROR(ResolveOrderBySimple(query->order_by(), + query_expression_name_scope.get(), + "ORDER BY clause after set operation", + OrderBySimpleMode::kNormal, output)); } if (query->limit_offset() != nullptr) { - ZETASQL_RETURN_IF_ERROR(ResolveLimitOffsetScan(query->limit_offset(), output)); + ZETASQL_RETURN_IF_ERROR( + ResolveLimitOffsetScan(query->limit_offset(), scope, output)); } return absl::OkStatus(); @@ -713,10 +671,85 @@ absl::StatusOr Resolver::GetRecursiveUnion( return query_set_op; } +absl::Status Resolver::ValidateAliasedQueryModifiers( + IdString query_alias, const ASTAliasedQueryModifiers* modifiers, + bool is_recursive) { + if (modifiers->recursion_depth_modifier() != nullptr) { + if (!language().LanguageFeatureEnabled( + FEATURE_V_1_4_WITH_RECURSIVE_DEPTH_MODIFIER)) { + return MakeSqlErrorAt(modifiers->recursion_depth_modifier()) + << "Recursion depth modifier is not supported"; + } + if (!is_recursive) { + return MakeSqlErrorAt(modifiers->recursion_depth_modifier()) + << "Recursion depth modifier is not allowed for non-recursive " + "CTE named " + << query_alias; + } + } + return absl::OkStatus(); +} + +absl::StatusOr> +Resolver::ResolveRecursionDepthModifier( + const ASTRecursionDepthModifier* recursion_depth_modifier) { + constexpr char kClauseName[] = "WITH DEPTH"; + NameScope empty_scope; + ExprResolutionInfo expr_resolution_info(&empty_scope, kClauseName); + std::unique_ptr lower, upper; + if (recursion_depth_modifier->lower_bound()->bound() != nullptr) { + ZETASQL_RETURN_IF_ERROR( + ResolveExpr(recursion_depth_modifier->lower_bound()->bound(), + &expr_resolution_info, &lower)); + ZETASQL_RETURN_IF_ERROR(ValidateParameterOrLiteralAndCoerceToInt64IfNeeded( + kClauseName, recursion_depth_modifier->lower_bound(), &lower)); + } + if (recursion_depth_modifier->upper_bound()->bound() != nullptr) { + ZETASQL_RETURN_IF_ERROR( + 
ResolveExpr(recursion_depth_modifier->upper_bound()->bound(), + &expr_resolution_info, &upper)); + ZETASQL_RETURN_IF_ERROR(ValidateParameterOrLiteralAndCoerceToInt64IfNeeded( + kClauseName, recursion_depth_modifier->upper_bound(), &upper)); + } + if (lower != nullptr && upper != nullptr && lower->Is() && + upper->Is()) { + // At this point, we already know lower_value and upper_value are + // non-negative and not null. We still need to ensure lower bound is no + // larger than upper bound. + const Value& lower_value = lower->GetAs()->value(); + const Value& upper_value = upper->GetAs()->value(); + if (lower_value.int64_value() > upper_value.int64_value()) { + return MakeSqlErrorAt(recursion_depth_modifier) + << kClauseName << " expects lower bound (" << lower_value.Format() + << ") no larger than upper bound (" << upper_value.Format() << ")"; + } + } + + const IdString depth_column_alias = + recursion_depth_modifier->alias() != nullptr + ? recursion_depth_modifier->alias()->GetAsIdString() + : kRecursionDepthAlias; + ZETASQL_ASSIGN_OR_RETURN(auto recursion_depth_column, + ResolvedColumnHolderBuilder() + .set_column(ResolvedColumn( + AllocateColumnId(), kRecursionDepthId, + depth_column_alias, type_factory_->get_int64())) + .Build()); + return ResolvedRecursionDepthModifierBuilder() + .set_lower_bound(std::move(lower)) + .set_upper_bound(std::move(upper)) + .set_recursion_depth_column(std::move(recursion_depth_column)) + .Build(); +} + absl::StatusOr> Resolver::ResolveAliasedQuery(const ASTAliasedQuery* with_entry, bool recursive) { const IdString with_alias = with_entry->alias()->GetAsIdString(); + if (with_entry->modifiers() != nullptr) { + ZETASQL_RETURN_IF_ERROR(ValidateAliasedQueryModifiers( + with_alias, with_entry->modifiers(), recursive)); + } // Generate a unique alias for this WITH subquery, if necessary. 
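Reviewer note: the recursion depth modifier's bound validation above only fires when both bounds have folded to literals; an unbounded or non-literal side defers the check. A minimal sketch of that logic in Python, with `None` standing in for an absent or non-literal bound (an assumption of this illustration, not ZetaSQL code):

```python
# Sketch (not ZetaSQL code): validate WITH DEPTH bounds. The check only
# applies when both bounds are known constants; otherwise it is skipped,
# mirroring the literal-only comparison in ResolveRecursionDepthModifier.
def check_depth_bounds(lower, upper):
    if lower is not None and upper is not None and lower > upper:
        raise ValueError(
            f"WITH DEPTH expects lower bound ({lower}) to be no larger "
            f"than upper bound ({upper})")

check_depth_bounds(1, 10)    # literal bounds in order: accepted
check_depth_bounds(None, 5)  # unbounded lower side: check skipped
```

Calling `check_depth_bounds(7, 3)` raises, matching the resolver's error for a lower bound larger than the upper bound.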
IdString unique_alias = with_alias; @@ -740,6 +773,19 @@ Resolver::ResolveAliasedQuery(const ASTAliasedQuery* with_entry, ZETASQL_RETURN_IF_ERROR(setop_resolver.ResolveRecursive( empty_name_scope_.get(), {with_alias}, unique_alias, &resolved_subquery, &subquery_name_list)); + + if (with_entry->modifiers() != nullptr && + with_entry->modifiers()->recursion_depth_modifier() != nullptr) { + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr + recursion_depth_modifier, + ResolveRecursionDepthModifier( + with_entry->modifiers()->recursion_depth_modifier())); + ZETASQL_RETURN_IF_ERROR(setop_resolver.FinishResolveRecursionWithModifier( + with_entry->modifiers()->recursion_depth_modifier(), {with_alias}, + std::move(recursion_depth_modifier), &resolved_subquery, + &subquery_name_list)); + } ZETASQL_RETURN_IF_ERROR(FinishResolveWithClauseIfPresent( with_entry->query(), std::move(inner_with_entries), &resolved_subquery)); @@ -880,17 +926,17 @@ absl::Status Resolver::AddAggregateScan( ZETASQL_RETURN_IF_ERROR(query_resolution_info->ReleaseGroupingSetsAndRollupList( &grouping_set_list, &rollup_column_list, language())); - ZETASQL_RET_CHECK(!column_list.empty()); std::unique_ptr aggregate_scan = MakeResolvedAggregateScan( column_list, std::move(*current_scan), query_resolution_info->release_group_by_columns_to_compute(), query_resolution_info->release_aggregate_columns_to_compute(), std::move(grouping_set_list), std::move(rollup_column_list), - query_resolution_info->release_grouping_columns_list()); + query_resolution_info->release_grouping_call_list()); for (const auto& aggregate_comp_col : aggregate_scan->aggregate_list()) { - const auto& aggregate_expr = aggregate_comp_col->expr(); + const auto& aggregate_expr = + aggregate_comp_col->GetAs()->expr(); ZETASQL_RET_CHECK(aggregate_expr->Is()); const auto& aggregate_func_call = aggregate_expr->GetAs(); @@ -1049,6 +1095,21 @@ Resolver::AddAggregationThresholdAggregateScan( 
ZETASQL_RETURN_IF_ERROR(ResolveHintsForNode(select->group_by()->hint(), aggregation_threshold_scan.get())); } + // Add the UDA rewriter if present in the aggregates to match the rewrites in + // the checker which is tested in resolved AST rewriter. + for (const auto& aggregate_comp_col : + aggregation_threshold_scan->aggregate_list()) { + const auto& aggregate_expr = aggregate_comp_col->expr(); + ZETASQL_RET_CHECK(aggregate_expr->Is()); + const auto& aggregate_func_call = + aggregate_expr->GetAs(); + if (aggregate_func_call->function()->Is() || + aggregate_func_call->function()->Is()) { + analyzer_output_properties_.MarkRelevant(REWRITE_INLINE_SQL_UDAS); + } + } + // Add the aggregation threshold rewriter. + analyzer_output_properties_.MarkRelevant(REWRITE_AGGREGATION_THRESHOLD); return aggregation_threshold_scan; } @@ -1199,11 +1260,6 @@ absl::Status Resolver::AddRemainingScansForSelect( std::move(*resolved_having_expr)); } - // TODO: There might be some test cases here that are broken, - // we should not need to use a NameScope at this point. It is currently - // being used for resolving the AnalyticFunctionGroup, which ideally should - // already have had its expressions resolved above. Clean this up and - // add more tests if necessary. 
if (query_resolution_info->HasAnalytic()) { ZETASQL_RETURN_IF_ERROR(AddAnalyticScan(query_resolution_info, current_scan)); } @@ -1326,8 +1382,8 @@ absl::Status Resolver::AddRemainingScansForSelect( current_scan)); ZETASQL_RETURN_IF_ERROR(MakeResolvedOrderByScan( - order_by, query_resolution_info->GetResolvedColumnList(), - query_resolution_info->order_by_item_info(), + order_by->hint(), query_resolution_info->GetResolvedColumnList(), + {query_resolution_info->order_by_item_info()}, current_scan)); } @@ -1345,7 +1401,8 @@ absl::Status Resolver::AddRemainingScansForSelect( } if (limit_offset != nullptr) { - ZETASQL_RETURN_IF_ERROR(ResolveLimitOffsetScan(limit_offset, current_scan)); + ZETASQL_RETURN_IF_ERROR(ResolveLimitOffsetScan( + limit_offset, having_and_order_by_scope, current_scan)); } if (select->select_as() != nullptr) { @@ -1374,7 +1431,7 @@ absl::Status Resolver::AddRemainingScansForSelect( return absl::OkStatus(); } -void Resolver::AddColumnsForOrderByExprs( +absl::Status Resolver::AddColumnsForOrderByExprs( IdString query_alias, std::vector* order_by_info, std::vector>* computed_columns) { @@ -1382,6 +1439,7 @@ void Resolver::AddColumnsForOrderByExprs( ++order_by_item_idx) { OrderByItemInfo& item_info = (*order_by_info)[order_by_item_idx]; if (!item_info.is_select_list_index()) { + ZETASQL_RET_CHECK(item_info.order_expression != nullptr); if (item_info.order_expression->node_kind() == RESOLVED_COLUMN_REF && !item_info.order_expression->GetAs() ->is_correlated()) { @@ -1414,11 +1472,15 @@ void Resolver::AddColumnsForOrderByExprs( } } } + return absl::OkStatus(); } absl::Status Resolver::ResolveOrderBySimple( const ASTOrderBy* order_by, const NameScope* scope, const char* clause_name, - std::unique_ptr* scan) { + OrderBySimpleMode mode, std::unique_ptr* scan) { + ZETASQL_RET_CHECK(mode == OrderBySimpleMode::kNormal // + ); + // We use a new QueryResolutionInfo because resolving the ORDER BY // outside of set operations is independent from its input subquery 
// resolution. @@ -1430,14 +1492,15 @@ absl::Status Resolver::ResolveOrderBySimple( ExprResolutionInfo expr_resolution_info( scope, scope, scope, /*allows_aggregation_in=*/false, - /*allows_analytic_in=*/ - true, + /*allows_analytic_in=*/true, /*use_post_grouping_columns_in=*/false, clause_name, query_resolution_info.get()); ZETASQL_RETURN_IF_ERROR(ResolveOrderingExprs( order_by->ordering_expressions(), &expr_resolution_info, query_resolution_info->mutable_order_by_item_info())); + ZETASQL_RET_CHECK(!query_resolution_info->HasAggregation()); + // If the ORDER BY clause after set operations includes analytic functions, // then we need to create an analytic scan for them before we do ordering. // For example: @@ -1456,9 +1519,9 @@ absl::Status Resolver::ResolveOrderBySimple( } std::vector> computed_columns; - AddColumnsForOrderByExprs(/*query_alias=*/kOrderById, - query_resolution_info->mutable_order_by_item_info(), - &computed_columns); + ZETASQL_RETURN_IF_ERROR(AddColumnsForOrderByExprs( + /*query_alias=*/kOrderById, + query_resolution_info->mutable_order_by_item_info(), &computed_columns)); // The output columns of the ORDER BY are the same as the output of the // original input. 
@@ -1469,70 +1532,68 @@ absl::Status Resolver::ResolveOrderBySimple( ZETASQL_RETURN_IF_ERROR( MaybeAddProjectForComputedColumns(std::move(computed_columns), scan)); - return MakeResolvedOrderByScan(order_by, output_columns, - query_resolution_info->order_by_item_info(), + return MakeResolvedOrderByScan(order_by->hint(), output_columns, + {query_resolution_info->order_by_item_info()}, scan); } absl::Status Resolver::ResolveOrderByItems( - const ASTOrderBy* order_by, const std::vector& output_column_list, - const std::vector& order_by_info, + const OrderByItemInfoVectorList& order_by_info_lists, std::vector>* resolved_order_by_items) { resolved_order_by_items->clear(); - ZETASQL_RET_CHECK_EQ(order_by_info.size(), order_by->ordering_expressions().size()); - - for (int i = 0; i < order_by_info.size(); ++i) { - const OrderByItemInfo& item_info = order_by_info[i]; - std::unique_ptr resolved_column_ref; - if (item_info.is_select_list_index()) { - if (item_info.select_list_index < 0 || - item_info.select_list_index >= output_column_list.size()) { - return MakeSqlErrorAt(item_info.ast_location) - << "ORDER BY is out of SELECT column number range: " - << item_info.select_list_index + 1; + ZETASQL_RET_CHECK(!order_by_info_lists.empty()); + for (const auto& order_by_info_vector : order_by_info_lists) { + for (const OrderByItemInfo& item_info : order_by_info_vector.get()) { + std::unique_ptr resolved_column_ref; + if (item_info.is_select_list_index()) { + if (item_info.select_list_index < 0 || + item_info.select_list_index >= output_column_list.size()) { + return MakeSqlErrorAt(item_info.ast_location) + << "ORDER BY is out of SELECT column number range: " + << item_info.select_list_index + 1; + } + // NOTE: Accessing scan column list works now as we don't deduplicate + // anything from the column list. Thus it matches 1:1 with the select + // list. If that changes, we should use name list instead. + // Convert the select list ordinal reference to a column reference. 
+        resolved_column_ref =
+            MakeColumnRef(output_column_list[item_info.select_list_index]);
+      } else {
+        ZETASQL_RET_CHECK(item_info.order_column.IsInitialized());
+        resolved_column_ref = MakeColumnRef(item_info.order_column);
       }
-      // NOTE: Accessing scan column list works now as we don't deduplicate
-      // anything from the column list. Thus it matches 1:1 with the select
-      // list. If that changes, we should use name list instead.
-      // Convert the select list ordinal reference to a column reference.
-      resolved_column_ref =
-          MakeColumnRef(output_column_list[item_info.select_list_index]);
-    } else {
-      resolved_column_ref = MakeColumnRef(item_info.order_column);
-    }
 
-    std::unique_ptr<const ResolvedExpr> resolved_collation_name;
-    const ASTCollate* ast_collate =
-        order_by->ordering_expressions().at(i)->collate();
-    if (ast_collate != nullptr) {
-      ZETASQL_RETURN_IF_ERROR(ValidateAndResolveOrderByCollate(
-          ast_collate, order_by->ordering_expressions().at(i),
-          resolved_column_ref->column().type(), &resolved_collation_name));
-    }
+      std::unique_ptr<const ResolvedExpr> resolved_collation_name;
+      if (item_info.ast_collate != nullptr) {
+        ZETASQL_RETURN_IF_ERROR(ValidateAndResolveOrderByCollate(
+            item_info.ast_collate, item_info.ast_location,
+            resolved_column_ref->column().type(), &resolved_collation_name));
+      }
 
-    auto resolved_order_by_item = MakeResolvedOrderByItem(
-        std::move(resolved_column_ref), std::move(resolved_collation_name),
-        item_info.is_descending, item_info.null_order);
-    if (language().LanguageFeatureEnabled(FEATURE_V_1_3_COLLATION_SUPPORT)) {
-      ZETASQL_RETURN_IF_ERROR(
-          CollationAnnotation::ResolveCollationForResolvedOrderByItem(
-              resolved_order_by_item.get()));
-    }
-    resolved_order_by_items->push_back(std::move(resolved_order_by_item));
+      auto resolved_order_by_item = MakeResolvedOrderByItem(
+          std::move(resolved_column_ref), std::move(resolved_collation_name),
+          item_info.is_descending, item_info.null_order);
+      if (language().LanguageFeatureEnabled(FEATURE_V_1_3_COLLATION_SUPPORT)) {
+        ZETASQL_RETURN_IF_ERROR(
+            CollationAnnotation::ResolveCollationForResolvedOrderByItem(
+                resolved_order_by_item.get()));
+      }
+      resolved_order_by_items->push_back(std::move(resolved_order_by_item));
 
-    if (!resolved_order_by_items->back()
-             ->column_ref()
-             ->type()
-             ->SupportsOrdering(language(), /*type_description=*/nullptr)) {
-      return MakeSqlErrorAt(item_info.ast_location)
-             << "ORDER BY does not support expressions of type "
-             << resolved_order_by_items->back()
-                    ->column_ref()
-                    ->type()
-                    ->ShortTypeName(product_mode());
+      if (!resolved_order_by_items->back()
+               ->column_ref()
+               ->type()
+               ->SupportsOrdering(language(), /*type_description=*/nullptr)) {
+        return MakeSqlErrorAt(item_info.ast_location)
+               << "ORDER BY does not support expressions of type "
+               << resolved_order_by_items->back()
+                      ->column_ref()
+                      ->type()
+                      ->ShortTypeName(product_mode());
+      }
     }
   }
 
@@ -1540,15 +1601,15 @@ absl::Status Resolver::ResolveOrderByItems(
 }
 
 absl::Status Resolver::MakeResolvedOrderByScan(
-    const ASTOrderBy* order_by,
+    const ASTHint* order_by_hint,
     const std::vector<ResolvedColumn>& output_column_list,
-    const std::vector<OrderByItemInfo>& order_by_info,
+    const OrderByItemInfoVectorList& order_by_info_lists,
     std::unique_ptr<const ResolvedScan>* scan) {
   std::vector<std::unique_ptr<const ResolvedOrderByItem>>
       resolved_order_by_items;
 
   ZETASQL_RETURN_IF_ERROR(
-      ResolveOrderByItems(order_by, output_column_list, order_by_info,
+      ResolveOrderByItems(output_column_list, order_by_info_lists,
                           &resolved_order_by_items));
 
   std::unique_ptr<ResolvedOrderByScan> order_by_scan =
@@ -1556,7 +1617,7 @@ absl::Status Resolver::MakeResolvedOrderByScan(
       std::move(resolved_order_by_items));
   order_by_scan->set_is_ordered(true);
 
-  ZETASQL_RETURN_IF_ERROR(ResolveHintsForNode(order_by->hint(), order_by_scan.get()));
+  ZETASQL_RETURN_IF_ERROR(ResolveHintsForNode(order_by_hint, order_by_scan.get()));
 
   *scan = std::move(order_by_scan);
   return absl::OkStatus();
@@ -1764,9 +1825,10 @@ absl::Status Resolver::ResolveSelect(
 //    against the FROM clause NameScope, including star and dot-star
 //    expansion.  This first pass is necessary to allow the GROUP BY to
 //    resolve against SELECT list aliases.
-// 2. Resolve the GROUP BY expressions against SELECT list aliases and
+// 2. Resolve the QUALIFY clause first pass to detect any aggregations.
+// 3. Resolve the GROUP BY expressions against SELECT list aliases and
 //    the FROM clause NameScope.
-// 3. Resolve the SELECT list second pass.  If GROUP BY is present, this
+// 4. Resolve the SELECT list second pass.  If GROUP BY is present, this
 //    re-resolves expressions against a new NameScope that includes
 //    grouped versions of the columns (expressions computed after
 //    grouping/aggregation must reference the grouped versions of the
@@ -1775,10 +1837,10 @@ absl::Status Resolver::ResolveSelect(
 //    be looked up by name but they provide errors if accessed.  There
 //    are optimizations in place to avoid re-resolution of expressions
 //    whenever possible.
-// 4. Resolve the HAVING clause against another new post-grouped column
+// 5. Resolve the HAVING clause against another new post-grouped column
 //    NameScope that includes SELECT list aliases.
-// 5. Resolve the QUALIFY clause against same NameScope as HAVING.
-// 6. Resolve ORDER BY expressions against this post-grouped column
+// 6. Resolve the QUALIFY clause against same NameScope as HAVING.
+// 7. Resolve ORDER BY expressions against this post-grouped column
 //    NameScope (or if DISTINCT is present, then a post-DISTINCT
 //    NameScope) that includes SELECT list aliases.
 //
@@ -1855,9 +1917,66 @@ absl::Status Resolver::ResolveSelectAfterFrom(
     inferred_type_for_select_list = inferred_type_for_query;
   }
   ZETASQL_RETURN_IF_ERROR(ResolveSelectListExprsFirstPass(
-      select->select_list(), from_scan_scope.get(),
-      from_clause_name_list, query_resolution_info.get(),
-      inferred_type_for_select_list));
+      select->select_list(), from_scan_scope.get(), from_clause_name_list,
+      query_resolution_info.get(), inferred_type_for_select_list));
+
+  // At this point, `query_resolution_info->HasGroupByOrAggregation()` reflects
+  // either the presence of a `GROUP BY` clause, or the presence of aggregate
+  // functions in the `SELECT` list. There could still be aggregate functions
+  // in the `QUALIFY` clause that have not been detected yet. There may also be
+  // aggregate functions in the `ORDER BY` clause, though that will be an
+  // error. We need to detect the aggregation in `QUALIFY` before we continue
+  // because subsequent tasks are handled differently for queries with or
+  // without aggregation.
+  if (!query_resolution_info->HasGroupByOrAggregation() &&
+      select->qualify() != nullptr) {
+    // QUALIFY may reference names in the from clause scope or explicit aliases
+    // introduced in the SELECT list.
+    std::shared_ptr<NameList> post_analytic_name_list(new NameList);
+    for (const std::unique_ptr<SelectColumnState>& select_column_state :
+         select_column_state_list->select_column_state_list()) {
+      if (!select_column_state->alias.empty() &&
+          !IsInternalAlias(select_column_state->alias)) {
+        if (select_column_state->resolved_expr->Is<ResolvedColumnRef>()) {
+          // The expression already resolved to a column (either correlated
+          // or uncorrelated is ok), so just use it.
+          ZETASQL_RETURN_IF_ERROR(post_analytic_name_list->AddColumn(
+              select_column_state->alias,
+              select_column_state->resolved_expr->GetAs<ResolvedColumnRef>()
+                  ->column(),
+              /*is_explicit=*/true));
+        } else {
+          ZETASQL_RET_CHECK(select_column_state->GetType() != nullptr);
+          ResolvedColumn select_column(AllocateColumnId(), query_alias,
+                                       select_column_state->alias,
+                                       select_column_state->GetType());
+          ZETASQL_RETURN_IF_ERROR(post_analytic_name_list->AddColumn(
+              select_column_state->alias, select_column, /*is_explicit=*/true));
+        }
+      }
+    }
+    std::unique_ptr<NameScope> qualify_name_scope;
+    ZETASQL_RETURN_IF_ERROR(from_scan_scope->CopyNameScopeWithOverridingNames(
+        post_analytic_name_list, &qualify_name_scope));
+
+    auto qualify_expr_resolution_info = std::make_unique<ExprResolutionInfo>(
+        qualify_name_scope.get(), query_resolution_info.get(),
+        select->qualify()->expression());
+    // We don't use this ResolvedExpr. For this first pass, we are only
+    // resolving this for the side effect in query_resolution_info so that
+    // subsequent steps get the correct value for `HasGroupByOrAggregation()`.
+    std::unique_ptr<const ResolvedExpr> resolved_qualify_expr;
+    ZETASQL_RETURN_IF_ERROR(ResolveExpr(
+        select->qualify()->expression(), qualify_expr_resolution_info.get(),
+        &resolved_qualify_expr, /*inferred_type=*/nullptr));
+    bool has_aggregation = query_resolution_info->HasAggregation();
+    // Reset GroupByInfo. We will re-resolve the QUALIFY expression later with
+    // the real input columns. We need to clear it here so that we don't end up
+    // with duplicate entries in the group_by_expr_map after we re-resolve
+    // QUALIFY.
+    query_resolution_info->ClearGroupByInfo();
+    query_resolution_info->SetHasAggregation(has_aggregation);
+  }
 
   // Return an appropriate error for anonymization queries that don't perform
   // aggregation.
@@ -1916,20 +2035,21 @@ absl::Status Resolver::ResolveSelectAfterFrom(
     }
   }
 
-  if (select->from_clause() == nullptr) {
-    // Note that we do not allow HAVING or ORDER BY if there is no FROM
-    // clause, so even though aggregation or analytic can appear there
-    // we do not have to wait until we analyze those clauses to check for
-    // this error condition.
-    if (query_resolution_info->HasGroupByOrAggregation()) {
-      return MakeSqlErrorAt(select)
-             << "SELECT without FROM clause cannot use aggregation";
-    }
-    if (query_resolution_info->HasAnalytic()) {
-      return MakeSqlErrorAt(select)
-             << "SELECT without FROM clause cannot use analytic functions";
-    }
+  bool check_from_clause = select->from_clause() == nullptr;
+  if (check_from_clause) {
+    // Note that we do not allow HAVING or ORDER BY if there is no FROM
+    // clause, so even though aggregation or analytic can appear there
+    // we do not have to wait until we analyze those clauses to check for
+    // this error condition.
+    if (query_resolution_info->HasGroupByOrAggregation()) {
+      return MakeSqlErrorAt(select)
+             << "SELECT without FROM clause cannot use aggregation";
+    }
+    if (query_resolution_info->HasAnalytic()) {
+      return MakeSqlErrorAt(select)
+             << "SELECT without FROM clause cannot use analytic functions";
     }
   }
 
   if (!query_resolution_info->HasGroupByOrAggregation() &&
       !query_resolution_info->HasAnalytic()) {
@@ -2324,6 +2444,9 @@ absl::Status Resolver::AnalyzeSelectColumnsToPrecomputeBeforeAggregation(
         select_column_state->has_analytic) {
       continue;
     }
+
+    // Only if the select list item has an explicit or inferable alias can it
+    // be used by other parts of the query, like GROUP BY, HAVING, etc.
     if (!IsInternalAlias(select_column_state->alias)) {
       ZETASQL_RET_CHECK(select_column_state->resolved_expr != nullptr);
       ResolvedColumn pre_group_by_column;
@@ -2336,8 +2459,8 @@ absl::Status Resolver::AnalyzeSelectColumnsToPrecomputeBeforeAggregation(
                                   ->column();
       } else {
         // The expression is not a simple column reference, it is a more
-        // complicated expression that must be computed before aggregation
-        // so that we can GROUP BY that computed column.
+        // complicated expression that needs to be computed before aggregation
+        // if we GROUP BY that computed column.
         pre_group_by_column = ResolvedColumn(
             AllocateColumnId(), kPreGroupById, select_column_state->alias,
             select_column_state->resolved_expr->annotated_type());
@@ -2364,6 +2487,13 @@ absl::Status Resolver::AnalyzeSelectColumnsToPrecomputeBeforeAggregation(
       // This column reference will be used when resolving the GROUP BY
       // expressions.
       select_column_state->resolved_expr = MakeColumnRef(pre_group_by_column);
+      // Keep track of the resolved expr before it gets replaced with a
+      // reference to a pre-aggregate compute column.
+      select_column_state->original_resolved_expr =
+          query_resolution_info
+              ->select_list_columns_to_compute_before_aggregation()
+              ->back()
+              ->expr();
     }
     select_column_state->resolved_pre_group_by_select_column =
         pre_group_by_column;
@@ -2512,10 +2642,10 @@ absl::Status Resolver::ResolveOrderByExprs(
           expr_resolution_info.query_resolution_info
               ->mutable_order_by_item_info()));
 
-  AddColumnsForOrderByExprs(
+  ZETASQL_RETURN_IF_ERROR(AddColumnsForOrderByExprs(
       /*query_alias=*/kOrderById,
       query_resolution_info->mutable_order_by_item_info(),
-      query_resolution_info->order_by_columns_to_compute());
+      query_resolution_info->order_by_columns_to_compute()));
 
   if (!already_saw_group_by_or_aggregation &&
       query_resolution_info->HasGroupByOrAggregation()) {
@@ -2665,12 +2795,13 @@ static bool ExcludeOrReplaceColumn(
 }
 
 absl::Status Resolver::AddNameListToSelectList(
-    const ASTExpression* ast_expression,
+    const ASTSelectColumn* ast_select_column,
     const std::shared_ptr<const NameList>& name_list,
     const CorrelatedColumnsSetList& correlated_columns_set_list,
     bool ignore_excluded_value_table_fields,
     SelectColumnStateList* select_column_state_list,
     ColumnReplacements* column_replacements) {
+  const ASTExpression* ast_expression = ast_select_column->expression();
   const size_t orig_num_columns = select_column_state_list->Size();
   for (const NamedColumn& named_column : name_list->columns()) {
     // Process exclusions first because MakeColumnRef will add columns
@@ -2692,7 +2823,7 @@ absl::Status Resolver::AddNameListToSelectList(
 
       ZETASQL_RET_CHECK(!named_column.name().empty());
       ZETASQL_RETURN_IF_ERROR(AddColumnFieldsToSelectList(
-          ast_expression, column_ref.get(),
+          ast_select_column, column_ref.get(),
          /*src_column_has_aggregation=*/false,
          /*src_column_has_analytic=*/false,
          /*src_column_has_volatile=*/false,
@@ -2703,7 +2834,7 @@ absl::Status Resolver::AddNameListToSelectList(
           select_column_state_list, column_replacements));
     } else {
       select_column_state_list->AddSelectColumn(
-          ast_expression, named_column.name(), named_column.is_explicit(),
+          ast_select_column, named_column.name(), named_column.is_explicit(),
          /*has_aggregation=*/false, /*has_analytic=*/false,
          /*has_volatile=*/false, std::move(column_ref));
     }
@@ -2861,8 +2992,9 @@ absl::Status Resolver::ResolveSelectDistinct(
       distinct_column = existing_computed_column->column();
     } else {
       // Create a new DISTINCT column.
+      const IdString* query_alias = &kDistinctId;
       distinct_column =
-          ResolvedColumn(AllocateColumnId(), kDistinctId, column.name_id(),
+          ResolvedColumn(AllocateColumnId(), *query_alias, column.name_id(),
                          column.annotated_type());
       // Add a computed column for the new post-DISTINCT column.
       query_resolution_info->AddGroupByComputedColumnIfNeeded(
@@ -2902,7 +3034,7 @@ absl::Status Resolver::ResolveSelectDistinct(
 }
 
 absl::Status Resolver::ResolveSelectStarModifiers(
-    const ASTNode* ast_location, const ASTStarModifiers* modifiers,
+    const ASTSelectColumn* ast_select_column, const ASTStarModifiers* modifiers,
     const NameList* name_list_for_star, const Type* type_for_star,
     const NameScope* scope, QueryResolutionInfo* query_resolution_info,
     ColumnReplacements* column_replacements) {
@@ -2917,9 +3049,10 @@ absl::Status Resolver::ResolveSelectStarModifiers(
   if (!language().LanguageFeatureEnabled(
           FEATURE_V_1_1_SELECT_STAR_EXCEPT_REPLACE)) {
     if (except_list != nullptr) {
-      return MakeSqlErrorAt(ast_location) << "SELECT * EXCEPT is not supported";
+      return MakeSqlErrorAt(ast_select_column)
+             << "SELECT * EXCEPT is not supported";
     } else {
-      return MakeSqlErrorAt(ast_location)
+      return MakeSqlErrorAt(ast_select_column)
              << "SELECT * REPLACE is not supported";
     }
   }
@@ -3011,11 +3144,15 @@ absl::Status Resolver::ResolveSelectStarModifiers(
     }
 
     auto select_column_state = std::make_unique<SelectColumnState>(
-        ast_replace_item->expression(), identifier,
+        ast_select_column, identifier,
         /*is_explicit=*/true, expr_resolution_info.has_aggregation,
         expr_resolution_info.has_analytic, expr_resolution_info.has_volatile,
         std::move(resolved_expr));
+    // Override the ast_expr to point at the replacement expression
+    // rather than `ast_select_column`, which points at the star expression.
+    select_column_state->ast_expr = ast_replace_item->expression();
+
     if (!column_replacements->replaced_columns
             .emplace(identifier, std::move(select_column_state))
             .second) {
@@ -3031,10 +3168,11 @@ absl::Status Resolver::ResolveSelectStarModifiers(
 // NOTE: The behavior of star expansion here must match
 // NameList::SelectStarHasColumn.
 absl::Status Resolver::ResolveSelectStar(
-    const ASTExpression* ast_select_expr,
+    const ASTSelectColumn* ast_select_column,
     const std::shared_ptr<const NameList>& from_clause_name_list,
     const NameScope* from_scan_scope,
     QueryResolutionInfo* query_resolution_info) {
+  const ASTExpression* ast_select_expr = ast_select_column->expression();
   if (in_strict_mode()) {
     return MakeSqlErrorAt(ast_select_expr)
            << "SELECT * is not allowed in strict name resolution mode";
@@ -3054,14 +3192,14 @@ absl::Status Resolver::ResolveSelectStar(
     const ASTStarWithModifiers* ast_node =
         ast_select_expr->GetAsOrDie<ASTStarWithModifiers>();
     ZETASQL_RETURN_IF_ERROR(ResolveSelectStarModifiers(
-        ast_node, ast_node->modifiers(), from_clause_name_list.get(),
+        ast_select_column, ast_node->modifiers(), from_clause_name_list.get(),
        /*type_for_star=*/nullptr, from_scan_scope, query_resolution_info,
        &column_replacements));
   }
 
   const CorrelatedColumnsSetList correlated_columns_set_list;
   ZETASQL_RETURN_IF_ERROR(AddNameListToSelectList(
-      ast_select_expr, from_clause_name_list, correlated_columns_set_list,
+      ast_select_column, from_clause_name_list, correlated_columns_set_list,
      /*ignore_excluded_value_table_fields=*/true,
      query_resolution_info->select_column_state_list(),
      &column_replacements));
@@ -3087,9 +3225,10 @@ static absl::Status MakeErrorIfTypeDotStarHasNoFields(
 }
 
 absl::Status Resolver::ResolveSelectDotStar(
-    const ASTExpression* ast_dotstar, const NameScope* from_scan_scope,
+    const ASTSelectColumn* ast_select_column, const NameScope* from_scan_scope,
     QueryResolutionInfo* query_resolution_info) {
   RETURN_ERROR_IF_OUT_OF_STACK_SPACE();
+  const ASTExpression* ast_dotstar = ast_select_column->expression();
   const ASTExpression* ast_expr;
   const ASTStarModifiers* ast_modifiers = nullptr;
   if (ast_dotstar->node_kind() == AST_DOT_STAR) {
@@ -3133,13 +3272,14 @@ absl::Status Resolver::ResolveSelectDotStar(
     ColumnReplacements column_replacements;
     if (ast_modifiers != nullptr) {
       ZETASQL_RETURN_IF_ERROR(ResolveSelectStarModifiers(
-          ast_dotstar, ast_modifiers, target.scan_columns().get(),
+          ast_select_column, ast_modifiers, target.scan_columns().get(),
          /*type_for_star=*/nullptr, from_scan_scope, query_resolution_info,
          &column_replacements));
     }
     ZETASQL_RETURN_IF_ERROR(AddNameListToSelectList(
-        ast_dotstar, target.scan_columns(), correlated_columns_set_list,
+        ast_select_column, target.scan_columns(),
+        correlated_columns_set_list,
        /*ignore_excluded_value_table_fields=*/false,
        query_resolution_info->select_column_state_list(),
        &column_replacements));
@@ -3223,15 +3363,17 @@ absl::Status Resolver::ResolveSelectDotStar(
   ColumnReplacements column_replacements;
   if (ast_modifiers != nullptr) {
     ZETASQL_RETURN_IF_ERROR(ResolveSelectStarModifiers(
-        ast_dotstar, ast_modifiers, /*name_list_for_star=*/nullptr, source_type,
-        from_scan_scope, query_resolution_info, &column_replacements));
+        ast_select_column, ast_modifiers, /*name_list_for_star=*/nullptr,
+        source_type, from_scan_scope, query_resolution_info,
+        &column_replacements));
   }
 
   const int orig_num_columns = static_cast<int>(
      query_resolution_info->select_column_state_list()->Size());
   ZETASQL_RETURN_IF_ERROR(AddColumnFieldsToSelectList(
-      ast_dotstar, src_column_ref.get(), expr_resolution_info.has_aggregation,
-      expr_resolution_info.has_analytic, expr_resolution_info.has_volatile,
+      ast_select_column, src_column_ref.get(),
+      expr_resolution_info.has_aggregation, expr_resolution_info.has_analytic,
+      expr_resolution_info.has_volatile,
       /*column_alias_if_no_fields=*/IdString(),
       /*excluded_field_names=*/nullptr,
       query_resolution_info->select_column_state_list(), &column_replacements));
@@ -3251,13 +3393,14 @@ absl::Status Resolver::ResolveSelectDotStar(
 // NOTE: The behavior of star expansion here must match
 // NameList::SelectStarHasColumn.
 absl::Status Resolver::AddColumnFieldsToSelectList(
-    const ASTExpression* ast_expression,
+    const ASTSelectColumn* ast_select_column,
     const ResolvedColumnRef* src_column_ref, bool src_column_has_aggregation,
     bool src_column_has_analytic, bool src_column_has_volatile,
     IdString column_alias_if_no_fields,
     const IdStringSetCase* excluded_field_names,
     SelectColumnStateList* select_column_state_list,
     ColumnReplacements* column_replacements) {
+  const ASTExpression* ast_expression = ast_select_column->expression();
   const bool allow_no_fields = !column_alias_if_no_fields.empty();
   const Type* type = src_column_ref->type();
@@ -3282,7 +3425,7 @@ absl::Status Resolver::AddColumnFieldsToSelectList(
     // we had an explicit alias for the table.
     // This is not a strict requirement and we could change it.
     select_column_state_list->AddSelectColumn(
-        ast_expression, column_alias_if_no_fields, /*is_explicit=*/false,
+        ast_select_column, column_alias_if_no_fields, /*is_explicit=*/false,
         src_column_has_aggregation, src_column_has_analytic,
         src_column_has_volatile, CopyColumnRef(src_column_ref));
     return absl::OkStatus();
@@ -3312,7 +3455,7 @@ absl::Status Resolver::AddColumnFieldsToSelectList(
         /*error_node=*/nullptr, get_struct_field.get()));
     // is_explicit=false because we're extracting all fields of a struct.
     select_column_state_list->AddSelectColumn(
-        ast_expression, field_name,
+        ast_select_column, field_name,
         /*is_explicit=*/false, src_column_has_aggregation,
         src_column_has_analytic, src_column_has_volatile,
         std::move(get_struct_field));
@@ -3371,7 +3514,7 @@ absl::Status Resolver::AddColumnFieldsToSelectList(
           /*return_default_value_when_unset=*/false);
       // is_explicit=false because we're extracting all fields of a proto.
       select_column_state_list->AddSelectColumn(
-          ast_expression, field_name, /*is_explicit=*/false,
+          ast_select_column, field_name, /*is_explicit=*/false,
          src_column_has_aggregation, src_column_has_analytic,
          src_column_has_volatile, std::move(resolved_expr));
     }
@@ -3390,11 +3533,11 @@ absl::Status Resolver::ResolveSelectColumnFirstPass(
   switch (ast_select_expr->node_kind()) {
     case AST_STAR:
     case AST_STAR_WITH_MODIFIERS:
-      return ResolveSelectStar(ast_select_expr, from_clause_name_list,
+      return ResolveSelectStar(ast_select_column, from_clause_name_list,
                                from_scan_scope, query_resolution_info);
     case AST_DOT_STAR:
     case AST_DOT_STAR_WITH_MODIFIERS:
-      return ResolveSelectDotStar(ast_select_expr, from_scan_scope,
+      return ResolveSelectDotStar(ast_select_column, from_scan_scope,
                                   query_resolution_info);
     default:
       break;
   }
@@ -3416,9 +3559,10 @@ absl::Status Resolver::ResolveSelectColumnFirstPass(
   // from an AS alias or from a path in the query, or it's an internal name
   // for an anonymous column (that can't be looked up).
   query_resolution_info->select_column_state_list()->AddSelectColumn(
-      ast_select_expr, select_column_alias, /*is_explicit=*/true,
+      ast_select_column, select_column_alias, /*is_explicit=*/true,
       expr_resolution_info->has_aggregation, expr_resolution_info->has_analytic,
       expr_resolution_info->has_volatile, std::move(resolved_expr));
+
   return absl::OkStatus();
 }
 
@@ -3451,6 +3595,20 @@ absl::Status Resolver::ValidateAndResolveOrderByCollate(
   return ResolveCollate(ast_collate, resolved_collate);
 }
 
+absl::StatusOr<ResolvedOrderByItemEnums::NullOrderMode>
+Resolver::ResolveNullOrderMode(const ASTNullOrder* null_order) {
+  if (null_order == nullptr) {
+    return ResolvedOrderByItemEnums::ORDER_UNSPECIFIED;
+  }
+  if (!language().LanguageFeatureEnabled(
+          FEATURE_V_1_3_NULLS_FIRST_LAST_IN_ORDER_BY)) {
+    return MakeSqlErrorAt(null_order)
+           << "NULLS FIRST and NULLS LAST are not supported";
+  }
+  return null_order->nulls_first() ? ResolvedOrderByItemEnums::NULLS_FIRST
+                                   : ResolvedOrderByItemEnums::NULLS_LAST;
+}
+
 absl::Status Resolver::ResolveOrderingExprs(
     const absl::Span<const ASTOrderingExpression* const> ordering_expressions,
     ExprResolutionInfo* expr_resolution_info,
@@ -3458,19 +3616,9 @@ absl::Status Resolver::ResolveOrderingExprs(
   RETURN_ERROR_IF_OUT_OF_STACK_SPACE();
   for (const ASTOrderingExpression* order_by_expression :
        ordering_expressions) {
-    ResolvedOrderByItemEnums::NullOrderMode null_order =
-        ResolvedOrderByItemEnums::ORDER_UNSPECIFIED;
-    if (order_by_expression->null_order() != nullptr) {
-      if (!language().LanguageFeatureEnabled(
-              FEATURE_V_1_3_NULLS_FIRST_LAST_IN_ORDER_BY)) {
-        return MakeSqlErrorAt(order_by_expression->null_order())
-               << "NULLS FIRST and NULLS LAST are not supported";
-      } else {
-        null_order = order_by_expression->null_order()->nulls_first()
-                         ? ResolvedOrderByItemEnums::NULLS_FIRST
-                         : ResolvedOrderByItemEnums::NULLS_LAST;
-      }
-    }
+    ZETASQL_ASSIGN_OR_RETURN(
+        ResolvedOrderByItemEnums::NullOrderMode null_order,
+        ResolveNullOrderMode(order_by_expression->null_order()));
+
     std::unique_ptr<const ResolvedExpr> resolved_order_expression;
     ZETASQL_RETURN_IF_ERROR(ResolveExpr(order_by_expression->expression(),
                                 expr_resolution_info,
@@ -3490,9 +3638,9 @@ absl::Status Resolver::ResolveOrderingExprs(
                << "Found : " << value.int64_value();
       }
       const int64_t int_value = value.int64_value() - 1;  // Make it 0-based.
-      order_by_info->emplace_back(order_by_expression, int_value,
-                                  order_by_expression->descending(),
-                                  null_order);
+      order_by_info->emplace_back(
+          order_by_expression, order_by_expression->collate(), int_value,
+          order_by_expression->descending(), null_order);
     } else {
       return MakeSqlErrorAt(order_by_expression)
              << "Cannot ORDER BY literal values";
@@ -3500,7 +3648,8 @@ absl::Status Resolver::ResolveOrderingExprs(
       resolved_order_expression.reset();  // No longer needed.
     } else {
       order_by_info->emplace_back(
-          order_by_expression, std::move(resolved_order_expression),
+          order_by_expression, order_by_expression->collate(),
+          std::move(resolved_order_expression),
           order_by_expression->descending(), null_order);
     }
   }
@@ -3538,8 +3687,9 @@ absl::Status Resolver::HandleGroupBySelectColumn(
     // expression.
     *group_by_column = existing_computed_column->column();
   } else {
+    const IdString* query_alias = &kGroupById;
     *group_by_column = ResolvedColumn(
-        AllocateColumnId(), kGroupById, select_column_state->alias,
+        AllocateColumnId(), *query_alias, select_column_state->alias,
         select_column_state->resolved_expr->annotated_type());
   }
@@ -3594,8 +3744,10 @@ absl::Status Resolver::HandleGroupByExpression(
   for (const std::unique_ptr<const ResolvedComputedColumn>& computed_column :
        *query_resolution_info
            ->select_list_columns_to_compute_before_aggregation()) {
-    if (IsSameFieldPath(resolved_expr->get(), computed_column->expr(),
-                        FieldPathMatchingOption::kExpression)) {
+    ZETASQL_ASSIGN_OR_RETURN(bool is_same_expr,
+                     IsSameExpressionForGroupBy(resolved_expr->get(),
+                                                computed_column->expr()));
+    if (is_same_expr) {
       *group_by_column = computed_column->column();
       found_precomputed_expression = true;
       break;
@@ -3612,7 +3764,8 @@ absl::Status Resolver::HandleGroupByExpression(
     // expression.
     *group_by_column = existing_computed_column->column();
   } else {
-    *group_by_column = ResolvedColumn(AllocateColumnId(), kGroupById, alias,
+    const IdString* query_alias = &kGroupById;
+    *group_by_column = ResolvedColumn(AllocateColumnId(), *query_alias, alias,
                                       (*resolved_expr)->annotated_type());
   }
@@ -3676,6 +3829,9 @@ absl::Status Resolver::AddSelectColumnToGroupByAllComputedColumn(
 
 static const ResolvedExpr* GetPreGroupByResolvedExpr(
     const SelectColumnState* select_column_state) {
+  if (select_column_state->original_resolved_expr != nullptr) {
+    return select_column_state->original_resolved_expr;
+  }
   return select_column_state->resolved_expr == nullptr
              ? select_column_state->resolved_computed_column->expr()
              : select_column_state->resolved_expr.get();
@@ -3769,7 +3925,7 @@ absl::Status Resolver::ResolveGroupByAll(
         GetSourceColumnAndNamePath(
             select_expr, /*target_column=*/ResolvedColumn(), &source_column,
             &is_correlated, &select_expr_name_path, id_string_pool_)) {
-      // If we identify a path expression, we do not need to look at it anymore
+      // If we identify a path expression, we do not need to look at it any more
       // when we go over the select list items again in the third pass.
       skip_column_positions.insert(i);
       // If it's a correlated expression that does not reference any FROM scope
@@ -3842,10 +3998,11 @@ absl::Status Resolver::ResolveGroupByAll(
   }
 
   // There are a couple of situations when neither the group by list nor the
-  // aggregate list contains any columns. They act like no GROUP BY clause is
-  // specified.
+  // aggregate list contains any columns. They act like a GROUP BY over zero
+  // columns, which produces one row of output.
   if (query_resolution_info->group_by_columns_to_compute().empty() &&
       query_resolution_info->aggregate_columns_to_compute().empty()) {
+    query_resolution_info->SetHasAggregation(true);
     query_resolution_info->set_has_group_by(false);
   }
   return absl::OkStatus();
@@ -3952,6 +4109,7 @@ absl::Status Resolver::ResolveGroupByExprs(
   bool has_rollup_or_cube = false;
   for (const ASTGroupingItem* grouping_item : group_by->grouping_items()) {
+
     if (grouping_item->rollup() != nullptr) {
       // GROUP BY ROLLUP
       has_rollup_or_cube = true;
@@ -4156,32 +4314,6 @@ absl::Status Resolver::ResolveGroupingItemExpression(
                                  &no_aggregation, &group_by_column_state));
     }
   }
-
-  // If the same expression already has a computed column, then reuse the
-  // existing computed column. This is to make sure the group_by_list only has
-  // one expression when multiple duplicated expressions appear in the group
-  // by clause.
-  // This change only applies to grouping sets, rollup and cube when
-  // FEATURE_V_1_4_GROUPING_SETS is enabled for SAFETY. This is because a
-  // global deduplication may have subtle impact to existing DISTINCT, normal
-  // GROUP BY, legacy ROLLUP queries, though theoretically there shouldn't be.
-  // We will apply this global deduplication once it's verifed.
-  if (from_grouping_set &&
-      language().LanguageFeatureEnabled(FEATURE_V_1_4_GROUPING_SETS)) {
-    for (const std::unique_ptr<const ResolvedComputedColumn>&
-             resolved_computed_column :
-         query_resolution_info->group_by_columns_to_compute()) {
-      ZETASQL_ASSIGN_OR_RETURN(
-          bool is_same_expression,
-          IsSameExpressionForGroupBy(resolved_expr.get(),
-                                     resolved_computed_column->expr()));
-
-      if (is_same_expression) {
-        column_list->push_back(resolved_computed_column.get());
-        return absl::OkStatus();
-      }
-    }
-  }
 }
 
 IdString alias = GetAliasForExpression(ast_group_by_expr);
@@ -4224,6 +4356,32 @@ absl::Status Resolver::ResolveGroupingItemExpression(
         HandleGroupBySelectColumn(group_by_column_state, query_resolution_info,
                                   &resolved_expr, &group_by_column));
   } else {
+    // If the same expression already has a computed column, then reuse the
+    // existing computed column. This is to make sure the group_by_list only has
+    // one expression when multiple duplicated expressions appear in the group
+    // by clause.
+    // This change only applies to grouping sets, rollup and cube when
+    // FEATURE_V_1_4_GROUPING_SETS is enabled for SAFETY. This is because a
+    // global deduplication may have a subtle impact on existing DISTINCT,
+    // normal GROUP BY, legacy ROLLUP queries, though theoretically there
+    // shouldn't be. We will apply this global deduplication once it's verified.
+    if (from_grouping_set &&
+        language().LanguageFeatureEnabled(FEATURE_V_1_4_GROUPING_SETS)) {
+      for (const std::unique_ptr<const ResolvedComputedColumn>&
+               resolved_computed_column :
+           query_resolution_info->group_by_columns_to_compute()) {
+        ZETASQL_ASSIGN_OR_RETURN(
+            bool is_same_expression,
+            IsSameExpressionForGroupBy(resolved_expr.get(),
+                                       resolved_computed_column->expr()));
+
+        if (is_same_expression) {
+          column_list->push_back(resolved_computed_column.get());
+          return absl::OkStatus();
+        }
+      }
+    }
+
     ZETASQL_RETURN_IF_ERROR(HandleGroupByExpression(ast_group_by_expr,
                                             query_resolution_info, alias,
                                             &resolved_expr, &group_by_column));
@@ -4420,10 +4578,12 @@ absl::Status Resolver::ResolveSelectColumnSecondPass(
     }
   }
 
-  return (*final_project_name_list)
-      ->AddColumn(select_column_state->alias,
-                  select_column_state->resolved_select_column,
-                  select_column_state->is_explicit);
+  ZETASQL_RETURN_IF_ERROR((*final_project_name_list)
+                      ->AddColumn(select_column_state->alias,
+                                  select_column_state->resolved_select_column,
+                                  select_column_state->is_explicit));
+
+  return absl::OkStatus();
 }
 
 absl::Status Resolver::ResolveSelectListExprsSecondPass(
@@ -4809,11 +4969,6 @@ absl::Status Resolver::SetOperationResolver::Resolve(
         ResolveInputQuery(scope, idx, inferred_type_for_query));
   }
 
-  if (ASTColumnMatchMode() != ASTSetOperation::BY_POSITION) {
-    resolver_->analyzer_output_properties_.MarkRelevant(
-        REWRITE_SET_OPERATION_CORRESPONDING);
-  }
-
   ResolvedColumnList final_column_list;
   std::shared_ptr<const NameList> name_list_template;
   if (ASTColumnMatchMode() == ASTSetOperation::BY_POSITION) {
@@ -5519,12 +5674,23 @@ absl::Status Resolver::SetOperationResolver::AdjustAndReorderColumns(
     null_columns.push_back(std::move(null_column));
   }
 
-  // TODO: Once we are ready to remove the set operation
-  // rewriter, we should add a ProjectScan as long as `new_output_column_list`
-  // is different from the column_list of the item's scan.
ResolvedSetOperationItem* input = resolved_inputs[query_idx].node.get(); - if (!null_columns.empty()) { - // Need to add a ProjectScan to pad NULL columns. + if (new_output_column_list != input->scan()->column_list()) { + // Add a ProjectScan to adjust the output columns, including cases when: + // - only some of the expected columns are selected + // - original columns are reordered + // - there are padded NULL columns + // + // Note: `input->output_column_list()` is the same as + // `input->scan()->column_list()` except for the "SELECT DISTINCT" edge + // case where a ProjectScan is missing: b/36095506. For example, in the + // following resolved ast, `output_column_list` has more columns than + // `scan.column_list()`: + // + // ``` + // scan=AggregateScan(column_list=$distinct.[int32#19, int64_t#20]) + // output_column_list=$distinct.[int32#19, int64_t#20, int32_t#19] + // ``` ZETASQL_ASSIGN_OR_RETURN( std::unique_ptr project_scan, ResolvedProjectScanBuilder() @@ -5617,7 +5783,8 @@ InputArgumentType Resolver::SetOperationResolver::GetColumnInputArgumentType( expr = FindProjectExpr(resolved_scan->GetAs(), column); } if (expr != nullptr) { - return GetInputArgumentTypeForExpr(expr); + return GetInputArgumentTypeForExpr( + expr, /*pick_default_type_for_untyped_expr=*/false); } else { return InputArgumentType(column.type()); } @@ -5928,13 +6095,19 @@ Resolver::ValidateRecursiveTermVisitor::VisitResolvedRecursiveRefScan( return absl::OkStatus(); } +static bool IsBasicCorrespondingEnabled( + const LanguageOptions& language_options) { + return language_options.LanguageFeatureEnabled(FEATURE_V_1_4_CORRESPONDING) || + language_options.LanguageFeatureEnabled( + FEATURE_V_1_4_CORRESPONDING_FULL); +} + absl::Status Resolver::SetOperationResolver::ValidateNoCorrespondingForRecursive() const { ZETASQL_RET_CHECK_NE(set_operation_->metadata(), nullptr); for (const auto* metadata : set_operation_->metadata()->set_operation_metadata_list()) { - if 
(!resolver_->language().LanguageFeatureEnabled( - FEATURE_V_1_4_CORRESPONDING)) { + if (!IsBasicCorrespondingEnabled(resolver_->language())) { if (metadata->column_match_mode() != nullptr) { return MakeSqlErrorAt(metadata->column_match_mode()) << "CORRESPONDING and CORRESPONDING BY for WITH RECURSIVE are " @@ -6232,6 +6405,60 @@ absl::Status Resolver::SetOperationResolver::ResolveRecursive( recursive_scan->column_list(); *output = std::move(recursive_scan); *output_name_list = final_name_list; + + return absl::OkStatus(); +} + +absl::Status Resolver::SetOperationResolver::FinishResolveRecursionWithModifier( + const ASTNode* ast_location, const std::vector& recursive_alias, + std::unique_ptr depth_modifier, + std::unique_ptr* output, + std::shared_ptr* output_name_list) { + ZETASQL_RET_CHECK((*output)->Is()); + + const auto& name_list = *output_name_list; + if (name_list->is_value_table()) { + return MakeSqlErrorAt(ast_location) + << "WITH DEPTH modifier is not allowed when the recursive query " + "produces a value table."; + } + + // Handles the case when the recursion depth column alias is ambiguous. + const ResolvedColumn& depth_column = + depth_modifier->recursion_depth_column()->column(); + NameTarget name_target; + if (name_list->LookupName(depth_column.name_id(), &name_target)) { + return MakeSqlErrorAt(ast_location) + << "WITH DEPTH modifier depth column is named " + << ToSingleQuotedStringLiteral(depth_column.name_id().ToStringView()) + << " which collides with one of the existing names."; + } + + // Adds the recursion depth column to output name list. + auto name_list_with_modifier = std::make_shared(); + ZETASQL_RETURN_IF_ERROR(name_list_with_modifier->MergeFrom(*name_list, ast_location)); + ZETASQL_RETURN_IF_ERROR(name_list_with_modifier->AddColumn(depth_column.name_id(), + depth_column, + /*is_explicit=*/true)); + + // Updates the resolved ast to add depth column to the column list. 
+ ZETASQL_ASSIGN_OR_RETURN( + auto recursive_scan, + ToBuilder( + absl::WrapUnique(output->release()->GetAs())) + .set_recursion_depth_modifier(std::move(depth_modifier)) + .add_column_list(depth_column) + .Build()); + + // Updates the named subquery corresponding to the recursive subquery. + auto& named_subquery = resolver_->named_subquery_map_[recursive_alias].back(); + auto modified_named_subquery = std::make_unique( + named_subquery->unique_alias, named_subquery->is_recursive, + recursive_scan->column_list(), std::move(name_list_with_modifier)); + named_subquery = std::move(modified_named_subquery); + + *output_name_list = std::move(name_list_with_modifier); + *output = std::move(recursive_scan); return absl::OkStatus(); } @@ -6402,14 +6629,13 @@ absl::Status Resolver::SetOperationResolver::ValidateCorresponding() const { const ASTSetOperationColumnMatchMode* column_match_mode = metadata->column_match_mode(); if (column_match_mode != nullptr) { - if (!resolver_->language().LanguageFeatureEnabled( - FEATURE_V_1_4_CORRESPONDING) && + if (!IsBasicCorrespondingEnabled(resolver_->language()) && column_match_mode->value() == ASTSetOperation::CORRESPONDING) { return MakeSqlErrorAt(column_match_mode) << "CORRESPONDING for set operations is not supported"; } if (!resolver_->language().LanguageFeatureEnabled( - FEATURE_V_1_4_CORRESPONDING_BY) && + FEATURE_V_1_4_CORRESPONDING_FULL) && column_match_mode->value() == ASTSetOperation::CORRESPONDING_BY) { return MakeSqlErrorAt(column_match_mode) << "CORRESPONDING BY for set operations is not supported"; @@ -6419,7 +6645,7 @@ absl::Status Resolver::SetOperationResolver::ValidateCorresponding() const { const ASTSetOperationColumnPropagationMode* column_propagation_mode = metadata->column_propagation_mode(); if (!resolver_->language().LanguageFeatureEnabled( - FEATURE_V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE) && + FEATURE_V_1_4_CORRESPONDING_FULL) && column_propagation_mode != nullptr) { return 
MakeSqlErrorAt(metadata->column_propagation_mode()) << "Column propagation mode (FULL/LEFT/STRICT) for set " @@ -6494,8 +6720,11 @@ absl::Status Resolver::ValidateParameterOrLiteralAndCoerceToInt64IfNeeded( if ((*expr)->node_kind() == RESOLVED_LITERAL) { // If a literal, we can also validate its value. - const Value value = (*expr)->GetAs()->value(); - if (!value.is_null() && value.int64_value() < 0) { + const Value& value = (*expr)->GetAs()->value(); + if (value.is_null()) { + return MakeSqlErrorAt(ast_location) << clause_name << " must not be null"; + } + if (value.int64_value() < 0) { return MakeSqlErrorAt(ast_location) << clause_name << " expects a non-negative integer literal or parameter"; @@ -6510,8 +6739,28 @@ absl::Status Resolver::ResolveLimitOrOffsetExpr( std::unique_ptr* resolved_expr) { ZETASQL_RETURN_IF_ERROR(ResolveExpr(ast_expr, expr_resolution_info, resolved_expr)); ABSL_DCHECK(resolved_expr != nullptr); - ZETASQL_RETURN_IF_ERROR(ValidateParameterOrLiteralAndCoerceToInt64IfNeeded( - clause_name, ast_expr, resolved_expr)); + + if (!language().LanguageFeatureEnabled( + FEATURE_V_1_4_LIMIT_OFFSET_EXPRESSIONS)) { + return ValidateParameterOrLiteralAndCoerceToInt64IfNeeded( + clause_name, ast_expr, resolved_expr); + } + + ZETASQL_RETURN_IF_ERROR(CoerceExprToType( + ast_expr, type_factory_->get_int64(), kImplicitCoercion, + [](absl::string_view target_type_name, + absl::string_view actual_type_name) { + return absl::StrCat("LIMIT ... OFFSET ... 
expects ", target_type_name, + ", got ", actual_type_name); + }, + resolved_expr)); + ZETASQL_ASSIGN_OR_RETURN(bool is_constant_expr, + IsConstantExpression(resolved_expr->get())); + if (!is_constant_expr) { + return MakeSqlErrorAt(ast_expr) + << clause_name << " expression must be constant"; + } + return absl::OkStatus(); } @@ -6522,6 +6771,7 @@ absl::Status Resolver::ResolveHavingModifier( std::unique_ptr resolved_expr; ZETASQL_RETURN_IF_ERROR(ResolveExpr(ast_having_modifier->expr(), expr_resolution_info, &resolved_expr)); + // TODO: Introduce a feature to support ARRAY type in HAVING // The HAVING MIN/MAX expression type must support ordering, and it cannot // be an array since MIN/MAX is currently undefined for arrays (even if // the array element supports MIN/MAX). @@ -6555,25 +6805,49 @@ absl::Status Resolver::ResolveHavingModifier( // Resolves a LimitOffsetScan. // If an OFFSET is not supplied, then the default value, 0, is used. +// It is an error to not supply a LIMIT. absl::Status Resolver::ResolveLimitOffsetScan( - const ASTLimitOffset* limit_offset, + const ASTLimitOffset* limit_offset, const NameScope* name_scope, std::unique_ptr* scan) { - ExprResolutionInfo expr_resolution_info(empty_name_scope_.get(), + ZETASQL_RET_CHECK(limit_offset->limit() != nullptr); + return ResolveLimitOffsetScan(limit_offset->limit(), limit_offset->offset(), + name_scope, scan); +} + +// Resolves a LimitOffsetScan. +// If an OFFSET is not supplied, then a default value of 0 is used. +// If a LIMIT is not supplied then a default value of kint64max is used. +absl::Status Resolver::ResolveLimitOffsetScan( + const ASTExpression* limit, const ASTExpression* offset, + const NameScope* name_scope, std::unique_ptr* scan) { + // LIMIT and OFFSET cannot reference the current name scope. Theoretically + // they can reference the parent scope. 
Practically this makes no difference + // from passing in an empty scope, because LIMIT and OFFSET can currently + // only take constant expressions, and correlated expressions are not + // constant. We pass in the parent scope here to get a better error message, + // and to allow easier migration when correlated column references are + // supported in the future. + ExprResolutionInfo expr_resolution_info(name_scope->previous_scope(), "LIMIT OFFSET"); // Resolve and validate the LIMIT. - ZETASQL_RET_CHECK(limit_offset->limit() != nullptr); std::unique_ptr<const ResolvedExpr> limit_expr; - ZETASQL_RETURN_IF_ERROR(ResolveLimitOrOffsetExpr(limit_offset->limit(), - /*clause_name=*/"LIMIT", - &expr_resolution_info, &limit_expr)); + if (limit == nullptr) { + // There is no specific location that we can associate with the virtual + // literal so we set `ast_location` to nullptr. + limit_expr = MakeResolvedLiteral( + /*ast_location=*/nullptr, + Value::Int64(std::numeric_limits<int64_t>::max())); + } else { + ZETASQL_RETURN_IF_ERROR(ResolveLimitOrOffsetExpr( + limit, + /*clause_name=*/"LIMIT", &expr_resolution_info, &limit_expr)); + } // Resolve and validate the OFFSET. 
std::unique_ptr<const ResolvedExpr> offset_expr; - if (limit_offset->offset() != nullptr) { + if (offset != nullptr) { ZETASQL_RETURN_IF_ERROR(ResolveLimitOrOffsetExpr( - limit_offset->offset(), /*clause_name=*/"OFFSET", &expr_resolution_info, - &offset_expr)); + offset, /*clause_name=*/"OFFSET", &expr_resolution_info, &offset_expr)); } const std::vector<ResolvedColumn>& column_list = (*scan)->column_list(); @@ -7280,7 +7554,15 @@ absl::Status Resolver::ResolvePivotClause( ast_pivot_clause->output_alias()->GetAsIdString(), ast_pivot_clause->output_alias(), *output_name_list)); } - + for (const auto& pivot_expr : pivot_expr_columns) { + ZETASQL_RET_CHECK(pivot_expr->Is<ResolvedAggregateFunctionCall>()); + const auto& aggregate_func_call = + pivot_expr->GetAs<ResolvedAggregateFunctionCall>(); + if (aggregate_func_call->function()->Is<SQLFunctionInterface>() || + aggregate_func_call->function()->Is<TemplatedSQLFunction>()) { + analyzer_output_properties_.MarkRelevant(REWRITE_INLINE_SQL_UDAS); + } + } *output = MakeResolvedPivotScan( output_column_list, std::move(input_scan), std::move(group_by_list), std::move(pivot_expr_columns), std::move(resolved_for_expr), @@ -8578,6 +8860,7 @@ absl::Status Resolver::ResolveJoinRhs( bool expect_join_condition; const char* join_type_name = ""; // For error messages. ResolvedJoinScan::JoinType resolved_join_type; + bool has_using = false; switch (join->join_type()) { case ASTJoin::COMMA: // This is the only case without a "JOIN" keyword. 
@@ -8643,6 +8926,7 @@ absl::Status Resolver::ResolveJoinRhs( << "USING clause cannot be used with " << natural_str << join_type_name; } + has_using = true; std::vector> lhs_computed_columns; @@ -8673,6 +8957,7 @@ absl::Status Resolver::ResolveJoinRhs( << "ON clause cannot be used with " << natural_str << join_type_name; } + const std::unique_ptr on_scope( new NameScope(external_scope, name_list)); @@ -8694,14 +8979,15 @@ absl::Status Resolver::ResolveJoinRhs( *output_name_list = name_list; return AddScansForJoin(join, std::move(resolved_lhs), std::move(resolved_rhs), - resolved_join_type, std::move(join_condition), - std::move(computed_columns), output); + resolved_join_type, has_using, + std::move(join_condition), std::move(computed_columns), + output); } absl::Status Resolver::AddScansForJoin( const ASTJoin* join, std::unique_ptr resolved_lhs, std::unique_ptr resolved_rhs, - ResolvedJoinScan::JoinType resolved_join_type, + ResolvedJoinScan::JoinType resolved_join_type, bool has_using, std::unique_ptr join_condition, std::vector> computed_columns, std::unique_ptr* output_scan) { @@ -8709,7 +8995,7 @@ absl::Status Resolver::AddScansForJoin( resolved_lhs->column_list(), resolved_rhs->column_list()); std::unique_ptr resolved_join = MakeResolvedJoinScan( concat_columns, resolved_join_type, std::move(resolved_lhs), - std::move(resolved_rhs), std::move(join_condition)); + std::move(resolved_rhs), std::move(join_condition), has_using); // If we have a join_type keyword hint (e.g. HASH JOIN or LOOKUP JOIN), // add it on the front of hint_list, before any long-form hints. 
@@ -9367,8 +9653,7 @@ absl::StatusOr Resolver::MatchTVFSignature( input_arg_types.push_back(std::move(input_arg_type_or_status).value()); } - signature_match_result->set_allow_mismatch_message( - analyzer_options_.show_function_signature_mismatch_details()); + signature_match_result->set_allow_mismatch_message(true); if (!SignatureArgumentCountMatches( function_signature, static_cast(arg_locations->size()), @@ -9561,16 +9846,16 @@ absl::Status Resolver::GenerateTVFNotMatchError( const TableValuedFunction& tvf_catalog_entry, const std::string& tvf_name, const std::vector& input_arg_types, int signature_idx) { const ASTNode* ast_location = ast_tvf; - if (signature_match_result.tvf_bad_argument_index() != -1) { - ZETASQL_RET_CHECK_LT(signature_match_result.tvf_bad_argument_index(), + if (signature_match_result.bad_argument_index() != -1) { + ZETASQL_RET_CHECK_LT(signature_match_result.bad_argument_index(), arg_locations.size()); - ast_location = - arg_locations[signature_match_result.tvf_bad_argument_index()]; + ast_location = arg_locations[signature_match_result.bad_argument_index()]; } return MakeSqlErrorAt(ast_location) << tvf_catalog_entry.GetTVFSignatureErrorMessage( tvf_name, input_arg_types, signature_idx, - signature_match_result, language()); + signature_match_result, language(), + analyzer_options_.show_function_signature_mismatch_details()); } absl::StatusOr Resolver::ResolveTVFArg( @@ -9757,7 +10042,11 @@ absl::StatusOr Resolver::GetTVFArgType( if (resolved_tvf_arg.IsExpr()) { ZETASQL_ASSIGN_OR_RETURN(const ResolvedExpr* const expr, resolved_tvf_arg.GetExpr()); - input_arg_type = GetInputArgumentTypeForExpr(expr); + // We should not force the type for a pure NULL expr, as the output + // InputArgumentType is used for function signature matching and the final + // argument type is not determined yet. 
+ input_arg_type = GetInputArgumentTypeForExpr( + expr, /*pick_default_type_for_untyped_expr=*/false); } else if (resolved_tvf_arg.IsScan()) { ZETASQL_ASSIGN_OR_RETURN(std::shared_ptr name_list, resolved_tvf_arg.GetNameList()); @@ -9945,65 +10234,262 @@ absl::Status Resolver::CoerceOrRearrangeTVFRelationArgColumns( return absl::OkStatus(); } -// Validates the given `unnest_expr` has the valid aliases and correct number of -// expressions. An SQL error will be returned if -// - `unnest_expr` contains more than one expression. -// - Any of the expressions in `unnest_expr` contains an alias. -// TODO: Update the checks once we are ready to support column -// aliases and multiple expressions in UNNEST. -static absl::Status ValidateUnnestAliases( - const ASTUnnestExpression* unnest_expr, +absl::Status Resolver::ValidateUnnestAliases( const ASTTablePathExpression* table_ref) { - ZETASQL_RET_CHECK_NE(unnest_expr, nullptr); ZETASQL_RET_CHECK_NE(table_ref, nullptr); + const ASTUnnestExpression* unnest_expr = table_ref->unnest_expr(); + ZETASQL_RET_CHECK_NE(unnest_expr, nullptr); + ZETASQL_RET_CHECK_GE(unnest_expr->expressions().size(), 1); + const ASTExpressionWithOptAlias* first_arg = unnest_expr->expressions(0); + + if (!language().LanguageFeatureEnabled(FEATURE_V_1_4_MULTIWAY_UNNEST)) { + if (unnest_expr->expressions().size() > 1) { + return MakeSqlErrorAt(unnest_expr->expressions(1)) + << "The UNNEST operator supports exactly one argument"; + } + if (first_arg->optional_alias() != nullptr) { + return MakeSqlErrorAt(first_arg->optional_alias()) + << "Argument alias is not supported in the UNNEST operator"; + } + } if (table_ref->alias() != nullptr) { // The unnest expression has legacy alias. No argument aliases are allowed // and there should be exactly one expression in UNNEST. 
if (unnest_expr->expressions().size() != 1) { return MakeSqlErrorAt(table_ref->alias()) - << "Table alias in UNNEST in FROM clause is not allowed when " - << "UNNEST contains multiple arguments"; + << "When 2 or more array arguments are supplied to UNNEST, " + "aliases for the element columns must be specified following " + "the argument inside the parenthesis"; } - if (unnest_expr->expressions()[0]->optional_alias() != nullptr) { + if (first_arg->optional_alias() != nullptr) { return MakeSqlErrorAt(table_ref->alias()) - << "Table alias in UNNEST in FROM clause is not allowed when " - << "arguments in UNNEST have alias"; - } - } - - if (unnest_expr->expressions().size() > 1) { - return MakeSqlErrorAt(unnest_expr->expressions()[1]) - << "Multiple arguments in UNNEST in FROM clause is not implemented"; - } - for (const ASTExpressionWithOptAlias* expr : unnest_expr->expressions()) { - if (expr->optional_alias() != nullptr) { - return MakeSqlErrorAt(expr->optional_alias()) - << "Argument alias in UNNEST in FROM clause is not implemented"; + << "Alias outside UNNEST is not allowed when the argument inside " + "the parenthesis has alias"; } } return absl::OkStatus(); } -absl::Status Resolver::ValidateArrayZipMode( - const ASTNamedArgument* array_zip_mode) const { +absl::StatusOr> +Resolver::ResolveArrayZipMode(const ASTUnnestExpression* unnest, + ExprResolutionInfo* info) { + const EnumType* array_zip_mode_type = types::ArrayZipModeEnumType(); + const ASTNamedArgument* array_zip_mode = unnest->array_zip_mode(); + std::unique_ptr resolved_mode; if (array_zip_mode == nullptr) { - // Ok to not specify an array zip mode. - return absl::OkStatus(); + // Ok to not specify an array zip mode. Supply a default 'PAD' mode if it's + // multiway UNNEST syntax. + if (unnest->expressions().size() > 1) { + // We should have already validated in `ValidateUnnestAliases` that when + // unnest contains more than 1 argument, the language feature is on. 
+ ZETASQL_RET_CHECK( + language().LanguageFeatureEnabled(FEATURE_V_1_4_MULTIWAY_UNNEST)); + return MakeResolvedLiteralWithoutLocation( + Value::Enum(array_zip_mode_type, functions::ArrayZipEnums::PAD)); + } + return resolved_mode; } - if (IsNamedLambda(array_zip_mode)) { - return MakeSqlErrorAt(array_zip_mode) << "Array zip mode cannot be lambda"; + if (!language().LanguageFeatureEnabled(FEATURE_V_1_4_MULTIWAY_UNNEST)) { + return MakeSqlErrorAt(array_zip_mode) << "Argument `mode` is not supported"; } + static constexpr absl::string_view kArrayZipModeArgName = "mode"; if (!zetasql_base::CaseEqual(array_zip_mode->name()->GetAsStringView(), kArrayZipModeArgName)) { return MakeSqlErrorAt(array_zip_mode) << "Unsupported named argument `" - << array_zip_mode->name()->GetAsStringView() << "` in UNNEST"; + << array_zip_mode->name()->GetAsStringView() + << "` in UNNEST; use `mode` instead"; + } + + if (unnest->expressions().size() == 1) { + return MakeSqlErrorAt(array_zip_mode) + << "Argument `mode` is not allowed when UNNEST only has one array " + "argument"; + } + if (IsNamedLambda(array_zip_mode)) { + return MakeSqlErrorAt(array_zip_mode) << "Argument `mode` cannot be lambda"; + } + + ZETASQL_RETURN_IF_ERROR(ResolveExpr(array_zip_mode->expr(), info, &resolved_mode)); + + // See if we need to coerce the type. 
+ if (resolved_mode->type()->Equals(array_zip_mode_type)) { + return resolved_mode; + } + auto make_error_msg = [&](absl::string_view target_type_name, + absl::string_view actual_type_name) { + return absl::Substitute( + "Named argument `mode` used in UNNEST should have type $0, but got " + "type $1", + target_type_name, actual_type_name); + }; + AnnotatedType annotated_target_type = {array_zip_mode_type, + /*annotation_map=*/nullptr}; + ZETASQL_RETURN_IF_ERROR(CoerceExprToType(array_zip_mode->expr(), + annotated_target_type, kImplicitCoercion, + make_error_msg, &resolved_mode)) + .With(LocationOverride(array_zip_mode->expr())); + return resolved_mode; +} + +// Returns a more precise parsed location for UNNEST argument alias when the +// node is a path expression. +static const ASTNode* GetInferredAliasLocation( + const ASTExpressionWithOptAlias* argument) { + if (!argument->expression()->Is()) { + return argument; + } + return argument->expression()->GetAsOrDie()->last_name(); +} + +// The alias could be returned empty. It will be allocated post-traversal of +// the resolved expr. 
+Resolver::UnnestArrayColumnAlias Resolver::GetArrayElementColumnAlias( + const ASTExpressionWithOptAlias* argument) { + if (argument->optional_alias() != nullptr) { + return {/*alias=*/argument->optional_alias()->GetAsIdString(), + /*alias_location=*/argument->optional_alias()}; + } + return {/*alias=*/GetAliasForExpression(argument->expression()), + /*alias_location=*/GetInferredAliasLocation(argument)}; +} + +absl::Status Resolver::ResolveArrayArgumentForExplicitUnnest( + const ASTExpressionWithOptAlias* argument, + UnnestArrayColumnAlias& arg_alias, ExprResolutionInfo* info, + std::vector<UnnestArrayColumnAlias>& output_alias_list, + ResolvedColumnList& output_column_list, + std::shared_ptr<NameList>& output_name_list, + std::vector<std::unique_ptr<const ResolvedExpr>>& resolved_array_expr_list, + std::vector<ResolvedColumn>& resolved_element_column_list) { + std::unique_ptr<const ResolvedExpr> resolved_value_expr; + const absl::Status resolve_expr_status = + ResolveExpr(argument->expression(), info, &resolved_value_expr); + + // If resolving the expression failed, and it looked like a valid table + // name, then give a more helpful error message. 
+ // TODO: b/315169608 - Find a better way to detect the desired error + if (resolve_expr_status.code() == absl::StatusCode::kInvalidArgument && + absl::StartsWith(resolve_expr_status.message(), "Unrecognized name: ") && + argument->expression()->node_kind() == AST_PATH_EXPRESSION) { + const ASTPathExpression* path_expr = + argument->expression()->GetAsOrDie(); + const Table* table = nullptr; + int num_names_consumed = 0; + const absl::Status find_status = catalog_->FindTableWithPathPrefix( + path_expr->ToIdentifierVector(), analyzer_options_.find_options(), + &num_names_consumed, &table); + + if (find_status.ok()) { + if (table != nullptr && num_names_consumed < path_expr->num_names()) { + return MakeSqlErrorAt(path_expr) + << "UNNEST cannot be applied on path expression with " + "non-correlated table name prefix " + << table->FullName(); + } + return MakeSqlErrorAt(path_expr) + << "UNNEST cannot be applied on a table: " + << path_expr->ToIdentifierPathString(); + } + if (find_status.code() != absl::StatusCode::kNotFound) { + ZETASQL_RETURN_IF_ERROR(find_status); + } + } + ZETASQL_RETURN_IF_ERROR(resolve_expr_status); // Return original error. + ZETASQL_RET_CHECK(resolved_value_expr != nullptr); + const Type* value_type = resolved_value_expr->type(); + ZETASQL_RET_CHECK(value_type != nullptr); + if (!value_type->IsArray()) { + return MakeSqlErrorAt(argument->expression()) + << "Values referenced in UNNEST must be arrays. " + << "UNNEST contains expression of type " + << value_type->ShortTypeName(product_mode()); } - return MakeSqlErrorAt(array_zip_mode) - << "The named argument `" << kArrayZipModeArgName - << "` used in UNNEST is not implemented"; + + // Compute alias if it's not provided in the user query or not inferrable from + // the original path expression. 
+ if (arg_alias.alias.empty()) { + arg_alias.alias = AllocateUnnestName(); + } + + const AnnotationMap* element_annotation = nullptr; + if (resolved_value_expr->type_annotation_map() != nullptr) { + element_annotation = + resolved_value_expr->type_annotation_map()->AsArrayMap()->element(); + } + const ResolvedColumn array_element_column( + AllocateColumnId(), /*table_name=*/kArrayId, /*name=*/arg_alias.alias, + AnnotatedType(value_type->AsArray()->element_type(), element_annotation)); + output_alias_list.emplace_back(arg_alias); + output_column_list.emplace_back(array_element_column); + resolved_array_expr_list.emplace_back(std::move(resolved_value_expr)); + resolved_element_column_list.emplace_back(array_element_column); + + absl::Status name_list_update_status = output_name_list->AddValueTableColumn( + arg_alias.alias, array_element_column, arg_alias.alias_location); + if (name_list_update_status.code() == absl::StatusCode::kInvalidArgument && + absl::StartsWith(name_list_update_status.message(), "Duplicate alias")) { + return MakeSqlErrorAt(arg_alias.alias_location) + << "Duplicate value table name `" << arg_alias.alias + << "` found in UNNEST is not allowed"; + } + return name_list_update_status; +} + +absl::Status Resolver::ResolveUnnest( + const ASTTablePathExpression* table_ref, ExprResolutionInfo* info, + std::vector& output_alias_list, + ResolvedColumnList& output_column_list, + std::shared_ptr& output_name_list, + std::vector>& resolved_array_expr_list, + std::vector& resolved_element_column_list) { + const ASTUnnestExpression* unnest = table_ref->unnest_expr(); + ZETASQL_RET_CHECK(unnest != nullptr); + + if (unnest->expressions().size() > 1) { + // Mark multiway UNNEST rewriter. 
+ analyzer_output_properties_.MarkRelevant(REWRITE_MULTIWAY_UNNEST); + for (const ASTExpressionWithOptAlias* argument : unnest->expressions()) { + UnnestArrayColumnAlias arg_alias = GetArrayElementColumnAlias(argument); + ZETASQL_RETURN_IF_ERROR(ResolveArrayArgumentForExplicitUnnest( + argument, arg_alias, info, output_alias_list, output_column_list, + output_name_list, resolved_array_expr_list, + resolved_element_column_list)); + } + return absl::OkStatus(); + } + + ZETASQL_RET_CHECK(unnest->expressions().size() == 1); + UnnestArrayColumnAlias arg_alias; + // For the singleton UNNEST case, we respect alias outside the explicit UNNEST + // parenthesis for backward compatibility. + if (table_ref->alias() != nullptr) { + arg_alias.alias = table_ref->alias()->GetAsIdString(); + arg_alias.alias_location = table_ref->alias(); + } else { + // Point alias location to UNNEST by default for backward compatibility. + arg_alias.alias_location = table_ref; + if (language().LanguageFeatureEnabled( + FEATURE_V_1_4_SINGLETON_UNNEST_INFERS_ALIAS)) { + arg_alias.alias = + GetAliasForExpression(unnest->expressions(0)->expression()); + arg_alias.alias_location = + GetInferredAliasLocation(unnest->expressions(0)); + } + if (language().LanguageFeatureEnabled(FEATURE_V_1_4_MULTIWAY_UNNEST) && + unnest->expressions(0)->optional_alias() != nullptr) { + arg_alias.alias = + unnest->expressions(0)->optional_alias()->GetAsIdString(); + arg_alias.alias_location = unnest->expressions(0)->optional_alias(); + } + } + + return ResolveArrayArgumentForExplicitUnnest( + unnest->expressions(0), arg_alias, info, output_alias_list, + output_column_list, output_name_list, resolved_array_expr_list, + resolved_element_column_list); } absl::Status Resolver::ResolveArrayScan( @@ -10031,6 +10517,34 @@ absl::Status Resolver::ResolveArrayScan( // These variables get set in either branch below. 
std::unique_ptr<const ResolvedExpr> resolved_value_expr; const Type* value_type = nullptr; + // The `mode` argument is only set and used in the explicit UNNEST syntax. + std::unique_ptr<const ResolvedExpr> resolved_zip_mode; + ResolvedColumnList output_column_list; + if (*resolved_input_scan != nullptr && include_lhs_name_list) { + output_column_list = (*resolved_input_scan)->column_list(); + } + + // Build a name list for correlated names. + // TODO: b/315045184 - clean up shared_ptr usage. + std::shared_ptr<NameList> name_list_lhs(new NameList); + if (name_list_input != nullptr) { + ZETASQL_RETURN_IF_ERROR(name_list_lhs->MergeFrom(*name_list_input, table_ref)); + } + + // Array aliases are always treated as explicit range variables, + // even if computed. + // This allows + // SELECT t.key, array1, array2 + // FROM Table t, t.array1, array1.array2; + // `array1` and `array2` are also available implicitly as columns on the + // preceding scan, but the array scan makes them implicit. + std::shared_ptr<NameList> name_list(new NameList); + // `name_list_rhs` is only used if USING clause is present. + std::shared_ptr<NameList> name_list_rhs(new NameList); + + std::vector<std::unique_ptr<const ResolvedExpr>> resolved_array_expr_list; + std::vector<ResolvedColumn> resolved_element_column_list; + std::vector<UnnestArrayColumnAlias> output_alias_list; if (table_ref->path_expr() != nullptr) { ZETASQL_RET_CHECK(path_expr.has_value()); // and shouldn't have made it into ResolveArrayScan. ZETASQL_RET_CHECK_GE(path_expr->num_names(), 2); + // Path expression in the FROM clause only resolves against non-aggregate + // and non-analytic scope, so no aggregate function or window function is + // allowed here. 
+ ExprResolutionInfo no_aggregation(scope, "FROM clause"); + FlattenState::Restorer restorer; + if (language().LanguageFeatureEnabled( + FEATURE_V_1_3_UNNEST_AND_FLATTEN_ARRAYS)) { + no_aggregation.flatten_state.set_can_flatten(true, &restorer); + } + NameTarget first_target; ZETASQL_RET_CHECK(scope->LookupName(path_expr->GetFirstIdString(), &first_target)); switch (first_target.kind()) { case NameTarget::EXPLICIT_COLUMN: - case NameTarget::RANGE_VARIABLE: + case NameTarget::RANGE_VARIABLE: { // These are the allowed cases. break; - - case NameTarget::IMPLICIT_COLUMN: + } + case NameTarget::IMPLICIT_COLUMN: { // We disallowed this because the results were very confusing. // FROM TableName, ColumnName.array_value // is not allowed. @@ -10065,57 +10589,46 @@ absl::Status Resolver::ResolveArrayScan( << path_expr->GetFirstIdString() << " refers to a column and must be qualified with a table " "name."; - - case NameTarget::FIELD_OF: + } + case NameTarget::FIELD_OF: { return MakeSqlErrorAtPoint(path_expr->GetParseLocationRange().start()) << "Aliases referenced in the from clause must refer to " "preceding scans, and cannot refer to columns or fields on " "those scans. " << path_expr->GetFirstIdString() << " refers to a field and must be qualified with a table name."; - - case NameTarget::ACCESS_ERROR: - // This error message is very specific, for the only known case where - // this error occurs (a correlated array scan that is not - // visible or valid to access in the outer query). For example: + } + case NameTarget::ACCESS_ERROR: { + PathExpressionSpan path_expr_span(path_expr.value()); + ZETASQL_RETURN_IF_ERROR(ResolvePathExpressionAsExpression( + path_expr_span, &no_aggregation, ResolvedStatement::READ, + &resolved_value_expr)); + ZETASQL_RET_CHECK(path_expr_span.num_names() > 1); + // This is the allowed case when the whole path can be resolved and + // matched to a post-group by column, despite that the first name is + // ACCESS_ERROR. 
For example: // - // select tt.key as key, - // IF(EXISTS(select * - // from tt.KitchenSink.repeated_int32_val), - // count(distinct(tt.key)), - // 0) + // select tt.KitchenSink.repeated_int32_val, + // (select sum(e) + // from tt.KitchenSink.repeated_int32_val), -- <- this ref // from TestTable tt - // group by tt.key + // group by 1 // - // In this query, the reference to tt.KitchenSink.repeated_int32_val - // in the EXISTS subquery is invalid because the outer query contains - // GROUP BY and the array is not valid to access post-GROUP BY. - // TODO: It would be nice to say either 'GROUP BY' or - // 'DISTINCT' in this message, not both. But we currently do not - // have context from the outer query to know which one is correct, - // so for now we say 'GROUP BY or DISTINCT'. Fix this. - return MakeSqlErrorAtPoint(path_expr->GetParseLocationRange().start()) - << "Correlated aliases referenced in the from clause must refer " - "to arrays that are valid to access from the outer query, " - "but " - << path_expr->GetFirstIdString() - << " refers to an array that is not valid to access after GROUP" - << " BY or DISTINCT in the outer query"; - - case NameTarget::AMBIGUOUS: + // In this query, the reference to tt.KitchenSink.Repeated_int32_val + // in the subquery is okay because tt.KitchenSink.Repeated_int32_val + // is a group by key and thus a post-group by column. Any other + // repeated field in KitchenSink would trigger the error above + // because KitchenSink is not generally visible post GROUP BY. + break; + } + case NameTarget::AMBIGUOUS: { // This can happen if the array name is ambiguous (resolves to a name // in more than one table previously in the FROM clause). 
return MakeSqlErrorAtPoint(path_expr->GetParseLocationRange().start()) << path_expr->GetFirstIdString() << " ambiguously references multiple columns in previous FROM" << " clause tables"; - } - - ExprResolutionInfo no_aggregation(scope, "FROM clause"); - FlattenState::Restorer restorer; - if (language().LanguageFeatureEnabled( - FEATURE_V_1_3_UNNEST_AND_FLATTEN_ARRAYS)) { - no_aggregation.flatten_state.set_can_flatten(true, &restorer); + } } // Now we know we have an identifier path starting with a scan. @@ -10125,19 +10638,47 @@ absl::Status Resolver::ResolveArrayScan( path_expr_span, &no_aggregation, ResolvedStatement::READ, &resolved_value_expr)); + ZETASQL_RET_CHECK(resolved_value_expr != nullptr); value_type = resolved_value_expr->type(); + ZETASQL_RET_CHECK(value_type != nullptr); if (!value_type->IsArray()) { return MakeSqlErrorAtPoint(path_expr->GetParseLocationRange().start()) << "Values referenced in FROM clause must be arrays. " << path_expr->ToIdentifierPathString() << " has type " << value_type->ShortTypeName(product_mode()); } + + IdString alias; + const ASTNode* alias_location; + if (table_ref->alias() != nullptr) { + alias = table_ref->alias()->GetAsIdString(); + alias_location = table_ref->alias(); + } else { + alias = GetAliasForExpression(table_ref->path_expr()); + alias_location = table_ref; + } + ZETASQL_RET_CHECK(!alias.empty()); + output_alias_list.emplace_back( + UnnestArrayColumnAlias{alias, alias_location}); + + const AnnotationMap* element_annotation = nullptr; + if (resolved_value_expr->type_annotation_map() != nullptr) { + element_annotation = + resolved_value_expr->type_annotation_map()->AsArrayMap()->element(); + } + const ResolvedColumn array_element_column( + AllocateColumnId(), /*table_name=*/kArrayId, /*name=*/alias, + AnnotatedType(value_type->AsArray()->element_type(), + element_annotation)); + + ZETASQL_RETURN_IF_ERROR(name_list_rhs->AddValueTableColumn( + alias, array_element_column, alias_location)); + 
output_column_list.emplace_back(array_element_column); + resolved_array_expr_list.push_back(std::move(resolved_value_expr)); + resolved_element_column_list.push_back(array_element_column); } else { ZETASQL_RET_CHECK(table_ref->unnest_expr() != nullptr); - const ASTUnnestExpression* unnest = table_ref->unnest_expr(); - - ZETASQL_RETURN_IF_ERROR(ValidateUnnestAliases(unnest, table_ref)); - ZETASQL_RETURN_IF_ERROR(ValidateArrayZipMode(unnest->array_zip_mode())); + ZETASQL_RETURN_IF_ERROR(ValidateUnnestAliases(table_ref)); ExprResolutionInfo info(scope, "UNNEST"); FlattenState::Restorer restorer; @@ -10145,107 +10686,16 @@ absl::Status Resolver::ResolveArrayScan( FEATURE_V_1_3_UNNEST_AND_FLATTEN_ARRAYS)) { info.flatten_state.set_can_flatten(true, &restorer); } - const absl::Status resolve_expr_status = ResolveExpr( - unnest->expressions()[0]->expression(), &info, &resolved_value_expr); - - // If resolving the expression failed, and it looked like a valid table - // name, then give a more helpful error message. 
- if (!resolve_expr_status.ok() && - absl::StartsWith(resolve_expr_status.message(), - "Unrecognized name: ") && - unnest->expressions()[0]->expression()->node_kind() == - AST_PATH_EXPRESSION) { - const ASTPathExpression* path_expr = - unnest->expressions()[0] - ->expression() - ->GetAsOrDie(); - const Table* table = nullptr; - int num_names_consumed = 0; - const absl::Status find_status = catalog_->FindTableWithPathPrefix( - path_expr->ToIdentifierVector(), analyzer_options_.find_options(), - &num_names_consumed, &table); - - if (find_status.ok()) { - if (table != nullptr && num_names_consumed < path_expr->num_names()) { - return MakeSqlErrorAt(path_expr) - << "UNNEST cannot be applied on path expression with " - "non-correlated table name prefix " - << table->FullName(); - } - return MakeSqlErrorAt(path_expr) - << "UNNEST cannot be applied on a table: " - << path_expr->ToIdentifierPathString(); - } - if (find_status.code() != absl::StatusCode::kNotFound) { - ZETASQL_RETURN_IF_ERROR(find_status); - } - } - ZETASQL_RETURN_IF_ERROR(resolve_expr_status); // Return original error. - value_type = resolved_value_expr->type(); - if (!value_type->IsArray()) { - return MakeSqlErrorAt(unnest->expressions()[0]->expression()) - << "Values referenced in UNNEST must be arrays. 
" - << "UNNEST contains expression of type " - << value_type->ShortTypeName(product_mode()); - } - } - ZETASQL_RET_CHECK(resolved_value_expr != nullptr); - ZETASQL_RET_CHECK(value_type != nullptr); - ZETASQL_RET_CHECK(value_type->IsArray()); - - IdString alias; - const ASTNode* alias_location; - if (table_ref->alias() != nullptr) { - alias = table_ref->alias()->GetAsIdString(); - alias_location = table_ref->alias(); - } else { - if (table_ref->path_expr() != nullptr) { - alias = GetAliasForExpression(table_ref->path_expr()); - } else { - alias = IdString(); - if (language().LanguageFeatureEnabled( - FEATURE_V_1_4_SINGLETON_UNNEST_INFERS_ALIAS) && - table_ref->unnest_expr() != nullptr) { - alias = GetAliasForExpression( - table_ref->unnest_expr()->expressions()[0]->expression()); - } - if (alias.empty()) { - alias = AllocateUnnestName(); - } - } - alias_location = table_ref; - } - ZETASQL_RET_CHECK(!alias.empty()); - - const AnnotationMap* element_annotation = nullptr; - if (resolved_value_expr->type_annotation_map() != nullptr) { - element_annotation = - resolved_value_expr->type_annotation_map()->AsArrayMap()->element(); - } - const ResolvedColumn array_element_column( - AllocateColumnId(), /*table_name=*/kArrayId, /*name=*/alias, - AnnotatedType(value_type->AsArray()->element_type(), element_annotation)); - - ResolvedColumnList output_column_list; - if (*resolved_input_scan != nullptr && include_lhs_name_list) { - output_column_list = (*resolved_input_scan)->column_list(); - } - output_column_list.emplace_back(array_element_column); - std::shared_ptr name_list_lhs(new NameList); - if (name_list_input != nullptr) { - ZETASQL_RETURN_IF_ERROR(name_list_lhs->MergeFrom(*name_list_input, table_ref)); + const ASTUnnestExpression* unnest = table_ref->unnest_expr(); + ZETASQL_RETURN_IF_ERROR(ResolveUnnest( + table_ref, &info, output_alias_list, output_column_list, name_list_rhs, + resolved_array_expr_list, resolved_element_column_list)); + 
ZETASQL_RET_CHECK_EQ(output_alias_list.size(), resolved_element_column_list.size()); + ZETASQL_RET_CHECK_EQ(resolved_array_expr_list.size(), + resolved_element_column_list.size()); + ZETASQL_ASSIGN_OR_RETURN(resolved_zip_mode, ResolveArrayZipMode(unnest, &info)); } - // Array aliases are always treated as explicit range variables, - // even if computed. - // This allows - // SELECT t.key, array1, array2 - // FROM Table t, t.Column.array1, array1.array2; - // `array1` and `array2` are also available implicitly as columns on - // the preceding scan, but the array scan makes them implicit. - std::shared_ptr name_list_rhs(new NameList); - ZETASQL_RETURN_IF_ERROR(name_list_rhs->AddValueTableColumn( - alias, array_element_column, alias_location)); // Resolve WITH OFFSET if present. std::unique_ptr array_position_column; @@ -10274,7 +10724,6 @@ absl::Status Resolver::ResolveArrayScan( : table_ref->with_offset())); } - std::shared_ptr name_list(new NameList); std::unique_ptr resolved_condition; std::vector> computed_columns; @@ -10311,8 +10760,12 @@ absl::Status Resolver::ResolveArrayScan( ZETASQL_RETURN_IF_ERROR(name_list->MergeFrom(*name_list_lhs, table_ref)); // We explicitly add the array element and offset columns to the name_list // instead of merging name_list_rhs to get the exact error location. 
- ZETASQL_RETURN_IF_ERROR(name_list->AddValueTableColumn(alias, array_element_column, - alias_location)); + for (int i = 0; i < resolved_element_column_list.size(); ++i) { + ZETASQL_RETURN_IF_ERROR(name_list->AddValueTableColumn( + output_alias_list[i].alias, resolved_element_column_list[i], + output_alias_list[i].alias_location)); + } + if (array_position_column != nullptr) { const ASTAlias* with_offset_alias = table_ref->with_offset()->alias(); ZETASQL_RETURN_IF_ERROR(name_list->AddValueTableColumn( @@ -10335,12 +10788,6 @@ absl::Status Resolver::ResolveArrayScan( } } - std::vector> resolved_array_expr_list = - MakeNodeVector(std::move(resolved_value_expr)); - std::vector resolved_element_column_list = { - array_element_column}; - // `mode` argument will only be set and used in explicit UNNEST syntax. - std::unique_ptr resolved_zip_mode; std::unique_ptr resolved_array_scan = MakeResolvedArrayScan(output_column_list, std::move(*resolved_input_scan), std::move(resolved_array_expr_list), @@ -10858,8 +11305,8 @@ absl::Status Resolver::CoerceQueryStatementResultToTypes( const std::vector& column_list = (*output_name_list)->columns(); if (types.size() != column_list.size()) { return MakeSqlErrorAt(ast_node) - << "Query has unexpected number of output columns, " - << "expected " << types.size() << ", but had " << column_list.size(); + << "Query has unexpected number of output columns, " << "expected " + << types.size() << ", but had " << column_list.size(); } ZETASQL_RET_CHECK((*scan)->node_kind() == RESOLVED_PROJECT_SCAN); ResolvedColumnList casted_column_list; @@ -10910,9 +11357,12 @@ absl::Status Resolver::CoerceQueryStatementResultToTypes( << "type for a query"; } SignatureMatchResult unused; - if (!coercer_.AssignableTo(GetInputArgumentTypeForExpr(column_expr), - target_type, - /* is_explicit = */ false, &unused)) { + if (!coercer_.AssignableTo( + GetInputArgumentTypeForExpr( + column_expr, + /*pick_default_type_for_untyped_expr=*/false), + target_type, + /* 
is_explicit = */ false, &unused)) { return MakeSqlErrorAt(ast_node) << "Query column " << (i + 1) << " has type " << result_type->ShortTypeName(product_mode()) diff --git a/zetasql/analyzer/resolver_stmt.cc b/zetasql/analyzer/resolver_stmt.cc index 36a613607..b5e15b445 100644 --- a/zetasql/analyzer/resolver_stmt.cc +++ b/zetasql/analyzer/resolver_stmt.cc @@ -141,6 +141,188 @@ STATIC_IDSTRING(kQueryId, "$query"); STATIC_IDSTRING(kViewId, "$view"); STATIC_IDSTRING(kCreateAsCastId, "$create_as_cast"); +// Generates an error status if 'type' is or contains in its nesting structure +// a Type for which SupportsReturning is false. +static absl::Status ValidateTypeIsReturnable( + const Type* type, const LanguageOptions& language_options, + const ASTNode* error_node) { + std::string type_description; + if (type->SupportsReturning(language_options, + /*type_description=*/&type_description)) { + return absl::OkStatus(); + } + return MakeSqlErrorAt(error_node) << "Returning expressions of type " + << type_description << " is not allowed"; +} + +template +static absl::Status ValidateColumnListIsReturnable( + const std::vector& output_list, + const LanguageOptions& language_options, const ASTNode* error_node) { + for (const auto& element : output_list) { + ZETASQL_RETURN_IF_ERROR(ValidateTypeIsReturnable(element->column().type(), + language_options, error_node)); + } + return absl::OkStatus(); +} + +// Generates an error status if the statement represents a return boundary +// and has output columns that contain a Type for which SupportsReturning +// is false. +// +// Any new ResolvedStatement must be added here and should invoke +// `ValidateTypeIsReturnable` if it represents a return boundary with output +// columns. 
+static absl::Status ValidateStatementIsReturnable( + const ResolvedStatement* statement, const LanguageOptions& language_options, + const ASTNode* error_node) { + auto CheckOutputColumns = [&](T& output_list) -> absl::Status { + ZETASQL_RETURN_IF_ERROR(ValidateColumnListIsReturnable( + output_list, language_options, error_node)); + return absl::OkStatus(); + }; + switch (statement->node_kind()) { + case RESOLVED_QUERY_STMT: + ZETASQL_RETURN_IF_ERROR(CheckOutputColumns( + statement->GetAs()->output_column_list())); + break; + case RESOLVED_CREATE_VIEW_STMT: + ZETASQL_RETURN_IF_ERROR(CheckOutputColumns( + statement->GetAs()->output_column_list())); + break; + case RESOLVED_CREATE_MATERIALIZED_VIEW_STMT: + ZETASQL_RETURN_IF_ERROR(CheckOutputColumns( + statement->GetAs() + ->output_column_list())); + break; + case RESOLVED_CREATE_APPROX_VIEW_STMT: + ZETASQL_RETURN_IF_ERROR( + CheckOutputColumns(statement->GetAs() + ->output_column_list())); + break; + case RESOLVED_CREATE_MODEL_STMT: + ZETASQL_RETURN_IF_ERROR(CheckOutputColumns( + statement->GetAs()->output_column_list())); + break; + case RESOLVED_CREATE_TABLE_FUNCTION_STMT: + ZETASQL_RETURN_IF_ERROR( + CheckOutputColumns(statement->GetAs() + ->output_column_list())); + break; + case RESOLVED_CREATE_TABLE_STMT: + ZETASQL_RETURN_IF_ERROR( + CheckOutputColumns(statement->GetAs() + ->column_definition_list())); + break; + case RESOLVED_CREATE_TABLE_AS_SELECT_STMT: + ZETASQL_RETURN_IF_ERROR( + CheckOutputColumns(statement->GetAs() + ->column_definition_list())); + break; + case RESOLVED_INSERT_STMT: + if (statement->GetAs()->returning() != nullptr) { + ZETASQL_RETURN_IF_ERROR( + CheckOutputColumns(statement->GetAs() + ->returning() + ->output_column_list())); + } + break; + case RESOLVED_DELETE_STMT: + if (statement->GetAs()->returning() != nullptr) { + ZETASQL_RETURN_IF_ERROR( + CheckOutputColumns(statement->GetAs() + ->returning() + ->output_column_list())); + } + break; + case RESOLVED_UPDATE_STMT: + if 
(statement->GetAs()->returning() != nullptr) { + ZETASQL_RETURN_IF_ERROR( + CheckOutputColumns(statement->GetAs() + ->returning() + ->output_column_list())); + } + break; + case RESOLVED_CREATE_FUNCTION_STMT: + if (statement->GetAs()->return_type() != + nullptr) { + ZETASQL_RETURN_IF_ERROR(ValidateTypeIsReturnable( + statement->GetAs()->return_type(), + language_options, error_node)); + } + break; + case RESOLVED_EXPLAIN_STMT: + case RESOLVED_CREATE_DATABASE_STMT: + case RESOLVED_CREATE_INDEX_STMT: + case RESOLVED_CREATE_SCHEMA_STMT: + case RESOLVED_CREATE_EXTERNAL_SCHEMA_STMT: + case RESOLVED_CREATE_SNAPSHOT_TABLE_STMT: + case RESOLVED_CREATE_EXTERNAL_TABLE_STMT: + case RESOLVED_CREATE_PRIVILEGE_RESTRICTION_STMT: + case RESOLVED_ALTER_PRIVILEGE_RESTRICTION_STMT: + case RESOLVED_CREATE_ROW_ACCESS_POLICY_STMT: + case RESOLVED_CREATE_CONSTANT_STMT: + case RESOLVED_CREATE_PROCEDURE_STMT: + case RESOLVED_CLONE_DATA_STMT: + case RESOLVED_EXPORT_DATA_STMT: + case RESOLVED_EXPORT_MODEL_STMT: + case RESOLVED_EXPORT_METADATA_STMT: + case RESOLVED_CALL_STMT: + case RESOLVED_DEFINE_TABLE_STMT: + case RESOLVED_DESCRIBE_STMT: + case RESOLVED_SHOW_STMT: + case RESOLVED_BEGIN_STMT: + case RESOLVED_SET_TRANSACTION_STMT: + case RESOLVED_COMMIT_STMT: + case RESOLVED_ROLLBACK_STMT: + case RESOLVED_START_BATCH_STMT: + case RESOLVED_RUN_BATCH_STMT: + case RESOLVED_ABORT_BATCH_STMT: + case RESOLVED_UNDROP_STMT: + case RESOLVED_DROP_STMT: + case RESOLVED_DROP_MATERIALIZED_VIEW_STMT: + case RESOLVED_DROP_FUNCTION_STMT: + case RESOLVED_DROP_SNAPSHOT_TABLE_STMT: + case RESOLVED_DROP_TABLE_FUNCTION_STMT: + case RESOLVED_DROP_PRIVILEGE_RESTRICTION_STMT: + case RESOLVED_DROP_ROW_ACCESS_POLICY_STMT: + case RESOLVED_DROP_INDEX_STMT: + case RESOLVED_GRANT_STMT: + case RESOLVED_REVOKE_STMT: + case RESOLVED_MERGE_STMT: + case RESOLVED_TRUNCATE_STMT: + case RESOLVED_ALTER_ROW_ACCESS_POLICY_STMT: + case RESOLVED_ALTER_ALL_ROW_ACCESS_POLICIES_STMT: + case RESOLVED_ALTER_MATERIALIZED_VIEW_STMT: + case 
RESOLVED_ALTER_APPROX_VIEW_STMT: + case RESOLVED_ALTER_MODEL_STMT: + case RESOLVED_ALTER_TABLE_SET_OPTIONS_STMT: + case RESOLVED_ALTER_DATABASE_STMT: + case RESOLVED_ALTER_SCHEMA_STMT: + case RESOLVED_ALTER_EXTERNAL_SCHEMA_STMT: + case RESOLVED_ALTER_TABLE_STMT: + case RESOLVED_ALTER_VIEW_STMT: + case RESOLVED_RENAME_STMT: + case RESOLVED_IMPORT_STMT: + case RESOLVED_MODULE_STMT: + case RESOLVED_ANALYZE_STMT: + case RESOLVED_ASSERT_STMT: + case RESOLVED_ASSIGNMENT_STMT: + case RESOLVED_EXECUTE_IMMEDIATE_STMT: + case RESOLVED_CREATE_ENTITY_STMT: + case RESOLVED_ALTER_ENTITY_STMT: + case RESOLVED_AUX_LOAD_DATA_STMT: + break; + default: + ZETASQL_RET_CHECK_FAIL() << "Unhandled statement type in " + "ValidateStatementIsReturnable: " + << statement->node_kind_string() + << ". Did you add a new ResolvedStatement and forget to " + "handle its output validation?"; + } + return absl::OkStatus(); +} + // NOLINTBEGIN(readability/fn_size) absl::Status Resolver::ResolveStatement( absl::string_view sql, const ASTStatement* statement, @@ -635,6 +817,17 @@ absl::Status Resolver::ResolveStatement( statement->GetAsOrDie(), &stmt)); } break; + case AST_ALTER_EXTERNAL_SCHEMA_STATEMENT: + if (language().SupportsStatementKind( + RESOLVED_ALTER_EXTERNAL_SCHEMA_STMT)) { + if (!language().LanguageFeatureEnabled(FEATURE_EXTERNAL_SCHEMA_DDL)) { + return MakeSqlErrorAt(statement) + << "ALTER EXTERNAL SCHEMA is not supported"; + } + ZETASQL_RETURN_IF_ERROR(ResolveAlterExternalSchemaStatement( + statement->GetAsOrDie(), &stmt)); + } + break; case AST_ALTER_TABLE_STATEMENT: if (language().SupportsStatementKind( RESOLVED_ALTER_TABLE_SET_OPTIONS_STMT) || @@ -708,6 +901,17 @@ absl::Status Resolver::ResolveStatement( statement->GetAsOrDie(), &stmt)); } break; + case AST_CREATE_EXTERNAL_SCHEMA_STATEMENT: + if (language().SupportsStatementKind( + RESOLVED_CREATE_EXTERNAL_SCHEMA_STMT)) { + if (!language().LanguageFeatureEnabled(FEATURE_EXTERNAL_SCHEMA_DDL)) { + return MakeSqlErrorAt(statement) + << 
"CREATE EXTERNAL SCHEMA is not supported"; + } + ZETASQL_RETURN_IF_ERROR(ResolveCreateExternalSchemaStatement( + statement->GetAsOrDie(), &stmt)); + } + break; case AST_ANALYZE_STATEMENT: if (language().SupportsStatementKind(RESOLVED_ANALYZE_STMT)) { ZETASQL_RETURN_IF_ERROR(ResolveAnalyzeStatement( @@ -763,6 +967,8 @@ absl::Status Resolver::ResolveStatement( ZETASQL_RETURN_IF_ERROR(PruneColumnLists(stmt.get())); ZETASQL_RETURN_IF_ERROR(SetColumnAccessList(stmt.get())); + ZETASQL_RETURN_IF_ERROR( + ValidateStatementIsReturnable(stmt.get(), language(), statement)); *output = std::move(stmt); return absl::OkStatus(); } @@ -1243,8 +1449,8 @@ absl::Status Resolver::ResolveColumnDefaultExpression( ast_default_expression_range.end().GetByteOffset() - ast_default_expression_range.start().GetByteOffset()); - *default_value = MakeResolvedColumnDefaultValue( - std::move(resolved_expression), std::string(sql)); + *default_value = + MakeResolvedColumnDefaultValue(std::move(resolved_expression), sql); return absl::OkStatus(); } @@ -1510,8 +1716,9 @@ absl::Status Resolver::ResolveColumnSchema( } ZETASQL_RETURN_IF_ERROR(ValidateColumnAttributeList(attributes)); std::vector> resolved_column_options; - ZETASQL_RETURN_IF_ERROR( - ResolveOptionsList(schema->options_list(), &resolved_column_options)); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(schema->options_list(), + /*allow_alter_array_operators=*/false, + &resolved_column_options)); std::vector> child_annotation_list; const bool enable_nested_annotations = @@ -1743,6 +1950,7 @@ absl::Status Resolver::ResolvePrimaryKey( } std::vector> options; ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_primary_key->options_list(), + /*allow_alter_array_operators=*/false, &options)); std::string constraint_name; @@ -1808,7 +2016,7 @@ absl::Status Resolver::ResolvePrimaryKey( absl::Status Resolver::ResolveForeignKeys( const absl::Span ast_table_elements, const ColumnIndexMap& column_indexes, - const std::vector>& + absl::Span> 
column_definitions, std::set* constraint_names, std::vector>* foreign_key_list) { @@ -1910,6 +2118,7 @@ absl::Status Resolver::ResolveForeignKeyTableConstraint( // OPTIONS options. std::vector> options; ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_foreign_key->options_list(), + /*allow_alter_array_operators=*/false, &options)); for (auto& option : options) { foreign_key->add_option_list(std::move(option)); @@ -2094,6 +2303,7 @@ absl::Status Resolver::ResolveCheckConstraints( } std::vector> resolved_options; ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_check_constraint->options_list(), + /*allow_alter_array_operators=*/false, &resolved_options)); auto resolved_check_constraint = MakeResolvedCheckConstraint( @@ -2167,8 +2377,9 @@ absl::Status Resolver::ResolveCreateDatabaseStatement( const ASTCreateDatabaseStatement* ast_statement, std::unique_ptr* output) { std::vector> resolved_options; - ZETASQL_RETURN_IF_ERROR( - ResolveOptionsList(ast_statement->options_list(), &resolved_options)); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, + &resolved_options)); const std::vector database_name = ast_statement->name()->ToIdentifierVector(); *output = MakeResolvedCreateDatabaseStmt(database_name, @@ -2194,11 +2405,46 @@ absl::Status Resolver::ResolveCreateSchemaStatement( } ZETASQL_RETURN_IF_ERROR(ResolveCreateStatementOptions(ast_statement, "CREATE SCHEMA", &create_scope, &create_mode)); - ZETASQL_RETURN_IF_ERROR( - ResolveOptionsList(ast_statement->options_list(), &resolved_options)); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, + &resolved_options)); *output = MakeResolvedCreateSchemaStmt( ast_statement->name()->ToIdentifierVector(), create_scope, create_mode, - std::move(resolved_collation), std::move(resolved_options)); + std::move(resolved_options), std::move(resolved_collation)); + return absl::OkStatus(); +} + +absl::Status 
Resolver::ResolveCreateExternalSchemaStatement( + const ASTCreateExternalSchemaStatement* ast_statement, + std::unique_ptr* output) { + ResolvedCreateStatement::CreateScope create_scope; + ResolvedCreateStatement::CreateMode create_mode; + std::vector> resolved_options; + + ZETASQL_RETURN_IF_ERROR(ResolveCreateStatementOptions( + ast_statement, "CREATE EXTERNAL SCHEMA", &create_scope, &create_mode)); + + // Resolve connection. + const ASTWithConnectionClause* with_connection_clause = + ast_statement->with_connection_clause(); + // External schema requires a connection to be set; this should be enforced by + // the parser. + ZETASQL_RET_CHECK(with_connection_clause != nullptr); + std::unique_ptr resolved_connection; + ZETASQL_RETURN_IF_ERROR(ResolveConnection(ast_statement->with_connection_clause() + ->connection_clause() + ->connection_path(), + &resolved_connection)); + + // Engine-specific options are required for external schema (as they are how + // the source of the external schema is provided) + ZETASQL_RET_CHECK(ast_statement->options_list() != nullptr); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, + &resolved_options)); + *output = MakeResolvedCreateExternalSchemaStmt( + ast_statement->name()->ToIdentifierVector(), create_scope, create_mode, + std::move(resolved_options), std::move(resolved_connection)); return absl::OkStatus(); } @@ -2377,8 +2623,9 @@ absl::Status Resolver::ResolveCreateIndexStatement( ZETASQL_RETURN_IF_ERROR(ResolveCreateStatementOptions( ast_statement, "CREATE INDEX", &create_scope, &create_mode)); std::vector> resolved_options; - ZETASQL_RETURN_IF_ERROR( - ResolveOptionsList(ast_statement->options_list(), &resolved_options)); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, + &resolved_options)); const std::vector index_name = ast_statement->name()->ToIdentifierVector(); @@ -2587,9 +2834,9 @@ 
absl::Status Resolver::ResolveCreateModelStatement( if (is_remote) { // Remote model. - if (query != nullptr) { + if (query != nullptr && input_output_clause != nullptr) { return MakeSqlErrorAt(query) - << "The AS SELECT clause cannot be used with REMOTE"; + << "The AS SELECT clause cannot be used with INPUT and OUTPUT"; } if (aliased_query_list != nullptr) { return MakeSqlErrorAt(aliased_query_list) @@ -2727,8 +2974,9 @@ absl::Status Resolver::ResolveCreateModelStatement( // Resolve options. std::vector> resolved_options; - ZETASQL_RETURN_IF_ERROR( - ResolveOptionsList(ast_statement->options_list(), &resolved_options)); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, + &resolved_options)); const std::vector model_name = ast_statement->name()->ToIdentifierVector(); @@ -2946,6 +3194,7 @@ absl::Status Resolver::ResolveCreateTableStmtBaseProperties( ZETASQL_RETURN_IF_ERROR( ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, &statement_base_properties->resolved_options)); statement_base_properties->is_value_table = false; @@ -3433,7 +3682,9 @@ Resolver::MakeResolvedColumnAnnotationsWithCollation( const ASTOptionsList* options_list) { std::unique_ptr column_annotations; std::vector> resolved_options_list; - ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(options_list, &resolved_options_list)); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(options_list, + /*allow_alter_array_operators=*/false, + &resolved_options_list)); if (language().LanguageFeatureEnabled(FEATURE_V_1_3_COLLATION_SUPPORT) && type_annotation_map != nullptr && type_annotation_map->Has()) { @@ -3604,6 +3855,7 @@ absl::Status Resolver::ResolveCreateViewStatementBaseProperties( *table_name = ast_statement->name()->ToIdentifierVector(); ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, resolved_options)); *is_value_table = false; @@ -3887,8 
+4139,9 @@ absl::Status Resolver::ResolveCreateSnapshotTableStatement( ast_statement->clone_data_source(), &clone_from)); ZETASQL_RET_CHECK(!clone_from->column_list().empty()); - ZETASQL_RETURN_IF_ERROR( - ResolveOptionsList(ast_statement->options_list(), &resolved_options)); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, + &resolved_options)); *output = MakeResolvedCreateSnapshotTableStmt( table_name, create_scope, create_mode, std::move(clone_from), @@ -4117,6 +4370,10 @@ absl::Status Resolver::ResolveCreateFunctionStatement( &resolved_expr, return_type)); ZETASQL_RET_CHECK(!expr_info.has_analytic); + if (expr_info.query_resolution_info->HasGroupingCall()) { + return MakeSqlErrorAt(ast_statement) + << "GROUPING function is not supported in SQL function body."; + } ZETASQL_RETURN_IF_ERROR( FunctionResolver::CheckCreateAggregateFunctionProperties( *resolved_expr, sql_function_body->expression(), &expr_info, @@ -4203,6 +4460,7 @@ absl::Status Resolver::ResolveCreateFunctionStatement( ZETASQL_RET_CHECK_EQ(function_argument_info_, nullptr); std::vector> resolved_options; ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, &resolved_options)); // If the function has a SQL function body, copy the body SQL to the code @@ -4527,8 +4785,9 @@ absl::Status Resolver::ResolveCreateTableFunctionStatement( // time, but not at function create time when options are evaluated. ZETASQL_RET_CHECK_EQ(function_argument_info_, nullptr); std::vector> resolved_options; - ZETASQL_RETURN_IF_ERROR( - ResolveOptionsList(ast_statement->options_list(), &resolved_options)); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, + &resolved_options)); // If the function has a SQL statement body, copy the body SQL to the code // field. 
@@ -4910,6 +5169,7 @@ absl::Status Resolver::ResolveCreateProcedureStatement( std::vector> resolved_options; ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, &resolved_options)); std::string procedure_body; @@ -5901,6 +6161,7 @@ absl::Status Resolver::ResolveExportDataStatement( std::vector> resolved_options; ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, &resolved_options)); *output = MakeResolvedExportDataStmt( @@ -5925,12 +6186,13 @@ absl::Status Resolver::ResolveExportMetadataStatement( } std::vector> resolved_options; - ZETASQL_RETURN_IF_ERROR( - ResolveOptionsList(ast_statement->options_list(), &resolved_options)); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, + &resolved_options)); *output = MakeResolvedExportMetadataStmt( - std::string(SchemaObjectKindToName(ast_statement->schema_object_kind())), - name_path, std::move(resolved_connection), std::move(resolved_options)); + SchemaObjectKindToName(ast_statement->schema_object_kind()), name_path, + std::move(resolved_connection), std::move(resolved_options)); return absl::OkStatus(); } @@ -5949,8 +6211,9 @@ absl::Status Resolver::ResolveExportModelStatement( } std::vector> resolved_options; - ZETASQL_RETURN_IF_ERROR( - ResolveOptionsList(ast_statement->options_list(), &resolved_options)); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, + &resolved_options)); *output = MakeResolvedExportModelStmt(model_name_path, std::move(resolved_connection), @@ -6007,7 +6270,8 @@ absl::Status Resolver::ResolveCallStatement( } ZETASQL_RETURN_IF_ERROR( ResolveStandaloneExpr(sql_, ast_tvf_argument->expr(), &expr)); - input_arg_types[i] = GetInputArgumentTypeForExpr(expr.get()); + input_arg_types[i] = GetInputArgumentTypeForExpr( + expr.get(), 
/*pick_default_type_for_untyped_expr=*/false); resolved_args_exprs[i] = std::move(expr); } @@ -6055,6 +6319,7 @@ absl::Status Resolver::ResolveDefineTableStatement( ast_statement->name()->ToIdentifierVector(); std::vector> resolved_options; ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, &resolved_options)); *output = @@ -6223,13 +6488,18 @@ absl::Status Resolver::ResolveUndropStatement( const std::vector name = ast_statement->name()->ToIdentifierVector(); std::unique_ptr for_system_time_expr; + std::vector> resolved_options; if (ast_statement->for_system_time() != nullptr) { ZETASQL_RETURN_IF_ERROR(ResolveForSystemTimeExpr(ast_statement->for_system_time(), &for_system_time_expr)); } + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, + &resolved_options)); *output = MakeResolvedUndropStmt( - std::string(SchemaObjectKindToName(ast_statement->schema_object_kind())), - ast_statement->is_if_not_exists(), name, std::move(for_system_time_expr)); + SchemaObjectKindToName(ast_statement->schema_object_kind()), + ast_statement->is_if_not_exists(), name, std::move(for_system_time_expr), + std::move(resolved_options)); return absl::OkStatus(); } @@ -6251,7 +6521,7 @@ absl::Status Resolver::ResolveDropStatement( const std::vector name = ast_statement->name()->ToIdentifierVector(); *output = MakeResolvedDropStmt( - std::string(SchemaObjectKindToName(ast_statement->schema_object_kind())), + SchemaObjectKindToName(ast_statement->schema_object_kind()), ast_statement->is_if_exists(), name, ConvertDropMode(ast_statement->drop_mode())); return absl::OkStatus(); @@ -6666,8 +6936,9 @@ absl::Status Resolver::ResolveImportStatement( } std::vector> resolved_options; - ZETASQL_RETURN_IF_ERROR( - ResolveOptionsList(ast_statement->options_list(), &resolved_options)); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + 
/*allow_alter_array_operators=*/false, + &resolved_options)); // ResolvedStatement populates either name_path or file_path but not both. ZETASQL_RET_CHECK(name_path.empty() || file_path.empty()) @@ -6694,8 +6965,9 @@ absl::Status Resolver::ResolveModuleStatement( ast_statement->name()->ToIdentifierVector(); std::vector> resolved_options; - ZETASQL_RETURN_IF_ERROR( - ResolveOptionsList(ast_statement->options_list(), &resolved_options)); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, + &resolved_options)); *output = MakeResolvedModuleStmt(name_path, std::move(resolved_options)); MaybeRecordParseLocation(ast_statement->name(), output->get()); @@ -6813,8 +7085,9 @@ absl::Status Resolver::ResolveAnalyzeStatement( const ASTAnalyzeStatement* ast_statement, std::unique_ptr* output) { std::vector> resolved_options; - ZETASQL_RETURN_IF_ERROR( - ResolveOptionsList(ast_statement->options_list(), &resolved_options)); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, + &resolved_options)); std::vector> resolved_table_and_column_info_list; ZETASQL_RETURN_IF_ERROR( @@ -6928,7 +7201,9 @@ absl::Status Resolver::ResolveCreateEntityStatement( &create_scope, &create_mode)); std::vector> options; - ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), &options)); + ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(ast_statement->options_list(), + /*allow_alter_array_operators=*/false, + &options)); std::string entity_body_json; if (ast_statement->json_body() != nullptr) { // TODO: Use ResolveExpr() once JSON goes GA. 
@@ -7015,7 +7290,8 @@ absl::Status Resolver::ResolveAuxLoadDataStatement(
   std::vector<std::unique_ptr<const ResolvedOption>> from_files_options_list;
   ZETASQL_RETURN_IF_ERROR(ResolveOptionsList(
-      ast_statement->from_files()->options_list(), &from_files_options_list));
+      ast_statement->from_files()->options_list(),
+      /*allow_alter_array_operators=*/false, &from_files_options_list));
   NameList columns;
   ZETASQL_RETURN_IF_ERROR(statement_base_properties.GetVisibleColumnNames(&columns));
   ZETASQL_RETURN_IF_ERROR(statement_base_properties.WithPartitionColumnNames(&columns));
diff --git a/zetasql/analyzer/resolver_test.cc b/zetasql/analyzer/resolver_test.cc
index f26854af9..4a8b07a73 100644
--- a/zetasql/analyzer/resolver_test.cc
+++ b/zetasql/analyzer/resolver_test.cc
@@ -98,6 +98,8 @@ class ResolverTest : public ::testing::Test {
     analyzer_options_.mutable_language()->EnableMaximumLanguageFeatures();
     analyzer_options_.mutable_language()->EnableLanguageFeature(
         FEATURE_ANONYMIZATION);
+    analyzer_options_.mutable_language()->EnableLanguageFeature(
+        FEATURE_AGGREGATION_THRESHOLD);
     analyzer_options_.mutable_language()->EnableLanguageFeature(
         FEATURE_V_1_3_UNNEST_AND_FLATTEN_ARRAYS);
     analyzer_options_.mutable_language()->EnableLanguageFeature(
@@ -1188,6 +1190,43 @@ TEST_F(ResolverTest, TestHasAnonymization) {
                                       REWRITE_ANONYMIZATION));
 }
 
+TEST_F(ResolverTest, TestHasAggregationThreshold) {
+  std::unique_ptr<ParserOutput> parser_output;
+  std::unique_ptr<const ResolvedStatement> resolved_statement;
+  std::string sql;
+  // Test that a statement with aggregation thresholding uses the new rewriter.
+  sql = "SELECT WITH AGGREGATION_THRESHOLD key FROM KeyValue GROUP BY key";
+  ResetResolver(sample_catalog_->catalog());
+  ZETASQL_ASSERT_OK(ParseStatement(sql,
+                           ParserOptions(analyzer_options_.GetParserOptions()),
+                           &parser_output));
+  ZETASQL_EXPECT_OK(resolver_->ResolveStatement(sql, parser_output->statement(),
+                                        &resolved_statement));
+  // Aggregation threshold rewriter should be present, anonymization rewriter
+  // should not be present.
+  EXPECT_TRUE(resolver_->analyzer_output_properties().IsRelevant(
+      REWRITE_AGGREGATION_THRESHOLD));
+  EXPECT_FALSE(resolver_->analyzer_output_properties().IsRelevant(
+      REWRITE_ANONYMIZATION));
+}
+
+TEST_F(ResolverTest, TestDoesNotHaveAggregationThreshold) {
+  std::unique_ptr<ParserOutput> parser_output;
+  std::unique_ptr<const ResolvedStatement> resolved_statement;
+  std::string sql;
+
+  // Test a statement without aggregation thresholding.
+  sql = "SELECT * FROM KeyValue";
+  ResetResolver(sample_catalog_->catalog());
+  ZETASQL_ASSERT_OK(ParseStatement(sql,
+                           ParserOptions(analyzer_options_.GetParserOptions()),
+                           &parser_output));
+  ZETASQL_EXPECT_OK(resolver_->ResolveStatement(sql, parser_output->statement(),
+                                        &resolved_statement));
+  EXPECT_FALSE(resolver_->analyzer_output_properties().IsRelevant(
+      REWRITE_AGGREGATION_THRESHOLD));
+}
+
 TEST_F(ResolverTest, FlattenInCatalogButFeatureOff) {
   analyzer_options_.mutable_language()->DisableAllLanguageFeatures();
   ResetResolver(sample_catalog_->catalog());
@@ -1343,6 +1382,30 @@ TEST_F(ResolverTest, TestIntervalLiteral) {
   TestIntervalLiteral("0-0 0 87840000:0:0", "INTERVAL '316224000000' SECOND");
   TestIntervalLiteral("0-0 0 -87840000:0:0", "INTERVAL '-316224000000' SECOND");
 
+  TestIntervalLiteral("0-0 0 0:0:0", "INTERVAL '0' MILLISECOND");
+  TestIntervalLiteral("0-0 0 0:0:0", "INTERVAL '-0' MILLISECOND");
+  TestIntervalLiteral("0-0 0 0:0:0", "INTERVAL '+0' MILLISECOND");
+  TestIntervalLiteral("0-0 0 27777777:46:39.999",
+                      "INTERVAL '99999999999999' MILLISECOND");
+  TestIntervalLiteral("0-0 0 -27777777:46:39.999",
+                      "INTERVAL '-99999999999999' MILLISECOND");
+  TestIntervalLiteral("0-0 0 87840000:0:0",
+                      "INTERVAL '316224000000000' MILLISECOND");
+  TestIntervalLiteral("0-0 0 -87840000:0:0",
+                      "INTERVAL '-316224000000000' MILLISECOND");
+
+  TestIntervalLiteral("0-0 0 0:0:0", "INTERVAL '0' MICROSECOND");
+  TestIntervalLiteral("0-0 0 0:0:0", "INTERVAL '-0' MICROSECOND");
+  TestIntervalLiteral("0-0 0 0:0:0", "INTERVAL '+0' MICROSECOND");
+  TestIntervalLiteral("0-0 0 27777777:46:39.999999",
+                      "INTERVAL '99999999999999999' MICROSECOND");
+  TestIntervalLiteral("0-0 0 -27777777:46:39.999999",
+                      "INTERVAL '-99999999999999999' MICROSECOND");
+  TestIntervalLiteral("0-0 0 87840000:0:0",
+                      "INTERVAL '316224000000000000' MICROSECOND");
+  TestIntervalLiteral("0-0 0 -87840000:0:0",
+                      "INTERVAL '-316224000000000000' MICROSECOND");
+
   TestIntervalLiteral("0-0 0 0:0:0", "INTERVAL '0-0' YEAR TO MONTH");
   TestIntervalLiteral("0-0 0 0:0:0", "INTERVAL '-0-0' YEAR TO MONTH");
   TestIntervalLiteral("0-0 0 0:0:0", "INTERVAL '+0-0' YEAR TO MONTH");
@@ -1860,11 +1923,49 @@ TEST_F(ResolverTest, TestIntervalLiteral) {
   TestIntervalLiteralError("INTERVAL '9223372036854775808' SECOND");
   TestIntervalLiteralError("INTERVAL '-9223372036854775809' SECOND");
 
+  TestIntervalLiteralError("INTERVAL '' MILLISECOND");
+  TestIntervalLiteralError("INTERVAL ' 1' MILLISECOND");
+  TestIntervalLiteralError("INTERVAL '1 ' MILLISECOND");
+  TestIntervalLiteralError("INTERVAL '.' MILLISECOND");
+  TestIntervalLiteralError("INTERVAL '1.' MILLISECOND");
+  TestIntervalLiteralError("INTERVAL '-1.' MILLISECOND");
+  TestIntervalLiteralError("INTERVAL '+1.' MILLISECOND");
+  TestIntervalLiteralError("INTERVAL '\t1' MILLISECOND");
+  TestIntervalLiteralError("INTERVAL '1\t' MILLISECOND");
+  TestIntervalLiteralError("INTERVAL '\\n1' MILLISECOND");
+  TestIntervalLiteralError("INTERVAL '1\\n' MILLISECOND");
+  // fractional digits
+  TestIntervalLiteralError("INTERVAL '0.1' MILLISECOND");
+  // exceeds max number of milliseconds
+  TestIntervalLiteralError("INTERVAL '316224000000001' MILLISECOND");
+  TestIntervalLiteralError("INTERVAL '-316224000000001' MILLISECOND");
+  // overflow fitting into int64_t at SimpleAtoi
+  TestIntervalLiteralError("INTERVAL '9223372036854775808' MILLISECOND");
+  TestIntervalLiteralError("INTERVAL '-9223372036854775809' MILLISECOND");
+
+  TestIntervalLiteralError("INTERVAL '' MICROSECOND");
+  TestIntervalLiteralError("INTERVAL ' 1' MICROSECOND");
+  TestIntervalLiteralError("INTERVAL '1 ' MICROSECOND");
+  TestIntervalLiteralError("INTERVAL '.' MICROSECOND");
+  TestIntervalLiteralError("INTERVAL '1.' MICROSECOND");
+  TestIntervalLiteralError("INTERVAL '-1.' MICROSECOND");
+  TestIntervalLiteralError("INTERVAL '+1.' MICROSECOND");
+  TestIntervalLiteralError("INTERVAL '\t1' MICROSECOND");
+  TestIntervalLiteralError("INTERVAL '1\t' MICROSECOND");
+  TestIntervalLiteralError("INTERVAL '\\n1' MICROSECOND");
+  TestIntervalLiteralError("INTERVAL '1\\n' MICROSECOND");
+  // fractional digits
+  TestIntervalLiteralError("INTERVAL '0.1' MICROSECOND");
+  // exceeds max number of microseconds
+  TestIntervalLiteralError("INTERVAL '316224000000000001' MICROSECOND");
+  TestIntervalLiteralError("INTERVAL '-316224000000000001' MICROSECOND");
+  // overflow fitting into int64_t at SimpleAtoi
+  TestIntervalLiteralError("INTERVAL '9223372036854775808' MICROSECOND");
+  TestIntervalLiteralError("INTERVAL '-9223372036854775809' MICROSECOND");
+
   // Unsupported dateparts
   TestIntervalLiteralError("INTERVAL '0' DAYOFWEEK");
   TestIntervalLiteralError("INTERVAL '0' DAYOFYEAR");
-  TestIntervalLiteralError("INTERVAL '0' MILLISECOND");
-  TestIntervalLiteralError("INTERVAL '0' MICROSECOND");
   TestIntervalLiteralError("INTERVAL '0' NANOSECOND");
   TestIntervalLiteralError("INTERVAL '0' DATE");
   TestIntervalLiteralError("INTERVAL '0' DATETIME");
diff --git a/zetasql/analyzer/rewriters/BUILD b/zetasql/analyzer/rewriters/BUILD
index ae5cae490..4791e7b84 100644
--- a/zetasql/analyzer/rewriters/BUILD
+++ b/zetasql/analyzer/rewriters/BUILD
@@ -51,7 +51,6 @@ cc_library(
         "//zetasql/base:ret_check",
         "//zetasql/base:status",
         "//zetasql/public:analyzer_options",
-        "//zetasql/public:analyzer_output",
        "//zetasql/public:analyzer_output_properties",
         "//zetasql/public:catalog",
         "//zetasql/public:options_cc_proto",
@@ -63,7 +62,6 @@ cc_library(
         "//zetasql/resolved_ast:rewrite_utils",
         "@com_google_absl//absl/status",
         "@com_google_absl//absl/status:statusor",
-        "@com_google_absl//absl/types:span",
     ],
 )
 
@@ -96,7 +94,6 @@ cc_library(
         "//zetasql/base:status",
         "//zetasql/base:varsetter",
         "//zetasql/common:errors",
-        "//zetasql/parser:parse_tree",
         "//zetasql/public:analyzer_options",
         "//zetasql/public:analyzer_output_properties",
"//zetasql/public:catalog", @@ -116,6 +113,7 @@ cc_library( "//zetasql/resolved_ast:resolved_node_kind_cc_proto", "//zetasql/resolved_ast:rewrite_utils", "@com_google_absl//absl/cleanup", + "@com_google_absl//absl/container:btree", "@com_google_absl//absl/container:flat_hash_map", "@com_google_absl//absl/container:flat_hash_set", "@com_google_absl//absl/memory", @@ -132,6 +130,7 @@ cc_library( srcs = ["sql_view_inliner.cc"], hdrs = ["sql_view_inliner.h"], deps = [ + "//zetasql/base:check", "//zetasql/base:ret_check", "//zetasql/base:status", "//zetasql/public:analyzer_options", @@ -142,13 +141,11 @@ cc_library( "//zetasql/public:sql_view", "//zetasql/public/types", "//zetasql/resolved_ast", + "//zetasql/resolved_ast:resolved_ast_enums_cc_proto", "//zetasql/resolved_ast:rewrite_utils", - "@com_google_absl//absl/container:flat_hash_map", "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings:str_format", - "@com_google_absl//absl/types:optional", - "@com_google_absl//absl/types:span", ], ) @@ -161,7 +158,6 @@ cc_library( "//zetasql/base:ret_check", "//zetasql/base:status", "//zetasql/public:analyzer_options", - "//zetasql/public:analyzer_output", "//zetasql/public:analyzer_output_properties", "//zetasql/public:builtin_function_cc_proto", "//zetasql/public:catalog", @@ -175,7 +171,6 @@ cc_library( "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", - "@com_google_absl//absl/types:span", ], ) @@ -197,10 +192,10 @@ cc_library( "//zetasql/public/types", "//zetasql/resolved_ast", "//zetasql/resolved_ast:rewrite_utils", + "@com_google_absl//absl/container:flat_hash_map", "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", - "@com_google_absl//absl/types:span", ], ) @@ -212,7 +207,6 @@ cc_library( "//zetasql/base:ret_check", "//zetasql/base:status", "//zetasql/public:analyzer_options", - 
"//zetasql/public:analyzer_output", "//zetasql/public:analyzer_output_properties", "//zetasql/public:builtin_function_cc_proto", "//zetasql/public:catalog", @@ -226,9 +220,7 @@ cc_library( "//zetasql/resolved_ast:resolved_ast_builder", "//zetasql/resolved_ast:resolved_ast_rewrite_visitor", "//zetasql/resolved_ast:rewrite_utils", - "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", - "@com_google_absl//absl/types:span", ], ) @@ -240,7 +232,6 @@ cc_library( "//zetasql/base:ret_check", "//zetasql/base:status", "//zetasql/public:analyzer_options", - "//zetasql/public:analyzer_output", "//zetasql/public:analyzer_output_properties", "//zetasql/public:builtin_function_cc_proto", "//zetasql/public:catalog", @@ -251,6 +242,7 @@ cc_library( "//zetasql/public/types", "//zetasql/resolved_ast", "//zetasql/resolved_ast:rewrite_utils", + "@com_google_absl//absl/container:flat_hash_set", "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", @@ -265,20 +257,17 @@ cc_library( deps = [ "//zetasql/analyzer:resolver", "//zetasql/analyzer:substitute", - "//zetasql/base:logging", + "//zetasql/base:check", "//zetasql/base:ret_check", "//zetasql/base:status", "//zetasql/common:aggregate_null_handling", - "//zetasql/common:errors", "//zetasql/public:analyzer_options", - "//zetasql/public:analyzer_output", "//zetasql/public:analyzer_output_properties", "//zetasql/public:builtin_function_cc_proto", "//zetasql/public:catalog", "//zetasql/public:function", "//zetasql/public:language_options", "//zetasql/public:options_cc_proto", - "//zetasql/public:parse_location", "//zetasql/public:rewriter_interface", "//zetasql/public:value", "//zetasql/public/annotation:collation", @@ -288,7 +277,6 @@ cc_library( "//zetasql/resolved_ast:resolved_node_kind_cc_proto", "//zetasql/resolved_ast:rewrite_utils", "@com_google_absl//absl/container:flat_hash_map", - "@com_google_absl//absl/memory", "@com_google_absl//absl/status", 
"@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", @@ -304,12 +292,10 @@ cc_library( "//zetasql/base:ret_check", "//zetasql/base:status", "//zetasql/public:analyzer_options", - "//zetasql/public:analyzer_output", "//zetasql/public:analyzer_output_properties", "//zetasql/public:builtin_function_cc_proto", "//zetasql/public:catalog", "//zetasql/public:function", - "//zetasql/public:language_options", "//zetasql/public:options_cc_proto", "//zetasql/public:rewriter_interface", "//zetasql/public:value", @@ -318,7 +304,7 @@ cc_library( "//zetasql/resolved_ast:resolved_ast_builder", "//zetasql/resolved_ast:resolved_ast_rewrite_visitor", "//zetasql/resolved_ast:rewrite_utils", - "@com_google_absl//absl/types:span", + "@com_google_absl//absl/status:statusor", ], ) @@ -327,13 +313,16 @@ cc_library( srcs = ["registration.cc"], hdrs = ["registration.h"], deps = [ + "//zetasql/base:check", "//zetasql/base:logging", "//zetasql/public:options_cc_proto", + "@com_google_absl//absl/base:core_headers", "@com_google_absl//absl/container:flat_hash_map", "@com_google_absl//absl/memory", "@com_google_absl//absl/strings", "@com_google_absl//absl/synchronization", "@com_google_absl//absl/types:optional", + "@com_google_absl//absl/types:span", ], ) @@ -350,7 +339,6 @@ cc_library( "//zetasql/public:builtin_function_cc_proto", "//zetasql/public:catalog", "//zetasql/public:rewriter_interface", - "//zetasql/public:value", "//zetasql/public/types", "//zetasql/resolved_ast", "//zetasql/resolved_ast:rewrite_utils", @@ -378,6 +366,7 @@ cc_library( "//zetasql/public/types", "//zetasql/resolved_ast", "//zetasql/resolved_ast:rewrite_utils", + "@com_google_absl//absl/container:flat_hash_map", "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", @@ -391,35 +380,46 @@ cc_test( ":registration", "//zetasql/base/testing:zetasql_gtest_main", "//zetasql/public:analyzer_options", - "//zetasql/public:analyzer_output", 
"//zetasql/public:options_cc_proto", "//zetasql/public:rewriter_interface", "//zetasql/resolved_ast", - "@com_google_absl//absl/memory", "@com_google_absl//absl/status", + "@com_google_absl//absl/status:statusor", ], ) cc_library( - name = "set_operation_corresponding_rewriter", - srcs = ["set_operation_corresponding_rewriter.cc"], - hdrs = ["set_operation_corresponding_rewriter.h"], + name = "grouping_set_rewriter", + srcs = ["grouping_set_rewriter.cc"], + hdrs = ["grouping_set_rewriter.h"], deps = [ + "//zetasql/base:check", + "//zetasql/base:map_util", + "//zetasql/base:ret_check", + "//zetasql/base:status", + "//zetasql/public:analyzer_options", + "//zetasql/public:analyzer_output_properties", + "//zetasql/public:catalog", + "//zetasql/public:options_cc_proto", "//zetasql/public:rewriter_interface", + "//zetasql/public/types", "//zetasql/resolved_ast", "//zetasql/resolved_ast:resolved_ast_builder", "//zetasql/resolved_ast:resolved_ast_rewrite_visitor", - "//zetasql/resolved_ast:rewrite_utils", + "@com_google_absl//absl/algorithm:container", + "@com_google_absl//absl/container:flat_hash_set", + "@com_google_absl//absl/status", + "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/strings:str_format", ], ) cc_library( - name = "grouping_set_rewriter", - srcs = ["grouping_set_rewriter.cc"], - hdrs = ["grouping_set_rewriter.h"], + name = "insert_dml_values_rewriter", + srcs = ["insert_dml_values_rewriter.cc"], + hdrs = ["insert_dml_values_rewriter.h"], deps = [ - "//zetasql/base:check", - "//zetasql/base:map_util", + "//zetasql/base:no_destructor", "//zetasql/base:ret_check", "//zetasql/base:status", "//zetasql/public:analyzer_options", @@ -427,13 +427,134 @@ cc_library( "//zetasql/public:catalog", "//zetasql/public:options_cc_proto", "//zetasql/public:rewriter_interface", + "//zetasql/public:sql_view", "//zetasql/public/types", "//zetasql/resolved_ast", "//zetasql/resolved_ast:resolved_ast_builder", 
"//zetasql/resolved_ast:resolved_ast_rewrite_visitor", + "//zetasql/resolved_ast:resolved_node_kind_cc_proto", + "//zetasql/resolved_ast:rewrite_utils", + "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/types:optional", + ], +) + +cc_library( + name = "multiway_unnest_rewriter", + srcs = ["multiway_unnest_rewriter.cc"], + hdrs = ["multiway_unnest_rewriter.h"], + deps = [ + "//zetasql/base:check", + "//zetasql/base:ret_check", + "//zetasql/base:status", + "//zetasql/public:analyzer_options", + "//zetasql/public:catalog", + "//zetasql/public:function_headers", + "//zetasql/public:rewriter_interface", + "//zetasql/public:value", + "//zetasql/public/functions:array_zip_mode_cc_proto", + "//zetasql/public/types", + "//zetasql/resolved_ast", + "//zetasql/resolved_ast:make_node_vector", + "//zetasql/resolved_ast:resolved_ast_builder", + "//zetasql/resolved_ast:resolved_ast_rewrite_visitor", + "//zetasql/resolved_ast:rewrite_utils", "@com_google_absl//absl/container:flat_hash_set", + "@com_google_absl//absl/log", + "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", + ], +) + +cc_library( + name = "anonymization_helper", + srcs = ["anonymization_helper.cc"], + hdrs = ["anonymization_helper.h"], + deps = [ + "//zetasql/analyzer:expr_matching_helpers", + "//zetasql/analyzer:name_scope", + "//zetasql/analyzer:resolver", + "//zetasql/base:ret_check", + "//zetasql/base:source_location", + "//zetasql/base:status", + "//zetasql/common:errors", + "//zetasql/common:status_payload_utils", + "//zetasql/parser", + "//zetasql/proto:anon_output_with_report_cc_proto", + "//zetasql/proto:internal_error_location_cc_proto", + "//zetasql/public:analyzer_options", + "//zetasql/public:analyzer_output_properties", + "//zetasql/public:anon_function", + "//zetasql/public:anonymization_utils", + "//zetasql/public:builtin_function", + "//zetasql/public:builtin_function_cc_proto", + "//zetasql/public:catalog", + 
"//zetasql/public:function", + "//zetasql/public:function_cc_proto", + "//zetasql/public:id_string", + "//zetasql/public:language_options", + "//zetasql/public:options_cc_proto", + "//zetasql/public:parse_location", + "//zetasql/public:rewriter_interface", + "//zetasql/public:select_with_mode", + "//zetasql/public:strings", + "//zetasql/public:type", + "//zetasql/public:type_cc_proto", + "//zetasql/public:value", + "//zetasql/public/functions:differential_privacy_cc_proto", + "//zetasql/public/types", + "//zetasql/resolved_ast", + "//zetasql/resolved_ast:make_node_vector", + "//zetasql/resolved_ast:resolved_ast_enums_cc_proto", + "//zetasql/resolved_ast:resolved_node_kind_cc_proto", + "//zetasql/resolved_ast:rewrite_utils", + "@com_google_absl//absl/container:flat_hash_map", + "@com_google_absl//absl/container:flat_hash_set", + "@com_google_absl//absl/memory", "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/strings", "@com_google_absl//absl/strings:str_format", + "@com_google_absl//absl/types:span", + ], +) + +cc_library( + name = "anonymization_rewriter", + srcs = [ + "anonymization_rewriter.cc", + ], + hdrs = ["anonymization_rewriter.h"], + deps = [ + ":anonymization_helper", + "//zetasql/base:ret_check", + "//zetasql/base:status", + "//zetasql/public:analyzer_options", + "//zetasql/public:catalog", + "//zetasql/public:rewriter_interface", + "//zetasql/resolved_ast", + "//zetasql/resolved_ast:rewrite_utils", + "@com_google_absl//absl/container:flat_hash_map", + "@com_google_absl//absl/status:statusor", + ], +) + +cc_library( + name = "aggregation_threshold_rewriter", + srcs = [ + "aggregation_threshold_rewriter.cc", + ], + hdrs = ["aggregation_threshold_rewriter.h"], + deps = [ + ":anonymization_helper", + "//zetasql/base:ret_check", + "//zetasql/base:status", + "//zetasql/public:analyzer_options", + "//zetasql/public:catalog", + "//zetasql/public:rewriter_interface", + "//zetasql/resolved_ast", + 
+        "//zetasql/resolved_ast:rewrite_utils",
+        "@com_google_absl//absl/status:statusor",
+    ],
+)
diff --git a/zetasql/analyzer/rewriters/aggregation_threshold_rewriter.cc b/zetasql/analyzer/rewriters/aggregation_threshold_rewriter.cc
new file mode 100644
index 000000000..78db2ee33
--- /dev/null
+++ b/zetasql/analyzer/rewriters/aggregation_threshold_rewriter.cc
@@ -0,0 +1,58 @@
+//
+// Copyright 2019 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+
+#include "zetasql/analyzer/rewriters/aggregation_threshold_rewriter.h"
+
+#include <memory>
+#include <string>
+
+#include "zetasql/analyzer/rewriters/anonymization_helper.h"
+#include "zetasql/public/analyzer_options.h"
+#include "zetasql/public/catalog.h"
+#include "zetasql/public/rewriter_interface.h"
+#include "zetasql/resolved_ast/resolved_node.h"
+#include "zetasql/resolved_ast/rewrite_utils.h"
+#include "absl/status/statusor.h"
+#include "zetasql/base/ret_check.h"
+#include "zetasql/base/status_macros.h"
+
+namespace zetasql {
+
+class AggregationThresholdRewriter : public Rewriter {
+ public:
+  absl::StatusOr<std::unique_ptr<const ResolvedNode>> Rewrite(
+      const AnalyzerOptions& options, const ResolvedNode& input,
+      Catalog& catalog, TypeFactory& type_factory,
+      AnalyzerOutputProperties& output_properties) const override {
+    ZETASQL_RET_CHECK(options.AllArenasAreInitialized());
+    ColumnFactory column_factory(/*max_col_id=*/0,
+                                 options.id_string_pool().get(),
+                                 options.column_id_sequence_number());
+    ZETASQL_ASSIGN_OR_RETURN(
+        std::unique_ptr<const ResolvedNode> node,
+        RewriteHelper(input, options, column_factory, catalog, type_factory));
+    return node;
+  }
+
+  std::string Name() const override { return "AggregationThresholdRewriter"; }
+};
+
+const Rewriter* GetAggregationThresholdRewriter() {
+  static const Rewriter* kRewriter = new AggregationThresholdRewriter;
+  return kRewriter;
+}
+
+}  // namespace zetasql
diff --git a/zetasql/analyzer/rewriters/aggregation_threshold_rewriter.h b/zetasql/analyzer/rewriters/aggregation_threshold_rewriter.h
new file mode 100644
index 000000000..397009cca
--- /dev/null
+++ b/zetasql/analyzer/rewriters/aggregation_threshold_rewriter.h
@@ -0,0 +1,29 @@
+//
+// Copyright 2019 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+
+#ifndef ZETASQL_ANALYZER_REWRITERS_AGGREGATION_THRESHOLD_REWRITER_H_
+#define ZETASQL_ANALYZER_REWRITERS_AGGREGATION_THRESHOLD_REWRITER_H_
+
+#include "zetasql/public/analyzer_options.h"
+
+namespace zetasql {
+
+// Returns a pointer to the aggregation threshold rewriter.
+const Rewriter* GetAggregationThresholdRewriter();
+
+}  // namespace zetasql
+
+#endif  // ZETASQL_ANALYZER_REWRITERS_AGGREGATION_THRESHOLD_REWRITER_H_
diff --git a/zetasql/analyzer/anonymization_rewriter.cc b/zetasql/analyzer/rewriters/anonymization_helper.cc
similarity index 95%
rename from zetasql/analyzer/anonymization_rewriter.cc
rename to zetasql/analyzer/rewriters/anonymization_helper.cc
index 209b6307a..3d5febdec 100644
--- a/zetasql/analyzer/anonymization_rewriter.cc
+++ b/zetasql/analyzer/rewriters/anonymization_helper.cc
@@ -14,7 +14,7 @@
 // limitations under the License.
 //
 
-#include "zetasql/analyzer/anonymization_rewriter.h"
+#include "zetasql/analyzer/rewriters/anonymization_helper.h"
 
 #include
 #include
@@ -114,6 +114,27 @@ struct SelectWithModeName {
   bool uses_a_article;
 };
 
+constexpr absl::string_view kPublicGroupsWithNamePrefix = "$public_groups";
+
+// Keeps global state for the public groups feature to provide unique query
+// names for newly added `WithScan` entries across multiple anon / dp aggregate
+// scans.
+//
+// The name of a `WithEntry` must be unique across the query.
+class GlobalPublicGroupsQueryNameProvider {
+ public:
+  // Returns the next name of an otherwise unnamed WithEntry query name.
+  std::string GetNextPublicGroupsWithQueryName() {
+    const std::string query_name =
+        absl::StrCat(kPublicGroupsWithNamePrefix, next_query_name_number_);
+    ++next_query_name_number_;
+    return query_name;
+  }
+
+ private:
+  int64_t next_query_name_number_ = 0;
+};
+
 // Keeps state for using the public groups feature. This class will be invoked
 // by the PerUserRewriterVisitor to check and add joins with the public groups
 // table. It will be invoked from the AnonymizationRewriter to ensure all
@@ -189,9 +210,11 @@ class PublicGroupsState {
   // join.
static absl::StatusOr> CreateFromGroupByList( - const std::vector>& + absl::Span> group_by_list, - bool bound_contributions_across_groups, absl::string_view error_prefix); + bool bound_contributions_across_groups, absl::string_view error_prefix, + GlobalPublicGroupsQueryNameProvider* + global_public_groups_query_name_provider); // Rewrites an existing copy of the resolved join scan with the modifications // required for the public groups feature if the shape satisfies the public @@ -255,15 +278,22 @@ class PublicGroupsState { return std::nullopt; } + absl::StatusOr>> + GetPublicGroupScansAsWithEntries(ColumnFactory* column_factory) const; + private: PublicGroupsState( const absl::flat_hash_map& column_id_map, const absl::flat_hash_set& public_group_column_ids, - bool bound_contributions_across_groups) + bool bound_contributions_across_groups, + GlobalPublicGroupsQueryNameProvider* + global_public_groups_query_name_provider) : column_id_map_(column_id_map), public_group_column_ids_(public_group_column_ids), unjoined_column_ids_(public_group_column_ids), - bound_contributions_across_groups_(bound_contributions_across_groups) {} + bound_contributions_across_groups_(bound_contributions_across_groups), + global_public_groups_query_name_provider_( + global_public_groups_query_name_provider) {} // Copies relevant information about the scan and the join conditions, so that // they can be later added as CTE. This method also ensures that columns in @@ -291,9 +321,6 @@ class PublicGroupsState { const std::vector>& with_entries) const; - absl::StatusOr>> - GetPublicGroupScansAsWithEntries(ColumnFactory* column_factory) const; - // Contains more information for the columns in the following sets. This map // is keyed by column id. Only populated when group selection strategy is // public groups and there is at least one element in the group-by list. 
@@ -309,13 +336,11 @@ class PublicGroupsState { absl::flat_hash_set unjoined_column_ids_; const bool bound_contributions_across_groups_; - int next_with_query_name_number_ = 0; struct PublicGroupsWithScan { // The name of the CTE. // - // * If with_scan_already_present = false, we will use $public_groupsX, - // where X is replaced by the next number using - // next_with_query_name_number_. + // * If with_scan_already_present = false, we will use a unique name from + // the global public groups state. // * Otherwise this will be the user-given query name of the CTE. std::string with_query_name; @@ -336,6 +361,9 @@ class PublicGroupsState { // Mapping of columns to an equivalent column in the `join_expr` in any of the // `with_scans_`. absl::flat_hash_map column_to_join_column_id_; + + GlobalPublicGroupsQueryNameProvider* + global_public_groups_query_name_provider_; // unowned }; std::string CreateOptionNotAllowedWithPublicGroupsError( @@ -469,7 +497,8 @@ class RewriterVisitor : public ResolvedASTDeepCopyVisitor { absl::StatusOr RewritePerUserTransform( const ResolvedAggregateScanBase* node, SelectWithModeName select_with_mode_name, - std::optional options_uid_column); + std::optional options_uid_column, + PublicGroupsState* public_groups_state); // Rewrites node using InnerAggregateListRewriterVisitor. // @@ -503,7 +532,8 @@ class RewriterVisitor : public ResolvedASTDeepCopyVisitor { absl::StatusOr> BoundGroupsContributedToInputScan( const NodeType* original_input_scan, RewritePerUserTransformResult rewritten_per_user_transform, - PrivacyOptionSpec privacy_option_spec); + PrivacyOptionSpec privacy_option_spec, + PublicGroupsState* public_groups_state); // Rewrites the node contained in rewritten_per_user_transform, inserting a // `SampleScan` to bound the number of rows that a user can contribute. 
@@ -511,7 +541,7 @@ class RewriterVisitor : public ResolvedASTDeepCopyVisitor { absl::StatusOr> BoundRowsContributedToInputScan( const NodeType* original_input_scan, RewritePerUserTransformResult rewritten_per_user_transform, - int64_t max_rows_contributed); + int64_t max_rows_contributed, bool filter_values_with_null_uid); // Returns a reference to a column containing the count of unique users // (accounting for the different report types like JSON or proto). If this is @@ -543,14 +573,14 @@ class RewriterVisitor : public ResolvedASTDeepCopyVisitor { IdentifyOrAddNoisyCountDistinctPrivacyIdsColumnToAggregateList( const NodeType* original_input_scan, const OuterAggregateListRewriterVisitor& outer_rewriter_visitor, - std::vector>& + std::vector>& outer_aggregate_list); // Adds a new column containing the exact number of distinct privacy ids per // group to the outer_aggregate_list. A reference to that column is returned. absl::StatusOr> AddExactCountDistinctPrivacyIdsColumnToAggregateList( - std::vector>& + std::vector>& outer_aggregate_list); // Returns the expression that should be added to the DP aggregate @@ -563,7 +593,7 @@ class RewriterVisitor : public ResolvedASTDeepCopyVisitor { const NodeType* original_input_scan, const OuterAggregateListRewriterVisitor& outer_rewriter_visitor, std::optional min_privacy_units_per_group, - std::vector>& + std::vector>& outer_aggregate_list); // Creates a new `ResolvedComputedColumn` that counts distinct user IDs, @@ -573,13 +603,14 @@ class RewriterVisitor : public ResolvedASTDeepCopyVisitor { absl::StatusOr> CreateCountDistinctPrivacyIdsColumn( const NodeType* original_input_scan, - std::vector>& aggregate_list); + std::vector>& aggregate_list); std::unique_ptr CreateAggregateScan( const ResolvedAnonymizedAggregateScan* node, std::unique_ptr input_scan, std::vector> outer_group_by_list, - std::vector> outer_aggregate_list, + std::vector> + outer_aggregate_list, std::unique_ptr group_selection_threshold_expr, std::vector> 
resolved_options); @@ -587,7 +618,8 @@ class RewriterVisitor : public ResolvedASTDeepCopyVisitor { const ResolvedDifferentialPrivacyAggregateScan* node, std::unique_ptr input_scan, std::vector> outer_group_by_list, - std::vector> outer_aggregate_list, + std::vector> + outer_aggregate_list, std::unique_ptr group_selection_threshold_expr, std::vector> resolved_options); @@ -601,7 +633,8 @@ class RewriterVisitor : public ResolvedASTDeepCopyVisitor { // SampleScan using default_anon_kappa_value. absl::StatusOr> AddCrossPartitionSampleScan( std::unique_ptr input_scan, - std::optional max_groups_contributed, ResolvedColumn uid_column); + std::optional max_groups_contributed, ResolvedColumn uid_column, + PublicGroupsState* public_groups_state); // Wraps input_scan with a sample scan that bounds the number of rows that a // user can contribute to the dataset. @@ -622,20 +655,6 @@ class RewriterVisitor : public ResolvedASTDeepCopyVisitor { absl::Status VisitResolvedDifferentialPrivacyAggregateScanTemplate( const NodeType* node); - absl::Status VisitResolvedQueryStmt(const ResolvedQueryStmt* node) override { - ZETASQL_RETURN_IF_ERROR(CopyVisitResolvedQueryStmt(node)); - if (public_groups_state_ == nullptr) { - return absl::OkStatus(); - } - ResolvedQueryStmt* stmt = GetUnownedTopOfStack(); - ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr new_query, - public_groups_state_->WrapScanWithPublicGroupsWithEntries( - allocator_, stmt->release_query())); - stmt->set_query(std::move(new_query)); - public_groups_state_.reset(); - return absl::OkStatus(); - } - absl::Status VisitResolvedWithScan(const ResolvedWithScan* node) override; absl::Status VisitResolvedProjectScan( const ResolvedProjectScan* node) override; @@ -649,9 +668,11 @@ class RewriterVisitor : public ResolvedASTDeepCopyVisitor { Catalog* catalog_; // unowned AnalyzerOptions* analyzer_options_; // unowned - // The following field will only be populated iff group selection strategy is - // public groups. 
- std::unique_ptr public_groups_state_; + // Public groups state that is kept over multiple anonymization / differential + // privacy aggregate scans. + std::unique_ptr + global_public_groups_query_name_provider_ = + std::make_unique(); std::vector> with_entries_; }; @@ -1366,7 +1387,7 @@ class OuterAggregateListRewriterVisitor : public ResolvedASTDeepCopyVisitor { // Rewrite the outer aggregate list, changing each ANON_* function to refer to // the intermediate column with pre-aggregated values that was produced by the // per-user aggregate scan. - absl::StatusOr>> + absl::StatusOr>> RewriteAggregateColumns(const ResolvedAggregateScanBase* node) { return ProcessNodeList(node->aggregate_list()); } @@ -1515,7 +1536,7 @@ struct UidColumnState { } bool SetColumn(const zetasql::ResolvedColumn& col, - const std::string& new_alias) { + absl::string_view new_alias) { SetColumn(col); alias = new_alias; return true; @@ -2255,8 +2276,12 @@ class PerUserRewriterVisitor : public ResolvedASTDeepCopyVisitor { left_visitor.current_uid_, right_visitor.current_uid_, with_entries_, copy)); if (public_groups_uid_column.has_value()) { - current_uid_ = public_groups_uid_column.value(); - current_uid_.ProjectIfMissing(*copy); + // The public groups uid column might not be initialized in cases where + // the user provided the uid column via an option. + if (public_groups_uid_column->column.IsInitialized()) { + current_uid_ = public_groups_uid_column.value(); + current_uid_.ProjectIfMissing(*copy); + } found_public_groups_join_ = true; return absl::OkStatus(); } @@ -2430,24 +2455,8 @@ class PerUserRewriterVisitor : public ResolvedASTDeepCopyVisitor { // Table doesn't contain any private data, so do nothing. return absl::OkStatus(); } - ResolvedAggregateScan* copy = GetUnownedTopOfStack(); - // Track the column refs from this group column in the public group state to - // eventually use this information for rewriting the join expr. 
- if (public_groups_state_ != nullptr) { - for (const std::unique_ptr& group : - copy->group_by_list()) { - std::vector> column_refs; - ZETASQL_RETURN_IF_ERROR(CollectColumnRefs(*group->expr(), &column_refs)); - for (const std::unique_ptr& column_ref : - column_refs) { - public_groups_state_->TrackColumnReplacement(column_ref->column(), - group->column()); - } - } - } - // If the source table is a value table the uid column refs will be // GetProtoField or GetStructField expressions, replace them with ColumnRef // expressions. @@ -2780,10 +2789,11 @@ absl::StatusOr RewriterVisitor::RewritePerUserTransform( const ResolvedAggregateScanBase* node, SelectWithModeName select_with_mode_name, - std::optional options_uid_column) { + std::optional options_uid_column, + PublicGroupsState* public_groups_state) { PerUserRewriterVisitor per_user_visitor(allocator_, type_factory_, resolver_, with_entries_, select_with_mode_name, - public_groups_state_.get()); + public_groups_state); ZETASQL_RETURN_IF_ERROR(node->input_scan()->Accept(&per_user_visitor)); ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr rewritten_scan, per_user_visitor.ConsumeRootNode()); @@ -2933,7 +2943,7 @@ template absl::StatusOr> RewriterVisitor::CreateCountDistinctPrivacyIdsColumn( const NodeType* original_input_scan, - std::vector>& aggregate_list) { + std::vector>& aggregate_list) { ZETASQL_ASSIGN_OR_RETURN( std::unique_ptr group_selection_threshold_col, MakeGroupSelectionThresholdFunctionColumn(original_input_scan)); @@ -2948,7 +2958,8 @@ RewriterVisitor::CreateAggregateScan( const ResolvedAnonymizedAggregateScan* node, std::unique_ptr input_scan, std::vector> outer_group_by_list, - std::vector> outer_aggregate_list, + std::vector> + outer_aggregate_list, std::unique_ptr group_selection_threshold_expr, std::vector> resolved_options) { auto result = MakeResolvedAnonymizedAggregateScan( @@ -2965,7 +2976,8 @@ RewriterVisitor::CreateAggregateScan( const ResolvedDifferentialPrivacyAggregateScan* node, std::unique_ptr 
input_scan, std::vector> outer_group_by_list, - std::vector> outer_aggregate_list, + std::vector> + outer_aggregate_list, std::unique_ptr group_selection_threshold_expr, std::vector> resolved_options) { auto result = MakeResolvedDifferentialPrivacyAggregateScan( @@ -2984,7 +2996,7 @@ RewriterVisitor::CreateAggregateScan( // In some places, the privacy libraries only support int32_t values (e.g. // max_groups_contributed), but those options are declared as int64_t values in // SQL. -absl::StatusOr ValidatePositiveInt32Option( +absl::StatusOr ParseNullOrPositiveInt32Option( const ResolvedOption& option, absl::string_view dp_option_error_prefix) { zetasql_base::StatusBuilder invalid_value_message = MakeSqlErrorAtNode(option) @@ -3024,11 +3036,19 @@ ValidateAndParseGroupSelectionStrategyEnum( << error_prefix << " is invalid: " << enum_value; const auto strategy = static_cast(enum_value); - if (strategy == DifferentialPrivacyEnums::PUBLIC_GROUPS && - !language_options.LanguageFeatureEnabled( - FEATURE_DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS)) { - return MakeSqlErrorAtNode(option) - << error_prefix << " PUBLIC_GROUPS has not been enabled"; + if (strategy == DifferentialPrivacyEnums::PUBLIC_GROUPS) { + if (!language_options.LanguageFeatureEnabled( + FEATURE_DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS)) { + return MakeSqlErrorAtNode(option) + << error_prefix << " PUBLIC_GROUPS has not been enabled"; + } + if (!language_options.LanguageFeatureEnabled( + FEATURE_V_1_1_WITH_ON_SUBQUERY)) { + return MakeSqlErrorAtNode(option) + << error_prefix + << " PUBLIC_GROUPS is not supported without support for WITH " + "subqueries"; + } } return strategy; } @@ -3036,7 +3056,8 @@ ValidateAndParseGroupSelectionStrategyEnum( absl::StatusOr> RewriterVisitor::AddCrossPartitionSampleScan( std::unique_ptr input_scan, - std::optional max_groups_contributed, ResolvedColumn uid_column) { + std::optional max_groups_contributed, ResolvedColumn uid_column, + PublicGroupsState* public_groups_state) { if 
(max_groups_contributed.has_value()) { std::vector> partition_by_list; partition_by_list.push_back(MakeColRef(uid_column)); @@ -3047,9 +3068,9 @@ RewriterVisitor::AddCrossPartitionSampleScan( MakeResolvedLiteral(Value::Int64(*max_groups_contributed)), ResolvedSampleScan::ROWS, /*repeatable_argument=*/nullptr, /*weight_column=*/nullptr, std::move(partition_by_list)); - if (public_groups_state_ != nullptr) { + if (public_groups_state != nullptr) { ZETASQL_ASSIGN_OR_RETURN(input_scan, - public_groups_state_->CreateJoinScanAfterSampleScan( + public_groups_state->CreateJoinScanAfterSampleScan( std::move(input_scan), *allocator_)); } } @@ -3338,13 +3359,13 @@ struct PrivacyOptionSpec { // raw options of the differential private or anonymized aggregate scan. template static absl::StatusOr FromScanOptions( - const std::vector>& scan_options, + absl::Span> scan_options, const LanguageOptions& language_options); }; template absl::StatusOr PrivacyOptionSpec::FromScanOptions( - const std::vector>& scan_options, + absl::Span> scan_options, const LanguageOptions& language_options) { std::optional max_groups_contributed; std::optional max_rows_contributed; @@ -3360,15 +3381,15 @@ absl::StatusOr PrivacyOptionSpec::FromScanOptions( << " can only be set once"; ZETASQL_ASSIGN_OR_RETURN( max_groups_contributed, - ValidatePositiveInt32Option( + ParseNullOrPositiveInt32Option( *option, DPNodeSpecificData::kMaxGroupsContributedErrorPrefix)); } else if (zetasql_base::CaseEqual(option->name(), "max_rows_contributed")) { ZETASQL_RET_CHECK(!max_rows_contributed.has_value()) << "max_rows_contributed can only be set once"; - ZETASQL_ASSIGN_OR_RETURN( - max_rows_contributed, - ValidatePositiveInt32Option(*option, "Option MAX_ROWS_CONTRIBUTED")); + ZETASQL_ASSIGN_OR_RETURN(max_rows_contributed, + ParseNullOrPositiveInt32Option( + *option, "Option MAX_ROWS_CONTRIBUTED")); } else if (zetasql_base::CaseEqual(option->name(), "min_privacy_units_per_group")) { // The following two conditions were 
already checked in the resolver. @@ -3379,7 +3400,7 @@ absl::StatusOr PrivacyOptionSpec::FromScanOptions( ZETASQL_RET_CHECK(!min_privacy_units_per_group.has_value()) << "min_privacy_units_per_group can only be set once"; ZETASQL_ASSIGN_OR_RETURN(min_privacy_units_per_group, - ValidatePositiveInt32Option( + ParseNullOrPositiveInt32Option( *option, "Option MIN_PRIVACY_UNITS_PER_GROUP")); } else if (zetasql_base::CaseEqual(option->name(), "group_selection_strategy")) { @@ -3412,8 +3433,10 @@ absl::StatusOr PrivacyOptionSpec::FromScanOptions( // should_disable_bounding is true iff one of the contribution bounds is // explicitly set to NULL. bool should_disable_bounding = - (max_groups_contributed ? max_groups_contributed->is_null() : false) || - (max_rows_contributed ? max_rows_contributed->is_null() : false); + (max_groups_contributed.has_value() ? max_groups_contributed->is_null() + : false) || + (max_rows_contributed.has_value() ? max_rows_contributed->is_null() + : false); if (should_disable_bounding) { privacy_option_spec.strategy = BOUNDING_NONE; @@ -3714,35 +3737,32 @@ ExtractPublicGroupKeyColumnsFromJoinExpr(const ResolvedExpr* node) { return std::vector>(); } -// Returns the resulting uid column state. Assumes that the public_groups_join -// is indeed a public groups join. -// -// When it returns a value, it also ensures that: -// * the public groups side uid column is *not* initialized, and -// * the user data side uid column *is* initialized. -std::optional GetUidColumnFromPublicGroupJoin( +struct PublicGroupsUserDataAndPublicDataUidColumn { + UidColumnState user_data_uid_column_state; + UidColumnState public_data_uid_column_state; +}; + +// Returns the uid column state of the user data and the public data side of the +// join. Assumes that the join is actually a public groups join. Might return a +// nullopt in the case where the provided join is not a public groups join. 
+std::optional +GetPublicGroupsUserDataAndPublicDataUidColumn( const ResolvedJoinScan* public_groups_join, const UidColumnState& left_uid, const UidColumnState& right_uid) { - const UidColumnState* user_data_uid_column_state; - const UidColumnState* public_groups_uid_column_state; switch (public_groups_join->join_type()) { - case ResolvedJoinScanEnums::RIGHT: - user_data_uid_column_state = &left_uid; - public_groups_uid_column_state = &right_uid; - break; case ResolvedJoinScanEnums::LEFT: - user_data_uid_column_state = &right_uid; - public_groups_uid_column_state = &left_uid; - break; + return PublicGroupsUserDataAndPublicDataUidColumn{ + .user_data_uid_column_state = right_uid, + .public_data_uid_column_state = left_uid, + }; + case ResolvedJoinScanEnums::RIGHT: + return PublicGroupsUserDataAndPublicDataUidColumn{ + .user_data_uid_column_state = left_uid, + .public_data_uid_column_state = right_uid, + }; default: - // Other join types are not public group scans. return std::nullopt; } - if (!user_data_uid_column_state->column.IsInitialized() || - public_groups_uid_column_state->column.IsInitialized()) { - return std::nullopt; - } - return *user_data_uid_column_state; } absl::StatusOr> @@ -3751,22 +3771,27 @@ PublicGroupsState::MaybeRewritePublicGroupsJoinAndReturnUid( const UidColumnState& right_scan_uid_state, const std::vector>& with_entries, ResolvedJoinScan* join) { - const std::optional uid_column = - GetUidColumnFromPublicGroupJoin(join, left_scan_uid_state, - right_scan_uid_state); - if (!uid_column.has_value()) { + std::optional uid_columns = + GetPublicGroupsUserDataAndPublicDataUidColumn(join, left_scan_uid_state, + right_scan_uid_state); + if (!uid_columns.has_value()) { // Not a public groups join. return std::nullopt; } + if (uid_columns->public_data_uid_column_state.column.IsInitialized()) { + // Public data cannot have a uid set. 
+ return std::nullopt; + } + const ResolvedScan* public_groups_scan = GetPublicGroupsScanOrNull(join, with_entries); - if (!public_groups_scan) { + if (public_groups_scan == nullptr) { // Not a public groups join. return std::nullopt; } MarkPublicGroupColumnsAsVisited(public_groups_scan->column_list()); if (!bound_contributions_across_groups_) { - return uid_column; + return uid_columns->user_data_uid_column_state; } // In case we bound the contribution across groups, we need to introduce a // second join later after the sample scan. @@ -3804,7 +3829,7 @@ PublicGroupsState::MaybeRewritePublicGroupsJoinAndReturnUid( join->set_left_scan(std::move(ref_scan)); } join->set_join_type(ResolvedJoinScan::INNER); - return uid_column; + return uid_columns->user_data_uid_column_state; } bool IsColumnListEqualToGroupByList(const ResolvedAggregateScanBase* node) { @@ -3864,7 +3889,7 @@ bool IsSelectDistinctSubquery(const ResolvedScan* node) { const ResolvedScan* TryResolveCTESubqueryOrReturnSame( const ResolvedScan* node, - const std::vector>& with_entries) { + absl::Span> with_entries) { if (!node->Is()) { return node; } @@ -3960,9 +3985,8 @@ absl::StatusOr PublicGroupsState::RecordPublicGroupsWithScan( public_groups_scan->GetAs()->with_query_name(); with_scan_already_present = true; } else { - with_query_name = - absl::StrCat("$public_groups", next_with_query_name_number_); - next_with_query_name_number_++; + with_query_name = global_public_groups_query_name_provider_ + ->GetNextPublicGroupsWithQueryName(); with_scan_already_present = false; } @@ -4149,9 +4173,11 @@ absl::Status ExtractPublicGroupColumns( absl::StatusOr> PublicGroupsState::CreateFromGroupByList( - const std::vector>& + absl::Span> group_by_list, - bool bound_contributions_across_groups, absl::string_view error_prefix) { + bool bound_contributions_across_groups, absl::string_view error_prefix, + GlobalPublicGroupsQueryNameProvider* + global_public_groups_query_name_provider) { absl::flat_hash_set 
public_group_columns; for (const std::unique_ptr& group : group_by_list) { @@ -4164,9 +4190,9 @@ PublicGroupsState::CreateFromGroupByList( public_group_column_ids.insert(column.column_id()); column_id_map.try_emplace(column.column_id(), column); } - return absl::WrapUnique( - new PublicGroupsState(column_id_map, public_group_column_ids, - bound_contributions_across_groups)); + return absl::WrapUnique(new PublicGroupsState( + column_id_map, public_group_column_ids, bound_contributions_across_groups, + global_public_groups_query_name_provider)); } template @@ -4174,7 +4200,7 @@ absl::StatusOr> RewriterVisitor::BoundRowsContributedToInputScan( const NodeType* original_input_scan, RewritePerUserTransformResult rewritten_per_user_transform, - int64_t max_rows_contributed) { + int64_t max_rows_contributed, bool filter_values_with_null_uid) { ZETASQL_ASSIGN_OR_RETURN(std::vector> options_list, ResolvedASTDeepCopyVisitor::CopyNodeList( GetOptions(original_input_scan))); @@ -4190,11 +4216,10 @@ RewriterVisitor::BoundRowsContributedToInputScan( GetResolvedColumn(rewritten_per_user_transform.inner_uid_column.get()) .value_or(ResolvedColumn()), std::make_unique( - resolver_, - /*filter_values_with_null_uid=*/public_groups_state_ != nullptr), - /*filter_values_with_null_uid=*/public_groups_state_ != nullptr); + resolver_, filter_values_with_null_uid), + filter_values_with_null_uid); ZETASQL_ASSIGN_OR_RETURN( - std::vector> aggregate_list, + std::vector> aggregate_list, aggregate_rewriter_visitor.RewriteAggregateColumns(original_input_scan)); ZETASQL_ASSIGN_OR_RETURN( @@ -4324,7 +4349,7 @@ class BoundGroupsAggregateFunctionCallResolver final }; void AppendResolvedComputedColumnsToList( - const std::vector>& + absl::Span> resolved_computed_columns, std::vector& out_column_list) { for (const std::unique_ptr& computed_column : @@ -4338,7 +4363,7 @@ absl::StatusOr> RewriterVisitor::IdentifyOrAddNoisyCountDistinctPrivacyIdsColumnToAggregateList( const NodeType* original_input_scan, 
const OuterAggregateListRewriterVisitor& outer_rewriter_visitor, - std::vector>& + std::vector>& outer_aggregate_list) { ZETASQL_ASSIGN_OR_RETURN( std::unique_ptr noisy_count_distinct_privacy_ids_column, @@ -4354,7 +4379,7 @@ RewriterVisitor::IdentifyOrAddNoisyCountDistinctPrivacyIdsColumnToAggregateList( absl::StatusOr> RewriterVisitor::AddExactCountDistinctPrivacyIdsColumnToAggregateList( - std::vector>& + std::vector>& outer_aggregate_list) { // Create aggregate function call SUM(1). Since the "outer aggregation" // aggregates over the per privacy unit aggregates, this counts the number @@ -4370,7 +4395,9 @@ RewriterVisitor::AddExactCountDistinctPrivacyIdsColumnToAggregateList( "$anon", "$exact_count_distinct_privacy_units_col", sum->type()); outer_aggregate_list.emplace_back(MakeResolvedComputedColumn( exact_count_distinct_privacy_units_col, std::move(sum))); - return MakeColRef(outer_aggregate_list.back()->column()); + return MakeColRef(outer_aggregate_list.back() + ->GetAs() + ->column()); } template @@ -4379,7 +4406,7 @@ RewriterVisitor::AddGroupSelectionThresholding( const NodeType* original_input_scan, const OuterAggregateListRewriterVisitor& outer_rewriter_visitor, std::optional min_privacy_units_per_group, - std::vector>& + std::vector>& outer_aggregate_list) { ZETASQL_ASSIGN_OR_RETURN( std::unique_ptr noisy_count_distinct_privacy_ids_expr, @@ -4439,8 +4466,7 @@ absl::StatusOr> GetMaxGroupsContributedOrDefault( max_groups_contributed = default_max_groups_contributed; std::unique_ptr max_groups_contributed_option = MakeResolvedOption( - /*qualifier=*/"", - std::string(default_max_groups_contributed_option_name), + /*qualifier=*/"", default_max_groups_contributed_option_name, MakeResolvedLiteral(Value::Int64(*max_groups_contributed))); resolved_anonymization_options.push_back( std::move(max_groups_contributed_option)); @@ -4455,7 +4481,8 @@ absl::StatusOr> RewriterVisitor::BoundGroupsContributedToInputScan( const NodeType* original_input_scan, 
RewritePerUserTransformResult rewritten_per_user_transform, - PrivacyOptionSpec privacy_option_spec) { + PrivacyOptionSpec privacy_option_spec, + PublicGroupsState* public_groups_state) { auto [rewritten_scan, inner_uid_column] = std::move(rewritten_per_user_transform); const ResolvedColumn inner_uid_resolved_column = @@ -4480,33 +4507,19 @@ RewriterVisitor::BoundGroupsContributedToInputScan( std::move(inner_group_by_list), std::move(inner_aggregate_list), /*grouping_set_list=*/{}, /*rollup_column_list=*/{}); - if (public_groups_state_ != nullptr) { - for (const std::unique_ptr& group : - original_input_scan->group_by_list()) { - // Track the column refs from this group column in the public group - // state to eventually use this information for rewriting the join expr. - std::vector> column_refs; - ZETASQL_RETURN_IF_ERROR(CollectColumnRefs(*group->expr(), &column_refs)); - for (const std::unique_ptr& column_ref : - column_refs) { - public_groups_state_->TrackColumnReplacement(column_ref->column(), - group->column()); - } - } - } - OuterAggregateListRewriterVisitor outer_rewriter_visitor( resolver_, inner_uid_resolved_column, std::make_unique( injected_col_map, resolver_, uid_column, - /*filter_values_with_null_uid=*/public_groups_state_ != nullptr), - /*filter_values_with_null_uid=*/public_groups_state_ != nullptr); + /*filter_values_with_null_uid=*/public_groups_state != nullptr), + /*filter_values_with_null_uid=*/public_groups_state != nullptr); ZETASQL_ASSIGN_OR_RETURN( - std::vector> outer_aggregate_list, + std::vector> + outer_aggregate_list, outer_rewriter_visitor.RewriteAggregateColumns(original_input_scan)); std::unique_ptr group_selection_threshold_expr; - if (public_groups_state_ == nullptr) { + if (public_groups_state == nullptr) { ZETASQL_ASSIGN_OR_RETURN(group_selection_threshold_expr, AddGroupSelectionThresholding( original_input_scan, outer_rewriter_visitor, @@ -4557,12 +4570,6 @@ RewriterVisitor::BoundGroupsContributedToInputScan( } if 
(privacy_option_spec.strategy == BOUNDING_MAX_GROUPS) { - if (public_groups_state_ != nullptr) { - for (const auto& [old_column, new_column] : injected_col_map) { - public_groups_state_->TrackColumnReplacement(old_column, new_column); - public_groups_state_->TrackColumnReplacement(new_column, old_column); - } - } ZETASQL_ASSIGN_OR_RETURN( std::optional max_groups_contributed, GetMaxGroupsContributedOrDefault( @@ -4571,10 +4578,11 @@ RewriterVisitor::BoundGroupsContributedToInputScan( NodeType>::kDefaultMaxGroupsContributedOptionName, resolver_->analyzer_options().default_anon_kappa_value(), resolved_anonymization_options)); - ZETASQL_ASSIGN_OR_RETURN(rewritten_scan, AddCrossPartitionSampleScan( - std::move(rewritten_scan), - max_groups_contributed, uid_column)); - if (public_groups_state_ != nullptr && max_groups_contributed.has_value()) { + ZETASQL_ASSIGN_OR_RETURN(rewritten_scan, + AddCrossPartitionSampleScan( + std::move(rewritten_scan), max_groups_contributed, + uid_column, public_groups_state)); + if (public_groups_state != nullptr && max_groups_contributed.has_value()) { // Replace columns in the group-by list. Since we modified the original // join to be an INNER JOIN and added an additional OUTER JOIN, we need to // make sure that the columns now matches with the OUTER JOIN. @@ -4584,7 +4592,7 @@ RewriterVisitor::BoundGroupsContributedToInputScan( ResolvedColumn column = group_by->expr()->GetAs()->column(); ResolvedColumn added_column = - public_groups_state_->FindPublicGroupColumnForAddedJoin(column) + public_groups_state->FindPublicGroupColumnForAddedJoin(column) .value_or(column); outer_group_by_list.emplace_back(MakeResolvedComputedColumn( group_by->column(), MakeColRef(added_column))); @@ -4599,6 +4607,25 @@ RewriterVisitor::BoundGroupsContributedToInputScan( std::move(resolved_anonymization_options)); } +// Combines information from the user-provided privacy options in the query and +// the default value as set via the analyzer. 
The default value in the analyzer +// is only used when the user did not explicitly specify the +// `max_groups_contributed` option. +bool BoundContributionsAcrossGroups(const PrivacyOptionSpec& privacy_options, + const AnalyzerOptions& analyzer_options) { + if (privacy_options.strategy != BOUNDING_MAX_GROUPS) { + return false; + } + if (privacy_options.max_groups_contributed.has_value()) { + return true; + } + if (analyzer_options.default_anon_kappa_value() > 0) { + return true; + } + // This might throw an error later in the code. + return false; +} + template absl::Status RewriterVisitor::VisitResolvedDifferentialPrivacyAggregateScanTemplate( @@ -4607,6 +4634,7 @@ RewriterVisitor::VisitResolvedDifferentialPrivacyAggregateScanTemplate( PrivacyOptionSpec::FromScanOptions( GetOptions(node), resolver_->language())); + std::unique_ptr public_groups_state; if (privacy_options_spec.group_selection_strategy == DifferentialPrivacyEnums::PUBLIC_GROUPS) { if (privacy_options_spec.min_privacy_units_per_group.has_value()) { @@ -4614,13 +4642,14 @@ RewriterVisitor::VisitResolvedDifferentialPrivacyAggregateScanTemplate( << "The MIN_PRIVACY_UNITS_PER_GROUP option must not be specified " "if GROUP_SELECTION_STRATEGY=PUBLIC_GROUPS"; } - const bool bound_contribution_across_groups = - privacy_options_spec.max_groups_contributed.has_value(); ZETASQL_ASSIGN_OR_RETURN( - public_groups_state_, + public_groups_state, PublicGroupsState::CreateFromGroupByList( - node->group_by_list(), bound_contribution_across_groups, - DPNodeSpecificData::kGroupSelectionErrorPrefix)); + node->group_by_list(), + BoundContributionsAcrossGroups(privacy_options_spec, + *analyzer_options_), + DPNodeSpecificData::kGroupSelectionErrorPrefix, + global_public_groups_query_name_provider_.get())); } ZETASQL_ASSIGN_OR_RETURN(std::optional options_uid_column, @@ -4629,9 +4658,9 @@ RewriterVisitor::VisitResolvedDifferentialPrivacyAggregateScanTemplate( ZETASQL_ASSIGN_OR_RETURN(RewritePerUserTransformResult 
rewrite_per_user_result, RewritePerUserTransform( node, DPNodeSpecificData::kSelectWithModeName, - options_uid_column)); + options_uid_column, public_groups_state.get())); - std::unique_ptr result; + std::unique_ptr result; if (privacy_options_spec.strategy == BOUNDING_MAX_ROWS) { ZETASQL_RET_CHECK(privacy_options_spec.max_rows_contributed.has_value()) << "There is no way to bound max rows without explicitly specifying " @@ -4645,23 +4674,38 @@ RewriterVisitor::VisitResolvedDifferentialPrivacyAggregateScanTemplate( "contribution bounding strategy BOUNDING_MAX_ROWS"); } - ZETASQL_ASSIGN_OR_RETURN(result, - BoundRowsContributedToInputScan( - node, std::move(rewrite_per_user_result), - privacy_options_spec.max_rows_contributed.value())); + ZETASQL_ASSIGN_OR_RETURN( + result, + BoundRowsContributedToInputScan( + node, std::move(rewrite_per_user_result), + privacy_options_spec.max_rows_contributed.value(), + /*filter_values_with_null_uid=*/public_groups_state != nullptr)); } else { // Note that `BoundGroupsContributedToInputScan` also handles the case where // no bounding is applied. - ZETASQL_ASSIGN_OR_RETURN(result, BoundGroupsContributedToInputScan( - node, std::move(rewrite_per_user_result), - privacy_options_spec)); + ZETASQL_ASSIGN_OR_RETURN(result, + BoundGroupsContributedToInputScan( + node, std::move(rewrite_per_user_result), + privacy_options_spec, public_groups_state.get())); } - if (public_groups_state_ != nullptr) { + if (public_groups_state != nullptr) { ZETASQL_RETURN_IF_ERROR(MaybeAttachParseLocation( - public_groups_state_->ValidateAllRequiredJoinsAdded( + public_groups_state->ValidateAllRequiredJoinsAdded( DPNodeSpecificData::kGroupSelectionErrorPrefix), *node)); + + ZETASQL_ASSIGN_OR_RETURN( + std::vector> with_entries, + public_groups_state->GetPublicGroupScansAsWithEntries(allocator_)); + if (!with_entries.empty()) { + // Wrap `result` in a `WithScan` with all required `WithEntry` objects for + // public groups. 
+ std::vector column_list = result->column_list(); + result = MakeResolvedWithScan(std::move(column_list), + std::move(with_entries), std::move(result), + /*recursive=*/false); + } } ZETASQL_RETURN_IF_ERROR(AttachExtraNodeFields(*node, *result)); @@ -4735,8 +4779,9 @@ absl::Status RewriterVisitor::VisitResolvedProjectScan( const ResolvedProjectScan* node) { return MaybeAttachParseLocation(CopyVisitResolvedProjectScan(node), *node); } +} // namespace -absl::StatusOr> RewriteInternal( +absl::StatusOr> RewriteHelper( const ResolvedNode& tree, AnalyzerOptions options, ColumnFactory& column_factory, Catalog& catalog, TypeFactory& type_factory) { @@ -4764,39 +4809,4 @@ absl::StatusOr> RewriteInternal( return node; } -} // namespace - -class AnonymizationRewriter : public Rewriter { - public: - absl::StatusOr> Rewrite( - const AnalyzerOptions& options, const ResolvedNode& input, - Catalog& catalog, TypeFactory& type_factory, - AnalyzerOutputProperties& output_properties) const override { - ZETASQL_RET_CHECK(options.AllArenasAreInitialized()); - ColumnFactory column_factory(0, options.id_string_pool().get(), - options.column_id_sequence_number()); - ZETASQL_ASSIGN_OR_RETURN( - std::unique_ptr node, - RewriteInternal(input, options, column_factory, catalog, type_factory)); - return node; - } - - std::string Name() const override { return "AnonymizationRewriter"; } -}; - -absl::StatusOr RewriteForAnonymization( - const ResolvedNode& query, Catalog* catalog, TypeFactory* type_factory, - const AnalyzerOptions& analyzer_options, ColumnFactory& column_factory) { - RewriteForAnonymizationOutput result; - ZETASQL_ASSIGN_OR_RETURN(result.node, - RewriteInternal(query, analyzer_options, column_factory, - *catalog, *type_factory)); - return result; -} - -const Rewriter* GetAnonymizationRewriter() { - static const Rewriter* kRewriter = new AnonymizationRewriter; - return kRewriter; -} - } // namespace zetasql diff --git a/zetasql/analyzer/rewriters/anonymization_helper.h 
b/zetasql/analyzer/rewriters/anonymization_helper.h new file mode 100644 index 000000000..aeb24273d --- /dev/null +++ b/zetasql/analyzer/rewriters/anonymization_helper.h @@ -0,0 +1,35 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#ifndef ZETASQL_ANALYZER_REWRITERS_ANONYMIZATION_HELPER_H_ +#define ZETASQL_ANALYZER_REWRITERS_ANONYMIZATION_HELPER_H_ + +#include "zetasql/public/analyzer_options.h" +#include "zetasql/public/catalog.h" +#include "zetasql/public/types/type_factory.h" +#include "zetasql/resolved_ast/resolved_node.h" +#include "zetasql/resolved_ast/rewrite_utils.h" +#include "absl/status/statusor.h" + +namespace zetasql { + +absl::StatusOr> RewriteHelper( + const ResolvedNode& tree, AnalyzerOptions options, + ColumnFactory& column_factory, Catalog& catalog, TypeFactory& type_factory); + +} // namespace zetasql + +#endif // ZETASQL_ANALYZER_REWRITERS_ANONYMIZATION_HELPER_H_ diff --git a/zetasql/analyzer/rewriters/anonymization_rewriter.cc b/zetasql/analyzer/rewriters/anonymization_rewriter.cc new file mode 100644 index 000000000..634b095ef --- /dev/null +++ b/zetasql/analyzer/rewriters/anonymization_rewriter.cc @@ -0,0 +1,67 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#include "zetasql/analyzer/rewriters/anonymization_rewriter.h" + +#include +#include + +#include "zetasql/analyzer/rewriters/anonymization_helper.h" +#include "zetasql/public/analyzer_options.h" +#include "zetasql/public/catalog.h" +#include "zetasql/public/rewriter_interface.h" +#include "zetasql/resolved_ast/resolved_node.h" +#include "zetasql/resolved_ast/rewrite_utils.h" +#include "absl/status/statusor.h" +#include "zetasql/base/ret_check.h" +#include "zetasql/base/status_macros.h" + +namespace zetasql { + +class AnonymizationRewriter : public Rewriter { + public: + absl::StatusOr> Rewrite( + const AnalyzerOptions& options, const ResolvedNode& input, + Catalog& catalog, TypeFactory& type_factory, + AnalyzerOutputProperties& output_properties) const override { + ZETASQL_RET_CHECK(options.AllArenasAreInitialized()); + ColumnFactory column_factory(0, options.id_string_pool().get(), + options.column_id_sequence_number()); + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr node, + RewriteHelper(input, options, column_factory, catalog, type_factory)); + return node; + } + + std::string Name() const override { return "AnonymizationRewriter"; } +}; + +absl::StatusOr RewriteForAnonymization( + const ResolvedNode& query, Catalog* catalog, TypeFactory* type_factory, + const AnalyzerOptions& analyzer_options, ColumnFactory& column_factory) { + RewriteForAnonymizationOutput result; + ZETASQL_ASSIGN_OR_RETURN(result.node, + RewriteHelper(query, analyzer_options, column_factory, + *catalog, *type_factory)); + return result; +} + +const Rewriter* 
GetAnonymizationRewriter() { + static const Rewriter* kRewriter = new AnonymizationRewriter; + return kRewriter; +} + +} // namespace zetasql diff --git a/zetasql/analyzer/anonymization_rewriter.h b/zetasql/analyzer/rewriters/anonymization_rewriter.h similarity index 87% rename from zetasql/analyzer/anonymization_rewriter.h rename to zetasql/analyzer/rewriters/anonymization_rewriter.h index 4a3c257d2..e34138dfb 100644 --- a/zetasql/analyzer/anonymization_rewriter.h +++ b/zetasql/analyzer/rewriters/anonymization_rewriter.h @@ -14,19 +14,12 @@ // limitations under the License. // -#ifndef ZETASQL_ANALYZER_ANONYMIZATION_REWRITER_H_ -#define ZETASQL_ANALYZER_ANONYMIZATION_REWRITER_H_ - -#include +#ifndef ZETASQL_ANALYZER_REWRITERS_ANONYMIZATION_REWRITER_H_ +#define ZETASQL_ANALYZER_REWRITERS_ANONYMIZATION_REWRITER_H_ #include "zetasql/public/analyzer_options.h" #include "zetasql/public/catalog.h" -#include "zetasql/public/rewriter_interface.h" -#include "zetasql/public/type.h" -#include "zetasql/resolved_ast/resolved_ast.h" -#include "zetasql/resolved_ast/resolved_node.h" #include "zetasql/resolved_ast/rewrite_utils.h" -#include "absl/base/attributes.h" #include "absl/container/flat_hash_map.h" #include "absl/status/statusor.h" @@ -61,7 +54,7 @@ struct RewriteForAnonymizationOutput { // through this rewriter to apply the differential privacy semantic rewrites // before executing. // -// Calls to RewriteForAnonymization are not idempotent, don't parse and rewrite +// Calls to RewriteForAnonymization are not idempotent. Don't parse and rewrite // a query, unparse with SqlBuilder, and parse and rewrite again. 
// TODO: add a state enum to ResolvedAnonymizedAggregateScan to // track rewrite status @@ -80,4 +73,4 @@ const Rewriter* GetAnonymizationRewriter(); } // namespace zetasql -#endif // ZETASQL_ANALYZER_ANONYMIZATION_REWRITER_H_ +#endif // ZETASQL_ANALYZER_REWRITERS_ANONYMIZATION_REWRITER_H_ diff --git a/zetasql/analyzer/rewriters/array_functions_rewriter.cc b/zetasql/analyzer/rewriters/array_functions_rewriter.cc index fdc7ad116..d64009457 100644 --- a/zetasql/analyzer/rewriters/array_functions_rewriter.cc +++ b/zetasql/analyzer/rewriters/array_functions_rewriter.cc @@ -32,11 +32,11 @@ #include "zetasql/resolved_ast/resolved_ast_deep_copy_visitor.h" #include "zetasql/resolved_ast/resolved_node.h" #include "zetasql/resolved_ast/rewrite_utils.h" +#include "absl/container/flat_hash_map.h" #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/str_cat.h" #include "absl/strings/string_view.h" -#include "absl/types/span.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status_macros.h" @@ -65,7 +65,7 @@ class ArrayFunctionRewriteVisitor : public ResolvedASTDeepCopyVisitor { // Rewrites array function calls after choosing the appropriate template absl::Status VisitResolvedFunctionCall( const ResolvedFunctionCall* node) override { - // If not empty, the template that has null hanlding and ordering. + // If not empty, the template that has null handling and ordering. 
absl::string_view rewrite_template; switch (node->signature().context_id()) { diff --git a/zetasql/analyzer/rewriters/builtin_function_inliner.cc b/zetasql/analyzer/rewriters/builtin_function_inliner.cc index 8b719a2e6..b50a318d3 100644 --- a/zetasql/analyzer/rewriters/builtin_function_inliner.cc +++ b/zetasql/analyzer/rewriters/builtin_function_inliner.cc @@ -24,6 +24,7 @@ #include "zetasql/public/analyzer_output_properties.h" #include "zetasql/public/catalog.h" #include "zetasql/public/function.h" +#include "zetasql/public/function_signature.h" #include "zetasql/public/options.pb.h" #include "zetasql/public/rewriter_interface.h" #include "zetasql/public/types/type_factory.h" @@ -31,6 +32,7 @@ #include "zetasql/resolved_ast/resolved_ast_deep_copy_visitor.h" #include "zetasql/resolved_ast/resolved_node.h" #include "zetasql/resolved_ast/rewrite_utils.h" +#include "absl/container/flat_hash_map.h" #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/string_view.h" diff --git a/zetasql/analyzer/rewriters/flatten_rewriter.cc b/zetasql/analyzer/rewriters/flatten_rewriter.cc index 9004611c7..b91dc1148 100644 --- a/zetasql/analyzer/rewriters/flatten_rewriter.cc +++ b/zetasql/analyzer/rewriters/flatten_rewriter.cc @@ -20,7 +20,6 @@ #include #include "zetasql/public/analyzer_options.h" -#include "zetasql/public/analyzer_output.h" #include "zetasql/public/analyzer_output_properties.h" #include "zetasql/public/catalog.h" #include "zetasql/public/options.pb.h" @@ -37,7 +36,6 @@ #include "zetasql/resolved_ast/rewrite_utils.h" #include "absl/status/status.h" #include "absl/status/statusor.h" -#include "absl/types/span.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status_macros.h" @@ -81,7 +79,7 @@ class FlattenRewriterVisitor : public ResolvedASTDeepCopyVisitor { // The result is the last column in the output scan's column list. 
  absl::StatusOr<std::unique_ptr<ResolvedScan>> FlattenToScan(
      std::unique_ptr<const ResolvedExpr> flatten_expr,
-      const std::vector<std::unique_ptr<const ResolvedExpr>>& get_field_list,
+      absl::Span<const std::unique_ptr<const ResolvedExpr>> get_field_list,
      std::unique_ptr<ResolvedScan> input_scan, bool order_results,
      bool in_subquery);
 
@@ -91,10 +89,14 @@
 absl::Status FlattenRewriterVisitor::VisitResolvedArrayScan(
     const ResolvedArrayScan* node) {
-  if (!node->array_expr()->Is<ResolvedFlatten>()) {
+  // Multiway UNNEST with more than one array, if containing any FLATTEN
+  // expression, will be handled by `VisitResolvedFlatten` instead.
+  if (node->array_expr_list_size() > 1 ||
+      !node->array_expr_list(0)->Is<ResolvedFlatten>()) {
     return CopyVisitResolvedArrayScan(node);
   }
-  const ResolvedFlatten* flatten = node->array_expr()->GetAs<ResolvedFlatten>();
+  const ResolvedFlatten* flatten =
+      node->array_expr_list(0)->GetAs<ResolvedFlatten>();
   ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<ResolvedScan> input_scan,
                    ProcessNode(node->input_scan()));
   ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<ResolvedExpr> join_expr,
@@ -139,10 +141,15 @@ absl::Status FlattenRewriterVisitor::VisitResolvedArrayScan(
       /*in_expr=*/nullptr, std::move(scan));
   ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<ResolvedColumnHolder> offset_column,
                    ProcessNode(node->array_offset_column()));
+  std::vector<std::unique_ptr<const ResolvedExpr>> array_expr_list;
+  array_expr_list.push_back(std::move(subquery));
+  std::vector<ResolvedColumn> element_column_list;
+  element_column_list.push_back(node->element_column());
   PushNodeToStack(MakeResolvedArrayScan(
-      node->column_list(), std::move(input_scan), std::move(subquery),
-      node->element_column(), std::move(offset_column), std::move(join_expr),
-      node->is_outer()));
+      node->column_list(), std::move(input_scan), std::move(array_expr_list),
+      std::move(element_column_list), std::move(offset_column),
+      std::move(join_expr), node->is_outer(),
+      /*array_zip_mode=*/nullptr));
   return absl::OkStatus();
 }
 
@@ -228,7 +235,7 @@ absl::Status FlattenRewriterVisitor::VisitResolvedFlatten(
 absl::StatusOr<std::unique_ptr<ResolvedScan>>
 FlattenRewriterVisitor::FlattenToScan(
     std::unique_ptr<const ResolvedExpr> flatten_expr,
-    const std::vector<std::unique_ptr<const ResolvedExpr>>& get_field_list,
+    absl::Span<const std::unique_ptr<const ResolvedExpr>> get_field_list,
     std::unique_ptr<ResolvedScan> input_scan, bool order_results,
     bool in_subquery) {
   std::vector<ResolvedColumn> column_list;
@@ -246,10 +253,17 @@ FlattenRewriterVisitor::FlattenToScan(
     column_list.push_back(offset_column);
   }
 
+  std::vector<std::unique_ptr<const ResolvedExpr>> array_expr_list_233;
+  array_expr_list_233.push_back(std::move(flatten_expr));
+  std::vector<ResolvedColumn> element_column_list_233;
+  element_column_list_233.push_back(column);
   std::unique_ptr<ResolvedScan> scan = MakeResolvedArrayScan(
-      column_list, std::move(input_scan), std::move(flatten_expr), column,
+      column_list, std::move(input_scan), std::move(array_expr_list_233),
+      std::move(element_column_list_233),
       order_results ? MakeResolvedColumnHolder(offset_column) : nullptr,
-      /*join_expr=*/nullptr, /*is_outer=*/false);
+      /*join_expr=*/nullptr,
+      /*is_outer=*/false,
+      /*array_zip_mode=*/nullptr);
 
   // Keep track of pending Get*Field on non-array fields.
   std::unique_ptr<ResolvedExpr> input;
@@ -299,11 +313,17 @@ FlattenRewriterVisitor::FlattenToScan(
       offset_columns.push_back(offset_column);
       column_list.push_back(offset_column);
     }
+    std::vector<std::unique_ptr<const ResolvedExpr>> array_expr_list;
+    array_expr_list.push_back(std::move(get_field));
+    std::vector<ResolvedColumn> element_column_list;
+    element_column_list.push_back(column);
     scan = MakeResolvedArrayScan(
-        column_list, std::move(scan), std::move(get_field), column,
+        column_list, std::move(scan), std::move(array_expr_list),
+        std::move(element_column_list),
        order_results ? MakeResolvedColumnHolder(offset_column) : nullptr,
         /*join_expr=*/nullptr,
-        /*is_outer=*/false);
+        /*is_outer=*/false,
+        /*array_zip_mode=*/nullptr);
   }
 }
diff --git a/zetasql/analyzer/rewriters/grouping_set_rewriter.cc b/zetasql/analyzer/rewriters/grouping_set_rewriter.cc
index d1a69d7a5..1f1704ca2 100644
--- a/zetasql/analyzer/rewriters/grouping_set_rewriter.cc
+++ b/zetasql/analyzer/rewriters/grouping_set_rewriter.cc
@@ -36,6 +36,7 @@
 #include "zetasql/resolved_ast/resolved_ast_rewrite_visitor.h"
 #include "zetasql/resolved_ast/resolved_column.h"
 #include "zetasql/resolved_ast/resolved_node.h"
+#include "absl/algorithm/container.h"
 #include "absl/container/flat_hash_set.h"
 #include "zetasql/base/check.h"
 #include "absl/status/status.h"
@@ -56,45 +57,49 @@ namespace {
 // OOO issues.
 absl::StatusOr<bool> ShouldRewrite(const ResolvedAggregateScanBase* node,
                                    const GroupingSetRewriteOptions& options) {
+  // If the aggregate scan has an empty grouping_set_list, it's a regular
+  // GROUP BY query and won't be rewritten; skip all following checks.
+  if (node->grouping_set_list().empty()) {
+    return false;
+  }
+
+  // This is a grouping sets/rollup/cube query.
   if (node->grouping_set_list_size() > options.max_grouping_sets()) {
     return absl::InvalidArgumentError(absl::StrFormat(
         "At most %d grouping sets are allowed, but %d were provided",
         options.max_grouping_sets(), node->grouping_set_list_size()));
   }
+  if (node->group_by_list_size() > options.max_columns_in_grouping_set()) {
+    return absl::InvalidArgumentError(absl::StrFormat(
+        "At most %d distinct columns are allowed in grouping "
+        "sets, but %d were provided",
+        options.max_columns_in_grouping_set(), node->group_by_list_size()));
+  }
 
   bool should_rewrite = false;
   int64_t grouping_set_count = 0;
-  absl::flat_hash_set<ResolvedColumn> distinct_grouping_set_columns;
   for (const std::unique_ptr<const ResolvedGroupingSetBase>&
           grouping_set_base : node->grouping_set_list()) {
     ZETASQL_RET_CHECK(grouping_set_base->Is<ResolvedGroupingSet>() ||
              grouping_set_base->Is<ResolvedRollup>() ||
              grouping_set_base->Is<ResolvedCube>());
     if (grouping_set_base->Is<ResolvedGroupingSet>()) {
-      const ResolvedGroupingSet* grouping_set =
-          grouping_set_base->GetAs<ResolvedGroupingSet>();
-      for (const auto& column_ref : grouping_set->group_by_column_list()) {
-        zetasql_base::InsertIfNotPresent(&distinct_grouping_set_columns,
-                                         column_ref->column());
-      }
       grouping_set_count++;
     } else if (grouping_set_base->Is<ResolvedRollup>()) {
       const ResolvedRollup* rollup = grouping_set_base->GetAs<ResolvedRollup>();
-      for (const auto& multi_column : rollup->rollup_column_list()) {
-        for (const auto& column_ref : multi_column->column_list()) {
-          distinct_grouping_set_columns.insert(column_ref->column());
-        }
-      }
       grouping_set_count += rollup->rollup_column_list_size() + 1;
       should_rewrite = true;
     } else {
       const ResolvedCube* cube = grouping_set_base->GetAs<ResolvedCube>();
-      for (const auto& multi_column : cube->cube_column_list()) {
-        for (const auto& column_ref : multi_column->column_list()) {
-          distinct_grouping_set_columns.insert(column_ref->column());
-        }
+      // This is a hard limit on the number of columns in CUBE, to avoid
+      // overflow in the following computation. The same check will be applied
+      // in the CUBE expansion method too.
+      int cube_size = cube->cube_column_list_size();
+      if (cube_size > 31) {
+        return absl::InvalidArgumentError(
+            "Cube can not have more than 31 elements");
       }
-      grouping_set_count += 1ull << cube->cube_column_list_size();
+      grouping_set_count += 1ull << cube_size;
       should_rewrite = true;
     }
     if (grouping_set_count > options.max_grouping_sets()) {
@@ -102,14 +107,6 @@ absl::StatusOr<bool> ShouldRewrite(const ResolvedAggregateScanBase* node,
           "At most %d grouping sets are allowed, but %d were provided",
           options.max_grouping_sets(), grouping_set_count));
     }
-    if (distinct_grouping_set_columns.size() >
-        options.max_columns_in_grouping_set()) {
-      return absl::InvalidArgumentError(
-          absl::StrFormat("At most %d distinct columns are allowed in grouping "
-                          "sets, but %d were provided",
-                          options.max_columns_in_grouping_set(),
-                          distinct_grouping_set_columns.size()));
-    }
   }
   return should_rewrite;
 }
diff --git a/zetasql/analyzer/rewriters/insert_dml_values_rewriter.cc b/zetasql/analyzer/rewriters/insert_dml_values_rewriter.cc
new file mode 100644
index 000000000..54d70cd66
--- /dev/null
+++ b/zetasql/analyzer/rewriters/insert_dml_values_rewriter.cc
@@ -0,0 +1,191 @@
+//
+// Copyright 2019 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+
+#include "zetasql/analyzer/rewriters/insert_dml_values_rewriter.h"
+
+#include <memory>
+#include <string>
+#include <utility>
+#include <vector>
+
+#include "zetasql/public/analyzer_options.h"
+#include "zetasql/public/analyzer_output_properties.h"
+#include "zetasql/public/catalog.h"
+#include "zetasql/public/options.pb.h"
+#include "zetasql/public/rewriter_interface.h"
+#include "zetasql/public/sql_view.h"
+#include "zetasql/public/types/type_factory.h"
+#include "zetasql/resolved_ast/resolved_ast.h"
+#include "zetasql/resolved_ast/resolved_ast_builder.h"
+#include "zetasql/resolved_ast/resolved_ast_deep_copy_visitor.h"
+#include "zetasql/resolved_ast/resolved_ast_rewrite_visitor.h"
+#include "zetasql/resolved_ast/resolved_column.h"
+#include "zetasql/resolved_ast/resolved_node.h"
+#include "zetasql/resolved_ast/resolved_node_kind.pb.h"
+#include "zetasql/resolved_ast/rewrite_utils.h"
+#include "absl/status/statusor.h"
+#include "absl/types/optional.h"
+#include "zetasql/base/no_destructor.h"
+#include "zetasql/base/ret_check.h"
+#include "zetasql/base/status_builder.h"
+#include "zetasql/base/status_macros.h"
+
+namespace zetasql {
+
+class InsertDmlValuesRewriteVisitor : public ResolvedASTRewriteVisitor {
+ public:
+  explicit InsertDmlValuesRewriteVisitor(ColumnFactory& column_factory)
+      : column_factory_(column_factory) {}
+
+ private:
+  // This API takes a ResolvedInsertRow of a ResolvedInsertStatement and
+  // returns the equivalent query represented by a ResolvedProjectScan.
+  // For example, if the insert statement is "INSERT INTO T VALUES (1)", the
+  // ResolvedInsertRow will have literals representing the value 1. This API
+  // will return the equivalent query for this row, which is "SELECT 1".
+  absl::StatusOr<std::unique_ptr<const ResolvedScan>>
+  GetRowAsProjectScan(const ResolvedInsertRow* row,
+                      const ResolvedInsertStmt* node) {
+    std::vector<ResolvedColumn> columns;
+    std::vector<std::unique_ptr<const ResolvedComputedColumn>> expr_list;
+    std::vector<std::unique_ptr<const ResolvedDMLValue>> values =
+        const_cast<ResolvedInsertRow*>(row)->release_value_list();
+    for (int i = 0; i < values.size(); ++i) {
+      std::unique_ptr<const ResolvedExpr> expr =
+          const_cast<ResolvedDMLValue*>(values[i].get())->release_value();
+      if (expr->node_kind() == RESOLVED_DMLDEFAULT) {
+        // If a column is being inserted with the explicit DEFAULT keyword,
+        // used to specify the default value of the column, inline the default
+        // expression of the column during the rewrite.
+        const Table* table = node->table_scan()->table();
+        const Column* column =
+            table->FindColumnByName(node->insert_column_list()[i].name());
+        ZETASQL_RET_CHECK(column != nullptr);
+        ZETASQL_RET_CHECK(column->HasDefaultExpression())
+            << "No default expression exists for the column specified with "
+               "DEFAULT keyword in insert dml";
+        // For default expressions we are inlining the default expression
+        // owned by the catalog in the insert statement. Hence, creating a copy
+        // of the catalog owned expression before inlining.
+        ZETASQL_ASSIGN_OR_RETURN(expr,
+                         ResolvedASTDeepCopyVisitor::Copy(
+                             column->GetExpression()->GetResolvedExpression()));
+      }
+      ResolvedColumn select_column = column_factory_.MakeCol(
+          node->table_scan()->table()->Name(), "$col", expr->annotated_type());
+      columns.push_back(select_column);
+      ZETASQL_ASSIGN_OR_RETURN(auto computed_column, ResolvedComputedColumnBuilder()
+                                                 .set_column(select_column)
+                                                 .set_expr(std::move(expr))
+                                                 .Build());
+      expr_list.push_back(std::move(computed_column));
+    }
+    return ResolvedProjectScanBuilder()
+        .set_column_list(columns)
+        .set_expr_list(std::move(expr_list))
+        .set_input_scan(ResolvedSingleRowScanBuilder())
+        .Build();
+  }
+
+  // This API takes in the resolved insert statement node which represents
+  // literals being inserted, for example "INSERT INTO T VALUES (1),(2)", and
+  // returns a rewritten resolved insert statement node, rewriting the literals
+  // to corresponding SELECT ... UNION ALL queries; for the above case,
+  // "INSERT INTO T SELECT 1 UNION ALL SELECT 2".
+  absl::StatusOr<std::unique_ptr<const ResolvedNode>>
+  PostVisitResolvedInsertStmt(
+      std::unique_ptr<const ResolvedInsertStmt> node) override {
+    // We take in the resolved insert statement node which represents
+    // literals being inserted, for example "INSERT INTO T VALUES (1),(2)", and
+    // return a rewritten resolved insert statement node, rewriting the
+    // literals to corresponding SELECT ... UNION ALL queries; for the above
+    // case, "INSERT INTO T SELECT 1 UNION ALL SELECT 2".
+    ZETASQL_RET_CHECK(!node->row_list().empty());
+    ResolvedSetOperationScanBuilder union_all_builder;
+    for (const std::unique_ptr<const ResolvedInsertRow>& row :
+         node->row_list()) {
+      ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<const ResolvedScan> scan,
+                       GetRowAsProjectScan(row.get(), node.get()));
+      ResolvedColumnList column_list = scan->column_list();
+      union_all_builder.add_input_item_list(
+          ResolvedSetOperationItemBuilder()
+              .set_output_column_list(std::move(column_list))
+              .set_scan(std::move(scan)));
+    }
+
+    ZETASQL_RET_CHECK(!union_all_builder.input_item_list().empty());
+
+    std::vector<ResolvedColumn> output_columns;
+    for (const ResolvedColumn& column :
+         union_all_builder.input_item_list()[0]->scan()->column_list()) {
+      output_columns.push_back(column_factory_.MakeCol(
+          node->table_scan()->table()->Name(), column.name(), column.type()));
+    }
+    ResolvedInsertStmtBuilder insert_stmt_builder =
+        ToBuilder(std::move(node))
+            .set_row_list(
+                std::vector<std::unique_ptr<const ResolvedInsertRow>>{});
+
+    if (union_all_builder.input_item_list().size() == 1) {
+      ResolvedColumnList query_output_column_list =
+          union_all_builder.input_item_list()[0]->scan()->column_list();
+      return std::move(insert_stmt_builder)
+          .set_query(
+              ToBuilder(
+                  std::move(
+                      union_all_builder.release_input_item_list().front()))
+                  .release_scan())
+          .set_query_output_column_list(std::move(query_output_column_list))
+          .Build();
+    }
+
+    ResolvedColumnList query_output_column_list(output_columns);
+    return std::move(insert_stmt_builder)
+        .set_query(
+            union_all_builder.set_op_type(ResolvedSetOperationScan::UNION_ALL)
+                .set_column_list(std::move(output_columns)))
+        .set_query_output_column_list(std::move(query_output_column_list))
+        .Build();
+  }
+
+  ColumnFactory& column_factory_;
+};
+
+class InsertDmlValuesRewriter : public Rewriter {
+ public:
+  absl::StatusOr<std::unique_ptr<const ResolvedNode>> Rewrite(
+      const AnalyzerOptions& options, const ResolvedNode& input,
+      Catalog& catalog, TypeFactory& type_factory,
+      AnalyzerOutputProperties& output_properties) const override {
+    ZETASQL_RET_CHECK(options.column_id_sequence_number() != nullptr);
+    ColumnFactory column_factory(0, options.id_string_pool().get(),
+                                 options.column_id_sequence_number());
+    InsertDmlValuesRewriteVisitor rewriter(column_factory);
+    ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<const ResolvedNode> copied_node,
+                     ResolvedASTDeepCopyVisitor::Copy(&input));
+    return rewriter.VisitAll(std::move(copied_node));
+  }
+
+  std::string Name() const override { return "InsertDmlValuesRewriter"; }
+};
+
+const Rewriter* GetInsertDmlValuesRewriter() {
+  static const zetasql_base::NoDestructor<InsertDmlValuesRewriter>
+      dmlValuesRewriter;
+  return dmlValuesRewriter.get();
+}
+
+}  // namespace zetasql
diff --git a/zetasql/analyzer/rewriters/insert_dml_values_rewriter.h b/zetasql/analyzer/rewriters/insert_dml_values_rewriter.h
new file mode 100644
index 000000000..e012776fa
--- /dev/null
+++ b/zetasql/analyzer/rewriters/insert_dml_values_rewriter.h
@@ -0,0 +1,35 @@
+//
+// Copyright 2019 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+
+#ifndef ZETASQL_ANALYZER_REWRITERS_INSERT_DML_VALUES_REWRITER_H_
+#define ZETASQL_ANALYZER_REWRITERS_INSERT_DML_VALUES_REWRITER_H_
+
+#include "zetasql/public/rewriter_interface.h"
+
+namespace zetasql {
+
+// Gets a pointer to the dml values rewriter. This is a global object which
+// is never destructed.
+// This rewriter rewrites INSERT DML statements with a VALUES clause to
+// SELECT queries representing the corresponding values.
+// For example, the query
+//   INSERT INTO T VALUES (1),(2)
+// will be rewritten to
+//   INSERT INTO T SELECT 1 UNION ALL SELECT 2
+const Rewriter* GetInsertDmlValuesRewriter();
+
+}  // namespace zetasql
+
+#endif  // ZETASQL_ANALYZER_REWRITERS_INSERT_DML_VALUES_REWRITER_H_
diff --git a/zetasql/analyzer/rewriters/like_any_all_rewriter.cc b/zetasql/analyzer/rewriters/like_any_all_rewriter.cc
index 7284c4be8..14cce65b4 100644
--- a/zetasql/analyzer/rewriters/like_any_all_rewriter.cc
+++ b/zetasql/analyzer/rewriters/like_any_all_rewriter.cc
@@ -16,8 +16,8 @@
 
 #include "zetasql/analyzer/rewriters/like_any_all_rewriter.h"
 
-#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -28,8 +28,8 @@
 #include "zetasql/public/builtin_function.pb.h"
 #include "zetasql/public/catalog.h"
 #include "zetasql/public/rewriter_interface.h"
+#include "zetasql/public/types/type.h"
 #include "zetasql/public/types/type_factory.h"
-#include "zetasql/public/value.h"
 #include "zetasql/resolved_ast/resolved_ast.h"
 #include "zetasql/resolved_ast/resolved_ast_deep_copy_visitor.h"
 #include "zetasql/resolved_ast/resolved_node.h"
@@ -72,6 +72,31 @@ constexpr absl::string_view kLikeAnyTemplate = R"(
 FROM UNNEST(patterns) as pattern)
 )";
 
+// Template for rewriting NOT LIKE ANY
+// This template is used for a newer implementation of LIKE ANY with NOT
+// operator.
+// Details in (broken link)
+// Arguments:
+//   input - STRING or BYTES
+//   patterns - ARRAY<STRING> or ARRAY<BYTES>
+// Returns: BOOL
+// Semantic Rules:
+//   If patterns is empty or NULL, return FALSE
+//   If input is NULL, return NULL
+//   If LOGICAL_OR(input NOT LIKE pattern) is TRUE, return TRUE
+//   If patterns contains any NULL values, return NULL
+//   Otherwise, return FALSE
+constexpr absl::string_view kNotLikeAnyTemplate = R"(
+(SELECT
+  CASE
+    WHEN patterns IS NULL OR ARRAY_LENGTH(patterns) = 0 THEN FALSE
+    WHEN input IS NULL THEN NULL
+    WHEN LOGICAL_OR(input NOT LIKE pattern) THEN TRUE
+    WHEN LOGICAL_OR(pattern IS NULL) THEN NULL
+    ELSE FALSE
+  END
+FROM UNNEST(patterns) as pattern)
+)";
+
 // Template for rewriting LIKE ALL with null handling for cases:
 //   SELECT <input> LIKE ALL UNNEST([]) -> TRUE
 //   SELECT NULL LIKE ALL {{UNNEST(<patterns>)|(<patterns>)}} -> NULL
@@ -98,6 +123,31 @@ constexpr absl::string_view kLikeAllTemplate = R"(
 FROM UNNEST(patterns) as pattern)
 )";
 
+// Template for rewriting NOT LIKE ALL
+// This template is used for a newer implementation of LIKE ALL with NOT
+// operator. Details in (broken link)
+// Arguments:
+//   input - STRING or BYTES
+//   patterns - ARRAY<STRING> or ARRAY<BYTES>
+// Returns: BOOL
+// Semantic Rules:
+//   If patterns is empty or NULL, return TRUE
+//   If input is NULL, return NULL
+//   If input NOT LIKE pattern is FALSE, return FALSE
+//   If patterns contains any NULL values, return NULL
+//   Otherwise, return TRUE
+constexpr absl::string_view kNotLikeAllTemplate = R"(
+(SELECT
+  CASE
+    WHEN patterns IS NULL OR ARRAY_LENGTH(patterns) = 0 THEN TRUE
+    WHEN input IS NULL THEN NULL
+    WHEN NOT LOGICAL_AND(input NOT LIKE pattern) THEN FALSE
+    WHEN LOGICAL_OR(pattern IS NULL) THEN NULL
+    ELSE TRUE
+  END
+  FROM UNNEST(patterns) as pattern)
+)";
+
 class LikeAnyAllRewriteVisitor : public ResolvedASTDeepCopyVisitor {
  public:
   LikeAnyAllRewriteVisitor(const AnalyzerOptions* analyzer_options,
@@ -112,14 +162,16 @@ class LikeAnyAllRewriteVisitor : public ResolvedASTDeepCopyVisitor {
       const ResolvedFunctionCall* node) override;
 
   // Rewrites a function of the form:
-  //   input LIKE {{ANY|ALL}} (pattern1, [...])
+  //   input [NOT] LIKE {{ANY|ALL}} (pattern1, [...])
   // to use the LOGICAL_OR aggregation function with the LIKE operator
-  absl::Status RewriteLikeAnyAll(const ResolvedFunctionCall* node);
+  absl::Status RewriteLikeAnyAll(const ResolvedFunctionCall* node,
+                                 absl::string_view rewrite_template);
 
   // Rewrites a function of the form:
-  //   input LIKE {{ANY|ALL}} UNNEST()
+  //   input [NOT] LIKE {{ANY|ALL}} UNNEST()
   // to use the LOGICAL_OR aggregation function with the LIKE operator
-  absl::Status RewriteLikeAnyAllArray(const ResolvedFunctionCall* node);
+  absl::Status RewriteLikeAnyAllArray(const ResolvedFunctionCall* node,
+                                      absl::string_view rewrite_template);
 
   absl::Status RewriteLikeAnyAllArrayWithAggregate(
       std::unique_ptr<ResolvedExpr> input_expr,
@@ -132,24 +184,105 @@ class LikeAnyAllRewriteVisitor : public ResolvedASTDeepCopyVisitor {
   TypeFactory* type_factory_;
 };
 
+struct LikeAnyAllRewriterConfig {
+  enum RewriterVariant { kLikeAnyAll, kLikeAnyAllArray };
+
+  RewriterVariant rewriter_variant;
+  const absl::string_view rewrite_template;
+};
+
+static bool IsLikeAnyFunctionNode(const ResolvedFunctionCall* node) {
+  return IsBuiltInFunctionIdEq(node, FN_STRING_LIKE_ANY) ||
+         IsBuiltInFunctionIdEq(node, FN_BYTE_LIKE_ANY);
+}
+
+static bool IsNotLikeAnyFunctionNode(const ResolvedFunctionCall* node) {
+  return IsBuiltInFunctionIdEq(node, FN_STRING_NOT_LIKE_ANY) ||
+         IsBuiltInFunctionIdEq(node, FN_BYTE_NOT_LIKE_ANY);
+}
+
+static bool IsLikeAllFunctionNode(const ResolvedFunctionCall* node) {
+  return IsBuiltInFunctionIdEq(node, FN_STRING_LIKE_ALL) ||
+         IsBuiltInFunctionIdEq(node, FN_BYTE_LIKE_ALL);
+}
+
+static bool IsNotLikeAllFunctionNode(const ResolvedFunctionCall* node) {
+  return IsBuiltInFunctionIdEq(node, FN_STRING_NOT_LIKE_ALL) ||
+         IsBuiltInFunctionIdEq(node, FN_BYTE_NOT_LIKE_ALL);
+}
+
+static bool IsLikeAnyArrayFunctionNode(const ResolvedFunctionCall* node) {
+  return IsBuiltInFunctionIdEq(node, FN_STRING_ARRAY_LIKE_ANY) ||
+         IsBuiltInFunctionIdEq(node, FN_BYTE_ARRAY_LIKE_ANY);
+}
+
+static bool IsNotLikeAnyArrayFunctionNode(const ResolvedFunctionCall* node) {
+  return IsBuiltInFunctionIdEq(node, FN_STRING_ARRAY_NOT_LIKE_ANY) ||
+         IsBuiltInFunctionIdEq(node, FN_BYTE_ARRAY_NOT_LIKE_ANY);
+}
+
+static bool IsLikeAllArrayFunctionNode(const ResolvedFunctionCall* node) {
+  return IsBuiltInFunctionIdEq(node, FN_STRING_ARRAY_LIKE_ALL) ||
+         IsBuiltInFunctionIdEq(node, FN_BYTE_ARRAY_LIKE_ALL);
+}
+
+static bool IsNotLikeAllArrayFunctionNode(const ResolvedFunctionCall* node) {
+  return IsBuiltInFunctionIdEq(node, FN_STRING_ARRAY_NOT_LIKE_ALL) ||
+         IsBuiltInFunctionIdEq(node, FN_BYTE_ARRAY_NOT_LIKE_ALL);
+}
+
+static std::optional<LikeAnyAllRewriterConfig> GetRewriterConfig(
+    const ResolvedFunctionCall* node) {
+  if (IsLikeAnyFunctionNode(node)) {
+    return LikeAnyAllRewriterConfig{LikeAnyAllRewriterConfig::kLikeAnyAll,
+                                    kLikeAnyTemplate};
+  } else if (IsNotLikeAnyFunctionNode(node)) {
+    return LikeAnyAllRewriterConfig{LikeAnyAllRewriterConfig::kLikeAnyAll,
+                                    kNotLikeAnyTemplate};
+  } else if (IsLikeAllFunctionNode(node)) {
+    return LikeAnyAllRewriterConfig{LikeAnyAllRewriterConfig::kLikeAnyAll,
+                                    kLikeAllTemplate};
+  } else if (IsNotLikeAllFunctionNode(node)) {
+    return LikeAnyAllRewriterConfig{LikeAnyAllRewriterConfig::kLikeAnyAll,
+                                    kNotLikeAllTemplate};
+  } else if (IsLikeAnyArrayFunctionNode(node)) {
+    return LikeAnyAllRewriterConfig{LikeAnyAllRewriterConfig::kLikeAnyAllArray,
+                                    kLikeAnyTemplate};
+  } else if (IsNotLikeAnyArrayFunctionNode(node)) {
+    return LikeAnyAllRewriterConfig{LikeAnyAllRewriterConfig::kLikeAnyAllArray,
+                                    kNotLikeAnyTemplate};
+  } else if (IsLikeAllArrayFunctionNode(node)) {
+    return LikeAnyAllRewriterConfig{LikeAnyAllRewriterConfig::kLikeAnyAllArray,
+                                    kLikeAllTemplate};
+  } else if (IsNotLikeAllArrayFunctionNode(node)) {
+    return LikeAnyAllRewriterConfig{LikeAnyAllRewriterConfig::kLikeAnyAllArray,
+                                    kNotLikeAllTemplate};
+  }
+  return std::nullopt;
+}
+
 absl::Status LikeAnyAllRewriteVisitor::VisitResolvedFunctionCall(
     const ResolvedFunctionCall* node) {
-  if (IsBuiltInFunctionIdEq(node, FN_STRING_LIKE_ANY) ||
-      IsBuiltInFunctionIdEq(node, FN_BYTE_LIKE_ANY) ||
-      IsBuiltInFunctionIdEq(node, FN_STRING_LIKE_ALL) ||
-      IsBuiltInFunctionIdEq(node, FN_BYTE_LIKE_ALL)) {
-    return RewriteLikeAnyAll(node);
-  } else if (IsBuiltInFunctionIdEq(node, FN_STRING_ARRAY_LIKE_ANY) ||
-             IsBuiltInFunctionIdEq(node, FN_BYTE_ARRAY_LIKE_ANY) ||
-             IsBuiltInFunctionIdEq(node, FN_STRING_ARRAY_LIKE_ALL) ||
-             IsBuiltInFunctionIdEq(node, FN_BYTE_ARRAY_LIKE_ALL)) {
-    return RewriteLikeAnyAllArray(node);
+  std::optional<LikeAnyAllRewriterConfig> optional_rewriter_info =
+      GetRewriterConfig(node);
+  if (!optional_rewriter_info.has_value()) {
+    return CopyVisitResolvedFunctionCall(node);
+  }
+
+  LikeAnyAllRewriterConfig& rewriter_info = optional_rewriter_info.value();
+  switch (rewriter_info.rewriter_variant) {
+    case LikeAnyAllRewriterConfig::kLikeAnyAll:
+      return RewriteLikeAnyAll(node, rewriter_info.rewrite_template);
+    case LikeAnyAllRewriterConfig::kLikeAnyAllArray:
+      return RewriteLikeAnyAllArray(node, rewriter_info.rewrite_template);
+    default:
+      ZETASQL_RET_CHECK_FAIL()
+          << "All enum values are covered above; should never reach here.";
   }
-  return CopyVisitResolvedFunctionCall(node);
 }
 
 absl::Status LikeAnyAllRewriteVisitor::RewriteLikeAnyAll(
-    const ResolvedFunctionCall* node) {
+    const ResolvedFunctionCall* node, absl::string_view rewrite_template) {
   ZETASQL_RET_CHECK_GE(node->argument_list_size(), 2)
       << "LIKE ANY should have at least 2 arguments. Got: "
       << node->DebugString();
@@ -174,19 +307,13 @@ absl::Status LikeAnyAllRewriteVisitor::RewriteLikeAnyAll(
   ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<ResolvedExpr> patterns_array_expr,
                    fn_builder_.MakeArray(input_type, pattern_elements_list));
-  if (IsBuiltInFunctionIdEq(node, FN_STRING_LIKE_ANY) ||
-      IsBuiltInFunctionIdEq(node, FN_BYTE_LIKE_ANY)) {
-    return RewriteLikeAnyAllArrayWithAggregate(std::move(rewritten_input_expr),
-                                               std::move(patterns_array_expr),
-                                               kLikeAnyTemplate);
-  }
   return RewriteLikeAnyAllArrayWithAggregate(std::move(rewritten_input_expr),
                                              std::move(patterns_array_expr),
-                                             kLikeAllTemplate);
+                                             rewrite_template);
 }
 
 absl::Status LikeAnyAllRewriteVisitor::RewriteLikeAnyAllArray(
-    const ResolvedFunctionCall* node) {
+    const ResolvedFunctionCall* node, absl::string_view rewrite_template) {
   // Extract LIKE ANY arguments when given an array of patterns
   ZETASQL_RET_CHECK_EQ(node->argument_list_size(), 2)
       << "LIKE ANY with UNNEST has exactly 2 arguments. Got: "
       << node->DebugString();
@@ -201,16 +328,9 @@ absl::Status LikeAnyAllRewriteVisitor::RewriteLikeAnyAllArray(
                    ProcessNode(input_expr));
   ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<ResolvedExpr> rewritten_patterns_array_expr,
                    ProcessNode(patterns_array_expr));
-
-  if (IsBuiltInFunctionIdEq(node, FN_STRING_ARRAY_LIKE_ANY) ||
-      IsBuiltInFunctionIdEq(node, FN_BYTE_ARRAY_LIKE_ANY)) {
-    return RewriteLikeAnyAllArrayWithAggregate(
-        std::move(rewritten_input_expr),
-        std::move(rewritten_patterns_array_expr), kLikeAnyTemplate);
-  }
   return RewriteLikeAnyAllArrayWithAggregate(
       std::move(rewritten_input_expr), std::move(rewritten_patterns_array_expr),
-      kLikeAllTemplate);
+      rewrite_template);
 }
 
 absl::Status LikeAnyAllRewriteVisitor::RewriteLikeAnyAllArrayWithAggregate(
diff --git a/zetasql/analyzer/rewriters/map_function_rewriter.cc b/zetasql/analyzer/rewriters/map_function_rewriter.cc
index d8cccaf0c..b9554fc97 100644
--- a/zetasql/analyzer/rewriters/map_function_rewriter.cc
+++ b/zetasql/analyzer/rewriters/map_function_rewriter.cc
@@ -21,7 +21,6 @@
 
 #include "zetasql/analyzer/substitute.h"
 #include "zetasql/public/analyzer_options.h"
-#include "zetasql/public/analyzer_output.h"
 #include "zetasql/public/analyzer_output_properties.h"
 #include "zetasql/public/builtin_function.pb.h"
 #include "zetasql/public/catalog.h"
@@ -39,7 +38,6 @@
 #include "absl/strings/str_cat.h"
 #include "absl/strings/string_view.h"
 #include "absl/strings/substitute.h"
-#include "absl/types/span.h"
 #include "zetasql/base/ret_check.h"
 #include "zetasql/base/status_macros.h"
 
diff --git a/zetasql/analyzer/rewriters/multiway_unnest_rewriter.cc b/zetasql/analyzer/rewriters/multiway_unnest_rewriter.cc
new file mode 100644
index 000000000..bb4660228
--- /dev/null
+++ b/zetasql/analyzer/rewriters/multiway_unnest_rewriter.cc
@@ -0,0 +1,914 @@
+//
+// Copyright 2019 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#include "zetasql/analyzer/rewriters/multiway_unnest_rewriter.h" + +#include +#include +#include +#include +#include + +#include "zetasql/public/analyzer_options.h" +#include "zetasql/public/catalog.h" +#include "zetasql/public/function.h" +#include "zetasql/public/functions/array_zip_mode.pb.h" +#include "zetasql/public/rewriter_interface.h" +#include "zetasql/public/types/array_type.h" +#include "zetasql/public/types/struct_type.h" +#include "zetasql/public/types/type.h" +#include "zetasql/public/types/type_factory.h" +#include "zetasql/public/value.h" +#include "zetasql/resolved_ast/make_node_vector.h" +#include "zetasql/resolved_ast/resolved_ast.h" +#include "zetasql/resolved_ast/resolved_ast_builder.h" +#include "zetasql/resolved_ast/resolved_ast_deep_copy_visitor.h" +#include "zetasql/resolved_ast/resolved_ast_rewrite_visitor.h" +#include "zetasql/resolved_ast/resolved_column.h" +#include "zetasql/resolved_ast/resolved_node.h" +#include "zetasql/resolved_ast/rewrite_utils.h" +#include "absl/container/flat_hash_set.h" +#include "zetasql/base/check.h" +#include "absl/log/log.h" +#include "absl/status/statusor.h" +#include "absl/strings/str_cat.h" +#include "absl/types/span.h" +#include "zetasql/base/ret_check.h" +#include "zetasql/base/status_macros.h" + +namespace zetasql { + +namespace { + +class MultiwayUnnestRewriteVisitor : public ResolvedASTRewriteVisitor { + public: + MultiwayUnnestRewriteVisitor(const AnalyzerOptions& analyzer_options, + Catalog& catalog, TypeFactory& type_factory) + : type_factory_(type_factory), + 
column_factory_(/*max_col_id=*/0, + analyzer_options.id_string_pool().get(), + analyzer_options.column_id_sequence_number()), + fn_builder_(analyzer_options, catalog, type_factory) {} + + private: + // States of pre-rewritten and post-rewritten input_scan columns, element + // columns, and array offset columns. + class State { + public: + static absl::StatusOr<State> Create(const ResolvedArrayScan& node) { + ZETASQL_RET_CHECK_GE(node.element_column_list_size(), 2); + State state(node.element_column_list_size()); + state.Init(node); + return state; + } + + const ResolvedColumn& pre_rewritten_array_offset_column() const { + ABSL_DCHECK(HasPreRewrittenArrayOffsetColumn()); + return *pre_rewritten_array_offset_column_; + } + // Returns a list of columns in the pre-rewrite ArrayScan's + // `element_column_list`. The size is equal to the number of input arrays. + const std::vector<ResolvedColumn>& pre_rewritten_element_columns() const { + return pre_rewritten_element_columns_; + } + // Returns a list of columns that originate from the `column_list` of the + // pre-rewrite ArrayScan's `input_scan`. + const std::vector<ResolvedColumn>& input_scan_columns() const { + return input_scan_columns_; + } + // Returns all rewritten element columns. The size is equal to the number + // of input arrays. + // + // The i-th element represents the element_column_list[0] of the i-th + // rewritten array scan. + const std::vector<ResolvedColumn>& element_columns() const { + return element_columns_; + } + // Returns a list of new columns that are assigned and point to the original + // array expressions. The size is equal to the number of input arrays. + // + // The i-th element represents array_expr_list[i] of the pre-rewrite array + // scan. + const std::vector<ResolvedColumn>& with_expr_array_columns() const { + return with_expr_array_columns_; + } + // Returns all array offset columns that come out of rewritten array scans. + // The size is equal to the number of input arrays.
+ // + // The i-th element represents the offset column of the i-th rewritten array + // scan. + const std::vector<ResolvedColumn>& array_offset_columns() const { + return array_offset_columns_; + } + // Returns all offset columns that come out of FULL JOIN USING. The size is + // equal to the number of input arrays. + // + // The i-th element represents the final offset column for the JoinScan of + // the (i-1)-th and i-th array scans (i is 0-based). It's computed as a + // COALESCE of two offset columns. + const std::vector<ResolvedColumn>& full_join_offset_columns() const { + return full_join_offset_columns_; + } + // Returns the offset column that comes out of the top level FULL JOIN USING. + const ResolvedColumn& GetLastFullJoinOffsetColumn() const { + return full_join_offset_columns_.back(); + } + int element_column_count() const { return element_column_count_; } + bool HasPreRewrittenArrayOffsetColumn() const { + return pre_rewritten_array_offset_column_.has_value(); + } + const ResolvedColumn& GetResultLengthColumn() const { + return result_length_; + } + void SetResultLengthColumn(const ResolvedColumn& column) { + result_length_ = column; + } + + void SetWithExprArrayColumn(int i, const ResolvedColumn& column) { + with_expr_array_columns_[i] = column; + } + void SetElementColumn(int i, const ResolvedColumn& column) { + element_columns_[i] = column; + } + void SetArrayOffsetColumn(int i, const ResolvedColumn& column) { + array_offset_columns_[i] = column; + } + void SetFullJoinOffsetColumn(int i, const ResolvedColumn& column) { + full_join_offset_columns_[i] = column; + } + + // STRUCT(arr1, arr2 [, ...], offset) + // REQUIRES: all rewritten columns are initialized before calling this + // function.
+ absl::StatusOr<std::unique_ptr<const ResolvedExpr>> MakeStructExpr( + TypeFactory& type_factory) { + std::vector<std::unique_ptr<const ResolvedExpr>> field_list; + std::vector<StructType::StructField> struct_fields; + for (int i = 0; i < element_column_count_; ++i) { + const ResolvedColumn& rewritten_element_column = element_columns_[i]; + field_list.push_back(MakeResolvedColumnRef( + rewritten_element_column.type(), rewritten_element_column, + /*is_correlated=*/false)); + struct_fields.push_back( + {rewritten_element_column.name(), rewritten_element_column.type()}); + } + + const ResolvedColumn& rewritten_offset_column = + GetLastFullJoinOffsetColumn(); + field_list.push_back(MakeResolvedColumnRef(rewritten_offset_column.type(), + rewritten_offset_column, + /*is_correlated=*/false)); + // It's OK to choose a fixed offset column name here as it will be mapped + // to the pre-rewritten offset column in the final column replacement + // ProjectScan. + // WARNING: The offset column has to be the last field, as the parent node + // depends on the order of the struct fields to map pre-rewrite element + // columns back to rewritten get struct field columns.
+ struct_fields.push_back({"offset", rewritten_offset_column.type()}); + + ZETASQL_RETURN_IF_ERROR( + type_factory.MakeStructType(struct_fields, &struct_type_)); + + return ResolvedMakeStructBuilder() + .set_type(struct_type_) + .set_field_list(std::move(field_list)) + .Build(); + } + + const StructType* GetStructType() const { + ABSL_DCHECK(struct_type_ != nullptr); + return struct_type_; + } + + private: + void Init(const ResolvedArrayScan& node) { + RecordPreRewrittenArrayOffsetColumn(node); + RecordPreRewrittenElementColumns(node); + RecordInputScanColumns(node); + ResizeRewrittenStates(); + } + + void ResizeRewrittenStates() { + with_expr_array_columns_.resize(element_column_count_); + element_columns_.resize(element_column_count_); + array_offset_columns_.resize(element_column_count_); + full_join_offset_columns_.resize(element_column_count_); + } + + void RecordPreRewrittenArrayOffsetColumn(const ResolvedArrayScan& node) { + if (node.array_offset_column() != nullptr) { + pre_rewritten_array_offset_column_ = + node.array_offset_column()->column(); + } + } + + void RecordPreRewrittenElementColumns(const ResolvedArrayScan& node) { + pre_rewritten_element_columns_.resize(node.element_column_list_size()); + for (int i = 0; i < node.element_column_list_size(); ++i) { + pre_rewritten_element_columns_[i] = node.element_column_list(i); + } + } + + // Record the column list from the ArrayScan's input_scan as a side channel. + // They might or might not be used in the ArrayScan. But we will populate + // these columns to the output column_list of rewritten scan. 
+ void RecordInputScanColumns(const ResolvedArrayScan& node) { + absl::flat_hash_set<ResolvedColumn> local_column_set( + node.element_column_list().begin(), node.element_column_list().end()); + if (node.array_offset_column() != nullptr) { + local_column_set.insert(node.array_offset_column()->column()); + } + for (const ResolvedColumn& column : node.column_list()) { + if (!local_column_set.contains(column)) { + input_scan_columns_.push_back(column); + } + } + } + + explicit State(int element_column_count) + : element_column_count_(element_column_count), + pre_rewritten_array_offset_column_(std::nullopt), + struct_type_(nullptr) {} + int element_column_count_; + + // Pre-rewrite states + ResolvedColumnList pre_rewritten_element_columns_; + std::optional<ResolvedColumn> pre_rewritten_array_offset_column_; + ResolvedColumnList input_scan_columns_; + + // Post-rewrite states + ResolvedColumnList with_expr_array_columns_; + ResolvedColumn result_length_; + ResolvedColumnList element_columns_; + ResolvedColumnList array_offset_columns_; + ResolvedColumnList full_join_offset_columns_; + const StructType* struct_type_; + }; + + absl::StatusOr<std::unique_ptr<const ResolvedNode>> + PostVisitResolvedArrayScan( + std::unique_ptr<const ResolvedArrayScan> node) override { + if (node->array_expr_list_size() < 2) { + return node; + } + // Populate pre-rewritten states and allocate space for post-rewrite states. + ZETASQL_ASSIGN_OR_RETURN(State state, State::Create(*node)); + + return BuildTopLevelRewrittenScan(node, state); + } + + absl::StatusOr<std::unique_ptr<const ResolvedScan>> + BuildTopLevelRewrittenScan( + const std::unique_ptr<const ResolvedArrayScan>& original_array_scan, + State& state) { + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr<const ResolvedArrayScan> array_scan, + BuildTopLevelArrayScanOfWithExpr(original_array_scan, state)); + + // Map pre-rewrite element column names to rewritten GetStructFields.
+ return BuildStructExpansionWithColumnReplacement(std::move(array_scan), + state); + } + + absl::StatusOr> + BuildTopLevelArrayScanOfWithExpr( + const std::unique_ptr& original_array_scan, + State& state) { + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr with_expr, + BuildWithExpr(original_array_scan->array_expr_list(), + original_array_scan->array_zip_mode(), state)); + + ResolvedColumnList column_list = state.input_scan_columns(); + const ResolvedColumn array_element_col = + column_factory_.MakeCol("$array", "$with_expr_element", + with_expr->type()->AsArray()->element_type()); + column_list.push_back(array_element_col); + + std::unique_ptr input_scan; + if (original_array_scan->input_scan() != nullptr) { + ZETASQL_ASSIGN_OR_RETURN(input_scan, ResolvedASTDeepCopyVisitor::Copy( + original_array_scan->input_scan())); + } + + return ResolvedArrayScanBuilder() + .set_column_list(column_list) + .set_input_scan(std::move(input_scan)) + .add_array_expr_list(std::move(with_expr)) + .add_element_column_list(array_element_col) + .Build(); + } + + // LEAST(arr1_len, arr2_len [, ... ] ) + absl::StatusOr> BuildLeastArrayLengthExpr( + const std::vector>& + arr_len_exprs) { + ZETASQL_ASSIGN_OR_RETURN( + std::vector> copied_array_lens, + CopyArrayLengthExpressions(arr_len_exprs)); + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr least, + fn_builder_.Least(std::move(copied_array_lens))); + return least; + } + + // GREATEST(arr1_len, arr2_len [, ... 
] ) + absl::StatusOr> + BuildGreatestArrayLengthExpr( + absl::Span> + arr_len_exprs) { + ZETASQL_ASSIGN_OR_RETURN( + std::vector> copied_array_lens, + CopyArrayLengthExpressions(arr_len_exprs)); + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr greatest, + fn_builder_.Greatest(std::move(copied_array_lens))); + return greatest; + } + + absl::StatusOr>> + CopyArrayLengthExpressions( + absl::Span> array_lens) { + std::vector> copied_array_lens; + copied_array_lens.reserve(array_lens.size()); + for (const auto& arr_len : array_lens) { + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr copied, + ResolvedASTDeepCopyVisitor::Copy(arr_len.get())); + copied_array_lens.push_back(std::move(copied)); + } + return copied_array_lens; + } + + absl::StatusOr> BuildStrictCheckExpr( + const std::vector>& array_lens, + const ResolvedColumn& mode) { + // mode = 'STRICT' + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr equal, + fn_builder_.Equal(MakeResolvedColumnRef(types::ArrayZipModeEnumType(), + mode, /*is_correlated=*/false), + MakeResolvedLiteral( + types::ArrayZipModeEnumType(), + Value::Enum(types::ArrayZipModeEnumType(), + functions::ArrayZipEnums::STRICT)))); + + // LEAST(arr1_len, arr2_len [, ... ] ) + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr least, + BuildLeastArrayLengthExpr(array_lens)); + + // GREATEST(arr1_len, arr2_len [, ... ] ) + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr greatest, + BuildGreatestArrayLengthExpr(array_lens)); + + // LEAST(...) != GREATEST(...) 
+ ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr not_equal, + fn_builder_.NotEqual(std::move(least), std::move(greatest))); + + // ERROR("...") + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr error, + fn_builder_.Error( + "Unnested arrays under STRICT mode must have equal lengths")); + + // IF(mode = 'STRICT' AND + // LEAST(arr1_len, arr2_len ) != + // GREATEST(arr1_len, arr2_len ), + // ERROR('strict'), + // NULL) + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr condition, + fn_builder_.And(MakeNodeVector(std::move(equal), + std::move(not_equal)))); + return fn_builder_.If( + std::move(condition), std::move(error), + MakeResolvedLiteral(types::Int64Type(), Value::NullInt64())); + } + + absl::StatusOr> BuildResultLengthEpxr( + const std::vector>& + array_lengths, + const ResolvedColumn& mode) { + // mode = 'TRUNCATE' + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr equal, + fn_builder_.Equal( + MakeResolvedColumnRef(types::ArrayZipModeEnumType(), + mode, /*is_correlated=*/false), + MakeResolvedLiteral( + types::ArrayZipModeEnumType(), + Value::Enum(types::ArrayZipModeEnumType(), + functions::ArrayZipEnums::TRUNCATE)))); + + // LEAST(arr1_len, arr2_len [, ... ] ) + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr least, + BuildLeastArrayLengthExpr(array_lengths)); + + // GREATEST(arr1_len, arr2_len [, ... 
] ) + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr greatest, + BuildGreatestArrayLengthExpr(array_lengths)); + + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr if_expr, + fn_builder_.If(std::move(equal), std::move(least), + std::move(greatest))); + return if_expr; + } + + // IF(mode_expr IS NULL, ERROR, mode_expr) + absl::StatusOr> BuildModeExpr( + const ResolvedExpr* mode_expr, const ResolvedColumn& mode_col) { + // ERROR("...") + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr error, + fn_builder_.Error("UNNEST does not allow NULL mode argument", + types::ArrayZipModeEnumType())); + + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr mode_copied1, + ResolvedASTDeepCopyVisitor::Copy(mode_expr)); + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr condition, + fn_builder_.IsNull(std::move(mode_copied1))); + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr mode_copied2, + ResolvedASTDeepCopyVisitor::Copy(mode_expr)); + return fn_builder_.If(std::move(condition), std::move(error), + std::move(mode_copied2)); + } + + // IF(array_expr IS NULL, 0, ARRAY_LENGTH(array_expr)) + absl::StatusOr> BuildArrayLengthExpr( + const Type* array_type, const ResolvedColumn& array_col) { + // array_expr IS NULL + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr condition, + fn_builder_.IsNull(MakeResolvedColumnRef( + array_type, array_col, /*is_correlated=*/false))); + // ARRAY_LENGTH(array_expr) + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr array_length, + fn_builder_.ArrayLength(MakeResolvedColumnRef( + array_type, array_col, /*is_correlated=*/false))); + return fn_builder_.If( + std::move(condition), + MakeResolvedLiteral(types::Int64Type(), Value::Int64(0)), + std::move(array_length)); + } + + absl::StatusOr> BuildWithExpr( + absl::Span> input_arrays, + const ResolvedExpr* mode, State& state) { + ZETASQL_RET_CHECK(mode != nullptr); + ZETASQL_RET_CHECK_EQ(mode->type(), types::ArrayZipModeEnumType()); + ZETASQL_RET_CHECK_GE(input_arrays.size(), 2); + + std::vector> array_lens; + std::vector> assignment_list; + 
assignment_list.reserve(input_arrays.size() + 1); + for (int i = 0; i < input_arrays.size(); ++i) { + // `arrN` expr + ResolvedColumn arr_col = column_factory_.MakeCol( + "$with_expr", absl::StrCat("arr", i), input_arrays[i]->type()); + state.SetWithExprArrayColumn(i, arr_col); + + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr array_expr, + ResolvedASTDeepCopyVisitor::Copy(input_arrays[i].get())); + assignment_list.push_back( + MakeResolvedComputedColumn(arr_col, std::move(array_expr))); + + // `arrN_len` expr + ResolvedColumn arr_len_col = column_factory_.MakeCol( + "$with_expr", absl::StrCat("arr", i, "_len"), types::Int64Type()); + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr arr_len_expr, + BuildArrayLengthExpr(input_arrays[i]->type(), arr_col)); + array_lens.push_back(MakeResolvedColumnRef( + arr_len_expr->type(), arr_len_col, /*is_correlated=*/false)); + assignment_list.push_back( + MakeResolvedComputedColumn(arr_len_col, std::move(arr_len_expr))); + } + + // `mode` expr + ResolvedColumn mode_col = column_factory_.MakeCol( + "$with_expr", "mode", types::ArrayZipModeEnumType()); + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr mode_expr, + BuildModeExpr(mode, mode_col)); + assignment_list.push_back( + MakeResolvedComputedColumn(mode_col, std::move(mode_expr))); + + // `strict_check` expr + ResolvedColumn strict_check_col = column_factory_.MakeCol( + "$with_expr", "strict_check", types::Int64Type()); + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr strict_check_expr, + BuildStrictCheckExpr(array_lens, mode_col)); + assignment_list.push_back(MakeResolvedComputedColumn( + strict_check_col, std::move(strict_check_expr))); + + // `result_len` expr + ResolvedColumn result_len_col = + column_factory_.MakeCol("$with_expr", "result_len", types::Int64Type()); + state.SetResultLengthColumn(result_len_col); + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr result_len, + BuildResultLengthEpxr(array_lens, mode_col)); + assignment_list.push_back( + MakeResolvedComputedColumn(result_len_col, 
std::move(result_len))); + + // Starting from here, we build the rewritten tree in a post-order traversal + // way. + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr array_subquery, + BuildArraySubqueryExpr(state)); + const Type* array_type = array_subquery->type(); + + return ResolvedWithExprBuilder() + .set_type(array_type) + .set_expr(std::move(array_subquery)) + .set_assignment_list(std::move(assignment_list)) + .Build(); + } + + // Build an ARRAY subquery expr representing + // ARRAY> + absl::StatusOr> BuildArraySubqueryExpr( + State& state) { + // Build a top level ProjectScan wrapping the chain of ArrayScan joins. + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr project_scan, + BuildNestedProjectScanOfFullJoinUsing(state)); + + // Wrap with FilterScan with predicate + // `COALESCE(..., offsetN) < result_len`. + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr filter_scan, + BuildFilterScanFromNestedProjectScan(std::move(project_scan), state)); + + // Wrap with OrderByScan, ordering by the top level COALESCE column. + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr order_by_scan, + BuildOrderByScanOfFilterScan(std::move(filter_scan), state)); + + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr make_struct_scan, + BuildMakeStructProjectScan(std::move(order_by_scan), state)); + ZETASQL_RET_CHECK_GT(state.GetStructType()->num_fields(), 2); + + // `parameter_list` only needs to reference array columns and result_len + // column from with expr's `assignment_list`. 
+ std::vector> + subquery_parameter_list(state.element_column_count() + 1); + for (int i = 0; i < state.element_column_count(); ++i) { + subquery_parameter_list[i] = + MakeResolvedColumnRef(state.with_expr_array_columns()[i].type(), + state.with_expr_array_columns()[i], + /*is_correlated=*/false); + } + subquery_parameter_list.back() = MakeResolvedColumnRef( + state.GetResultLengthColumn().type(), state.GetResultLengthColumn(), + /*is_correlated=*/false); + + const ArrayType* array_type = nullptr; + ZETASQL_RETURN_IF_ERROR( + type_factory_.MakeArrayType(state.GetStructType(), &array_type)); + + return ResolvedSubqueryExprBuilder() + .set_type(array_type) + .set_subquery_type(ResolvedSubqueryExpr::ARRAY) + .set_parameter_list(std::move(subquery_parameter_list)) + .set_subquery(std::move(make_struct_scan)) + .Build(); + } + + absl::StatusOr> + BuildMakeStructProjectScan(std::unique_ptr input_scan, + State& state) { + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr struct_expr, + state.MakeStructExpr(type_factory_)); + + ResolvedColumnList column_list; + ResolvedColumn make_struct_col = + column_factory_.MakeCol("$make_struct", "$struct", struct_expr->type()); + column_list.push_back(make_struct_col); + + return ResolvedProjectScanBuilder() + .set_column_list(column_list) + .set_input_scan(std::move(input_scan)) + .set_is_ordered(true) + .add_expr_list( + MakeResolvedComputedColumn(make_struct_col, std::move(struct_expr))) + .Build(); + } + + // Build a OrderByScan on top of the filtered nested ProjectScan. + absl::StatusOr> + BuildOrderByScanOfFilterScan(std::unique_ptr input_scan, + const State& state) { + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr order_by_item, + ResolvedOrderByItemBuilder() + .set_column_ref(MakeResolvedColumnRef( + types::Int64Type(), state.GetLastFullJoinOffsetColumn(), + /*is_correlated=*/false)) + .Build()); + + // Propagate the exact same column_list from its input_scan's column_list. 
+ ResolvedColumnList column_list = input_scan->column_list(); + return ResolvedOrderByScanBuilder() + .set_column_list(column_list) + .set_is_ordered(true) + .set_input_scan(std::move(input_scan)) + .add_order_by_item_list(std::move(order_by_item)) + .Build(); + } + + // Build a FilterScan on top of the nested ProjectScan containing a chain of + // joins among arrays. The filter_expr is: + // COALESCE(..., offsetN) < result_len + absl::StatusOr> + BuildFilterScanFromNestedProjectScan( + std::unique_ptr input_scan, const State& state) { + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr filter_expr, + fn_builder_.Less( + MakeResolvedColumnRef(state.GetLastFullJoinOffsetColumn().type(), + state.GetLastFullJoinOffsetColumn(), + /*is_correlated=*/false), + MakeResolvedColumnRef(state.GetResultLengthColumn().type(), + state.GetResultLengthColumn(), + /*is_correlated=*/true))); + + // Propagate the exact same column_list from its input_scan's column_list. + ResolvedColumnList column_list = input_scan->column_list(); + return ResolvedFilterScanBuilder() + .set_column_list(column_list) + .set_input_scan(std::move(input_scan)) + .set_filter_expr(std::move(filter_expr)) + .Build(); + } + + // Build nested ProjectScans containing left-deep JoinScans of ArrayScans to + // represent a chain of FULL JOIN of arrays. + // UNNEST(arr1) AS arr1 WITH OFFSET + // FULL JOIN + // UNNEST(arr2) AS arr2 WITH OFFSET + // USING (offset) + // [ ... 
+ // FULL JOIN + // UNNEST(arrN) AS arrN WITH OFFSET + // USING (offset) + // ] + absl::StatusOr> + BuildNestedProjectScanOfFullJoinUsing(State& state) { + std::unique_ptr final_scan; + for (int i = 0; i < state.element_column_count() - 1; ++i) { + std::unique_ptr lhs = std::move(final_scan); + if (i == 0) { + // Lhs ArrayScan + ZETASQL_ASSIGN_OR_RETURN(lhs, BuildSingletonUnnestArrayScan( + /*index=*/i, state)); + } + // Rhs ArrayScan + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr rhs, + BuildSingletonUnnestArrayScan( + /*index=*/i + 1, state)); + + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr project_scan, + BuildProjectScanOfJoinOnCoalesce(std::move(lhs), std::move(rhs), + /*lhs_index=*/i, state)); + final_scan = std::move(project_scan); + } + return final_scan; + } + + // Build a ProjectScan doing a struct expansion against the output of + // UNNEST(ARRAY(SELECT AS STRUCT arr1, arr2 [ , ... ], offset)): + // ARRAY> + // TODO: b/297249122 - attach type_annotation_map to appropriate columns + absl::StatusOr> + BuildStructExpansionWithColumnReplacement( + std::unique_ptr array_scan, State& state) { + ZETASQL_RET_CHECK_EQ(array_scan->element_column_list_size(), 1); + const StructType* struct_type = state.GetStructType(); + ZETASQL_RET_CHECK_EQ(state.element_column_count() + 1, struct_type->num_fields()); + + ResolvedColumnList column_list = state.input_scan_columns(); + int expr_list_size = state.HasPreRewrittenArrayOffsetColumn() + ? struct_type->num_fields() + : struct_type->num_fields() - 1; + std::vector> expr_list( + expr_list_size); + for (int i = 0; i < struct_type->num_fields(); ++i) { + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr expr, + BuildGetStructFieldExpr( + struct_type, i, array_scan->element_column_list(0))); + + ResolvedColumn pre_rewrite_column; + if (i < struct_type->num_fields() - 1) { + // Array element column. 
+ pre_rewrite_column = state.pre_rewritten_element_columns()[i]; + } else { + // If the pre-rewritten UNNEST specified WITH OFFSET, it needs to be + // placed back, computed by the rewritten full join array offset column. + // Otherwise, skip it. + if (!state.HasPreRewrittenArrayOffsetColumn()) { + continue; + } + pre_rewrite_column = state.pre_rewritten_array_offset_column(); + } + expr_list[i] = + MakeResolvedComputedColumn(pre_rewrite_column, std::move(expr)); + column_list.push_back(pre_rewrite_column); + } + + return ResolvedProjectScanBuilder() + .set_column_list(column_list) + .set_input_scan(std::move(array_scan)) + .set_expr_list(std::move(expr_list)) + .Build(); + } + + // TODO: b/297249122 - investigate propagation of type annotation map for + // collated element column type + absl::StatusOr> BuildGetStructFieldExpr( + const StructType* struct_type, int field_index, + const ResolvedColumn& source_column) { + return ResolvedGetStructFieldBuilder() + .set_type(struct_type->field(field_index).type) + .set_field_idx(field_index) + .set_expr(MakeResolvedColumnRef(struct_type, source_column, + /*is_correlated=*/false)) + .Build(); + } + + // Build a ProjectScan wrapping a JoinScan that represents a left-deep + // FULL JOIN ON + absl::StatusOr> + BuildProjectScanOfJoinOnCoalesce(std::unique_ptr lhs_scan, + std::unique_ptr rhs_scan, + int lhs_index, State& state) { + ZETASQL_RET_CHECK_GE(lhs_index, 0); + ZETASQL_RET_CHECK_LT(lhs_index, state.element_column_count() - 1); + + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr join_scan, + BuildJoinScanWithUsing(std::move(lhs_scan), std::move(rhs_scan), + lhs_index, state)); + + // Build `expr_list` field for ResolvedProjectScan. + const ResolvedColumn& lhs_array_offset = + lhs_index == 0 ? 
state.array_offset_columns().front() + : state.full_join_offset_columns()[lhs_index]; + const ResolvedColumn& rhs_array_offset = + state.array_offset_columns()[lhs_index + 1]; + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr<const ResolvedExpr> coalesce, + BuildCoalesceOfTwoArrayOffsets(lhs_array_offset, rhs_array_offset)); + ResolvedColumn full_join_offset_col = + column_factory_.MakeCol("$full_join", "offset", types::Int64Type()); + std::vector<std::unique_ptr<const ResolvedComputedColumn>> expr_list; + expr_list.push_back( + MakeResolvedComputedColumn(full_join_offset_col, std::move(coalesce))); + + // Populate local state for newly created full join coalesce offset column. + state.SetFullJoinOffsetColumn(lhs_index + 1, full_join_offset_col); + + // Populate the `column_list` with all columns from JoinScan and the newly + // created full join offset column. It contains all array element columns + // and offset columns for input arrays in index range [0, lhs_index + 1]. + ResolvedColumnList column_list = join_scan->column_list(); + column_list.push_back(full_join_offset_col); + + return ResolvedProjectScanBuilder() + .set_column_list(std::move(column_list)) + .set_expr_list(std::move(expr_list)) + .set_input_scan(std::move(join_scan)) + .Build(); + } + + // Build a JoinScan that represents a left-deep + // FULL JOIN + absl::StatusOr<std::unique_ptr<const ResolvedScan>> + BuildJoinScanWithUsing(std::unique_ptr<const ResolvedScan> lhs_scan, + std::unique_ptr<const ResolvedScan> rhs_scan, + int lhs_index, const State& state) { + if (lhs_index == 0) { + ZETASQL_RET_CHECK(lhs_scan->Is<ResolvedArrayScan>()); + } else { + ZETASQL_RET_CHECK(lhs_scan->Is<ResolvedProjectScan>()); + } + ZETASQL_RET_CHECK(rhs_scan->Is<ResolvedArrayScan>()); + + // We build `join_expr` by passing in offset column (lhs_index) and array + // offset column (lhs_index + 1). Note that, if lhs_index is 0, we use array + // offset column, otherwise, full join offset column is used. + const ResolvedColumn& lhs_array_offset = + lhs_index == 0 ?
state.array_offset_columns().front() + : state.full_join_offset_columns()[lhs_index]; + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr join_expr, + BuildJoinExprOfTwoOffsets( + /*lhs_array_offset=*/lhs_array_offset, + /*rhs_array_offset=*/state.array_offset_columns()[lhs_index + 1])); + + // Populate element columns and offset columns to the output `column_list`. + ResolvedColumnList column_list = lhs_scan->column_list(); + column_list.insert(column_list.end(), rhs_scan->column_list().begin(), + rhs_scan->column_list().end()); + + return ResolvedJoinScanBuilder() + .set_column_list(column_list) + .set_join_type(ResolvedJoinScan::FULL) + .set_left_scan(std::move(lhs_scan)) + .set_right_scan(std::move(rhs_scan)) + .set_join_expr(std::move(join_expr)) + .Build(); + } + + // Build the coalesce expr of two array offset columns. + absl::StatusOr> + BuildCoalesceOfTwoArrayOffsets(const ResolvedColumn& lhs_array_offset, + const ResolvedColumn& rhs_array_offset) { + std::vector> array_offsets(2); + array_offsets[0] = + MakeResolvedColumnRef(lhs_array_offset.type(), lhs_array_offset, + /*is_correlated=*/false); + array_offsets[1] = + MakeResolvedColumnRef(rhs_array_offset.type(), rhs_array_offset, + /*is_correlated=*/false); + return fn_builder_.Coalesce(std::move(array_offsets)); + } + + // Build a ResolvedArrayScan out of the `array_expr` as the `index`-th + // argument in the original UNNEST. + absl::StatusOr> + BuildSingletonUnnestArrayScan(int index, State& state) { + ZETASQL_RET_CHECK_GE(index, 0); + ZETASQL_RET_CHECK_LT(index, state.element_column_count()); + + // `array_expr` is a column reference to the `index`-th `arrN` expression in + // the with expr. 
+ const ResolvedColumn& array_expr_column = + state.with_expr_array_columns()[index]; + + const ResolvedColumn array_element_col = column_factory_.MakeCol( + "$array", absl::StrCat("arr", index), + array_expr_column.type()->AsArray()->element_type()); + const ResolvedColumn array_position_col = + column_factory_.MakeCol("$array_offset", "offset", types::Int64Type()); + ResolvedColumnList column_list = {array_element_col, array_position_col}; + + // Populate local states for newly created array columns. + state.SetElementColumn(index, array_element_col); + state.SetArrayOffsetColumn(index, array_position_col); + + return ResolvedArrayScanBuilder() + .set_column_list(column_list) + .add_array_expr_list(MakeResolvedColumnRef(array_expr_column.type(), + array_expr_column, + /*is_correlated=*/true)) + .add_element_column_list(array_element_col) + .set_array_offset_column(MakeResolvedColumnHolder(array_position_col)) + .Build(); + } + + // Build the `join_expr` for a ResolvedJoinScan: + // lhs_array_offset = rhs_array_offset + absl::StatusOr<std::unique_ptr<const ResolvedExpr>> BuildJoinExprOfTwoOffsets( + const ResolvedColumn& lhs_array_offset, + const ResolvedColumn& rhs_array_offset) { + return fn_builder_.Equal( + MakeResolvedColumnRef(lhs_array_offset.type(), lhs_array_offset, + /*is_correlated=*/false), + MakeResolvedColumnRef(rhs_array_offset.type(), rhs_array_offset, + /*is_correlated=*/false)); + } + + TypeFactory& type_factory_; + ColumnFactory column_factory_; + FunctionCallBuilder fn_builder_; +}; + +} // namespace + +class MultiwayUnnestRewriter : public Rewriter { + public: + std::string Name() const override { return "MultiwayUnnestRewriter"; } + + absl::StatusOr<std::unique_ptr<const ResolvedNode>> Rewrite( + const AnalyzerOptions& options, std::unique_ptr<const ResolvedNode> input, + Catalog& catalog, TypeFactory& type_factory, + AnalyzerOutputProperties& output_properties) const override { + ZETASQL_RET_CHECK(options.id_string_pool() != nullptr); + ZETASQL_RET_CHECK(options.column_id_sequence_number() != nullptr); +
MultiwayUnnestRewriteVisitor rewriter(options, catalog, type_factory); + return rewriter.VisitAll(std::move(input)); + }; +}; + +const Rewriter* GetMultiwayUnnestRewriter() { + static const auto* const kRewriter = new MultiwayUnnestRewriter; + return kRewriter; +} + +} // namespace zetasql diff --git a/zetasql/analyzer/rewriters/set_operation_corresponding_rewriter.h b/zetasql/analyzer/rewriters/multiway_unnest_rewriter.h similarity index 66% rename from zetasql/analyzer/rewriters/set_operation_corresponding_rewriter.h rename to zetasql/analyzer/rewriters/multiway_unnest_rewriter.h index 31d17b8bc..ac9056e8b 100644 --- a/zetasql/analyzer/rewriters/set_operation_corresponding_rewriter.h +++ b/zetasql/analyzer/rewriters/multiway_unnest_rewriter.h @@ -14,16 +14,16 @@ // limitations under the License. // -#ifndef ZETASQL_ANALYZER_REWRITERS_SET_OPERATION_CORRESPONDING_REWRITER_H_ -#define ZETASQL_ANALYZER_REWRITERS_SET_OPERATION_CORRESPONDING_REWRITER_H_ +#ifndef ZETASQL_ANALYZER_REWRITERS_MULTIWAY_UNNEST_REWRITER_H_ +#define ZETASQL_ANALYZER_REWRITERS_MULTIWAY_UNNEST_REWRITER_H_ #include "zetasql/public/rewriter_interface.h" namespace zetasql { -// Gets a pointer to the singleton SetOperationCorresponding rewriter. -const Rewriter* GetSetOperationCorrespondingRewriter(); +// Gets a pointer to the singleton multiway unnest rewriter. 
+const Rewriter* GetMultiwayUnnestRewriter(); } // namespace zetasql -#endif // ZETASQL_ANALYZER_REWRITERS_SET_OPERATION_CORRESPONDING_REWRITER_H_ +#endif // ZETASQL_ANALYZER_REWRITERS_MULTIWAY_UNNEST_REWRITER_H_ diff --git a/zetasql/analyzer/rewriters/nulliferror_function_rewriter.cc b/zetasql/analyzer/rewriters/nulliferror_function_rewriter.cc index 5efa0a019..d4d5ccf2d 100644 --- a/zetasql/analyzer/rewriters/nulliferror_function_rewriter.cc +++ b/zetasql/analyzer/rewriters/nulliferror_function_rewriter.cc @@ -22,12 +22,10 @@ #include #include "zetasql/public/analyzer_options.h" -#include "zetasql/public/analyzer_output.h" #include "zetasql/public/analyzer_output_properties.h" #include "zetasql/public/builtin_function.pb.h" #include "zetasql/public/catalog.h" #include "zetasql/public/function.h" -#include "zetasql/public/language_options.h" #include "zetasql/public/options.pb.h" #include "zetasql/public/rewriter_interface.h" #include "zetasql/public/types/type.h" @@ -35,11 +33,10 @@ #include "zetasql/public/value.h" #include "zetasql/resolved_ast/resolved_ast.h" #include "zetasql/resolved_ast/resolved_ast_builder.h" -#include "zetasql/resolved_ast/resolved_ast_deep_copy_visitor.h" #include "zetasql/resolved_ast/resolved_ast_rewrite_visitor.h" #include "zetasql/resolved_ast/resolved_node.h" #include "zetasql/resolved_ast/rewrite_utils.h" -#include "absl/types/span.h" +#include "absl/status/statusor.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status_macros.h" diff --git a/zetasql/analyzer/rewriters/pivot_rewriter.cc b/zetasql/analyzer/rewriters/pivot_rewriter.cc index 19d16a28f..99df68beb 100644 --- a/zetasql/analyzer/rewriters/pivot_rewriter.cc +++ b/zetasql/analyzer/rewriters/pivot_rewriter.cc @@ -19,21 +19,18 @@ #include #include -#include "zetasql/base/logging.h" #include "zetasql/analyzer/expr_resolver_helper.h" #include "zetasql/analyzer/substitute.h" #include "zetasql/common/aggregate_null_handling.h" -#include "zetasql/common/errors.h" 
#include "zetasql/public/analyzer_options.h" -#include "zetasql/public/analyzer_output.h" #include "zetasql/public/analyzer_output_properties.h" #include "zetasql/public/annotation/collation.h" #include "zetasql/public/builtin_function.pb.h" #include "zetasql/public/catalog.h" #include "zetasql/public/function.h" +#include "zetasql/public/function_signature.h" #include "zetasql/public/language_options.h" #include "zetasql/public/options.pb.h" -#include "zetasql/public/parse_location.h" #include "zetasql/public/rewriter_interface.h" #include "zetasql/public/types/array_type.h" #include "zetasql/public/types/struct_type.h" @@ -48,9 +45,10 @@ #include "zetasql/resolved_ast/resolved_node_kind.pb.h" #include "zetasql/resolved_ast/rewrite_utils.h" #include "absl/container/flat_hash_map.h" -#include "absl/memory/memory.h" +#include "zetasql/base/check.h" #include "absl/status/status.h" #include "absl/status/statusor.h" +#include "absl/strings/str_cat.h" #include "absl/strings/substitute.h" #include "absl/types/span.h" #include "zetasql/base/ret_check.h" @@ -115,7 +113,7 @@ class PivotRewriterVisitor : public ResolvedASTDeepCopyVisitor { absl::StatusOr> MakeAggregateExpr( const ResolvedExpr* pivot_expr, const ResolvedExpr* pivot_value_expr, const ResolvedColumn& pivot_column, - const std::vector& agg_fn_arg_columns); + absl::Span agg_fn_arg_columns); // Wraps the input scan of a pivot with a project scan, adding computed // columns holding the result of the FOR expression, plus each argument to the @@ -587,7 +585,7 @@ absl::StatusOr> PivotRewriterVisitor::MakeAggregateExpr( const ResolvedExpr* pivot_expr, const ResolvedExpr* pivot_value_expr, const ResolvedColumn& pivot_column, - const std::vector& agg_fn_arg_columns) { + absl::Span agg_fn_arg_columns) { // This condition guaranteed by the resolver and this check // really belongs in the validator; however, the validator currently has no // way to call IsConstantExpression() without creating a circular build diff --git 
a/zetasql/analyzer/rewriters/registration.cc b/zetasql/analyzer/rewriters/registration.cc index f582b5575..9d1621c96 100644 --- a/zetasql/analyzer/rewriters/registration.cc +++ b/zetasql/analyzer/rewriters/registration.cc @@ -23,7 +23,9 @@ #include "zetasql/base/logging.h" #include "zetasql/public/options.pb.h" #include "absl/container/flat_hash_map.h" -#include "absl/strings/string_view.h" +#include "zetasql/base/check.h" +#include "absl/synchronization/mutex.h" +#include "absl/types/span.h" namespace zetasql { diff --git a/zetasql/analyzer/rewriters/registration.h b/zetasql/analyzer/rewriters/registration.h index 81d69d0ec..414d2fa56 100644 --- a/zetasql/analyzer/rewriters/registration.h +++ b/zetasql/analyzer/rewriters/registration.h @@ -22,11 +22,13 @@ #include #include "zetasql/public/options.pb.h" +#include "absl/base/thread_annotations.h" #include "absl/container/flat_hash_map.h" #include "absl/memory/memory.h" #include "absl/strings/string_view.h" #include "absl/synchronization/mutex.h" #include "absl/types/optional.h" +#include "absl/types/span.h" namespace zetasql { diff --git a/zetasql/analyzer/rewriters/registration_test.cc b/zetasql/analyzer/rewriters/registration_test.cc index d2130b704..23c2b2a1b 100644 --- a/zetasql/analyzer/rewriters/registration_test.cc +++ b/zetasql/analyzer/rewriters/registration_test.cc @@ -20,14 +20,12 @@ #include #include "zetasql/public/analyzer_options.h" -#include "zetasql/public/analyzer_output.h" #include "zetasql/public/options.pb.h" #include "zetasql/public/rewriter_interface.h" #include "zetasql/resolved_ast/resolved_node.h" -#include "gmock/gmock.h" #include "gtest/gtest.h" -#include "absl/memory/memory.h" #include "absl/status/status.h" +#include "absl/status/statusor.h" namespace zetasql { namespace { diff --git a/zetasql/analyzer/rewriters/rewriter_relevance_checker.cc b/zetasql/analyzer/rewriters/rewriter_relevance_checker.cc index d5bc5f0e6..eda495214 100644 --- 
a/zetasql/analyzer/rewriters/rewriter_relevance_checker.cc +++ b/zetasql/analyzer/rewriters/rewriter_relevance_checker.cc @@ -59,6 +59,15 @@ class RewriteApplicabilityChecker : public ResolvedASTVisitor { absl::Status VisitResolvedPivotScan(const ResolvedPivotScan* node) override { applicable_rewrites_->insert(REWRITE_PIVOT); + for (const auto& pivot_expr : node->pivot_expr_list()) { + ZETASQL_RET_CHECK(pivot_expr->Is<ResolvedAggregateFunctionCall>()); + const auto& aggregate_func_call = + pivot_expr->GetAs<ResolvedAggregateFunctionCall>(); + if (aggregate_func_call->function()->Is<SQLFunctionInterface>() || + aggregate_func_call->function()->Is<TemplatedSQLFunction>()) { + applicable_rewrites_->insert(REWRITE_INLINE_SQL_UDAS); + } + } return DefaultVisit(node); } @@ -133,6 +142,14 @@ class RewriteApplicabilityChecker : public ResolvedASTVisitor { case FN_BYTE_ARRAY_LIKE_ALL: case FN_STRING_LIKE_ALL: case FN_BYTE_LIKE_ALL: + case FN_STRING_NOT_LIKE_ANY: + case FN_BYTE_NOT_LIKE_ANY: + case FN_STRING_ARRAY_NOT_LIKE_ANY: + case FN_BYTE_ARRAY_NOT_LIKE_ANY: + case FN_STRING_NOT_LIKE_ALL: + case FN_BYTE_NOT_LIKE_ALL: + case FN_STRING_ARRAY_NOT_LIKE_ALL: + case FN_BYTE_ARRAY_NOT_LIKE_ALL: applicable_rewrites_->insert(REWRITE_LIKE_ANY_ALL); break; default: @@ -149,6 +166,15 @@ class RewriteApplicabilityChecker : public ResolvedASTVisitor { return DefaultVisit(node); } + absl::Status VisitResolvedInsertStmt( + const ResolvedInsertStmt* node) override { + if (!node->row_list().empty() && node->table_scan() != nullptr) { + applicable_rewrites_->insert( + ResolvedASTRewrite::REWRITE_INSERT_DML_VALUES); + } + return DefaultVisit(node); + } + absl::Status VisitResolvedTVFScan(const ResolvedTVFScan* node) override { if (node->tvf()->Is<SQLTableValuedFunction>() || node->tvf()->Is<TemplatedSQLTVF>()) { @@ -159,10 +185,6 @@ class RewriteApplicabilityChecker : public ResolvedASTVisitor { absl::Status VisitResolvedSetOperationScan( const ResolvedSetOperationScan* node) override { - if (node->column_match_mode() != ResolvedSetOperationScan::BY_POSITION) { - applicable_rewrites_->insert( -
ResolvedASTRewrite::REWRITE_SET_OPERATION_CORRESPONDING); - } return DefaultVisit(node); } @@ -175,6 +197,14 @@ class RewriteApplicabilityChecker : public ResolvedASTVisitor { absl::Status VisitResolvedAggregationThresholdAggregateScan( const ResolvedAggregationThresholdAggregateScan* node) override { ZETASQL_RETURN_IF_ERROR(VisitResolvedAggregateScanBasePrivate(node)); + applicable_rewrites_->insert(REWRITE_AGGREGATION_THRESHOLD); + return DefaultVisit(node); + } + + absl::Status VisitResolvedArrayScan(const ResolvedArrayScan* node) override { + if (node->array_expr_list_size() > 1) { + applicable_rewrites_->insert(REWRITE_MULTIWAY_UNNEST); + } return DefaultVisit(node); } @@ -192,7 +222,9 @@ class RewriteApplicabilityChecker : public ResolvedASTVisitor { } } for (const auto& aggregate_comp_col : node->aggregate_list()) { - const auto& aggregate_expr = aggregate_comp_col->expr(); + ZETASQL_RET_CHECK(aggregate_comp_col->Is<ResolvedComputedColumn>()); + const auto& aggregate_expr = + aggregate_comp_col->GetAs<ResolvedComputedColumn>()->expr(); ZETASQL_RET_CHECK(aggregate_expr->Is<ResolvedAggregateFunctionCall>()); const auto& aggregate_func_call = aggregate_expr->GetAs<ResolvedAggregateFunctionCall>(); diff --git a/zetasql/analyzer/rewriters/set_operation_corresponding_rewriter.cc b/zetasql/analyzer/rewriters/set_operation_corresponding_rewriter.cc deleted file mode 100644 index 4279c5b36..000000000 --- a/zetasql/analyzer/rewriters/set_operation_corresponding_rewriter.cc +++ /dev/null @@ -1,93 +0,0 @@ -// -// Copyright 2019 Google LLC -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-// See the License for the specific language governing permissions and -// limitations under the License. -// - -#include "zetasql/analyzer/rewriters/set_operation_corresponding_rewriter.h" - -#include -#include -#include -#include - -#include "zetasql/resolved_ast/resolved_ast.h" -#include "zetasql/resolved_ast/resolved_ast_builder.h" -#include "zetasql/resolved_ast/resolved_ast_rewrite_visitor.h" -#include "zetasql/resolved_ast/rewrite_utils.h" - -namespace zetasql { -namespace { - -bool NeedsProjectScanForByPosition(const ResolvedSetOperationItem& item) { - return item.scan()->column_list() != item.output_column_list(); -} - -class SetOperationCorrespondingRewriteVisitor - : public ResolvedASTRewriteVisitor { - private: - absl::StatusOr> - PostVisitResolvedSetOperationScan( - std::unique_ptr node) override { - if (node->column_match_mode() == ResolvedSetOperationScan::BY_POSITION) { - return node; - } - ResolvedSetOperationScanBuilder builder = ToBuilder(std::move(node)); - std::vector> items = - builder.release_input_item_list(); - for (std::unique_ptr& item : items) { - if (!NeedsProjectScanForByPosition(*item)) { - continue; - } - ResolvedSetOperationItemBuilder item_builder = ToBuilder(std::move(item)); - ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr project_scan, - ResolvedProjectScanBuilder() - .set_input_scan(item_builder.release_scan()) - .set_column_list(item_builder.output_column_list()) - .Build()); - item_builder.set_scan(std::move(project_scan)); - ZETASQL_ASSIGN_OR_RETURN(item, std::move(item_builder).Build()); - } - - builder.set_column_match_mode(ResolvedSetOperationScan::BY_POSITION); - builder.set_column_propagation_mode(ResolvedSetOperationScan::STRICT); - - return std::move(builder).set_input_item_list(std::move(items)).Build(); - } -}; - -} // namespace - -class SetOperationCorrespondingRewriter : public Rewriter { - public: - std::string Name() const override { - return "SetOperationCorrespondingRewriter"; - } - - absl::StatusOr> Rewrite( - 
const AnalyzerOptions& options, std::unique_ptr input, - Catalog& catalog, TypeFactory& type_factory, - AnalyzerOutputProperties& output_properties) const override { - ZETASQL_RET_CHECK(options.id_string_pool() != nullptr); - ZETASQL_RET_CHECK(options.column_id_sequence_number() != nullptr); - SetOperationCorrespondingRewriteVisitor rewriter; - return rewriter.VisitAll(std::move(input)); - }; -}; - -const Rewriter* GetSetOperationCorrespondingRewriter() { - static const auto* const kRewriter = new SetOperationCorrespondingRewriter; - return kRewriter; -} - -} // namespace zetasql diff --git a/zetasql/analyzer/rewriters/sql_function_inliner.cc b/zetasql/analyzer/rewriters/sql_function_inliner.cc index 735b90d94..7120d591f 100644 --- a/zetasql/analyzer/rewriters/sql_function_inliner.cc +++ b/zetasql/analyzer/rewriters/sql_function_inliner.cc @@ -27,11 +27,11 @@ #include "zetasql/base/varsetter.h" #include "zetasql/common/errors.h" -#include "zetasql/parser/parse_tree.h" #include "zetasql/public/analyzer_options.h" #include "zetasql/public/analyzer_output_properties.h" #include "zetasql/public/catalog.h" #include "zetasql/public/function.h" +#include "zetasql/public/function_signature.h" #include "zetasql/public/parse_location.h" #include "zetasql/public/rewriter_interface.h" #include "zetasql/public/sql_function.h" @@ -51,6 +51,7 @@ #include "zetasql/resolved_ast/resolved_node_kind.pb.h" #include "zetasql/resolved_ast/rewrite_utils.h" #include "absl/cleanup/cleanup.h" +#include "absl/container/btree_set.h" #include "absl/container/flat_hash_map.h" #include "absl/container/flat_hash_set.h" #include "absl/memory/memory.h" @@ -357,7 +358,7 @@ class SqlFunctionInlineVistor : public ResolvedASTDeepCopyVisitor { auto arg_expr = ConsumeTopOfStack(); ResolvedColumn arg_column = column_factory_->MakeCol( absl::StrCat("$inlined_", call->function()->Name()), - argument_names[i], arg_expr->type()); + argument_names[i], arg_expr->annotated_type()); args[argument_names[i]] = [type 
= arg_expr->type(), arg_column](bool is_correlated) { return MakeResolvedColumnRef(type, arg_column, is_correlated); @@ -586,7 +587,7 @@ class SqlTableFunctionInlineVistor : public ResolvedASTDeepCopyVisitor { }; ResolvedColumn arg_column = column_factory_->MakeCol( absl::StrCat("$inlined_", scan->tvf()->Name()), arg_name, - arg_expr->type()); + arg_expr->annotated_type()); scalar_arg_exprs.push_back( MakeResolvedComputedColumn(arg_column, std::move(arg_expr))); arg_columns.push_back(arg_column); @@ -750,7 +751,8 @@ class SqlAggregateFunctionInlineVisitor : public ResolvedASTRewriteVisitor { absl::flat_hash_map calls_to_inline; - for (const auto& col : node->aggregate_list()) { + for (const auto& column : node->aggregate_list()) { + const auto* col = column->GetAs(); ZETASQL_RET_CHECK(col->expr()->Is()); const ResolvedAggregateFunctionCall* aggr_function_call = col->expr()->GetAs(); @@ -764,11 +766,12 @@ class SqlAggregateFunctionInlineVisitor : public ResolvedASTRewriteVisitor { return node; } ResolvedAggregateScanBuilder aggr_builder = ToBuilder(std::move(node)); - std::vector> old_aggregates = - aggr_builder.release_aggregate_list(); + std::vector> + old_aggregates = aggr_builder.release_aggregate_list(); // The aggregations included in the aggregate scan post-rewrite. - std::vector> new_aggregates; + std::vector> + new_aggregates; // The column list produced by thew new aggregate scan post-rewrite. std::vector new_aggr_col_list; @@ -790,13 +793,15 @@ class SqlAggregateFunctionInlineVisitor : public ResolvedASTRewriteVisitor { // expressions for the post-aggregate project scan that will host the // expression from the SQL-defined aggregate that modifies or combines the // results of any aggregations called internally. 
- for (auto& aggr : old_aggregates) { + for (auto& aggr_column : old_aggregates) { + ZETASQL_RET_CHECK(aggr_column->Is()); + auto aggr = aggr_column->GetAs(); ZETASQL_RET_CHECK(aggr->expr()->Is()); const ResolvedAggregateFunctionCall* aggr_function_call = aggr->expr()->GetAs(); if (!calls_to_inline.contains(aggr_function_call)) { - new_aggregates.emplace_back(std::move(aggr)); + new_aggregates.emplace_back(std::move(aggr_column)); continue; } AggregateFnDetails& details = calls_to_inline.at(aggr_function_call); @@ -804,9 +809,10 @@ class SqlAggregateFunctionInlineVisitor : public ResolvedASTRewriteVisitor { columns_to_remove_from_aggr.insert(aggr->column()); std::string function_name = aggr_function_call->function()->Name(); - ResolvedComputedColumnBuilder aggr_builder = ToBuilder(std::move(aggr)); ResolvedAggregateFunctionCallBuilder aggr_expr_builder = ToBuilder( - absl::WrapUnique(aggr_builder.release_expr() + absl::WrapUnique(const_cast( + aggr->GetAs()) + ->release_expr() .release() ->GetAs())); auto aggr_args = aggr_expr_builder.release_argument_list(); @@ -821,7 +827,6 @@ class SqlAggregateFunctionInlineVisitor : public ResolvedASTRewriteVisitor { signature.arguments()[i].options().is_not_aggregate(); std::unique_ptr& arg = aggr_args[i]; if (is_non_aggregate_arg) { - ResolvedNodeKind expr_kind = arg->node_kind(); // If we ever extend non-aggregate args beyond these types, the // rewriter will need to change to accommodate as-if-evaluated-once // semantics. The ResolvedAST is not expressive enough for that right @@ -829,9 +834,16 @@ class SqlAggregateFunctionInlineVisitor : public ResolvedASTRewriteVisitor { // aggregation which some query optimizers would not remove. The // expressive power that is needed is a lateral join with a single row // table on the LHS. 
+ const ResolvedExpr* without_cast = arg.get(); + // LINT.IfChange(non_aggregate_args_def) + while (without_cast->node_kind() == RESOLVED_CAST) { + without_cast = without_cast->GetAs()->expr(); + } + ResolvedNodeKind expr_kind = without_cast->node_kind(); ZETASQL_RET_CHECK(expr_kind == RESOLVED_LITERAL || expr_kind == RESOLVED_PARAMETER || expr_kind == RESOLVED_ARGUMENT_REF); + // LINT.ThenChange(../expr_resolver_helper.cc:non_aggregate_args_def) auto arg_replacement_builder = [&arg, this](bool is_correlated) { // Making a copy like this is only safe because the expressions // that are allowed as non-aggregate args are immutable and @@ -850,9 +862,9 @@ class SqlAggregateFunctionInlineVisitor : public ResolvedASTRewriteVisitor { aggregate_args.emplace(details.arg_names[i], arg_replacement_builder); } else { // This is an aggregate arg. - ResolvedColumn new_arg_column = - column_factory_.MakeCol(absl::StrCat("$inlined_", function_name), - details.arg_names[i], arg->type()); + ResolvedColumn new_arg_column = column_factory_.MakeCol( + absl::StrCat("$inlined_", function_name), details.arg_names[i], + arg->annotated_type()); ZETASQL_ASSIGN_OR_RETURN(auto new_arg_computed_col, ResolvedComputedColumnBuilder() .set_column(new_arg_column) @@ -944,7 +956,9 @@ class SqlAggregateFunctionInlineVisitor : public ResolvedASTRewriteVisitor { PostVisitResolvedAggregationThresholdAggregateScan( std::unique_ptr node) override { - for (const auto& computed_col : node->aggregate_list()) { + for (const auto& computed_column : node->aggregate_list()) { + ZETASQL_RET_CHECK(computed_column->Is()); + auto computed_col = computed_column->GetAs(); if (computed_col->expr()->Is() && (computed_col->expr() ->GetAs() @@ -964,6 +978,28 @@ class SqlAggregateFunctionInlineVisitor : public ResolvedASTRewriteVisitor { return node; } + absl::StatusOr> + PostVisitResolvedPivotScan( + std::unique_ptr node) override { + absl::flat_hash_map + calls_to_inline; + for (const auto& expr : 
node->pivot_expr_list()) { + ZETASQL_RET_CHECK(expr->Is()); + const auto* call = expr->GetAs(); + ZETASQL_ASSIGN_OR_RETURN(std::optional details, + IsInlineable(call)); + if (details.has_value()) { + calls_to_inline.emplace(call, *details); + } + } + if (!calls_to_inline.empty()) { + return absl::InvalidArgumentError( + "SQL-defined aggregate functions are not supported in PIVOT"); + } + return node; + } + private: ColumnFactory& column_factory_; }; diff --git a/zetasql/analyzer/rewriters/sql_view_inliner.cc b/zetasql/analyzer/rewriters/sql_view_inliner.cc index c8a42b629..a0c092631 100644 --- a/zetasql/analyzer/rewriters/sql_view_inliner.cc +++ b/zetasql/analyzer/rewriters/sql_view_inliner.cc @@ -30,17 +30,14 @@ #include "zetasql/public/types/type_factory.h" #include "zetasql/resolved_ast/resolved_ast.h" #include "zetasql/resolved_ast/resolved_ast_deep_copy_visitor.h" -#include "zetasql/resolved_ast/resolved_column.h" +#include "zetasql/resolved_ast/resolved_ast_enums.pb.h" #include "zetasql/resolved_ast/resolved_node.h" #include "zetasql/resolved_ast/rewrite_utils.h" -#include "absl/container/flat_hash_map.h" +#include "zetasql/base/check.h" #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/str_format.h" -#include "absl/types/optional.h" -#include "absl/types/span.h" #include "zetasql/base/ret_check.h" -#include "zetasql/base/status_builder.h" #include "zetasql/base/status_macros.h" namespace zetasql { diff --git a/zetasql/analyzer/rewriters/typeof_function_rewriter.cc b/zetasql/analyzer/rewriters/typeof_function_rewriter.cc index ea495d49a..b672b97fc 100644 --- a/zetasql/analyzer/rewriters/typeof_function_rewriter.cc +++ b/zetasql/analyzer/rewriters/typeof_function_rewriter.cc @@ -22,7 +22,6 @@ #include #include "zetasql/public/analyzer_options.h" -#include "zetasql/public/analyzer_output.h" #include "zetasql/public/analyzer_output_properties.h" #include "zetasql/public/builtin_function.pb.h" #include 
"zetasql/public/catalog.h" @@ -38,9 +37,7 @@ #include "zetasql/resolved_ast/resolved_ast_rewrite_visitor.h" #include "zetasql/resolved_ast/resolved_node.h" #include "zetasql/resolved_ast/rewrite_utils.h" -#include "absl/status/status.h" #include "absl/status/statusor.h" -#include "absl/types/span.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status_macros.h" diff --git a/zetasql/analyzer/rewriters/unpivot_rewriter.cc b/zetasql/analyzer/rewriters/unpivot_rewriter.cc index a142a7c49..c36be6822 100644 --- a/zetasql/analyzer/rewriters/unpivot_rewriter.cc +++ b/zetasql/analyzer/rewriters/unpivot_rewriter.cc @@ -21,11 +21,11 @@ #include "zetasql/base/atomic_sequence_num.h" #include "zetasql/public/analyzer_options.h" -#include "zetasql/public/analyzer_output.h" #include "zetasql/public/analyzer_output_properties.h" #include "zetasql/public/builtin_function.pb.h" #include "zetasql/public/catalog.h" #include "zetasql/public/function.h" +#include "zetasql/public/function_signature.h" #include "zetasql/public/id_string.h" #include "zetasql/public/options.pb.h" #include "zetasql/public/rewriter_interface.h" @@ -37,6 +37,7 @@ #include "zetasql/resolved_ast/resolved_column.h" #include "zetasql/resolved_ast/resolved_node.h" #include "zetasql/resolved_ast/rewrite_utils.h" +#include "absl/container/flat_hash_set.h" #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/str_cat.h" @@ -309,10 +310,17 @@ UnpivotRewriterVisitor::CreateArrayScanWithStructElements( std::vector output_column_list = input_scan->column_list(); output_column_list.push_back(unnest_column); + std::vector> array_expr_list; + array_expr_list.push_back(std::move(resolved_function_call)); + std::vector element_column_list; + element_column_list.push_back(unnest_column); return MakeResolvedArrayScan(output_column_list, std::move(input_scan), - std::move(resolved_function_call), unnest_column, + std::move(array_expr_list), + std::move(element_column_list), 
/*array_offset_column=*/nullptr, - /*join_expr=*/nullptr, /*is_outer=*/false); + /*join_expr=*/nullptr, + /*is_outer=*/false, + /*array_zip_mode=*/nullptr); } } // namespace diff --git a/zetasql/analyzer/run_analyzer_test.cc b/zetasql/analyzer/run_analyzer_test.cc index 481e924aa..7600ba65d 100644 --- a/zetasql/analyzer/run_analyzer_test.cc +++ b/zetasql/analyzer/run_analyzer_test.cc @@ -18,6 +18,7 @@ #include #include +#include #include #include #include @@ -268,7 +269,9 @@ absl::StatusOr> CopyAnalyzerOutput( output.analyzer_output_properties(), /*parser_output=*/nullptr, output.deprecation_warnings(), output.undeclared_parameters(), - output.undeclared_positional_parameters(), output.max_column_id()); + output.undeclared_positional_parameters(), + output.max_column_id() + ); } else if (output.resolved_expr() != nullptr) { ZETASQL_RETURN_IF_ERROR(output.resolved_expr()->Accept(&visitor)); ret = std::make_unique( @@ -277,7 +280,9 @@ absl::StatusOr> CopyAnalyzerOutput( output.analyzer_output_properties(), /*parser_output=*/nullptr, output.deprecation_warnings(), output.undeclared_parameters(), - output.undeclared_positional_parameters(), output.max_column_id()); + output.undeclared_positional_parameters(), + output.max_column_id() + ); } ZETASQL_RET_CHECK(ret) << "No resolved AST in AnalyzerOutput"; @@ -2323,6 +2328,7 @@ class AnalyzerTestRunner { case RESOLVED_ALTER_MODEL_STMT: case RESOLVED_ALTER_ROW_ACCESS_POLICY_STMT: case RESOLVED_ALTER_SCHEMA_STMT: + case RESOLVED_ALTER_EXTERNAL_SCHEMA_STMT: case RESOLVED_ALTER_TABLE_SET_OPTIONS_STMT: case RESOLVED_ALTER_TABLE_STMT: case RESOLVED_ALTER_VIEW_STMT: @@ -2342,6 +2348,7 @@ class AnalyzerTestRunner { case RESOLVED_CREATE_PRIVILEGE_RESTRICTION_STMT: case RESOLVED_CREATE_ROW_ACCESS_POLICY_STMT: case RESOLVED_CREATE_SCHEMA_STMT: + case RESOLVED_CREATE_EXTERNAL_SCHEMA_STMT: case RESOLVED_CREATE_SNAPSHOT_TABLE_STMT: case RESOLVED_CREATE_TABLE_FUNCTION_STMT: case RESOLVED_CREATE_VIEW_STMT: @@ -2479,9 +2486,9 @@ class 
AnalyzerTestRunner { } bool CompareColumnDefinitionList( - const std::vector>& + absl::Span> output_col_list, - const std::vector>& + absl::Span> unparsed_col_list) { if (output_col_list.size() != unparsed_col_list.size()) { return false; diff --git a/zetasql/analyzer/substitute.cc b/zetasql/analyzer/substitute.cc index eaf6b4e1b..56e288f3f 100644 --- a/zetasql/analyzer/substitute.cc +++ b/zetasql/analyzer/substitute.cc @@ -31,6 +31,7 @@ #include "zetasql/public/catalog.h" #include "zetasql/public/function.h" #include "zetasql/public/function.pb.h" +#include "zetasql/public/function_signature.h" #include "zetasql/public/multi_catalog.h" #include "zetasql/public/options.pb.h" #include "zetasql/public/simple_catalog.h" @@ -130,7 +131,7 @@ class ColumnRefReplacer : public ResolvedASTDeepCopyVisitor { const ResolvedColumn& column) override { if (!column_map_.contains(column)) { column_map_[column] = column_factory_.MakeCol( - column.table_name(), column.name(), column.type()); + column.table_name(), column.name(), column.annotated_type()); } return column_map_[column]; } @@ -307,6 +308,17 @@ absl::Status ExpressionSubstitutor::SetupLambdasCatalog( /*name=*/name, /*group=*/"SubstitutionLambda", Function::SCALAR, FunctionOptions()); + // When the lambda body has a type annotation map, that type annotation map + // needs to be conveyed to the fake function we're creating as a body + // placeholder. + FunctionSignatureOptions signature_options; + if (lambda->body()->type_annotation_map()) { + signature_options.set_compute_result_annotations_callback( + [map = lambda->body()->type_annotation_map()]( + const AnnotationCallbackArgs& args, TypeFactory& type_factory) + -> absl::StatusOr { return map; }); + } + // We add a signature for the injected lambda function with the following // properties. The context_id is set to kSubstitutionLambdaContextId // in order to differentiate it as an injected lambda function. 
The result @@ -316,7 +328,7 @@ absl::Status ExpressionSubstitutor::SetupLambdasCatalog( /*arguments=*/ {{SignatureArgumentKind::ARG_TYPE_ARBITRARY, FunctionArgumentType::REPEATED}}, - /*context_id=*/kSubstitutionLambdaContextId)); + /*context_id=*/kSubstitutionLambdaContextId, signature_options)); lambdas_catalog_->AddOwnedFunction(std::move(lambda_function)); } diff --git a/zetasql/analyzer/testdata/aggregation.test b/zetasql/analyzer/testdata/aggregation.test index ce2c95931..2a16c6aec 100644 --- a/zetasql/analyzer/testdata/aggregation.test +++ b/zetasql/analyzer/testdata/aggregation.test @@ -837,6 +837,87 @@ QueryStmt +-Literal(type=INT64, value=1) == +# Regression test b/292533989: +# GROUP BY key fails to match with the pre-group by expression of the aliased +# select list item. +[no_enable_literal_replacement] +select key + 1 AS alias +from KeyValue +group by key + 1 +HAVING TRUE +-- +QueryStmt ++-output_column_list= +| +-$groupby.$groupbycol1#4 AS alias [INT64] ++-query= + +-ProjectScan + +-column_list=[$groupby.$groupbycol1#4] + +-input_scan= + +-FilterScan + +-column_list=[$groupby.$groupbycol1#4] + +-input_scan= + | +-AggregateScan + | +-column_list=[$groupby.$groupbycol1#4] + | +-input_scan= + | | +-ProjectScan + | | +-column_list=[KeyValue.Key#1, $pre_groupby.alias#3] + | | +-expr_list= + | | | +-alias#3 := + | | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | | | +-Literal(type=INT64, value=1) + | | +-input_scan= + | | +-TableScan(column_list=[KeyValue.Key#1], table=KeyValue, column_index_list=[0]) + | +-group_by_list= + | +-$groupbycol1#4 := ColumnRef(type=INT64, column=$pre_groupby.alias#3) + +-filter_expr= + +-Literal(type=BOOL, value=true) +== + +# Regression test b/292533989: +# GROUP BY key match with column referenced by the pre-group by expression of +# the aliased select list item. 
+select key + 1 AS alias +from KeyValue +group by key +HAVING TRUE +-- +QueryStmt ++-output_column_list= +| +-$query.alias#5 AS alias [INT64] ++-query= + +-ProjectScan + +-column_list=[$query.alias#5] + +-input_scan= + +-FilterScan + +-column_list=[$groupby.key#4, $query.alias#5] + +-input_scan= + | +-ProjectScan + | +-column_list=[$groupby.key#4, $query.alias#5] + | +-expr_list= + | | +-alias#5 := + | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$groupby.key#4) + | | +-Literal(type=INT64, value=1) + | +-input_scan= + | +-AggregateScan + | +-column_list=[$groupby.key#4] + | +-input_scan= + | | +-ProjectScan + | | +-column_list=[KeyValue.Key#1, $pre_groupby.alias#3] + | | +-expr_list= + | | | +-alias#3 := + | | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | | | +-Literal(type=INT64, value=1) + | | +-input_scan= + | | +-TableScan(column_list=[KeyValue.Key#1], table=KeyValue, column_index_list=[0]) + | +-group_by_list= + | +-key#4 := ColumnRef(type=INT64, column=KeyValue.Key#1) + +-filter_expr= + +-Literal(type=BOOL, value=true) +== + select key as c, value as c from KeyValue group by c -- ERROR: Name c in GROUP BY clause is ambiguous; it may refer to multiple columns in the SELECT-list [at 1:52] @@ -1041,6 +1122,8 @@ QueryStmt +-Literal(type=INT64, value=1) == +# GROUP BY key match with column referenced by the pre-group by expression of +# the aliased select list item. 
select key+1 as k from KeyValue group by key, k -- QueryStmt @@ -2588,31 +2671,31 @@ select 10 from ArrayTypes group by -- QueryStmt +-output_column_list= -| +-$query.$col1#32 AS `$col1` [INT64] +| +-$query.$col1#35 AS `$col1` [INT64] +-query= +-ProjectScan - +-column_list=[$query.$col1#32] + +-column_list=[$query.$col1#35] +-expr_list= - | +-$col1#32 := Literal(type=INT64, value=10) + | +-$col1#35 := Literal(type=INT64, value=10) +-input_scan= +-AggregateScan +-input_scan= | +-TableScan(column_list=ArrayTypes.[Int32Array#1, Int64Array#2, UInt32Array#3, UInt64Array#4, StringArray#5, BytesArray#6, BoolArray#7, FloatArray#8, DoubleArray#9, DateArray#10, TimestampSecondsArray#11, TimestampMillisArray#12, TimestampMicrosArray#13, TimestampArray#14], table=ArrayTypes, column_index_list=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]) +-group_by_list= - +-Int32Array#18 := ColumnRef(type=ARRAY, column=ArrayTypes.Int32Array#1) - +-Int64Array#19 := ColumnRef(type=ARRAY, column=ArrayTypes.Int64Array#2) - +-UInt32Array#20 := ColumnRef(type=ARRAY, column=ArrayTypes.UInt32Array#3) - +-UInt64Array#21 := ColumnRef(type=ARRAY, column=ArrayTypes.UInt64Array#4) - +-StringArray#22 := ColumnRef(type=ARRAY, column=ArrayTypes.StringArray#5) - +-BytesArray#23 := ColumnRef(type=ARRAY, column=ArrayTypes.BytesArray#6) - +-BoolArray#24 := ColumnRef(type=ARRAY, column=ArrayTypes.BoolArray#7) - +-FloatArray#25 := ColumnRef(type=ARRAY, column=ArrayTypes.FloatArray#8) - +-DoubleArray#26 := ColumnRef(type=ARRAY, column=ArrayTypes.DoubleArray#9) - +-DateArray#27 := ColumnRef(type=ARRAY, column=ArrayTypes.DateArray#10) - +-TimestampSecondsArray#28 := ColumnRef(type=ARRAY, column=ArrayTypes.TimestampSecondsArray#11) - +-TimestampMillisArray#29 := ColumnRef(type=ARRAY, column=ArrayTypes.TimestampMillisArray#12) - +-TimestampMicrosArray#30 := ColumnRef(type=ARRAY, column=ArrayTypes.TimestampMicrosArray#13) - +-TimestampArray#31 := ColumnRef(type=ARRAY, column=ArrayTypes.TimestampArray#14) + 
+-Int32Array#21 := ColumnRef(type=ARRAY, column=ArrayTypes.Int32Array#1) + +-Int64Array#22 := ColumnRef(type=ARRAY, column=ArrayTypes.Int64Array#2) + +-UInt32Array#23 := ColumnRef(type=ARRAY, column=ArrayTypes.UInt32Array#3) + +-UInt64Array#24 := ColumnRef(type=ARRAY, column=ArrayTypes.UInt64Array#4) + +-StringArray#25 := ColumnRef(type=ARRAY, column=ArrayTypes.StringArray#5) + +-BytesArray#26 := ColumnRef(type=ARRAY, column=ArrayTypes.BytesArray#6) + +-BoolArray#27 := ColumnRef(type=ARRAY, column=ArrayTypes.BoolArray#7) + +-FloatArray#28 := ColumnRef(type=ARRAY, column=ArrayTypes.FloatArray#8) + +-DoubleArray#29 := ColumnRef(type=ARRAY, column=ArrayTypes.DoubleArray#9) + +-DateArray#30 := ColumnRef(type=ARRAY, column=ArrayTypes.DateArray#10) + +-TimestampSecondsArray#31 := ColumnRef(type=ARRAY, column=ArrayTypes.TimestampSecondsArray#11) + +-TimestampMillisArray#32 := ColumnRef(type=ARRAY, column=ArrayTypes.TimestampMillisArray#12) + +-TimestampMicrosArray#33 := ColumnRef(type=ARRAY, column=ArrayTypes.TimestampMicrosArray#13) + +-TimestampArray#34 := ColumnRef(type=ARRAY, column=ArrayTypes.TimestampArray#14) == # Test GROUP BY with an unsupported type. @@ -2638,18 +2721,18 @@ ALTERNATION GROUP: V_1_2_GROUP_BY_STRUCT,V_1_2_GROUP_BY_ARRAY -- QueryStmt +-output_column_list= -| +-$query.$col1#19 AS `$col1` [INT64] +| +-$query.$col1#22 AS `$col1` [INT64] +-query= +-ProjectScan - +-column_list=[$query.$col1#19] + +-column_list=[$query.$col1#22] +-expr_list= - | +-$col1#19 := Literal(type=INT64, value=10) + | +-$col1#22 := Literal(type=INT64, value=10) +-input_scan= +-AggregateScan +-input_scan= | +-TableScan(column_list=[ArrayTypes.StructArray#16], table=ArrayTypes, column_index_list=[15]) +-group_by_list= - +-StructArray#18 := ColumnRef(type=ARRAY>, column=ArrayTypes.StructArray#16) + +-StructArray#21 := ColumnRef(type=ARRAY>, column=ArrayTypes.StructArray#16) == # Test GROUP BY for an array of structs containing a proto. 
@@ -5223,18 +5306,56 @@ select array_agg(Key limit @test_param_int32) select array_agg(Key limit KitchenSink.int32_val) from TestTable -- -ERROR: Syntax error: Unexpected identifier "KitchenSink" [at 1:28] +ALTERNATION GROUP: V_1_1_LIMIT_IN_AGGREGATE +-- +ERROR: Unrecognized name: KitchenSink [at 1:28] select array_agg(Key limit KitchenSink.int32_val) ^ +-- +ALTERNATION GROUP: +-- +ERROR: LIMIT in aggregate function arguments is not supported [at 1:22] +select array_agg(Key limit KitchenSink.int32_val) + ^ == -[language_features={{V_1_1_LIMIT_IN_AGGREGATE|}}] +[language_features={{V_1_1_LIMIT_IN_AGGREGATE|V_1_1_LIMIT_IN_AGGREGATE,V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] select array_agg(Key limit 1 + 2) from TestTable -- -ERROR: Syntax error: Expected ")" but got "+" [at 1:30] +ALTERNATION GROUP: V_1_1_LIMIT_IN_AGGREGATE +-- +ERROR: LIMIT expects an integer literal or parameter [at 1:28] select array_agg(Key limit 1 + 2) - ^ + ^ +-- +ALTERNATION GROUP: V_1_1_LIMIT_IN_AGGREGATE,V_1_4_LIMIT_OFFSET_EXPRESSIONS +-- +QueryStmt ++-output_column_list= +| +-$aggregate.$agg1#4 AS `$col1` [ARRAY] ++-query= + +-ProjectScan + +-column_list=[$aggregate.$agg1#4] + +-input_scan= + +-AggregateScan + +-column_list=[$aggregate.$agg1#4] + +-input_scan= + | +-TableScan(column_list=[TestTable.key#1], table=TestTable, column_index_list=[0]) + +-aggregate_list= + +-$agg1#4 := + +-AggregateFunctionCall(ZetaSQL:array_agg(INT32) -> ARRAY) + +-ColumnRef(type=INT32, column=TestTable.key#1) + +-limit= + +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + +-Literal(type=INT64, value=1) + +-Literal(type=INT64, value=2) +-- +ALTERNATION GROUP: +-- +ERROR: LIMIT in aggregate function arguments is not supported [at 1:22] +select array_agg(Key limit 1 + 2) + ^ == [language_features={{V_1_1_HAVING_IN_AGGREGATE|}}] diff --git a/zetasql/analyzer/testdata/alter_external_schema.test b/zetasql/analyzer/testdata/alter_external_schema.test new file mode 100644 index 000000000..3b586b99f --- /dev/null +++ 
b/zetasql/analyzer/testdata/alter_external_schema.test @@ -0,0 +1,110 @@ +# Tests for ALTER EXTERNAL SCHEMA statements, including unsupported actions. +[default language_features=EXTERNAL_SCHEMA_DDL] + +ALTER EXTERNAL SCHEMA {{|IF EXISTS}} myProject.mySchema SET OPTIONS (x=1) +-- +ALTERNATION GROUP: +-- +AlterExternalSchemaStmt ++-name_path=myProject.mySchema ++-alter_action_list= + +-SetOptionsAction + +-option_list= + +-x := Literal(type=INT64, value=1) +-- +ALTERNATION GROUP: IF EXISTS +-- +AlterExternalSchemaStmt ++-name_path=myProject.mySchema ++-alter_action_list= +| +-SetOptionsAction +| +-option_list= +| +-x := Literal(type=INT64, value=1) ++-is_if_exists=TRUE +== + +[language_features=] + +ALTER EXTERNAL SCHEMA {{|IF EXISTS}} myProject.mySchema SET OPTIONS (x=1) +-- +ALTERNATION GROUP: +-- +ERROR: ALTER EXTERNAL SCHEMA is not supported [at 1:1] +ALTER EXTERNAL SCHEMA myProject.mySchema SET OPTIONS (x=1) +^ +-- +ALTERNATION GROUP: IF EXISTS +-- +ERROR: ALTER EXTERNAL SCHEMA is not supported [at 1:1] +ALTER EXTERNAL SCHEMA IF EXISTS myProject.mySchema SET OPTIONS (x=1) +^ +== + +ALTER EXTERNAL SCHEMA myProject.mySchema +SET AS JSON '{"key": "value"}'; + +-- +ERROR: ALTER EXTERNAL SCHEMA does not support SET AS [at 2:1] +SET AS JSON '{"key": "value"}'; +^ +== + +ALTER EXTERNAL SCHEMA myProject.mySchema +ADD COLUMN bar STRING; + +-- +ERROR: ALTER EXTERNAL SCHEMA does not support ADD COLUMN [at 2:1] +ADD COLUMN bar STRING; +^ +== + +ALTER EXTERNAL SCHEMA myProject.mySchema +FALSE_ACTION; + +-- +ERROR: Syntax error: Unexpected identifier "FALSE_ACTION" [at 2:1] +FALSE_ACTION; +^ +== + +[language_features=V_1_3_COLLATION_SUPPORT,V_1_3_ANNOTATION_FRAMEWORK,EXTERNAL_SCHEMA_DDL] +alter external schema if exists entity set default collate 'unicode:ci'; +-- +ERROR: ALTER EXTERNAL SCHEMA does not support SET DEFAULT COLLATE [at 1:40] +alter external schema if exists entity set default collate 'unicode:ci'; + ^ +== + +ALTER EXTERNAL SCHEMA foo SET OPTIONS(x=1), SET OPTIONS(y=5) +--
+AlterExternalSchemaStmt ++-name_path=foo ++-alter_action_list= + +-SetOptionsAction + | +-option_list= + | +-x := Literal(type=INT64, value=1) + +-SetOptionsAction + +-option_list= + +-y := Literal(type=INT64, value=5) +== + +[disallow_duplicate_options] +ALTER EXTERNAL SCHEMA foo SET OPTIONS(x=1), SET OPTIONS (x=5) +-- +AlterExternalSchemaStmt ++-name_path=foo ++-alter_action_list= + +-SetOptionsAction + | +-option_list= + | +-x := Literal(type=INT64, value=1) + +-SetOptionsAction + +-option_list= + +-x := Literal(type=INT64, value=5) +== + +[disallow_duplicate_options] +ALTER EXTERNAL SCHEMA foo SET OPTIONS(x=1, x=5) +-- +ERROR: Duplicate option specified for 'x' [at 1:44] +ALTER EXTERNAL SCHEMA foo SET OPTIONS(x=1, x=5) + ^ diff --git a/zetasql/analyzer/testdata/analytic_function_partitionby_orderby.test b/zetasql/analyzer/testdata/analytic_function_partitionby_orderby.test index 016f253f7..c238b26eb 100644 --- a/zetasql/analyzer/testdata/analytic_function_partitionby_orderby.test +++ b/zetasql/analyzer/testdata/analytic_function_partitionby_orderby.test @@ -421,7 +421,7 @@ select key as a, value as a from keyvalue order by afn_agg() over (partition by a) -- -ERROR: Name a in PARTITION BY is ambiguous; it may refer to multiple columns in the SELECT-list [at 3:39] +ERROR: Column name a is ambiguous [at 3:39] order by afn_agg() over (partition by a) ^ == @@ -430,7 +430,7 @@ select key as a, key as a from keyvalue order by afn_agg() over (partition by a) -- -ERROR: Name a in PARTITION BY is ambiguous; it may refer to multiple columns in the SELECT-list [at 3:39] +ERROR: Column name a is ambiguous [at 3:39] order by afn_agg() over (partition by a) ^ == @@ -440,7 +440,7 @@ select key as a, value as a from keyvalue order by afn_agg() over (order by a) -- -ERROR: Name a in Window ORDER BY is ambiguous; it may refer to multiple columns in the SELECT-list [at 3:35] +ERROR: Column name a is ambiguous [at 3:35] order by afn_agg() over (order by a) ^ == diff --git 
a/zetasql/analyzer/testdata/analytic_functions.test b/zetasql/analyzer/testdata/analytic_functions.test index 60a68bc48..9a05eb1b6 100644 --- a/zetasql/analyzer/testdata/analytic_functions.test +++ b/zetasql/analyzer/testdata/analytic_functions.test @@ -2476,3 +2476,210 @@ QueryStmt +-$analytic1#4 := +-AnalyticFunctionCall(ZetaSQL:ntile(INT64) -> INT64) +-Literal(type=INT64, value=3) +== + +# This previously triggered a bug saying it was not allowed to +# partition by or order by a proto. +select tv +from TestExtraValueTable tv +order by rank() over (partition by tv.int32_val1 + order by tv.int32_val2) +-- +QueryStmt ++-output_column_list= +| +-TestExtraValueTable.value#1 AS tv [PROTO] ++-query= + +-OrderByScan + +-column_list=[TestExtraValueTable.value#1] + +-is_ordered=TRUE + +-input_scan= + | +-AnalyticScan + | +-column_list=[TestExtraValueTable.value#1, $analytic.$analytic1#4] + | +-input_scan= + | | +-ProjectScan + | | +-column_list=[TestExtraValueTable.value#1, $partitionby.int32_val1#5, $orderby.int32_val2#6] + | | +-expr_list= + | | | +-int32_val1#5 := + | | | | +-GetProtoField + | | | | +-type=INT32 + | | | | +-expr= + | | | | | +-ColumnRef(type=PROTO, column=TestExtraValueTable.value#1) + | | | | +-field_descriptor=int32_val1 + | | | | +-default_value=0 + | | | +-int32_val2#6 := + | | | +-GetProtoField + | | | +-type=INT32 + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestExtraValueTable.value#1) + | | | +-field_descriptor=int32_val2 + | | | +-default_value=0 + | | +-input_scan= + | | +-TableScan(column_list=[TestExtraValueTable.value#1], table=TestExtraValueTable, column_index_list=[0], alias="tv") + | +-function_group_list= + | +-AnalyticFunctionGroup + | +-partition_by= + | | +-WindowPartitioning + | | +-partition_by_list= + | | +-ColumnRef(type=INT32, column=$partitionby.int32_val1#5) + | +-order_by= + | | +-WindowOrdering + | | +-order_by_item_list= + | | +-OrderByItem + | | +-column_ref= + | | +-ColumnRef(type=INT32,
column=$orderby.int32_val2#6) + | +-analytic_function_list= + | +-$analytic1#4 := AnalyticFunctionCall(ZetaSQL:rank() -> INT64) + +-order_by_item_list= + +-OrderByItem + +-column_ref= + +-ColumnRef(type=INT64, column=$analytic.$analytic1#4) +== + +select tv +from TestExtraValueTable tv +order by count(*) over ({{order|partition}} by tv) +-- +ALTERNATION GROUP: order +-- +ERROR: Ordering by expressions of type PROTO is not allowed [at 3:34] +order by count(*) over (order by tv) + ^ +-- +ALTERNATION GROUP: partition +-- +ERROR: Partitioning by expressions of type PROTO is not allowed [at 3:38] +order by count(*) over (partition by tv) + ^ +== + +select tv, tv +from TestExtraValueTable tv +order by count(*) over (partition by {{tv|tv.int32_val1}}) +-- +ALTERNATION GROUP: tv +-- +ERROR: Column name tv is ambiguous [at 3:38] +order by count(*) over (partition by tv) + ^ +-- +ALTERNATION GROUP: tv.int32_val1 +-- +ERROR: Column name tv is ambiguous [at 3:38] +order by count(*) over (partition by tv.int32_val1) + ^ +== + +select count(*) cnt +from TestExtraValueTable tv +order by count(*) over (partition by cnt order by cnt) +-- +QueryStmt ++-output_column_list= +| +-$aggregate.cnt#4 AS cnt [INT64] ++-query= + +-OrderByScan + +-column_list=[$aggregate.cnt#4] + +-is_ordered=TRUE + +-input_scan= + | +-AnalyticScan + | +-column_list=[$aggregate.cnt#4, $analytic.$analytic1#5] + | +-input_scan= + | | +-AggregateScan + | | +-column_list=[$aggregate.cnt#4] + | | +-input_scan= + | | | +-TableScan(table=TestExtraValueTable, alias="tv") + | | +-aggregate_list= + | | +-cnt#4 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | +-function_group_list= + | +-AnalyticFunctionGroup + | +-partition_by= + | | +-WindowPartitioning + | | +-partition_by_list= + | | +-ColumnRef(type=INT64, column=$aggregate.cnt#4) + | +-order_by= + | | +-WindowOrdering + | | +-order_by_item_list= + | | +-OrderByItem + | | +-column_ref= + | | +-ColumnRef(type=INT64, column=$aggregate.cnt#4) + | 
+-analytic_function_list= + | +-$analytic1#5 := + | +-AnalyticFunctionCall(ZetaSQL:$count_star() -> INT64) + | +-window_frame= + | +-WindowFrame(frame_unit=RANGE) + | +-start_expr= + | | +-WindowFrameExpr(boundary_type=UNBOUNDED PRECEDING) + | +-end_expr= + | +-WindowFrameExpr(boundary_type=CURRENT ROW) + +-order_by_item_list= + +-OrderByItem + +-column_ref= + +-ColumnRef(type=INT64, column=$analytic.$analytic1#5) +== + +select count(*) cnt +from TestExtraValueTable tv +order by count(*) over ({{order|partition}} by sum(cnt)) +-- +ALTERNATION GROUP: order +-- +ERROR: Aggregations of aggregations are not allowed [at 3:38] +order by count(*) over (order by sum(cnt)) + ^ +-- +ALTERNATION GROUP: partition +-- +ERROR: Aggregations of aggregations are not allowed [at 3:42] +order by count(*) over (partition by sum(cnt)) + ^ +== + +select count(*) OVER () AS cnt +from TestExtraValueTable tv +order by count(*) over ({{order|partition}} by cnt) +-- +ALTERNATION GROUP: order +-- +ERROR: Column cnt contains an analytic function, which is not allowed in Window ORDER BY [at 3:34] +order by count(*) over (order by cnt) + ^ +-- +ALTERNATION GROUP: partition +-- +ERROR: Column cnt contains an analytic function, which is not allowed in PARTITION BY [at 3:38] +order by count(*) over (partition by cnt) + ^ +== + +select count(*) OVER () AS cnt +from TestExtraValueTable tv +order by count(*) over ({{order|partition}} by sum(cnt)) +-- +ALTERNATION GROUP: order +-- +ERROR: Analytic functions cannot be arguments to aggregate functions [at 3:38] +order by count(*) over (order by sum(cnt)) + ^ +-- +ALTERNATION GROUP: partition +-- +ERROR: Analytic functions cannot be arguments to aggregate functions [at 3:42] +order by count(*) over (partition by sum(cnt)) + ^ +== + +select STRUCT(count(*) AS cnt) s_cnt +from TestExtraValueTable tv +order by count(*) over ({{order|partition}} by s_cnt.cnt, sum(s_cnt.cnt)) +-- +ALTERNATION GROUP: order +-- +ERROR: Aggregations of aggregations are not 
allowed [at 3:49] +order by count(*) over (order by s_cnt.cnt, sum(s_cnt.cnt)) + ^ +-- +ALTERNATION GROUP: partition +-- +ERROR: Aggregations of aggregations are not allowed [at 3:53] +order by count(*) over (partition by s_cnt.cnt, sum(s_cnt.cnt)) + ^ diff --git a/zetasql/analyzer/testdata/anon_functions.test b/zetasql/analyzer/testdata/anon_functions.test index 4def3ff3d..9d9022559 100644 --- a/zetasql/analyzer/testdata/anon_functions.test +++ b/zetasql/analyzer/testdata/anon_functions.test @@ -1,5 +1,7 @@ # Valid function calls [language_features=ANONYMIZATION] +[default also_show_signature_mismatch_details] + select ANON_COUNT(string clamped between 0 and 1), ANON_COUNT(* clamped between 0 and 1) from SimpleTypes -- QueryStmt @@ -222,15 +224,39 @@ ERROR: No matching signature for aggregate operator ANON_COUNT(*) for argument t select ANON_COUNT(* clamped between 0 and "def") from SimpleTypes ^ -- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator ANON_COUNT(*) + Argument types: INT64, STRING + Signature: ANON_COUNT(* [CLAMPED BETWEEN INT64 AND INT64]) + Argument 2: Unable to coerce type STRING to expected type INT64 [at 1:8] +select ANON_COUNT(* clamped between 0 and "def") from SimpleTypes + ^ +-- ALTERNATION GROUP: "abc",1 -- ERROR: No matching signature for aggregate operator ANON_COUNT(*) for argument types: STRING, INT64. 
Supported signature: ANON_COUNT(* [CLAMPED BETWEEN INT64 AND INT64]) [at 1:8] select ANON_COUNT(* clamped between "abc" and 1) from SimpleTypes ^ -- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator ANON_COUNT(*) + Argument types: STRING, INT64 + Signature: ANON_COUNT(* [CLAMPED BETWEEN INT64 AND INT64]) + Argument 1: Unable to coerce type STRING to expected type INT64 [at 1:8] +select ANON_COUNT(* clamped between "abc" and 1) from SimpleTypes + ^ +-- ALTERNATION GROUP: "abc","def" -- ERROR: No matching signature for aggregate operator ANON_COUNT(*) for argument types: STRING, STRING. Supported signature: ANON_COUNT(* [CLAMPED BETWEEN INT64 AND INT64]) [at 1:8] +select ANON_COUNT(* clamped between "abc" and "def") from SimpleTypes + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator ANON_COUNT(*) + Argument types: STRING, STRING + Signature: ANON_COUNT(* [CLAMPED BETWEEN INT64 AND INT64]) + Argument 1: Unable to coerce type STRING to expected type INT64 [at 1:8] select ANON_COUNT(* clamped between "abc" and "def") from SimpleTypes ^ == @@ -274,15 +300,39 @@ ERROR: No matching signature for aggregate function ANON_COUNT for argument type select ANON_COUNT(string clamped between 0 and "def") from SimpleTypes ^ -- +Signature Mismatch Details: +ERROR: No matching signature for aggregate function ANON_COUNT + Argument types: STRING, INT64, STRING + Signature: ANON_COUNT(ANY [CLAMPED BETWEEN INT64 AND INT64]) + Argument 3: Unable to coerce type STRING to expected type INT64 [at 1:8] +select ANON_COUNT(string clamped between 0 and "def") from SimpleTypes + ^ +-- ALTERNATION GROUP: "abc",1 -- ERROR: No matching signature for aggregate function ANON_COUNT for argument types: STRING, STRING, INT64. 
Supported signature: ANON_COUNT(ANY [CLAMPED BETWEEN INT64 AND INT64]) [at 1:8] select ANON_COUNT(string clamped between "abc" and 1) from SimpleTypes ^ -- +Signature Mismatch Details: +ERROR: No matching signature for aggregate function ANON_COUNT + Argument types: STRING, STRING, INT64 + Signature: ANON_COUNT(ANY [CLAMPED BETWEEN INT64 AND INT64]) + Argument 2: Unable to coerce type STRING to expected type INT64 [at 1:8] +select ANON_COUNT(string clamped between "abc" and 1) from SimpleTypes + ^ +-- ALTERNATION GROUP: "abc","def" -- ERROR: No matching signature for aggregate function ANON_COUNT for argument types: STRING, STRING, STRING. Supported signature: ANON_COUNT(ANY [CLAMPED BETWEEN INT64 AND INT64]) [at 1:8] +select ANON_COUNT(string clamped between "abc" and "def") from SimpleTypes + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate function ANON_COUNT + Argument types: STRING, STRING, STRING + Signature: ANON_COUNT(ANY [CLAMPED BETWEEN INT64 AND INT64]) + Argument 2: Unable to coerce type STRING to expected type INT64 [at 1:8] select ANON_COUNT(string clamped between "abc" and "def") from SimpleTypes ^ == @@ -367,6 +417,18 @@ QueryStmt select ANON_SUM(string clamped between 0 and 1) from SimpleTypes -- ERROR: No matching signature for aggregate function ANON_SUM for argument types: STRING, INT64, INT64. 
Supported signatures: ANON_SUM(INT64 [CLAMPED BETWEEN INT64 AND INT64]), ANON_SUM(UINT64 [CLAMPED BETWEEN UINT64 AND UINT64]), ANON_SUM(DOUBLE [CLAMPED BETWEEN DOUBLE AND DOUBLE]) [at 1:8] +select ANON_SUM(string clamped between 0 and 1) from SimpleTypes + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate function ANON_SUM + Argument types: STRING, INT64, INT64 + Signature: ANON_SUM(INT64 [CLAMPED BETWEEN INT64 AND INT64]) + Argument 1: Unable to coerce type STRING to expected type INT64 + Signature: ANON_SUM(UINT64 [CLAMPED BETWEEN UINT64 AND UINT64]) + Argument 1: Unable to coerce type STRING to expected type UINT64 + Signature: ANON_SUM(DOUBLE [CLAMPED BETWEEN DOUBLE AND DOUBLE]) + Argument 1: Unable to coerce type STRING to expected type DOUBLE [at 1:8] select ANON_SUM(string clamped between 0 and 1) from SimpleTypes ^ == @@ -376,6 +438,14 @@ select ANON_SUM(string clamped between 0 and 1) from SimpleTypes select ANON_AVG(string clamped between 0 and 1) from SimpleTypes -- ERROR: No matching signature for aggregate function ANON_AVG for argument types: STRING, INT64, INT64. Supported signature: ANON_AVG(DOUBLE [CLAMPED BETWEEN DOUBLE AND DOUBLE]) [at 1:8] +select ANON_AVG(string clamped between 0 and 1) from SimpleTypes + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate function ANON_AVG + Argument types: STRING, INT64, INT64 + Signature: ANON_AVG(DOUBLE [CLAMPED BETWEEN DOUBLE AND DOUBLE]) + Argument 1: Unable to coerce type STRING to expected type DOUBLE [at 1:8] select ANON_AVG(string clamped between 0 and 1) from SimpleTypes ^ == @@ -678,6 +748,14 @@ select with anonymization ANON_VAR_POP(double_array) from ArrayWithAnonymizationUid; -- ERROR: No matching signature for aggregate function ANON_VAR_POP for argument types: ARRAY. 
Supported signatures: ANON_VAR_POP(DOUBLE [CLAMPED BETWEEN DOUBLE AND DOUBLE]) [at 1:27] +select with anonymization ANON_VAR_POP(double_array) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate function ANON_VAR_POP + Argument types: ARRAY + Signature: ANON_VAR_POP(DOUBLE [CLAMPED BETWEEN DOUBLE AND DOUBLE]) + Argument 1: Unable to coerce type ARRAY to expected type DOUBLE [at 1:27] select with anonymization ANON_VAR_POP(double_array) ^ == @@ -689,6 +767,14 @@ select with anonymization ANON_STDDEV_POP(double_array) from ArrayWithAnonymizationUid; -- ERROR: No matching signature for aggregate function ANON_STDDEV_POP for argument types: ARRAY. Supported signatures: ANON_STDDEV_POP(DOUBLE [CLAMPED BETWEEN DOUBLE AND DOUBLE]) [at 1:27] +select with anonymization ANON_STDDEV_POP(double_array) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate function ANON_STDDEV_POP + Argument types: ARRAY + Signature: ANON_STDDEV_POP(DOUBLE [CLAMPED BETWEEN DOUBLE AND DOUBLE]) + Argument 1: Unable to coerce type ARRAY to expected type DOUBLE [at 1:27] select with anonymization ANON_STDDEV_POP(double_array) ^ == @@ -700,6 +786,14 @@ select with anonymization ANON_PERCENTILE_CONT(double_array, 0.4) from ArrayWithAnonymizationUid; -- ERROR: No matching signature for aggregate function ANON_PERCENTILE_CONT for argument types: ARRAY, DOUBLE. 
Supported signatures: ANON_PERCENTILE_CONT(DOUBLE, DOUBLE [CLAMPED BETWEEN DOUBLE AND DOUBLE]) [at 1:27] +select with anonymization ANON_PERCENTILE_CONT(double_array, 0.4) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate function ANON_PERCENTILE_CONT + Argument types: ARRAY, DOUBLE + Signature: ANON_PERCENTILE_CONT(DOUBLE, DOUBLE [CLAMPED BETWEEN DOUBLE AND DOUBLE]) + Argument 1: Unable to coerce type ARRAY to expected type DOUBLE [at 1:27] select with anonymization ANON_PERCENTILE_CONT(double_array, 0.4) ^ == @@ -812,6 +906,14 @@ select with anonymization ANON_QUANTILES(double_array, 4) from ArrayWithAnonymizationUid; -- ERROR: No matching signature for aggregate function ANON_QUANTILES for argument types: ARRAY, INT64. Supported signatures: ANON_QUANTILES(DOUBLE, INT64 CLAMPED BETWEEN DOUBLE AND DOUBLE) [at 1:27] +select with anonymization ANON_QUANTILES(double_array, 4) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate function ANON_QUANTILES + Argument types: ARRAY, INT64 + Signature: ANON_QUANTILES(DOUBLE, INT64 CLAMPED BETWEEN DOUBLE AND DOUBLE) + Signature requires at least 4 arguments, found 2 arguments [at 1:27] select with anonymization ANON_QUANTILES(double_array, 4) ^ == @@ -843,6 +945,14 @@ select with anonymization ANON_QUANTILES(double, double) from SimpleTypesWithAnonymizationUid; -- ERROR: No matching signature for aggregate function ANON_QUANTILES for argument types: DOUBLE, DOUBLE. 
Supported signatures: ANON_QUANTILES(DOUBLE, INT64 CLAMPED BETWEEN DOUBLE AND DOUBLE) [at 1:27] +select with anonymization ANON_QUANTILES(double, double) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate function ANON_QUANTILES + Argument types: DOUBLE, DOUBLE + Signature: ANON_QUANTILES(DOUBLE, INT64 CLAMPED BETWEEN DOUBLE AND DOUBLE) + Signature requires at least 4 arguments, found 2 arguments [at 1:27] select with anonymization ANON_QUANTILES(double, double) ^ == @@ -853,6 +963,14 @@ select with anonymization ANON_QUANTILES(double, int64) from SimpleTypesWithAnonymizationUid; -- ERROR: No matching signature for aggregate function ANON_QUANTILES for argument types: DOUBLE, INT64. Supported signatures: ANON_QUANTILES(DOUBLE, INT64 CLAMPED BETWEEN DOUBLE AND DOUBLE) [at 1:27] +select with anonymization ANON_QUANTILES(double, int64) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate function ANON_QUANTILES + Argument types: DOUBLE, INT64 + Signature: ANON_QUANTILES(DOUBLE, INT64 CLAMPED BETWEEN DOUBLE AND DOUBLE) + Signature requires at least 4 arguments, found 2 arguments [at 1:27] select with anonymization ANON_QUANTILES(double, int64) ^ == diff --git a/zetasql/analyzer/testdata/anonymization.test b/zetasql/analyzer/testdata/anonymization.test index e409b31c3..4d02f13a9 100644 --- a/zetasql/analyzer/testdata/anonymization.test +++ b/zetasql/analyzer/testdata/anonymization.test @@ -1,5 +1,7 @@ # Specify WITH ANONYMIZATION but no FROM clause [default language_features=ANONYMIZATION,NUMERIC_TYPE] +[default also_show_signature_mismatch_details] + select with anonymization sum(); -- ERROR: SELECT without FROM clause cannot specify WITH ANONYMIZATION [at 1:1] @@ -6849,6 +6851,14 @@ select with anonymization anon_count(* clampED Between 10 and "100" with report( from SimpleTypesWithAnonymizationUid; -- ERROR: No matching signature for aggregate operator ANON_COUNT(*) for argument types: INT64, STRING. 
Supported signature: ANON_COUNT(* [CLAMPED BETWEEN INT64 AND INT64] WITH REPORT(FORMAT=PROTO)) [at 1:27] +select with anonymization anon_count(* clampED Between 10 and "100" with repo... + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator ANON_COUNT(*) + Argument types: INT64, STRING + Signature: ANON_COUNT(* [CLAMPED BETWEEN INT64 AND INT64] WITH REPORT(FORMAT=PROTO)) + Argument 2: Unable to coerce type STRING to expected type INT64 [at 1:27] select with anonymization anon_count(* clampED Between 10 and "100" with repo... ^ == @@ -6859,6 +6869,14 @@ select with anonymization anon_count(* clampED Between bool and 100 with report( from SimpleTypesWithAnonymizationUid; -- ERROR: No matching signature for aggregate operator ANON_COUNT(*) for argument types: BOOL, INT64. Supported signature: ANON_COUNT(* [CLAMPED BETWEEN INT64 AND INT64] WITH REPORT(FORMAT=JSON)) [at 1:27] +select with anonymization anon_count(* clampED Between bool and 100 with repo... + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator ANON_COUNT(*) + Argument types: BOOL, INT64 + Signature: ANON_COUNT(* [CLAMPED BETWEEN INT64 AND INT64] WITH REPORT(FORMAT=JSON)) + Argument 1: Unable to coerce type BOOL to expected type INT64 [at 1:27] select with anonymization anon_count(* clampED Between bool and 100 with repo... ^ == @@ -7711,6 +7729,14 @@ select with anonymization ANON_QUANTILES(int64, 4 CLAMPED BETWEEN bool AND 3 WIT from SimpleTypesWithAnonymizationUid; -- ERROR: No matching signature for aggregate operator ANON_QUANTILES for argument types: INT64, INT64, BOOL, INT64. Supported signatures: ANON_QUANTILES(DOUBLE, INT64 CLAMPED BETWEEN DOUBLE AND DOUBLE WITH REPORT(FORMAT=PROTO)) [at 1:27] +select with anonymization ANON_QUANTILES(int64, 4 CLAMPED BETWEEN bool AND 3 ... 
+ ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator ANON_QUANTILES + Argument types: INT64, INT64, BOOL, INT64 + Signature: ANON_QUANTILES(DOUBLE, INT64 CLAMPED BETWEEN DOUBLE AND DOUBLE WITH REPORT(FORMAT=PROTO)) + Argument 3: Unable to coerce type BOOL to expected type DOUBLE [at 1:27] select with anonymization ANON_QUANTILES(int64, 4 CLAMPED BETWEEN bool AND 3 ... ^ == @@ -7721,6 +7747,14 @@ select with anonymization ANON_QUANTILES(double, 4 CLAMPED BETWEEN bool AND 3 WI from SimpleTypesWithAnonymizationUid; -- ERROR: No matching signature for aggregate operator ANON_QUANTILES for argument types: DOUBLE, INT64, BOOL, INT64. Supported signatures: ANON_QUANTILES(DOUBLE, INT64 CLAMPED BETWEEN DOUBLE AND DOUBLE WITH REPORT(FORMAT=PROTO)) [at 1:27] +select with anonymization ANON_QUANTILES(double, 4 CLAMPED BETWEEN bool AND 3... + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator ANON_QUANTILES + Argument types: DOUBLE, INT64, BOOL, INT64 + Signature: ANON_QUANTILES(DOUBLE, INT64 CLAMPED BETWEEN DOUBLE AND DOUBLE WITH REPORT(FORMAT=PROTO)) + Argument 3: Unable to coerce type BOOL to expected type DOUBLE [at 1:27] select with anonymization ANON_QUANTILES(double, 4 CLAMPED BETWEEN bool AND 3... ^ == @@ -7731,6 +7765,14 @@ select with anonymization ANON_QUANTILES(int64, 4 CLAMPED BETWEEN 2 AND bool WIT from SimpleTypesWithAnonymizationUid; -- ERROR: No matching signature for aggregate operator ANON_QUANTILES for argument types: INT64, INT64, INT64, BOOL. Supported signatures: ANON_QUANTILES(DOUBLE, INT64 CLAMPED BETWEEN DOUBLE AND DOUBLE WITH REPORT(FORMAT=PROTO)) [at 1:27] +select with anonymization ANON_QUANTILES(int64, 4 CLAMPED BETWEEN 2 AND bool ... 
+ ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator ANON_QUANTILES + Argument types: INT64, INT64, INT64, BOOL + Signature: ANON_QUANTILES(DOUBLE, INT64 CLAMPED BETWEEN DOUBLE AND DOUBLE WITH REPORT(FORMAT=PROTO)) + Argument 4: Unable to coerce type BOOL to expected type DOUBLE [at 1:27] +select with anonymization ANON_QUANTILES(int64, 4 CLAMPED BETWEEN 2 AND bool ... + ^ +== @@ -7741,6 +7783,14 @@ select with anonymization ANON_QUANTILES(double, 4 CLAMPED BETWEEN 2 AND bool WI from SimpleTypesWithAnonymizationUid; -- ERROR: No matching signature for aggregate operator ANON_QUANTILES for argument types: DOUBLE, INT64, INT64, BOOL. Supported signatures: ANON_QUANTILES(DOUBLE, INT64 CLAMPED BETWEEN DOUBLE AND DOUBLE WITH REPORT(FORMAT=PROTO)) [at 1:27] +select with anonymization ANON_QUANTILES(double, 4 CLAMPED BETWEEN 2 AND bool... + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator ANON_QUANTILES + Argument types: DOUBLE, INT64, INT64, BOOL + Signature: ANON_QUANTILES(DOUBLE, INT64 CLAMPED BETWEEN DOUBLE AND DOUBLE WITH REPORT(FORMAT=PROTO)) + Argument 4: Unable to coerce type BOOL to expected type DOUBLE [at 1:27] +select with anonymization ANON_QUANTILES(double, 4 CLAMPED BETWEEN 2 AND bool... + ^ +== @@ -8949,3 +8999,266 @@ GROUP BY int32 Rewrite ERROR: Reading the table SimpleTypesWithAnonymizationUid containing user data in expression subqueries is not allowed [at 7:31] EXISTS(SELECT 1 FROM SimpleTypesWithAnonymizationUid), ^ +== + +# GROUPING function is unsupported in the WITH ANONYMIZATION clause.
+[language_features=ANONYMIZATION,V_1_4_GROUPING_BUILTIN] +[enabled_ast_rewrites=DEFAULTS,+ANONYMIZATION] +SELECT WITH ANONYMIZATION GROUPING(int64), int64, ANON_SUM(int32) +FROM SimpleTypesWithAnonymizationUid +GROUP BY int64, double; +-- +ERROR: GROUPING function is not supported in anonymization queries [at 1:27] +SELECT WITH ANONYMIZATION GROUPING(int64), int64, ANON_SUM(int32) + ^ +== + +# GROUPING function is unsupported in the same select list with anonymization. +[language_features=ANONYMIZATION,V_1_4_GROUPING_BUILTIN] +[enabled_ast_rewrites=DEFAULTS,+ANONYMIZATION] +SELECT WITH ANONYMIZATION int64, GROUPING(int64), ANON_SUM(int32) +FROM SimpleTypesWithAnonymizationUid +GROUP BY int64, double; +-- +ERROR: GROUPING function is not supported in anonymization queries [at 1:34] +SELECT WITH ANONYMIZATION int64, GROUPING(int64), ANON_SUM(int32) + ^ +== + +# GROUPING function is unsupported in the same query as the anonymization clause. +[language_features=ANONYMIZATION,V_1_4_GROUPING_BUILTIN] +[enabled_ast_rewrites=DEFAULTS,+ANONYMIZATION] +SELECT WITH ANONYMIZATION int64, ANON_SUM(int32) +FROM SimpleTypesWithAnonymizationUid +GROUP BY int64, double HAVING GROUPING(int64) = 0; +-- +ERROR: GROUPING function is not supported in anonymization queries [at 3:31] +GROUP BY int64, double HAVING GROUPING(int64) = 0; + ^ +== + +# GROUPING function should be allowed in a subquery.
+[language_features=ANONYMIZATION,V_1_4_GROUPING_BUILTIN] +[enabled_ast_rewrites=DEFAULTS,+ANONYMIZATION] +SELECT + WITH ANONYMIZATION int64, ANON_SUM(int32), + EXISTS(SELECT GROUPING(key) FROM KeyValue GROUP BY key) +FROM SimpleTypesWithAnonymizationUid +GROUP BY int64, double; +-- +QueryStmt ++-output_column_list= +| +-$groupby.int64#19 AS int64 [INT64] +| +-$aggregate.$agg1#13 AS `$col2` [INT64] +| +-$query.$col3#26 AS `$col3` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$groupby.int64#19, $aggregate.$agg1#13, $query.$col3#26] + +-expr_list= + | +-$col3#26 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=EXISTS + | +-subquery= + | +-ProjectScan + | +-column_list=[$grouping_call.$grouping_call1#25] + | +-input_scan= + | +-AggregateScan + | +-column_list=[$groupby.key#24, $grouping_call.$grouping_call1#25] + | +-input_scan= + | | +-TableScan(column_list=[KeyValue.Key#21], table=KeyValue, column_index_list=[0]) + | +-group_by_list= + | | +-key#24 := ColumnRef(type=INT64, column=KeyValue.Key#21) + | +-grouping_call_list= + | +-GroupingCall + | +-group_by_column= + | | +-ColumnRef(type=INT64, column=$groupby.key#24) + | +-output_column=$grouping_call.$grouping_call1#25 + +-input_scan= + +-AnonymizedAggregateScan + +-column_list=[$groupby.int64#19, $aggregate.$agg1#13] + +-input_scan= + | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int32#1, int64#2, double#9], table=SimpleTypesWithAnonymizationUid, column_index_list=[0, 1, 8]) + +-group_by_list= + | +-int64#19 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | +-double#20 := ColumnRef(type=DOUBLE, column=SimpleTypesWithAnonymizationUid.double#9) + +-aggregate_list= + +-$agg1#13 := + +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) + +-Cast(INT32 -> INT64) + +-ColumnRef(type=INT32, column=SimpleTypesWithAnonymizationUid.int32#1) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$groupby.int64#19 AS int64 [INT64] 
+| +-$aggregate.$agg1#13 AS `$col2` [INT64] +| +-$query.$col3#26 AS `$col3` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$groupby.int64#19, $aggregate.$agg1#13, $query.$col3#26] + +-expr_list= + | +-$col3#26 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=EXISTS + | +-subquery= + | +-ProjectScan + | +-column_list=[$grouping_call.$grouping_call1#25] + | +-input_scan= + | +-AggregateScan + | +-column_list=[$groupby.key#24, $grouping_call.$grouping_call1#25] + | +-input_scan= + | | +-TableScan(column_list=[KeyValue.Key#21], table=KeyValue, column_index_list=[0]) + | +-group_by_list= + | | +-key#24 := ColumnRef(type=INT64, column=KeyValue.Key#21) + | +-grouping_call_list= + | +-GroupingCall + | +-group_by_column= + | | +-ColumnRef(type=INT64, column=$groupby.key#24) + | +-output_column=$grouping_call.$grouping_call1#25 + +-input_scan= + +-AnonymizedAggregateScan + +-column_list=[$groupby.int64#19, $aggregate.$agg1#13] + +-input_scan= + | +-AggregateScan + | +-column_list=[$aggregate.$agg1_partial#29, $groupby.int64_partial#30, $groupby.double_partial#31, $group_by.$uid#32] + | +-input_scan= + | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int32#1, int64#2, double#9, uid#27], table=SimpleTypesWithAnonymizationUid, column_index_list=[0, 1, 8, 10]) + | +-group_by_list= + | | +-int64_partial#30 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-double_partial#31 := ColumnRef(type=DOUBLE, column=SimpleTypesWithAnonymizationUid.double#9) + | | +-$uid#32 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#27) + | +-aggregate_list= + | +-$agg1_partial#29 := + | +-AggregateFunctionCall(ZetaSQL:sum(INT64) -> INT64) + | +-Cast(INT32 -> INT64) + | +-ColumnRef(type=INT32, column=SimpleTypesWithAnonymizationUid.int32#1) + +-group_by_list= + | +-int64#19 := ColumnRef(type=INT64, column=$groupby.int64_partial#30) + | +-double#20 := ColumnRef(type=DOUBLE, column=$groupby.double_partial#31) + +-aggregate_list= + 
| +-$agg1#13 := + | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#29) + | +-$k_threshold_col#35 := + | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(1) INT64, optional(1) INT64) -> INT64) + | +-Literal(type=INT64, value=1) + | +-Literal(type=INT64, value=0) + | +-Literal(type=INT64, value=1) + +-k_threshold_expr= + +-ColumnRef(type=INT64, column=$anon.$k_threshold_col#35) +== + +# GROUPING function should be allowed in the with clause. +[language_features=ANONYMIZATION,V_1_4_GROUPING_BUILTIN] +[enabled_ast_rewrites=DEFAULTS,+ANONYMIZATION] +WITH T AS ( + SELECT uid, int64, GROUPING(int64) AS grp + FROM SimpleTypesWithAnonymizationUid + GROUP BY uid, int64 +) +SELECT WITH ANONYMIZATION int64, ANON_SUM(int64) +FROM T +GROUP BY int64; +-- +QueryStmt ++-output_column_list= +| +-$groupby.int64#21 AS int64 [INT64] +| +-$aggregate.$agg1#20 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=[$groupby.int64#21, $aggregate.$agg1#20] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="T" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.uid#14, $groupby.int64#15, $grouping_call.$grouping_call1#16] + | +-input_scan= + | +-AggregateScan + | +-column_list=[$groupby.uid#14, $groupby.int64#15, $grouping_call.$grouping_call1#16] + | +-input_scan= + | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#2, uid#11], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | +-group_by_list= + | | +-uid#14 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | +-int64#15 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | +-grouping_call_list= + | +-GroupingCall + | +-group_by_column= + | | +-ColumnRef(type=INT64, column=$groupby.int64#15) + | +-output_column=$grouping_call.$grouping_call1#16 + +-query= + +-ProjectScan + +-column_list=[$groupby.int64#21, 
$aggregate.$agg1#20] + +-input_scan= + +-AnonymizedAggregateScan + +-column_list=[$groupby.int64#21, $aggregate.$agg1#20] + +-input_scan= + | +-WithRefScan(column_list=T.[uid#17, int64#18, grp#19], with_query_name="T") + +-group_by_list= + | +-int64#21 := ColumnRef(type=INT64, column=T.int64#18) + +-aggregate_list= + +-$agg1#20 := + +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) + +-ColumnRef(type=INT64, column=T.int64#18) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$groupby.int64#21 AS int64 [INT64] +| +-$aggregate.$agg1#20 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=[$groupby.int64#21, $aggregate.$agg1#20] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="T" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.uid#14, $groupby.int64#15, $grouping_call.$grouping_call1#16] + | +-input_scan= + | +-AggregateScan + | +-column_list=[$groupby.uid#14, $groupby.int64#15, $grouping_call.$grouping_call1#16] + | +-input_scan= + | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#2, uid#11], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | +-group_by_list= + | | +-uid#14 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | +-int64#15 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | +-grouping_call_list= + | +-GroupingCall + | +-group_by_column= + | | +-ColumnRef(type=INT64, column=$groupby.int64#15) + | +-output_column=$grouping_call.$grouping_call1#16 + +-query= + +-ProjectScan + +-column_list=[$groupby.int64#21, $aggregate.$agg1#20] + +-input_scan= + +-AnonymizedAggregateScan + +-column_list=[$groupby.int64#21, $aggregate.$agg1#20] + +-input_scan= + | +-AggregateScan + | +-column_list=[$aggregate.$agg1_partial#23, $groupby.int64_partial#24, $group_by.$uid#25] + | +-input_scan= + | | +-WithRefScan(column_list=T.[uid#17, int64#18, grp#19], with_query_name="T") + | +-group_by_list= + | | 
+-int64_partial#24 := ColumnRef(type=INT64, column=T.int64#18) + | | +-$uid#25 := ColumnRef(type=INT64, column=T.uid#17) + | +-aggregate_list= + | +-$agg1_partial#23 := + | +-AggregateFunctionCall(ZetaSQL:sum(INT64) -> INT64) + | +-ColumnRef(type=INT64, column=T.int64#18) + +-group_by_list= + | +-int64#21 := ColumnRef(type=INT64, column=$groupby.int64_partial#24) + +-aggregate_list= + | +-$agg1#20 := + | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#23) + | +-$k_threshold_col#28 := + | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(1) INT64, optional(1) INT64) -> INT64) + | +-Literal(type=INT64, value=1) + | +-Literal(type=INT64, value=0) + | +-Literal(type=INT64, value=1) + +-k_threshold_expr= + +-ColumnRef(type=INT64, column=$anon.$k_threshold_col#28) diff --git a/zetasql/analyzer/testdata/anonymization_group_selection_strategy.test b/zetasql/analyzer/testdata/anonymization_group_selection_strategy.test index 6582a77d5..9421cbaa4 100644 --- a/zetasql/analyzer/testdata/anonymization_group_selection_strategy.test +++ b/zetasql/analyzer/testdata/anonymization_group_selection_strategy.test @@ -1,4 +1,4 @@ -[default language_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS] +[default language_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,V_1_1_WITH_ON_SUBQUERY] [default enabled_ast_rewrites=DEFAULTS,+ANONYMIZATION] [default no_run_unparser] # When no group_selection_strategy is set, default to laplace thresholding. 
@@ -262,6 +262,33 @@ QueryStmt Rewrite ERROR: Anonymization option group_selection_strategy PUBLIC_GROUPS has not been enabled == +# public groups requires V_1_1_WITH_ON_SUBQUERY feature +[language_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS] +select with anonymization options(group_selection_strategy=PUBLIC_GROUPS) +anon_count(*) +from SimpleTypesWithAnonymizationUid; +-- +[PRE-REWRITE AST] +QueryStmt ++-output_column_list= +| +-$aggregate.$agg1#13 AS `$col1` [INT64] ++-query= + +-ProjectScan + +-column_list=[$aggregate.$agg1#13] + +-input_scan= + +-AnonymizedAggregateScan + +-column_list=[$aggregate.$agg1#13] + +-input_scan= + | +-TableScan(table=SimpleTypesWithAnonymizationUid) + +-aggregate_list= + | +-$agg1#13 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64) + +-anonymization_option_list= + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + + +Rewrite ERROR: Anonymization option group_selection_strategy PUBLIC_GROUPS is not supported without support for WITH subqueries +== + # Happy path without max_groups_contributed and no GROUP BY. 
select with anonymization options(group_selection_strategy=PUBLIC_GROUPS) anon_count(*) @@ -413,9 +440,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=$distinct.int64#31) +-aggregate_list= @@ -451,9 +479,10 @@ QueryStmt | | | +-group_by_list= | | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-has_using=TRUE | +-group_by_list= | | +-int64_partial#37 := ColumnRef(type=INT64, column=$distinct.int64#31) | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) @@ -507,9 +536,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | 
+-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=$distinct.int64#31) +-aggregate_list= @@ -524,22 +554,22 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#32 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.int64#41] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) - | +-group_by_list= - | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#41] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) + +-query= +-AnonymizedAggregateScan +-column_list=[$aggregate.$agg1#32] +-input_scan= @@ -560,9 +590,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.int64#31], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-int64_partial#37 := ColumnRef(type=INT64, column=$distinct.int64#31) | | | | +-$uid#38 := ColumnRef(type=INT64, 
column=SimpleTypesWithAnonymizationUid.uid#34) @@ -629,9 +660,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE +-group_by_list= | +-uid#33 := ColumnRef(type=INT64, column=$distinct.int64#31) +-aggregate_list= @@ -646,22 +678,22 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#32 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.int64#40] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#41], table=SimpleTypes, column_index_list=[1]) - | +-group_by_list= - | +-int64#40 := ColumnRef(type=INT64, column=SimpleTypes.int64#41) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#40] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#41], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#40 := ColumnRef(type=INT64, column=SimpleTypes.int64#41) + +-query= +-AnonymizedAggregateScan +-column_list=[$aggregate.$agg1#32] +-input_scan= @@ -682,9 +714,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.int64#31], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) 
- | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) - | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-uid_partial#36 := ColumnRef(type=INT64, column=$distinct.int64#31) | | | | +-$uid#37 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) @@ -752,9 +785,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#21], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$distinct.int64#19) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$distinct.int64#19) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=$distinct.int64#19) +-aggregate_list= @@ -769,22 +803,22 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#32 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.int64#41] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) - | +-group_by_list= - | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | 
+-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#41] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) + +-query= +-AnonymizedAggregateScan +-column_list=[$aggregate.$agg1#32] +-input_scan= @@ -805,9 +839,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#21, uid#34], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | +-ColumnRef(type=INT64, column=$distinct.int64#19) - | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#19) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-int64_partial#37 := ColumnRef(type=INT64, column=$distinct.int64#19) | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) @@ -878,9 +913,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#21], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$groupby.int64#19) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$groupby.int64#19) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=$groupby.int64#19) +-aggregate_list= @@ -896,25 +932,25 @@ QueryStmt | +-$groupby.int64#33 AS int64 
[INT64] | +-$aggregate.$agg1#32 AS `$col2` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.int64#33, $aggregate.$agg1#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-ProjectScan - | +-column_list=[$groupby.int64#41] - | +-input_scan= - | +-AggregateScan - | +-column_list=[$groupby.int64#41] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) - | +-group_by_list= - | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.int64#33, $aggregate.$agg1#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.int64#41] + | +-input_scan= + | +-AggregateScan + | +-column_list=[$groupby.int64#41] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) + +-query= +-AnonymizedAggregateScan +-column_list=[$groupby.int64#33, $aggregate.$agg1#32] +-input_scan= @@ -935,9 +971,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#21, uid#34], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | +-ColumnRef(type=INT64, column=$groupby.int64#19) - | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=$groupby.int64#19) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-int64_partial#37 := ColumnRef(type=INT64, 
column=$groupby.int64#19) | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) @@ -1011,9 +1048,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#22], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$groupby.int64#19) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#22) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$groupby.int64#19) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#22) + | +-has_using=TRUE +-group_by_list= | +-int64#34 := ColumnRef(type=INT64, column=$groupby.int64#19) +-aggregate_list= @@ -1067,9 +1105,10 @@ QueryStmt | | +-right_scan= | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#14, uid#23], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10], alias="b") | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | | +-has_using=TRUE | +-right_scan= | | +-AggregateScan | | +-column_list=[$distinct.int64#43] @@ -1078,9 +1117,10 @@ QueryStmt | | +-group_by_list= | | +-int64#43 := ColumnRef(type=INT64, column=SimpleTypes.int64#26) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#14) - | +-ColumnRef(type=INT64, column=$distinct.int64#43) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, 
column=SimpleTypesWithAnonymizationUid.int64#14) + | | +-ColumnRef(type=INT64, column=$distinct.int64#43) + | +-has_using=TRUE +-group_by_list= | +-int64#45 := ColumnRef(type=INT64, column=$distinct.int64#43) +-aggregate_list= @@ -1096,22 +1136,22 @@ QueryStmt | +-$groupby.int64#45 AS int64 [INT64] | +-$aggregate.$agg1#44 AS `$col2` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.int64#45, $aggregate.$agg1#44] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.int64#52] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#53], table=SimpleTypes, column_index_list=[1]) - | +-group_by_list= - | +-int64#52 := ColumnRef(type=INT64, column=SimpleTypes.int64#53) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.int64#45, $aggregate.$agg1#44] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#52] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#53], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#52 := ColumnRef(type=INT64, column=SimpleTypes.int64#53) + +-query= +-AnonymizedAggregateScan +-column_list=[$groupby.int64#45, $aggregate.$agg1#44] +-input_scan= @@ -1139,15 +1179,17 @@ QueryStmt | | | | | +-right_scan= | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#14, uid#23], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10], alias="b") | | | | | +-join_expr= - | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) - | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | +-ColumnRef(type=INT64, 
column=SimpleTypesWithAnonymizationUid.uid#11) + | | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | | | | | +-has_using=TRUE | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.int64#43], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#14) - | | | | +-ColumnRef(type=INT64, column=$distinct.int64#43) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#14) + | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#43) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-int64_partial#48 := ColumnRef(type=INT64, column=$distinct.int64#43) | | | | +-$uid#49 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) @@ -1214,13 +1256,14 @@ QueryStmt | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | | +-string#32 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | +-join_expr= - | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | +-ColumnRef(type=INT64, column=$distinct.int64#31) - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#32) + | | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | 
+-ColumnRef(type=STRING, column=$distinct.string#32) + | +-has_using=TRUE +-group_by_list= | +-int64#34 := ColumnRef(type=INT64, column=$distinct.int64#31) | +-string#35 := ColumnRef(type=STRING, column=$distinct.string#32) @@ -1236,23 +1279,23 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#33 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#33] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=$distinct.[int64#45, string#46] - | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int64#47, string#48], table=SimpleTypes, column_index_list=[1, 4]) - | +-group_by_list= - | +-int64#45 := ColumnRef(type=INT64, column=SimpleTypes.int64#47) - | +-string#46 := ColumnRef(type=STRING, column=SimpleTypes.string#48) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#33] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=$distinct.[int64#45, string#46] + | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[int64#47, string#48], table=SimpleTypes, column_index_list=[1, 4]) + | +-group_by_list= + | +-int64#45 := ColumnRef(type=INT64, column=SimpleTypes.int64#47) + | +-string#46 := ColumnRef(type=STRING, column=SimpleTypes.string#48) + +-query= +-AnonymizedAggregateScan +-column_list=[$aggregate.$agg1#33] +-input_scan= @@ -1273,13 +1316,14 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=$distinct.[int64#31, string#32], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) - | | | | 
+-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | +-ColumnRef(type=STRING, column=$distinct.string#32) + | | | | | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#32) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-int64_partial#39 := ColumnRef(type=INT64, column=$distinct.int64#31) | | | | +-string_partial#40 := ColumnRef(type=STRING, column=$distinct.string#32) @@ -1360,9 +1404,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=$distinct.int64#31) +-aggregate_list= @@ -1398,9 +1443,10 @@ QueryStmt | | | +-group_by_list= | | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, 
column=SimpleTypesWithAnonymizationUid.int64#2) + | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-has_using=TRUE | +-group_by_list= | | +-int64_partial#37 := ColumnRef(type=INT64, column=$distinct.int64#31) | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) @@ -1451,9 +1497,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=$distinct.int64#31) +-aggregate_list= @@ -1555,9 +1602,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE +-group_by_list= | +-int64#34 := ColumnRef(type=INT64, column=$full_join.int64#32) +-aggregate_list= @@ -1601,9 +1649,10 @@ QueryStmt | | +-group_by_list= | | +-uid#25 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) - | +-ColumnRef(type=INT64, column=$distinct.uid#25) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | 
+-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | +-ColumnRef(type=INT64, column=$distinct.uid#25) + | +-has_using=TRUE +-group_by_list= | +-uid#27 := ColumnRef(type=INT64, column=$distinct.uid#25) +-aggregate_list= @@ -1756,9 +1805,10 @@ QueryStmt | | +-group_by_list= | | +-int64#26 := ColumnRef(type=INT64, column=$distinct.int64#25) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#26) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#26) + | +-has_using=TRUE +-group_by_list= | +-int64#28 := ColumnRef(type=INT64, column=$distinct.int64#26) +-aggregate_list= @@ -1797,9 +1847,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[SimpleTypes.int64#14], table=SimpleTypes, column_index_list=[1]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=SimpleTypes.int64#14) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=SimpleTypes.int64#14) + | +-has_using=TRUE +-group_by_list= | +-int64#32 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) +-aggregate_list= @@ -1838,9 +1889,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#20], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#20) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | 
| +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#20) + | +-has_using=TRUE +-group_by_list= | +-int64#32 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) +-aggregate_list= @@ -1882,9 +1934,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#20], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#20) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#20) + | +-has_using=TRUE +-group_by_list= | +-int64#32 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) +-aggregate_list= @@ -1929,9 +1982,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=$distinct.int64#31) +-aggregate_list= @@ -1946,22 +2000,22 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#32 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.int64#41] - | +-input_scan= - | | 
+-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) - | +-group_by_list= - | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#41] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) + +-query= +-AnonymizedAggregateScan +-column_list=[$aggregate.$agg1#32] +-input_scan= @@ -1982,9 +2036,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.int64#31], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-int64_partial#37 := ColumnRef(type=INT64, column=$distinct.int64#31) | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) @@ -2067,22 +2122,22 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#32 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.int64#41] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) - | 
+-group_by_list= - | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#41] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) + +-query= +-AnonymizedAggregateScan +-column_list=[$aggregate.$agg1#32] +-input_scan= @@ -2203,28 +2258,28 @@ QueryStmt | +-$groupby.public_dayofweek#16 AS public_dayofweek [INT64] | +-$aggregate.$agg1#15 AS `$col2` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.public_dayofweek#16, $aggregate.$agg1#15] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.public_dayofweek#24] - | +-input_scan= - | | +-ArrayScan - | | +-column_list=[$array.public_dayofweek#25] - | | +-array_expr_list= - | | | +-FunctionCall(ZetaSQL:generate_array(INT64, INT64, optional(0) INT64) -> ARRAY) - | | | +-Literal(type=INT64, value=1) - | | | +-Literal(type=INT64, value=7) - | | +-element_column_list=[$array.public_dayofweek#25] - | +-group_by_list= - | +-public_dayofweek#24 := ColumnRef(type=INT64, column=$array.public_dayofweek#25) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.public_dayofweek#16, $aggregate.$agg1#15] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.public_dayofweek#24] + | +-input_scan= + | | +-ArrayScan + | | +-column_list=[$array.public_dayofweek#25] + | | +-array_expr_list= + | | | +-FunctionCall(ZetaSQL:generate_array(INT64, INT64, optional(0) INT64) -> ARRAY) + 
| | | +-Literal(type=INT64, value=1) + | | | +-Literal(type=INT64, value=7) + | | +-element_column_list=[$array.public_dayofweek#25] + | +-group_by_list= + | +-public_dayofweek#24 := ColumnRef(type=INT64, column=$array.public_dayofweek#25) + +-query= +-AnonymizedAggregateScan +-column_list=[$groupby.public_dayofweek#16, $aggregate.$agg1#15] +-input_scan= @@ -2350,28 +2405,28 @@ QueryStmt | +-$groupby.public_dayofweek#16 AS public_dayofweek [INT64] | +-$aggregate.$agg1#15 AS `$col2` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.public_dayofweek#16, $aggregate.$agg1#15] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.public_dayofweek#24] - | +-input_scan= - | | +-ArrayScan - | | +-column_list=[$array.public_dayofweek#25] - | | +-array_expr_list= - | | | +-FunctionCall(ZetaSQL:generate_array(INT64, INT64, optional(0) INT64) -> ARRAY) - | | | +-Literal(type=INT64, value=1) - | | | +-Literal(type=INT64, value=7) - | | +-element_column_list=[$array.public_dayofweek#25] - | +-group_by_list= - | +-public_dayofweek#24 := ColumnRef(type=INT64, column=$array.public_dayofweek#25) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.public_dayofweek#16, $aggregate.$agg1#15] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.public_dayofweek#24] + | +-input_scan= + | | +-ArrayScan + | | +-column_list=[$array.public_dayofweek#25] + | | +-array_expr_list= + | | | +-FunctionCall(ZetaSQL:generate_array(INT64, INT64, optional(0) INT64) -> ARRAY) + | | | +-Literal(type=INT64, value=1) + | | | +-Literal(type=INT64, value=7) + | | +-element_column_list=[$array.public_dayofweek#25] + | +-group_by_list= + | +-public_dayofweek#24 := ColumnRef(type=INT64, column=$array.public_dayofweek#25) + +-query= +-AnonymizedAggregateScan 
+-column_list=[$groupby.public_dayofweek#16, $aggregate.$agg1#15] +-input_scan= @@ -2468,9 +2523,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE +-group_by_list= | +-$groupbycol1#33 := | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) @@ -2525,9 +2581,10 @@ QueryStmt | | | +-group_by_list= | | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-has_using=TRUE | +-right_scan= | | +-AggregateScan | | +-column_list=[$distinct.string#50] @@ -2536,9 +2593,10 @@ QueryStmt | | +-group_by_list= | | +-string#50 := ColumnRef(type=STRING, column=SimpleTypes.string#36) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#50) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#50) + | +-has_using=TRUE +-group_by_list= | +-int64#52 := ColumnRef(type=INT64, column=$distinct.int64#31) | 
+-string#53 := ColumnRef(type=STRING, column=$distinct.string#50) @@ -2554,31 +2612,31 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#51 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#51] - +-with_entry_list= - | +-WithEntry - | | +-with_query_name="$public_groups0" - | | +-with_subquery= - | | +-AggregateScan - | | +-column_list=[$distinct.int64#63] - | | +-input_scan= - | | | +-TableScan(column_list=[SimpleTypes.int64#64], table=SimpleTypes, column_index_list=[1]) - | | +-group_by_list= - | | +-int64#63 := ColumnRef(type=INT64, column=SimpleTypes.int64#64) - | +-WithEntry - | +-with_query_name="$public_groups1" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.string#65] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.string#66], table=SimpleTypes, column_index_list=[4]) - | +-group_by_list= - | +-string#65 := ColumnRef(type=STRING, column=SimpleTypes.string#66) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#51] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="$public_groups0" + | | +-with_subquery= + | | +-AggregateScan + | | +-column_list=[$distinct.int64#63] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.int64#64], table=SimpleTypes, column_index_list=[1]) + | | +-group_by_list= + | | +-int64#63 := ColumnRef(type=INT64, column=SimpleTypes.int64#64) + | +-WithEntry + | +-with_query_name="$public_groups1" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.string#65] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.string#66], table=SimpleTypes, column_index_list=[4]) + | +-group_by_list= + | +-string#65 := ColumnRef(type=STRING, column=SimpleTypes.string#66) + +-query= +-AnonymizedAggregateScan +-column_list=[$aggregate.$agg1#51] +-input_scan= @@ -2606,15 +2664,17 @@ QueryStmt | | | | | | +-right_scan= | | | | | | | +-WithRefScan(column_list=[$distinct.int64#31], 
with_query_name="$public_groups0") | | | | | | +-join_expr= - | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | | | +-has_using=TRUE | | | | | +-right_scan= | | | | | | +-WithRefScan(column_list=[$distinct.string#50], with_query_name="$public_groups1") | | | | | +-join_expr= - | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | | +-ColumnRef(type=STRING, column=$distinct.string#50) + | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#50) + | | | | | +-has_using=TRUE | | | | +-group_by_list= | | | | | +-int64_partial#57 := ColumnRef(type=INT64, column=$distinct.int64#31) | | | | | +-string_partial#58 := ColumnRef(type=STRING, column=$distinct.string#50) @@ -2695,9 +2755,10 @@ QueryStmt | | +-right_scan= | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#21, string#24], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 4]) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=$distinct.int64#19) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$distinct.int64#19) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | +-has_using=TRUE | +-right_scan= | | 
+-AggregateScan | | +-column_list=[$distinct.string#50] @@ -2706,9 +2767,10 @@ QueryStmt | | +-group_by_list= | | +-string#50 := ColumnRef(type=STRING, column=SimpleTypes.string#36) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#24) - | +-ColumnRef(type=STRING, column=$distinct.string#50) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#24) + | | +-ColumnRef(type=STRING, column=$distinct.string#50) + | +-has_using=TRUE +-group_by_list= | +-int64#52 := ColumnRef(type=INT64, column=$distinct.int64#19) | +-string#53 := ColumnRef(type=STRING, column=$distinct.string#50) @@ -2724,31 +2786,31 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#51 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#51] - +-with_entry_list= - | +-WithEntry - | | +-with_query_name="$public_groups0" - | | +-with_subquery= - | | +-AggregateScan - | | +-column_list=[$distinct.int64#63] - | | +-input_scan= - | | | +-TableScan(column_list=[SimpleTypes.int64#64], table=SimpleTypes, column_index_list=[1]) - | | +-group_by_list= - | | +-int64#63 := ColumnRef(type=INT64, column=SimpleTypes.int64#64) - | +-WithEntry - | +-with_query_name="$public_groups1" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.string#65] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.string#66], table=SimpleTypes, column_index_list=[4]) - | +-group_by_list= - | +-string#65 := ColumnRef(type=STRING, column=SimpleTypes.string#66) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#51] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="$public_groups0" + | | +-with_subquery= + | | +-AggregateScan + | | +-column_list=[$distinct.int64#63] + | | +-input_scan= + | | | 
+-TableScan(column_list=[SimpleTypes.int64#64], table=SimpleTypes, column_index_list=[1]) + | | +-group_by_list= + | | +-int64#63 := ColumnRef(type=INT64, column=SimpleTypes.int64#64) + | +-WithEntry + | +-with_query_name="$public_groups1" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.string#65] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.string#66], table=SimpleTypes, column_index_list=[4]) + | +-group_by_list= + | +-string#65 := ColumnRef(type=STRING, column=SimpleTypes.string#66) + +-query= +-AnonymizedAggregateScan +-column_list=[$aggregate.$agg1#51] +-input_scan= @@ -2776,15 +2838,17 @@ QueryStmt | | | | | | +-right_scan= | | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#21, string#24, uid#54], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 4, 10]) | | | | | | +-join_expr= - | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#19) - | | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#19) + | | | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | | | | +-has_using=TRUE | | | | | +-right_scan= | | | | | | +-WithRefScan(column_list=[$distinct.string#50], with_query_name="$public_groups1") | | | | | +-join_expr= - | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#24) - | | | | | +-ColumnRef(type=STRING, column=$distinct.string#50) + | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#24) + | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#50) + | | | | | +-has_using=TRUE | | | | +-group_by_list= | | | | | 
+-int64_partial#57 := ColumnRef(type=INT64, column=$distinct.int64#19) | | | | | +-string_partial#58 := ColumnRef(type=STRING, column=$distinct.string#50) @@ -2873,13 +2937,15 @@ QueryStmt | | +-right_scan= | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#40, string#43], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 4]) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=$distinct.string#38) - | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#43) + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$distinct.string#38) + | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#43) + | | +-has_using=TRUE | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$distinct.int64#19) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#40) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$distinct.int64#19) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#40) + | +-has_using=TRUE +-group_by_list= | +-int64#52 := ColumnRef(type=INT64, column=$distinct.int64#19) | +-string#53 := ColumnRef(type=STRING, column=$distinct.string#38) @@ -2895,31 +2961,31 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#51 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#51] - +-with_entry_list= - | +-WithEntry - | | +-with_query_name="$public_groups0" - | | +-with_subquery= - | | +-AggregateScan - | | +-column_list=[$distinct.string#63] - | | +-input_scan= - | | | +-TableScan(column_list=[SimpleTypes.string#64], table=SimpleTypes, column_index_list=[4]) - | | +-group_by_list= - | | +-string#63 := ColumnRef(type=STRING, column=SimpleTypes.string#64) - | +-WithEntry - | +-with_query_name="$public_groups1" - | 
+-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.int64#65] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#66], table=SimpleTypes, column_index_list=[1]) - | +-group_by_list= - | +-int64#65 := ColumnRef(type=INT64, column=SimpleTypes.int64#66) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#51] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="$public_groups0" + | | +-with_subquery= + | | +-AggregateScan + | | +-column_list=[$distinct.string#63] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.string#64], table=SimpleTypes, column_index_list=[4]) + | | +-group_by_list= + | | +-string#63 := ColumnRef(type=STRING, column=SimpleTypes.string#64) + | +-WithEntry + | +-with_query_name="$public_groups1" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#65] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#66], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#65 := ColumnRef(type=INT64, column=SimpleTypes.int64#66) + +-query= +-AnonymizedAggregateScan +-column_list=[$aggregate.$agg1#51] +-input_scan= @@ -2949,13 +3015,15 @@ QueryStmt | | | | | | +-right_scan= | | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#40, string#43, uid#54], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 4, 10]) | | | | | | +-join_expr= - | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#38) - | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#43) + | | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#38) + | | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#43) + | | | | | | +-has_using=TRUE | | | | | +-join_expr= - | | | | | 
+-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#19) - | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#40) + | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#19) + | | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#40) + | | | | | +-has_using=TRUE | | | | +-group_by_list= | | | | | +-int64_partial#57 := ColumnRef(type=INT64, column=$distinct.int64#19) | | | | | +-string_partial#58 := ColumnRef(type=STRING, column=$distinct.string#38) @@ -3067,9 +3135,10 @@ QueryStmt | | +-group_by_list= | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) | +-string#34 := ColumnRef(type=STRING, column=$distinct.string#31) @@ -3125,9 +3194,10 @@ QueryStmt | | | +-group_by_list= | | | +-int64#25 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#14) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | +-ColumnRef(type=INT64, column=$distinct.int64#25) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | +-ColumnRef(type=INT64, column=$distinct.int64#25) + | | +-has_using=TRUE | +-right_scan= | | +-AggregateScan | | 
+-column_list=[$distinct.string#44] @@ -3136,9 +3206,10 @@ QueryStmt | | +-group_by_list= | | +-string#44 := ColumnRef(type=STRING, column=SimpleTypes.string#30) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#44) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#44) + | +-has_using=TRUE +-group_by_list= | +-int64#46 := ColumnRef(type=INT64, column=$distinct.int64#25) | +-string#47 := ColumnRef(type=STRING, column=$distinct.string#44) @@ -3195,9 +3266,10 @@ QueryStmt | +-right_scan= | | +-WithRefScan(column_list=[public_int64s.int64#32], with_query_name="public_int64s") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) - | +-ColumnRef(type=INT64, column=public_int64s.int64#32) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | +-ColumnRef(type=INT64, column=public_int64s.int64#32) + | +-has_using=TRUE +-group_by_list= | +-int64#34 := ColumnRef(type=INT64, column=public_int64s.int64#32) +-aggregate_list= @@ -3248,9 +3320,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[public_int64s.int64#32], with_query_name="public_int64s") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) - | | | | +-ColumnRef(type=INT64, column=public_int64s.int64#32) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | | | +-ColumnRef(type=INT64, 
column=public_int64s.int64#32) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-int64_partial#38 := ColumnRef(type=INT64, column=public_int64s.int64#32) | | | | +-$uid#39 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#35) @@ -3325,9 +3398,10 @@ QueryStmt | +-right_scan= | | +-WithRefScan(column_list=[public_int64s.int64#31], with_query_name="public_int64s") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#20) - | +-ColumnRef(type=INT64, column=public_int64s.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#20) + | | +-ColumnRef(type=INT64, column=public_int64s.int64#31) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=public_int64s.int64#31) +-aggregate_list= @@ -3373,9 +3447,10 @@ QueryStmt | | +-group_by_list= | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | +-has_using=TRUE +-group_by_list= | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) +-aggregate_list= @@ -3393,22 +3468,22 @@ QueryStmt | +-$groupby.string#33 AS string [STRING] | +-$aggregate.$agg1#32 AS `$col2` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.string#33, $aggregate.$agg1#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.string#41] - | +-input_scan= - | | 
+-TableScan(column_list=[SimpleTypes.string#42], table=SimpleTypes, column_index_list=[4]) - | +-group_by_list= - | +-string#41 := ColumnRef(type=STRING, column=SimpleTypes.string#42) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.string#33, $aggregate.$agg1#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.string#41] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.string#42], table=SimpleTypes, column_index_list=[4]) + | +-group_by_list= + | +-string#41 := ColumnRef(type=STRING, column=SimpleTypes.string#42) + +-query= +-AnonymizedAggregateScan +-column_list=[$groupby.string#33, $aggregate.$agg1#32] +-input_scan= @@ -3429,9 +3504,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-string_partial#37 := ColumnRef(type=STRING, column=$distinct.string#31) | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) @@ -3508,15 +3584,17 @@ QueryStmt | | | +-group_by_list= | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | 
+-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-has_using=TRUE | +-right_scan= | | +-TableScan(column_list=[SimpleTypes.string#36], table=SimpleTypes, column_index_list=[4]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=$distinct.string#31) - | +-ColumnRef(type=STRING, column=SimpleTypes.string#36) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-ColumnRef(type=STRING, column=SimpleTypes.string#36) + | +-has_using=TRUE +-group_by_list= | +-string#51 := ColumnRef(type=STRING, column=$distinct.string#31) +-aggregate_list= @@ -3570,9 +3648,10 @@ QueryStmt | | | +-group_by_list= | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-has_using=TRUE | +-filter_expr= | +-FunctionCall(ZetaSQL:starts_with(STRING, STRING) -> BOOL) | +-ColumnRef(type=STRING, column=$distinct.string#31) @@ -3589,305 +3668,6 @@ QueryStmt Rewrite ERROR: group_selection_strategy = PUBLIC_GROUPS does not allow operations between the public groups join and the aggregation, because they could suppress public groups from the result. Try moving the operation to an input subquery of the public groups join. == -# Filter scan after aggregation is allowed. 
-SELECT WITH ANONYMIZATION OPTIONS ( - max_groups_contributed=3, - group_selection_strategy = PUBLIC_GROUPS -) string, ANON_COUNT(*) AS anon_users -FROM - SimpleTypesWithAnonymizationUid -RIGHT OUTER JOIN - (SELECT DISTINCT string FROM SimpleTypes) - USING (string) -GROUP BY string -HAVING STARTS_WITH(string, 'abc'); --- -QueryStmt -+-output_column_list= -| +-$groupby.string#33 AS string [STRING] -| +-$aggregate.anon_users#32 AS anon_users [INT64] -+-query= - +-ProjectScan - +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-input_scan= - +-FilterScan - +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-input_scan= - | +-AnonymizedAggregateScan - | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - | +-input_scan= - | | +-JoinScan - | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31] - | | +-join_type=RIGHT - | | +-left_scan= - | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.string#5], table=SimpleTypesWithAnonymizationUid, column_index_list=[4]) - | | +-right_scan= - | | | +-AggregateScan - | | | +-column_list=[$distinct.string#31] - | | | +-input_scan= - | | | | +-TableScan(column_list=[SimpleTypes.string#17], table=SimpleTypes, column_index_list=[4]) - | | | +-group_by_list= - | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) - | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | +-ColumnRef(type=STRING, column=$distinct.string#31) - | +-group_by_list= - | | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) - | +-aggregate_list= - | | +-anon_users#32 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64) - | +-anonymization_option_list= - | +-max_groups_contributed := Literal(type=INT64, value=3) - | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) - +-filter_expr= - 
+-FunctionCall(ZetaSQL:starts_with(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, column=$groupby.string#33) - +-Literal(type=STRING, value="abc") - - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$groupby.string#33 AS string [STRING] -| +-$aggregate.anon_users#32 AS anon_users [INT64] -+-query= - +-WithScan - +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.string#41] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.string#42], table=SimpleTypes, column_index_list=[4]) - | +-group_by_list= - | +-string#41 := ColumnRef(type=STRING, column=SimpleTypes.string#42) - +-query= - +-ProjectScan - +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-input_scan= - +-FilterScan - +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-input_scan= - | +-AnonymizedAggregateScan - | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - | +-input_scan= - | | +-JoinScan - | | +-column_list=[$public_groups0.string#40, $aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] - | | +-join_type=RIGHT - | | +-left_scan= - | | | +-SampleScan - | | | +-column_list=[$aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] - | | | +-input_scan= - | | | | +-AggregateScan - | | | | +-column_list=[$aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] - | | | | +-input_scan= - | | | | | +-JoinScan - | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31, SimpleTypesWithAnonymizationUid.uid#34] - | | | | | +-left_scan= - | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#5, uid#34], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) - | | | | | +-right_scan= - | | | | | | +-WithRefScan(column_list=[$distinct.string#31], 
with_query_name="$public_groups0") - | | | | | +-join_expr= - | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) - | | | | +-group_by_list= - | | | | | +-string_partial#37 := ColumnRef(type=STRING, column=$distinct.string#31) - | | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) - | | | | +-aggregate_list= - | | | | +-anon_users_partial#36 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) - | | | +-method="RESERVOIR" - | | | +-size= - | | | | +-Literal(type=INT64, value=3) - | | | +-unit=ROWS - | | | +-partition_by_list= - | | | +-ColumnRef(type=INT64, column=$group_by.$uid#38) - | | +-right_scan= - | | | +-WithRefScan(column_list=[$public_groups0.string#40], with_query_name="$public_groups0") - | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=$groupby.string_partial#37) - | | +-ColumnRef(type=STRING, column=$public_groups0.string#40) - | +-group_by_list= - | | +-string#33 := ColumnRef(type=STRING, column=$public_groups0.string#40) - | +-aggregate_list= - | | +-anon_users#32 := - | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) - | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) - | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) - | | | +-ColumnRef(type=INT64, column=$group_by.$uid#38) - | | +-Literal(type=INT64, value=NULL) - | | +-ColumnRef(type=INT64, column=$aggregate.anon_users_partial#36) - | +-anonymization_option_list= - | +-max_groups_contributed := Literal(type=INT64, value=3) - | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) - +-filter_expr= - +-FunctionCall(ZetaSQL:starts_with(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, column=$groupby.string#33) - +-Literal(type=STRING, value="abc") 
-== - -# Filter scans in input scans of public group joins are allowed. -SELECT WITH ANONYMIZATION OPTIONS ( - max_groups_contributed=3, - group_selection_strategy = PUBLIC_GROUPS -) string, ANON_COUNT(*) AS anon_users -FROM - (SELECT uid, string FROM SimpleTypesWithAnonymizationUid WHERE int64 > 10) -RIGHT OUTER JOIN - (SELECT DISTINCT string FROM SimpleTypes WHERE STARTS_WITH(string, 'abc')) - USING (string) -GROUP BY string; --- -QueryStmt -+-output_column_list= -| +-$groupby.string#33 AS string [STRING] -| +-$aggregate.anon_users#32 AS anon_users [INT64] -+-query= - +-ProjectScan - +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-input_scan= - +-AnonymizedAggregateScan - +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-input_scan= - | +-JoinScan - | +-column_list=[SimpleTypesWithAnonymizationUid.uid#11, SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31] - | +-join_type=RIGHT - | +-left_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypesWithAnonymizationUid.[uid#11, string#5] - | | +-input_scan= - | | +-FilterScan - | | +-column_list=SimpleTypesWithAnonymizationUid.[int64#2, string#5, uid#11] - | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#2, string#5, uid#11], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 4, 10]) - | | +-filter_expr= - | | +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | +-Literal(type=INT64, value=10) - | +-right_scan= - | | +-AggregateScan - | | +-column_list=[$distinct.string#31] - | | +-input_scan= - | | | +-FilterScan - | | | +-column_list=[SimpleTypes.string#17] - | | | +-input_scan= - | | | | +-TableScan(column_list=[SimpleTypes.string#17], table=SimpleTypes, column_index_list=[4]) - | | | +-filter_expr= - | | | +-FunctionCall(ZetaSQL:starts_with(STRING, STRING) -> BOOL) - | | | +-ColumnRef(type=STRING, column=SimpleTypes.string#17) - | | | 
+-Literal(type=STRING, value="abc") - | | +-group_by_list= - | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) - | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#31) - +-group_by_list= - | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) - +-aggregate_list= - | +-anon_users#32 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64) - +-anonymization_option_list= - +-max_groups_contributed := Literal(type=INT64, value=3) - +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) - - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$groupby.string#33 AS string [STRING] -| +-$aggregate.anon_users#32 AS anon_users [INT64] -+-query= - +-WithScan - +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.string#40] - | +-input_scan= - | | +-FilterScan - | | +-column_list=[SimpleTypes.string#41] - | | +-input_scan= - | | | +-TableScan(column_list=[SimpleTypes.string#41], table=SimpleTypes, column_index_list=[4]) - | | +-filter_expr= - | | +-FunctionCall(ZetaSQL:starts_with(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=SimpleTypes.string#41) - | | +-Literal(type=STRING, value="abc") - | +-group_by_list= - | +-string#40 := ColumnRef(type=STRING, column=SimpleTypes.string#41) - +-query= - +-ProjectScan - +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-input_scan= - +-AnonymizedAggregateScan - +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-input_scan= - | +-JoinScan - | +-column_list=[$public_groups0.string#39, $aggregate.anon_users_partial#35, $groupby.string_partial#36, $group_by.$uid#37] - | +-join_type=RIGHT - | 
+-left_scan= - | | +-SampleScan - | | +-column_list=[$aggregate.anon_users_partial#35, $groupby.string_partial#36, $group_by.$uid#37] - | | +-input_scan= - | | | +-AggregateScan - | | | +-column_list=[$aggregate.anon_users_partial#35, $groupby.string_partial#36, $group_by.$uid#37] - | | | +-input_scan= - | | | | +-JoinScan - | | | | +-column_list=[SimpleTypesWithAnonymizationUid.uid#11, SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31] - | | | | +-left_scan= - | | | | | +-ProjectScan - | | | | | +-column_list=SimpleTypesWithAnonymizationUid.[uid#11, string#5] - | | | | | +-input_scan= - | | | | | +-FilterScan - | | | | | +-column_list=SimpleTypesWithAnonymizationUid.[int64#2, string#5, uid#11] - | | | | | +-input_scan= - | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#2, string#5, uid#11], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 4, 10]) - | | | | | +-filter_expr= - | | | | | +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) - | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | | | | +-Literal(type=INT64, value=10) - | | | | +-right_scan= - | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") - | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) - | | | +-group_by_list= - | | | | +-string_partial#36 := ColumnRef(type=STRING, column=$distinct.string#31) - | | | | +-$uid#37 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) - | | | +-aggregate_list= - | | | +-anon_users_partial#35 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) - | | +-method="RESERVOIR" - | | +-size= - | | | +-Literal(type=INT64, value=3) - | | +-unit=ROWS - | | +-partition_by_list= - | | +-ColumnRef(type=INT64, column=$group_by.$uid#37) - | 
+-right_scan= - | | +-WithRefScan(column_list=[$public_groups0.string#39], with_query_name="$public_groups0") - | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=$groupby.string_partial#36) - | +-ColumnRef(type=STRING, column=$public_groups0.string#39) - +-group_by_list= - | +-string#33 := ColumnRef(type=STRING, column=$public_groups0.string#39) - +-aggregate_list= - | +-anon_users#32 := - | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) - | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) - | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=$group_by.$uid#37) - | +-Literal(type=INT64, value=NULL) - | +-ColumnRef(type=INT64, column=$aggregate.anon_users_partial#35) - +-anonymization_option_list= - +-max_groups_contributed := Literal(type=INT64, value=3) - +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) -== - # Limit scans after public groups joins are not allowed. 
SELECT WITH ANONYMIZATION OPTIONS ( max_groups_contributed=3, @@ -3933,9 +3713,10 @@ QueryStmt | | | +-group_by_list= | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-has_using=TRUE | +-limit= | +-Literal(type=INT64, value=10) +-group_by_list= @@ -3999,9 +3780,10 @@ QueryStmt | | +-limit= | | +-Literal(type=INT64, value=11) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | +-has_using=TRUE +-group_by_list= | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) +-aggregate_list= @@ -4017,27 +3799,27 @@ QueryStmt | +-$groupby.string#33 AS string [STRING] | +-$aggregate.anon_users#32 AS anon_users [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-LimitOffsetScan - | +-column_list=[$distinct.string#40] - | +-input_scan= - | | +-AggregateScan - | | +-column_list=[$distinct.string#40] - | | +-input_scan= - | | | +-TableScan(column_list=[SimpleTypes.string#41], table=SimpleTypes, column_index_list=[4]) - | | +-group_by_list= - | | +-string#40 := ColumnRef(type=STRING, column=SimpleTypes.string#41) 
- | +-limit= - | +-Literal(type=INT64, value=11) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-LimitOffsetScan + | +-column_list=[$distinct.string#40] + | +-input_scan= + | | +-AggregateScan + | | +-column_list=[$distinct.string#40] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.string#41], table=SimpleTypes, column_index_list=[4]) + | | +-group_by_list= + | | +-string#40 := ColumnRef(type=STRING, column=SimpleTypes.string#41) + | +-limit= + | +-Literal(type=INT64, value=11) + +-query= +-AnonymizedAggregateScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] +-input_scan= @@ -4066,9 +3848,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-string_partial#36 := ColumnRef(type=STRING, column=$distinct.string#31) | | | | +-$uid#37 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) @@ -4101,7 +3884,7 @@ QueryStmt +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) == -[language_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,TABLESAMPLE] +[language_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,V_1_1_WITH_ON_SUBQUERY,TABLESAMPLE] # Sample scans between public groups join and aggregation are not allowed. 
SELECT WITH ANONYMIZATION OPTIONS ( max_groups_contributed=3, @@ -4147,9 +3930,10 @@ QueryStmt | | | +-group_by_list= | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-has_using=TRUE | +-method="bernoulli" | +-size= | | +-Literal(type=INT64, value=1) @@ -4166,7 +3950,7 @@ QueryStmt Rewrite ERROR: group_selection_strategy = PUBLIC_GROUPS does not allow operations between the public groups join and the aggregation, because they could suppress public groups from the result. Try moving the operation to an input subquery of the public groups join. == -[language_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,TABLESAMPLE] +[language_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,V_1_1_WITH_ON_SUBQUERY,TABLESAMPLE] # Sample scans in input to public groups joins are allowed. 
SELECT WITH ANONYMIZATION OPTIONS ( max_groups_contributed=3, @@ -4224,9 +4008,10 @@ QueryStmt | | | +-Literal(type=INT64, value=2) | | +-unit=PERCENT | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | +-has_using=TRUE +-group_by_list= | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) +-aggregate_list= @@ -4242,29 +4027,29 @@ QueryStmt | +-$groupby.string#33 AS string [STRING] | +-$aggregate.anon_users#32 AS anon_users [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-SampleScan - | +-column_list=[$distinct.string#40] - | +-input_scan= - | | +-AggregateScan - | | +-column_list=[$distinct.string#40] - | | +-input_scan= - | | | +-TableScan(column_list=[SimpleTypes.string#41], table=SimpleTypes, column_index_list=[4]) - | | +-group_by_list= - | | +-string#40 := ColumnRef(type=STRING, column=SimpleTypes.string#41) - | +-method="bernoulli" - | +-size= - | | +-Literal(type=INT64, value=2) - | +-unit=PERCENT - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-SampleScan + | +-column_list=[$distinct.string#40] + | +-input_scan= + | | +-AggregateScan + | | +-column_list=[$distinct.string#40] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.string#41], table=SimpleTypes, column_index_list=[4]) + | | +-group_by_list= + | | +-string#40 := 
ColumnRef(type=STRING, column=SimpleTypes.string#41) + | +-method="bernoulli" + | +-size= + | | +-Literal(type=INT64, value=2) + | +-unit=PERCENT + +-query= +-AnonymizedAggregateScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] +-input_scan= @@ -4295,9 +4080,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-string_partial#36 := ColumnRef(type=STRING, column=$distinct.string#31) | | | | +-$uid#37 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) @@ -4379,9 +4165,10 @@ QueryStmt | | +-group_by_list= | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | +-has_using=TRUE +-group_by_list= | +-string#34 := ColumnRef(type=STRING, column=$distinct.string#31) +-aggregate_list= @@ -4436,9 +4223,10 @@ QueryStmt | | +-group_by_list= | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, 
column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | +-has_using=TRUE +-group_by_list= | +-string2#33 := ColumnRef(type=STRING, column=$distinct.string#31) +-aggregate_list= @@ -4454,22 +4242,22 @@ QueryStmt | +-$groupby.string2#33 AS string2 [STRING] | +-$aggregate.anon_users#32 AS anon_users [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.string2#33, $aggregate.anon_users#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.string#40] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.string#41], table=SimpleTypes, column_index_list=[4]) - | +-group_by_list= - | +-string#40 := ColumnRef(type=STRING, column=SimpleTypes.string#41) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.string2#33, $aggregate.anon_users#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.string#40] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.string#41], table=SimpleTypes, column_index_list=[4]) + | +-group_by_list= + | +-string#40 := ColumnRef(type=STRING, column=SimpleTypes.string#41) + +-query= +-AnonymizedAggregateScan +-column_list=[$groupby.string2#33, $aggregate.anon_users#32] +-input_scan= @@ -4493,9 +4281,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | +-ColumnRef(type=STRING, 
column=$distinct.string#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-string2_partial#36 := ColumnRef(type=STRING, column=$distinct.string#31) | | | | +-$uid#37 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) @@ -4578,18 +4367,20 @@ QueryStmt | | | | +-group_by_list= | | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | | +-join_expr= - | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-has_using=TRUE | | +-right_scan= | | | +-ProjectScan | | | +-column_list=[SimpleTypes.int32#32] | | | +-input_scan= | | | +-TableScan(column_list=[SimpleTypes.int32#32], table=SimpleTypes, column_index_list=[0]) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) - | | +-ColumnRef(type=INT32, column=SimpleTypesWithAnonymizationUid.int32#1) - | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#32) + | | | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | | | +-ColumnRef(type=INT32, column=SimpleTypesWithAnonymizationUid.int32#1) + | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#32) + | | +-has_using=TRUE | +-right_scan= | | +-AggregateScan | | +-column_list=[$distinct.int64#68] @@ -4598,9 +4389,10 @@ QueryStmt | | +-group_by_list= | | +-int64#68 := ColumnRef(type=INT64, column=SimpleTypes.int64#51) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, 
column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#68) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#68) + | +-has_using=TRUE +-group_by_list= | +-string#70 := ColumnRef(type=STRING, column=$distinct.string#31) | +-int64#71 := ColumnRef(type=INT64, column=$distinct.int64#68) @@ -4664,18 +4456,20 @@ QueryStmt | | | | +-group_by_list= | | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | | +-join_expr= - | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-has_using=TRUE | | +-right_scan= | | | +-ProjectScan | | | +-column_list=[SimpleTypes.int32#32] | | | +-input_scan= | | | +-TableScan(column_list=[SimpleTypes.int32#32], table=SimpleTypes, column_index_list=[0]) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) - | | +-ColumnRef(type=INT32, column=SimpleTypesWithAnonymizationUid.int32#1) - | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#32) + | | | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | | | +-ColumnRef(type=INT32, column=SimpleTypesWithAnonymizationUid.int32#1) + | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#32) + | | +-has_using=TRUE | +-right_scan= | | +-AggregateScan | | +-column_list=[$distinct.int64#68] @@ -4684,9 +4478,10 @@ QueryStmt | | +-group_by_list= | | +-int64#68 := ColumnRef(type=INT64, column=SimpleTypes.int64#51) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, 
column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#68) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#68) + | +-has_using=TRUE +-group_by_list= | +-string#70 := ColumnRef(type=STRING, column=$distinct.string#31) | +-int64#71 := ColumnRef(type=INT64, column=$distinct.int64#68) @@ -4699,7 +4494,7 @@ QueryStmt Rewrite ERROR: group_selection_strategy = PUBLIC_GROUPS does not allow JOIN operations between the public groups join and the aggregation, because they could suppress public groups from the result. Try moving the operation to an input subquery of the public groups join. == -[language_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,TABLESAMPLE] +[language_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,V_1_1_WITH_ON_SUBQUERY,TABLESAMPLE] # Forbidden operations after public groups joins are allowed outside of the # anon aggregate scan. 
SELECT string, anon_users @@ -4756,9 +4551,10 @@ QueryStmt | | | | | +-group_by_list= | | | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) | | | +-aggregate_list= @@ -4784,100 +4580,101 @@ QueryStmt | +-$groupby.string#33 AS string [STRING] | +-$aggregate.anon_users#32 AS anon_users [INT64] +-query= - +-WithScan + +-LimitOffsetScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.string#41] + +-input_scan= + | +-ProjectScan + | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | +-input_scan= + | +-FilterScan + | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.string#42], table=SimpleTypes, column_index_list=[4]) - | +-group_by_list= - | +-string#41 := ColumnRef(type=STRING, column=SimpleTypes.string#42) - +-query= - +-LimitOffsetScan - +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-input_scan= - | +-ProjectScan - | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - | +-input_scan= - | +-FilterScan - | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - | +-input_scan= - | | +-SampleScan - | | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - | | +-input_scan= - | | | +-ProjectScan - 
| | | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - | | | +-input_scan= - | | | +-AnonymizedAggregateScan - | | | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - | | | +-input_scan= - | | | | +-JoinScan - | | | | +-column_list=[$public_groups0.string#40, $aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] - | | | | +-join_type=RIGHT - | | | | +-left_scan= - | | | | | +-SampleScan - | | | | | +-column_list=[$aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] - | | | | | +-input_scan= - | | | | | | +-AggregateScan - | | | | | | +-column_list=[$aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] - | | | | | | +-input_scan= - | | | | | | | +-JoinScan - | | | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31, SimpleTypesWithAnonymizationUid.uid#34] - | | | | | | | +-left_scan= - | | | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#5, uid#34], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) - | | | | | | | +-right_scan= - | | | | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") - | | | | | | | +-join_expr= - | | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) - | | | | | | +-group_by_list= - | | | | | | | +-string_partial#37 := ColumnRef(type=STRING, column=$distinct.string#31) - | | | | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) - | | | | | | +-aggregate_list= - | | | | | | +-anon_users_partial#36 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) - | | | | | +-method="RESERVOIR" - | | | | | +-size= - | | | | | | +-Literal(type=INT64, value=3) - | | | | | +-unit=ROWS - | | | | | +-partition_by_list= - | | | | | 
+-ColumnRef(type=INT64, column=$group_by.$uid#38) - | | | | +-right_scan= - | | | | | +-WithRefScan(column_list=[$public_groups0.string#40], with_query_name="$public_groups0") - | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=$groupby.string_partial#37) - | | | | +-ColumnRef(type=STRING, column=$public_groups0.string#40) - | | | +-group_by_list= - | | | | +-string#33 := ColumnRef(type=STRING, column=$public_groups0.string#40) - | | | +-aggregate_list= - | | | | +-anon_users#32 := - | | | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) - | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) - | | | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) - | | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#38) - | | | | +-Literal(type=INT64, value=NULL) - | | | | +-ColumnRef(type=INT64, column=$aggregate.anon_users_partial#36) - | | | +-anonymization_option_list= - | | | +-max_groups_contributed := Literal(type=INT64, value=3) - | | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) - | | +-method="bernoulli" - | | +-size= - | | | +-Literal(type=INT64, value=1) - | | +-unit=PERCENT - | +-filter_expr= - | +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$aggregate.anon_users#32) - | +-Literal(type=INT64, value=10) - +-limit= - +-Literal(type=INT64, value=11) + | | +-SampleScan + | | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | | | +-input_scan= + | | | +-WithScan + | | | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | | | +-with_entry_list= + | | | | +-WithEntry + | | | | +-with_query_name="$public_groups0" + | | | | +-with_subquery= + | | | | +-AggregateScan + | | | | +-column_list=[$distinct.string#41] + | | | | 
+-input_scan= + | | | | | +-TableScan(column_list=[SimpleTypes.string#42], table=SimpleTypes, column_index_list=[4]) + | | | | +-group_by_list= + | | | | +-string#41 := ColumnRef(type=STRING, column=SimpleTypes.string#42) + | | | +-query= + | | | +-AnonymizedAggregateScan + | | | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | | | +-input_scan= + | | | | +-JoinScan + | | | | +-column_list=[$public_groups0.string#40, $aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] + | | | | +-join_type=RIGHT + | | | | +-left_scan= + | | | | | +-SampleScan + | | | | | +-column_list=[$aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] + | | | | | +-input_scan= + | | | | | | +-AggregateScan + | | | | | | +-column_list=[$aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] + | | | | | | +-input_scan= + | | | | | | | +-JoinScan + | | | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31, SimpleTypesWithAnonymizationUid.uid#34] + | | | | | | | +-left_scan= + | | | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#5, uid#34], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) + | | | | | | | +-right_scan= + | | | | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") + | | | | | | | +-join_expr= + | | | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | | | +-has_using=TRUE + | | | | | | +-group_by_list= + | | | | | | | +-string_partial#37 := ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) + | | | | | | +-aggregate_list= + | | | | | | +-anon_users_partial#36 := 
AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | | | +-method="RESERVOIR" + | | | | | +-size= + | | | | | | +-Literal(type=INT64, value=3) + | | | | | +-unit=ROWS + | | | | | +-partition_by_list= + | | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#38) + | | | | +-right_scan= + | | | | | +-WithRefScan(column_list=[$public_groups0.string#40], with_query_name="$public_groups0") + | | | | +-join_expr= + | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=$groupby.string_partial#37) + | | | | +-ColumnRef(type=STRING, column=$public_groups0.string#40) + | | | +-group_by_list= + | | | | +-string#33 := ColumnRef(type=STRING, column=$public_groups0.string#40) + | | | +-aggregate_list= + | | | | +-anon_users#32 := + | | | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#38) + | | | | +-Literal(type=INT64, value=NULL) + | | | | +-ColumnRef(type=INT64, column=$aggregate.anon_users_partial#36) + | | | +-anonymization_option_list= + | | | +-max_groups_contributed := Literal(type=INT64, value=3) + | | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | | +-method="bernoulli" + | | +-size= + | | | +-Literal(type=INT64, value=1) + | | +-unit=PERCENT + | +-filter_expr= + | +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=$aggregate.anon_users#32) + | +-Literal(type=INT64, value=10) + +-limit= + +-Literal(type=INT64, value=11) == -[language_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,TABLESAMPLE] +[language_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,V_1_1_WITH_ON_SUBQUERY,TABLESAMPLE] # Forbidden operations after public groups joins are allowed outside of the # anon aggregate scan with 
per-group contribution bounding. SELECT string, anon_users @@ -4933,9 +4730,10 @@ QueryStmt | | | | | +-group_by_list= | | | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) | | | +-aggregate_list= @@ -4994,9 +4792,10 @@ QueryStmt | | | | | | +-group_by_list= | | | | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | | | | +-join_expr= - | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-has_using=TRUE | | | | +-group_by_list= | | | | | +-string_partial#37 := ColumnRef(type=STRING, column=$distinct.string#31) | | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) @@ -5024,3 +4823,1352 @@ QueryStmt | +-Literal(type=INT64, value=10) +-limit= +-Literal(type=INT64, value=11) +== + +# Multiple anon aggregate scans. 
+WITH + with1 AS ( + SELECT + WITH ANONYMIZATION OPTIONS (max_groups_contributed=3) + int64 + FROM SimpleTypesWithAnonymizationUid + GROUP BY int64), + with2 AS ( + SELECT WITH ANONYMIZATION + int64 + FROM SimpleTypesWithAnonymizationUid + GROUP BY int64) +SELECT + WITH ANONYMIZATION OPTIONS ( + max_groups_contributed = 5, + group_selection_strategy = PUBLIC_GROUPS) + string, + ANON_COUNT(*) +FROM SimpleTypesWithAnonymizationUid +RIGHT OUTER JOIN (SELECT DISTINCT string FROM SimpleTypes) + USING (string) +GROUP BY string +-- + +QueryStmt ++-output_column_list= +| +-$groupby.string#59 AS string [STRING] +| +-$aggregate.$agg1#58 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=[$groupby.string#59, $aggregate.$agg1#58] + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="with1" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | +-AnonymizedAggregateScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#2], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) + | | +-group_by_list= + | | | +-int64#13 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-anonymization_option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=3) + | +-WithEntry + | +-with_query_name="with2" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.int64#26] + | +-input_scan= + | +-AnonymizedAggregateScan + | +-column_list=[$groupby.int64#26] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#15], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) + | +-group_by_list= + | +-int64#26 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#15) + +-query= + +-ProjectScan + +-column_list=[$groupby.string#59, $aggregate.$agg1#58] + +-input_scan= + +-AnonymizedAggregateScan + +-column_list=[$groupby.string#59, 
$aggregate.$agg1#58] + +-input_scan= + | +-JoinScan + | +-column_list=[SimpleTypesWithAnonymizationUid.string#31, $distinct.string#57] + | +-join_type=RIGHT + | +-left_scan= + | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.string#31], table=SimpleTypesWithAnonymizationUid, column_index_list=[4]) + | +-right_scan= + | | +-AggregateScan + | | +-column_list=[$distinct.string#57] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.string#43], table=SimpleTypes, column_index_list=[4]) + | | +-group_by_list= + | | +-string#57 := ColumnRef(type=STRING, column=SimpleTypes.string#43) + | +-join_expr= + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#31) + | | +-ColumnRef(type=STRING, column=$distinct.string#57) + | +-has_using=TRUE + +-group_by_list= + | +-string#59 := ColumnRef(type=STRING, column=$distinct.string#57) + +-aggregate_list= + | +-$agg1#58 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64) + +-anonymization_option_list= + +-max_groups_contributed := Literal(type=INT64, value=5) + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$groupby.string#59 AS string [STRING] +| +-$aggregate.$agg1#58 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=[$groupby.string#59, $aggregate.$agg1#58] + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="with1" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | +-AnonymizedAggregateScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | | +-SampleScan + | | | +-column_list=[$groupby.int64_partial#70, $group_by.$uid#71] + | | | +-input_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$groupby.int64_partial#70, $group_by.$uid#71] + | | | | +-input_scan= + | | | | | 
+-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#2, uid#69], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | | | | +-group_by_list= + | | | | +-int64_partial#70 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | | +-$uid#71 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#69) + | | | +-method="RESERVOIR" + | | | +-size= + | | | | +-Literal(type=INT64, value=3) + | | | +-unit=ROWS + | | | +-partition_by_list= + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#71) + | | +-group_by_list= + | | | +-int64#13 := ColumnRef(type=INT64, column=$groupby.int64_partial#70) + | | +-aggregate_list= + | | | +-$k_threshold_col#73 := + | | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(1) INT64, optional(1) INT64) -> INT64) + | | | +-Literal(type=INT64, value=1) + | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=INT64, value=1) + | | +-k_threshold_expr= + | | | +-ColumnRef(type=INT64, column=$anon.$k_threshold_col#73) + | | +-anonymization_option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=3) + | | +-group_selection_strategy := Literal(type=ENUM, value=LAPLACE_THRESHOLD) + | +-WithEntry + | +-with_query_name="with2" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.int64#26] + | +-input_scan= + | +-AnonymizedAggregateScan + | +-column_list=[$groupby.int64#26] + | +-input_scan= + | | +-AggregateScan + | | +-column_list=[$groupby.int64_partial#75, $group_by.$uid#76] + | | +-input_scan= + | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#15, uid#74], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | | +-group_by_list= + | | +-int64_partial#75 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#15) + | | +-$uid#76 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#74) + | +-group_by_list= + | | +-int64#26 := ColumnRef(type=INT64, column=$groupby.int64_partial#75) + 
| +-aggregate_list= + | | +-$k_threshold_col#78 := + | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(1) INT64, optional(1) INT64) -> INT64) + | | +-Literal(type=INT64, value=1) + | | +-Literal(type=INT64, value=0) + | | +-Literal(type=INT64, value=1) + | +-k_threshold_expr= + | | +-ColumnRef(type=INT64, column=$anon.$k_threshold_col#78) + | +-anonymization_option_list= + | +-group_selection_strategy := Literal(type=ENUM, value=LAPLACE_THRESHOLD) + +-query= + +-ProjectScan + +-column_list=[$groupby.string#59, $aggregate.$agg1#58] + +-input_scan= + +-WithScan + +-column_list=[$groupby.string#59, $aggregate.$agg1#58] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.string#67] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.string#68], table=SimpleTypes, column_index_list=[4]) + | +-group_by_list= + | +-string#67 := ColumnRef(type=STRING, column=SimpleTypes.string#68) + +-query= + +-AnonymizedAggregateScan + +-column_list=[$groupby.string#59, $aggregate.$agg1#58] + +-input_scan= + | +-JoinScan + | +-column_list=[$public_groups0.string#66, $aggregate.$agg1_partial#62, $groupby.string_partial#63, $group_by.$uid#64] + | +-join_type=RIGHT + | +-left_scan= + | | +-SampleScan + | | +-column_list=[$aggregate.$agg1_partial#62, $groupby.string_partial#63, $group_by.$uid#64] + | | +-input_scan= + | | | +-AggregateScan + | | | +-column_list=[$aggregate.$agg1_partial#62, $groupby.string_partial#63, $group_by.$uid#64] + | | | +-input_scan= + | | | | +-JoinScan + | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#31, $distinct.string#57, SimpleTypesWithAnonymizationUid.uid#60] + | | | | +-left_scan= + | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#31, uid#60], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) + | | | | +-right_scan= + | | | | | +-WithRefScan(column_list=[$distinct.string#57], 
with_query_name="$public_groups0") + | | | | +-join_expr= + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#31) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#57) + | | | | +-has_using=TRUE + | | | +-group_by_list= + | | | | +-string_partial#63 := ColumnRef(type=STRING, column=$distinct.string#57) + | | | | +-$uid#64 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#60) + | | | +-aggregate_list= + | | | +-$agg1_partial#62 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | +-method="RESERVOIR" + | | +-size= + | | | +-Literal(type=INT64, value=5) + | | +-unit=ROWS + | | +-partition_by_list= + | | +-ColumnRef(type=INT64, column=$group_by.$uid#64) + | +-right_scan= + | | +-WithRefScan(column_list=[$public_groups0.string#66], with_query_name="$public_groups0") + | +-join_expr= + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, column=$groupby.string_partial#63) + | +-ColumnRef(type=STRING, column=$public_groups0.string#66) + +-group_by_list= + | +-string#59 := ColumnRef(type=STRING, column=$public_groups0.string#66) + +-aggregate_list= + | +-$agg1#58 := + | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) + | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$group_by.$uid#64) + | +-Literal(type=INT64, value=NULL) + | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#62) + +-anonymization_option_list= + +-max_groups_contributed := Literal(type=INT64, value=5) + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) +== + +# Multiple anon aggregate scans when the public groups query is in a with +# entry. 
+WITH + with1 AS ( + SELECT WITH ANONYMIZATION OPTIONS (max_groups_contributed=3) + int64 + FROM SimpleTypesWithAnonymizationUid + GROUP BY int64 + ), + with2 AS ( + SELECT WITH ANONYMIZATION OPTIONS ( + max_groups_contributed=4, + group_selection_strategy=PUBLIC_GROUPS) + int64, ANON_COUNT(*) + FROM SimpleTypesWithAnonymizationUid + RIGHT OUTER JOIN ( + SELECT DISTINCT int64 FROM SimpleTypes + ) USING (int64) + GROUP BY int64 + ), + with3 AS ( + SELECT WITH ANONYMIZATION + int64 + FROM SimpleTypesWithAnonymizationUid + GROUP BY int64) +SELECT + WITH ANONYMIZATION OPTIONS ( + max_groups_contributed = 5) + string, + ANON_COUNT(*) +FROM SimpleTypesWithAnonymizationUid +GROUP BY string +-- +QueryStmt ++-output_column_list= +| +-$groupby.string#73 AS string [STRING] +| +-$aggregate.$agg1#72 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=[$groupby.string#73, $aggregate.$agg1#72] + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="with1" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | +-AnonymizedAggregateScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#2], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) + | | +-group_by_list= + | | | +-int64#13 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-anonymization_option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=3) + | +-WithEntry + | | +-with_query_name="with2" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[$groupby.int64#46, $aggregate.$agg1#45] + | | +-input_scan= + | | +-AnonymizedAggregateScan + | | +-column_list=[$groupby.int64#46, $aggregate.$agg1#45] + | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[SimpleTypesWithAnonymizationUid.int64#15, $distinct.int64#44] + | | | +-join_type=RIGHT + | | | +-left_scan= + | | | | 
+-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#15], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) + | | | +-right_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$distinct.int64#44] + | | | | +-input_scan= + | | | | | +-TableScan(column_list=[SimpleTypes.int64#27], table=SimpleTypes, column_index_list=[1]) + | | | | +-group_by_list= + | | | | +-int64#44 := ColumnRef(type=INT64, column=SimpleTypes.int64#27) + | | | +-join_expr= + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#15) + | | | | +-ColumnRef(type=INT64, column=$distinct.int64#44) + | | | +-has_using=TRUE + | | +-group_by_list= + | | | +-int64#46 := ColumnRef(type=INT64, column=$distinct.int64#44) + | | +-aggregate_list= + | | | +-$agg1#45 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64) + | | +-anonymization_option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=4) + | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | +-WithEntry + | +-with_query_name="with3" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.int64#59] + | +-input_scan= + | +-AnonymizedAggregateScan + | +-column_list=[$groupby.int64#59] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#48], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) + | +-group_by_list= + | +-int64#59 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#48) + +-query= + +-ProjectScan + +-column_list=[$groupby.string#73, $aggregate.$agg1#72] + +-input_scan= + +-AnonymizedAggregateScan + +-column_list=[$groupby.string#73, $aggregate.$agg1#72] + +-input_scan= + | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.string#64], table=SimpleTypesWithAnonymizationUid, column_index_list=[4]) + +-group_by_list= + | +-string#73 := ColumnRef(type=STRING, 
column=SimpleTypesWithAnonymizationUid.string#64) + +-aggregate_list= + | +-$agg1#72 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64) + +-anonymization_option_list= + +-max_groups_contributed := Literal(type=INT64, value=5) + + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$groupby.string#73 AS string [STRING] +| +-$aggregate.$agg1#72 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=[$groupby.string#73, $aggregate.$agg1#72] + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="with1" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | +-AnonymizedAggregateScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | | +-SampleScan + | | | +-column_list=[$groupby.int64_partial#83, $group_by.$uid#84] + | | | +-input_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$groupby.int64_partial#83, $group_by.$uid#84] + | | | | +-input_scan= + | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#2, uid#82], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | | | | +-group_by_list= + | | | | +-int64_partial#83 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | | +-$uid#84 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#82) + | | | +-method="RESERVOIR" + | | | +-size= + | | | | +-Literal(type=INT64, value=3) + | | | +-unit=ROWS + | | | +-partition_by_list= + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#84) + | | +-group_by_list= + | | | +-int64#13 := ColumnRef(type=INT64, column=$groupby.int64_partial#83) + | | +-aggregate_list= + | | | +-$k_threshold_col#86 := + | | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(1) INT64, optional(1) INT64) -> INT64) + | | | +-Literal(type=INT64, value=1) + | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=INT64, value=1) + | | +-k_threshold_expr= + | | | +-ColumnRef(type=INT64, 
column=$anon.$k_threshold_col#86) + | | +-anonymization_option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=3) + | | +-group_selection_strategy := Literal(type=ENUM, value=LAPLACE_THRESHOLD) + | +-WithEntry + | | +-with_query_name="with2" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[$groupby.int64#46, $aggregate.$agg1#45] + | | +-input_scan= + | | +-WithScan + | | +-column_list=[$groupby.int64#46, $aggregate.$agg1#45] + | | +-with_entry_list= + | | | +-WithEntry + | | | +-with_query_name="$public_groups0" + | | | +-with_subquery= + | | | +-AggregateScan + | | | +-column_list=[$distinct.int64#94] + | | | +-input_scan= + | | | | +-TableScan(column_list=[SimpleTypes.int64#95], table=SimpleTypes, column_index_list=[1]) + | | | +-group_by_list= + | | | +-int64#94 := ColumnRef(type=INT64, column=SimpleTypes.int64#95) + | | +-query= + | | +-AnonymizedAggregateScan + | | +-column_list=[$groupby.int64#46, $aggregate.$agg1#45] + | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$public_groups0.int64#93, $aggregate.$agg1_partial#89, $groupby.int64_partial#90, $group_by.$uid#91] + | | | +-join_type=RIGHT + | | | +-left_scan= + | | | | +-SampleScan + | | | | +-column_list=[$aggregate.$agg1_partial#89, $groupby.int64_partial#90, $group_by.$uid#91] + | | | | +-input_scan= + | | | | | +-AggregateScan + | | | | | +-column_list=[$aggregate.$agg1_partial#89, $groupby.int64_partial#90, $group_by.$uid#91] + | | | | | +-input_scan= + | | | | | | +-JoinScan + | | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.int64#15, $distinct.int64#44, SimpleTypesWithAnonymizationUid.uid#87] + | | | | | | +-left_scan= + | | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#15, uid#87], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | | | | | | +-right_scan= + | | | | | | | +-WithRefScan(column_list=[$distinct.int64#44], with_query_name="$public_groups0") + | | | | | | +-join_expr= + | | | | | | | 
+-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#15) + | | | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#44) + | | | | | | +-has_using=TRUE + | | | | | +-group_by_list= + | | | | | | +-int64_partial#90 := ColumnRef(type=INT64, column=$distinct.int64#44) + | | | | | | +-$uid#91 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#87) + | | | | | +-aggregate_list= + | | | | | +-$agg1_partial#89 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | | +-method="RESERVOIR" + | | | | +-size= + | | | | | +-Literal(type=INT64, value=4) + | | | | +-unit=ROWS + | | | | +-partition_by_list= + | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#91) + | | | +-right_scan= + | | | | +-WithRefScan(column_list=[$public_groups0.int64#93], with_query_name="$public_groups0") + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$groupby.int64_partial#90) + | | | +-ColumnRef(type=INT64, column=$public_groups0.int64#93) + | | +-group_by_list= + | | | +-int64#46 := ColumnRef(type=INT64, column=$public_groups0.int64#93) + | | +-aggregate_list= + | | | +-$agg1#45 := + | | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#91) + | | | +-Literal(type=INT64, value=NULL) + | | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#89) + | | +-anonymization_option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=4) + | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | +-WithEntry + | +-with_query_name="with3" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.int64#59] + | +-input_scan= + | +-AnonymizedAggregateScan + | 
+-column_list=[$groupby.int64#59] + | +-input_scan= + | | +-AggregateScan + | | +-column_list=[$groupby.int64_partial#97, $group_by.$uid#98] + | | +-input_scan= + | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#48, uid#96], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | | +-group_by_list= + | | +-int64_partial#97 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#48) + | | +-$uid#98 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#96) + | +-group_by_list= + | | +-int64#59 := ColumnRef(type=INT64, column=$groupby.int64_partial#97) + | +-aggregate_list= + | | +-$k_threshold_col#100 := + | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(1) INT64, optional(1) INT64) -> INT64) + | | +-Literal(type=INT64, value=1) + | | +-Literal(type=INT64, value=0) + | | +-Literal(type=INT64, value=1) + | +-k_threshold_expr= + | | +-ColumnRef(type=INT64, column=$anon.$k_threshold_col#100) + | +-anonymization_option_list= + | +-group_selection_strategy := Literal(type=ENUM, value=LAPLACE_THRESHOLD) + +-query= + +-ProjectScan + +-column_list=[$groupby.string#73, $aggregate.$agg1#72] + +-input_scan= + +-AnonymizedAggregateScan + +-column_list=[$groupby.string#73, $aggregate.$agg1#72] + +-input_scan= + | +-SampleScan + | +-column_list=[$aggregate.$agg1_partial#76, $groupby.string_partial#77, $group_by.$uid#78] + | +-input_scan= + | | +-AggregateScan + | | +-column_list=[$aggregate.$agg1_partial#76, $groupby.string_partial#77, $group_by.$uid#78] + | | +-input_scan= + | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#64, uid#74], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) + | | +-group_by_list= + | | | +-string_partial#77 := ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#64) + | | | +-$uid#78 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#74) + | | +-aggregate_list= + | | +-$agg1_partial#76 := 
AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | +-method="RESERVOIR" + | +-size= + | | +-Literal(type=INT64, value=5) + | +-unit=ROWS + | +-partition_by_list= + | +-ColumnRef(type=INT64, column=$group_by.$uid#78) + +-group_by_list= + | +-string#73 := ColumnRef(type=STRING, column=$groupby.string_partial#77) + +-aggregate_list= + | +-$agg1#72 := + | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#76) + | +-$k_threshold_col#81 := + | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(1) INT64, optional(1) INT64) -> INT64) + | +-Literal(type=INT64, value=1) + | +-Literal(type=INT64, value=0) + | +-Literal(type=INT64, value=1) + +-k_threshold_expr= + | +-ColumnRef(type=INT64, column=$anon.$k_threshold_col#81) + +-anonymization_option_list= + +-max_groups_contributed := Literal(type=INT64, value=5) + +-group_selection_strategy := Literal(type=ENUM, value=LAPLACE_THRESHOLD) +== + +# WithEntry names added by public groups are unique across multiple anon +# aggregate scans +WITH + withString AS ( + SELECT WITH ANONYMIZATION OPTIONS ( + max_groups_contributed=3, + group_selection_strategy=PUBLIC_GROUPS) + string, ANON_COUNT(*) + FROM SimpleTypesWithAnonymizationUid + RIGHT OUTER JOIN ( + SELECT DISTINCT string FROM SimpleTypes + ) USING (string) + GROUP BY string + ), + withInt32 AS ( + SELECT WITH ANONYMIZATION OPTIONS ( + max_groups_contributed=4, + group_selection_strategy=PUBLIC_GROUPS) + int32, ANON_COUNT(*) + FROM SimpleTypesWithAnonymizationUid + RIGHT OUTER JOIN ( + SELECT DISTINCT int32 FROM SimpleTypes + ) USING (int32) + GROUP BY int32 + ) +SELECT WITH ANONYMIZATION OPTIONS ( + max_groups_contributed=5, + group_selection_strategy=PUBLIC_GROUPS) + int64, ANON_COUNT(*) +FROM SimpleTypesWithAnonymizationUid + RIGHT OUTER JOIN ( + SELECT DISTINCT int64 FROM SimpleTypes + ) USING (int64) +GROUP BY int64 +-- +QueryStmt ++-output_column_list= 
+| +-$groupby.int64#99 AS int64 [INT64] +| +-$aggregate.$agg1#98 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=[$groupby.int64#99, $aggregate.$agg1#98] + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="withString" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[$groupby.string#33, $aggregate.$agg1#32] + | | +-input_scan= + | | +-AnonymizedAggregateScan + | | +-column_list=[$groupby.string#33, $aggregate.$agg1#32] + | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31] + | | | +-join_type=RIGHT + | | | +-left_scan= + | | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.string#5], table=SimpleTypesWithAnonymizationUid, column_index_list=[4]) + | | | +-right_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$distinct.string#31] + | | | | +-input_scan= + | | | | | +-TableScan(column_list=[SimpleTypes.string#17], table=SimpleTypes, column_index_list=[4]) + | | | | +-group_by_list= + | | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) + | | | +-join_expr= + | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-has_using=TRUE + | | +-group_by_list= + | | | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) + | | +-aggregate_list= + | | | +-$agg1#32 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64) + | | +-anonymization_option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=3) + | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | +-WithEntry + | +-with_query_name="withInt32" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.int32#66, $aggregate.$agg1#65] + | +-input_scan= + | +-AnonymizedAggregateScan + | +-column_list=[$groupby.int32#66, 
$aggregate.$agg1#65] + | +-input_scan= + | | +-JoinScan + | | +-column_list=[SimpleTypesWithAnonymizationUid.int32#34, $distinct.int32#64] + | | +-join_type=RIGHT + | | +-left_scan= + | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int32#34], table=SimpleTypesWithAnonymizationUid, column_index_list=[0]) + | | +-right_scan= + | | | +-AggregateScan + | | | +-column_list=[$distinct.int32#64] + | | | +-input_scan= + | | | | +-TableScan(column_list=[SimpleTypes.int32#46], table=SimpleTypes, column_index_list=[0]) + | | | +-group_by_list= + | | | +-int32#64 := ColumnRef(type=INT32, column=SimpleTypes.int32#46) + | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | | | +-ColumnRef(type=INT32, column=SimpleTypesWithAnonymizationUid.int32#34) + | | | +-ColumnRef(type=INT32, column=$distinct.int32#64) + | | +-has_using=TRUE + | +-group_by_list= + | | +-int32#66 := ColumnRef(type=INT32, column=$distinct.int32#64) + | +-aggregate_list= + | | +-$agg1#65 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64) + | +-anonymization_option_list= + | +-max_groups_contributed := Literal(type=INT64, value=4) + | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + +-query= + +-ProjectScan + +-column_list=[$groupby.int64#99, $aggregate.$agg1#98] + +-input_scan= + +-AnonymizedAggregateScan + +-column_list=[$groupby.int64#99, $aggregate.$agg1#98] + +-input_scan= + | +-JoinScan + | +-column_list=[SimpleTypesWithAnonymizationUid.int64#68, $distinct.int64#97] + | +-join_type=RIGHT + | +-left_scan= + | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#68], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) + | +-right_scan= + | | +-AggregateScan + | | +-column_list=[$distinct.int64#97] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.int64#80], table=SimpleTypes, column_index_list=[1]) + | | +-group_by_list= + | | +-int64#97 := 
ColumnRef(type=INT64, column=SimpleTypes.int64#80) + | +-join_expr= + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#68) + | | +-ColumnRef(type=INT64, column=$distinct.int64#97) + | +-has_using=TRUE + +-group_by_list= + | +-int64#99 := ColumnRef(type=INT64, column=$distinct.int64#97) + +-aggregate_list= + | +-$agg1#98 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64) + +-anonymization_option_list= + +-max_groups_contributed := Literal(type=INT64, value=5) + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$groupby.int64#99 AS int64 [INT64] +| +-$aggregate.$agg1#98 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=[$groupby.int64#99, $aggregate.$agg1#98] + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="withString" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[$groupby.string#33, $aggregate.$agg1#32] + | | +-input_scan= + | | +-WithScan + | | +-column_list=[$groupby.string#33, $aggregate.$agg1#32] + | | +-with_entry_list= + | | | +-WithEntry + | | | +-with_query_name="$public_groups1" + | | | +-with_subquery= + | | | +-AggregateScan + | | | +-column_list=[$distinct.string#116] + | | | +-input_scan= + | | | | +-TableScan(column_list=[SimpleTypes.string#117], table=SimpleTypes, column_index_list=[4]) + | | | +-group_by_list= + | | | +-string#116 := ColumnRef(type=STRING, column=SimpleTypes.string#117) + | | +-query= + | | +-AnonymizedAggregateScan + | | +-column_list=[$groupby.string#33, $aggregate.$agg1#32] + | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$public_groups1.string#115, $aggregate.$agg1_partial#111, $groupby.string_partial#112, $group_by.$uid#113] + | | | +-join_type=RIGHT + | | | +-left_scan= + | | | | +-SampleScan + | | | | +-column_list=[$aggregate.$agg1_partial#111, 
$groupby.string_partial#112, $group_by.$uid#113] + | | | | +-input_scan= + | | | | | +-AggregateScan + | | | | | +-column_list=[$aggregate.$agg1_partial#111, $groupby.string_partial#112, $group_by.$uid#113] + | | | | | +-input_scan= + | | | | | | +-JoinScan + | | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31, SimpleTypesWithAnonymizationUid.uid#109] + | | | | | | +-left_scan= + | | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#5, uid#109], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) + | | | | | | +-right_scan= + | | | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups1") + | | | | | | +-join_expr= + | | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | | +-has_using=TRUE + | | | | | +-group_by_list= + | | | | | | +-string_partial#112 := ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | | +-$uid#113 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#109) + | | | | | +-aggregate_list= + | | | | | +-$agg1_partial#111 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | | +-method="RESERVOIR" + | | | | +-size= + | | | | | +-Literal(type=INT64, value=3) + | | | | +-unit=ROWS + | | | | +-partition_by_list= + | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#113) + | | | +-right_scan= + | | | | +-WithRefScan(column_list=[$public_groups1.string#115], with_query_name="$public_groups1") + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$groupby.string_partial#112) + | | | +-ColumnRef(type=STRING, column=$public_groups1.string#115) + | | +-group_by_list= + | | | +-string#33 := ColumnRef(type=STRING, column=$public_groups1.string#115) + | | 
+-aggregate_list= + | | | +-$agg1#32 := + | | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#113) + | | | +-Literal(type=INT64, value=NULL) + | | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#111) + | | +-anonymization_option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=3) + | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | +-WithEntry + | +-with_query_name="withInt32" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.int32#66, $aggregate.$agg1#65] + | +-input_scan= + | +-WithScan + | +-column_list=[$groupby.int32#66, $aggregate.$agg1#65] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="$public_groups2" + | | +-with_subquery= + | | +-AggregateScan + | | +-column_list=[$distinct.int32#125] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.int32#126], table=SimpleTypes, column_index_list=[0]) + | | +-group_by_list= + | | +-int32#125 := ColumnRef(type=INT32, column=SimpleTypes.int32#126) + | +-query= + | +-AnonymizedAggregateScan + | +-column_list=[$groupby.int32#66, $aggregate.$agg1#65] + | +-input_scan= + | | +-JoinScan + | | +-column_list=[$public_groups2.int32#124, $aggregate.$agg1_partial#120, $groupby.int32_partial#121, $group_by.$uid#122] + | | +-join_type=RIGHT + | | +-left_scan= + | | | +-SampleScan + | | | +-column_list=[$aggregate.$agg1_partial#120, $groupby.int32_partial#121, $group_by.$uid#122] + | | | +-input_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$aggregate.$agg1_partial#120, $groupby.int32_partial#121, $group_by.$uid#122] + | | | | +-input_scan= + | | | | | +-JoinScan + | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.int32#34, $distinct.int32#64, SimpleTypesWithAnonymizationUid.uid#118] + | | | 
| | +-left_scan= + | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int32#34, uid#118], table=SimpleTypesWithAnonymizationUid, column_index_list=[0, 10]) + | | | | | +-right_scan= + | | | | | | +-WithRefScan(column_list=[$distinct.int32#64], with_query_name="$public_groups2") + | | | | | +-join_expr= + | | | | | | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | | | | | | +-ColumnRef(type=INT32, column=SimpleTypesWithAnonymizationUid.int32#34) + | | | | | | +-ColumnRef(type=INT32, column=$distinct.int32#64) + | | | | | +-has_using=TRUE + | | | | +-group_by_list= + | | | | | +-int32_partial#121 := ColumnRef(type=INT32, column=$distinct.int32#64) + | | | | | +-$uid#122 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#118) + | | | | +-aggregate_list= + | | | | +-$agg1_partial#120 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | +-method="RESERVOIR" + | | | +-size= + | | | | +-Literal(type=INT64, value=4) + | | | +-unit=ROWS + | | | +-partition_by_list= + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#122) + | | +-right_scan= + | | | +-WithRefScan(column_list=[$public_groups2.int32#124], with_query_name="$public_groups2") + | | +-join_expr= + | | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | | +-ColumnRef(type=INT32, column=$groupby.int32_partial#121) + | | +-ColumnRef(type=INT32, column=$public_groups2.int32#124) + | +-group_by_list= + | | +-int32#66 := ColumnRef(type=INT32, column=$public_groups2.int32#124) + | +-aggregate_list= + | | +-$agg1#65 := + | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#122) + | | +-Literal(type=INT64, value=NULL) + | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#120) + | +-anonymization_option_list= + | +-max_groups_contributed := 
Literal(type=INT64, value=4) + | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + +-query= + +-ProjectScan + +-column_list=[$groupby.int64#99, $aggregate.$agg1#98] + +-input_scan= + +-WithScan + +-column_list=[$groupby.int64#99, $aggregate.$agg1#98] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#107] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#108], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#107 := ColumnRef(type=INT64, column=SimpleTypes.int64#108) + +-query= + +-AnonymizedAggregateScan + +-column_list=[$groupby.int64#99, $aggregate.$agg1#98] + +-input_scan= + | +-JoinScan + | +-column_list=[$public_groups0.int64#106, $aggregate.$agg1_partial#102, $groupby.int64_partial#103, $group_by.$uid#104] + | +-join_type=RIGHT + | +-left_scan= + | | +-SampleScan + | | +-column_list=[$aggregate.$agg1_partial#102, $groupby.int64_partial#103, $group_by.$uid#104] + | | +-input_scan= + | | | +-AggregateScan + | | | +-column_list=[$aggregate.$agg1_partial#102, $groupby.int64_partial#103, $group_by.$uid#104] + | | | +-input_scan= + | | | | +-JoinScan + | | | | +-column_list=[SimpleTypesWithAnonymizationUid.int64#68, $distinct.int64#97, SimpleTypesWithAnonymizationUid.uid#100] + | | | | +-left_scan= + | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#68, uid#100], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | | | | +-right_scan= + | | | | | +-WithRefScan(column_list=[$distinct.int64#97], with_query_name="$public_groups0") + | | | | +-join_expr= + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#68) + | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#97) + | | | | +-has_using=TRUE + | | | +-group_by_list= + | | | | +-int64_partial#103 := 
ColumnRef(type=INT64, column=$distinct.int64#97) + | | | | +-$uid#104 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#100) + | | | +-aggregate_list= + | | | +-$agg1_partial#102 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | +-method="RESERVOIR" + | | +-size= + | | | +-Literal(type=INT64, value=5) + | | +-unit=ROWS + | | +-partition_by_list= + | | +-ColumnRef(type=INT64, column=$group_by.$uid#104) + | +-right_scan= + | | +-WithRefScan(column_list=[$public_groups0.int64#106], with_query_name="$public_groups0") + | +-join_expr= + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=$groupby.int64_partial#103) + | +-ColumnRef(type=INT64, column=$public_groups0.int64#106) + +-group_by_list= + | +-int64#99 := ColumnRef(type=INT64, column=$public_groups0.int64#106) + +-aggregate_list= + | +-$agg1#98 := + | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) + | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$group_by.$uid#104) + | +-Literal(type=INT64, value=NULL) + | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#102) + +-anonymization_option_list= + +-max_groups_contributed := Literal(type=INT64, value=5) + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) +== + +[language_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,V_1_1_WITH_ON_SUBQUERY] +# Nested WithScan for public groups input. 
+WITH + res1 AS ( + WITH + public1 AS ( + SELECT string FROM SimpleTypes + ) + SELECT WITH ANONYMIZATION OPTIONS ( + group_selection_strategy=PUBLIC_GROUPS, + max_groups_contributed=2) + string, ANON_COUNT(*) + FROM SimpleTypesWithAnonymizationUid + RIGHT OUTER JOIN ( + SELECT DISTINCT string FROM public1 + ) USING (string) + GROUP BY string + ) +SELECT * +FROM res1; +-- +QueryStmt ++-output_column_list= +| +-res1.string#35 AS string [STRING] +| +-res1.$col2#36 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=res1.[string#35, $col2#36] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="res1" + | +-with_subquery= + | +-WithScan + | +-column_list=[$groupby.string#34, $aggregate.$agg1#33] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="public1" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[SimpleTypes.string#5] + | | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.string#5], table=SimpleTypes, column_index_list=[4]) + | +-query= + | +-ProjectScan + | +-column_list=[$groupby.string#34, $aggregate.$agg1#33] + | +-input_scan= + | +-AnonymizedAggregateScan + | +-column_list=[$groupby.string#34, $aggregate.$agg1#33] + | +-input_scan= + | | +-JoinScan + | | +-column_list=[SimpleTypesWithAnonymizationUid.string#23, $distinct.string#32] + | | +-join_type=RIGHT + | | +-left_scan= + | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.string#23], table=SimpleTypesWithAnonymizationUid, column_index_list=[4]) + | | +-right_scan= + | | | +-AggregateScan + | | | +-column_list=[$distinct.string#32] + | | | +-input_scan= + | | | | +-WithRefScan(column_list=[public1.string#31], with_query_name="public1") + | | | +-group_by_list= + | | | +-string#32 := ColumnRef(type=STRING, column=public1.string#31) + | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#23) + | | | +-ColumnRef(type=STRING, 
column=$distinct.string#32) + | | +-has_using=TRUE + | +-group_by_list= + | | +-string#34 := ColumnRef(type=STRING, column=$distinct.string#32) + | +-aggregate_list= + | | +-$agg1#33 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64) + | +-anonymization_option_list= + | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | +-max_groups_contributed := Literal(type=INT64, value=2) + +-query= + +-ProjectScan + +-column_list=res1.[string#35, $col2#36] + +-input_scan= + +-WithRefScan(column_list=res1.[string#35, $col2#36], with_query_name="res1") + + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-res1.string#35 AS string [STRING] +| +-res1.$col2#36 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=res1.[string#35, $col2#36] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="res1" + | +-with_subquery= + | +-WithScan + | +-column_list=[$groupby.string#34, $aggregate.$agg1#33] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="public1" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[SimpleTypes.string#5] + | | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.string#5], table=SimpleTypes, column_index_list=[4]) + | +-query= + | +-ProjectScan + | +-column_list=[$groupby.string#34, $aggregate.$agg1#33] + | +-input_scan= + | +-WithScan + | +-column_list=[$groupby.string#34, $aggregate.$agg1#33] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="$public_groups0" + | | +-with_subquery= + | | +-AggregateScan + | | +-column_list=[$distinct.string#44] + | | +-input_scan= + | | | +-WithRefScan(column_list=[public1.string#45], with_query_name="public1") + | | +-group_by_list= + | | +-string#44 := ColumnRef(type=STRING, column=public1.string#45) + | +-query= + | +-AnonymizedAggregateScan + | +-column_list=[$groupby.string#34, $aggregate.$agg1#33] + | +-input_scan= + | | +-JoinScan + | | +-column_list=[$public_groups0.string#43, 
$aggregate.$agg1_partial#39, $groupby.string_partial#40, $group_by.$uid#41] + | | +-join_type=RIGHT + | | +-left_scan= + | | | +-SampleScan + | | | +-column_list=[$aggregate.$agg1_partial#39, $groupby.string_partial#40, $group_by.$uid#41] + | | | +-input_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$aggregate.$agg1_partial#39, $groupby.string_partial#40, $group_by.$uid#41] + | | | | +-input_scan= + | | | | | +-JoinScan + | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#23, $distinct.string#32, SimpleTypesWithAnonymizationUid.uid#37] + | | | | | +-left_scan= + | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#23, uid#37], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) + | | | | | +-right_scan= + | | | | | | +-WithRefScan(column_list=[$distinct.string#32], with_query_name="$public_groups0") + | | | | | +-join_expr= + | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#23) + | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#32) + | | | | | +-has_using=TRUE + | | | | +-group_by_list= + | | | | | +-string_partial#40 := ColumnRef(type=STRING, column=$distinct.string#32) + | | | | | +-$uid#41 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#37) + | | | | +-aggregate_list= + | | | | +-$agg1_partial#39 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | +-method="RESERVOIR" + | | | +-size= + | | | | +-Literal(type=INT64, value=2) + | | | +-unit=ROWS + | | | +-partition_by_list= + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#41) + | | +-right_scan= + | | | +-WithRefScan(column_list=[$public_groups0.string#43], with_query_name="$public_groups0") + | | +-join_expr= + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$groupby.string_partial#40) + | | +-ColumnRef(type=STRING, column=$public_groups0.string#43) + | 
+-group_by_list= + | | +-string#34 := ColumnRef(type=STRING, column=$public_groups0.string#43) + | +-aggregate_list= + | | +-$agg1#33 := + | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#41) + | | +-Literal(type=INT64, value=NULL) + | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#39) + | +-anonymization_option_list= + | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | +-max_groups_contributed := Literal(type=INT64, value=2) + +-query= + +-ProjectScan + +-column_list=res1.[string#35, $col2#36] + +-input_scan= + +-WithRefScan(column_list=res1.[string#35, $col2#36], with_query_name="res1") +== + +# UNION ALL of multiple public group queries. +SELECT WITH ANONYMIZATION OPTIONS ( + max_groups_contributed=2, + group_selection_strategy=PUBLIC_GROUPS) + string, ANON_COUNT(*) AS anon_count +FROM SimpleTypesWithAnonymizationUid + RIGHT OUTER JOIN (SELECT DISTINCT string FROM SimpleTypes) USING (string) +GROUP BY string +UNION ALL +SELECT WITH ANONYMIZATION OPTIONS ( + max_groups_contributed=3, + group_selection_strategy=PUBLIC_GROUPS) + string, ANON_COUNT(*) AS anon_count +FROM SimpleTypesWithAnonymizationUid + RIGHT OUTER JOIN (SELECT DISTINCT string FROM SimpleTypes) USING (string) +GROUP BY STRING; +-- +QueryStmt ++-output_column_list= +| +-$union_all.string#67 AS string [STRING] +| +-$union_all.anon_count#68 AS anon_count [INT64] ++-query= + +-SetOperationScan + +-column_list=$union_all.[string#67, anon_count#68] + +-op_type=UNION_ALL + +-input_item_list= + +-SetOperationItem + | +-scan= + | | +-ProjectScan + | | +-column_list=[$groupby.string#33, $aggregate.anon_count#32] + | | +-input_scan= + | | +-AnonymizedAggregateScan + | | +-column_list=[$groupby.string#33, $aggregate.anon_count#32] + | | +-input_scan= + | | | 
+-JoinScan + | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31] + | | | +-join_type=RIGHT + | | | +-left_scan= + | | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.string#5], table=SimpleTypesWithAnonymizationUid, column_index_list=[4]) + | | | +-right_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$distinct.string#31] + | | | | +-input_scan= + | | | | | +-TableScan(column_list=[SimpleTypes.string#17], table=SimpleTypes, column_index_list=[4]) + | | | | +-group_by_list= + | | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) + | | | +-join_expr= + | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-has_using=TRUE + | | +-group_by_list= + | | | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) + | | +-aggregate_list= + | | | +-anon_count#32 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64) + | | +-anonymization_option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=2) + | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | +-output_column_list=[$groupby.string#33, $aggregate.anon_count#32] + +-SetOperationItem + +-scan= + | +-ProjectScan + | +-column_list=[$groupby.string#66, $aggregate.anon_count#65] + | +-input_scan= + | +-AnonymizedAggregateScan + | +-column_list=[$groupby.string#66, $aggregate.anon_count#65] + | +-input_scan= + | | +-JoinScan + | | +-column_list=[SimpleTypesWithAnonymizationUid.string#38, $distinct.string#64] + | | +-join_type=RIGHT + | | +-left_scan= + | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.string#38], table=SimpleTypesWithAnonymizationUid, column_index_list=[4]) + | | +-right_scan= + | | | +-AggregateScan + | | | +-column_list=[$distinct.string#64] + | | | +-input_scan= + | 
| | | +-TableScan(column_list=[SimpleTypes.string#50], table=SimpleTypes, column_index_list=[4]) + | | | +-group_by_list= + | | | +-string#64 := ColumnRef(type=STRING, column=SimpleTypes.string#50) + | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#38) + | | | +-ColumnRef(type=STRING, column=$distinct.string#64) + | | +-has_using=TRUE + | +-group_by_list= + | | +-string#66 := ColumnRef(type=STRING, column=$distinct.string#64) + | +-aggregate_list= + | | +-anon_count#65 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64) + | +-anonymization_option_list= + | +-max_groups_contributed := Literal(type=INT64, value=3) + | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + +-output_column_list=[$groupby.string#66, $aggregate.anon_count#65] + + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$union_all.string#67 AS string [STRING] +| +-$union_all.anon_count#68 AS anon_count [INT64] ++-query= + +-SetOperationScan + +-column_list=$union_all.[string#67, anon_count#68] + +-op_type=UNION_ALL + +-input_item_list= + +-SetOperationItem + | +-scan= + | | +-ProjectScan + | | +-column_list=[$groupby.string#33, $aggregate.anon_count#32] + | | +-input_scan= + | | +-WithScan + | | +-column_list=[$groupby.string#33, $aggregate.anon_count#32] + | | +-with_entry_list= + | | | +-WithEntry + | | | +-with_query_name="$public_groups0" + | | | +-with_subquery= + | | | +-AggregateScan + | | | +-column_list=[$distinct.string#76] + | | | +-input_scan= + | | | | +-TableScan(column_list=[SimpleTypes.string#77], table=SimpleTypes, column_index_list=[4]) + | | | +-group_by_list= + | | | +-string#76 := ColumnRef(type=STRING, column=SimpleTypes.string#77) + | | +-query= + | | +-AnonymizedAggregateScan + | | +-column_list=[$groupby.string#33, $aggregate.anon_count#32] + | | +-input_scan= + | | | +-JoinScan + | | | 
+-column_list=[$public_groups0.string#75, $aggregate.anon_count_partial#71, $groupby.string_partial#72, $group_by.$uid#73] + | | | +-join_type=RIGHT + | | | +-left_scan= + | | | | +-SampleScan + | | | | +-column_list=[$aggregate.anon_count_partial#71, $groupby.string_partial#72, $group_by.$uid#73] + | | | | +-input_scan= + | | | | | +-AggregateScan + | | | | | +-column_list=[$aggregate.anon_count_partial#71, $groupby.string_partial#72, $group_by.$uid#73] + | | | | | +-input_scan= + | | | | | | +-JoinScan + | | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31, SimpleTypesWithAnonymizationUid.uid#69] + | | | | | | +-left_scan= + | | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#5, uid#69], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) + | | | | | | +-right_scan= + | | | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") + | | | | | | +-join_expr= + | | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | | +-has_using=TRUE + | | | | | +-group_by_list= + | | | | | | +-string_partial#72 := ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | | +-$uid#73 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#69) + | | | | | +-aggregate_list= + | | | | | +-anon_count_partial#71 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | | +-method="RESERVOIR" + | | | | +-size= + | | | | | +-Literal(type=INT64, value=2) + | | | | +-unit=ROWS + | | | | +-partition_by_list= + | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#73) + | | | +-right_scan= + | | | | +-WithRefScan(column_list=[$public_groups0.string#75], with_query_name="$public_groups0") + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | 
+-ColumnRef(type=STRING, column=$groupby.string_partial#72) + | | | +-ColumnRef(type=STRING, column=$public_groups0.string#75) + | | +-group_by_list= + | | | +-string#33 := ColumnRef(type=STRING, column=$public_groups0.string#75) + | | +-aggregate_list= + | | | +-anon_count#32 := + | | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#73) + | | | +-Literal(type=INT64, value=NULL) + | | | +-ColumnRef(type=INT64, column=$aggregate.anon_count_partial#71) + | | +-anonymization_option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=2) + | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | +-output_column_list=[$groupby.string#33, $aggregate.anon_count#32] + +-SetOperationItem + +-scan= + | +-ProjectScan + | +-column_list=[$groupby.string#66, $aggregate.anon_count#65] + | +-input_scan= + | +-WithScan + | +-column_list=[$groupby.string#66, $aggregate.anon_count#65] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="$public_groups1" + | | +-with_subquery= + | | +-AggregateScan + | | +-column_list=[$distinct.string#85] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.string#86], table=SimpleTypes, column_index_list=[4]) + | | +-group_by_list= + | | +-string#85 := ColumnRef(type=STRING, column=SimpleTypes.string#86) + | +-query= + | +-AnonymizedAggregateScan + | +-column_list=[$groupby.string#66, $aggregate.anon_count#65] + | +-input_scan= + | | +-JoinScan + | | +-column_list=[$public_groups1.string#84, $aggregate.anon_count_partial#80, $groupby.string_partial#81, $group_by.$uid#82] + | | +-join_type=RIGHT + | | +-left_scan= + | | | +-SampleScan + | | | +-column_list=[$aggregate.anon_count_partial#80, $groupby.string_partial#81, $group_by.$uid#82] + | | | +-input_scan= + | | | | 
+-AggregateScan + | | | | +-column_list=[$aggregate.anon_count_partial#80, $groupby.string_partial#81, $group_by.$uid#82] + | | | | +-input_scan= + | | | | | +-JoinScan + | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#38, $distinct.string#64, SimpleTypesWithAnonymizationUid.uid#78] + | | | | | +-left_scan= + | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#38, uid#78], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) + | | | | | +-right_scan= + | | | | | | +-WithRefScan(column_list=[$distinct.string#64], with_query_name="$public_groups1") + | | | | | +-join_expr= + | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#38) + | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#64) + | | | | | +-has_using=TRUE + | | | | +-group_by_list= + | | | | | +-string_partial#81 := ColumnRef(type=STRING, column=$distinct.string#64) + | | | | | +-$uid#82 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#78) + | | | | +-aggregate_list= + | | | | +-anon_count_partial#80 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | +-method="RESERVOIR" + | | | +-size= + | | | | +-Literal(type=INT64, value=3) + | | | +-unit=ROWS + | | | +-partition_by_list= + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#82) + | | +-right_scan= + | | | +-WithRefScan(column_list=[$public_groups1.string#84], with_query_name="$public_groups1") + | | +-join_expr= + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$groupby.string_partial#81) + | | +-ColumnRef(type=STRING, column=$public_groups1.string#84) + | +-group_by_list= + | | +-string#66 := ColumnRef(type=STRING, column=$public_groups1.string#84) + | +-aggregate_list= + | | +-anon_count#65 := + | | +-AggregateFunctionCall(ZetaSQL:anon_sum(INT64, optional(0) INT64, optional(0) INT64) -> INT64) + | | 
+-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#82) + | | +-Literal(type=INT64, value=NULL) + | | +-ColumnRef(type=INT64, column=$aggregate.anon_count_partial#80) + | +-anonymization_option_list= + | +-max_groups_contributed := Literal(type=INT64, value=3) + | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + +-output_column_list=[$groupby.string#66, $aggregate.anon_count#65] diff --git a/zetasql/analyzer/testdata/anonymization_join.test b/zetasql/analyzer/testdata/anonymization_join.test index f28df3a73..c32729d75 100644 --- a/zetasql/analyzer/testdata/anonymization_join.test +++ b/zetasql/analyzer/testdata/anonymization_join.test @@ -353,9 +353,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.uid#23], table=SimpleTypesWithAnonymizationUid, column_index_list=[10], alias="b") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | +-has_using=TRUE +-aggregate_list= +-$agg1#25 := +-AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(1) INT64, optional(1) INT64) -> INT64) @@ -383,9 +384,10 @@ QueryStmt | | +-right_scan= | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.uid#23], table=SimpleTypesWithAnonymizationUid, column_index_list=[10], alias="b") | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | | | 
+-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | | +-has_using=TRUE | +-group_by_list= | | +-$uid#28 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) | +-aggregate_list= @@ -2453,9 +2455,10 @@ QueryStmt | | +-input_scan= | | +-TableScan(parse_location=63-91, column_list=[KitchenSinkWithUidValueTable.value#2], table=KitchenSinkWithUidValueTable, column_index_list=[0], alias="t2") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=$join_left.string_val#3) - | +-ColumnRef(type=STRING, column=$join_right.string_val#4) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$join_left.string_val#3) + | | +-ColumnRef(type=STRING, column=$join_right.string_val#4) + | +-has_using=TRUE +-aggregate_list= +-$agg1#5 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64)(parse_location=7-17) [REPLACED_LITERALS] @@ -2519,9 +2522,10 @@ QueryStmt | | | +-input_scan= | | | +-TableScan(parse_location=63-91, column_list=[KitchenSinkWithUidValueTable.value#2], table=KitchenSinkWithUidValueTable, column_index_list=[0], alias="t2") | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=$join_left.string_val#3) - | | +-ColumnRef(type=STRING, column=$join_right.string_val#4) + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$join_left.string_val#3) + | | | +-ColumnRef(type=STRING, column=$join_right.string_val#4) + | | +-has_using=TRUE | +-group_by_list= | | +-$uid#10 := ColumnRef(type=STRING, column=$join_left.string_val#3) | +-aggregate_list= @@ -2594,9 +2598,10 @@ QueryStmt | | +-input_scan= | | +-TableScan(parse_location=63-91, 
column_list=[KitchenSinkWithUidValueTable.value#2], table=KitchenSinkWithUidValueTable, column_index_list=[0], alias="t2") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$join_left.int64_val#3) - | +-ColumnRef(type=INT64, column=$join_right.int64_val#4) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$join_left.int64_val#3) + | | +-ColumnRef(type=INT64, column=$join_right.int64_val#4) + | +-has_using=TRUE +-aggregate_list= +-$agg1#5 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64)(parse_location=7-17) [REPLACED_LITERALS] @@ -2680,9 +2685,10 @@ QueryStmt | | +-input_scan= | | +-TableScan(parse_location=156-184, column_list=[KitchenSinkWithUidValueTable.value#4], table=KitchenSinkWithUidValueTable, column_index_list=[0]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=t1.x#2) - | +-ColumnRef(type=STRING, column=t2.x#5) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=t1.x#2) + | | +-ColumnRef(type=STRING, column=t2.x#5) + | +-has_using=TRUE +-aggregate_list= +-$agg1#7 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64)(parse_location=7-17) [REPLACED_LITERALS] @@ -2764,9 +2770,10 @@ QueryStmt | | | +-input_scan= | | | +-TableScan(parse_location=156-184, column_list=[KitchenSinkWithUidValueTable.value#4], table=KitchenSinkWithUidValueTable, column_index_list=[0]) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=t1.x#2) - | | +-ColumnRef(type=STRING, column=t2.x#5) + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=t1.x#2) + | | | +-ColumnRef(type=STRING, column=t2.x#5) + | | +-has_using=TRUE | +-group_by_list= | | +-$uid#12 := 
ColumnRef(type=STRING, column=t1.x#2) | +-aggregate_list= @@ -2855,9 +2862,10 @@ QueryStmt | | +-input_scan= | | +-TableScan(parse_location=156-184, column_list=[KitchenSinkWithUidValueTable.value#4], table=KitchenSinkWithUidValueTable, column_index_list=[0]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=t1.y#3) - | +-ColumnRef(type=STRING, column=t2.y#6) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=t1.y#3) + | | +-ColumnRef(type=STRING, column=t2.y#6) + | +-has_using=TRUE +-aggregate_list= +-$agg1#7 := AggregateFunctionCall(ZetaSQL:$anon_count_star(optional(0) INT64, optional(0) INT64) -> INT64)(parse_location=7-17) [REPLACED_LITERALS] diff --git a/zetasql/analyzer/testdata/array_join.test b/zetasql/analyzer/testdata/array_join.test index 585336ed9..172768d0f 100644 --- a/zetasql/analyzer/testdata/array_join.test +++ b/zetasql/analyzer/testdata/array_join.test @@ -1135,9 +1135,10 @@ QueryStmt | | +-array_offset_column= | | +-ColumnHolder(column=$array_offset.offset#6) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$array_offset.offset#4) - | +-ColumnRef(type=INT64, column=$array_offset.offset#6) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$array_offset.offset#4) + | | +-ColumnRef(type=INT64, column=$array_offset.offset#6) + | +-has_using=TRUE +-input_scan= +-ProjectScan +-column_list=sub.[ca#1, cb#2] @@ -1214,9 +1215,10 @@ QueryStmt | | +-array_offset_column= | | +-ColumnHolder(column=$array_offset.offset#6) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$array_offset.offset#4) - | +-ColumnRef(type=INT64, column=$array_offset.offset#6) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$array_offset.offset#4) + | | 
+-ColumnRef(type=INT64, column=$array_offset.offset#6) + | +-has_using=TRUE +-input_scan= +-ProjectScan +-column_list=sub.[ca#1, cb#2] @@ -1830,5 +1832,3 @@ ALTERNATION GROUP: FULL ERROR: Unrecognized name: arr2 [at 19:26] FULL OUTER JOIN UNNEST(arr2) AS r ^ -== - diff --git a/zetasql/analyzer/testdata/array_join_with_position.test b/zetasql/analyzer/testdata/array_join_with_position.test index f6ba88399..f1807bcdc 100644 --- a/zetasql/analyzer/testdata/array_join_with_position.test +++ b/zetasql/analyzer/testdata/array_join_with_position.test @@ -222,9 +222,10 @@ QueryStmt | +-array_offset_column= | +-ColumnHolder(column=$array_offset.pos#4) +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$array_offset.pos#2) - +-ColumnRef(type=INT64, column=$array_offset.pos#4) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=$array_offset.pos#2) + | +-ColumnRef(type=INT64, column=$array_offset.pos#4) + +-has_using=TRUE == # Joining two unnest expressions with USING where the visible columns is from @@ -288,11 +289,12 @@ QueryStmt | | +-Literal(type=ARRAY, value=[1, 2], has_explicit_type=TRUE) | +-element_column_list=[$array.t1#2] +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) - +-Cast(INT32 -> INT64) - | +-ColumnRef(type=INT32, column=$array.t1#1) - +-Cast(UINT32 -> UINT64) - +-ColumnRef(type=UINT32, column=$array.t1#2) + | +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) + | +-Cast(INT32 -> INT64) + | | +-ColumnRef(type=INT32, column=$array.t1#1) + | +-Cast(UINT32 -> UINT64) + | +-ColumnRef(type=UINT32, column=$array.t1#2) + +-has_using=TRUE == # Joining two unnest expressions with USING where the visible column is from lhs @@ -562,13 +564,14 @@ QueryStmt | +-array_offset_column= | +-ColumnHolder(column=$array_offset.pos#4) +-join_expr= - +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | 
+-ColumnRef(type=INT64, column=$array.val#1) - | +-ColumnRef(type=INT64, column=$array.val#3) - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$array_offset.pos#2) - +-ColumnRef(type=INT64, column=$array_offset.pos#4) + | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$array.val#1) + | | +-ColumnRef(type=INT64, column=$array.val#3) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=$array_offset.pos#2) + | +-ColumnRef(type=INT64, column=$array_offset.pos#4) + +-has_using=TRUE == select * diff --git a/zetasql/analyzer/testdata/array_path.test b/zetasql/analyzer/testdata/array_path.test index 087f156e0..9b471a2e4 100644 --- a/zetasql/analyzer/testdata/array_path.test +++ b/zetasql/analyzer/testdata/array_path.test @@ -6,13 +6,13 @@ from ArrayTypes t, unnest(t.ProtoArray.int32_val1) value -- QueryStmt +-output_column_list= -| +-$array.value#18 AS value [INT32] +| +-$array.value#21 AS value [INT32] +-query= +-ProjectScan - +-column_list=[$array.value#18] + +-column_list=[$array.value#21] +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.value#18] + +-column_list=[ArrayTypes.ProtoArray#15, $array.value#21] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= @@ -27,39 +27,39 @@ QueryStmt | | +-FlattenedArg(type=PROTO) | +-field_descriptor=int32_val1 | +-default_value=0 - +-element_column_list=[$array.value#18] + +-element_column_list=[$array.value#21] [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$array.value#18 AS value [INT32] +| +-$array.value#21 AS value [INT32] +-query= +-ProjectScan - +-column_list=[$array.value#18] + +-column_list=[$array.value#21] +-input_scan= +-ProjectScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.value#18] + 
+-column_list=[ArrayTypes.ProtoArray#15, $array.value#21] +-expr_list= - | +-value#18 := ColumnRef(type=INT32, column=$flatten.injected#20) + | +-value#21 := ColumnRef(type=INT32, column=$flatten.injected#23) +-input_scan= +-ProjectScan - +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19, $flatten.injected#20] + +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22, $flatten.injected#23] +-expr_list= - | +-injected#20 := + | +-injected#23 := | +-GetProtoField | +-type=INT32 | +-expr= - | | +-ColumnRef(type=PROTO, column=$flatten.injected#19) + | | +-ColumnRef(type=PROTO, column=$flatten.injected#22) | +-field_descriptor=int32_val1 | +-default_value=0 +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19] + +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) - +-element_column_list=[$flatten.injected#19] + +-element_column_list=[$flatten.injected#22] == select value @@ -67,13 +67,13 @@ from ArrayTypes t, unnest(t.ProtoArray.has_int32_val1) value -- QueryStmt +-output_column_list= -| +-$array.value#18 AS value [BOOL] +| +-$array.value#21 AS value [BOOL] +-query= +-ProjectScan - +-column_list=[$array.value#18] + +-column_list=[$array.value#21] +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.value#18] + +-column_list=[ArrayTypes.ProtoArray#15, $array.value#21] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= @@ -88,39 +88,39 @@ QueryStmt | | +-FlattenedArg(type=PROTO) | +-field_descriptor=int32_val1 | +-get_has_bit=TRUE - +-element_column_list=[$array.value#18] + +-element_column_list=[$array.value#21] [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$array.value#18 AS value [BOOL] +| 
+-$array.value#21 AS value [BOOL] +-query= +-ProjectScan - +-column_list=[$array.value#18] + +-column_list=[$array.value#21] +-input_scan= +-ProjectScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.value#18] + +-column_list=[ArrayTypes.ProtoArray#15, $array.value#21] +-expr_list= - | +-value#18 := ColumnRef(type=BOOL, column=$flatten.injected#20) + | +-value#21 := ColumnRef(type=BOOL, column=$flatten.injected#23) +-input_scan= +-ProjectScan - +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19, $flatten.injected#20] + +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22, $flatten.injected#23] +-expr_list= - | +-injected#20 := + | +-injected#23 := | +-GetProtoField | +-type=BOOL | +-expr= - | | +-ColumnRef(type=PROTO, column=$flatten.injected#19) + | | +-ColumnRef(type=PROTO, column=$flatten.injected#22) | +-field_descriptor=int32_val1 | +-get_has_bit=TRUE +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19] + +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) - +-element_column_list=[$flatten.injected#19] + +-element_column_list=[$flatten.injected#22] == select value @@ -128,13 +128,13 @@ from ArrayTypes t, t.ProtoArray.int32_val1 value -- QueryStmt +-output_column_list= -| +-$array.value#18 AS value [INT32] +| +-$array.value#21 AS value [INT32] +-query= +-ProjectScan - +-column_list=[$array.value#18] + +-column_list=[$array.value#21] +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.value#18] + +-column_list=[ArrayTypes.ProtoArray#15, $array.value#21] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= @@ -149,39 +149,39 @@ QueryStmt | | +-FlattenedArg(type=PROTO) | 
+-field_descriptor=int32_val1 | +-default_value=0 - +-element_column_list=[$array.value#18] + +-element_column_list=[$array.value#21] [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$array.value#18 AS value [INT32] +| +-$array.value#21 AS value [INT32] +-query= +-ProjectScan - +-column_list=[$array.value#18] + +-column_list=[$array.value#21] +-input_scan= +-ProjectScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.value#18] + +-column_list=[ArrayTypes.ProtoArray#15, $array.value#21] +-expr_list= - | +-value#18 := ColumnRef(type=INT32, column=$flatten.injected#20) + | +-value#21 := ColumnRef(type=INT32, column=$flatten.injected#23) +-input_scan= +-ProjectScan - +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19, $flatten.injected#20] + +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22, $flatten.injected#23] +-expr_list= - | +-injected#20 := + | +-injected#23 := | +-GetProtoField | +-type=INT32 | +-expr= - | | +-ColumnRef(type=PROTO, column=$flatten.injected#19) + | | +-ColumnRef(type=PROTO, column=$flatten.injected#22) | +-field_descriptor=int32_val1 | +-default_value=0 +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19] + +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) - +-element_column_list=[$flatten.injected#19] + +-element_column_list=[$flatten.injected#22] == select str_value @@ -189,13 +189,13 @@ from ArrayTypes t, unnest(t.ProtoArray.str_value) str_value -- QueryStmt +-output_column_list= -| +-$array.str_value#18 AS str_value [STRING] +| +-$array.str_value#21 AS str_value [STRING] +-query= +-ProjectScan - +-column_list=[$array.str_value#18] + +-column_list=[$array.str_value#21] +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.str_value#18] + 
+-column_list=[ArrayTypes.ProtoArray#15, $array.str_value#21] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= @@ -210,39 +210,39 @@ QueryStmt | | +-FlattenedArg(type=PROTO) | +-field_descriptor=str_value | +-default_value=[] - +-element_column_list=[$array.str_value#18] + +-element_column_list=[$array.str_value#21] [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$array.str_value#18 AS str_value [STRING] +| +-$array.str_value#21 AS str_value [STRING] +-query= +-ProjectScan - +-column_list=[$array.str_value#18] + +-column_list=[$array.str_value#21] +-input_scan= +-ProjectScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.str_value#18] + +-column_list=[ArrayTypes.ProtoArray#15, $array.str_value#21] +-expr_list= - | +-str_value#18 := ColumnRef(type=STRING, column=$flatten.injected#20) + | +-str_value#21 := ColumnRef(type=STRING, column=$flatten.injected#23) +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19, $flatten.injected#20] + +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22, $flatten.injected#23] +-input_scan= | +-ArrayScan - | +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19] + | +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22] | +-input_scan= | | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") | +-array_expr_list= | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) - | +-element_column_list=[$flatten.injected#19] + | +-element_column_list=[$flatten.injected#22] +-array_expr_list= | +-GetProtoField | +-type=ARRAY | +-expr= - | | +-ColumnRef(type=PROTO, column=$flatten.injected#19) + | | +-ColumnRef(type=PROTO, column=$flatten.injected#22) | +-field_descriptor=str_value | +-default_value=[] - +-element_column_list=[$flatten.injected#20] + +-element_column_list=[$flatten.injected#23] == select t.key, value @@ -1075,13 
+1075,13 @@ from ArrayTypes t, unnest(t.ProtoArray.(zetasql_test__.TestExtraPBExtensionHolde -- QueryStmt +-output_column_list= -| +-$array.value#18 AS value [INT32] +| +-$array.value#21 AS value [INT32] +-query= +-ProjectScan - +-column_list=[$array.value#18] + +-column_list=[$array.value#21] +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.value#18] + +-column_list=[ArrayTypes.ProtoArray#15, $array.value#21] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= @@ -1102,31 +1102,31 @@ QueryStmt | | +-FlattenedArg(type=PROTO) | +-field_descriptor=ext_value | +-default_value=[] - +-element_column_list=[$array.value#18] + +-element_column_list=[$array.value#21] [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$array.value#18 AS value [INT32] +| +-$array.value#21 AS value [INT32] +-query= +-ProjectScan - +-column_list=[$array.value#18] + +-column_list=[$array.value#21] +-input_scan= +-ProjectScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.value#18] + +-column_list=[ArrayTypes.ProtoArray#15, $array.value#21] +-expr_list= - | +-value#18 := ColumnRef(type=INT32, column=$flatten.injected#20) + | +-value#21 := ColumnRef(type=INT32, column=$flatten.injected#23) +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19, $flatten.injected#20] + +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22, $flatten.injected#23] +-input_scan= | +-ArrayScan - | +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19] + | +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22] | +-input_scan= | | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") | +-array_expr_list= | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) - | +-element_column_list=[$flatten.injected#19] + | +-element_column_list=[$flatten.injected#22] +-array_expr_list= | 
+-GetProtoField | +-type=ARRAY @@ -1134,12 +1134,12 @@ QueryStmt | | +-GetProtoField | | +-type=PROTO | | +-expr= - | | | +-ColumnRef(type=PROTO, column=$flatten.injected#19) + | | | +-ColumnRef(type=PROTO, column=$flatten.injected#22) | | +-field_descriptor=[zetasql_test__.TestExtraPBExtensionHolder.test_extra_proto_extension] | | +-default_value=NULL | +-field_descriptor=ext_value | +-default_value=[] - +-element_column_list=[$flatten.injected#20] + +-element_column_list=[$flatten.injected#23] == select value @@ -1341,13 +1341,13 @@ from ArrayTypes t, unnest(array_concat(t.ProtoArray, []).str_value) value -- QueryStmt +-output_column_list= -| +-$array.value#18 AS value [STRING] +| +-$array.value#21 AS value [STRING] +-query= +-ProjectScan - +-column_list=[$array.value#18] + +-column_list=[$array.value#21] +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.value#18] + +-column_list=[ArrayTypes.ProtoArray#15, $array.value#21] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= @@ -1364,41 +1364,41 @@ QueryStmt | | +-FlattenedArg(type=PROTO) | +-field_descriptor=str_value | +-default_value=[] - +-element_column_list=[$array.value#18] + +-element_column_list=[$array.value#21] [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$array.value#18 AS value [STRING] +| +-$array.value#21 AS value [STRING] +-query= +-ProjectScan - +-column_list=[$array.value#18] + +-column_list=[$array.value#21] +-input_scan= +-ProjectScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.value#18] + +-column_list=[ArrayTypes.ProtoArray#15, $array.value#21] +-expr_list= - | +-value#18 := ColumnRef(type=STRING, column=$flatten.injected#20) + | +-value#21 := ColumnRef(type=STRING, column=$flatten.injected#23) +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19, $flatten.injected#20] + +-column_list=[ArrayTypes.ProtoArray#15, 
$flatten.injected#22, $flatten.injected#23] +-input_scan= | +-ArrayScan - | +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19] + | +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22] | +-input_scan= | | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") | +-array_expr_list= | | +-FunctionCall(ZetaSQL:array_concat(ARRAY>, repeated(1) ARRAY>) -> ARRAY>) | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) | | +-Literal(type=ARRAY>, value=[]) - | +-element_column_list=[$flatten.injected#19] + | +-element_column_list=[$flatten.injected#22] +-array_expr_list= | +-GetProtoField | +-type=ARRAY | +-expr= - | | +-ColumnRef(type=PROTO, column=$flatten.injected#19) + | | +-ColumnRef(type=PROTO, column=$flatten.injected#22) | +-field_descriptor=str_value | +-default_value=[] - +-element_column_list=[$flatten.injected#20] + +-element_column_list=[$flatten.injected#23] == select value @@ -1415,13 +1415,13 @@ from ArrayTypes t, unnest(array_concat(flatten(t.ProtoArray.str_value), ["foo"]) -- QueryStmt +-output_column_list= -| +-$array.value#18 AS value [STRING] +| +-$array.value#21 AS value [STRING] +-query= +-ProjectScan - +-column_list=[$array.value#18] + +-column_list=[$array.value#21] +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.value#18] + +-column_list=[ArrayTypes.ProtoArray#15, $array.value#21] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= @@ -1438,19 +1438,19 @@ QueryStmt | | +-field_descriptor=str_value | | +-default_value=[] | +-Literal(type=ARRAY, value=["foo"]) - +-element_column_list=[$array.value#18] + +-element_column_list=[$array.value#21] [[ REWRITER ARTIFACTS FOR RULE GROUPS 'DEFAULTS,-WITH_EXPR' ]] [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$array.value#18 AS value [STRING] +| +-$array.value#21 AS value [STRING] +-query= +-ProjectScan - 
+-column_list=[$array.value#18] + +-column_list=[$array.value#21] +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.value#18] + +-column_list=[ArrayTypes.ProtoArray#15, $array.value#21] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= @@ -1458,63 +1458,63 @@ QueryStmt | +-WithExpr | | +-type=ARRAY | | +-assignment_list= - | | | +-injected#19 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) + | | | +-injected#22 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) | | +-expr= | | +-FunctionCall(ZetaSQL:if(BOOL, ARRAY, ARRAY) -> ARRAY) | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19) + | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22) | | +-Literal(type=ARRAY, value=NULL) | | +-SubqueryExpr | | +-type=ARRAY | | +-subquery_type=ARRAY | | +-parameter_list= - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19) + | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22) | | +-subquery= | | +-OrderByScan - | | +-column_list=[$flatten.injected#22] + | | +-column_list=[$flatten.injected#25] | | +-is_ordered=TRUE | | +-input_scan= | | | +-ArrayScan - | | | +-column_list=[$flatten.injected#20, $offset.injected#21, $flatten.injected#22, $offset.injected#23] + | | | +-column_list=[$flatten.injected#23, $offset.injected#24, $flatten.injected#25, $offset.injected#26] | | | +-input_scan= | | | | +-ArrayScan - | | | | +-column_list=[$flatten.injected#20, $offset.injected#21] + | | | | +-column_list=[$flatten.injected#23, $offset.injected#24] | | | | +-array_expr_list= - | | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19, is_correlated=TRUE) - | | | | +-element_column_list=[$flatten.injected#20] + | | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22, is_correlated=TRUE) + | | | | 
+-element_column_list=[$flatten.injected#23] | | | | +-array_offset_column= - | | | | +-ColumnHolder(column=$offset.injected#21) + | | | | +-ColumnHolder(column=$offset.injected#24) | | | +-array_expr_list= | | | | +-GetProtoField | | | | +-type=ARRAY | | | | +-expr= - | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#20) + | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#23) | | | | +-field_descriptor=str_value | | | | +-default_value=[] - | | | +-element_column_list=[$flatten.injected#22] + | | | +-element_column_list=[$flatten.injected#25] | | | +-array_offset_column= - | | | +-ColumnHolder(column=$offset.injected#23) + | | | +-ColumnHolder(column=$offset.injected#26) | | +-order_by_item_list= | | +-OrderByItem | | | +-column_ref= - | | | +-ColumnRef(type=INT64, column=$offset.injected#21) + | | | +-ColumnRef(type=INT64, column=$offset.injected#24) | | +-OrderByItem | | +-column_ref= - | | +-ColumnRef(type=INT64, column=$offset.injected#23) + | | +-ColumnRef(type=INT64, column=$offset.injected#26) | +-Literal(type=ARRAY, value=["foo"]) - +-element_column_list=[$array.value#18] + +-element_column_list=[$array.value#21] [[ REWRITER ARTIFACTS FOR RULE GROUPS 'DEFAULTS' ]] [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$array.value#18 AS value [STRING] +| +-$array.value#21 AS value [STRING] +-query= +-ProjectScan - +-column_list=[$array.value#18] + +-column_list=[$array.value#21] +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.value#18] + +-column_list=[ArrayTypes.ProtoArray#15, $array.value#21] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= @@ -1526,59 +1526,59 @@ QueryStmt | | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) | | +-subquery= | | +-ProjectScan - | | +-column_list=[$with_expr.injected#24] + | | +-column_list=[$with_expr.injected#27] | | +-expr_list= - | | | +-injected#24 := + | | | +-injected#27 
:= | | | +-FunctionCall(ZetaSQL:if(BOOL, ARRAY, ARRAY) -> ARRAY) | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) - | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19) + | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22) | | | +-Literal(type=ARRAY, value=NULL) | | | +-SubqueryExpr | | | +-type=ARRAY | | | +-subquery_type=ARRAY | | | +-parameter_list= - | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19) + | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22) | | | +-subquery= | | | +-OrderByScan - | | | +-column_list=[$flatten.injected#22] + | | | +-column_list=[$flatten.injected#25] | | | +-is_ordered=TRUE | | | +-input_scan= | | | | +-ArrayScan - | | | | +-column_list=[$flatten.injected#20, $offset.injected#21, $flatten.injected#22, $offset.injected#23] + | | | | +-column_list=[$flatten.injected#23, $offset.injected#24, $flatten.injected#25, $offset.injected#26] | | | | +-input_scan= | | | | | +-ArrayScan - | | | | | +-column_list=[$flatten.injected#20, $offset.injected#21] + | | | | | +-column_list=[$flatten.injected#23, $offset.injected#24] | | | | | +-array_expr_list= - | | | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19, is_correlated=TRUE) - | | | | | +-element_column_list=[$flatten.injected#20] + | | | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22, is_correlated=TRUE) + | | | | | +-element_column_list=[$flatten.injected#23] | | | | | +-array_offset_column= - | | | | | +-ColumnHolder(column=$offset.injected#21) + | | | | | +-ColumnHolder(column=$offset.injected#24) | | | | +-array_expr_list= | | | | | +-GetProtoField | | | | | +-type=ARRAY | | | | | +-expr= - | | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#20) + | | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#23) | | | | | +-field_descriptor=str_value | | | | | +-default_value=[] - | | | | +-element_column_list=[$flatten.injected#22] + | | | | 
+-element_column_list=[$flatten.injected#25] | | | | +-array_offset_column= - | | | | +-ColumnHolder(column=$offset.injected#23) + | | | | +-ColumnHolder(column=$offset.injected#26) | | | +-order_by_item_list= | | | +-OrderByItem | | | | +-column_ref= - | | | | +-ColumnRef(type=INT64, column=$offset.injected#21) + | | | | +-ColumnRef(type=INT64, column=$offset.injected#24) | | | +-OrderByItem | | | +-column_ref= - | | | +-ColumnRef(type=INT64, column=$offset.injected#23) + | | | +-ColumnRef(type=INT64, column=$offset.injected#26) | | +-input_scan= | | +-ProjectScan - | | +-column_list=[$flatten_input.injected#19] + | | +-column_list=[$flatten_input.injected#22] | | +-expr_list= - | | | +-injected#19 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) + | | | +-injected#22 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) | | +-input_scan= | | +-SingleRowScan | +-Literal(type=ARRAY, value=["foo"]) - +-element_column_list=[$array.value#18] + +-element_column_list=[$array.value#21] == select flatten([ @@ -1729,12 +1729,12 @@ from ArrayTypes t -- QueryStmt +-output_column_list= -| +-$query.value#18 AS value [ARRAY] +| +-$query.value#21 AS value [ARRAY] +-query= +-ProjectScan - +-column_list=[$query.value#18] + +-column_list=[$query.value#21] +-expr_list= - | +-value#18 := + | +-value#21 := | +-Flatten | +-type=ARRAY | +-expr= @@ -1752,12 +1752,12 @@ QueryStmt [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$query.value#18 AS value [ARRAY] +| +-$query.value#21 AS value [ARRAY] +-query= +-ProjectScan - +-column_list=[$query.value#18] + +-column_list=[$query.value#21] +-expr_list= - | +-value#18 := + | +-value#21 := | +-SubqueryExpr | +-type=ARRAY | +-subquery_type=SCALAR @@ -1765,55 +1765,55 @@ QueryStmt | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) | +-subquery= | +-ProjectScan - | +-column_list=[$with_expr.injected#24] + | +-column_list=[$with_expr.injected#27] | +-expr_list= - | | 
+-injected#24 := + | | +-injected#27 := | | +-FunctionCall(ZetaSQL:if(BOOL, ARRAY, ARRAY) -> ARRAY) | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19) + | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22) | | +-Literal(type=ARRAY, value=NULL) | | +-SubqueryExpr | | +-type=ARRAY | | +-subquery_type=ARRAY | | +-parameter_list= - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19) + | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22) | | +-subquery= | | +-OrderByScan - | | +-column_list=[$flatten.injected#22] + | | +-column_list=[$flatten.injected#25] | | +-is_ordered=TRUE | | +-input_scan= | | | +-ArrayScan - | | | +-column_list=[$flatten.injected#20, $offset.injected#21, $flatten.injected#22, $offset.injected#23] + | | | +-column_list=[$flatten.injected#23, $offset.injected#24, $flatten.injected#25, $offset.injected#26] | | | +-input_scan= | | | | +-ArrayScan - | | | | +-column_list=[$flatten.injected#20, $offset.injected#21] + | | | | +-column_list=[$flatten.injected#23, $offset.injected#24] | | | | +-array_expr_list= - | | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19, is_correlated=TRUE) - | | | | +-element_column_list=[$flatten.injected#20] + | | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22, is_correlated=TRUE) + | | | | +-element_column_list=[$flatten.injected#23] | | | | +-array_offset_column= - | | | | +-ColumnHolder(column=$offset.injected#21) + | | | | +-ColumnHolder(column=$offset.injected#24) | | | +-array_expr_list= | | | | +-GetProtoField | | | | +-type=ARRAY | | | | +-expr= - | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#20) + | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#23) | | | | +-field_descriptor=str_value | | | | +-default_value=[] - | | | +-element_column_list=[$flatten.injected#22] + | | | +-element_column_list=[$flatten.injected#25] | | | +-array_offset_column= - | | | 
+-ColumnHolder(column=$offset.injected#23) + | | | +-ColumnHolder(column=$offset.injected#26) | | +-order_by_item_list= | | +-OrderByItem | | | +-column_ref= - | | | +-ColumnRef(type=INT64, column=$offset.injected#21) + | | | +-ColumnRef(type=INT64, column=$offset.injected#24) | | +-OrderByItem | | +-column_ref= - | | +-ColumnRef(type=INT64, column=$offset.injected#23) + | | +-ColumnRef(type=INT64, column=$offset.injected#26) | +-input_scan= | +-ProjectScan - | +-column_list=[$flatten_input.injected#19] + | +-column_list=[$flatten_input.injected#22] | +-expr_list= - | | +-injected#19 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) + | | +-injected#22 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) | +-input_scan= | +-SingleRowScan +-input_scan= @@ -1825,12 +1825,12 @@ from ArrayTypes t -- QueryStmt +-output_column_list= -| +-$query.value#18 AS value [ARRAY] +| +-$query.value#21 AS value [ARRAY] +-query= +-ProjectScan - +-column_list=[$query.value#18] + +-column_list=[$query.value#21] +-expr_list= - | +-value#18 := + | +-value#21 := | +-Flatten | +-type=ARRAY | +-expr= @@ -1848,12 +1848,12 @@ QueryStmt [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$query.value#18 AS value [ARRAY] +| +-$query.value#21 AS value [ARRAY] +-query= +-ProjectScan - +-column_list=[$query.value#18] + +-column_list=[$query.value#21] +-expr_list= - | +-value#18 := + | +-value#21 := | +-SubqueryExpr | +-type=ARRAY | +-subquery_type=SCALAR @@ -1861,55 +1861,55 @@ QueryStmt | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) | +-subquery= | +-ProjectScan - | +-column_list=[$with_expr.injected#24] + | +-column_list=[$with_expr.injected#27] | +-expr_list= - | | +-injected#24 := + | | +-injected#27 := | | +-FunctionCall(ZetaSQL:if(BOOL, ARRAY, ARRAY) -> ARRAY) | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19) + | | | +-ColumnRef(type=ARRAY>, 
column=$flatten_input.injected#22) | | +-Literal(type=ARRAY, value=NULL) | | +-SubqueryExpr | | +-type=ARRAY | | +-subquery_type=ARRAY | | +-parameter_list= - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19) + | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22) | | +-subquery= | | +-OrderByScan - | | +-column_list=[$flatten.injected#22] + | | +-column_list=[$flatten.injected#25] | | +-is_ordered=TRUE | | +-input_scan= | | | +-ArrayScan - | | | +-column_list=[$flatten.injected#20, $offset.injected#21, $flatten.injected#22, $offset.injected#23] + | | | +-column_list=[$flatten.injected#23, $offset.injected#24, $flatten.injected#25, $offset.injected#26] | | | +-input_scan= | | | | +-ArrayScan - | | | | +-column_list=[$flatten.injected#20, $offset.injected#21] + | | | | +-column_list=[$flatten.injected#23, $offset.injected#24] | | | | +-array_expr_list= - | | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19, is_correlated=TRUE) - | | | | +-element_column_list=[$flatten.injected#20] + | | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22, is_correlated=TRUE) + | | | | +-element_column_list=[$flatten.injected#23] | | | | +-array_offset_column= - | | | | +-ColumnHolder(column=$offset.injected#21) + | | | | +-ColumnHolder(column=$offset.injected#24) | | | +-array_expr_list= | | | | +-GetProtoField | | | | +-type=ARRAY | | | | +-expr= - | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#20) + | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#23) | | | | +-field_descriptor=str_value | | | | +-default_value=[] - | | | +-element_column_list=[$flatten.injected#22] + | | | +-element_column_list=[$flatten.injected#25] | | | +-array_offset_column= - | | | +-ColumnHolder(column=$offset.injected#23) + | | | +-ColumnHolder(column=$offset.injected#26) | | +-order_by_item_list= | | +-OrderByItem | | | +-column_ref= - | | | +-ColumnRef(type=INT64, column=$offset.injected#21) + | | | +-ColumnRef(type=INT64, 
column=$offset.injected#24) | | +-OrderByItem | | +-column_ref= - | | +-ColumnRef(type=INT64, column=$offset.injected#23) + | | +-ColumnRef(type=INT64, column=$offset.injected#26) | +-input_scan= | +-ProjectScan - | +-column_list=[$flatten_input.injected#19] + | +-column_list=[$flatten_input.injected#22] | +-expr_list= - | | +-injected#19 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) + | | +-injected#22 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) | +-input_scan= | +-SingleRowScan +-input_scan= @@ -2306,14 +2306,14 @@ order by o -- QueryStmt +-output_column_list= -| +-$array.value#18 AS value [INT32] +| +-$array.value#21 AS value [INT32] +-query= +-OrderByScan - +-column_list=[$array.value#18] + +-column_list=[$array.value#21] +-is_ordered=TRUE +-input_scan= | +-ArrayScan - | +-column_list=[ArrayTypes.ProtoArray#15, $array.value#18, $array_offset.o#19] + | +-column_list=[ArrayTypes.ProtoArray#15, $array.value#21, $array_offset.o#22] | +-input_scan= | | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") | +-array_expr_list= @@ -2328,25 +2328,25 @@ QueryStmt | | | +-FlattenedArg(type=PROTO) | | +-field_descriptor=int32_val1 | | +-default_value=0 - | +-element_column_list=[$array.value#18] + | +-element_column_list=[$array.value#21] | +-array_offset_column= - | +-ColumnHolder(column=$array_offset.o#19) + | +-ColumnHolder(column=$array_offset.o#22) +-order_by_item_list= +-OrderByItem +-column_ref= - +-ColumnRef(type=INT64, column=$array_offset.o#19) + +-ColumnRef(type=INT64, column=$array_offset.o#22) [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$array.value#18 AS value [INT32] +| +-$array.value#21 AS value [INT32] +-query= +-OrderByScan - +-column_list=[$array.value#18] + +-column_list=[$array.value#21] +-is_ordered=TRUE +-input_scan= | +-ArrayScan - | +-column_list=[ArrayTypes.ProtoArray#15, $array.value#18, $array_offset.o#19] + | 
+-column_list=[ArrayTypes.ProtoArray#15, $array.value#21, $array_offset.o#22] | +-input_scan= | | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") | +-array_expr_list= @@ -2357,40 +2357,40 @@ QueryStmt | | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) | | +-subquery= | | +-OrderByScan - | | +-column_list=[$flatten.injected#22] + | | +-column_list=[$flatten.injected#25] | | +-is_ordered=TRUE | | +-input_scan= | | | +-ProjectScan - | | | +-column_list=[$flatten.injected#20, $offset.injected#21, $flatten.injected#22] + | | | +-column_list=[$flatten.injected#23, $offset.injected#24, $flatten.injected#25] | | | +-expr_list= - | | | | +-injected#22 := + | | | | +-injected#25 := | | | | +-GetProtoField | | | | +-type=INT32 | | | | +-expr= - | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#20) + | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#23) | | | | +-field_descriptor=int32_val1 | | | | +-default_value=0 | | | +-input_scan= | | | +-ArrayScan - | | | +-column_list=[$flatten.injected#20, $offset.injected#21] + | | | +-column_list=[$flatten.injected#23, $offset.injected#24] | | | +-input_scan= | | | | +-SingleRowScan | | | +-array_expr_list= | | | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) - | | | +-element_column_list=[$flatten.injected#20] + | | | +-element_column_list=[$flatten.injected#23] | | | +-array_offset_column= - | | | +-ColumnHolder(column=$offset.injected#21) + | | | +-ColumnHolder(column=$offset.injected#24) | | +-order_by_item_list= | | +-OrderByItem | | +-column_ref= - | | +-ColumnRef(type=INT64, column=$offset.injected#21) - | +-element_column_list=[$array.value#18] + | | +-ColumnRef(type=INT64, column=$offset.injected#24) + | +-element_column_list=[$array.value#21] | +-array_offset_column= - | +-ColumnHolder(column=$array_offset.o#19) + | +-ColumnHolder(column=$array_offset.o#22) +-order_by_item_list= +-OrderByItem 
+-column_ref= - +-ColumnRef(type=INT64, column=$array_offset.o#19) + +-ColumnRef(type=INT64, column=$array_offset.o#22) == select value, o @@ -3830,12 +3830,12 @@ SELECT FLATTEN(JsonArray.field) FROM ArrayTypes -- QueryStmt +-output_column_list= -| +-$query.$col1#18 AS `$col1` [ARRAY] +| +-$query.$col1#21 AS `$col1` [ARRAY] +-query= +-ProjectScan - +-column_list=[$query.$col1#18] + +-column_list=[$query.$col1#21] +-expr_list= - | +-$col1#18 := + | +-$col1#21 := | +-Flatten | +-type=ARRAY | +-expr= @@ -3852,12 +3852,12 @@ QueryStmt [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$query.$col1#18 AS `$col1` [ARRAY] +| +-$query.$col1#21 AS `$col1` [ARRAY] +-query= +-ProjectScan - +-column_list=[$query.$col1#18] + +-column_list=[$query.$col1#21] +-expr_list= - | +-$col1#18 := + | +-$col1#21 := | +-SubqueryExpr | +-type=ARRAY | +-subquery_type=SCALAR @@ -3865,49 +3865,49 @@ QueryStmt | | +-ColumnRef(type=ARRAY, column=ArrayTypes.JsonArray#17) | +-subquery= | +-ProjectScan - | +-column_list=[$with_expr.injected#23] + | +-column_list=[$with_expr.injected#26] | +-expr_list= - | | +-injected#23 := + | | +-injected#26 := | | +-FunctionCall(ZetaSQL:if(BOOL, ARRAY, ARRAY) -> ARRAY) | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) - | | | +-ColumnRef(type=ARRAY, column=$flatten_input.injected#19) + | | | +-ColumnRef(type=ARRAY, column=$flatten_input.injected#22) | | +-Literal(type=ARRAY, value=NULL) | | +-SubqueryExpr | | +-type=ARRAY | | +-subquery_type=ARRAY | | +-parameter_list= - | | | +-ColumnRef(type=ARRAY, column=$flatten_input.injected#19) + | | | +-ColumnRef(type=ARRAY, column=$flatten_input.injected#22) | | +-subquery= | | +-OrderByScan - | | +-column_list=[$flatten.injected#22] + | | +-column_list=[$flatten.injected#25] | | +-is_ordered=TRUE | | +-input_scan= | | | +-ProjectScan - | | | +-column_list=[$flatten.injected#20, $offset.injected#21, $flatten.injected#22] + | | | +-column_list=[$flatten.injected#23, $offset.injected#24, $flatten.injected#25] | | | 
+-expr_list= - | | | | +-injected#22 := + | | | | +-injected#25 := | | | | +-GetJsonField | | | | +-type=JSON | | | | +-expr= - | | | | | +-ColumnRef(type=JSON, column=$flatten.injected#20) + | | | | | +-ColumnRef(type=JSON, column=$flatten.injected#23) | | | | +-field_name="field" | | | +-input_scan= | | | +-ArrayScan - | | | +-column_list=[$flatten.injected#20, $offset.injected#21] + | | | +-column_list=[$flatten.injected#23, $offset.injected#24] | | | +-array_expr_list= - | | | | +-ColumnRef(type=ARRAY, column=$flatten_input.injected#19, is_correlated=TRUE) - | | | +-element_column_list=[$flatten.injected#20] + | | | | +-ColumnRef(type=ARRAY, column=$flatten_input.injected#22, is_correlated=TRUE) + | | | +-element_column_list=[$flatten.injected#23] | | | +-array_offset_column= - | | | +-ColumnHolder(column=$offset.injected#21) + | | | +-ColumnHolder(column=$offset.injected#24) | | +-order_by_item_list= | | +-OrderByItem | | +-column_ref= - | | +-ColumnRef(type=INT64, column=$offset.injected#21) + | | +-ColumnRef(type=INT64, column=$offset.injected#24) | +-input_scan= | +-ProjectScan - | +-column_list=[$flatten_input.injected#19] + | +-column_list=[$flatten_input.injected#22] | +-expr_list= - | | +-injected#19 := ColumnRef(type=ARRAY, column=ArrayTypes.JsonArray#17, is_correlated=TRUE) + | | +-injected#22 := ColumnRef(type=ARRAY, column=ArrayTypes.JsonArray#17, is_correlated=TRUE) | +-input_scan= | +-SingleRowScan +-input_scan= @@ -4747,19 +4747,19 @@ ON key IN UNNEST(elements) -- QueryStmt +-output_column_list= -| +-$subquery1.elements#18 AS elements [ARRAY] -| +-$array.key#19 AS key [STRING] +| +-$subquery1.elements#21 AS elements [ARRAY] +| +-$array.key#22 AS key [STRING] +-query= +-ProjectScan - +-column_list=[$subquery1.elements#18, $array.key#19] + +-column_list=[$subquery1.elements#21, $array.key#22] +-input_scan= +-ArrayScan - +-column_list=[$subquery1.elements#18, $array.key#19] + +-column_list=[$subquery1.elements#21, $array.key#22] +-input_scan= | 
+-ProjectScan - | +-column_list=[$subquery1.elements#18] + | +-column_list=[$subquery1.elements#21] | +-expr_list= - | | +-elements#18 := + | | +-elements#21 := | | +-Flatten | | +-type=ARRAY | | +-expr= @@ -4784,28 +4784,28 @@ QueryStmt | +-expr= | | +-FlattenedArg(type=STRUCT) | +-field_idx=0 - +-element_column_list=[$array.key#19] + +-element_column_list=[$array.key#22] +-join_expr= +-FunctionCall(ZetaSQL:$in_array(STRING, ARRAY) -> BOOL) - +-ColumnRef(type=STRING, column=$array.key#19) - +-ColumnRef(type=ARRAY, column=$subquery1.elements#18) + +-ColumnRef(type=STRING, column=$array.key#22) + +-ColumnRef(type=ARRAY, column=$subquery1.elements#21) [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$subquery1.elements#18 AS elements [ARRAY] -| +-$array.key#19 AS key [STRING] +| +-$subquery1.elements#21 AS elements [ARRAY] +| +-$array.key#22 AS key [STRING] +-query= +-ProjectScan - +-column_list=[$subquery1.elements#18, $array.key#19] + +-column_list=[$subquery1.elements#21, $array.key#22] +-input_scan= +-ArrayScan - +-column_list=[$subquery1.elements#18, $array.key#19] + +-column_list=[$subquery1.elements#21, $array.key#22] +-input_scan= | +-ProjectScan - | +-column_list=[$subquery1.elements#18] + | +-column_list=[$subquery1.elements#21] | +-expr_list= - | | +-elements#18 := + | | +-elements#21 := | | +-SubqueryExpr | | +-type=ARRAY | | +-subquery_type=SCALAR @@ -4813,55 +4813,55 @@ QueryStmt | | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) | | +-subquery= | | +-ProjectScan - | | +-column_list=[$with_expr.injected#27] + | | +-column_list=[$with_expr.injected#30] | | +-expr_list= - | | | +-injected#27 := + | | | +-injected#30 := | | | +-FunctionCall(ZetaSQL:if(BOOL, ARRAY, ARRAY) -> ARRAY) | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) - | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#20) + | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#23) | | | +-Literal(type=ARRAY, value=NULL) | | | +-SubqueryExpr | | | 
+-type=ARRAY | | | +-subquery_type=ARRAY | | | +-parameter_list= - | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#20) + | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#23) | | | +-subquery= | | | +-OrderByScan - | | | +-column_list=[$flatten.injected#23] + | | | +-column_list=[$flatten.injected#26] | | | +-is_ordered=TRUE | | | +-input_scan= | | | | +-ArrayScan - | | | | +-column_list=[$flatten.injected#21, $offset.injected#22, $flatten.injected#23, $offset.injected#24] + | | | | +-column_list=[$flatten.injected#24, $offset.injected#25, $flatten.injected#26, $offset.injected#27] | | | | +-input_scan= | | | | | +-ArrayScan - | | | | | +-column_list=[$flatten.injected#21, $offset.injected#22] + | | | | | +-column_list=[$flatten.injected#24, $offset.injected#25] | | | | | +-array_expr_list= - | | | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#20, is_correlated=TRUE) - | | | | | +-element_column_list=[$flatten.injected#21] + | | | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#23, is_correlated=TRUE) + | | | | | +-element_column_list=[$flatten.injected#24] | | | | | +-array_offset_column= - | | | | | +-ColumnHolder(column=$offset.injected#22) + | | | | | +-ColumnHolder(column=$offset.injected#25) | | | | +-array_expr_list= | | | | | +-GetProtoField | | | | | +-type=ARRAY | | | | | +-expr= - | | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#21) + | | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#24) | | | | | +-field_descriptor=str_value | | | | | +-default_value=[] - | | | | +-element_column_list=[$flatten.injected#23] + | | | | +-element_column_list=[$flatten.injected#26] | | | | +-array_offset_column= - | | | | +-ColumnHolder(column=$offset.injected#24) + | | | | +-ColumnHolder(column=$offset.injected#27) | | | +-order_by_item_list= | | | +-OrderByItem | | | | +-column_ref= - | | | | +-ColumnRef(type=INT64, column=$offset.injected#22) + | | | | +-ColumnRef(type=INT64, 
column=$offset.injected#25) | | | +-OrderByItem | | | +-column_ref= - | | | +-ColumnRef(type=INT64, column=$offset.injected#24) + | | | +-ColumnRef(type=INT64, column=$offset.injected#27) | | +-input_scan= | | +-ProjectScan - | | +-column_list=[$flatten_input.injected#20] + | | +-column_list=[$flatten_input.injected#23] | | +-expr_list= - | | | +-injected#20 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) + | | | +-injected#23 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) | | +-input_scan= | | +-SingleRowScan | +-input_scan= @@ -4872,40 +4872,40 @@ QueryStmt | +-subquery_type=ARRAY | +-subquery= | +-ProjectScan - | +-column_list=[$flatten.injected#26] + | +-column_list=[$flatten.injected#29] | +-expr_list= - | | +-injected#26 := + | | +-injected#29 := | | +-GetStructField | | +-type=STRING | | +-expr= - | | | +-ColumnRef(type=STRUCT, column=$flatten.injected#25) + | | | +-ColumnRef(type=STRUCT, column=$flatten.injected#28) | | +-field_idx=0 | +-input_scan= | +-ArrayScan - | +-column_list=[$flatten.injected#25] + | +-column_list=[$flatten.injected#28] | +-input_scan= | | +-SingleRowScan | +-array_expr_list= | | +-Literal(type=ARRAY>, value=[{a:"hello"}]) - | +-element_column_list=[$flatten.injected#25] - +-element_column_list=[$array.key#19] + | +-element_column_list=[$flatten.injected#28] + +-element_column_list=[$array.key#22] +-join_expr= +-FunctionCall(ZetaSQL:$in_array(STRING, ARRAY) -> BOOL) - +-ColumnRef(type=STRING, column=$array.key#19) - +-ColumnRef(type=ARRAY, column=$subquery1.elements#18) + +-ColumnRef(type=STRING, column=$array.key#22) + +-ColumnRef(type=ARRAY, column=$subquery1.elements#21) == select count(*) from ArrayTypes t where 3 in unnest(t.ProtoArray.int32_val1); -- QueryStmt +-output_column_list= -| +-$aggregate.$agg1#18 AS `$col1` [INT64] +| +-$aggregate.$agg1#21 AS `$col1` [INT64] +-query= +-ProjectScan - +-column_list=[$aggregate.$agg1#18] + +-column_list=[$aggregate.$agg1#21] 
+-input_scan= +-AggregateScan - +-column_list=[$aggregate.$agg1#18] + +-column_list=[$aggregate.$agg1#21] +-input_scan= | +-FilterScan | +-column_list=[ArrayTypes.ProtoArray#15] @@ -4926,18 +4926,18 @@ QueryStmt | +-field_descriptor=int32_val1 | +-default_value=0 +-aggregate_list= - +-$agg1#18 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + +-$agg1#21 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$aggregate.$agg1#18 AS `$col1` [INT64] +| +-$aggregate.$agg1#21 AS `$col1` [INT64] +-query= +-ProjectScan - +-column_list=[$aggregate.$agg1#18] + +-column_list=[$aggregate.$agg1#21] +-input_scan= +-AggregateScan - +-column_list=[$aggregate.$agg1#18] + +-column_list=[$aggregate.$agg1#21] +-input_scan= | +-FilterScan | +-column_list=[ArrayTypes.ProtoArray#15] @@ -4953,54 +4953,54 @@ QueryStmt | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) | +-subquery= | +-ProjectScan - | +-column_list=[$with_expr.injected#23] + | +-column_list=[$with_expr.injected#26] | +-expr_list= - | | +-injected#23 := + | | +-injected#26 := | | +-FunctionCall(ZetaSQL:if(BOOL, ARRAY, ARRAY) -> ARRAY) | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19) + | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22) | | +-Literal(type=ARRAY, value=NULL) | | +-SubqueryExpr | | +-type=ARRAY | | +-subquery_type=ARRAY | | +-parameter_list= - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19) + | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22) | | +-subquery= | | +-OrderByScan - | | +-column_list=[$flatten.injected#22] + | | +-column_list=[$flatten.injected#25] | | +-is_ordered=TRUE | | +-input_scan= | | | +-ProjectScan - | | | +-column_list=[$flatten.injected#20, $offset.injected#21, $flatten.injected#22] + | | | +-column_list=[$flatten.injected#23, $offset.injected#24, $flatten.injected#25] | | | +-expr_list= - | | | | 
+-injected#22 := + | | | | +-injected#25 := | | | | +-GetProtoField | | | | +-type=INT32 | | | | +-expr= - | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#20) + | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#23) | | | | +-field_descriptor=int32_val1 | | | | +-default_value=0 | | | +-input_scan= | | | +-ArrayScan - | | | +-column_list=[$flatten.injected#20, $offset.injected#21] + | | | +-column_list=[$flatten.injected#23, $offset.injected#24] | | | +-array_expr_list= - | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19, is_correlated=TRUE) - | | | +-element_column_list=[$flatten.injected#20] + | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22, is_correlated=TRUE) + | | | +-element_column_list=[$flatten.injected#23] | | | +-array_offset_column= - | | | +-ColumnHolder(column=$offset.injected#21) + | | | +-ColumnHolder(column=$offset.injected#24) | | +-order_by_item_list= | | +-OrderByItem | | +-column_ref= - | | +-ColumnRef(type=INT64, column=$offset.injected#21) + | | +-ColumnRef(type=INT64, column=$offset.injected#24) | +-input_scan= | +-ProjectScan - | +-column_list=[$flatten_input.injected#19] + | +-column_list=[$flatten_input.injected#22] | +-expr_list= - | | +-injected#19 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) + | | +-injected#22 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) | +-input_scan= | +-SingleRowScan +-aggregate_list= - +-$agg1#18 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + +-$agg1#21 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) == select FLATTEN(t.ProtoArray.str_value[OFFSET((SELECT COUNT(*) FROM TestTable))]) @@ -5008,12 +5008,12 @@ from ArrayTypes t; -- QueryStmt +-output_column_list= -| +-$query.$col1#22 AS `$col1` [ARRAY] +| +-$query.$col1#25 AS `$col1` [ARRAY] +-query= +-ProjectScan - +-column_list=[$query.$col1#22] + +-column_list=[$query.$col1#25] +-expr_list= - | +-$col1#22 := + | +-$col1#25 
:= | +-Flatten | +-type=ARRAY | +-expr= @@ -5031,26 +5031,26 @@ QueryStmt | +-subquery_type=SCALAR | +-subquery= | +-ProjectScan - | +-column_list=[$aggregate.$agg1#21] + | +-column_list=[$aggregate.$agg1#24] | +-input_scan= | +-AggregateScan - | +-column_list=[$aggregate.$agg1#21] + | +-column_list=[$aggregate.$agg1#24] | +-input_scan= | | +-TableScan(table=TestTable) | +-aggregate_list= - | +-$agg1#21 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | +-$agg1#24 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) +-input_scan= +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$query.$col1#22 AS `$col1` [ARRAY] +| +-$query.$col1#25 AS `$col1` [ARRAY] +-query= +-ProjectScan - +-column_list=[$query.$col1#22] + +-column_list=[$query.$col1#25] +-expr_list= - | +-$col1#22 := + | +-$col1#25 := | +-SubqueryExpr | +-type=ARRAY | +-subquery_type=SCALAR @@ -5058,32 +5058,32 @@ QueryStmt | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) | +-subquery= | +-ProjectScan - | +-column_list=[$with_expr.injected#27] + | +-column_list=[$with_expr.injected#30] | +-expr_list= - | | +-injected#27 := + | | +-injected#30 := | | +-FunctionCall(ZetaSQL:if(BOOL, ARRAY, ARRAY) -> ARRAY) | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#23) + | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#26) | | +-Literal(type=ARRAY, value=NULL) | | +-SubqueryExpr | | +-type=ARRAY | | +-subquery_type=ARRAY | | +-parameter_list= - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#23) + | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#26) | | +-subquery= | | +-OrderByScan - | | +-column_list=[$flatten.injected#26] + | | +-column_list=[$flatten.injected#29] | | +-is_ordered=TRUE | | +-input_scan= | | | +-ProjectScan - | | | +-column_list=[$flatten.injected#24, 
$offset.injected#25, $flatten.injected#26] + | | | +-column_list=[$flatten.injected#27, $offset.injected#28, $flatten.injected#29] | | | +-expr_list= - | | | | +-injected#26 := + | | | | +-injected#29 := | | | | +-FunctionCall(ZetaSQL:$array_at_offset(ARRAY, INT64) -> STRING) | | | | +-GetProtoField | | | | | +-type=ARRAY | | | | | +-expr= - | | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#24) + | | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#27) | | | | | +-field_descriptor=str_value | | | | | +-default_value=[] | | | | +-SubqueryExpr @@ -5091,31 +5091,31 @@ QueryStmt | | | | +-subquery_type=SCALAR | | | | +-subquery= | | | | +-ProjectScan - | | | | +-column_list=[$aggregate.$agg1#21] + | | | | +-column_list=[$aggregate.$agg1#24] | | | | +-input_scan= | | | | +-AggregateScan - | | | | +-column_list=[$aggregate.$agg1#21] + | | | | +-column_list=[$aggregate.$agg1#24] | | | | +-input_scan= | | | | | +-TableScan(table=TestTable) | | | | +-aggregate_list= - | | | | +-$agg1#21 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | | +-$agg1#24 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) | | | +-input_scan= | | | +-ArrayScan - | | | +-column_list=[$flatten.injected#24, $offset.injected#25] + | | | +-column_list=[$flatten.injected#27, $offset.injected#28] | | | +-array_expr_list= - | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#23, is_correlated=TRUE) - | | | +-element_column_list=[$flatten.injected#24] + | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#26, is_correlated=TRUE) + | | | +-element_column_list=[$flatten.injected#27] | | | +-array_offset_column= - | | | +-ColumnHolder(column=$offset.injected#25) + | | | +-ColumnHolder(column=$offset.injected#28) | | +-order_by_item_list= | | +-OrderByItem | | +-column_ref= - | | +-ColumnRef(type=INT64, column=$offset.injected#25) + | | +-ColumnRef(type=INT64, column=$offset.injected#28) | +-input_scan= | +-ProjectScan - | 
+-column_list=[$flatten_input.injected#23] + | +-column_list=[$flatten_input.injected#26] | +-expr_list= - | | +-injected#23 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) + | | +-injected#26 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) | +-input_scan= | +-SingleRowScan +-input_scan= @@ -5128,13 +5128,13 @@ from ArrayTypes t, unnest(t.ProtoArray.str_value[OFFSET(ARRAY_LENGTH(t.ProtoArra -- QueryStmt +-output_column_list= -| +-$array.v#18 AS v [STRING] +| +-$array.v#21 AS v [STRING] +-query= +-ProjectScan - +-column_list=[$array.v#18] + +-column_list=[$array.v#21] +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.v#18] + +-column_list=[ArrayTypes.ProtoArray#15, $array.v#21] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= @@ -5152,42 +5152,42 @@ QueryStmt | | +-default_value=[] | +-FunctionCall(ZetaSQL:array_length(ARRAY>) -> INT64) | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) - +-element_column_list=[$array.v#18] + +-element_column_list=[$array.v#21] [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$array.v#18 AS v [STRING] +| +-$array.v#21 AS v [STRING] +-query= +-ProjectScan - +-column_list=[$array.v#18] + +-column_list=[$array.v#21] +-input_scan= +-ProjectScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.v#18] + +-column_list=[ArrayTypes.ProtoArray#15, $array.v#21] +-expr_list= - | +-v#18 := ColumnRef(type=STRING, column=$flatten.injected#20) + | +-v#21 := ColumnRef(type=STRING, column=$flatten.injected#23) +-input_scan= +-ProjectScan - +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19, $flatten.injected#20] + +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22, $flatten.injected#23] +-expr_list= - | +-injected#20 := + | +-injected#23 := | +-FunctionCall(ZetaSQL:$array_at_offset(ARRAY, INT64) -> STRING) | +-GetProtoField | | +-type=ARRAY 
| | +-expr= - | | | +-ColumnRef(type=PROTO, column=$flatten.injected#19) + | | | +-ColumnRef(type=PROTO, column=$flatten.injected#22) | | +-field_descriptor=str_value | | +-default_value=[] | +-FunctionCall(ZetaSQL:array_length(ARRAY>) -> INT64) | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19] + +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) - +-element_column_list=[$flatten.injected#19] + +-element_column_list=[$flatten.injected#22] == select v, o @@ -5195,14 +5195,14 @@ from ArrayTypes t, unnest(t.ProtoArray.str_value[OFFSET(ARRAY_LENGTH(t.ProtoArra -- QueryStmt +-output_column_list= -| +-$array.v#18 AS v [STRING] -| +-$array_offset.o#19 AS o [INT64] +| +-$array.v#21 AS v [STRING] +| +-$array_offset.o#22 AS o [INT64] +-query= +-ProjectScan - +-column_list=[$array.v#18, $array_offset.o#19] + +-column_list=[$array.v#21, $array_offset.o#22] +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.v#18, $array_offset.o#19] + +-column_list=[ArrayTypes.ProtoArray#15, $array.v#21, $array_offset.o#22] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= @@ -5220,21 +5220,21 @@ QueryStmt | | +-default_value=[] | +-FunctionCall(ZetaSQL:array_length(ARRAY>) -> INT64) | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) - +-element_column_list=[$array.v#18] + +-element_column_list=[$array.v#21] +-array_offset_column= - +-ColumnHolder(column=$array_offset.o#19) + +-ColumnHolder(column=$array_offset.o#22) [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$array.v#18 AS v [STRING] -| +-$array_offset.o#19 AS o [INT64] +| +-$array.v#21 AS v [STRING] +| 
+-$array_offset.o#22 AS o [INT64] +-query= +-ProjectScan - +-column_list=[$array.v#18, $array_offset.o#19] + +-column_list=[$array.v#21, $array_offset.o#22] +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $array.v#18, $array_offset.o#19] + +-column_list=[ArrayTypes.ProtoArray#15, $array.v#21, $array_offset.o#22] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-array_expr_list= @@ -5245,39 +5245,39 @@ QueryStmt | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) | +-subquery= | +-OrderByScan - | +-column_list=[$flatten.injected#22] + | +-column_list=[$flatten.injected#25] | +-is_ordered=TRUE | +-input_scan= | | +-ProjectScan - | | +-column_list=[$flatten.injected#20, $offset.injected#21, $flatten.injected#22] + | | +-column_list=[$flatten.injected#23, $offset.injected#24, $flatten.injected#25] | | +-expr_list= - | | | +-injected#22 := + | | | +-injected#25 := | | | +-FunctionCall(ZetaSQL:$array_at_offset(ARRAY, INT64) -> STRING) | | | +-GetProtoField | | | | +-type=ARRAY | | | | +-expr= - | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#20) + | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#23) | | | | +-field_descriptor=str_value | | | | +-default_value=[] | | | +-FunctionCall(ZetaSQL:array_length(ARRAY>) -> INT64) | | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) | | +-input_scan= | | +-ArrayScan - | | +-column_list=[$flatten.injected#20, $offset.injected#21] + | | +-column_list=[$flatten.injected#23, $offset.injected#24] | | +-input_scan= | | | +-SingleRowScan | | +-array_expr_list= | | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) - | | +-element_column_list=[$flatten.injected#20] + | | +-element_column_list=[$flatten.injected#23] | | +-array_offset_column= - | | +-ColumnHolder(column=$offset.injected#21) + | | +-ColumnHolder(column=$offset.injected#24) | 
+-order_by_item_list= | +-OrderByItem | +-column_ref= - | +-ColumnRef(type=INT64, column=$offset.injected#21) - +-element_column_list=[$array.v#18] + | +-ColumnRef(type=INT64, column=$offset.injected#24) + +-element_column_list=[$array.v#21] +-array_offset_column= - +-ColumnHolder(column=$array_offset.o#19) + +-ColumnHolder(column=$array_offset.o#22) == select FLATTEN(t.ProtoArray.str_value[OFFSET(ARRAY_LENGTH(t.ProtoArray))]) @@ -5285,12 +5285,12 @@ from ArrayTypes t; -- QueryStmt +-output_column_list= -| +-$query.$col1#18 AS `$col1` [ARRAY] +| +-$query.$col1#21 AS `$col1` [ARRAY] +-query= +-ProjectScan - +-column_list=[$query.$col1#18] + +-column_list=[$query.$col1#21] +-expr_list= - | +-$col1#18 := + | +-$col1#21 := | +-Flatten | +-type=ARRAY | +-expr= @@ -5311,12 +5311,12 @@ QueryStmt [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$query.$col1#18 AS `$col1` [ARRAY] +| +-$query.$col1#21 AS `$col1` [ARRAY] +-query= +-ProjectScan - +-column_list=[$query.$col1#18] + +-column_list=[$query.$col1#21] +-expr_list= - | +-$col1#18 := + | +-$col1#21 := | +-SubqueryExpr | +-type=ARRAY | +-subquery_type=SCALAR @@ -5324,54 +5324,54 @@ QueryStmt | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) | +-subquery= | +-ProjectScan - | +-column_list=[$with_expr.injected#23] + | +-column_list=[$with_expr.injected#26] | +-expr_list= - | | +-injected#23 := + | | +-injected#26 := | | +-FunctionCall(ZetaSQL:if(BOOL, ARRAY, ARRAY) -> ARRAY) | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19) + | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22) | | +-Literal(type=ARRAY, value=NULL) | | +-SubqueryExpr | | +-type=ARRAY | | +-subquery_type=ARRAY | | +-parameter_list= | | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19) + | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22) | | 
+-subquery= | | +-OrderByScan - | | +-column_list=[$flatten.injected#22] + | | +-column_list=[$flatten.injected#25] | | +-is_ordered=TRUE | | +-input_scan= | | | +-ProjectScan - | | | +-column_list=[$flatten.injected#20, $offset.injected#21, $flatten.injected#22] + | | | +-column_list=[$flatten.injected#23, $offset.injected#24, $flatten.injected#25] | | | +-expr_list= - | | | | +-injected#22 := + | | | | +-injected#25 := | | | | +-FunctionCall(ZetaSQL:$array_at_offset(ARRAY, INT64) -> STRING) | | | | +-GetProtoField | | | | | +-type=ARRAY | | | | | +-expr= - | | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#20) + | | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#23) | | | | | +-field_descriptor=str_value | | | | | +-default_value=[] | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY>) -> INT64) | | | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) | | | +-input_scan= | | | +-ArrayScan - | | | +-column_list=[$flatten.injected#20, $offset.injected#21] + | | | +-column_list=[$flatten.injected#23, $offset.injected#24] | | | +-array_expr_list= - | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19, is_correlated=TRUE) - | | | +-element_column_list=[$flatten.injected#20] + | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22, is_correlated=TRUE) + | | | +-element_column_list=[$flatten.injected#23] | | | +-array_offset_column= - | | | +-ColumnHolder(column=$offset.injected#21) + | | | +-ColumnHolder(column=$offset.injected#24) | | +-order_by_item_list= | | +-OrderByItem | | +-column_ref= - | | +-ColumnRef(type=INT64, column=$offset.injected#21) + | | +-ColumnRef(type=INT64, column=$offset.injected#24) | +-input_scan= | +-ProjectScan - | +-column_list=[$flatten_input.injected#19] + | +-column_list=[$flatten_input.injected#22] | +-expr_list= - | | +-injected#19 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) + | | +-injected#22 := ColumnRef(type=ARRAY>, 
column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) | +-input_scan= | +-SingleRowScan +-input_scan= @@ -5384,12 +5384,12 @@ from ArrayTypes t; -- QueryStmt +-output_column_list= -| +-$query.$col1#18 AS `$col1` [ARRAY] +| +-$query.$col1#21 AS `$col1` [ARRAY] +-query= +-ProjectScan - +-column_list=[$query.$col1#18] + +-column_list=[$query.$col1#21] +-expr_list= - | +-$col1#18 := + | +-$col1#21 := | +-Flatten | +-type=ARRAY | +-expr= @@ -5410,12 +5410,12 @@ QueryStmt [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$query.$col1#18 AS `$col1` [ARRAY] +| +-$query.$col1#21 AS `$col1` [ARRAY] +-query= +-ProjectScan - +-column_list=[$query.$col1#18] + +-column_list=[$query.$col1#21] +-expr_list= - | +-$col1#18 := + | +-$col1#21 := | +-SubqueryExpr | +-type=ARRAY | +-subquery_type=SCALAR @@ -5424,54 +5424,54 @@ QueryStmt | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) | +-subquery= | +-ProjectScan - | +-column_list=[$with_expr.injected#23] + | +-column_list=[$with_expr.injected#26] | +-expr_list= - | | +-injected#23 := + | | +-injected#26 := | | +-FunctionCall(ZetaSQL:if(BOOL, ARRAY, ARRAY) -> ARRAY) | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19) + | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22) | | +-Literal(type=ARRAY, value=NULL) | | +-SubqueryExpr | | +-type=ARRAY | | +-subquery_type=ARRAY | | +-parameter_list= | | | +-ColumnRef(type=ARRAY, column=ArrayTypes.Int32Array#1, is_correlated=TRUE) - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19) + | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22) | | +-subquery= | | +-OrderByScan - | | +-column_list=[$flatten.injected#22] + | | +-column_list=[$flatten.injected#25] | | +-is_ordered=TRUE | | +-input_scan= | | | +-ProjectScan - | | | +-column_list=[$flatten.injected#20, $offset.injected#21, $flatten.injected#22] + | | | +-column_list=[$flatten.injected#23, $offset.injected#24, 
$flatten.injected#25] | | | +-expr_list= - | | | | +-injected#22 := + | | | | +-injected#25 := | | | | +-FunctionCall(ZetaSQL:$array_at_offset(ARRAY, INT64) -> STRING) | | | | +-GetProtoField | | | | | +-type=ARRAY | | | | | +-expr= - | | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#20) + | | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#23) | | | | | +-field_descriptor=str_value | | | | | +-default_value=[] | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) | | | | +-ColumnRef(type=ARRAY, column=ArrayTypes.Int32Array#1, is_correlated=TRUE) | | | +-input_scan= | | | +-ArrayScan - | | | +-column_list=[$flatten.injected#20, $offset.injected#21] + | | | +-column_list=[$flatten.injected#23, $offset.injected#24] | | | +-array_expr_list= - | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#19, is_correlated=TRUE) - | | | +-element_column_list=[$flatten.injected#20] + | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22, is_correlated=TRUE) + | | | +-element_column_list=[$flatten.injected#23] | | | +-array_offset_column= - | | | +-ColumnHolder(column=$offset.injected#21) + | | | +-ColumnHolder(column=$offset.injected#24) | | +-order_by_item_list= | | +-OrderByItem | | +-column_ref= - | | +-ColumnRef(type=INT64, column=$offset.injected#21) + | | +-ColumnRef(type=INT64, column=$offset.injected#24) | +-input_scan= | +-ProjectScan - | +-column_list=[$flatten_input.injected#19] + | +-column_list=[$flatten_input.injected#22] | +-expr_list= - | | +-injected#19 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) + | | +-injected#22 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) | +-input_scan= | +-SingleRowScan +-input_scan= @@ -5484,13 +5484,13 @@ from ArrayTypes t, UNNEST(t.ProtoArray.str_value[OFFSET(ARRAY_LENGTH(t.Int32Arra -- QueryStmt +-output_column_list= -| +-$array.v#18 AS v [STRING] +| +-$array.v#21 AS v [STRING] +-query= +-ProjectScan - 
+-column_list=[$array.v#18] + +-column_list=[$array.v#21] +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.ProtoArray#15, $array.v#18] + +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.ProtoArray#15, $array.v#21] +-input_scan= | +-TableScan(column_list=ArrayTypes.[Int32Array#1, ProtoArray#15], table=ArrayTypes, column_index_list=[0, 14], alias="t") +-array_expr_list= @@ -5508,42 +5508,42 @@ QueryStmt | | +-default_value=[] | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) | +-ColumnRef(type=ARRAY, column=ArrayTypes.Int32Array#1) - +-element_column_list=[$array.v#18] + +-element_column_list=[$array.v#21] [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$array.v#18 AS v [STRING] +| +-$array.v#21 AS v [STRING] +-query= +-ProjectScan - +-column_list=[$array.v#18] + +-column_list=[$array.v#21] +-input_scan= +-ProjectScan - +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.ProtoArray#15, $array.v#18] + +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.ProtoArray#15, $array.v#21] +-expr_list= - | +-v#18 := ColumnRef(type=STRING, column=$flatten.injected#20) + | +-v#21 := ColumnRef(type=STRING, column=$flatten.injected#23) +-input_scan= +-ProjectScan - +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.ProtoArray#15, $flatten.injected#19, $flatten.injected#20] + +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.ProtoArray#15, $flatten.injected#22, $flatten.injected#23] +-expr_list= - | +-injected#20 := + | +-injected#23 := | +-FunctionCall(ZetaSQL:$array_at_offset(ARRAY, INT64) -> STRING) | +-GetProtoField | | +-type=ARRAY | | +-expr= - | | | +-ColumnRef(type=PROTO, column=$flatten.injected#19) + | | | +-ColumnRef(type=PROTO, column=$flatten.injected#22) | | +-field_descriptor=str_value | | +-default_value=[] | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) | +-ColumnRef(type=ARRAY, column=ArrayTypes.Int32Array#1) +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.ProtoArray#15, 
$flatten.injected#19] + +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.ProtoArray#15, $flatten.injected#22] +-input_scan= | +-TableScan(column_list=ArrayTypes.[Int32Array#1, ProtoArray#15], table=ArrayTypes, column_index_list=[0, 14], alias="t") +-array_expr_list= | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) - +-element_column_list=[$flatten.injected#19] + +-element_column_list=[$flatten.injected#22] == select v, o @@ -5551,14 +5551,14 @@ from ArrayTypes t, UNNEST(t.ProtoArray.str_value[OFFSET(ARRAY_LENGTH(t.Int32Arra -- QueryStmt +-output_column_list= -| +-$array.v#18 AS v [STRING] -| +-$array_offset.o#19 AS o [INT64] +| +-$array.v#21 AS v [STRING] +| +-$array_offset.o#22 AS o [INT64] +-query= +-ProjectScan - +-column_list=[$array.v#18, $array_offset.o#19] + +-column_list=[$array.v#21, $array_offset.o#22] +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.ProtoArray#15, $array.v#18, $array_offset.o#19] + +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.ProtoArray#15, $array.v#21, $array_offset.o#22] +-input_scan= | +-TableScan(column_list=ArrayTypes.[Int32Array#1, ProtoArray#15], table=ArrayTypes, column_index_list=[0, 14], alias="t") +-array_expr_list= @@ -5576,21 +5576,21 @@ QueryStmt | | +-default_value=[] | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) | +-ColumnRef(type=ARRAY, column=ArrayTypes.Int32Array#1) - +-element_column_list=[$array.v#18] + +-element_column_list=[$array.v#21] +-array_offset_column= - +-ColumnHolder(column=$array_offset.o#19) + +-ColumnHolder(column=$array_offset.o#22) [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$array.v#18 AS v [STRING] -| +-$array_offset.o#19 AS o [INT64] +| +-$array.v#21 AS v [STRING] +| +-$array_offset.o#22 AS o [INT64] +-query= +-ProjectScan - +-column_list=[$array.v#18, $array_offset.o#19] + +-column_list=[$array.v#21, $array_offset.o#22] +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.ProtoArray#15, $array.v#18, 
$array_offset.o#19] + +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.ProtoArray#15, $array.v#21, $array_offset.o#22] +-input_scan= | +-TableScan(column_list=ArrayTypes.[Int32Array#1, ProtoArray#15], table=ArrayTypes, column_index_list=[0, 14], alias="t") +-array_expr_list= @@ -5602,39 +5602,39 @@ QueryStmt | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) | +-subquery= | +-OrderByScan - | +-column_list=[$flatten.injected#22] + | +-column_list=[$flatten.injected#25] | +-is_ordered=TRUE | +-input_scan= | | +-ProjectScan - | | +-column_list=[$flatten.injected#20, $offset.injected#21, $flatten.injected#22] + | | +-column_list=[$flatten.injected#23, $offset.injected#24, $flatten.injected#25] | | +-expr_list= - | | | +-injected#22 := + | | | +-injected#25 := | | | +-FunctionCall(ZetaSQL:$array_at_offset(ARRAY, INT64) -> STRING) | | | +-GetProtoField | | | | +-type=ARRAY | | | | +-expr= - | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#20) + | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#23) | | | | +-field_descriptor=str_value | | | | +-default_value=[] | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) | | | +-ColumnRef(type=ARRAY, column=ArrayTypes.Int32Array#1, is_correlated=TRUE) | | +-input_scan= | | +-ArrayScan - | | +-column_list=[$flatten.injected#20, $offset.injected#21] + | | +-column_list=[$flatten.injected#23, $offset.injected#24] | | +-input_scan= | | | +-SingleRowScan | | +-array_expr_list= | | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) - | | +-element_column_list=[$flatten.injected#20] + | | +-element_column_list=[$flatten.injected#23] | | +-array_offset_column= - | | +-ColumnHolder(column=$offset.injected#21) + | | +-ColumnHolder(column=$offset.injected#24) | +-order_by_item_list= | +-OrderByItem | +-column_ref= - | +-ColumnRef(type=INT64, column=$offset.injected#21) - +-element_column_list=[$array.v#18] + | +-ColumnRef(type=INT64, column=$offset.injected#24) + 
+-element_column_list=[$array.v#21] +-array_offset_column= - +-ColumnHolder(column=$array_offset.o#19) + +-ColumnHolder(column=$array_offset.o#22) == select FLATTEN(t.ProtoArray.str_value[OFFSET(ARRAY_LENGTH(test.KitchenSink.nested_repeated_value))]) @@ -5642,12 +5642,12 @@ from ArrayTypes t, TestTable test; -- QueryStmt +-output_column_list= -| +-$query.$col1#21 AS `$col1` [ARRAY] +| +-$query.$col1#24 AS `$col1` [ARRAY] +-query= +-ProjectScan - +-column_list=[$query.$col1#21] + +-column_list=[$query.$col1#24] +-expr_list= - | +-$col1#21 := + | +-$col1#24 := | +-Flatten | +-type=ARRAY | +-expr= @@ -5664,96 +5664,96 @@ QueryStmt | +-GetProtoField | +-type=ARRAY> | +-expr= - | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#20) + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#23) | +-field_descriptor=nested_repeated_value | +-default_value=[] +-input_scan= +-JoinScan - +-column_list=[ArrayTypes.ProtoArray#15, TestTable.KitchenSink#20] + +-column_list=[ArrayTypes.ProtoArray#15, TestTable.KitchenSink#23] +-left_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-right_scan= - +-TableScan(column_list=[TestTable.KitchenSink#20], table=TestTable, column_index_list=[2], alias="test") + +-TableScan(column_list=[TestTable.KitchenSink#23], table=TestTable, column_index_list=[2], alias="test") [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$query.$col1#21 AS `$col1` [ARRAY] +| +-$query.$col1#24 AS `$col1` [ARRAY] +-query= +-ProjectScan - +-column_list=[$query.$col1#21] + +-column_list=[$query.$col1#24] +-expr_list= - | +-$col1#21 := + | +-$col1#24 := | +-SubqueryExpr | +-type=ARRAY | +-subquery_type=SCALAR | +-parameter_list= | | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) - | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#20) + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#23) | +-subquery= | +-ProjectScan - | +-column_list=[$with_expr.injected#26] + | 
+-column_list=[$with_expr.injected#29] | +-expr_list= - | | +-injected#26 := + | | +-injected#29 := | | +-FunctionCall(ZetaSQL:if(BOOL, ARRAY, ARRAY) -> ARRAY) | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22) + | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#25) | | +-Literal(type=ARRAY, value=NULL) | | +-SubqueryExpr | | +-type=ARRAY | | +-subquery_type=ARRAY | | +-parameter_list= - | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#20, is_correlated=TRUE) - | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22) + | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#23, is_correlated=TRUE) + | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#25) | | +-subquery= | | +-OrderByScan - | | +-column_list=[$flatten.injected#25] + | | +-column_list=[$flatten.injected#28] | | +-is_ordered=TRUE | | +-input_scan= | | | +-ProjectScan - | | | +-column_list=[$flatten.injected#23, $offset.injected#24, $flatten.injected#25] + | | | +-column_list=[$flatten.injected#26, $offset.injected#27, $flatten.injected#28] | | | +-expr_list= - | | | | +-injected#25 := + | | | | +-injected#28 := | | | | +-FunctionCall(ZetaSQL:$array_at_offset(ARRAY, INT64) -> STRING) | | | | +-GetProtoField | | | | | +-type=ARRAY | | | | | +-expr= - | | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#23) + | | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#26) | | | | | +-field_descriptor=str_value | | | | | +-default_value=[] | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY>) -> INT64) | | | | +-GetProtoField | | | | +-type=ARRAY> | | | | +-expr= - | | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#20, is_correlated=TRUE) + | | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#23, is_correlated=TRUE) | | | | +-field_descriptor=nested_repeated_value | | | | +-default_value=[] | | | +-input_scan= | | | +-ArrayScan - | | | 
+-column_list=[$flatten.injected#23, $offset.injected#24] + | | | +-column_list=[$flatten.injected#26, $offset.injected#27] | | | +-array_expr_list= - | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#22, is_correlated=TRUE) - | | | +-element_column_list=[$flatten.injected#23] + | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#25, is_correlated=TRUE) + | | | +-element_column_list=[$flatten.injected#26] | | | +-array_offset_column= - | | | +-ColumnHolder(column=$offset.injected#24) + | | | +-ColumnHolder(column=$offset.injected#27) | | +-order_by_item_list= | | +-OrderByItem | | +-column_ref= - | | +-ColumnRef(type=INT64, column=$offset.injected#24) + | | +-ColumnRef(type=INT64, column=$offset.injected#27) | +-input_scan= | +-ProjectScan - | +-column_list=[$flatten_input.injected#22] + | +-column_list=[$flatten_input.injected#25] | +-expr_list= - | | +-injected#22 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) + | | +-injected#25 := ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15, is_correlated=TRUE) | +-input_scan= | +-SingleRowScan +-input_scan= +-JoinScan - +-column_list=[ArrayTypes.ProtoArray#15, TestTable.KitchenSink#20] + +-column_list=[ArrayTypes.ProtoArray#15, TestTable.KitchenSink#23] +-left_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="t") +-right_scan= - +-TableScan(column_list=[TestTable.KitchenSink#20], table=TestTable, column_index_list=[2], alias="test") + +-TableScan(column_list=[TestTable.KitchenSink#23], table=TestTable, column_index_list=[2], alias="test") == select t.KitchenSink.nested_repeated_value.nested_int64 in unnest(t.KitchenSink.nested_repeated_value.nested_int64) diff --git a/zetasql/analyzer/testdata/coercion.test b/zetasql/analyzer/testdata/coercion.test index cfd06f49b..1cd2595c8 100644 --- a/zetasql/analyzer/testdata/coercion.test +++ b/zetasql/analyzer/testdata/coercion.test @@ -2898,7 +2898,7 @@ select 
fn_on_int32_array_returns_int32(ARRAY[]) == [language_features=NUMERIC_TYPE,ROUND_WITH_ROUNDING_MODE] -[product_mode={{internal|external}}] +[product_mode=internal] # Integers should coerce to opaque enums - it's not likely to # be used much in practice, but it is part of the spec. However # this should _not_ be the case for product mode external, which @@ -2907,9 +2907,7 @@ select fn_on_int32_array_returns_int32(ARRAY[]) # SELECT ROUND(NUMERIC "1", 1, 1); -- -ALTERNATION GROUP: internal --- QueryStmt +-output_column_list= | +-$query.$col1#1 AS `$col1` [NUMERIC] @@ -2924,26 +2922,60 @@ QueryStmt | +-Literal(type=ENUM, value=ROUND_HALF_AWAY_FROM_ZERO) +-input_scan= +-SingleRowScan +== + +[language_features=NUMERIC_TYPE,ROUND_WITH_ROUNDING_MODE,V_1_4_DISABLE_FLOAT32] +[product_mode=external] + +SELECT ROUND(NUMERIC "1", 1, 1); -- -ALTERNATION GROUP: external +ERROR: No matching signature for function ROUND for argument types: NUMERIC, INT64, INT64. Supported signatures: ROUND(FLOAT64); ROUND(NUMERIC); ROUND(FLOAT64, INT64); ROUND(NUMERIC, INT64); ROUND(NUMERIC, INT64, ROUNDING_MODE) [at 1:8] +SELECT ROUND(NUMERIC "1", 1, 1); + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for function ROUND + Argument types: NUMERIC, INT64, INT64 + Signature: ROUND(FLOAT64) + Signature accepts at most 1 argument, found 3 arguments + Signature: ROUND(NUMERIC) + Signature accepts at most 1 argument, found 3 arguments + Signature: ROUND(FLOAT64, INT64) + Signature accepts at most 2 arguments, found 3 arguments + Signature: ROUND(NUMERIC, INT64) + Signature accepts at most 2 arguments, found 3 arguments + Signature: ROUND(NUMERIC, INT64, ROUNDING_MODE) + Argument 3: Unable to coerce type INT64 to expected type ROUNDING_MODE [at 1:8] +SELECT ROUND(NUMERIC "1", 1, 1); + ^ +== + +[language_features=NUMERIC_TYPE,ROUND_WITH_ROUNDING_MODE] +[product_mode=external] + +SELECT ROUND(NUMERIC "1", 1, 1); -- -ERROR: No matching signature for function ROUND for argument types: 
NUMERIC, INT64, INT64. Supported signatures: ROUND(FLOAT64); ROUND(NUMERIC); ROUND(FLOAT64, INT64); ROUND(NUMERIC, INT64); ROUND(NUMERIC, INT64, ROUNDING_MODE) [at 7:8] +ERROR: No matching signature for function ROUND for argument types: NUMERIC, INT64, INT64. Supported signatures: ROUND(FLOAT); ROUND(FLOAT64); ROUND(NUMERIC); ROUND(FLOAT, INT64); ROUND(FLOAT64, INT64); ROUND(NUMERIC, INT64); ROUND(NUMERIC, INT64, ROUNDING_MODE) [at 1:8] SELECT ROUND(NUMERIC "1", 1, 1); ^ -- Signature Mismatch Details: ERROR: No matching signature for function ROUND Argument types: NUMERIC, INT64, INT64 + Signature: ROUND(FLOAT) + Signature accepts at most 1 argument, found 3 arguments Signature: ROUND(FLOAT64) Signature accepts at most 1 argument, found 3 arguments Signature: ROUND(NUMERIC) Signature accepts at most 1 argument, found 3 arguments + Signature: ROUND(FLOAT, INT64) + Signature accepts at most 2 arguments, found 3 arguments Signature: ROUND(FLOAT64, INT64) Signature accepts at most 2 arguments, found 3 arguments Signature: ROUND(NUMERIC, INT64) Signature accepts at most 2 arguments, found 3 arguments Signature: ROUND(NUMERIC, INT64, ROUNDING_MODE) - Argument 3: Unable to coerce type INT64 to expected type ROUNDING_MODE [at 7:8] + Argument 3: Unable to coerce type INT64 to expected type ROUNDING_MODE [at 1:8] SELECT ROUND(NUMERIC "1", 1, 1); ^ == diff --git a/zetasql/analyzer/testdata/collation.test b/zetasql/analyzer/testdata/collation.test index 05683f7b8..54fda3d2a 100644 --- a/zetasql/analyzer/testdata/collation.test +++ b/zetasql/analyzer/testdata/collation.test @@ -4147,10 +4147,11 @@ QueryStmt +-right_scan= | +-WithRefScan(column_list=t2.[col_a#7, col_c#8], with_query_name="t2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=t1.col_a#5{Collation:"und:ci"}) - +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=t2.col_a#7{Collation:"und:ci"}) - 
+-collation_list=[und:ci] + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=t1.col_a#5{Collation:"und:ci"}) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=t2.col_a#7{Collation:"und:ci"}) + | +-collation_list=[und:ci] + +-has_using=TRUE -- ALTERNATION GROUP: right -- @@ -4203,10 +4204,11 @@ QueryStmt +-right_scan= | +-WithRefScan(column_list=t2.[col_a#7, col_c#8], with_query_name="t2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=t1.col_a#5{Collation:"und:ci"}) - +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=t2.col_a#7{Collation:"und:ci"}) - +-collation_list=[und:ci] + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=t1.col_a#5{Collation:"und:ci"}) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=t2.col_a#7{Collation:"und:ci"}) + | +-collation_list=[und:ci] + +-has_using=TRUE -- ALTERNATION GROUP: full -- @@ -4268,10 +4270,11 @@ QueryStmt +-right_scan= | +-WithRefScan(column_list=t2.[col_a#7, col_c#8], with_query_name="t2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=t1.col_a#5{Collation:"und:ci"}) - +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=t2.col_a#7{Collation:"und:ci"}) - +-collation_list=[und:ci] + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=t1.col_a#5{Collation:"und:ci"}) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=t2.col_a#7{Collation:"und:ci"}) + | +-collation_list=[und:ci] + +-has_using=TRUE == [no_run_unparser] @@ -4390,10 +4393,11 @@ QueryStmt | 
+-input_scan= | +-WithRefScan(column_list=[t2.$struct#8<{Collation:"und:ci"},_>], with_query_name="t2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_left.col_a#9{Collation:"und:ci"}) - +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_right.col_a#10{Collation:"und:ci"}) - +-collation_list=[und:ci] + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_left.col_a#9{Collation:"und:ci"}) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_right.col_a#10{Collation:"und:ci"}) + | +-collation_list=[und:ci] + +-has_using=TRUE -- ALTERNATION GROUP: left -- @@ -4503,10 +4507,11 @@ QueryStmt | +-input_scan= | +-WithRefScan(column_list=[t2.$struct#8<{Collation:"und:ci"},_>], with_query_name="t2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_left.col_a#9{Collation:"und:ci"}) - +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_right.col_a#10{Collation:"und:ci"}) - +-collation_list=[und:ci] + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_left.col_a#9{Collation:"und:ci"}) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_right.col_a#10{Collation:"und:ci"}) + | +-collation_list=[und:ci] + +-has_using=TRUE -- ALTERNATION GROUP: right -- @@ -4616,10 +4621,11 @@ QueryStmt | +-input_scan= | +-WithRefScan(column_list=[t2.$struct#8<{Collation:"und:ci"},_>], with_query_name="t2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_left.col_a#9{Collation:"und:ci"}) - 
+-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_right.col_a#10{Collation:"und:ci"}) - +-collation_list=[und:ci] + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_left.col_a#9{Collation:"und:ci"}) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_right.col_a#10{Collation:"und:ci"}) + | +-collation_list=[und:ci] + +-has_using=TRUE -- ALTERNATION GROUP: full -- @@ -4738,10 +4744,11 @@ QueryStmt | +-input_scan= | +-WithRefScan(column_list=[t2.$struct#8<{Collation:"und:ci"},_>], with_query_name="t2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_left.col_a#9{Collation:"und:ci"}) - +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_right.col_a#10{Collation:"und:ci"}) - +-collation_list=[und:ci] + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_left.col_a#9{Collation:"und:ci"}) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$join_right.col_a#10{Collation:"und:ci"}) + | +-collation_list=[und:ci] + +-has_using=TRUE -- ALTERNATION GROUP: outer -- @@ -6277,11 +6284,11 @@ CREATE SCHEMA myProject.mySchema DEFAULT COLLATE 'und:ci' OPTIONS(a="b", c="d"); -- CreateSchemaStmt +-name_path=myProject.mySchema -+-collation_name= -| +-Literal(type=STRING, value="und:ci") +-option_list= - +-a := Literal(type=STRING, value="b") - +-c := Literal(type=STRING, value="d") +| +-a := Literal(type=STRING, value="b") +| +-c := Literal(type=STRING, value="d") ++-collation_name= + +-Literal(type=STRING, value="und:ci") == create table t (a int64, b string collate 'und:cs') default collate cast(null as string); diff --git a/zetasql/analyzer/testdata/constant.test 
b/zetasql/analyzer/testdata/constant.test index af0f84aa1..c9783cfb4 100644 --- a/zetasql/analyzer/testdata/constant.test +++ b/zetasql/analyzer/testdata/constant.test @@ -1,8 +1,30 @@ # Constants cannot referenced in the LIMIT clause. -# TODO: Enable support for named constants in LIMIT. +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] SELECT 1 FROM KeyValue limit TestConstantInt64 OFFSET TestConstantInt64; -- -ERROR: Syntax error: Unexpected identifier "TestConstantInt64" [at 1:30] +ALTERNATION GROUP: V_1_4_LIMIT_OFFSET_EXPRESSIONS +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#3 AS `$col1` [INT64] ++-query= + +-LimitOffsetScan + +-column_list=[$query.$col1#3] + +-input_scan= + | +-ProjectScan + | +-column_list=[$query.$col1#3] + | +-expr_list= + | | +-$col1#3 := Literal(type=INT64, value=1) + | +-input_scan= + | +-TableScan(table=KeyValue) + +-limit= + | +-Constant(TestConstantInt64, type=INT64, value=1) + +-offset= + +-Constant(TestConstantInt64, type=INT64, value=1) +-- +ALTERNATION GROUP: +-- +ERROR: LIMIT expects an integer literal or parameter [at 1:30] SELECT 1 FROM KeyValue limit TestConstantInt64 OFFSET TestConstantInt64; ^ == diff --git a/zetasql/analyzer/testdata/correlated_expr_subquery.test b/zetasql/analyzer/testdata/correlated_expr_subquery.test index dbfd4d48f..4f5d35f31 100644 --- a/zetasql/analyzer/testdata/correlated_expr_subquery.test +++ b/zetasql/analyzer/testdata/correlated_expr_subquery.test @@ -1710,7 +1710,7 @@ select tt.key as key, IF(EXISTS(select * from TestTable tt group by tt.key -- -ERROR: Correlated aliases referenced in the from clause must refer to arrays that are valid to access from the outer query, but tt refers to an array that is not valid to access after GROUP BY or DISTINCT in the outer query [at 2:38] +ERROR: FROM clause expression references tt.KitchenSink.repeated_int32_val which is neither grouped nor aggregated [at 2:38] from tt.KitchenSink.repeated_int32_val), ^ == diff --git 
a/zetasql/analyzer/testdata/correlated_subquery_outer_aggr.test b/zetasql/analyzer/testdata/correlated_subquery_outer_aggr.test index 679a4d165..755f5187a 100644 --- a/zetasql/analyzer/testdata/correlated_subquery_outer_aggr.test +++ b/zetasql/analyzer/testdata/correlated_subquery_outer_aggr.test @@ -396,7 +396,7 @@ select tt.TestEnum, (select count(*) from tt.KitchenSink.repeated_int32_val) from TestTable tt group by 1; -- -ERROR: Correlated aliases referenced in the from clause must refer to arrays that are valid to access from the outer query, but tt refers to an array that is not valid to access after GROUP BY or DISTINCT in the outer query [at 1:43] +ERROR: FROM clause expression references tt.KitchenSink.repeated_int32_val which is neither grouped nor aggregated [at 1:43] select tt.TestEnum, (select count(*) from tt.KitchenSink.repeated_int32_val) ^ == @@ -406,7 +406,7 @@ select distinct tt.TestEnum from TestTable tt order by (select count(*) from tt.KitchenSink.repeated_int32_val) -- -ERROR: Correlated aliases referenced in the from clause must refer to arrays that are valid to access from the outer query, but tt refers to an array that is not valid to access after GROUP BY or DISTINCT in the outer query [at 3:32] +ERROR: FROM clause expression references tt.KitchenSink.repeated_int32_val which is neither grouped nor aggregated [at 3:32] order by (select count(*) from tt.KitchenSink.repeated_int32_val) ^ == @@ -418,7 +418,7 @@ select sum(tt.KitchenSink.int32_val) + from TestTable tt where tt.KitchenSink.bool_val = true; -- -ERROR: Correlated aliases referenced in the from clause must refer to arrays that are valid to access from the outer query, but tt refers to an array that is not valid to access after GROUP BY or DISTINCT in the outer query [at 3:14] +ERROR: FROM clause expression references tt.KitchenSink.nested_repeated_value which is neither grouped nor aggregated [at 3:14] FROM tt.KitchenSink.nested_repeated_value) ^ == @@ -435,7 +435,7 @@ select 
tt.key as key, IF(EXISTS(select * from TestTable tt group by tt.key -- -ERROR: Correlated aliases referenced in the from clause must refer to arrays that are valid to access from the outer query, but tt refers to an array that is not valid to access after GROUP BY or DISTINCT in the outer query [at 2:38] +ERROR: FROM clause expression references tt.KitchenSink.repeated_int32_val which is neither grouped nor aggregated [at 2:38] from tt.KitchenSink.repeated_int32_val), ^ == @@ -894,7 +894,6 @@ GROUP BY tt.struct1 ^ == -# TODO: the error message is misleading, 'tt' is not an array. SELECT tt.struct1 from (select struct, sub_struct_2 struct>((1, 2), (3, 4)) as struct1) tt @@ -902,7 +901,7 @@ GROUP BY tt.struct1 HAVING (select count(*) from tt.struct1.sub_struct_1) > 0 -- -ERROR: Correlated aliases referenced in the from clause must refer to arrays that are valid to access from the outer query, but tt refers to an array that is not valid to access after GROUP BY or DISTINCT in the outer query [at 5:30] +ERROR: Values referenced in FROM clause must be arrays. 
tt.struct1.sub_struct_1 has type STRUCT [at 5:30] HAVING (select count(*) from tt.struct1.sub_struct_1) > 0 ^ == @@ -939,7 +938,7 @@ from TestTable tt GROUP BY tt.TestEnum HAVING (select count(*) from tt.KitchenSink.repeated_int32_val) > 0 -- -ERROR: Correlated aliases referenced in the from clause must refer to arrays that are valid to access from the outer query, but tt refers to an array that is not valid to access after GROUP BY or DISTINCT in the outer query [at 4:30] +ERROR: FROM clause expression references tt.KitchenSink.repeated_int32_val which is neither grouped nor aggregated [at 4:30] HAVING (select count(*) from tt.KitchenSink.repeated_int32_val) > 0 ^ == @@ -1053,7 +1052,7 @@ from TestTable tt GROUP BY tt.TestEnum HAVING (select count(*) from tt.KitchenSink.repeated_int32_val) > 0 -- -ERROR: Correlated aliases referenced in the from clause must refer to arrays that are valid to access from the outer query, but tt refers to an array that is not valid to access after GROUP BY or DISTINCT in the outer query [at 4:30] +ERROR: FROM clause expression references tt.KitchenSink.repeated_int32_val which is neither grouped nor aggregated [at 4:30] HAVING (select count(*) from tt.KitchenSink.repeated_int32_val) > 0 ^ == @@ -1064,7 +1063,7 @@ from TestTable tt GROUP BY tt.TestEnum, (select count(*) from tt.KitchenSink.repeated_int32_val) HAVING (select count(*) from tt.KitchenSink.repeated_int32_val) > 0 -- -ERROR: Correlated aliases referenced in the from clause must refer to arrays that are valid to access from the outer query, but tt refers to an array that is not valid to access after GROUP BY or DISTINCT in the outer query [at 4:30] +ERROR: FROM clause expression references tt.KitchenSink.repeated_int32_val which is neither grouped nor aggregated [at 4:30] HAVING (select count(*) from tt.KitchenSink.repeated_int32_val) > 0 ^ == @@ -1195,7 +1194,7 @@ from TestTable tt GROUP BY tt.TestEnum ORDER BY (select count(*) from tt.KitchenSink.repeated_int32_val) DESC 
-- -ERROR: Correlated aliases referenced in the from clause must refer to arrays that are valid to access from the outer query, but tt refers to an array that is not valid to access after GROUP BY or DISTINCT in the outer query [at 4:32] +ERROR: FROM clause expression references tt.KitchenSink.repeated_int32_val which is neither grouped nor aggregated [at 4:32] ORDER BY (select count(*) from tt.KitchenSink.repeated_int32_val) DESC ^ == @@ -1512,7 +1511,7 @@ select tt.TestEnum, from TestTable tt group by 1; -- -ERROR: Correlated aliases referenced in the from clause must refer to arrays that are valid to access from the outer query, but tt refers to an array that is not valid to access after GROUP BY or DISTINCT in the outer query [at 2:38] +ERROR: FROM clause expression references tt.KitchenSink.repeated_int32_val which is neither grouped nor aggregated [at 2:38] (select (select count(*) from tt.KitchenSink.repeated_int32_val)) ^ == @@ -1871,7 +1870,7 @@ from TestTable tt GROUP BY tt.TestEnum HAVING (select (select count(*) from tt.KitchenSink.repeated_int32_val)) > 0 -- -ERROR: Correlated aliases referenced in the from clause must refer to arrays that are valid to access from the outer query, but tt refers to an array that is not valid to access after GROUP BY or DISTINCT in the outer query [at 4:38] +ERROR: FROM clause expression references tt.KitchenSink.repeated_int32_val which is neither grouped nor aggregated [at 4:38] HAVING (select (select count(*) from tt.KitchenSink.repeated_int32_val)) > 0 ^ == @@ -2016,7 +2015,7 @@ from TestTable tt GROUP BY tt.TestEnum ORDER BY (select (select count(*) from tt.KitchenSink.repeated_int32_val)) -- -ERROR: Correlated aliases referenced in the from clause must refer to arrays that are valid to access from the outer query, but tt refers to an array that is not valid to access after GROUP BY or DISTINCT in the outer query [at 4:40] +ERROR: FROM clause expression references tt.KitchenSink.repeated_int32_val which is neither 
grouped nor aggregated [at 4:40] ORDER BY (select (select count(*) from tt.KitchenSink.repeated_int32_val)) ^ == @@ -4678,9 +4677,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[KeyValue.Key#3], table=KeyValue, column_index_list=[0], alias="kv2") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=KeyValue.Key#1) - | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | +-has_using=TRUE +-group_by_list= +-key#6 := ColumnRef(type=INT64, column=KeyValue.Key#1) -- @@ -4735,9 +4735,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[KeyValue.Key#3], table=KeyValue, column_index_list=[0], alias="kv2") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=KeyValue.Key#1) - | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | +-has_using=TRUE +-group_by_list= +-key#6 := ColumnRef(type=INT64, column=KeyValue.Key#1) -- diff --git a/zetasql/analyzer/testdata/correlated_subquery_outer_aggr_struct.test b/zetasql/analyzer/testdata/correlated_subquery_outer_aggr_struct.test index 067245190..b92eecb06 100644 --- a/zetasql/analyzer/testdata/correlated_subquery_outer_aggr_struct.test +++ b/zetasql/analyzer/testdata/correlated_subquery_outer_aggr_struct.test @@ -3221,9 +3221,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[ComplexTypes.TestStruct#11], table=ComplexTypes, column_index_list=[4], alias="ct2") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRUCT>, STRUCT>) -> BOOL) - | +-ColumnRef(type=STRUCT>, column=ComplexTypes.TestStruct#5) - | +-ColumnRef(type=STRUCT>, column=ComplexTypes.TestStruct#11) + | | 
+-FunctionCall(ZetaSQL:$equal(STRUCT>, STRUCT>) -> BOOL) + | | +-ColumnRef(type=STRUCT>, column=ComplexTypes.TestStruct#5) + | | +-ColumnRef(type=STRUCT>, column=ComplexTypes.TestStruct#11) + | +-has_using=TRUE +-group_by_list= +-d#14 := +-GetStructField diff --git a/zetasql/analyzer/testdata/corresponding.test b/zetasql/analyzer/testdata/corresponding.test index 11d48d8b5..7f6497d35 100644 --- a/zetasql/analyzer/testdata/corresponding.test +++ b/zetasql/analyzer/testdata/corresponding.test @@ -1,6 +1,6 @@ # CORRESPONDING [default enabled_ast_rewrites=DEFAULTS] -[language_features={{|V_1_4_CORRESPONDING}}] +[language_features={{|V_1_4_CORRESPONDING|V_1_4_CORRESPONDING_FULL}}] SELECT 1 UNION ALL CORRESPONDING SELECT 2 @@ -11,7 +11,9 @@ ERROR: CORRESPONDING for set operations is not supported [at 2:11] UNION ALL CORRESPONDING ^ -- -ALTERNATION GROUP: V_1_4_CORRESPONDING +ALTERNATION GROUPS: + V_1_4_CORRESPONDING + V_1_4_CORRESPONDING_FULL -- ERROR: Anonymous columns are not allowed in set operations when CORRESPONDING is used: query 1, column 1 [at 1:1] SELECT 1 @@ -19,7 +21,7 @@ SELECT 1 == # CORRESPONDING in multiple operations. 
-[language_features={{|V_1_4_CORRESPONDING}}] +[language_features={{|V_1_4_CORRESPONDING|V_1_4_CORRESPONDING_FULL}}] SELECT 1 UNION ALL SELECT 2 @@ -32,7 +34,9 @@ ERROR: CORRESPONDING for set operations is not supported [at 4:11] UNION ALL CORRESPONDING ^ -- -ALTERNATION GROUP: V_1_4_CORRESPONDING +ALTERNATION GROUPS: + V_1_4_CORRESPONDING + V_1_4_CORRESPONDING_FULL -- ERROR: Different column match modes cannot be used in the same query without using parentheses for grouping [at 4:11] UNION ALL CORRESPONDING @@ -40,18 +44,20 @@ UNION ALL CORRESPONDING == # CORRESPONDING BY -[language_features={{|V_1_4_CORRESPONDING_BY}}] +[language_features={{|V_1_4_CORRESPONDING|V_1_4_CORRESPONDING_FULL}}] SELECT 1 UNION ALL CORRESPONDING BY (a, b, c) SELECT 2 -- -ALTERNATION GROUP: +ALTERNATION GROUPS: + + V_1_4_CORRESPONDING -- ERROR: CORRESPONDING BY for set operations is not supported [at 2:11] UNION ALL CORRESPONDING BY (a, b, c) ^ -- -ALTERNATION GROUP: V_1_4_CORRESPONDING_BY +ALTERNATION GROUP: V_1_4_CORRESPONDING_FULL -- ERROR: The identifier a from the CORRESPONDING BY list does not appear in the input query 1. 
All columns in the BY list must appear in each input query unless FULL CORRESPONDING or LEFT CORRESPONDING is specified [at 2:29] UNION ALL CORRESPONDING BY (a, b, c) @@ -59,20 +65,22 @@ UNION ALL CORRESPONDING BY (a, b, c) == # CORRESPONDING BY in multiple operations -[language_features={{|V_1_4_CORRESPONDING_BY}}] +[language_features={{|V_1_4_CORRESPONDING|V_1_4_CORRESPONDING_FULL}}] SELECT 1 UNION ALL SELECT 2 UNION ALL CORRESPONDING BY (a, b, c) SELECT 3 -- -ALTERNATION GROUP: +ALTERNATION GROUPS: + + V_1_4_CORRESPONDING -- ERROR: CORRESPONDING BY for set operations is not supported [at 4:11] UNION ALL CORRESPONDING BY (a, b, c) ^ -- -ALTERNATION GROUP: V_1_4_CORRESPONDING_BY +ALTERNATION GROUP: V_1_4_CORRESPONDING_FULL -- ERROR: Different column match modes cannot be used in the same query without using parentheses for grouping [at 4:11] UNION ALL CORRESPONDING BY (a, b, c) @@ -80,18 +88,20 @@ UNION ALL CORRESPONDING BY (a, b, c) == # STRICT without CORRESPONDING -[language_features={{|V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE}}] +[language_features={{|V_1_4_CORRESPONDING|V_1_4_CORRESPONDING_FULL}}] SELECT 1 UNION DISTINCT STRICT SELECT 2 -- -ALTERNATION GROUP: +ALTERNATION GROUPS: + + V_1_4_CORRESPONDING -- ERROR: Column propagation mode (FULL/LEFT/STRICT) for set operations are not supported [at 2:16] UNION DISTINCT STRICT ^ -- -ALTERNATION GROUP: V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE +ALTERNATION GROUP: V_1_4_CORRESPONDING_FULL -- ERROR: STRICT in set operations cannot be used without CORRESPONDING [at 2:16] UNION DISTINCT STRICT @@ -99,20 +109,22 @@ UNION DISTINCT STRICT == # STRICT without CORRESPONDING in multiple set operations -[language_features={{|V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE}}] +[language_features={{|V_1_4_CORRESPONDING|V_1_4_CORRESPONDING_FULL}}] SELECT 1 UNION DISTINCT SELECT 2 UNION DISTINCT STRICT SELECT 3 -- -ALTERNATION GROUP: +ALTERNATION GROUPS: + + V_1_4_CORRESPONDING -- ERROR: Column propagation mode 
(FULL/LEFT/STRICT) for set operations are not supported [at 4:16] UNION DISTINCT STRICT ^ -- -ALTERNATION GROUP: V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE +ALTERNATION GROUP: V_1_4_CORRESPONDING_FULL -- ERROR: STRICT in set operations cannot be used without CORRESPONDING [at 4:16] UNION DISTINCT STRICT @@ -120,18 +132,20 @@ UNION DISTINCT STRICT == # FULL without CORRESPONDING -[language_features={{|V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE}}] +[language_features={{|V_1_4_CORRESPONDING|V_1_4_CORRESPONDING_FULL}}] SELECT 1 FULL EXCEPT ALL SELECT 2 -- -ALTERNATION GROUP: +ALTERNATION GROUPS: + + V_1_4_CORRESPONDING -- ERROR: Column propagation mode (FULL/LEFT/STRICT) for set operations are not supported [at 2:1] FULL EXCEPT ALL ^ -- -ALTERNATION GROUP: V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE +ALTERNATION GROUP: V_1_4_CORRESPONDING_FULL -- ERROR: FULL in set operations cannot be used without CORRESPONDING [at 2:1] FULL EXCEPT ALL @@ -139,20 +153,22 @@ FULL EXCEPT ALL == # FULL without CORRESPONDING in multiple operations -[language_features={{|V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE}}] +[language_features={{|V_1_4_CORRESPONDING|V_1_4_CORRESPONDING_FULL}}] SELECT 1 EXCEPT ALL SELECT 2 FULL EXCEPT ALL SELECT 3 -- -ALTERNATION GROUP: +ALTERNATION GROUPS: + + V_1_4_CORRESPONDING -- ERROR: Column propagation mode (FULL/LEFT/STRICT) for set operations are not supported [at 4:1] FULL EXCEPT ALL ^ -- -ALTERNATION GROUP: V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE +ALTERNATION GROUP: V_1_4_CORRESPONDING_FULL -- ERROR: FULL in set operations cannot be used without CORRESPONDING [at 4:1] FULL EXCEPT ALL @@ -160,18 +176,20 @@ FULL EXCEPT ALL == # LEFT without CORRESPONDING -[language_features={{|V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE}}] +[language_features={{|V_1_4_CORRESPONDING|V_1_4_CORRESPONDING_FULL}}] SELECT 1 LEFT EXCEPT ALL SELECT 2 -- -ALTERNATION GROUP: +ALTERNATION GROUPS: + + V_1_4_CORRESPONDING -- ERROR: Column propagation mode (FULL/LEFT/STRICT) for 
set operations are not supported [at 2:1] LEFT EXCEPT ALL ^ -- -ALTERNATION GROUP: V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE +ALTERNATION GROUP: V_1_4_CORRESPONDING_FULL -- ERROR: LEFT in set operations cannot be used without CORRESPONDING [at 2:1] LEFT EXCEPT ALL @@ -179,27 +197,29 @@ LEFT EXCEPT ALL == # LEFT without CORRESPONDING in multiple set operations -[language_features={{|V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE}}] +[language_features={{|V_1_4_CORRESPONDING|V_1_4_CORRESPONDING_FULL}}] SELECT 1 EXCEPT ALL SELECT 2 LEFT EXCEPT ALL SELECT 3 -- -ALTERNATION GROUP: +ALTERNATION GROUPS: + + V_1_4_CORRESPONDING -- ERROR: Column propagation mode (FULL/LEFT/STRICT) for set operations are not supported [at 4:1] LEFT EXCEPT ALL ^ -- -ALTERNATION GROUP: V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE +ALTERNATION GROUP: V_1_4_CORRESPONDING_FULL -- ERROR: LEFT in set operations cannot be used without CORRESPONDING [at 4:1] LEFT EXCEPT ALL ^ == -[language_features={{V_1_4_CORRESPONDING|V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE}}] +[language_features={{V_1_4_CORRESPONDING|V_1_4_CORRESPONDING_FULL}}] SELECT 1 {{FULL|LEFT}} UNION ALL CORRESPONDING SELECT 2 @@ -217,16 +237,15 @@ LEFT UNION ALL CORRESPONDING ^ -- ALTERNATION GROUPS: - V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,FULL - V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,LEFT + V_1_4_CORRESPONDING_FULL,FULL + V_1_4_CORRESPONDING_FULL,LEFT -- ERROR: Anonymous columns are not allowed in set operations when CORRESPONDING is used: query 1, column 1 [at 1:1] SELECT 1 ^ == -# STRICT not implemented. 
-[language_features={{V_1_4_CORRESPONDING|V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE}}] +[language_features={{V_1_4_CORRESPONDING|V_1_4_CORRESPONDING_FULL}}] SELECT 1 UNION ALL STRICT CORRESPONDING SELECT 2 @@ -237,14 +256,14 @@ ERROR: Column propagation mode (FULL/LEFT/STRICT) for set operations are not sup UNION ALL STRICT CORRESPONDING ^ -- -ALTERNATION GROUP: V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE +ALTERNATION GROUP: V_1_4_CORRESPONDING_FULL -- ERROR: Anonymous columns are not allowed in set operations when CORRESPONDING is used: query 1, column 1 [at 1:1] SELECT 1 ^ == -[default language_features=V_1_4_CORRESPONDING] +[default language_features={{V_1_4_CORRESPONDING|V_1_4_CORRESPONDING_FULL}}] # CORRESPONDING: same columns at same index. SELECT 1 AS col1, 2 AS col2 EXCEPT DISTINCT CORRESPONDING @@ -281,37 +300,6 @@ QueryStmt | +-output_column_list=$except_distinct2.[col1#3, col2#4] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$except_distinct.col1#5 AS col1 [INT64] -| +-$except_distinct.col2#6 AS col2 [INT64] -+-query= - +-SetOperationScan - +-column_list=$except_distinct.[col1#5, col2#6] - +-op_type=EXCEPT_DISTINCT - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$except_distinct1.[col1#1, col2#2] - | | +-expr_list= - | | | +-col1#1 := Literal(type=INT64, value=1) - | | | +-col2#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$except_distinct1.[col1#1, col2#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$except_distinct2.[col1#3, col2#4] - | +-expr_list= - | | +-col1#3 := Literal(type=INT64, value=3) - | | +-col2#4 := Literal(type=INT64, value=4) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=$except_distinct2.[col1#3, col2#4] == # CORRESPONDING: same columns at different index, the output column order 
is @@ -342,49 +330,19 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$intersect_all2.[col2#3, col1#4] - | | +-expr_list= - | | | +-col2#3 := Literal(type=INT64, value=3) - | | | +-col1#4 := Literal(type=INT64, value=4) + | | +-column_list=$intersect_all2.[col1#4, col2#3] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$intersect_all2.[col2#3, col1#4] + | | +-expr_list= + | | | +-col2#3 := Literal(type=INT64, value=3) + | | | +-col1#4 := Literal(type=INT64, value=4) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=$intersect_all2.[col1#4, col2#3] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$intersect_all.col1#5 AS col1 [INT64] -| +-$intersect_all.col2#6 AS col2 [INT64] -+-query= - +-SetOperationScan - +-column_list=$intersect_all.[col1#5, col2#6] - +-op_type=INTERSECT_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$intersect_all1.[col1#1, col2#2] - | | +-expr_list= - | | | +-col1#1 := Literal(type=INT64, value=1) - | | | +-col2#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$intersect_all1.[col1#1, col2#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$intersect_all2.[col1#4, col2#3] - | +-input_scan= - | +-ProjectScan - | +-column_list=$intersect_all2.[col2#3, col1#4] - | +-expr_list= - | | +-col2#3 := Literal(type=INT64, value=3) - | | +-col1#4 := Literal(type=INT64, value=4) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=$intersect_all2.[col1#4, col2#3] == # CORRESPONDING: extra columns are ignored. 
@@ -403,61 +361,33 @@ QueryStmt | +-SetOperationItem | | +-scan= | | | +-ProjectScan - | | | +-column_list=$union_all1.[col#1, extra_col_1#2] - | | | +-expr_list= - | | | | +-col#1 := Literal(type=INT64, value=1) - | | | | +-extra_col_1#2 := Literal(type=INT64, value=2) + | | | +-column_list=[$union_all1.col#1] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | +-SingleRowScan + | | | +-ProjectScan + | | | +-column_list=$union_all1.[col#1, extra_col_1#2] + | | | +-expr_list= + | | | | +-col#1 := Literal(type=INT64, value=1) + | | | | +-extra_col_1#2 := Literal(type=INT64, value=2) + | | | +-input_scan= + | | | +-SingleRowScan | | +-output_column_list=[$union_all1.col#1] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$union_all2.[extra_col_2#3, col#4] - | | +-expr_list= - | | | +-extra_col_2#3 := Literal(type=INT64, value=3) - | | | +-col#4 := Literal(type=INT64, value=4) + | | +-column_list=[$union_all2.col#4] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$union_all2.[extra_col_2#3, col#4] + | | +-expr_list= + | | | +-extra_col_2#3 := Literal(type=INT64, value=3) + | | | +-col#4 := Literal(type=INT64, value=4) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=[$union_all2.col#4] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col#5 AS col [INT64] -+-query= - +-SetOperationScan - +-column_list=[$union_all.col#5] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.col#1] - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[col#1, extra_col_1#2] - | | +-expr_list= - | | | +-col#1 := Literal(type=INT64, value=1) - | | | +-extra_col_1#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | 
+-output_column_list=[$union_all1.col#1] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2.col#4] - | +-input_scan= - | +-ProjectScan - | +-column_list=$union_all2.[extra_col_2#3, col#4] - | +-expr_list= - | | +-extra_col_2#3 := Literal(type=INT64, value=3) - | | +-col#4 := Literal(type=INT64, value=4) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all2.col#4] == # CORRESPONDING: column name comparison is case-insensitive. @@ -487,49 +417,19 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$intersect_all2.[Col2#3, cOL1#4] - | | +-expr_list= - | | | +-Col2#3 := Literal(type=INT64, value=3) - | | | +-cOL1#4 := Literal(type=INT64, value=4) + | | +-column_list=$intersect_all2.[cOL1#4, Col2#3] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$intersect_all2.[Col2#3, cOL1#4] + | | +-expr_list= + | | | +-Col2#3 := Literal(type=INT64, value=3) + | | | +-cOL1#4 := Literal(type=INT64, value=4) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=$intersect_all2.[cOL1#4, Col2#3] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$intersect_all.cOl1#5 AS cOl1 [INT64] -| +-$intersect_all.CoL2#6 AS CoL2 [INT64] -+-query= - +-SetOperationScan - +-column_list=$intersect_all.[cOl1#5, CoL2#6] - +-op_type=INTERSECT_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$intersect_all1.[cOl1#1, CoL2#2] - | | +-expr_list= - | | | +-cOl1#1 := Literal(type=INT64, value=1) - | | | +-CoL2#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$intersect_all1.[cOl1#1, CoL2#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$intersect_all2.[cOL1#4, Col2#3] - | +-input_scan= - | +-ProjectScan - | 
+-column_list=$intersect_all2.[Col2#3, cOL1#4] - | +-expr_list= - | | +-Col2#3 := Literal(type=INT64, value=3) - | | +-cOL1#4 := Literal(type=INT64, value=4) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=$intersect_all2.[cOL1#4, Col2#3] == # CORRESPONDING: duplicate columns are not allowed. @@ -688,70 +588,26 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$union_all2_cast.[float#41, float#42] - | | +-expr_list= - | | | +-float#41 := - | | | | +-Cast(FLOAT -> DOUBLE) - | | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#26) - | | | +-float#42 := - | | | +-Cast(FLOAT -> DOUBLE) - | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#26) + | | +-column_list=$union_all2_cast.[float#42, float#41] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[float#26, float#26] + | | +-column_list=$union_all2_cast.[float#41, float#42] + | | +-expr_list= + | | | +-float#41 := + | | | | +-Cast(FLOAT -> DOUBLE) + | | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#26) + | | | +-float#42 := + | | | +-Cast(FLOAT -> DOUBLE) + | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#26) | | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.float#26], table=SimpleTypes, column_index_list=[7]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[float#26, float#26] + | | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.float#26], table=SimpleTypes, column_index_list=[7]) | +-output_column_list=$union_all2_cast.[float#42, float#41] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col1#37 AS col1 [DOUBLE] -| +-$union_all.col2#38 AS col2 [DOUBLE] -+-query= - +-SetOperationScan - +-column_list=$union_all.[col1#37, col2#38] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | 
+-column_list=$union_all1_cast.[int32#39, int32#40] - | | +-expr_list= - | | | +-int32#39 := - | | | | +-Cast(INT32 -> DOUBLE) - | | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) - | | | +-int32#40 := - | | | +-Cast(INT32 -> DOUBLE) - | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int32#1, int32#1] - | | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int32#1], table=SimpleTypes, column_index_list=[0]) - | +-output_column_list=$union_all1_cast.[int32#39, int32#40] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$union_all2_cast.[float#42, float#41] - | +-input_scan= - | +-ProjectScan - | +-column_list=$union_all2_cast.[float#41, float#42] - | +-expr_list= - | | +-float#41 := - | | | +-Cast(FLOAT -> DOUBLE) - | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#26) - | | +-float#42 := - | | +-Cast(FLOAT -> DOUBLE) - | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#26) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[float#26, float#26] - | +-input_scan= - | +-TableScan(column_list=[SimpleTypes.float#26], table=SimpleTypes, column_index_list=[7]) - +-output_column_list=$union_all2_cast.[float#42, float#41] == # CORRESPONDING: same column with and without alias. 
@@ -770,77 +626,41 @@ QueryStmt | +-SetOperationItem | | +-scan= | | | +-ProjectScan - | | | +-column_list=[$union_all1_cast.int32#38, SimpleTypes.int32#1] - | | | +-expr_list= - | | | | +-int32#38 := - | | | | +-Cast(INT32 -> DOUBLE) - | | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) + | | | +-column_list=[$union_all1_cast.int32#38] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[int32#1, int32#1] + | | | +-column_list=[$union_all1_cast.int32#38, SimpleTypes.int32#1] + | | | +-expr_list= + | | | | +-int32#38 := + | | | | +-Cast(INT32 -> DOUBLE) + | | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) | | | +-input_scan= - | | | +-TableScan(column_list=[SimpleTypes.int32#1], table=SimpleTypes, column_index_list=[0]) + | | | +-ProjectScan + | | | +-column_list=SimpleTypes.[int32#1, int32#1] + | | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.int32#1], table=SimpleTypes, column_index_list=[0]) | | +-output_column_list=[$union_all1_cast.int32#38] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=[SimpleTypes.float#26, $union_all2_cast.float#39] - | | +-expr_list= - | | | +-float#39 := - | | | +-Cast(FLOAT -> DOUBLE) - | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#26) + | | +-column_list=[$union_all2_cast.float#39] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[float#26, float#26] + | | +-column_list=[SimpleTypes.float#26, $union_all2_cast.float#39] + | | +-expr_list= + | | | +-float#39 := + | | | +-Cast(FLOAT -> DOUBLE) + | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#26) | | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.float#26], table=SimpleTypes, column_index_list=[7]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[float#26, float#26] + | | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.float#26], 
table=SimpleTypes, column_index_list=[7]) | +-output_column_list=[$union_all2_cast.float#39] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col#37 AS col [DOUBLE] -+-query= - +-SetOperationScan - +-column_list=[$union_all.col#37] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.int32#38] - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.int32#38, SimpleTypes.int32#1] - | | +-expr_list= - | | | +-int32#38 := - | | | +-Cast(INT32 -> DOUBLE) - | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int32#1, int32#1] - | | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int32#1], table=SimpleTypes, column_index_list=[0]) - | +-output_column_list=[$union_all1_cast.int32#38] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2_cast.float#39] - | +-input_scan= - | +-ProjectScan - | +-column_list=[SimpleTypes.float#26, $union_all2_cast.float#39] - | +-expr_list= - | | +-float#39 := - | | +-Cast(FLOAT -> DOUBLE) - | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#26) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[float#26, float#26] - | +-input_scan= - | +-TableScan(column_list=[SimpleTypes.float#26], table=SimpleTypes, column_index_list=[7]) - +-output_column_list=[$union_all2_cast.float#39] == # CORRESPONDING: Nested operations. 
@@ -865,12 +685,16 @@ QueryStmt | +-SetOperationItem | | +-scan= | | | +-ProjectScan - | | | +-column_list=$union_all1.[col#1, extra_col#2] - | | | +-expr_list= - | | | | +-col#1 := Literal(type=STRING, value="abc") - | | | | +-extra_col#2 := Literal(type=STRING, value="bcd") + | | | +-column_list=[$union_all1.col#1] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | +-SingleRowScan + | | | +-ProjectScan + | | | +-column_list=$union_all1.[col#1, extra_col#2] + | | | +-expr_list= + | | | | +-col#1 := Literal(type=STRING, value="abc") + | | | | +-extra_col#2 := Literal(type=STRING, value="bcd") + | | | +-input_scan= + | | | +-SingleRowScan | | +-output_column_list=[$union_all1.col#1] | +-SetOperationItem | +-scan= @@ -883,41 +707,6 @@ QueryStmt | +-output_column_list=[$union_all2.col#3] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col#4 AS col [STRING] -+-query= - +-ProjectScan - +-column_list=[$union_all.col#4] - +-input_scan= - +-SetOperationScan - +-column_list=[$union_all.col#4] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.col#1] - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[col#1, extra_col#2] - | | +-expr_list= - | | | +-col#1 := Literal(type=STRING, value="abc") - | | | +-extra_col#2 := Literal(type=STRING, value="bcd") - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1.col#1] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2.col#3] - | +-expr_list= - | | +-col#3 := Literal(type=STRING, value="def") - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all2.col#3] == # CORRESPONDING: no common supertype. 
@@ -947,87 +736,46 @@ QueryStmt | +-SetOperationItem | | +-scan= | | | +-ProjectScan - | | | +-column_list=[$union_all1.a#1, $union_all1_cast.v1#6] - | | | +-expr_list= - | | | | +-v1#6 := - | | | | +-Cast(INT64 -> DOUBLE) - | | | | +-ColumnRef(type=INT64, column=$union_all1.v1#2) + | | | +-column_list=[$union_all1_cast.v1#6] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= | | | +-ProjectScan - | | | +-column_list=$union_all1.[a#1, v1#2] + | | | +-column_list=[$union_all1.a#1, $union_all1_cast.v1#6] | | | +-expr_list= - | | | | +-a#1 := Literal(type=STRING, value="a") - | | | | +-v1#2 := - | | | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) - | | | | +-Literal(type=INT64, value=1) - | | | | +-Literal(type=INT64, value=1) + | | | | +-v1#6 := + | | | | +-Cast(INT64 -> DOUBLE) + | | | | +-ColumnRef(type=INT64, column=$union_all1.v1#2) | | | +-input_scan= - | | | +-SingleRowScan + | | | +-ProjectScan + | | | +-column_list=$union_all1.[a#1, v1#2] + | | | +-expr_list= + | | | | +-a#1 := Literal(type=STRING, value="a") + | | | | +-v1#2 := + | | | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | | | | +-Literal(type=INT64, value=1) + | | | | +-Literal(type=INT64, value=1) + | | | +-input_scan= + | | | +-SingleRowScan | | +-output_column_list=[$union_all1_cast.v1#6] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$union_all2.[b#3, v1#4] - | | +-expr_list= - | | | +-b#3 := Literal(type=STRING, value="a") - | | | +-v1#4 := - | | | +-FunctionCall(ZetaSQL:$add(DOUBLE, DOUBLE) -> DOUBLE) - | | | +-Literal(type=DOUBLE, value=1) - | | | +-Literal(type=DOUBLE, value=1) + | | +-column_list=[$union_all2.v1#4] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$union_all2.[b#3, v1#4] + | | +-expr_list= + | | | +-b#3 := Literal(type=STRING, value="a") + | | | +-v1#4 := + | | | +-FunctionCall(ZetaSQL:$add(DOUBLE, DOUBLE) -> 
DOUBLE) + | | | +-Literal(type=DOUBLE, value=1) + | | | +-Literal(type=DOUBLE, value=1) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=[$union_all2.v1#4] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.v1#5 AS v1 [DOUBLE] -+-query= - +-SetOperationScan - +-column_list=[$union_all.v1#5] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.v1#6] - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.a#1, $union_all1_cast.v1#6] - | | +-expr_list= - | | | +-v1#6 := - | | | +-Cast(INT64 -> DOUBLE) - | | | +-ColumnRef(type=INT64, column=$union_all1.v1#2) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[a#1, v1#2] - | | +-expr_list= - | | | +-a#1 := Literal(type=STRING, value="a") - | | | +-v1#2 := - | | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) - | | | +-Literal(type=INT64, value=1) - | | | +-Literal(type=INT64, value=1) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1_cast.v1#6] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2.v1#4] - | +-input_scan= - | +-ProjectScan - | +-column_list=$union_all2.[b#3, v1#4] - | +-expr_list= - | | +-b#3 := Literal(type=STRING, value="a") - | | +-v1#4 := - | | +-FunctionCall(ZetaSQL:$add(DOUBLE, DOUBLE) -> DOUBLE) - | | +-Literal(type=DOUBLE, value=1) - | | +-Literal(type=DOUBLE, value=1) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all2.v1#4] == # CORRESPONDING: Extra columns are omitted from table scan. 
@@ -1051,66 +799,38 @@ QueryStmt | +-SetOperationItem | | +-scan= | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[uint32#3, float#8] + | | | +-column_list=[SimpleTypes.uint32#3] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[uint32#3, float#8], table=SimpleTypes, column_index_list=[2, 7]) + | | | +-ProjectScan + | | | +-column_list=SimpleTypes.[uint32#3, float#8] + | | | +-input_scan= + | | | +-TableScan(column_list=SimpleTypes.[uint32#3, float#8], table=SimpleTypes, column_index_list=[2, 7]) | | +-output_column_list=[SimpleTypes.uint32#3] | +-SetOperationItem | | +-scan= | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[uint32#21, json#36] + | | | +-column_list=[SimpleTypes.uint32#21] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[uint32#21, json#36], table=SimpleTypes, column_index_list=[2, 17]) + | | | +-ProjectScan + | | | +-column_list=SimpleTypes.[uint32#21, json#36] + | | | +-input_scan= + | | | +-TableScan(column_list=SimpleTypes.[uint32#21, json#36], table=SimpleTypes, column_index_list=[2, 17]) | | +-output_column_list=[SimpleTypes.uint32#21] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[json#54, uint32#39] + | | +-column_list=[SimpleTypes.uint32#39] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[uint32#39, json#54], table=SimpleTypes, column_index_list=[2, 17]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[json#54, uint32#39] + | | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[uint32#39, json#54], table=SimpleTypes, column_index_list=[2, 17]) | +-output_column_list=[SimpleTypes.uint32#39] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.uint32#55 AS uint32 [UINT32] -+-query= - 
+-SetOperationScan - +-column_list=[$union_all.uint32#55] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.uint32#3] - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[uint32#3, float#8] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[uint32#3, float#8], table=SimpleTypes, column_index_list=[2, 7]) - | +-output_column_list=[SimpleTypes.uint32#3] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.uint32#21] - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[uint32#21, json#36] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[uint32#21, json#36], table=SimpleTypes, column_index_list=[2, 17]) - | +-output_column_list=[SimpleTypes.uint32#21] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[SimpleTypes.uint32#39] - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[json#54, uint32#39] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[uint32#39, json#54], table=SimpleTypes, column_index_list=[2, 17]) - +-output_column_list=[SimpleTypes.uint32#39] == # CORRESPONDING: Set operation with CORRESPONDING as an input item for another @@ -1141,16 +861,24 @@ QueryStmt | | | +-SetOperationItem | | | | +-scan= | | | | | +-ProjectScan - | | | | | +-column_list=SimpleTypes.[int32#1, int64#2, float#8] + | | | | | +-column_list=SimpleTypes.[int32#1, int64#2] + | | | | | +-node_source="resolver_set_operation_corresponding" | | | | | +-input_scan= - | | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, float#8], table=SimpleTypes, column_index_list=[0, 1, 7]) + | | | | | +-ProjectScan + | | | | | +-column_list=SimpleTypes.[int32#1, int64#2, float#8] + | | | | | +-input_scan= + | | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, float#8], table=SimpleTypes, column_index_list=[0, 1, 7]) | | | | +-output_column_list=SimpleTypes.[int32#1, 
int64#2] | | | +-SetOperationItem | | | +-scan= | | | | +-ProjectScan - | | | | +-column_list=SimpleTypes.[int64#20, int32#19] + | | | | +-column_list=SimpleTypes.[int32#19, int64#20] + | | | | +-node_source="resolver_set_operation_corresponding" | | | | +-input_scan= - | | | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) + | | | | +-ProjectScan + | | | | +-column_list=SimpleTypes.[int64#20, int32#19] + | | | | +-input_scan= + | | | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) | | | +-output_column_list=SimpleTypes.[int32#19, int64#20] | | +-column_match_mode=CORRESPONDING | | +-column_propagation_mode=INNER @@ -1169,59 +897,7 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=[SimpleTypes.int32#39], table=SimpleTypes, column_index_list=[0]) +-output_column_list=[SimpleTypes.int32#39, $union_all2_cast.int32#59] - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.int32#57 AS int32 [INT32] -| +-$union_all.int64#58 AS int64 [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[int32#57, int64#58] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-SetOperationScan - | | +-column_list=$union_all.[int32#37, int64#38] - | | +-op_type=UNION_ALL - | | +-input_item_list= - | | +-SetOperationItem - | | | +-scan= - | | | | +-ProjectScan - | | | | +-column_list=SimpleTypes.[int32#1, int64#2] - | | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=SimpleTypes.[int32#1, int64#2, float#8] - | | | | +-input_scan= - | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, float#8], table=SimpleTypes, column_index_list=[0, 1, 7]) - | | | +-output_column_list=SimpleTypes.[int32#1, int64#2] - | | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[int32#19, int64#20] - | | | +-input_scan= - | | | +-ProjectScan - | | | 
+-column_list=SimpleTypes.[int64#20, int32#19]
- | | | +-input_scan=
- | | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1])
- | | +-output_column_list=SimpleTypes.[int32#19, int64#20]
- | +-output_column_list=$union_all.[int32#37, int64#38]
- +-SetOperationItem
- +-scan=
- | +-ProjectScan
- | +-column_list=[SimpleTypes.int32#39, $union_all2_cast.int32#59]
- | +-expr_list=
- | | +-int32#59 :=
- | | +-Cast(INT32 -> INT64)
- | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#39)
- | +-input_scan=
- | +-ProjectScan
- | +-column_list=SimpleTypes.[int32#39, int32#39]
- | +-input_scan=
- | +-TableScan(column_list=[SimpleTypes.int32#39], table=SimpleTypes, column_index_list=[0])
- +-output_column_list=[SimpleTypes.int32#39, $union_all2_cast.int32#59]
-==
+==
 # CORRESPONDING: Set operation with CORRESPONDING as an input item for another
 # set operation: CORRESPONDING in CORRESPONDING.
@@ -1243,86 +919,52 @@ QueryStmt
 +-input_item_list=
 | +-SetOperationItem
 | | +-scan=
- | | | +-SetOperationScan
- | | | +-column_list=$union_all.[int32#37, int64#38]
- | | | +-op_type=UNION_ALL
- | | | +-input_item_list=
- | | | | +-SetOperationItem
- | | | | | +-scan=
- | | | | | | +-ProjectScan
- | | | | | | +-column_list=SimpleTypes.[int32#1, int64#2, float#8]
- | | | | | | +-input_scan=
- | | | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, float#8], table=SimpleTypes, column_index_list=[0, 1, 7])
- | | | | | +-output_column_list=SimpleTypes.[int32#1, int64#2]
- | | | | +-SetOperationItem
- | | | | +-scan=
- | | | | | +-ProjectScan
- | | | | | +-column_list=SimpleTypes.[int64#20, int32#19]
- | | | | | +-input_scan=
- | | | | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1])
- | | | | +-output_column_list=SimpleTypes.[int32#19, int64#20]
- | | | +-column_match_mode=CORRESPONDING
- | | | +-column_propagation_mode=INNER
+ | | | +-ProjectScan
+ | | | 
+-column_list=[$union_all.int32#37] + | | | +-node_source="resolver_set_operation_corresponding" + | | | +-input_scan= + | | | +-SetOperationScan + | | | +-column_list=$union_all.[int32#37, int64#38] + | | | +-op_type=UNION_ALL + | | | +-input_item_list= + | | | | +-SetOperationItem + | | | | | +-scan= + | | | | | | +-ProjectScan + | | | | | | +-column_list=SimpleTypes.[int32#1, int64#2] + | | | | | | +-node_source="resolver_set_operation_corresponding" + | | | | | | +-input_scan= + | | | | | | +-ProjectScan + | | | | | | +-column_list=SimpleTypes.[int32#1, int64#2, float#8] + | | | | | | +-input_scan= + | | | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, float#8], table=SimpleTypes, column_index_list=[0, 1, 7]) + | | | | | +-output_column_list=SimpleTypes.[int32#1, int64#2] + | | | | +-SetOperationItem + | | | | +-scan= + | | | | | +-ProjectScan + | | | | | +-column_list=SimpleTypes.[int32#19, int64#20] + | | | | | +-node_source="resolver_set_operation_corresponding" + | | | | | +-input_scan= + | | | | | +-ProjectScan + | | | | | +-column_list=SimpleTypes.[int64#20, int32#19] + | | | | | +-input_scan= + | | | | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) + | | | | +-output_column_list=SimpleTypes.[int32#19, int64#20] + | | | +-column_match_mode=CORRESPONDING + | | | +-column_propagation_mode=INNER | | +-output_column_list=[$union_all.int32#37] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[int32#39, float#46] + | | +-column_list=[SimpleTypes.int32#39] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#39, float#46], table=SimpleTypes, column_index_list=[0, 7]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[int32#39, float#46] + | | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[int32#39, float#46], table=SimpleTypes, column_index_list=[0, 7]) | 
+-output_column_list=[SimpleTypes.int32#39] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.int32#57 AS int32 [INT32] -+-query= - +-SetOperationScan - +-column_list=[$union_all.int32#57] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all.int32#37] - | | +-input_scan= - | | +-SetOperationScan - | | +-column_list=$union_all.[int32#37, int64#38] - | | +-op_type=UNION_ALL - | | +-input_item_list= - | | +-SetOperationItem - | | | +-scan= - | | | | +-ProjectScan - | | | | +-column_list=SimpleTypes.[int32#1, int64#2] - | | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=SimpleTypes.[int32#1, int64#2, float#8] - | | | | +-input_scan= - | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, float#8], table=SimpleTypes, column_index_list=[0, 1, 7]) - | | | +-output_column_list=SimpleTypes.[int32#1, int64#2] - | | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[int32#19, int64#20] - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[int64#20, int32#19] - | | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) - | | +-output_column_list=SimpleTypes.[int32#19, int64#20] - | +-output_column_list=[$union_all.int32#37] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[SimpleTypes.int32#39] - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#39, float#46] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#39, float#46], table=SimpleTypes, column_index_list=[0, 7]) - +-output_column_list=[SimpleTypes.int32#39] == # CORRESPONDING: Set operation as an input item for another set operation: @@ -1346,44 +988,48 @@ QueryStmt +-input_item_list= | +-SetOperationItem | | +-scan= - | | | +-SetOperationScan - | | | 
+-column_list=$union_all.[int32#37, int64#38, float#39] - | | | +-op_type=UNION_ALL - | | | +-input_item_list= - | | | +-SetOperationItem - | | | | +-scan= - | | | | | +-ProjectScan - | | | | | +-column_list=[$union_all1_cast.int32#40, SimpleTypes.int64#2, $union_all1_cast.float#41] - | | | | | +-expr_list= - | | | | | | +-int32#40 := - | | | | | | | +-Cast(INT32 -> INT64) - | | | | | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) - | | | | | | +-float#41 := - | | | | | | +-Cast(FLOAT -> DOUBLE) - | | | | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#8) - | | | | | +-input_scan= - | | | | | +-ProjectScan - | | | | | +-column_list=SimpleTypes.[int32#1, int64#2, float#8] - | | | | | +-input_scan= - | | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, float#8], table=SimpleTypes, column_index_list=[0, 1, 7]) - | | | | +-output_column_list=[$union_all1_cast.int32#40, SimpleTypes.int64#2, $union_all1_cast.float#41] - | | | +-SetOperationItem - | | | +-scan= - | | | | +-ProjectScan - | | | | +-column_list=[SimpleTypes.int64#20, $union_all2_cast.int32#42, $union_all2_cast.int64#43] - | | | | +-expr_list= - | | | | | +-int32#42 := - | | | | | | +-Cast(INT32 -> INT64) - | | | | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) - | | | | | +-int64#43 := - | | | | | +-Cast(INT64 -> DOUBLE) - | | | | | +-ColumnRef(type=INT64, column=SimpleTypes.int64#20) - | | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=SimpleTypes.[int64#20, int32#19, int64#20] - | | | | +-input_scan= - | | | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) - | | | +-output_column_list=[SimpleTypes.int64#20, $union_all2_cast.int32#42, $union_all2_cast.int64#43] + | | | +-ProjectScan + | | | +-column_list=$union_all.[int32#37, float#39] + | | | +-node_source="resolver_set_operation_corresponding" + | | | +-input_scan= + | | | +-SetOperationScan + | | | +-column_list=$union_all.[int32#37, int64#38, 
float#39] + | | | +-op_type=UNION_ALL + | | | +-input_item_list= + | | | +-SetOperationItem + | | | | +-scan= + | | | | | +-ProjectScan + | | | | | +-column_list=[$union_all1_cast.int32#40, SimpleTypes.int64#2, $union_all1_cast.float#41] + | | | | | +-expr_list= + | | | | | | +-int32#40 := + | | | | | | | +-Cast(INT32 -> INT64) + | | | | | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) + | | | | | | +-float#41 := + | | | | | | +-Cast(FLOAT -> DOUBLE) + | | | | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#8) + | | | | | +-input_scan= + | | | | | +-ProjectScan + | | | | | +-column_list=SimpleTypes.[int32#1, int64#2, float#8] + | | | | | +-input_scan= + | | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, float#8], table=SimpleTypes, column_index_list=[0, 1, 7]) + | | | | +-output_column_list=[$union_all1_cast.int32#40, SimpleTypes.int64#2, $union_all1_cast.float#41] + | | | +-SetOperationItem + | | | +-scan= + | | | | +-ProjectScan + | | | | +-column_list=[SimpleTypes.int64#20, $union_all2_cast.int32#42, $union_all2_cast.int64#43] + | | | | +-expr_list= + | | | | | +-int32#42 := + | | | | | | +-Cast(INT32 -> INT64) + | | | | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) + | | | | | +-int64#43 := + | | | | | +-Cast(INT64 -> DOUBLE) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypes.int64#20) + | | | | +-input_scan= + | | | | +-ProjectScan + | | | | +-column_list=SimpleTypes.[int64#20, int32#19, int64#20] + | | | | +-input_scan= + | | | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) + | | | +-output_column_list=[SimpleTypes.int64#20, $union_all2_cast.int32#42, $union_all2_cast.int64#43] | | +-output_column_list=$union_all.[int32#37, float#39] | +-SetOperationItem | +-scan= @@ -1404,78 +1050,6 @@ QueryStmt | +-output_column_list=$union_all2_cast.[int32#64, float#65] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt 
-+-output_column_list= -| +-$union_all.int32#62 AS int32 [INT64] -| +-$union_all.float#63 AS float [DOUBLE] -+-query= - +-SetOperationScan - +-column_list=$union_all.[int32#62, float#63] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all.[int32#37, float#39] - | | +-input_scan= - | | +-SetOperationScan - | | +-column_list=$union_all.[int32#37, int64#38, float#39] - | | +-op_type=UNION_ALL - | | +-input_item_list= - | | +-SetOperationItem - | | | +-scan= - | | | | +-ProjectScan - | | | | +-column_list=[$union_all1_cast.int32#40, SimpleTypes.int64#2, $union_all1_cast.float#41] - | | | | +-expr_list= - | | | | | +-int32#40 := - | | | | | | +-Cast(INT32 -> INT64) - | | | | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) - | | | | | +-float#41 := - | | | | | +-Cast(FLOAT -> DOUBLE) - | | | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#8) - | | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=SimpleTypes.[int32#1, int64#2, float#8] - | | | | +-input_scan= - | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, float#8], table=SimpleTypes, column_index_list=[0, 1, 7]) - | | | +-output_column_list=[$union_all1_cast.int32#40, SimpleTypes.int64#2, $union_all1_cast.float#41] - | | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=[SimpleTypes.int64#20, $union_all2_cast.int32#42, $union_all2_cast.int64#43] - | | | +-expr_list= - | | | | +-int32#42 := - | | | | | +-Cast(INT32 -> INT64) - | | | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) - | | | | +-int64#43 := - | | | | +-Cast(INT64 -> DOUBLE) - | | | | +-ColumnRef(type=INT64, column=SimpleTypes.int64#20) - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[int64#20, int32#19, int64#20] - | | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) - | | 
+-output_column_list=[SimpleTypes.int64#20, $union_all2_cast.int32#42, $union_all2_cast.int64#43] - | +-output_column_list=$union_all.[int32#37, float#39] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$union_all2_cast.[int32#64, float#65] - | +-expr_list= - | | +-int32#64 := - | | | +-Cast(INT32 -> INT64) - | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#44) - | | +-float#65 := - | | +-Cast(FLOAT -> DOUBLE) - | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#51) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#44, float#51] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#44, float#51], table=SimpleTypes, column_index_list=[0, 7]) - +-output_column_list=$union_all2_cast.[int32#64, float#65] == # CORRESPONDING: The edge case of SELECT DISTINCT is handled correctly despite @@ -1499,78 +1073,41 @@ QueryStmt +-input_item_list= | +-SetOperationItem | | +-scan= - | | | +-AggregateScan - | | | +-column_list=$distinct.[a#3, b#4] + | | | +-ProjectScan + | | | +-column_list=$distinct.[a#3, b#4, a#3] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=$subquery1.[a#1, b#2] - | | | | +-expr_list= - | | | | | +-a#1 := Literal(type=INT64, value=1) - | | | | | +-b#2 := Literal(type=INT64, value=2) - | | | | +-input_scan= - | | | | +-SingleRowScan - | | | +-group_by_list= - | | | +-a#3 := ColumnRef(type=INT64, column=$subquery1.a#1) - | | | +-b#4 := ColumnRef(type=INT64, column=$subquery1.b#2) + | | | +-AggregateScan + | | | +-column_list=$distinct.[a#3, b#4] + | | | +-input_scan= + | | | | +-ProjectScan + | | | | +-column_list=$subquery1.[a#1, b#2] + | | | | +-expr_list= + | | | | | +-a#1 := Literal(type=INT64, value=1) + | | | | | +-b#2 := Literal(type=INT64, value=2) + | | | | +-input_scan= + | | | | +-SingleRowScan + | | | +-group_by_list= + | | | +-a#3 := ColumnRef(type=INT64, column=$subquery1.a#1) + | | | +-b#4 := 
ColumnRef(type=INT64, column=$subquery1.b#2) | | +-output_column_list=$distinct.[a#3, b#4, a#3] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$union_all2.[col3#5, col1#6, col2#7] - | | +-expr_list= - | | | +-col3#5 := Literal(type=INT64, value=1) - | | | +-col1#6 := Literal(type=INT64, value=1) - | | | +-col2#7 := Literal(type=INT64, value=2) + | | +-column_list=$union_all2.[col1#6, col2#7, col3#5] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$union_all2.[col3#5, col1#6, col2#7] + | | +-expr_list= + | | | +-col3#5 := Literal(type=INT64, value=1) + | | | +-col1#6 := Literal(type=INT64, value=1) + | | | +-col2#7 := Literal(type=INT64, value=2) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=$union_all2.[col1#6, col2#7, col3#5] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col1#8 AS col1 [INT64] -| +-$union_all.col2#9 AS col2 [INT64] -| +-$union_all.col3#10 AS col3 [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[col1#8, col2#9, col3#10] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$distinct.[a#3, b#4, a#3] - | | +-input_scan= - | | +-AggregateScan - | | +-column_list=$distinct.[a#3, b#4] - | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=$subquery1.[a#1, b#2] - | | | +-expr_list= - | | | | +-a#1 := Literal(type=INT64, value=1) - | | | | +-b#2 := Literal(type=INT64, value=2) - | | | +-input_scan= - | | | +-SingleRowScan - | | +-group_by_list= - | | +-a#3 := ColumnRef(type=INT64, column=$subquery1.a#1) - | | +-b#4 := ColumnRef(type=INT64, column=$subquery1.b#2) - | +-output_column_list=$distinct.[a#3, b#4, a#3] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$union_all2.[col1#6, col2#7, col3#5] - | +-input_scan= - | +-ProjectScan - 
| +-column_list=$union_all2.[col3#5, col1#6, col2#7] - | +-expr_list= - | | +-col3#5 := Literal(type=INT64, value=1) - | | +-col1#6 := Literal(type=INT64, value=1) - | | +-col2#7 := Literal(type=INT64, value=2) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=$union_all2.[col1#6, col2#7, col3#5] == # CORRESPONDING: Union two of the same proto and proto + string literal. @@ -1597,54 +1134,23 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=[TestTable.KitchenSink#6, $union_all2_cast.col#10] - | | +-expr_list= - | | | +-col#10 := Literal(type=PROTO, value={int64_key_1: 1 int64_key_2: 2}) + | | +-column_list=[$union_all2_cast.col#10, TestTable.KitchenSink#6] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= | | +-ProjectScan - | | +-column_list=[TestTable.KitchenSink#6, $union_all2.col#7] + | | +-column_list=[TestTable.KitchenSink#6, $union_all2_cast.col#10] | | +-expr_list= - | | | +-col#7 := Literal(type=STRING, value="int64_key_1: 1, int64_key_2: 2") + | | | +-col#10 := Literal(type=PROTO, value={int64_key_1: 1 int64_key_2: 2}) | | +-input_scan= - | | +-TableScan(column_list=[TestTable.KitchenSink#6], table=TestTable, column_index_list=[2]) + | | +-ProjectScan + | | +-column_list=[TestTable.KitchenSink#6, $union_all2.col#7] + | | +-expr_list= + | | | +-col#7 := Literal(type=STRING, value="int64_key_1: 1, int64_key_2: 2") + | | +-input_scan= + | | +-TableScan(column_list=[TestTable.KitchenSink#6], table=TestTable, column_index_list=[2]) | +-output_column_list=[$union_all2_cast.col#10, TestTable.KitchenSink#6] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col#8 AS col [PROTO] -| +-$union_all.KitchenSink#9 AS KitchenSink [PROTO] -+-query= - +-SetOperationScan - +-column_list=$union_all.[col#8, KitchenSink#9] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | 
+-column_list=TestTable.[KitchenSink#3, KitchenSink#3] - | | +-input_scan= - | | +-TableScan(column_list=[TestTable.KitchenSink#3], table=TestTable, column_index_list=[2]) - | +-output_column_list=TestTable.[KitchenSink#3, KitchenSink#3] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2_cast.col#10, TestTable.KitchenSink#6] - | +-input_scan= - | +-ProjectScan - | +-column_list=[TestTable.KitchenSink#6, $union_all2_cast.col#10] - | +-expr_list= - | | +-col#10 := Literal(type=PROTO, value={int64_key_1: 1 int64_key_2: 2}) - | +-input_scan= - | +-ProjectScan - | +-column_list=[TestTable.KitchenSink#6, $union_all2.col#7] - | +-expr_list= - | | +-col#7 := Literal(type=STRING, value="int64_key_1: 1, int64_key_2: 2") - | +-input_scan= - | +-TableScan(column_list=[TestTable.KitchenSink#6], table=TestTable, column_index_list=[2]) - +-output_column_list=[$union_all2_cast.col#10, TestTable.KitchenSink#6] == # CORRESPONDING: Union two different protos. @@ -1708,65 +1214,23 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=[SimpleTypes.string#23, $union_all2_cast.int32#42, SimpleTypes.int64#20] - | | +-expr_list= - | | | +-int32#42 := - | | | +-Cast(INT32 -> INT64) - | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) + | | +-column_list=[SimpleTypes.int64#20, $union_all2_cast.int32#42, SimpleTypes.string#23] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[string#23, int32#19, int64#20] + | | +-column_list=[SimpleTypes.string#23, $union_all2_cast.int32#42, SimpleTypes.int64#20] + | | +-expr_list= + | | | +-int32#42 := + | | | +-Cast(INT32 -> INT64) + | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, string#23], table=SimpleTypes, column_index_list=[0, 1, 4], alias="s2") + | | +-ProjectScan + | | +-column_list=SimpleTypes.[string#23, int32#19, 
int64#20] + | | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, string#23], table=SimpleTypes, column_index_list=[0, 1, 4], alias="s2") | +-output_column_list=[SimpleTypes.int64#20, $union_all2_cast.int32#42, SimpleTypes.string#23] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col1#37 AS col1 [INT64] -| +-$union_all.col2#38 AS col2 [INT64] -| +-$union_all.col3#39 AS col3 [STRING] -+-query= - +-SetOperationScan - +-column_list=$union_all.[col1#37, col2#38, col3#39] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.int32#40, $union_all1_cast.uint32#41, SimpleTypes.string#5] - | | +-expr_list= - | | | +-int32#40 := - | | | | +-Cast(INT32 -> INT64) - | | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) - | | | +-uint32#41 := - | | | +-Cast(UINT32 -> INT64) - | | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#3) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int32#1, uint32#3, string#5] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, uint32#3, string#5], table=SimpleTypes, column_index_list=[0, 2, 4], alias="s1") - | +-output_column_list=[$union_all1_cast.int32#40, $union_all1_cast.uint32#41, SimpleTypes.string#5] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[SimpleTypes.int64#20, $union_all2_cast.int32#42, SimpleTypes.string#23] - | +-input_scan= - | +-ProjectScan - | +-column_list=[SimpleTypes.string#23, $union_all2_cast.int32#42, SimpleTypes.int64#20] - | +-expr_list= - | | +-int32#42 := - | | +-Cast(INT32 -> INT64) - | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[string#23, int32#19, int64#20] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, string#23], table=SimpleTypes, 
column_index_list=[0, 1, 4], alias="s2") - +-output_column_list=[SimpleTypes.int64#20, $union_all2_cast.int32#42, SimpleTypes.string#23] == # CORRESPONDING: Nested operations output column names are ordered according to the first query. @@ -1802,52 +1266,19 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$union_all2.[col2#3, col1#4] - | | +-expr_list= - | | | +-col2#3 := Literal(type=INT64, value=1) - | | | +-col1#4 := Literal(type=INT64, value=2) + | | +-column_list=$union_all2.[col1#4, col2#3] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$union_all2.[col2#3, col1#4] + | | +-expr_list= + | | | +-col2#3 := Literal(type=INT64, value=1) + | | | +-col1#4 := Literal(type=INT64, value=2) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=$union_all2.[col1#4, col2#3] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col1#5 AS col1 [INT64] -| +-$union_all.col2#6 AS col2 [INT64] -+-query= - +-ProjectScan - +-column_list=$union_all.[col1#5, col2#6] - +-input_scan= - +-SetOperationScan - +-column_list=$union_all.[col1#5, col2#6] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[col1#1, col2#2] - | | +-expr_list= - | | | +-col1#1 := Literal(type=INT64, value=1) - | | | +-col2#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all1.[col1#1, col2#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$union_all2.[col1#4, col2#3] - | +-input_scan= - | +-ProjectScan - | +-column_list=$union_all2.[col2#3, col1#4] - | +-expr_list= - | | +-col2#3 := Literal(type=INT64, value=1) - | | +-col1#4 := Literal(type=INT64, value=2) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=$union_all2.[col1#4, col2#3] 
== # CORRESPONDING: coercion with NULL. @@ -1886,37 +1317,6 @@ QueryStmt | +-output_column_list=[SimpleTypes.timestamp#34] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.timestamp#38 AS timestamp [TIMESTAMP] -+-query= - +-SetOperationScan - +-column_list=[$union_all.timestamp#38] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.timestamp#39] - | | +-expr_list= - | | | +-timestamp#39 := Literal(type=TIMESTAMP, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.timestamp#19] - | | +-expr_list= - | | | +-timestamp#19 := Literal(type=INT64, value=NULL) - | | +-input_scan= - | | +-TableScan(table=SimpleTypes) - | +-output_column_list=[$union_all1_cast.timestamp#39] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[SimpleTypes.timestamp#34] - | +-input_scan= - | +-TableScan(column_list=[SimpleTypes.timestamp#34], table=SimpleTypes, column_index_list=[14]) - +-output_column_list=[SimpleTypes.timestamp#34] == # CORRESPONDING: Union of two identical structs works. 
@@ -1995,76 +1395,6 @@ QueryStmt | +-output_column_list=[$union_all2.col#8] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col#9 AS col [STRUCT] -+-query= - +-SetOperationScan - +-column_list=[$union_all.col#9] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.col#4] - | | +-expr_list= - | | | +-col#4 := - | | | +-SubqueryExpr - | | | +-type=STRUCT - | | | +-subquery_type=SCALAR - | | | +-subquery= - | | | +-ProjectScan - | | | +-column_list=[$make_struct.$struct#3] - | | | +-expr_list= - | | | | +-$struct#3 := - | | | | +-MakeStruct - | | | | +-type=STRUCT - | | | | +-field_list= - | | | | +-ColumnRef(type=INT64, column=$expr_subquery.a#1) - | | | | +-ColumnRef(type=INT64, column=$expr_subquery.bbB#2) - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=$expr_subquery.[a#1, bbB#2] - | | | +-expr_list= - | | | | +-a#1 := Literal(type=INT64, value=1) - | | | | +-bbB#2 := Literal(type=INT64, value=2) - | | | +-input_scan= - | | | +-SingleRowScan - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1.col#4] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2.col#8] - | +-expr_list= - | | +-col#8 := - | | +-SubqueryExpr - | | +-type=STRUCT - | | +-subquery_type=SCALAR - | | +-subquery= - | | +-ProjectScan - | | +-column_list=[$make_struct.$struct#7] - | | +-expr_list= - | | | +-$struct#7 := - | | | +-MakeStruct - | | | +-type=STRUCT - | | | +-field_list= - | | | +-ColumnRef(type=INT64, column=$expr_subquery.a#5) - | | | +-ColumnRef(type=INT64, column=$expr_subquery.Bbb#6) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$expr_subquery.[a#5, Bbb#6] - | | +-expr_list= - | | | +-a#5 := Literal(type=INT64, value=3) - | | | +-Bbb#6 := Literal(type=INT64, value=4) - | | +-input_scan= - | | +-SingleRowScan - | +-input_scan= - | 
+-SingleRowScan - +-output_column_list=[$union_all2.col#8] == # CORRESPONDING: Union of two STRUCTs with different column/field names. @@ -2151,84 +1481,7 @@ QueryStmt | +-output_column_list=[$union_all2_cast.foo#10] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.foo#9 AS foo [STRUCT] -+-query= - +-SetOperationScan - +-column_list=[$union_all.foo#9] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.foo#4] - | | +-expr_list= - | | | +-foo#4 := - | | | +-SubqueryExpr - | | | +-type=STRUCT - | | | +-subquery_type=SCALAR - | | | +-subquery= - | | | +-ProjectScan - | | | +-column_list=[$make_struct.$struct#3] - | | | +-expr_list= - | | | | +-$struct#3 := - | | | | +-MakeStruct - | | | | +-type=STRUCT - | | | | +-field_list= - | | | | +-ColumnRef(type=INT64, column=$expr_subquery.a#1) - | | | | +-ColumnRef(type=INT64, column=$expr_subquery.b#2) - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=$expr_subquery.[a#1, b#2] - | | | +-expr_list= - | | | | +-a#1 := Literal(type=INT64, value=1) - | | | | +-b#2 := Literal(type=INT64, value=2) - | | | +-input_scan= - | | | +-SingleRowScan - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1.foo#4] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2_cast.foo#10] - | +-expr_list= - | | +-foo#10 := - | | +-Cast(STRUCT -> STRUCT) - | | +-ColumnRef(type=STRUCT, column=$union_all2.foo#8) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all2.foo#8] - | +-expr_list= - | | +-foo#8 := - | | +-SubqueryExpr - | | +-type=STRUCT - | | +-subquery_type=SCALAR - | | +-subquery= - | | +-ProjectScan - | | +-column_list=[$make_struct.$struct#7] - | | +-expr_list= - | | | +-$struct#7 := - | | | +-MakeStruct - | | | +-type=STRUCT - | | | +-field_list= - | | | +-ColumnRef(type=INT64, 
column=$expr_subquery.b#5) - | | | +-ColumnRef(type=INT64, column=$expr_subquery.c#6) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$expr_subquery.[b#5, c#6] - | | +-expr_list= - | | | +-b#5 := Literal(type=INT64, value=3) - | | | +-c#6 := Literal(type=INT64, value=4) - | | +-input_scan= - | | +-SingleRowScan - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all2_cast.foo#10] -== +== # CORRESPONDING: UNION of a STRUCT value table and non-value table not allowed. select AS STRUCT 1 a, 2 b @@ -2332,83 +1585,6 @@ QueryStmt | +-output_column_list=[$union_all2.col#8] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col#9 AS col [STRUCT] -+-query= - +-SetOperationScan - +-column_list=[$union_all.col#9] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.col#10] - | | +-expr_list= - | | | +-col#10 := - | | | +-Cast(STRUCT -> STRUCT) - | | | +-ColumnRef(type=STRUCT, column=$union_all1.col#4) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.col#4] - | | +-expr_list= - | | | +-col#4 := - | | | +-SubqueryExpr - | | | +-type=STRUCT - | | | +-subquery_type=SCALAR - | | | +-subquery= - | | | +-ProjectScan - | | | +-column_list=[$make_struct.$struct#3] - | | | +-expr_list= - | | | | +-$struct#3 := - | | | | +-MakeStruct - | | | | +-type=STRUCT - | | | | +-field_list= - | | | | +-ColumnRef(type=INT64, column=$expr_subquery.a#1) - | | | | +-ColumnRef(type=INT64, column=$expr_subquery.b#2) - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=$expr_subquery.[a#1, b#2] - | | | +-expr_list= - | | | | +-a#1 := Literal(type=INT64, value=1) - | | | | +-b#2 := Literal(type=INT64, value=2) - | | | +-input_scan= - | | | +-SingleRowScan - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1_cast.col#10] - +-SetOperationItem - +-scan= 
- | +-ProjectScan - | +-column_list=[$union_all2.col#8] - | +-expr_list= - | | +-col#8 := - | | +-SubqueryExpr - | | +-type=STRUCT - | | +-subquery_type=SCALAR - | | +-subquery= - | | +-ProjectScan - | | +-column_list=[$make_struct.$struct#7] - | | +-expr_list= - | | | +-$struct#7 := - | | | +-MakeStruct - | | | +-type=STRUCT - | | | +-field_list= - | | | +-ColumnRef(type=INT64, column=$expr_subquery.a#5) - | | | +-ColumnRef(type=DOUBLE, column=$expr_subquery.b#6) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$expr_subquery.[a#5, b#6] - | | +-expr_list= - | | | +-a#5 := Literal(type=INT64, value=3) - | | | +-b#6 := Literal(type=DOUBLE, value=4.5) - | | +-input_scan= - | | +-SingleRowScan - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all2.col#8] == # CORRESPONDING: Union of two non-coercible struct types. @@ -2504,83 +1680,6 @@ QueryStmt | +-output_column_list=[$union_all2_cast.col#10] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col#9 AS col [STRUCT] -+-query= - +-SetOperationScan - +-column_list=[$union_all.col#9] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.col#4] - | | +-expr_list= - | | | +-col#4 := - | | | +-SubqueryExpr - | | | +-type=STRUCT - | | | +-subquery_type=SCALAR - | | | +-subquery= - | | | +-ProjectScan - | | | +-column_list=[$make_struct.$struct#3] - | | | +-expr_list= - | | | | +-$struct#3 := - | | | | +-MakeStruct - | | | | +-type=STRUCT - | | | | +-field_list= - | | | | +-ColumnRef(type=INT64, column=$expr_subquery.$col1#1) - | | | | +-ColumnRef(type=INT64, column=$expr_subquery.$col2#2) - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=$expr_subquery.[$col1#1, $col2#2] - | | | +-expr_list= - | | | | +-$col1#1 := Literal(type=INT64, value=1) - | | | | +-$col2#2 := Literal(type=INT64, value=2) - | | | +-input_scan= - | | | 
+-SingleRowScan - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1.col#4] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2_cast.col#10] - | +-expr_list= - | | +-col#10 := - | | +-Cast(STRUCT -> STRUCT) - | | +-ColumnRef(type=STRUCT, column=$union_all2.col#8) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all2.col#8] - | +-expr_list= - | | +-col#8 := - | | +-SubqueryExpr - | | +-type=STRUCT - | | +-subquery_type=SCALAR - | | +-subquery= - | | +-ProjectScan - | | +-column_list=[$make_struct.$struct#7] - | | +-expr_list= - | | | +-$struct#7 := - | | | +-MakeStruct - | | | +-type=STRUCT - | | | +-field_list= - | | | +-ColumnRef(type=INT64, column=$expr_subquery.a#5) - | | | +-ColumnRef(type=INT64, column=$expr_subquery.b#6) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$expr_subquery.[a#5, b#6] - | | +-expr_list= - | | | +-a#5 := Literal(type=INT64, value=3) - | | | +-b#6 := Literal(type=INT64, value=4) - | | +-input_scan= - | | +-SingleRowScan - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all2_cast.col#10] == # CORRESPONDING: UNION of struct parameters. 
@@ -2633,51 +1732,6 @@ QueryStmt | +-output_column_list=[$union_all2_cast.col#5] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col#3 AS col [STRUCT] -+-query= - +-SetOperationScan - +-column_list=[$union_all.col#3] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.col#4] - | | +-expr_list= - | | | +-col#4 := Literal(type=STRUCT, value={1, "abc"}) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.col#1] - | | +-expr_list= - | | | +-col#1 := Literal(type=STRUCT, value={1, "abc"}) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1_cast.col#4] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2_cast.col#5] - | +-expr_list= - | | +-col#5 := - | | +-Cast(STRUCT -> STRUCT) - | | +-ColumnRef(type=STRUCT, column=$union_all2.col#2) - | +-input_scan= - | +-LimitOffsetScan - | +-column_list=[$union_all2.col#2] - | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all2.col#2] - | | +-expr_list= - | | | +-col#2 := Parameter(type=STRUCT, name="test_param_struct") - | | +-input_scan= - | | +-SingleRowScan - | +-limit= - | +-Literal(type=INT64, value=1) - +-output_column_list=[$union_all2_cast.col#5] == # CORRESPONDING: with a group by. 
@@ -2715,35 +1769,6 @@ QueryStmt | +-output_column_list=[$groupby.uint64#37] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.uint64#38 AS uint64 [UINT64] -+-query= - +-SetOperationScan - +-column_list=[$union_all.uint64#38] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.uint64#4] - | | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.uint64#4], table=SimpleTypes, column_index_list=[3]) - | +-output_column_list=[SimpleTypes.uint64#4] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$groupby.uint64#37] - | +-input_scan= - | +-AggregateScan - | +-column_list=[$groupby.uint64#37] - | +-input_scan= - | | +-TableScan(table=SimpleTypes) - | +-group_by_list= - | +-uint64#37 := Literal(type=UINT64, value=1, has_explicit_type=TRUE) - +-output_column_list=[$groupby.uint64#37] == # CORRESPONDING: Type coercing in multiple operations of columns from table scans. 
@@ -2808,58 +1833,6 @@ QueryStmt | +-output_column_list=[$union_all3_cast.float#58] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col#55 AS col [DOUBLE] -+-query= - +-SetOperationScan - +-column_list=[$union_all.col#55] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.uint32#56] - | | +-expr_list= - | | | +-uint32#56 := - | | | +-Cast(UINT32 -> DOUBLE) - | | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#3) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.uint32#3] - | | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.uint32#3], table=SimpleTypes, column_index_list=[2]) - | +-output_column_list=[$union_all1_cast.uint32#56] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all2_cast.int32#57] - | | +-expr_list= - | | | +-int32#57 := - | | | +-Cast(INT32 -> DOUBLE) - | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int32#19] - | | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int32#19], table=SimpleTypes, column_index_list=[0]) - | +-output_column_list=[$union_all2_cast.int32#57] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all3_cast.float#58] - | +-expr_list= - | | +-float#58 := - | | +-Cast(FLOAT -> DOUBLE) - | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#44) - | +-input_scan= - | +-ProjectScan - | +-column_list=[SimpleTypes.float#44] - | +-input_scan= - | +-TableScan(column_list=[SimpleTypes.float#44], table=SimpleTypes, column_index_list=[7]) - +-output_column_list=[$union_all3_cast.float#58] == # CORRESPONDING: nested set operations with type coercing. 
@@ -2953,86 +1926,6 @@ QueryStmt | +-output_column_list=[$intersect_all.col#58] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col#61 AS col [DOUBLE] -+-query= - +-SetOperationScan - +-column_list=[$union_all.col#61] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.uint32#62] - | | +-expr_list= - | | | +-uint32#62 := - | | | +-Cast(UINT32 -> DOUBLE) - | | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#3) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.uint32#3] - | | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.uint32#3], table=SimpleTypes, column_index_list=[2], alias="s1") - | +-output_column_list=[$union_all1_cast.uint32#62] - +-SetOperationItem - +-scan= - | +-SetOperationScan - | +-column_list=[$intersect_all.col#58] - | +-op_type=INTERSECT_ALL - | +-input_item_list= - | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=[$intersect_all1_cast.int32#59] - | | | +-expr_list= - | | | | +-int32#59 := - | | | | +-Cast(INT32 -> DOUBLE) - | | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=[SimpleTypes.int32#19] - | | | +-input_scan= - | | | +-TableScan(column_list=[SimpleTypes.int32#19], table=SimpleTypes, column_index_list=[0], alias="s2") - | | +-output_column_list=[$intersect_all1_cast.int32#59] - | +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$intersect_all2_cast.col#60] - | | +-expr_list= - | | | +-col#60 := - | | | +-Cast(FLOAT -> DOUBLE) - | | | +-ColumnRef(type=FLOAT, column=$except_all.col#56) - | | +-input_scan= - | | +-SetOperationScan - | | +-column_list=[$except_all.col#56] - | | +-op_type=EXCEPT_ALL - | | +-input_item_list= - | | +-SetOperationItem - | | | +-scan= - | | | | +-ProjectScan - | | | | 
+-column_list=[SimpleTypes.float#44] - | | | | +-input_scan= - | | | | +-TableScan(column_list=[SimpleTypes.float#44], table=SimpleTypes, column_index_list=[7], alias="s3") - | | | +-output_column_list=[SimpleTypes.float#44] - | | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=[$except_all2_cast.col#57] - | | | +-expr_list= - | | | | +-col#57 := Literal(type=FLOAT, value=1) - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=[$except_all2.col#55] - | | | +-expr_list= - | | | | +-col#55 := Literal(type=INT64, value=1) - | | | +-input_scan= - | | | +-SingleRowScan - | | +-output_column_list=[$except_all2_cast.col#57] - | +-output_column_list=[$intersect_all2_cast.col#60] - +-output_column_list=[$intersect_all.col#58] == # CORRESPONDING: UNION ALL CORRESPONDING with anonymous STRUCT. @@ -3066,34 +1959,6 @@ QueryStmt | +-output_column_list=[$union_all2.col#2] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col#3 AS col [STRUCT] -+-query= - +-SetOperationScan - +-column_list=[$union_all.col#3] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.col#1] - | | +-expr_list= - | | | +-col#1 := Literal(type=STRUCT, value={1, 1}) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1.col#1] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2.col#2] - | +-expr_list= - | | +-col#2 := Literal(type=STRUCT, value={1, 1}) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all2.col#2] == # CORRESPONDING: UNION of arrays. 
@@ -3127,34 +1992,6 @@ QueryStmt | +-output_column_list=[$union_all2.col#2] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col#3 AS col [ARRAY] -+-query= - +-SetOperationScan - +-column_list=[$union_all.col#3] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.col#1] - | | +-expr_list= - | | | +-col#1 := Literal(type=ARRAY, value=[1, 1]) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1.col#1] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2.col#2] - | +-expr_list= - | | +-col#2 := Literal(type=ARRAY, value=[2, 3, 4]) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all2.col#2] == # CORRESPONDING: Subqueries with SELECT *. @@ -3205,47 +2042,6 @@ QueryStmt | +-output_column_list=[$subquery1.c1#3, $union_all2_cast.c2#7] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.c1#5 AS c1 [STRING] -| +-$union_all.c2#6 AS c2 [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[c1#5, c2#6] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[c1#1, c2#2] - | | +-expr_list= - | | | +-c1#1 := Literal(type=STRING, value="abc") - | | | +-c2#2 := Literal(type=INT64, value=1, has_explicit_type=TRUE) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all1.[c1#1, c2#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$subquery1.c1#3, $union_all2_cast.c2#7] - | +-expr_list= - | | +-c2#7 := - | | +-Cast(INT32 -> INT64) - | | +-ColumnRef(type=INT32, column=$subquery1.c2#4) - | +-input_scan= - | +-ProjectScan - | +-column_list=$subquery1.[c1#3, c2#4] - | +-input_scan= - | +-ProjectScan - | +-column_list=$subquery1.[c1#3, c2#4] - | 
+-expr_list= - | | +-c1#3 := Literal(type=STRING, value="def") - | | +-c2#4 := Literal(type=INT32, value=1, has_explicit_type=TRUE) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$subquery1.c1#3, $union_all2_cast.c2#7] == # CORRESPONDING: Set operation with hint. @@ -3294,48 +2090,9 @@ QueryStmt | +-output_column_list=[$union_all3.col#3] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col#4 AS col [INT64] -+-query= - +-SetOperationScan - +-column_list=[$union_all.col#4] - +-hint_list= - | +-key := Literal(type=INT64, value=5) - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.col#1] - | | +-expr_list= - | | | +-col#1 := Literal(type=INT64, value=1) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1.col#1] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all2.col#2] - | | +-expr_list= - | | | +-col#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all2.col#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all3.col#3] - | +-expr_list= - | | +-col#3 := Literal(type=INT64, value=3) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all3.col#3] == -[default language_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] +[default language_features=V_1_4_CORRESPONDING_FULL] # CORRESPONDING: Basic FULL mode: no common columns. 
SELECT 1 AS a FULL UNION ALL CORRESPONDING @@ -3382,47 +2139,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.a#6, $union_all2.b#2] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#3 AS a [INT64] -| +-$union_all.b#4 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#3, b#4] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.a#1, $null_column_for_outer_set_op.b#5] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-b#5 := Literal(type=INT64, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.a#1] - | | +-expr_list= - | | | +-a#1 := Literal(type=INT64, value=1) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1.a#1, $null_column_for_outer_set_op.b#5] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.a#6, $union_all2.b#2] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-a#6 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all2.b#2] - | +-expr_list= - | | +-b#2 := Literal(type=INT64, value=1) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$null_column_for_outer_set_op.a#6, $union_all2.b#2] == # CORRESPONDING: Basic FULL mode: first query misses columns.
@@ -3455,46 +2171,16 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[int32#19, int64#20] + | | +-column_list=SimpleTypes.[int64#20, int32#19] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[int32#19, int64#20] + | | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) | +-output_column_list=SimpleTypes.[int64#20, int32#19] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.int64#37 AS int64 [INT64] -| +-$union_all.int32#38 AS int32 [INT32] -+-query= - +-SetOperationScan - +-column_list=$union_all.[int64#37, int32#38] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int64#2, $null_column_for_outer_set_op.int32#39] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-int32#39 := Literal(type=INT32, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int64#2] - | | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#2], table=SimpleTypes, column_index_list=[1]) - | +-output_column_list=[SimpleTypes.int64#2, $null_column_for_outer_set_op.int32#39] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int64#20, int32#19] - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int64#20] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) - +-output_column_list=SimpleTypes.[int64#20, int32#19] == # CORRESPONDING: Basic FULL mode: second query misses columns. 
@@ -3533,37 +2219,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.int64#39, SimpleTypes.int32#19] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.int64#37 AS int64 [INT64] -| +-$union_all.int32#38 AS int32 [INT32] -+-query= - +-SetOperationScan - +-column_list=$union_all.[int64#37, int32#38] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | +-output_column_list=SimpleTypes.[int64#2, int32#1] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.int64#39, SimpleTypes.int32#19] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-int64#39 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[SimpleTypes.int32#19] - | +-input_scan= - | +-TableScan(column_list=[SimpleTypes.int32#19], table=SimpleTypes, column_index_list=[0]) - +-output_column_list=[$null_column_for_outer_set_op.int64#39, SimpleTypes.int32#19] == # CORRESPONDING: Basic FULL mode: both queries miss columns. 
@@ -3615,50 +2270,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.c#9, $union_all2.a#3, $union_all2.b#4] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.c#5 AS c [INT64] -| +-$union_all.a#6 AS a [INT64] -| +-$union_all.b#7 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[c#5, a#6, b#7] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.c#1, $union_all1.a#2, $null_column_for_outer_set_op.b#8] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-b#8 := Literal(type=INT64, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[c#1, a#2] - | | +-expr_list= - | | | +-c#1 := Literal(type=INT64, value=3) - | | | +-a#2 := Literal(type=INT64, value=1) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1.c#1, $union_all1.a#2, $null_column_for_outer_set_op.b#8] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.c#9, $union_all2.a#3, $union_all2.b#4] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-c#9 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=$union_all2.[a#3, b#4] - | +-expr_list= - | | +-a#3 := Literal(type=INT64, value=1) - | | +-b#4 := Literal(type=INT64, value=2) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$null_column_for_outer_set_op.c#9, $union_all2.a#3, $union_all2.b#4] == # CORRESPONDING: Basic FULL no queries miss columns. 
@@ -3685,40 +2296,16 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[int32#19, int64#20] + | | +-column_list=SimpleTypes.[int64#20, int32#19] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[int32#19, int64#20] + | | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) | +-output_column_list=SimpleTypes.[int64#20, int32#19] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_distinct.int64#37 AS int64 [INT64] -| +-$union_distinct.int32#38 AS int32 [INT32] -+-query= - +-SetOperationScan - +-column_list=$union_distinct.[int64#37, int32#38] - +-op_type=UNION_DISTINCT - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | +-output_column_list=SimpleTypes.[int64#2, int32#1] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int64#20, int32#19] - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int64#20] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) - +-output_column_list=SimpleTypes.[int64#20, int32#19] == # CORRESPONDING: A column has incompatible types. 
@@ -3791,57 +2378,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.int64#60, SimpleTypes.int32#37, SimpleTypes.uint64#40] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.int64#55 AS int64 [INT64] -| +-$union_all.int32#56 AS int32 [INT32] -| +-$union_all.uint64#57 AS uint64 [UINT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[int64#55, int32#56, uint64#57] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, $null_column_for_outer_set_op.uint64#58] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-uint64#58 := Literal(type=UINT64, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | +-output_column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, $null_column_for_outer_set_op.uint64#58] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$null_column_for_outer_set_op.int64#59, SimpleTypes.int32#19, SimpleTypes.uint64#22] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-int64#59 := Literal(type=INT64, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int32#19, uint64#22] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#19, uint64#22], table=SimpleTypes, column_index_list=[0, 3]) - | +-output_column_list=[$null_column_for_outer_set_op.int64#59, SimpleTypes.int32#19, SimpleTypes.uint64#22] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.int64#60, SimpleTypes.int32#37, SimpleTypes.uint64#40] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | 
+-int64#60 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#37, uint64#40] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#37, uint64#40], table=SimpleTypes, column_index_list=[0, 3]) - +-output_column_list=[$null_column_for_outer_set_op.int64#60, SimpleTypes.int32#37, SimpleTypes.uint64#40] == # CORRESPONDING: FULL mode multiple expressions: every query has a unique @@ -3907,61 +2443,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.int64#63, SimpleTypes.int32#37, $null_column_for_outer_set_op.uint64#64, SimpleTypes.float#44] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$intersect_all.int64#55 AS int64 [INT64] -| +-$intersect_all.int32#56 AS int32 [INT32] -| +-$intersect_all.uint64#57 AS uint64 [UINT64] -| +-$intersect_all.float#58 AS float [FLOAT] -+-query= - +-SetOperationScan - +-column_list=$intersect_all.[int64#55, int32#56, uint64#57, float#58] - +-op_type=INTERSECT_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, $null_column_for_outer_set_op.uint64#59, $null_column_for_outer_set_op.float#60] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-uint64#59 := Literal(type=UINT64, value=NULL) - | | | +-float#60 := Literal(type=FLOAT, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | +-output_column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, $null_column_for_outer_set_op.uint64#59, $null_column_for_outer_set_op.float#60] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$null_column_for_outer_set_op.int64#61, SimpleTypes.int32#19, SimpleTypes.uint64#22, 
$null_column_for_outer_set_op.float#62] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-int64#61 := Literal(type=INT64, value=NULL) - | | | +-float#62 := Literal(type=FLOAT, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int32#19, uint64#22] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#19, uint64#22], table=SimpleTypes, column_index_list=[0, 3]) - | +-output_column_list=[$null_column_for_outer_set_op.int64#61, SimpleTypes.int32#19, SimpleTypes.uint64#22, $null_column_for_outer_set_op.float#62] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.int64#63, SimpleTypes.int32#37, $null_column_for_outer_set_op.uint64#64, SimpleTypes.float#44] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-int64#63 := Literal(type=INT64, value=NULL) - | | +-uint64#64 := Literal(type=UINT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#37, float#44] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#37, float#44], table=SimpleTypes, column_index_list=[0, 7]) - +-output_column_list=[$null_column_for_outer_set_op.int64#63, SimpleTypes.int32#37, $null_column_for_outer_set_op.uint64#64, SimpleTypes.float#44] == # CORRESPONDING: FULL mode with multiple expressions: Type coercion for NULL @@ -4021,90 +2502,26 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$intersect_all3_cast.[uint32#60, float#61] - | | +-expr_list= - | | | +-uint32#60 := - | | | | +-Cast(UINT32 -> INT64) - | | | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#39) - | | | +-float#61 := - | | | +-Cast(FLOAT -> DOUBLE) - | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#44) + | | +-column_list=$intersect_all3_cast.[float#61, uint32#60] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= | | +-ProjectScan - | | 
+-column_list=SimpleTypes.[uint32#39, float#44] + | | +-column_list=$intersect_all3_cast.[uint32#60, float#61] + | | +-expr_list= + | | | +-uint32#60 := + | | | | +-Cast(UINT32 -> INT64) + | | | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#39) + | | | +-float#61 := + | | | +-Cast(FLOAT -> DOUBLE) + | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#44) | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[uint32#39, float#44], table=SimpleTypes, column_index_list=[2, 7]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[uint32#39, float#44] + | | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[uint32#39, float#44], table=SimpleTypes, column_index_list=[2, 7]) | +-output_column_list=$intersect_all3_cast.[float#61, uint32#60] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$intersect_all.a#55 AS a [DOUBLE] -| +-$intersect_all.b#56 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$intersect_all.[a#55, b#56] - +-op_type=INTERSECT_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$intersect_all1_cast.[int64#57, int32#58] - | | +-expr_list= - | | | +-int64#57 := - | | | | +-Cast(INT64 -> DOUBLE) - | | | | +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) - | | | +-int32#58 := - | | | +-Cast(INT32 -> INT64) - | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | +-output_column_list=$intersect_all1_cast.[int64#57, int32#58] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$null_column_for_outer_set_op.a#62, $intersect_all2_cast.int32#59] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-a#62 := Literal(type=DOUBLE, 
value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$intersect_all2_cast.int32#59] - | | +-expr_list= - | | | +-int32#59 := - | | | +-Cast(INT32 -> INT64) - | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int32#19] - | | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int32#19], table=SimpleTypes, column_index_list=[0]) - | +-output_column_list=[$null_column_for_outer_set_op.a#62, $intersect_all2_cast.int32#59] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$intersect_all3_cast.[float#61, uint32#60] - | +-input_scan= - | +-ProjectScan - | +-column_list=$intersect_all3_cast.[uint32#60, float#61] - | +-expr_list= - | | +-uint32#60 := - | | | +-Cast(UINT32 -> INT64) - | | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#39) - | | +-float#61 := - | | +-Cast(FLOAT -> DOUBLE) - | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#44) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[uint32#39, float#44] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[uint32#39, float#44], table=SimpleTypes, column_index_list=[2, 7]) - +-output_column_list=$intersect_all3_cast.[float#61, uint32#60] == # CORRESPONDING: FULL mode with duplicate columns of different names. 
@@ -4157,51 +2574,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.a#42, $union_all2_cast.int32#40, SimpleTypes.int32#19] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#37 AS a [INT64] -| +-$union_all.b#38 AS b [INT64] -| +-$union_all.c#39 AS c [INT32] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#37, b#38, c#39] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int64#2, SimpleTypes.int64#2, $null_column_for_outer_set_op.c#41] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-c#41 := Literal(type=INT32, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int64#2] - | | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#2], table=SimpleTypes, column_index_list=[1]) - | +-output_column_list=[SimpleTypes.int64#2, SimpleTypes.int64#2, $null_column_for_outer_set_op.c#41] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.a#42, $union_all2_cast.int32#40, SimpleTypes.int32#19] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-a#42 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all2_cast.int32#40, SimpleTypes.int32#19] - | +-expr_list= - | | +-int32#40 := - | | +-Cast(INT32 -> INT64) - | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int32#19] - | +-input_scan= - | +-TableScan(column_list=[SimpleTypes.int32#19], table=SimpleTypes, column_index_list=[0]) - +-output_column_list=[$null_column_for_outer_set_op.a#42, $union_all2_cast.int32#40, SimpleTypes.int32#19] == # CORRESPONDING: FULL mode with duplicate columns of different names. 
@@ -4260,57 +2632,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.a#45, $union_all2_cast.int32#42, $null_column_for_outer_set_op.c#46, $union_all2_cast.int32#43, SimpleTypes.int32#19] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#37 AS a [INT64] -| +-$union_all.b#38 AS b [INT64] -| +-$union_all.c#39 AS c [INT32] -| +-$union_all.d#40 AS d [INT64] -| +-$union_all.e#41 AS e [INT32] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#37, b#38, c#39, d#40, e#41] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int64#2, SimpleTypes.int64#2, SimpleTypes.int32#1, SimpleTypes.int64#2, $null_column_for_outer_set_op.e#44] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-e#44 := Literal(type=INT32, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int64#2, int32#1, int64#2] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | +-output_column_list=[SimpleTypes.int64#2, SimpleTypes.int64#2, SimpleTypes.int32#1, SimpleTypes.int64#2, $null_column_for_outer_set_op.e#44] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.a#45, $union_all2_cast.int32#42, $null_column_for_outer_set_op.c#46, $union_all2_cast.int32#43, SimpleTypes.int32#19] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-a#45 := Literal(type=INT64, value=NULL) - | | +-c#46 := Literal(type=INT32, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all2_cast.int32#42, $union_all2_cast.int32#43, SimpleTypes.int32#19] - | +-expr_list= - | | +-int32#42 := - | | | +-Cast(INT32 -> INT64) - | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) - | | +-int32#43 := - | 
| +-Cast(INT32 -> INT64) - | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int32#19, int32#19] - | +-input_scan= - | +-TableScan(column_list=[SimpleTypes.int32#19], table=SimpleTypes, column_index_list=[0]) - +-output_column_list=[$null_column_for_outer_set_op.a#45, $union_all2_cast.int32#42, $null_column_for_outer_set_op.c#46, $union_all2_cast.int32#43, SimpleTypes.int32#19] == # CORRESPONDING: nested queries. @@ -4399,82 +2720,6 @@ QueryStmt | +-output_column_list=[$intersect_all.int64#57, $null_column_for_outer_set_op.int32#71, $intersect_all.float#55, $intersect_all.double#56, $intersect_all.bool#58] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.int64#63 AS int64 [INT64] -| +-$union_all.int32#64 AS int32 [INT32] -| +-$union_all.float#65 AS float [DOUBLE] -| +-$union_all.double#66 AS double [DOUBLE] -| +-$union_all.bool#67 AS bool [BOOL] -+-query= - +-SetOperationScan - +-column_list=$union_all.[int64#63, int32#64, float#65, double#66, bool#67] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, $null_column_for_outer_set_op.float#68, $null_column_for_outer_set_op.double#69, $null_column_for_outer_set_op.bool#70] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-float#68 := Literal(type=DOUBLE, value=NULL) - | | | +-double#69 := Literal(type=DOUBLE, value=NULL) - | | | +-bool#70 := Literal(type=BOOL, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | +-output_column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, $null_column_for_outer_set_op.float#68, 
$null_column_for_outer_set_op.double#69, $null_column_for_outer_set_op.bool#70] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$intersect_all.int64#57, $null_column_for_outer_set_op.int32#71, $intersect_all.float#55, $intersect_all.double#56, $intersect_all.bool#58] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-int32#71 := Literal(type=INT32, value=NULL) - | +-input_scan= - | +-SetOperationScan - | +-column_list=$intersect_all.[float#55, double#56, int64#57, bool#58] - | +-op_type=INTERSECT_ALL - | +-input_item_list= - | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=[$intersect_all1_cast.float#59, SimpleTypes.double#27, SimpleTypes.int64#20, $null_column_for_outer_set_op.bool#60] - | | | +-node_source="resolver_set_operation_corresponding" - | | | +-expr_list= - | | | | +-bool#60 := Literal(type=BOOL, value=NULL) - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=[$intersect_all1_cast.float#59, SimpleTypes.double#27, SimpleTypes.int64#20] - | | | +-expr_list= - | | | | +-float#59 := - | | | | +-Cast(FLOAT -> DOUBLE) - | | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#26) - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[float#26, double#27, int64#20] - | | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[int64#20, float#26, double#27], table=SimpleTypes, column_index_list=[1, 7, 8]) - | | +-output_column_list=[$intersect_all1_cast.float#59, SimpleTypes.double#27, SimpleTypes.int64#20, $null_column_for_outer_set_op.bool#60] - | +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.double#45, $null_column_for_outer_set_op.double#61, $null_column_for_outer_set_op.int64#62, SimpleTypes.bool#43] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-double#61 := Literal(type=DOUBLE, value=NULL) - | | | +-int64#62 := Literal(type=INT64, value=NULL) - | | 
+-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[bool#43, double#45] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[bool#43, double#45], table=SimpleTypes, column_index_list=[6, 8]) - | +-output_column_list=[SimpleTypes.double#45, $null_column_for_outer_set_op.double#61, $null_column_for_outer_set_op.int64#62, SimpleTypes.bool#43] - +-output_column_list=[$intersect_all.int64#57, $null_column_for_outer_set_op.int32#71, $intersect_all.float#55, $intersect_all.double#56, $intersect_all.bool#58] == # CORRESPONDING: nested queries. @@ -4564,82 +2809,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.float#69, $null_column_for_outer_set_op.double#70, SimpleTypes.int64#46, $null_column_for_outer_set_op.bool#71, SimpleTypes.int32#45] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.float#63 AS float [DOUBLE] -| +-$union_all.double#64 AS double [DOUBLE] -| +-$union_all.int64#65 AS int64 [INT64] -| +-$union_all.bool#66 AS bool [BOOL] -| +-$union_all.int32#67 AS int32 [INT32] -+-query= - +-SetOperationScan - +-column_list=$union_all.[float#63, double#64, int64#65, bool#66, int32#67] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$intersect_all.float#37, $intersect_all.double#38, $intersect_all.int64#39, $intersect_all.bool#40, $null_column_for_outer_set_op.int32#68] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-int32#68 := Literal(type=INT32, value=NULL) - | | +-input_scan= - | | +-SetOperationScan - | | +-column_list=$intersect_all.[float#37, double#38, int64#39, bool#40] - | | +-op_type=INTERSECT_ALL - | | +-input_item_list= - | | +-SetOperationItem - | | | +-scan= - | | | | +-ProjectScan - | | | | +-column_list=[$intersect_all1_cast.float#41, SimpleTypes.double#9, SimpleTypes.int64#2, $null_column_for_outer_set_op.bool#42] - | | 
| | +-node_source="resolver_set_operation_corresponding" - | | | | +-expr_list= - | | | | | +-bool#42 := Literal(type=BOOL, value=NULL) - | | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=[$intersect_all1_cast.float#41, SimpleTypes.double#9, SimpleTypes.int64#2] - | | | | +-expr_list= - | | | | | +-float#41 := - | | | | | +-Cast(FLOAT -> DOUBLE) - | | | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#8) - | | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=SimpleTypes.[float#8, double#9, int64#2] - | | | | +-input_scan= - | | | | +-TableScan(column_list=SimpleTypes.[int64#2, float#8, double#9], table=SimpleTypes, column_index_list=[1, 7, 8]) - | | | +-output_column_list=[$intersect_all1_cast.float#41, SimpleTypes.double#9, SimpleTypes.int64#2, $null_column_for_outer_set_op.bool#42] - | | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=[SimpleTypes.double#27, $null_column_for_outer_set_op.double#43, $null_column_for_outer_set_op.int64#44, SimpleTypes.bool#25] - | | | +-node_source="resolver_set_operation_corresponding" - | | | +-expr_list= - | | | | +-double#43 := Literal(type=DOUBLE, value=NULL) - | | | | +-int64#44 := Literal(type=INT64, value=NULL) - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[bool#25, double#27] - | | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[bool#25, double#27], table=SimpleTypes, column_index_list=[6, 8]) - | | +-output_column_list=[SimpleTypes.double#27, $null_column_for_outer_set_op.double#43, $null_column_for_outer_set_op.int64#44, SimpleTypes.bool#25] - | +-output_column_list=[$intersect_all.float#37, $intersect_all.double#38, $intersect_all.int64#39, $intersect_all.bool#40, $null_column_for_outer_set_op.int32#68] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.float#69, $null_column_for_outer_set_op.double#70, SimpleTypes.int64#46, 
$null_column_for_outer_set_op.bool#71, SimpleTypes.int32#45] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-float#69 := Literal(type=DOUBLE, value=NULL) - | | +-double#70 := Literal(type=DOUBLE, value=NULL) - | | +-bool#71 := Literal(type=BOOL, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int64#46, int32#45] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#45, int64#46], table=SimpleTypes, column_index_list=[0, 1]) - +-output_column_list=[$null_column_for_outer_set_op.float#69, $null_column_for_outer_set_op.double#70, SimpleTypes.int64#46, $null_column_for_outer_set_op.bool#71, SimpleTypes.int32#45] == # CORRESPONDING: Basic LEFT mode: no common columns. @@ -4677,47 +2846,19 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$union_all2.[a#2, b#3] - | | +-expr_list= - | | | +-a#2 := Literal(type=INT64, value=1) - | | | +-b#3 := Literal(type=INT64, value=2) + | | +-column_list=[$union_all2.a#2] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$union_all2.[a#2, b#3] + | | +-expr_list= + | | | +-a#2 := Literal(type=INT64, value=1) + | | | +-b#3 := Literal(type=INT64, value=2) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=[$union_all2.a#2] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#4 AS a [INT64] -+-query= - +-SetOperationScan - +-column_list=[$union_all.a#4] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.a#1] - | | +-expr_list= - | | | +-a#1 := Literal(type=INT64, value=1) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1.a#1] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2.a#2] - | +-input_scan= - | +-ProjectScan 
- | +-column_list=$union_all2.[a#2, b#3] - | +-expr_list= - | | +-a#2 := Literal(type=INT64, value=1) - | | +-b#3 := Literal(type=INT64, value=2) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all2.a#2] == # CORRESPONDING: Basic LEFT mode: second query misses columns. @@ -4761,42 +2902,6 @@ QueryStmt | +-output_column_list=[$union_all2.a#3, $null_column_for_outer_set_op.b#6] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#4 AS a [INT64] -| +-$union_all.b#5 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#4, b#5] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[a#1, b#2] - | | +-expr_list= - | | | +-a#1 := Literal(type=INT64, value=1) - | | | +-b#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all1.[a#1, b#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2.a#3, $null_column_for_outer_set_op.b#6] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-b#6 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all2.a#3] - | +-expr_list= - | | +-a#3 := Literal(type=INT64, value=1) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all2.a#3, $null_column_for_outer_set_op.b#6] == # CORRESPONDING: Basic LEFT mode: second query misses columns. 
@@ -4840,42 +2945,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.a#6, $union_all2.b#3] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#4 AS a [INT64] -| +-$union_all.b#5 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#4, b#5] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[a#1, b#2] - | | +-expr_list= - | | | +-a#1 := Literal(type=INT64, value=1) - | | | +-b#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all1.[a#1, b#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.a#6, $union_all2.b#3] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-a#6 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all2.b#3] - | +-expr_list= - | | +-b#3 := Literal(type=INT64, value=2) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$null_column_for_outer_set_op.a#6, $union_all2.b#3] == # CORRESPONDING: Basic LEFT mode: both queries miss columns @@ -4920,50 +2989,13 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.a#7, $union_all2.b#3] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT +== -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#5 AS a [INT64] -| +-$union_all.b#6 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#5, b#6] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[a#1, b#2] - | | +-expr_list= - | | | +-a#1 := Literal(type=INT64, value=1) - | | | +-b#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all1.[a#1, b#2] - +-SetOperationItem - +-scan= - | 
+-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.a#7, $union_all2.b#3] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-a#7 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=$union_all2.[b#3, c#4] - | +-expr_list= - | | +-b#3 := Literal(type=INT64, value=2) - | | +-c#4 := Literal(type=INT64, value=3) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$null_column_for_outer_set_op.a#7, $union_all2.b#3] -== - -# CORRESPONDING: Basic LEFT mode: no queries miss columns. -SELECT 1 AS a, 2 AS b -LEFT UNION ALL CORRESPONDING -SELECT 2 AS b, 3 AS a --- +# CORRESPONDING: Basic LEFT mode: no queries miss columns. +SELECT 1 AS a, 2 AS b +LEFT UNION ALL CORRESPONDING +SELECT 2 AS b, 3 AS a +-- QueryStmt +-output_column_list= | +-$union_all.a#5 AS a [INT64] @@ -4986,49 +3018,19 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$union_all2.[b#3, a#4] - | | +-expr_list= - | | | +-b#3 := Literal(type=INT64, value=2) - | | | +-a#4 := Literal(type=INT64, value=3) + | | +-column_list=$union_all2.[a#4, b#3] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$union_all2.[b#3, a#4] + | | +-expr_list= + | | | +-b#3 := Literal(type=INT64, value=2) + | | | +-a#4 := Literal(type=INT64, value=3) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=$union_all2.[a#4, b#3] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#5 AS a [INT64] -| +-$union_all.b#6 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#5, b#6] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[a#1, b#2] - | | +-expr_list= - | | | +-a#1 := Literal(type=INT64, value=1) - | | | +-b#2 := Literal(type=INT64, value=2) - | 
| +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all1.[a#1, b#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$union_all2.[a#4, b#3] - | +-input_scan= - | +-ProjectScan - | +-column_list=$union_all2.[b#3, a#4] - | +-expr_list= - | | +-b#3 := Literal(type=INT64, value=2) - | | +-a#4 := Literal(type=INT64, value=3) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=$union_all2.[a#4, b#3] == # CORRESPONDING: LEFT mode multiple expressions. @@ -5091,59 +3093,6 @@ QueryStmt | +-output_column_list=[$union_all3.a#6, $null_column_for_outer_set_op.b#10] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#7 AS a [INT64] -| +-$union_all.b#8 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#7, b#8] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[a#1, b#2] - | | +-expr_list= - | | | +-a#1 := Literal(type=INT64, value=1) - | | | +-b#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all1.[a#1, b#2] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$null_column_for_outer_set_op.a#9, $union_all2.b#3] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-a#9 := Literal(type=INT64, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$union_all2.[b#3, c#4] - | | +-expr_list= - | | | +-b#3 := Literal(type=INT64, value=2) - | | | +-c#4 := Literal(type=INT64, value=3) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$null_column_for_outer_set_op.a#9, $union_all2.b#3] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all3.a#6, $null_column_for_outer_set_op.b#10] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-b#10 := Literal(type=INT64, 
value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=$union_all3.[c#5, a#6] - | +-expr_list= - | | +-c#5 := Literal(type=INT64, value=2) - | | +-a#6 := Literal(type=INT64, value=1) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all3.a#6, $null_column_for_outer_set_op.b#10] == # CORRESPONDING: LEFT mode multiple expressions no common columns. @@ -5230,59 +3179,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.a#10, $union_all3.b#6] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#7 AS a [INT64] -| +-$union_all.b#8 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#7, b#8] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[a#1, b#2] - | | +-expr_list= - | | | +-a#1 := Literal(type=INT64, value=1) - | | | +-b#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all1.[a#1, b#2] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$null_column_for_outer_set_op.a#9, $union_all2.b#3] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-a#9 := Literal(type=INT64, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$union_all2.[b#3, c#4] - | | +-expr_list= - | | | +-b#3 := Literal(type=INT64, value=1) - | | | +-c#4 := Literal(type=BOOL, value=false) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$null_column_for_outer_set_op.a#9, $union_all2.b#3] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.a#10, $union_all3.b#6] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-a#10 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=$union_all3.[c#5, b#6] - | +-expr_list= - | | 
+-c#5 := Literal(type=INT64, value=100) - | | +-b#6 := Literal(type=INT64, value=1) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$null_column_for_outer_set_op.a#10, $union_all3.b#6] == # CORRESPONDING: LEFT mode multiple expressions. @@ -5336,50 +3232,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.int64#58, SimpleTypes.int32#37] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$intersect_all.int64#55 AS int64 [INT64] -| +-$intersect_all.int32#56 AS int32 [INT32] -+-query= - +-SetOperationScan - +-column_list=$intersect_all.[int64#55, int32#56] - +-op_type=INTERSECT_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | +-output_column_list=SimpleTypes.[int64#2, int32#1] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$null_column_for_outer_set_op.int64#57, SimpleTypes.int32#19] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-int64#57 := Literal(type=INT64, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int32#19, uint64#22] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#19, uint64#22], table=SimpleTypes, column_index_list=[0, 3]) - | +-output_column_list=[$null_column_for_outer_set_op.int64#57, SimpleTypes.int32#19] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.int64#58, SimpleTypes.int32#37] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-int64#58 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#37, float#44] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#37, float#44], 
table=SimpleTypes, column_index_list=[0, 7]) - +-output_column_list=[$null_column_for_outer_set_op.int64#58, SimpleTypes.int32#37] == # CORRESPONDING: LEFT mode with multiple expressions: Type coercion for NULL @@ -5439,90 +3291,26 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$intersect_all3_cast.[uint32#60, float#61] - | | +-expr_list= - | | | +-uint32#60 := - | | | | +-Cast(UINT32 -> INT64) - | | | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#39) - | | | +-float#61 := - | | | +-Cast(FLOAT -> DOUBLE) - | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#44) + | | +-column_list=$intersect_all3_cast.[float#61, uint32#60] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[uint32#39, float#44] + | | +-column_list=$intersect_all3_cast.[uint32#60, float#61] + | | +-expr_list= + | | | +-uint32#60 := + | | | | +-Cast(UINT32 -> INT64) + | | | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#39) + | | | +-float#61 := + | | | +-Cast(FLOAT -> DOUBLE) + | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#44) | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[uint32#39, float#44], table=SimpleTypes, column_index_list=[2, 7]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[uint32#39, float#44] + | | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[uint32#39, float#44], table=SimpleTypes, column_index_list=[2, 7]) | +-output_column_list=$intersect_all3_cast.[float#61, uint32#60] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$intersect_all.a#55 AS a [DOUBLE] -| +-$intersect_all.b#56 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$intersect_all.[a#55, b#56] - +-op_type=INTERSECT_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$intersect_all1_cast.[int64#57, int32#58] - | | +-expr_list= - | | 
| +-int64#57 := - | | | | +-Cast(INT64 -> DOUBLE) - | | | | +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) - | | | +-int32#58 := - | | | +-Cast(INT32 -> INT64) - | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | +-output_column_list=$intersect_all1_cast.[int64#57, int32#58] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$null_column_for_outer_set_op.a#62, $intersect_all2_cast.int32#59] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-a#62 := Literal(type=DOUBLE, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$intersect_all2_cast.int32#59] - | | +-expr_list= - | | | +-int32#59 := - | | | +-Cast(INT32 -> INT64) - | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int32#19] - | | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int32#19], table=SimpleTypes, column_index_list=[0]) - | +-output_column_list=[$null_column_for_outer_set_op.a#62, $intersect_all2_cast.int32#59] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$intersect_all3_cast.[float#61, uint32#60] - | +-input_scan= - | +-ProjectScan - | +-column_list=$intersect_all3_cast.[uint32#60, float#61] - | +-expr_list= - | | +-uint32#60 := - | | | +-Cast(UINT32 -> INT64) - | | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#39) - | | +-float#61 := - | | +-Cast(FLOAT -> DOUBLE) - | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#44) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[uint32#39, float#44] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[uint32#39, float#44], table=SimpleTypes, column_index_list=[2, 7]) - 
+-output_column_list=$intersect_all3_cast.[float#61, uint32#60] == # CORRESPONDING: LEFT mode with multiple expressions: Type coercion does not @@ -5601,74 +3389,6 @@ QueryStmt | +-output_column_list=[$intersect_all3_cast.float#60, $null_column_for_outer_set_op.b#62] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$intersect_all.a#55 AS a [DOUBLE] -| +-$intersect_all.b#56 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$intersect_all.[a#55, b#56] - +-op_type=INTERSECT_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$intersect_all1_cast.[int64#57, int32#58] - | | +-expr_list= - | | | +-int64#57 := - | | | | +-Cast(INT64 -> DOUBLE) - | | | | +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) - | | | +-int32#58 := - | | | +-Cast(INT32 -> INT64) - | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | +-output_column_list=$intersect_all1_cast.[int64#57, int32#58] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$null_column_for_outer_set_op.a#61, $intersect_all2_cast.uint32#59] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-a#61 := Literal(type=DOUBLE, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$intersect_all2_cast.uint32#59, SimpleTypes.int32#19] - | | +-expr_list= - | | | +-uint32#59 := - | | | +-Cast(UINT32 -> INT64) - | | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#21) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[uint32#21, int32#19] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#19, uint32#21], table=SimpleTypes, column_index_list=[0, 2]) - | 
+-output_column_list=[$null_column_for_outer_set_op.a#61, $intersect_all2_cast.uint32#59] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$intersect_all3_cast.float#60, $null_column_for_outer_set_op.b#62] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-b#62 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[SimpleTypes.uint32#39, $intersect_all3_cast.float#60] - | +-expr_list= - | | +-float#60 := - | | +-Cast(FLOAT -> DOUBLE) - | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#44) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[uint32#39, float#44] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[uint32#39, float#44], table=SimpleTypes, column_index_list=[2, 7]) - +-output_column_list=[$intersect_all3_cast.float#60, $null_column_for_outer_set_op.b#62] == # CORRESPONDING: LEFT mode with duplicate columns of different names. @@ -5714,44 +3434,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.a#40, $union_all2_cast.int32#39] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#37 AS a [INT64] -| +-$union_all.b#38 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#37, b#38] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int64#2] - | | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#2], table=SimpleTypes, column_index_list=[1]) - | +-output_column_list=SimpleTypes.[int64#2, int64#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.a#40, $union_all2_cast.int32#39] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-a#40 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all2_cast.int32#39, 
SimpleTypes.int32#19] - | +-expr_list= - | | +-int32#39 := - | | +-Cast(INT32 -> INT64) - | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int32#19] - | +-input_scan= - | +-TableScan(column_list=[SimpleTypes.int32#19], table=SimpleTypes, column_index_list=[0]) - +-output_column_list=[$null_column_for_outer_set_op.a#40, $union_all2_cast.int32#39] == # CORRESPONDING: LEFT mode with duplicate columns of different names. @@ -5803,50 +3485,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.a#43, $union_all2_cast.int32#41, $null_column_for_outer_set_op.c#44, $union_all2_cast.int32#42] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#37 AS a [INT64] -| +-$union_all.b#38 AS b [INT64] -| +-$union_all.c#39 AS c [INT32] -| +-$union_all.d#40 AS d [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#37, b#38, c#39, d#40] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int64#2, int32#1, int64#2] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | +-output_column_list=SimpleTypes.[int64#2, int64#2, int32#1, int64#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.a#43, $union_all2_cast.int32#41, $null_column_for_outer_set_op.c#44, $union_all2_cast.int32#42] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-a#43 := Literal(type=INT64, value=NULL) - | | +-c#44 := Literal(type=INT32, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all2_cast.int32#41, $union_all2_cast.int32#42, SimpleTypes.int32#19] - | +-expr_list= - | | +-int32#41 := - | | | +-Cast(INT32 -> INT64) - | | | +-ColumnRef(type=INT32, 
column=SimpleTypes.int32#19) - | | +-int32#42 := - | | +-Cast(INT32 -> INT64) - | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int32#19, int32#19] - | +-input_scan= - | +-TableScan(column_list=[SimpleTypes.int32#19], table=SimpleTypes, column_index_list=[0]) - +-output_column_list=[$null_column_for_outer_set_op.a#43, $union_all2_cast.int32#41, $null_column_for_outer_set_op.c#44, $union_all2_cast.int32#42] == # CORRESPONDING: LEFT nested queries. @@ -5918,65 +3556,6 @@ QueryStmt | +-output_column_list=[$intersect_all.int64#57, $null_column_for_outer_set_op.int32#63] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.int64#61 AS int64 [INT64] -| +-$union_all.int32#62 AS int32 [INT32] -+-query= - +-SetOperationScan - +-column_list=$union_all.[int64#61, int32#62] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | +-output_column_list=SimpleTypes.[int64#2, int32#1] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$intersect_all.int64#57, $null_column_for_outer_set_op.int32#63] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-int32#63 := Literal(type=INT32, value=NULL) - | +-input_scan= - | +-SetOperationScan - | +-column_list=$intersect_all.[float#55, double#56, int64#57] - | +-op_type=INTERSECT_ALL - | +-input_item_list= - | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=[$intersect_all1_cast.float#58, SimpleTypes.double#27, SimpleTypes.int64#20] - | | | +-expr_list= - | | | | +-float#58 := - | | | | +-Cast(FLOAT -> DOUBLE) - | | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#26) - | | | 
+-input_scan= - | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[float#26, double#27, int64#20] - | | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[int64#20, float#26, double#27], table=SimpleTypes, column_index_list=[1, 7, 8]) - | | +-output_column_list=[$intersect_all1_cast.float#58, SimpleTypes.double#27, SimpleTypes.int64#20] - | +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.double#45, $null_column_for_outer_set_op.double#59, $null_column_for_outer_set_op.int64#60] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-double#59 := Literal(type=DOUBLE, value=NULL) - | | | +-int64#60 := Literal(type=INT64, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[bool#43, double#45] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[bool#43, double#45], table=SimpleTypes, column_index_list=[6, 8]) - | +-output_column_list=[SimpleTypes.double#45, $null_column_for_outer_set_op.double#59, $null_column_for_outer_set_op.int64#60] - +-output_column_list=[$intersect_all.int64#57, $null_column_for_outer_set_op.int32#63] == # CORRESPONDING: LEFT nested queries. 
@@ -6051,67 +3630,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.float#64, $null_column_for_outer_set_op.double#65, SimpleTypes.int64#44] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.float#61 AS float [DOUBLE] -| +-$union_all.double#62 AS double [DOUBLE] -| +-$union_all.int64#63 AS int64 [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[float#61, double#62, int64#63] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-SetOperationScan - | | +-column_list=$intersect_all.[float#37, double#38, int64#39] - | | +-op_type=INTERSECT_ALL - | | +-input_item_list= - | | +-SetOperationItem - | | | +-scan= - | | | | +-ProjectScan - | | | | +-column_list=[$intersect_all1_cast.float#40, SimpleTypes.double#9, SimpleTypes.int64#2] - | | | | +-expr_list= - | | | | | +-float#40 := - | | | | | +-Cast(FLOAT -> DOUBLE) - | | | | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#8) - | | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=SimpleTypes.[float#8, double#9, int64#2] - | | | | +-input_scan= - | | | | +-TableScan(column_list=SimpleTypes.[int64#2, float#8, double#9], table=SimpleTypes, column_index_list=[1, 7, 8]) - | | | +-output_column_list=[$intersect_all1_cast.float#40, SimpleTypes.double#9, SimpleTypes.int64#2] - | | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=[SimpleTypes.double#27, $null_column_for_outer_set_op.double#41, $null_column_for_outer_set_op.int64#42] - | | | +-node_source="resolver_set_operation_corresponding" - | | | +-expr_list= - | | | | +-double#41 := Literal(type=DOUBLE, value=NULL) - | | | | +-int64#42 := Literal(type=INT64, value=NULL) - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[bool#25, double#27] - | | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[bool#25, double#27], table=SimpleTypes, 
column_index_list=[6, 8]) - | | +-output_column_list=[SimpleTypes.double#27, $null_column_for_outer_set_op.double#41, $null_column_for_outer_set_op.int64#42] - | +-output_column_list=$intersect_all.[float#37, double#38, int64#39] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.float#64, $null_column_for_outer_set_op.double#65, SimpleTypes.int64#44] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-float#64 := Literal(type=DOUBLE, value=NULL) - | | +-double#65 := Literal(type=DOUBLE, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int64#44, int32#43] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#43, int64#44], table=SimpleTypes, column_index_list=[0, 1]) - +-output_column_list=[$null_column_for_outer_set_op.float#64, $null_column_for_outer_set_op.double#65, SimpleTypes.int64#44] == # CORRESPONDING: LEFT nested queries: no common columns. @@ -6147,17 +3665,21 @@ QueryStmt +-input_item_list= | +-SetOperationItem | | +-scan= - | | | +-AggregateScan - | | | +-column_list=[$distinct.a#2] + | | | +-ProjectScan + | | | +-column_list=$distinct.[a#2, a#2] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=[$subquery1.a#1] - | | | | +-expr_list= - | | | | | +-a#1 := Literal(type=INT64, value=1) - | | | | +-input_scan= - | | | | +-SingleRowScan - | | | +-group_by_list= - | | | +-a#2 := ColumnRef(type=INT64, column=$subquery1.a#1) + | | | +-AggregateScan + | | | +-column_list=[$distinct.a#2] + | | | +-input_scan= + | | | | +-ProjectScan + | | | | +-column_list=[$subquery1.a#1] + | | | | +-expr_list= + | | | | | +-a#1 := Literal(type=INT64, value=1) + | | | | +-input_scan= + | | | | +-SingleRowScan + | | | +-group_by_list= + | | | +-a#2 := ColumnRef(type=INT64, column=$subquery1.a#1) | | +-output_column_list=$distinct.[a#2, a#2] | +-SetOperationItem | +-scan= @@ -6171,55 
+3693,17 @@ QueryStmt | +-output_column_list=$union_all2.[col1#3, col2#4] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT +== -[REWRITTEN AST] +# Literal coercion is correctly handled: INNER mode. +SELECT CAST(1 AS INT32) AS A, NULL AS B +INTERSECT ALL CORRESPONDING +SELECT CAST("STRING_VAL" AS STRING) AS B, 100 AS A +-- QueryStmt +-output_column_list= -| +-$union_all.col1#5 AS col1 [INT64] -| +-$union_all.col2#6 AS col2 [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[col1#5, col2#6] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$distinct.[a#2, a#2] - | | +-input_scan= - | | +-AggregateScan - | | +-column_list=[$distinct.a#2] - | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=[$subquery1.a#1] - | | | +-expr_list= - | | | | +-a#1 := Literal(type=INT64, value=1) - | | | +-input_scan= - | | | +-SingleRowScan - | | +-group_by_list= - | | +-a#2 := ColumnRef(type=INT64, column=$subquery1.a#1) - | +-output_column_list=$distinct.[a#2, a#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$union_all2.[col1#3, col2#4] - | +-expr_list= - | | +-col1#3 := Literal(type=INT64, value=1) - | | +-col2#4 := Literal(type=INT64, value=2) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=$union_all2.[col1#3, col2#4] -== - -# Literal coercion is correctly handled: INNER mode. 
-SELECT CAST(1 AS INT32) AS A, NULL AS B -INTERSECT ALL CORRESPONDING -SELECT CAST("STRING_VAL" AS STRING) AS B, 100 AS A --- -QueryStmt -+-output_column_list= -| +-$intersect_all.A#5 AS A [INT32] -| +-$intersect_all.B#6 AS B [STRING] +| +-$intersect_all.A#5 AS A [INT32] +| +-$intersect_all.B#6 AS B [STRING] +-query= +-SetOperationScan +-column_list=$intersect_all.[A#5, B#6] @@ -6243,64 +3727,24 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=[$intersect_all2.B#3, $intersect_all2_cast.A#8] - | | +-expr_list= - | | | +-A#8 := Literal(type=INT32, value=100) + | | +-column_list=[$intersect_all2_cast.A#8, $intersect_all2.B#3] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= | | +-ProjectScan - | | +-column_list=$intersect_all2.[B#3, A#4] + | | +-column_list=[$intersect_all2.B#3, $intersect_all2_cast.A#8] | | +-expr_list= - | | | +-B#3 := Literal(type=STRING, value="STRING_VAL", has_explicit_type=TRUE) - | | | +-A#4 := Literal(type=INT64, value=100) + | | | +-A#8 := Literal(type=INT32, value=100) | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$intersect_all2.[B#3, A#4] + | | +-expr_list= + | | | +-B#3 := Literal(type=STRING, value="STRING_VAL", has_explicit_type=TRUE) + | | | +-A#4 := Literal(type=INT64, value=100) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=[$intersect_all2_cast.A#8, $intersect_all2.B#3] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$intersect_all.A#5 AS A [INT32] -| +-$intersect_all.B#6 AS B [STRING] -+-query= - +-SetOperationScan - +-column_list=$intersect_all.[A#5, B#6] - +-op_type=INTERSECT_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$intersect_all1.A#1, $intersect_all1_cast.B#7] - | | +-expr_list= - | | | +-B#7 := Literal(type=STRING, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | 
+-column_list=$intersect_all1.[A#1, B#2]
- | | +-expr_list=
- | | | +-A#1 := Literal(type=INT32, value=1, has_explicit_type=TRUE)
- | | | +-B#2 := Literal(type=INT64, value=NULL)
- | | +-input_scan=
- | | +-SingleRowScan
- | +-output_column_list=[$intersect_all1.A#1, $intersect_all1_cast.B#7]
- +-SetOperationItem
- +-scan=
- | +-ProjectScan
- | +-column_list=[$intersect_all2_cast.A#8, $intersect_all2.B#3]
- | +-input_scan=
- | +-ProjectScan
- | +-column_list=[$intersect_all2.B#3, $intersect_all2_cast.A#8]
- | +-expr_list=
- | | +-A#8 := Literal(type=INT32, value=100)
- | +-input_scan=
- | +-ProjectScan
- | +-column_list=$intersect_all2.[B#3, A#4]
- | +-expr_list=
- | | +-B#3 := Literal(type=STRING, value="STRING_VAL", has_explicit_type=TRUE)
- | | +-A#4 := Literal(type=INT64, value=100)
- | +-input_scan=
- | +-SingleRowScan
- +-output_column_list=[$intersect_all2_cast.A#8, $intersect_all2.B#3]
==

# Literal coercion is correctly handled when NULL columns are padded for FULL
@@ -6341,59 +3785,19 @@ QueryStmt
| +-SetOperationItem
| +-scan=
| | +-ProjectScan
- | | +-column_list=$union_all2.[B#2, A#3]
- | | +-expr_list=
- | | | +-B#2 := Literal(type=INT64, value=1)
- | | | +-A#3 := Literal(type=ARRAY>, value=NULL, has_explicit_type=TRUE)
+ | | +-column_list=$union_all2.[A#3, B#2]
+ | | +-node_source="resolver_set_operation_corresponding"
| | +-input_scan=
- | | +-SingleRowScan
+ | | +-ProjectScan
+ | | +-column_list=$union_all2.[B#2, A#3]
+ | | +-expr_list=
+ | | | +-B#2 := Literal(type=INT64, value=1)
+ | | | +-A#3 := Literal(type=ARRAY>, value=NULL, has_explicit_type=TRUE)
+ | | +-input_scan=
+ | | +-SingleRowScan
| +-output_column_list=$union_all2.[A#3, B#2]
+-column_match_mode=CORRESPONDING
+-column_propagation_mode=FULL
-
-[REWRITTEN AST]
-QueryStmt
-+-output_column_list=
-| +-$union_all.A#4 AS A [ARRAY>]
-| +-$union_all.B#5 AS B [INT64]
-+-query=
- +-SetOperationScan
- +-column_list=$union_all.[A#4, B#5]
- +-op_type=UNION_ALL
- +-input_item_list=
- 
+-SetOperationItem
- | +-scan=
- | | +-ProjectScan
- | | +-column_list=[$union_all1_cast.A#6, $null_column_for_outer_set_op.B#7]
- | | +-node_source="resolver_set_operation_corresponding"
- | | +-expr_list=
- | | | +-B#7 := Literal(type=INT64, value=NULL)
- | | +-input_scan=
- | | +-ProjectScan
- | | +-column_list=[$union_all1_cast.A#6]
- | | +-expr_list=
- | | | +-A#6 := Literal(type=ARRAY>, value=NULL)
- | | +-input_scan=
- | | +-ProjectScan
- | | +-column_list=[$union_all1.A#1]
- | | +-expr_list=
- | | | +-A#1 := Literal(type=INT64, value=NULL)
- | | +-input_scan=
- | | +-SingleRowScan
- | +-output_column_list=[$union_all1_cast.A#6, $null_column_for_outer_set_op.B#7]
- +-SetOperationItem
- +-scan=
- | +-ProjectScan
- | +-column_list=$union_all2.[A#3, B#2]
- | +-input_scan=
- | +-ProjectScan
- | +-column_list=$union_all2.[B#2, A#3]
- | +-expr_list=
- | | +-B#2 := Literal(type=INT64, value=1)
- | | +-A#3 := Literal(type=ARRAY>, value=NULL, has_explicit_type=TRUE)
- | +-input_scan=
- | +-SingleRowScan
- +-output_column_list=$union_all2.[A#3, B#2]
==

# Literal coercion is correctly handled when NULL columns are padded for FULL
@@ -6446,50 +3850,6 @@ QueryStmt
| +-output_column_list=[$union_all2.A#5, $union_all2.B#4, $null_column_for_outer_set_op.C#10]
+-column_match_mode=CORRESPONDING
+-column_propagation_mode=FULL
-
-[REWRITTEN AST]
-QueryStmt
-+-output_column_list=
-| +-$union_all.A#6 AS A [ARRAY>]
-| +-$union_all.B#7 AS B [INT64]
-| +-$union_all.C#8 AS C [INT64]
-+-query=
- +-SetOperationScan
- +-column_list=$union_all.[A#6, B#7, C#8]
- +-op_type=UNION_ALL
- +-input_item_list=
- +-SetOperationItem
- | +-scan=
- | | +-ProjectScan
- | | +-column_list=[$union_all1_cast.A#9, $union_all1.B#2, $union_all1.C#3]
- | | +-expr_list=
- | | | +-A#9 := Literal(type=ARRAY>, value=NULL)
- | | +-input_scan=
- | | +-ProjectScan
- | | +-column_list=$union_all1.[A#1, B#2, C#3]
- | | +-expr_list=
- | | | +-A#1 := Literal(type=INT64, value=NULL)
- | | | +-B#2 := Literal(type=INT64, 
value=1) - | | | +-C#3 := Literal(type=INT64, value=3) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1_cast.A#9, $union_all1.B#2, $union_all1.C#3] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2.A#5, $union_all2.B#4, $null_column_for_outer_set_op.C#10] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-C#10 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=$union_all2.[B#4, A#5] - | +-expr_list= - | | +-B#4 := Literal(type=INT64, value=1) - | | +-A#5 := Literal(type=ARRAY>, value=NULL, has_explicit_type=TRUE) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all2.A#5, $union_all2.B#4, $null_column_for_outer_set_op.C#10] == # Literal coercion is correctly handled when NULL columns are padded for FULL @@ -6539,47 +3899,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.A#7, $union_distinct2_cast.B#6] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_distinct.A#4 AS A [INT32] -| +-$union_distinct.B#5 AS B [STRING] -+-query= - +-SetOperationScan - +-column_list=$union_distinct.[A#4, B#5] - +-op_type=UNION_DISTINCT - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_distinct1.[A#1, B#2] - | | +-expr_list= - | | | +-A#1 := Literal(type=INT32, value=1, has_explicit_type=TRUE) - | | | +-B#2 := Literal(type=STRING, value="STRING_VAL") - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_distinct1.[A#1, B#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.A#7, $union_distinct2_cast.B#6] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-A#7 := Literal(type=INT32, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_distinct2_cast.B#6] - | +-expr_list= - | | 
+-B#6 := Literal(type=STRING, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_distinct2.B#3] - | +-expr_list= - | | +-B#3 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$null_column_for_outer_set_op.A#7, $union_distinct2_cast.B#6] == # Literal coercion is correctly handled when NULL columns are padded for FULL @@ -6617,70 +3936,26 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=[$union_distinct2_cast.B#9, $union_distinct2_cast.A#10, $union_distinct2.C#5] - | | +-expr_list= - | | | +-B#9 := Literal(type=STRING, value=NULL) - | | | +-A#10 := Literal(type=INT32, value=1) + | | +-column_list=[$union_distinct2_cast.A#10, $union_distinct2_cast.B#9, $union_distinct2.C#5] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= | | +-ProjectScan - | | +-column_list=$union_distinct2.[B#3, A#4, C#5] + | | +-column_list=[$union_distinct2_cast.B#9, $union_distinct2_cast.A#10, $union_distinct2.C#5] | | +-expr_list= - | | | +-B#3 := Literal(type=INT64, value=NULL) - | | | +-A#4 := Literal(type=INT64, value=1) - | | | +-C#5 := Literal(type=STRING, value="NO_MATCHING") + | | | +-B#9 := Literal(type=STRING, value=NULL) + | | | +-A#10 := Literal(type=INT32, value=1) | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$union_distinct2.[B#3, A#4, C#5] + | | +-expr_list= + | | | +-B#3 := Literal(type=INT64, value=NULL) + | | | +-A#4 := Literal(type=INT64, value=1) + | | | +-C#5 := Literal(type=STRING, value="NO_MATCHING") + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=[$union_distinct2_cast.A#10, $union_distinct2_cast.B#9, $union_distinct2.C#5] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_distinct.A#6 AS A [INT32] -| +-$union_distinct.B#7 AS B [STRING] -| +-$union_distinct.C#8 AS C [STRING] -+-query= - +-SetOperationScan - 
+-column_list=$union_distinct.[A#6, B#7, C#8] - +-op_type=UNION_DISTINCT - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_distinct1.A#1, $union_distinct1.B#2, $null_column_for_outer_set_op.C#11] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-C#11 := Literal(type=STRING, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$union_distinct1.[A#1, B#2] - | | +-expr_list= - | | | +-A#1 := Literal(type=INT32, value=1, has_explicit_type=TRUE) - | | | +-B#2 := Literal(type=STRING, value="STRING_VAL") - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_distinct1.A#1, $union_distinct1.B#2, $null_column_for_outer_set_op.C#11] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_distinct2_cast.A#10, $union_distinct2_cast.B#9, $union_distinct2.C#5] - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_distinct2_cast.B#9, $union_distinct2_cast.A#10, $union_distinct2.C#5] - | +-expr_list= - | | +-B#9 := Literal(type=STRING, value=NULL) - | | +-A#10 := Literal(type=INT32, value=1) - | +-input_scan= - | +-ProjectScan - | +-column_list=$union_distinct2.[B#3, A#4, C#5] - | +-expr_list= - | | +-B#3 := Literal(type=INT64, value=NULL) - | | +-A#4 := Literal(type=INT64, value=1) - | | +-C#5 := Literal(type=STRING, value="NO_MATCHING") - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_distinct2_cast.A#10, $union_distinct2_cast.B#9, $union_distinct2.C#5] == # Literal coercion is correctly handled when NULL columns are padded for LEFT @@ -6730,47 +4005,6 @@ QueryStmt | +-output_column_list=[$union_all2.A#3, $null_column_for_outer_set_op.B#7] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.A#4 AS A [INT32] -| +-$union_all.B#5 AS B [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[A#4, B#5] - 
+-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.A#6, $union_all1.B#2] - | | +-expr_list= - | | | +-A#6 := Literal(type=INT32, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[A#1, B#2] - | | +-expr_list= - | | | +-A#1 := Literal(type=INT64, value=NULL) - | | | +-B#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1_cast.A#6, $union_all1.B#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2.A#3, $null_column_for_outer_set_op.B#7] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-B#7 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all2.A#3] - | +-expr_list= - | | +-A#3 := Literal(type=INT32, value=1, has_explicit_type=TRUE) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all2.A#3, $null_column_for_outer_set_op.B#7] == # Literal coercion is correctly handled when NULL columns are padded for LEFT @@ -6820,47 +4054,6 @@ QueryStmt | +-output_column_list=[$union_all2_cast.A#6, $null_column_for_outer_set_op.B#7] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.A#4 AS A [INT32] -| +-$union_all.B#5 AS B [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[A#4, B#5] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[A#1, B#2] - | | +-expr_list= - | | | +-A#1 := Literal(type=INT32, value=1, has_explicit_type=TRUE) - | | | +-B#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all1.[A#1, B#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2_cast.A#6, $null_column_for_outer_set_op.B#7] - | 
+-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-B#7 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all2_cast.A#6] - | +-expr_list= - | | +-A#6 := Literal(type=INT32, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all2.A#3] - | +-expr_list= - | | +-A#3 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all2_cast.A#6, $null_column_for_outer_set_op.B#7] == # CORRESPONDING: Basic STRICT. @@ -6890,48 +4083,18 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$union_all2.[b#3, a#4] - | | +-expr_list= - | | | +-b#3 := Literal(type=INT64, value=2) - | | | +-a#4 := Literal(type=INT64, value=1) + | | +-column_list=$union_all2.[a#4, b#3] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$union_all2.[b#3, a#4] + | | +-expr_list= + | | | +-b#3 := Literal(type=INT64, value=2) + | | | +-a#4 := Literal(type=INT64, value=1) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=$union_all2.[a#4, b#3] +-column_match_mode=CORRESPONDING - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#5 AS a [INT64] -| +-$union_all.b#6 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#5, b#6] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[a#1, b#2] - | | +-expr_list= - | | | +-a#1 := Literal(type=INT64, value=1) - | | | +-b#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all1.[a#1, b#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$union_all2.[a#4, b#3] - | +-input_scan= - | +-ProjectScan - | +-column_list=$union_all2.[b#3, a#4] - | +-expr_list= - | | +-b#3 := Literal(type=INT64, value=2) - | | +-a#4 := 
Literal(type=INT64, value=1) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=$union_all2.[a#4, b#3] == # CORRESPONDING: Basic STRICT: column names must be identical. @@ -6993,12 +4156,16 @@ QueryStmt | +-SetOperationItem | | +-scan= | | | +-ProjectScan - | | | +-column_list=$union_all2.[B#3, a#4] - | | | +-expr_list= - | | | | +-B#3 := Literal(type=INT64, value=2) - | | | | +-a#4 := Literal(type=INT64, value=3) + | | | +-column_list=$union_all2.[a#4, B#3] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | +-SingleRowScan + | | | +-ProjectScan + | | | +-column_list=$union_all2.[B#3, a#4] + | | | +-expr_list= + | | | | +-B#3 := Literal(type=INT64, value=2) + | | | | +-a#4 := Literal(type=INT64, value=3) + | | | +-input_scan= + | | | +-SingleRowScan | | +-output_column_list=$union_all2.[a#4, B#3] | +-SetOperationItem | +-scan= @@ -7011,50 +4178,6 @@ QueryStmt | | +-SingleRowScan | +-output_column_list=$union_all3.[A#5, B#6] +-column_match_mode=CORRESPONDING - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#7 AS a [INT64] -| +-$union_all.b#8 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#7, b#8] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[a#1, b#2] - | | +-expr_list= - | | | +-a#1 := Literal(type=INT64, value=1) - | | | +-b#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all1.[a#1, b#2] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all2.[a#4, B#3] - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$union_all2.[B#3, a#4] - | | +-expr_list= - | | | +-B#3 := Literal(type=INT64, value=2) - | | | +-a#4 := Literal(type=INT64, value=3) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all2.[a#4, B#3] - +-SetOperationItem - +-scan= - | +-ProjectScan - | 
+-column_list=$union_all3.[A#5, B#6] - | +-expr_list= - | | +-A#5 := Literal(type=INT64, value=3) - | | +-B#6 := Literal(type=INT64, value=1) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=$union_all3.[A#5, B#6] == # CORRESPONDING: STRICT mode with multiple queries: column lists are not the same. @@ -7096,23 +4219,31 @@ QueryStmt | | | | +-SetOperationItem | | | | | +-scan= | | | | | | +-ProjectScan - | | | | | | +-column_list=$union_all1.[a#1, b#2, c#3] - | | | | | | +-expr_list= - | | | | | | | +-a#1 := Literal(type=INT64, value=1) - | | | | | | | +-b#2 := Literal(type=INT64, value=2) - | | | | | | | +-c#3 := Literal(type=INT64, value=3) + | | | | | | +-column_list=$union_all1.[b#2, c#3] + | | | | | | +-node_source="resolver_set_operation_corresponding" | | | | | | +-input_scan= - | | | | | | +-SingleRowScan + | | | | | | +-ProjectScan + | | | | | | +-column_list=$union_all1.[a#1, b#2, c#3] + | | | | | | +-expr_list= + | | | | | | | +-a#1 := Literal(type=INT64, value=1) + | | | | | | | +-b#2 := Literal(type=INT64, value=2) + | | | | | | | +-c#3 := Literal(type=INT64, value=3) + | | | | | | +-input_scan= + | | | | | | +-SingleRowScan | | | | | +-output_column_list=$union_all1.[b#2, c#3] | | | | +-SetOperationItem | | | | +-scan= | | | | | +-ProjectScan - | | | | | +-column_list=$union_all2.[c#4, b#5] - | | | | | +-expr_list= - | | | | | | +-c#4 := Literal(type=INT64, value=2) - | | | | | | +-b#5 := Literal(type=INT64, value=3) + | | | | | +-column_list=$union_all2.[b#5, c#4] + | | | | | +-node_source="resolver_set_operation_corresponding" | | | | | +-input_scan= - | | | | | +-SingleRowScan + | | | | | +-ProjectScan + | | | | | +-column_list=$union_all2.[c#4, b#5] + | | | | | +-expr_list= + | | | | | | +-c#4 := Literal(type=INT64, value=2) + | | | | | | +-b#5 := Literal(type=INT64, value=3) + | | | | | +-input_scan= + | | | | | +-SingleRowScan | | | | +-output_column_list=$union_all2.[b#5, c#4] | | | +-column_match_mode=CORRESPONDING | | | 
+-column_propagation_mode=INNER @@ -7128,79 +4259,24 @@ QueryStmt | | +-SingleRowScan | +-output_column_list=$except_distinct2.[B#8, c#9] +-column_match_mode=CORRESPONDING +== -[REWRITTEN AST] +# CORRESPONDING: nested set operations. +SELECT 1 AS B, 3 AS c +EXCEPT DISTINCT STRICT CORRESPONDING +( + SELECT 1 AS a, 2 AS b, 3 AS c + UNION ALL CORRESPONDING + SELECT 2 AS c, 3 AS b +) +-- QueryStmt +-output_column_list= -| +-$except_distinct.b#10 AS b [INT64] +| +-$except_distinct.B#10 AS B [INT64] | +-$except_distinct.c#11 AS c [INT64] +-query= +-SetOperationScan - +-column_list=$except_distinct.[b#10, c#11] - +-op_type=EXCEPT_DISTINCT - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-SetOperationScan - | | +-column_list=$union_all.[b#6, c#7] - | | +-op_type=UNION_ALL - | | +-input_item_list= - | | +-SetOperationItem - | | | +-scan= - | | | | +-ProjectScan - | | | | +-column_list=$union_all1.[b#2, c#3] - | | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=$union_all1.[a#1, b#2, c#3] - | | | | +-expr_list= - | | | | | +-a#1 := Literal(type=INT64, value=1) - | | | | | +-b#2 := Literal(type=INT64, value=2) - | | | | | +-c#3 := Literal(type=INT64, value=3) - | | | | +-input_scan= - | | | | +-SingleRowScan - | | | +-output_column_list=$union_all1.[b#2, c#3] - | | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=$union_all2.[b#5, c#4] - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=$union_all2.[c#4, b#5] - | | | +-expr_list= - | | | | +-c#4 := Literal(type=INT64, value=2) - | | | | +-b#5 := Literal(type=INT64, value=3) - | | | +-input_scan= - | | | +-SingleRowScan - | | +-output_column_list=$union_all2.[b#5, c#4] - | +-output_column_list=$union_all.[b#6, c#7] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$except_distinct2.[B#8, c#9] - | +-expr_list= - | | +-B#8 := Literal(type=INT64, value=1) - | | +-c#9 := Literal(type=INT64, value=3) - | +-input_scan= - | +-SingleRowScan - 
+-output_column_list=$except_distinct2.[B#8, c#9] -== - -# CORRESPONDING: nested set operations. -SELECT 1 AS B, 3 AS c -EXCEPT DISTINCT STRICT CORRESPONDING -( - SELECT 1 AS a, 2 AS b, 3 AS c - UNION ALL CORRESPONDING - SELECT 2 AS c, 3 AS b -) --- -QueryStmt -+-output_column_list= -| +-$except_distinct.B#10 AS B [INT64] -| +-$except_distinct.c#11 AS c [INT64] -+-query= - +-SetOperationScan - +-column_list=$except_distinct.[B#10, c#11] + +-column_list=$except_distinct.[B#10, c#11] +-op_type=EXCEPT_DISTINCT +-input_item_list= | +-SetOperationItem @@ -7222,83 +4298,36 @@ QueryStmt | | | +-SetOperationItem | | | | +-scan= | | | | | +-ProjectScan - | | | | | +-column_list=$union_all1.[a#3, b#4, c#5] - | | | | | +-expr_list= - | | | | | | +-a#3 := Literal(type=INT64, value=1) - | | | | | | +-b#4 := Literal(type=INT64, value=2) - | | | | | | +-c#5 := Literal(type=INT64, value=3) + | | | | | +-column_list=$union_all1.[b#4, c#5] + | | | | | +-node_source="resolver_set_operation_corresponding" | | | | | +-input_scan= - | | | | | +-SingleRowScan + | | | | | +-ProjectScan + | | | | | +-column_list=$union_all1.[a#3, b#4, c#5] + | | | | | +-expr_list= + | | | | | | +-a#3 := Literal(type=INT64, value=1) + | | | | | | +-b#4 := Literal(type=INT64, value=2) + | | | | | | +-c#5 := Literal(type=INT64, value=3) + | | | | | +-input_scan= + | | | | | +-SingleRowScan | | | | +-output_column_list=$union_all1.[b#4, c#5] | | | +-SetOperationItem | | | +-scan= | | | | +-ProjectScan - | | | | +-column_list=$union_all2.[c#6, b#7] - | | | | +-expr_list= - | | | | | +-c#6 := Literal(type=INT64, value=2) - | | | | | +-b#7 := Literal(type=INT64, value=3) + | | | | +-column_list=$union_all2.[b#7, c#6] + | | | | +-node_source="resolver_set_operation_corresponding" | | | | +-input_scan= - | | | | +-SingleRowScan + | | | | +-ProjectScan + | | | | +-column_list=$union_all2.[c#6, b#7] + | | | | +-expr_list= + | | | | | +-c#6 := Literal(type=INT64, value=2) + | | | | | +-b#7 := Literal(type=INT64, 
value=3) + | | | | +-input_scan= + | | | | +-SingleRowScan | | | +-output_column_list=$union_all2.[b#7, c#6] | | +-column_match_mode=CORRESPONDING | | +-column_propagation_mode=INNER | +-output_column_list=$union_all.[b#8, c#9] +-column_match_mode=CORRESPONDING - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$except_distinct.B#10 AS B [INT64] -| +-$except_distinct.c#11 AS c [INT64] -+-query= - +-SetOperationScan - +-column_list=$except_distinct.[B#10, c#11] - +-op_type=EXCEPT_DISTINCT - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$except_distinct1.[B#1, c#2] - | | +-expr_list= - | | | +-B#1 := Literal(type=INT64, value=1) - | | | +-c#2 := Literal(type=INT64, value=3) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$except_distinct1.[B#1, c#2] - +-SetOperationItem - +-scan= - | +-SetOperationScan - | +-column_list=$union_all.[b#8, c#9] - | +-op_type=UNION_ALL - | +-input_item_list= - | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=$union_all1.[b#4, c#5] - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=$union_all1.[a#3, b#4, c#5] - | | | +-expr_list= - | | | | +-a#3 := Literal(type=INT64, value=1) - | | | | +-b#4 := Literal(type=INT64, value=2) - | | | | +-c#5 := Literal(type=INT64, value=3) - | | | +-input_scan= - | | | +-SingleRowScan - | | +-output_column_list=$union_all1.[b#4, c#5] - | +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all2.[b#7, c#6] - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$union_all2.[c#6, b#7] - | | +-expr_list= - | | | +-c#6 := Literal(type=INT64, value=2) - | | | +-b#7 := Literal(type=INT64, value=3) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all2.[b#7, c#6] - +-output_column_list=$union_all.[b#8, c#9] == # CORRESPONDING: nested set operations: column lists are not the same. 
@@ -7356,23 +4385,31 @@ QueryStmt | | | | +-SetOperationItem | | | | | +-scan= | | | | | | +-ProjectScan - | | | | | | +-column_list=$intersect_all1.[A#1, B#2, C#3] - | | | | | | +-expr_list= - | | | | | | | +-A#1 := Literal(type=INT64, value=1) - | | | | | | | +-B#2 := Literal(type=INT64, value=2) - | | | | | | | +-C#3 := Literal(type=INT64, value=3) + | | | | | | +-column_list=$intersect_all1.[A#1, B#2] + | | | | | | +-node_source="resolver_set_operation_corresponding" | | | | | | +-input_scan= - | | | | | | +-SingleRowScan + | | | | | | +-ProjectScan + | | | | | | +-column_list=$intersect_all1.[A#1, B#2, C#3] + | | | | | | +-expr_list= + | | | | | | | +-A#1 := Literal(type=INT64, value=1) + | | | | | | | +-B#2 := Literal(type=INT64, value=2) + | | | | | | | +-C#3 := Literal(type=INT64, value=3) + | | | | | | +-input_scan= + | | | | | | +-SingleRowScan | | | | | +-output_column_list=$intersect_all1.[A#1, B#2] | | | | +-SetOperationItem | | | | +-scan= | | | | | +-ProjectScan - | | | | | +-column_list=$intersect_all2.[B#4, A#5] - | | | | | +-expr_list= - | | | | | | +-B#4 := Literal(type=INT64, value=2) - | | | | | | +-A#5 := Literal(type=INT64, value=1) + | | | | | +-column_list=$intersect_all2.[A#5, B#4] + | | | | | +-node_source="resolver_set_operation_corresponding" | | | | | +-input_scan= - | | | | | +-SingleRowScan + | | | | | +-ProjectScan + | | | | | +-column_list=$intersect_all2.[B#4, A#5] + | | | | | +-expr_list= + | | | | | | +-B#4 := Literal(type=INT64, value=2) + | | | | | | +-A#5 := Literal(type=INT64, value=1) + | | | | | +-input_scan= + | | | | | +-SingleRowScan | | | | +-output_column_list=$intersect_all2.[A#5, B#4] | | | +-column_match_mode=CORRESPONDING | | | +-column_propagation_mode=INNER @@ -7388,61 +4425,6 @@ QueryStmt | | +-SingleRowScan | +-output_column_list=$union_all2.[A#8, B#9] +-column_match_mode=CORRESPONDING - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.A#10 AS A [INT64] -| +-$union_all.B#11 AS B [INT64] 
-+-query= - +-SetOperationScan - +-column_list=$union_all.[A#10, B#11] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-SetOperationScan - | | +-column_list=$intersect_all.[A#6, B#7] - | | +-op_type=INTERSECT_ALL - | | +-input_item_list= - | | +-SetOperationItem - | | | +-scan= - | | | | +-ProjectScan - | | | | +-column_list=$intersect_all1.[A#1, B#2] - | | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=$intersect_all1.[A#1, B#2, C#3] - | | | | +-expr_list= - | | | | | +-A#1 := Literal(type=INT64, value=1) - | | | | | +-B#2 := Literal(type=INT64, value=2) - | | | | | +-C#3 := Literal(type=INT64, value=3) - | | | | +-input_scan= - | | | | +-SingleRowScan - | | | +-output_column_list=$intersect_all1.[A#1, B#2] - | | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=$intersect_all2.[A#5, B#4] - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=$intersect_all2.[B#4, A#5] - | | | +-expr_list= - | | | | +-B#4 := Literal(type=INT64, value=2) - | | | | +-A#5 := Literal(type=INT64, value=1) - | | | +-input_scan= - | | | +-SingleRowScan - | | +-output_column_list=$intersect_all2.[A#5, B#4] - | +-output_column_list=$intersect_all.[A#6, B#7] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$union_all2.[A#8, B#9] - | +-expr_list= - | | +-A#8 := Literal(type=INT64, value=1) - | | +-B#9 := Literal(type=INT64, value=2) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=$union_all2.[A#8, B#9] == # CORRESPONDING: INNER + STRICT @@ -7482,83 +4464,36 @@ QueryStmt | | | +-SetOperationItem | | | | +-scan= | | | | | +-ProjectScan - | | | | | +-column_list=$intersect_all1.[A#3, B#4, C#5] - | | | | | +-expr_list= - | | | | | | +-A#3 := Literal(type=INT64, value=1) - | | | | | | +-B#4 := Literal(type=INT64, value=2) - | | | | | | +-C#5 := Literal(type=INT64, value=3) + | | | | | +-column_list=$intersect_all1.[A#3, B#4] + | | | | | 
+-node_source="resolver_set_operation_corresponding" | | | | | +-input_scan= - | | | | | +-SingleRowScan + | | | | | +-ProjectScan + | | | | | +-column_list=$intersect_all1.[A#3, B#4, C#5] + | | | | | +-expr_list= + | | | | | | +-A#3 := Literal(type=INT64, value=1) + | | | | | | +-B#4 := Literal(type=INT64, value=2) + | | | | | | +-C#5 := Literal(type=INT64, value=3) + | | | | | +-input_scan= + | | | | | +-SingleRowScan | | | | +-output_column_list=$intersect_all1.[A#3, B#4] | | | +-SetOperationItem | | | +-scan= | | | | +-ProjectScan - | | | | +-column_list=$intersect_all2.[B#6, A#7] - | | | | +-expr_list= - | | | | | +-B#6 := Literal(type=INT64, value=2) - | | | | | +-A#7 := Literal(type=INT64, value=1) + | | | | +-column_list=$intersect_all2.[A#7, B#6] + | | | | +-node_source="resolver_set_operation_corresponding" | | | | +-input_scan= - | | | | +-SingleRowScan + | | | | +-ProjectScan + | | | | +-column_list=$intersect_all2.[B#6, A#7] + | | | | +-expr_list= + | | | | | +-B#6 := Literal(type=INT64, value=2) + | | | | | +-A#7 := Literal(type=INT64, value=1) + | | | | +-input_scan= + | | | | +-SingleRowScan | | | +-output_column_list=$intersect_all2.[A#7, B#6] | | +-column_match_mode=CORRESPONDING | | +-column_propagation_mode=INNER | +-output_column_list=$intersect_all.[A#8, B#9] +-column_match_mode=CORRESPONDING - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.A#10 AS A [INT64] -| +-$union_all.B#11 AS B [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[A#10, B#11] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[A#1, B#2] - | | +-expr_list= - | | | +-A#1 := Literal(type=INT64, value=1) - | | | +-B#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all1.[A#1, B#2] - +-SetOperationItem - +-scan= - | +-SetOperationScan - | +-column_list=$intersect_all.[A#8, B#9] - | +-op_type=INTERSECT_ALL - | 
+-input_item_list= - | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=$intersect_all1.[A#3, B#4] - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=$intersect_all1.[A#3, B#4, C#5] - | | | +-expr_list= - | | | | +-A#3 := Literal(type=INT64, value=1) - | | | | +-B#4 := Literal(type=INT64, value=2) - | | | | +-C#5 := Literal(type=INT64, value=3) - | | | +-input_scan= - | | | +-SingleRowScan - | | +-output_column_list=$intersect_all1.[A#3, B#4] - | +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$intersect_all2.[A#7, B#6] - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$intersect_all2.[B#6, A#7] - | | +-expr_list= - | | | +-B#6 := Literal(type=INT64, value=2) - | | | +-A#7 := Literal(type=INT64, value=1) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$intersect_all2.[A#7, B#6] - +-output_column_list=$intersect_all.[A#8, B#9] == # CORRESPONDING: FULL + STRICT @@ -7624,80 +4559,19 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$union_all2.[C#10, B#11, A#12] - | | +-expr_list= - | | | +-C#10 := Literal(type=INT64, value=3) - | | | +-B#11 := Literal(type=INT64, value=1) - | | | +-A#12 := Literal(type=INT64, value=2) + | | +-column_list=$union_all2.[A#12, C#10, B#11] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$union_all2.[C#10, B#11, A#12] + | | +-expr_list= + | | | +-C#10 := Literal(type=INT64, value=3) + | | | +-B#11 := Literal(type=INT64, value=1) + | | | +-A#12 := Literal(type=INT64, value=2) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=$union_all2.[A#12, C#10, B#11] +-column_match_mode=CORRESPONDING - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.A#13 AS A [INT64] -| +-$union_all.C#14 AS C [INT64] -| +-$union_all.B#15 AS B [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[A#13, C#14, 
B#15] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-SetOperationScan - | | +-column_list=$intersect_distinct.[A#4, C#5, B#6] - | | +-op_type=INTERSECT_DISTINCT - | | +-input_item_list= - | | +-SetOperationItem - | | | +-scan= - | | | | +-ProjectScan - | | | | +-column_list=[$intersect_distinct1.A#1, $intersect_distinct1.C#2, $null_column_for_outer_set_op.B#7] - | | | | +-node_source="resolver_set_operation_corresponding" - | | | | +-expr_list= - | | | | | +-B#7 := Literal(type=INT64, value=NULL) - | | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=$intersect_distinct1.[A#1, C#2] - | | | | +-expr_list= - | | | | | +-A#1 := Literal(type=INT64, value=1) - | | | | | +-C#2 := Literal(type=INT64, value=3) - | | | | +-input_scan= - | | | | +-SingleRowScan - | | | +-output_column_list=[$intersect_distinct1.A#1, $intersect_distinct1.C#2, $null_column_for_outer_set_op.B#7] - | | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=[$null_column_for_outer_set_op.A#8, $null_column_for_outer_set_op.C#9, $intersect_distinct2.B#3] - | | | +-node_source="resolver_set_operation_corresponding" - | | | +-expr_list= - | | | | +-A#8 := Literal(type=INT64, value=NULL) - | | | | +-C#9 := Literal(type=INT64, value=NULL) - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=[$intersect_distinct2.B#3] - | | | +-expr_list= - | | | | +-B#3 := Literal(type=INT64, value=2) - | | | +-input_scan= - | | | +-SingleRowScan - | | +-output_column_list=[$null_column_for_outer_set_op.A#8, $null_column_for_outer_set_op.C#9, $intersect_distinct2.B#3] - | +-output_column_list=$intersect_distinct.[A#4, C#5, B#6] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$union_all2.[A#12, C#10, B#11] - | +-input_scan= - | +-ProjectScan - | +-column_list=$union_all2.[C#10, B#11, A#12] - | +-expr_list= - | | +-C#10 := Literal(type=INT64, value=3) - | | +-B#11 := Literal(type=INT64, value=1) - | | +-A#12 := 
Literal(type=INT64, value=2) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=$union_all2.[A#12, C#10, B#11] == # CORRESPONDING: INNER + STRICT columns not identical. @@ -7746,111 +4620,50 @@ QueryStmt | | +-output_column_list=$union_all1.[C#1, B#2, A#3] | +-SetOperationItem | +-scan= - | | +-SetOperationScan - | | +-column_list=$intersect_distinct.[A#7, C#8, B#9] - | | +-op_type=INTERSECT_DISTINCT - | | +-input_item_list= - | | | +-SetOperationItem - | | | | +-scan= - | | | | | +-ProjectScan - | | | | | +-column_list=[$intersect_distinct1.A#4, $intersect_distinct1.C#5, $null_column_for_outer_set_op.B#10] - | | | | | +-node_source="resolver_set_operation_corresponding" - | | | | | +-expr_list= - | | | | | | +-B#10 := Literal(type=INT64, value=NULL) - | | | | | +-input_scan= - | | | | | +-ProjectScan - | | | | | +-column_list=$intersect_distinct1.[A#4, C#5] - | | | | | +-expr_list= - | | | | | | +-A#4 := Literal(type=INT64, value=1) - | | | | | | +-C#5 := Literal(type=INT64, value=3) - | | | | | +-input_scan= - | | | | | +-SingleRowScan - | | | | +-output_column_list=[$intersect_distinct1.A#4, $intersect_distinct1.C#5, $null_column_for_outer_set_op.B#10] - | | | +-SetOperationItem - | | | +-scan= - | | | | +-ProjectScan - | | | | +-column_list=[$null_column_for_outer_set_op.A#11, $null_column_for_outer_set_op.C#12, $intersect_distinct2.B#6] - | | | | +-node_source="resolver_set_operation_corresponding" - | | | | +-expr_list= - | | | | | +-A#11 := Literal(type=INT64, value=NULL) - | | | | | +-C#12 := Literal(type=INT64, value=NULL) - | | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=[$intersect_distinct2.B#6] - | | | | +-expr_list= - | | | | | +-B#6 := Literal(type=INT64, value=2) - | | | | +-input_scan= - | | | | +-SingleRowScan - | | | +-output_column_list=[$null_column_for_outer_set_op.A#11, $null_column_for_outer_set_op.C#12, $intersect_distinct2.B#6] - | | +-column_match_mode=CORRESPONDING - | | +-column_propagation_mode=FULL + | 
| +-ProjectScan + | | +-column_list=$intersect_distinct.[C#8, B#9, A#7] + | | +-node_source="resolver_set_operation_corresponding" + | | +-input_scan= + | | +-SetOperationScan + | | +-column_list=$intersect_distinct.[A#7, C#8, B#9] + | | +-op_type=INTERSECT_DISTINCT + | | +-input_item_list= + | | | +-SetOperationItem + | | | | +-scan= + | | | | | +-ProjectScan + | | | | | +-column_list=[$intersect_distinct1.A#4, $intersect_distinct1.C#5, $null_column_for_outer_set_op.B#10] + | | | | | +-node_source="resolver_set_operation_corresponding" + | | | | | +-expr_list= + | | | | | | +-B#10 := Literal(type=INT64, value=NULL) + | | | | | +-input_scan= + | | | | | +-ProjectScan + | | | | | +-column_list=$intersect_distinct1.[A#4, C#5] + | | | | | +-expr_list= + | | | | | | +-A#4 := Literal(type=INT64, value=1) + | | | | | | +-C#5 := Literal(type=INT64, value=3) + | | | | | +-input_scan= + | | | | | +-SingleRowScan + | | | | +-output_column_list=[$intersect_distinct1.A#4, $intersect_distinct1.C#5, $null_column_for_outer_set_op.B#10] + | | | +-SetOperationItem + | | | +-scan= + | | | | +-ProjectScan + | | | | +-column_list=[$null_column_for_outer_set_op.A#11, $null_column_for_outer_set_op.C#12, $intersect_distinct2.B#6] + | | | | +-node_source="resolver_set_operation_corresponding" + | | | | +-expr_list= + | | | | | +-A#11 := Literal(type=INT64, value=NULL) + | | | | | +-C#12 := Literal(type=INT64, value=NULL) + | | | | +-input_scan= + | | | | +-ProjectScan + | | | | +-column_list=[$intersect_distinct2.B#6] + | | | | +-expr_list= + | | | | | +-B#6 := Literal(type=INT64, value=2) + | | | | +-input_scan= + | | | | +-SingleRowScan + | | | +-output_column_list=[$null_column_for_outer_set_op.A#11, $null_column_for_outer_set_op.C#12, $intersect_distinct2.B#6] + | | +-column_match_mode=CORRESPONDING + | | +-column_propagation_mode=FULL | +-output_column_list=$intersect_distinct.[C#8, B#9, A#7] +-column_match_mode=CORRESPONDING - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| 
+-$union_all.C#13 AS C [INT64] -| +-$union_all.B#14 AS B [INT64] -| +-$union_all.A#15 AS A [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[C#13, B#14, A#15] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[C#1, B#2, A#3] - | | +-expr_list= - | | | +-C#1 := Literal(type=INT64, value=3) - | | | +-B#2 := Literal(type=INT64, value=1) - | | | +-A#3 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all1.[C#1, B#2, A#3] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$intersect_distinct.[C#8, B#9, A#7] - | +-input_scan= - | +-SetOperationScan - | +-column_list=$intersect_distinct.[A#7, C#8, B#9] - | +-op_type=INTERSECT_DISTINCT - | +-input_item_list= - | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=[$intersect_distinct1.A#4, $intersect_distinct1.C#5, $null_column_for_outer_set_op.B#10] - | | | +-node_source="resolver_set_operation_corresponding" - | | | +-expr_list= - | | | | +-B#10 := Literal(type=INT64, value=NULL) - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=$intersect_distinct1.[A#4, C#5] - | | | +-expr_list= - | | | | +-A#4 := Literal(type=INT64, value=1) - | | | | +-C#5 := Literal(type=INT64, value=3) - | | | +-input_scan= - | | | +-SingleRowScan - | | +-output_column_list=[$intersect_distinct1.A#4, $intersect_distinct1.C#5, $null_column_for_outer_set_op.B#10] - | +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$null_column_for_outer_set_op.A#11, $null_column_for_outer_set_op.C#12, $intersect_distinct2.B#6] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-A#11 := Literal(type=INT64, value=NULL) - | | | +-C#12 := Literal(type=INT64, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$intersect_distinct2.B#6] - | | +-expr_list= - | | | +-B#6 := 
Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$null_column_for_outer_set_op.A#11, $null_column_for_outer_set_op.C#12, $intersect_distinct2.B#6] - +-output_column_list=$intersect_distinct.[C#8, B#9, A#7] == # CORRESPONDING: FULL + STRICT, not identical columns. @@ -7923,71 +4736,18 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$intersect_distinct2.[B#8, A#9] - | | +-expr_list= - | | | +-B#8 := Literal(type=INT64, value=2) - | | | +-A#9 := Literal(type=INT64, value=3) + | | +-column_list=$intersect_distinct2.[A#9, B#8] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$intersect_distinct2.[B#8, A#9] + | | +-expr_list= + | | | +-B#8 := Literal(type=INT64, value=2) + | | | +-A#9 := Literal(type=INT64, value=3) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=$intersect_distinct2.[A#9, B#8] +-column_match_mode=CORRESPONDING - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$intersect_distinct.A#10 AS A [INT64] -| +-$intersect_distinct.B#11 AS B [INT64] -+-query= - +-SetOperationScan - +-column_list=$intersect_distinct.[A#10, B#11] - +-op_type=INTERSECT_DISTINCT - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-SetOperationScan - | | +-column_list=$except_all.[A#5, B#6] - | | +-op_type=EXCEPT_ALL - | | +-input_item_list= - | | +-SetOperationItem - | | | +-scan= - | | | | +-ProjectScan - | | | | +-column_list=$except_all1.[A#1, B#2] - | | | | +-expr_list= - | | | | | +-A#1 := Literal(type=INT64, value=1) - | | | | | +-B#2 := Literal(type=INT64, value=2) - | | | | +-input_scan= - | | | | +-SingleRowScan - | | | +-output_column_list=$except_all1.[A#1, B#2] - | | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=[$except_all2.A#3, $null_column_for_outer_set_op.B#7] - | | | +-node_source="resolver_set_operation_corresponding" - | | | 
+-expr_list= - | | | | +-B#7 := Literal(type=INT64, value=NULL) - | | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=$except_all2.[A#3, C#4] - | | | +-expr_list= - | | | | +-A#3 := Literal(type=INT64, value=2) - | | | | +-C#4 := Literal(type=INT64, value=3) - | | | +-input_scan= - | | | +-SingleRowScan - | | +-output_column_list=[$except_all2.A#3, $null_column_for_outer_set_op.B#7] - | +-output_column_list=$except_all.[A#5, B#6] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$intersect_distinct2.[A#9, B#8] - | +-input_scan= - | +-ProjectScan - | +-column_list=$intersect_distinct2.[B#8, A#9] - | +-expr_list= - | | +-B#8 := Literal(type=INT64, value=2) - | | +-A#9 := Literal(type=INT64, value=3) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=$intersect_distinct2.[A#9, B#8] == # CORRESPONDING: LEFT + STRICT @@ -8020,97 +4780,44 @@ QueryStmt | | +-output_column_list=$intersect_all1.[B#1, A#2] | +-SetOperationItem | +-scan= - | | +-SetOperationScan - | | +-column_list=$except_all.[A#7, B#8] - | | +-op_type=EXCEPT_ALL - | | +-input_item_list= - | | | +-SetOperationItem - | | | | +-scan= - | | | | | +-ProjectScan - | | | | | +-column_list=$except_all1.[A#3, B#4] - | | | | | +-expr_list= - | | | | | | +-A#3 := Literal(type=INT64, value=1) - | | | | | | +-B#4 := Literal(type=INT64, value=2) - | | | | | +-input_scan= - | | | | | +-SingleRowScan - | | | | +-output_column_list=$except_all1.[A#3, B#4] - | | | +-SetOperationItem - | | | +-scan= - | | | | +-ProjectScan - | | | | +-column_list=[$except_all2.A#5, $null_column_for_outer_set_op.B#9] - | | | | +-node_source="resolver_set_operation_corresponding" - | | | | +-expr_list= - | | | | | +-B#9 := Literal(type=INT64, value=NULL) - | | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=$except_all2.[A#5, C#6] - | | | | +-expr_list= - | | | | | +-A#5 := Literal(type=INT64, value=2) - | | | | | +-C#6 := Literal(type=INT64, value=3) - | | | | +-input_scan= - | | | | 
+-SingleRowScan - | | | +-output_column_list=[$except_all2.A#5, $null_column_for_outer_set_op.B#9] - | | +-column_match_mode=CORRESPONDING - | | +-column_propagation_mode=LEFT + | | +-ProjectScan + | | +-column_list=$except_all.[B#8, A#7] + | | +-node_source="resolver_set_operation_corresponding" + | | +-input_scan= + | | +-SetOperationScan + | | +-column_list=$except_all.[A#7, B#8] + | | +-op_type=EXCEPT_ALL + | | +-input_item_list= + | | | +-SetOperationItem + | | | | +-scan= + | | | | | +-ProjectScan + | | | | | +-column_list=$except_all1.[A#3, B#4] + | | | | | +-expr_list= + | | | | | | +-A#3 := Literal(type=INT64, value=1) + | | | | | | +-B#4 := Literal(type=INT64, value=2) + | | | | | +-input_scan= + | | | | | +-SingleRowScan + | | | | +-output_column_list=$except_all1.[A#3, B#4] + | | | +-SetOperationItem + | | | +-scan= + | | | | +-ProjectScan + | | | | +-column_list=[$except_all2.A#5, $null_column_for_outer_set_op.B#9] + | | | | +-node_source="resolver_set_operation_corresponding" + | | | | +-expr_list= + | | | | | +-B#9 := Literal(type=INT64, value=NULL) + | | | | +-input_scan= + | | | | +-ProjectScan + | | | | +-column_list=$except_all2.[A#5, C#6] + | | | | +-expr_list= + | | | | | +-A#5 := Literal(type=INT64, value=2) + | | | | | +-C#6 := Literal(type=INT64, value=3) + | | | | +-input_scan= + | | | | +-SingleRowScan + | | | +-output_column_list=[$except_all2.A#5, $null_column_for_outer_set_op.B#9] + | | +-column_match_mode=CORRESPONDING + | | +-column_propagation_mode=LEFT | +-output_column_list=$except_all.[B#8, A#7] +-column_match_mode=CORRESPONDING - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$intersect_all.B#10 AS B [INT64] -| +-$intersect_all.A#11 AS A [INT64] -+-query= - +-SetOperationScan - +-column_list=$intersect_all.[B#10, A#11] - +-op_type=INTERSECT_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$intersect_all1.[B#1, A#2] - | | +-expr_list= - | | | +-B#1 := Literal(type=INT64, 
value=2) - | | | +-A#2 := Literal(type=INT64, value=3) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$intersect_all1.[B#1, A#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=$except_all.[B#8, A#7] - | +-input_scan= - | +-SetOperationScan - | +-column_list=$except_all.[A#7, B#8] - | +-op_type=EXCEPT_ALL - | +-input_item_list= - | +-SetOperationItem - | | +-scan= - | | | +-ProjectScan - | | | +-column_list=$except_all1.[A#3, B#4] - | | | +-expr_list= - | | | | +-A#3 := Literal(type=INT64, value=1) - | | | | +-B#4 := Literal(type=INT64, value=2) - | | | +-input_scan= - | | | +-SingleRowScan - | | +-output_column_list=$except_all1.[A#3, B#4] - | +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$except_all2.A#5, $null_column_for_outer_set_op.B#9] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-B#9 := Literal(type=INT64, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=$except_all2.[A#5, C#6] - | | +-expr_list= - | | | +-A#5 := Literal(type=INT64, value=2) - | | | +-C#6 := Literal(type=INT64, value=3) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$except_all2.A#5, $null_column_for_outer_set_op.B#9] - +-output_column_list=$except_all.[B#8, A#7] == # CORRESPONDING: LEFT + STRICT, not identical columns diff --git a/zetasql/analyzer/testdata/corresponding_combinations.test b/zetasql/analyzer/testdata/corresponding_combinations.test index 6201fe23e..e56fbd3bf 100644 --- a/zetasql/analyzer/testdata/corresponding_combinations.test +++ b/zetasql/analyzer/testdata/corresponding_combinations.test @@ -5,7 +5,7 @@ # - column match mode: BY_POSITION, CORRESPONDING # - column propagation mode: INNER, FULL -[default language_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] +[default language_features=V_1_4_CORRESPONDING_FULL] [default enabled_ast_rewrites=DEFAULTS] SELECT int64, int32, double FROM SimpleTypes @@ 
-59,45 +59,6 @@ QueryStmt | +-output_column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.int64#37 AS int64 [INT64] -| +-$union_all.int32#38 AS int32 [INT32] -| +-$union_all.double#39 AS double [DOUBLE] -| +-$union_all.float#40 AS float [FLOAT] -+-query= - +-SetOperationScan - +-column_list=$union_all.[int64#37, int32#38, double#39, float#40] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, SimpleTypes.double#9, $null_column_for_outer_set_op.float#41] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-float#41 := Literal(type=FLOAT, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) - | +-output_column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, SimpleTypes.double#9, $null_column_for_outer_set_op.float#41] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-double#42 := Literal(type=DOUBLE, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) - +-output_column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] -- ALTERNATION GROUP: FULL,UNION,DISTINCT, -- @@ -146,45 
+107,6 @@ QueryStmt | +-output_column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_distinct.int64#37 AS int64 [INT64] -| +-$union_distinct.int32#38 AS int32 [INT32] -| +-$union_distinct.double#39 AS double [DOUBLE] -| +-$union_distinct.float#40 AS float [FLOAT] -+-query= - +-SetOperationScan - +-column_list=$union_distinct.[int64#37, int32#38, double#39, float#40] - +-op_type=UNION_DISTINCT - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, SimpleTypes.double#9, $null_column_for_outer_set_op.float#41] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-float#41 := Literal(type=FLOAT, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) - | +-output_column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, SimpleTypes.double#9, $null_column_for_outer_set_op.float#41] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-double#42 := Literal(type=DOUBLE, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) - +-output_column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] -- ALTERNATION GROUP: 
FULL,INTERSECT,ALL, -- @@ -233,45 +155,6 @@ QueryStmt | +-output_column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$intersect_all.int64#37 AS int64 [INT64] -| +-$intersect_all.int32#38 AS int32 [INT32] -| +-$intersect_all.double#39 AS double [DOUBLE] -| +-$intersect_all.float#40 AS float [FLOAT] -+-query= - +-SetOperationScan - +-column_list=$intersect_all.[int64#37, int32#38, double#39, float#40] - +-op_type=INTERSECT_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, SimpleTypes.double#9, $null_column_for_outer_set_op.float#41] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-float#41 := Literal(type=FLOAT, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) - | +-output_column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, SimpleTypes.double#9, $null_column_for_outer_set_op.float#41] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-double#42 := Literal(type=DOUBLE, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) - +-output_column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] -- 
ALTERNATION GROUP: FULL,INTERSECT,DISTINCT, -- @@ -320,45 +203,6 @@ QueryStmt | +-output_column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$intersect_distinct.int64#37 AS int64 [INT64] -| +-$intersect_distinct.int32#38 AS int32 [INT32] -| +-$intersect_distinct.double#39 AS double [DOUBLE] -| +-$intersect_distinct.float#40 AS float [FLOAT] -+-query= - +-SetOperationScan - +-column_list=$intersect_distinct.[int64#37, int32#38, double#39, float#40] - +-op_type=INTERSECT_DISTINCT - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, SimpleTypes.double#9, $null_column_for_outer_set_op.float#41] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-float#41 := Literal(type=FLOAT, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) - | +-output_column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, SimpleTypes.double#9, $null_column_for_outer_set_op.float#41] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-double#42 := Literal(type=DOUBLE, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) - +-output_column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, 
$null_column_for_outer_set_op.double#42, SimpleTypes.float#26] -- ALTERNATION GROUP: FULL,EXCEPT,ALL, -- @@ -407,45 +251,6 @@ QueryStmt | +-output_column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$except_all.int64#37 AS int64 [INT64] -| +-$except_all.int32#38 AS int32 [INT32] -| +-$except_all.double#39 AS double [DOUBLE] -| +-$except_all.float#40 AS float [FLOAT] -+-query= - +-SetOperationScan - +-column_list=$except_all.[int64#37, int32#38, double#39, float#40] - +-op_type=EXCEPT_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, SimpleTypes.double#9, $null_column_for_outer_set_op.float#41] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-float#41 := Literal(type=FLOAT, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) - | +-output_column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, SimpleTypes.double#9, $null_column_for_outer_set_op.float#41] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-double#42 := Literal(type=DOUBLE, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) - +-output_column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, 
$null_column_for_outer_set_op.double#42, SimpleTypes.float#26] -- ALTERNATION GROUP: FULL,EXCEPT,DISTINCT, -- @@ -494,45 +299,6 @@ QueryStmt | +-output_column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$except_distinct.int64#37 AS int64 [INT64] -| +-$except_distinct.int32#38 AS int32 [INT32] -| +-$except_distinct.double#39 AS double [DOUBLE] -| +-$except_distinct.float#40 AS float [FLOAT] -+-query= - +-SetOperationScan - +-column_list=$except_distinct.[int64#37, int32#38, double#39, float#40] - +-op_type=EXCEPT_DISTINCT - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, SimpleTypes.double#9, $null_column_for_outer_set_op.float#41] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-float#41 := Literal(type=FLOAT, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) - | +-output_column_list=[SimpleTypes.int64#2, SimpleTypes.int32#1, SimpleTypes.double#9, $null_column_for_outer_set_op.float#41] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-double#42 := Literal(type=DOUBLE, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) - 
+-output_column_list=[SimpleTypes.int64#20, SimpleTypes.int32#19, $null_column_for_outer_set_op.double#42, SimpleTypes.float#26] -- ALTERNATION GROUP: UNION,ALL, -- @@ -592,50 +358,27 @@ QueryStmt | +-SetOperationItem | | +-scan= | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] + | | | +-column_list=SimpleTypes.[int64#2, int32#1] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) + | | | +-ProjectScan + | | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] + | | | +-input_scan= + | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) | | +-output_column_list=SimpleTypes.[int64#2, int32#1] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] + | | +-column_list=SimpleTypes.[int64#20, int32#19] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] + | | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) | +-output_column_list=SimpleTypes.[int64#20, int32#19] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.int64#37 AS int64 [INT64] -| +-$union_all.int32#38 AS int32 [INT32] -+-query= - +-SetOperationScan - +-column_list=$union_all.[int64#37, int32#38] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-ProjectScan - | | 
+-column_list=SimpleTypes.[int64#2, int32#1, double#9] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) - | +-output_column_list=SimpleTypes.[int64#2, int32#1] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int64#20, int32#19] - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) - +-output_column_list=SimpleTypes.[int64#20, int32#19] -- ALTERNATION GROUP: UNION,DISTINCT, -- @@ -695,50 +438,27 @@ QueryStmt | +-SetOperationItem | | +-scan= | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] + | | | +-column_list=SimpleTypes.[int64#2, int32#1] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) + | | | +-ProjectScan + | | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] + | | | +-input_scan= + | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) | | +-output_column_list=SimpleTypes.[int64#2, int32#1] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] + | | +-column_list=SimpleTypes.[int64#20, int32#19] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] + | | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) | 
+-output_column_list=SimpleTypes.[int64#20, int32#19] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_distinct.int64#37 AS int64 [INT64] -| +-$union_distinct.int32#38 AS int32 [INT32] -+-query= - +-SetOperationScan - +-column_list=$union_distinct.[int64#37, int32#38] - +-op_type=UNION_DISTINCT - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) - | +-output_column_list=SimpleTypes.[int64#2, int32#1] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int64#20, int32#19] - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) - +-output_column_list=SimpleTypes.[int64#20, int32#19] -- ALTERNATION GROUP: INTERSECT,ALL, -- @@ -798,50 +518,27 @@ QueryStmt | +-SetOperationItem | | +-scan= | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] + | | | +-column_list=SimpleTypes.[int64#2, int32#1] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) + | | | +-ProjectScan + | | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] + | | | +-input_scan= + | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) | | +-output_column_list=SimpleTypes.[int64#2, int32#1] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | 
+-column_list=SimpleTypes.[int32#19, int64#20, float#26] + | | +-column_list=SimpleTypes.[int64#20, int32#19] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] + | | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) | +-output_column_list=SimpleTypes.[int64#20, int32#19] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$intersect_all.int64#37 AS int64 [INT64] -| +-$intersect_all.int32#38 AS int32 [INT32] -+-query= - +-SetOperationScan - +-column_list=$intersect_all.[int64#37, int32#38] - +-op_type=INTERSECT_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) - | +-output_column_list=SimpleTypes.[int64#2, int32#1] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int64#20, int32#19] - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) - +-output_column_list=SimpleTypes.[int64#20, int32#19] -- ALTERNATION GROUP: INTERSECT,DISTINCT, -- @@ -901,50 +598,27 @@ QueryStmt | +-SetOperationItem | | +-scan= | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] + | | | +-column_list=SimpleTypes.[int64#2, int32#1] + | | | 
+-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) + | | | +-ProjectScan + | | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] + | | | +-input_scan= + | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) | | +-output_column_list=SimpleTypes.[int64#2, int32#1] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] + | | +-column_list=SimpleTypes.[int64#20, int32#19] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] + | | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) | +-output_column_list=SimpleTypes.[int64#20, int32#19] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$intersect_distinct.int64#37 AS int64 [INT64] -| +-$intersect_distinct.int32#38 AS int32 [INT32] -+-query= - +-SetOperationScan - +-column_list=$intersect_distinct.[int64#37, int32#38] - +-op_type=INTERSECT_DISTINCT - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) - | +-output_column_list=SimpleTypes.[int64#2, int32#1] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int64#20, int32#19] - | 
+-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) - +-output_column_list=SimpleTypes.[int64#20, int32#19] -- ALTERNATION GROUP: EXCEPT,ALL, -- @@ -1004,50 +678,27 @@ QueryStmt | +-SetOperationItem | | +-scan= | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] + | | | +-column_list=SimpleTypes.[int64#2, int32#1] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) + | | | +-ProjectScan + | | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] + | | | +-input_scan= + | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) | | +-output_column_list=SimpleTypes.[int64#2, int32#1] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] + | | +-column_list=SimpleTypes.[int64#20, int32#19] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] + | | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) | +-output_column_list=SimpleTypes.[int64#20, int32#19] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$except_all.int64#37 AS int64 [INT64] -| +-$except_all.int32#38 AS int32 [INT32] -+-query= - +-SetOperationScan - +-column_list=$except_all.[int64#37, int32#38] - +-op_type=EXCEPT_ALL - +-input_item_list= - 
+-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) - | +-output_column_list=SimpleTypes.[int64#2, int32#1] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int64#20, int32#19] - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) - +-output_column_list=SimpleTypes.[int64#20, int32#19] -- ALTERNATION GROUP: EXCEPT,DISTINCT, -- @@ -1107,47 +758,24 @@ QueryStmt | +-SetOperationItem | | +-scan= | | | +-ProjectScan - | | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] + | | | +-column_list=SimpleTypes.[int64#2, int32#1] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) + | | | +-ProjectScan + | | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] + | | | +-input_scan= + | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) | | +-output_column_list=SimpleTypes.[int64#2, int32#1] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] + | | +-column_list=SimpleTypes.[int64#20, int32#19] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] + | | +-input_scan= + | | 
+-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) | +-output_column_list=SimpleTypes.[int64#20, int32#19] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$except_distinct.int64#37 AS int64 [INT64] -| +-$except_distinct.int32#38 AS int32 [INT32] -+-query= - +-SetOperationScan - +-column_list=$except_distinct.[int64#37, int32#38] - +-op_type=EXCEPT_DISTINCT - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1] - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=SimpleTypes.[int64#2, int32#1, double#9] - | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2, double#9], table=SimpleTypes, column_index_list=[0, 1, 8]) - | +-output_column_list=SimpleTypes.[int64#2, int32#1] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int64#20, int32#19] - | +-input_scan= - | +-ProjectScan - | +-column_list=SimpleTypes.[int32#19, int64#20, float#26] - | +-input_scan= - | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, float#26], table=SimpleTypes, column_index_list=[0, 1, 7]) - +-output_column_list=SimpleTypes.[int64#20, int32#19] diff --git a/zetasql/analyzer/testdata/corresponding_with_collation.test b/zetasql/analyzer/testdata/corresponding_with_collation.test index e3b65b62b..7f2e206eb 100644 --- a/zetasql/analyzer/testdata/corresponding_with_collation.test +++ b/zetasql/analyzer/testdata/corresponding_with_collation.test @@ -3,7 +3,7 @@ # is supported. 
[default no_run_unparser] [default enabled_ast_rewrites=DEFAULTS] -[default language_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +[default language_features=V_1_4_CORRESPONDING_FULL,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] SELECT COLLATE("b", "und:binary") AS b FULL UNION ALL CORRESPONDING SELECT COLLATE("a", "und:ci") AS a @@ -57,56 +57,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.b#6, $union_all2.a#2{Collation:"und:ci"}] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.b#3{Collation:"und:binary"} AS b [STRING] -| +-$union_all.a#4{Collation:"und:ci"} AS a [STRING] -+-query= - +-SetOperationScan - +-column_list=$union_all.[b#3, a#4] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.b#1{Collation:"und:binary"}, $null_column_for_outer_set_op.a#5] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-a#5 := Literal(type=STRING, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.b#1{Collation:"und:binary"}] - | | +-expr_list= - | | | +-b#1 := - | | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) - | | | +-type_annotation_map={Collation:"und:binary"} - | | | +-Literal(type=STRING, value="b") - | | | +-Literal(type=STRING, value="und:binary", preserve_in_literal_remover=TRUE) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1.b#1{Collation:"und:binary"}, $null_column_for_outer_set_op.a#5] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.b#6, $union_all2.a#2{Collation:"und:ci"}] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-b#6 := Literal(type=STRING, value=NULL) - | +-input_scan= - | +-ProjectScan - | 
+-column_list=[$union_all2.a#2{Collation:"und:ci"}] - | +-expr_list= - | | +-a#2 := - | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) - | | +-type_annotation_map={Collation:"und:ci"} - | | +-Literal(type=STRING, value="a") - | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$null_column_for_outer_set_op.b#6, $union_all2.a#2{Collation:"und:ci"}] == # Different collations for the matching columns throws an error. @@ -156,39 +106,6 @@ QueryStmt | +-output_column_list=[$union_all2.col#2{Collation:"und:ci"}] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.col#3{Collation:"und:ci"} AS col [STRING] -+-query= - +-SetOperationScan - +-column_list=[$union_all.col#3{Collation:"und:ci"}] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.col#1] - | | +-expr_list= - | | | +-col#1 := Literal(type=STRING, value="b") - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1.col#1] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all2.col#2{Collation:"und:ci"}] - | +-expr_list= - | | +-col#2 := - | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) - | | +-type_annotation_map={Collation:"und:ci"} - | | +-Literal(type=STRING, value="a") - | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all2.col#2{Collation:"und:ci"}] == # COLLATION annotations are preserved for columns for both columns w/ and w/o @@ -241,51 +158,6 @@ QueryStmt | +-output_column_list=[$null_column_for_outer_set_op.b#6, $union_all2.a#3{Collation:"und:ci"}] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| 
+-$union_all.b#4{Collation:"und:binary"} AS b [STRING] -| +-$union_all.a#5{Collation:"und:ci"} AS a [STRING] -+-query= - +-SetOperationScan - +-column_list=$union_all.[b#4, a#5] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=$union_all1.[b#1, a#2] - | | +-expr_list= - | | | +-b#1 := - | | | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) - | | | | +-type_annotation_map={Collation:"und:binary"} - | | | | +-Literal(type=STRING, value="b") - | | | | +-Literal(type=STRING, value="und:binary", preserve_in_literal_remover=TRUE) - | | | +-a#2 := Literal(type=STRING, value="a") - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=$union_all1.[b#1, a#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$null_column_for_outer_set_op.b#6, $union_all2.a#3{Collation:"und:ci"}] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-b#6 := Literal(type=STRING, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all2.a#3{Collation:"und:ci"}] - | +-expr_list= - | | +-a#3 := - | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) - | | +-type_annotation_map={Collation:"und:ci"} - | | +-Literal(type=STRING, value="a") - | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$null_column_for_outer_set_op.b#6, $union_all2.a#3{Collation:"und:ci"}] == # Collation annotation is preserved when merging with literal struct. 
@@ -365,81 +237,10 @@ QueryStmt | +-output_column_list=[$union_all3.a#3<_,{Collation:"und:ci"}>, $null_column_for_outer_set_op.b#9] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#4<_,{Collation:"und:ci"}> AS a [STRUCT] -| +-$union_all.b#5 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#4, b#5] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.a#6, $null_column_for_outer_set_op.b#7] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-b#7 := Literal(type=INT64, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.a#6] - | | +-expr_list= - | | | +-a#6 := Literal(type=STRUCT, value={1, "str"}) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.a#1] - | | +-expr_list= - | | | +-a#1 := Literal(type=STRUCT, value={1, "str"}) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1_cast.a#6, $null_column_for_outer_set_op.b#7] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$null_column_for_outer_set_op.a#8, $union_all2.b#2] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-a#8 := Literal(type=STRUCT, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all2.b#2] - | | +-expr_list= - | | | +-b#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$null_column_for_outer_set_op.a#8, $union_all2.b#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all3.a#3<_,{Collation:"und:ci"}>, $null_column_for_outer_set_op.b#9] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-b#9 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | 
+-column_list=[$union_all3.a#3<_,{Collation:"und:ci"}>] - | +-expr_list= - | | +-a#3 := - | | +-MakeStruct - | | +-type=STRUCT - | | +-type_annotation_map=<_,{Collation:"und:ci"}> - | | +-field_list= - | | +-Literal(type=FLOAT, value=1, has_explicit_type=TRUE) - | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) - | | +-type_annotation_map={Collation:"und:ci"} - | | +-Literal(type=STRING, value="str") - | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all3.a#3<_,{Collation:"und:ci"}>, $null_column_for_outer_set_op.b#9] == # Collation annotation is preserved when merging with literals. -[language_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_4_PRESERVE_ANNOTATION_IN_IMPLICIT_CAST_IN_SCAN] +[language_features=V_1_4_CORRESPONDING_FULL,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_4_PRESERVE_ANNOTATION_IN_IMPLICIT_CAST_IN_SCAN] SELECT (1, "str") AS a FULL UNION ALL CORRESPONDING SELECT 2 AS b @@ -516,77 +317,6 @@ QueryStmt | +-output_column_list=[$union_all3.a#3<_,{Collation:"und:ci"}>, $null_column_for_outer_set_op.b#9] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#4<_,{Collation:"und:ci"}> AS a [STRUCT] -| +-$union_all.b#5 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#4, b#5] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.a#6, $null_column_for_outer_set_op.b#7] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-b#7 := Literal(type=INT64, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.a#6] - | | +-expr_list= - | | | +-a#6 := Literal(type=STRUCT, value={1, "str"}) - | | +-input_scan= - | | 
+-ProjectScan - | | +-column_list=[$union_all1.a#1] - | | +-expr_list= - | | | +-a#1 := Literal(type=STRUCT, value={1, "str"}) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1_cast.a#6, $null_column_for_outer_set_op.b#7] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$null_column_for_outer_set_op.a#8, $union_all2.b#2] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-a#8 := Literal(type=STRUCT, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all2.b#2] - | | +-expr_list= - | | | +-b#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$null_column_for_outer_set_op.a#8, $union_all2.b#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all3.a#3<_,{Collation:"und:ci"}>, $null_column_for_outer_set_op.b#9] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-b#9 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all3.a#3<_,{Collation:"und:ci"}>] - | +-expr_list= - | | +-a#3 := - | | +-MakeStruct - | | +-type=STRUCT - | | +-type_annotation_map=<_,{Collation:"und:ci"}> - | | +-field_list= - | | +-Literal(type=FLOAT, value=1, has_explicit_type=TRUE) - | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) - | | +-type_annotation_map={Collation:"und:ci"} - | | +-Literal(type=STRING, value="str") - | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all3.a#3<_,{Collation:"und:ci"}>, $null_column_for_outer_set_op.b#9] == # Collation annotation is not preserved when implicit type cast is used and the @@ -676,91 +406,13 @@ QueryStmt | +-output_column_list=[$union_all3_cast.a#7, $null_column_for_outer_set_op.b#10] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - - -[REWRITTEN AST] 
-QueryStmt -+-output_column_list= -| +-$union_all.a#4 AS a [STRUCT] -| +-$union_all.b#5 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#4, b#5] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.a#6, $null_column_for_outer_set_op.b#8] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-b#8 := Literal(type=INT64, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.a#6] - | | +-expr_list= - | | | +-a#6 := Literal(type=STRUCT, value={1, "str"}, has_explicit_type=TRUE) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.a#1] - | | +-expr_list= - | | | +-a#1 := Literal(type=STRUCT, value={1, "str"}, has_explicit_type=TRUE) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1_cast.a#6, $null_column_for_outer_set_op.b#8] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$null_column_for_outer_set_op.a#9, $union_all2.b#2] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-a#9 := Literal(type=STRUCT, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all2.b#2] - | | +-expr_list= - | | | +-b#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$null_column_for_outer_set_op.a#9, $union_all2.b#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all3_cast.a#7, $null_column_for_outer_set_op.b#10] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-b#10 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all3_cast.a#7] - | +-expr_list= - | | +-a#7 := - | | +-Cast(STRUCT -> STRUCT) - | | +-ColumnRef(type=STRUCT, type_annotation_map=<_,{Collation:"und:ci"}>, column=$union_all3.a#3<_,{Collation:"und:ci"}>) - | 
+-input_scan= - | +-ProjectScan - | +-column_list=[$union_all3.a#3<_,{Collation:"und:ci"}>] - | +-expr_list= - | | +-a#3 := - | | +-MakeStruct - | | +-type=STRUCT - | | +-type_annotation_map=<_,{Collation:"und:ci"}> - | | +-field_list= - | | +-Literal(type=FLOAT, value=1, has_explicit_type=TRUE) - | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) - | | +-type_annotation_map={Collation:"und:ci"} - | | +-Literal(type=STRING, value="str") - | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all3_cast.a#7, $null_column_for_outer_set_op.b#10] == # In contrast with the previous example, the collation annotation is preserved # when implicit type cast is used and the language feature # FEATURE_V_1_4_PRESERVE_ANNOTATION_IN_IMPLICIT_CAST_IN_SCAN is *ENABLED*. # Specifically, the annotations of string_ci of struct_val is preserved. -[language_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_4_PRESERVE_ANNOTATION_IN_IMPLICIT_CAST_IN_SCAN] +[language_features=V_1_4_CORRESPONDING_FULL,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_4_PRESERVE_ANNOTATION_IN_IMPLICIT_CAST_IN_SCAN] SELECT CAST((1, "str") AS STRUCT) AS a FULL UNION ALL CORRESPONDING SELECT 2 AS b @@ -846,83 +498,3 @@ QueryStmt | +-output_column_list=[$union_all3_cast.a#7<_,{Collation:"und:ci"}>, $null_column_for_outer_set_op.b#10] +-column_match_mode=CORRESPONDING +-column_propagation_mode=FULL - - -[REWRITTEN AST] -QueryStmt -+-output_column_list= -| +-$union_all.a#4<_,{Collation:"und:ci"}> AS a [STRUCT] -| +-$union_all.b#5 AS b [INT64] -+-query= - +-SetOperationScan - +-column_list=$union_all.[a#4, b#5] - +-op_type=UNION_ALL - +-input_item_list= - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.a#6, $null_column_for_outer_set_op.b#8] - | | 
+-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-b#8 := Literal(type=INT64, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all1_cast.a#6] - | | +-expr_list= - | | | +-a#6 := Literal(type=STRUCT, value={1, "str"}, has_explicit_type=TRUE) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all1.a#1] - | | +-expr_list= - | | | +-a#1 := Literal(type=STRUCT, value={1, "str"}, has_explicit_type=TRUE) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$union_all1_cast.a#6, $null_column_for_outer_set_op.b#8] - +-SetOperationItem - | +-scan= - | | +-ProjectScan - | | +-column_list=[$null_column_for_outer_set_op.a#9, $union_all2.b#2] - | | +-node_source="resolver_set_operation_corresponding" - | | +-expr_list= - | | | +-a#9 := Literal(type=STRUCT, value=NULL) - | | +-input_scan= - | | +-ProjectScan - | | +-column_list=[$union_all2.b#2] - | | +-expr_list= - | | | +-b#2 := Literal(type=INT64, value=2) - | | +-input_scan= - | | +-SingleRowScan - | +-output_column_list=[$null_column_for_outer_set_op.a#9, $union_all2.b#2] - +-SetOperationItem - +-scan= - | +-ProjectScan - | +-column_list=[$union_all3_cast.a#7<_,{Collation:"und:ci"}>, $null_column_for_outer_set_op.b#10] - | +-node_source="resolver_set_operation_corresponding" - | +-expr_list= - | | +-b#10 := Literal(type=INT64, value=NULL) - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all3_cast.a#7<_,{Collation:"und:ci"}>] - | +-expr_list= - | | +-a#7 := - | | +-Cast(STRUCT -> STRUCT) - | | +-type_annotation_map=<_,{Collation:"und:ci"}> - | | +-ColumnRef(type=STRUCT, type_annotation_map=<_,{Collation:"und:ci"}>, column=$union_all3.a#3<_,{Collation:"und:ci"}>) - | | +-type_modifiers=collation:[_,und:ci] - | +-input_scan= - | +-ProjectScan - | +-column_list=[$union_all3.a#3<_,{Collation:"und:ci"}>] - | +-expr_list= - | | +-a#3 := - | | +-MakeStruct - | | +-type=STRUCT - | | +-type_annotation_map=<_,{Collation:"und:ci"}> 
- | | +-field_list= - | | +-Literal(type=FLOAT, value=1, has_explicit_type=TRUE) - | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) - | | +-type_annotation_map={Collation:"und:ci"} - | | +-Literal(type=STRING, value="str") - | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) - | +-input_scan= - | +-SingleRowScan - +-output_column_list=[$union_all3_cast.a#7<_,{Collation:"und:ci"}>, $null_column_for_outer_set_op.b#10] diff --git a/zetasql/analyzer/testdata/create_external_schema.test b/zetasql/analyzer/testdata/create_external_schema.test new file mode 100644 index 000000000..902f68e13 --- /dev/null +++ b/zetasql/analyzer/testdata/create_external_schema.test @@ -0,0 +1,106 @@ +[default language_features=EXTERNAL_SCHEMA_DDL] +CREATE EXTERNAL SCHEMA foo WITH CONNECTION connection1 OPTIONS() +-- +CreateExternalSchemaStmt ++-name_path=foo ++-connection= + +-Connection(connection=connection1) +== + +CREATE {{|OR REPLACE}} EXTERNAL SCHEMA {{|IF NOT EXISTS}} foo.bar.baz +WITH CONNECTION connection1 OPTIONS (a=1, b="cde") +-- +ALTERNATION GROUP: +-- +CreateExternalSchemaStmt ++-name_path=foo.bar.baz ++-option_list= +| +-a := Literal(type=INT64, value=1) +| +-b := Literal(type=STRING, value="cde") ++-connection= + +-Connection(connection=connection1) +-- +ALTERNATION GROUP: IF NOT EXISTS +-- +CreateExternalSchemaStmt ++-name_path=foo.bar.baz ++-create_mode=CREATE_IF_NOT_EXISTS ++-option_list= +| +-a := Literal(type=INT64, value=1) +| +-b := Literal(type=STRING, value="cde") ++-connection= + +-Connection(connection=connection1) +-- +ALTERNATION GROUP: OR REPLACE, +-- +CreateExternalSchemaStmt ++-name_path=foo.bar.baz ++-create_mode=CREATE_OR_REPLACE ++-option_list= +| +-a := Literal(type=INT64, value=1) +| +-b := Literal(type=STRING, value="cde") ++-connection= + +-Connection(connection=connection1) +-- +ALTERNATION GROUP: OR REPLACE,IF NOT EXISTS +-- +ERROR: CREATE EXTERNAL SCHEMA cannot have both OR REPLACE and IF NOT EXISTS [at 1:1] 
+CREATE OR REPLACE EXTERNAL SCHEMA IF NOT EXISTS foo.bar.baz +^ +== + +CREATE EXTERNAL SCHEMA foo WITH CONNECTION connection1 +-- +ERROR: Syntax error: Expected keyword OPTIONS but got end of statement [at 1:55] +CREATE EXTERNAL SCHEMA foo WITH CONNECTION connection1 + ^ +== + +CREATE EXTERNAL SCHEMA foo OPTIONS () +-- +ERROR: Syntax error: Expected "." or keyword WITH but got keyword OPTIONS [at 1:28] +CREATE EXTERNAL SCHEMA foo OPTIONS () + ^ +== + +CREATE {{temp|temporary|public|private}} EXTERNAL SCHEMA foo WITH CONNECTION connection1 OPTIONS() +-- + +ALTERNATION GROUPS: + temp + temporary +-- +CreateExternalSchemaStmt ++-name_path=foo ++-create_scope=CREATE_TEMP ++-connection= + +-Connection(connection=connection1) +-- +ALTERNATION GROUP: public +-- +ERROR: CREATE EXTERNAL SCHEMA with PUBLIC or PRIVATE modifiers is not supported [at 1:1] +CREATE public EXTERNAL SCHEMA foo WITH CONNECTION connection1 OPTIONS() +^ +-- +ALTERNATION GROUP: private +-- +ERROR: CREATE EXTERNAL SCHEMA with PUBLIC or PRIVATE modifiers is not supported [at 1:1] +CREATE private EXTERNAL SCHEMA foo WITH CONNECTION connection1 OPTIONS() +^ +== + +[disallow_duplicate_options] +CREATE EXTERNAL SCHEMA foo WITH CONNECTION connection1 OPTIONS(x=1, x=5) +-- +ERROR: Duplicate option specified for 'x' [at 1:69] +CREATE EXTERNAL SCHEMA foo WITH CONNECTION connection1 OPTIONS(x=1, x=5) + ^ +== + +[language_features=] +CREATE EXTERNAL SCHEMA foo WITH CONNECTION connection1 OPTIONS(x=1) +-- +ERROR: CREATE EXTERNAL SCHEMA is not supported [at 1:1] +CREATE EXTERNAL SCHEMA foo WITH CONNECTION connection1 OPTIONS(x=1) +^ +== diff --git a/zetasql/analyzer/testdata/create_model.test b/zetasql/analyzer/testdata/create_model.test index bc5ed7ac9..fa94cb6e0 100644 --- a/zetasql/analyzer/testdata/create_model.test +++ b/zetasql/analyzer/testdata/create_model.test @@ -204,14 +204,33 @@ transform (a + b as t) == create model tt +input (a INT64) output (b BOOL) remote -as select 1 a +as select 1 a, TRUE b -- 
-ERROR: The AS SELECT clause cannot be used with REMOTE [at 3:4] -as select 1 a +ERROR: The AS SELECT clause cannot be used with INPUT and OUTPUT [at 4:4] +as select 1 a, TRUE b ^ == +create model tt +remote +as select 1 a +-- +CreateModelStmt ++-name_path=tt ++-output_column_list= +| +-$create_as.a#1 AS a [INT64] ++-query= +| +-ProjectScan +| +-column_list=[$create_as.a#1] +| +-expr_list= +| | +-a#1 := Literal(type=INT64, value=1) +| +-input_scan= +| +-SingleRowScan ++-is_remote=TRUE +== + create model tt remote as (training_data as (select 1 a)) @@ -381,9 +400,9 @@ transform (c as A, c+B as d, *) remote as select 1 c, 2 b; -- -ERROR: The AS SELECT clause cannot be used with REMOTE [at 4:4] -as select 1 c, 2 b; - ^ +ERROR: The TRANSFORM clause cannot be used with REMOTE [at 2:1] +transform (c as A, c+B as d, *) +^ == [language_features=ANALYTIC_FUNCTIONS] @@ -967,7 +986,7 @@ remote with connection connection1 options (abc = def) as select * from t2; -- -ERROR: The AS SELECT clause cannot be used with REMOTE [at 7:4] +ERROR: The AS SELECT clause cannot be used with INPUT and OUTPUT [at 7:4] as select * from t2; ^ == diff --git a/zetasql/analyzer/testdata/create_sql_aggregate_function.test b/zetasql/analyzer/testdata/create_sql_aggregate_function.test index 67795ef7e..323b7b9c0 100644 --- a/zetasql/analyzer/testdata/create_sql_aggregate_function.test +++ b/zetasql/analyzer/testdata/create_sql_aggregate_function.test @@ -563,13 +563,34 @@ CreateFunctionStmt == # Limit only accepts constants, which doesn't include arguments currently. 
-[language_features=CREATE_AGGREGATE_FUNCTION,ANALYTIC_FUNCTIONS,V_1_1_LIMIT_IN_AGGREGATE] +[language_features={{CREATE_AGGREGATE_FUNCTION,ANALYTIC_FUNCTIONS,V_1_1_LIMIT_IN_AGGREGATE|CREATE_AGGREGATE_FUNCTION,ANALYTIC_FUNCTIONS,V_1_1_LIMIT_IN_AGGREGATE,V_1_4_LIMIT_OFFSET_EXPRESSIONS}}] create aggregate function myfunc(x bytes, y int64 NOT AGGREGATE) as ( array_agg(x limit y) ) -- -ERROR: Syntax error: Unexpected identifier "y" [at 2:21] +ALTERNATION GROUP: CREATE_AGGREGATE_FUNCTION,ANALYTIC_FUNCTIONS,V_1_1_LIMIT_IN_AGGREGATE +-- +ERROR: LIMIT expects an integer literal or parameter [at 2:21] ( array_agg(x limit y) ) ^ +-- +ALTERNATION GROUP: CREATE_AGGREGATE_FUNCTION,ANALYTIC_FUNCTIONS,V_1_1_LIMIT_IN_AGGREGATE,V_1_4_LIMIT_OFFSET_EXPRESSIONS +-- +CreateFunctionStmt ++-name_path=myfunc ++-return_type=ARRAY ++-argument_name_list=[x, y] ++-signature=(BYTES x, INT64 {is_not_aggregate: true} y) -> ARRAY ++-is_aggregate=TRUE ++-language="SQL" ++-code="array_agg(x limit y)" ++-aggregate_expression_list= +| +-$agg1#1 := +| +-AggregateFunctionCall(ZetaSQL:array_agg(BYTES) -> ARRAY) +| +-ArgumentRef(parse_location=80-81, type=BYTES, name="x", argument_kind=AGGREGATE) +| +-limit= +| +-ArgumentRef(type=INT64, name="y", argument_kind=NOT_AGGREGATE) ++-function_expression= + +-ColumnRef(type=ARRAY, column=$aggregate.$agg1#1) == # Aggregate UDF using DISTINCT modifiers on the aggregates. @@ -940,3 +961,27 @@ AS ( ERROR: An aggregate function that has both DISTINCT and ORDER BY arguments can only ORDER BY expressions that are arguments to the function [at 5:51] ARRAY_TO_STRING(ARRAY_AGG(DISTINCT foo ORDER BY foo), delimiter) ^ +== + +[language_features=CREATE_AGGREGATE_FUNCTION,V_1_4_GROUPING_BUILTIN] +CREATE TEMP AGGREGATE FUNCTION MyGroupingFunc(x STRING) +RETURNS BOOL +AS ( + GROUPING(x) +); +-- +ERROR: GROUPING function is not supported in SQL function body. 
[at 1:1] +CREATE TEMP AGGREGATE FUNCTION MyGroupingFunc(x STRING) +^ +== + +[language_features=CREATE_AGGREGATE_FUNCTION,V_1_4_GROUPING_BUILTIN] +CREATE TEMP AGGREGATE FUNCTION MyAggFunc(x STRING, y INT64, z int32 NOT AGGREGATE) +RETURNS BOOL +AS ( + GROUPING(x) + MAX(y) + z +); +-- +ERROR: GROUPING function is not supported in SQL function body. [at 1:1] +CREATE TEMP AGGREGATE FUNCTION MyAggFunc(x STRING, y INT64, z int32 NOT AGGRE... +^ diff --git a/zetasql/analyzer/testdata/differential_privacy.test b/zetasql/analyzer/testdata/differential_privacy.test index 84aec7c77..e8a35944c 100644 --- a/zetasql/analyzer/testdata/differential_privacy.test +++ b/zetasql/analyzer/testdata/differential_privacy.test @@ -1,6 +1,6 @@ # Specify WITH DIFFERENTIAL_PRIVACY but no FROM clause [default also_show_signature_mismatch_details] -[default language_features=DIFFERENTIAL_PRIVACY,DIFFERENTIAL_PRIVACY_REPORT_FUNCTIONS,NAMED_ARGUMENTS,NUMERIC_TYPE,JSON_TYPE] +[default language_features=DIFFERENTIAL_PRIVACY,V_1_3_QUALIFY,ANALYTIC_FUNCTIONS,DIFFERENTIAL_PRIVACY_REPORT_FUNCTIONS,NAMED_ARGUMENTS,NUMERIC_TYPE,JSON_TYPE] select with differential_privacy sum(); -- ERROR: SELECT without FROM clause cannot specify WITH DIFFERENTIAL_PRIVACY [at 1:1] @@ -19,6 +19,73 @@ select with differential_privacy (int64) ^ == +# This query produced a confusing error message stating the query does not +# include aggregation before the fix for b/309029333 +[enabled_ast_rewrites=DEFAULTS,+ANONYMIZATION] +[parse_location_record_type=PARSE_LOCATION_RECORD_CODE_SEARCH] +select with differential_privacy 1 +from SimpleTypes +qualify sum(int64) > rank() OVER (ORDER BY 'a'); +-- +[PRE-REWRITE AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#21 AS `$col1` [INT64] ++-query= + +-ProjectScan + +-parse_location=0-99 + +-column_list=[$query.$col1#21] + +-input_scan= + +-FilterScan + +-column_list=[$query.$col1#21, $aggregate.$agg1#22, $analytic.$analytic1#23] + +-input_scan= + | +-AnalyticScan + | 
+-column_list=[$query.$col1#21, $aggregate.$agg1#22, $analytic.$analytic1#23] + | +-input_scan= + | | +-ProjectScan + | | +-column_list=[$query.$col1#21, $aggregate.$agg1#22, $orderby.$orderbycol1#24] + | | +-expr_list= + | | | +-$orderbycol1#24 := Literal(parse_location=95-98, type=STRING, value="a") + | | +-input_scan= + | | +-ProjectScan + | | +-column_list=[$query.$col1#21, $aggregate.$agg1#22] + | | +-expr_list= + | | | +-$col1#21 := Literal(parse_location=33-34, type=INT64, value=1) + | | +-input_scan= + | | +-DifferentialPrivacyAggregateScan + | | +-column_list=[$aggregate.$agg1#22] + | | +-input_scan= + | | | +-TableScan(parse_location=40-51, column_list=[SimpleTypes.int64#2], table=SimpleTypes, column_index_list=[1]) + | | +-aggregate_list= + | | +-$agg1#22 := + | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | +-parse_location=60-63 + | | +-ColumnRef(parse_location=64-69, type=INT64, column=SimpleTypes.int64#2) + | | +-Literal(type=STRUCT, value=NULL) + | +-function_group_list= + | +-AnalyticFunctionGroup + | +-order_by= + | | +-WindowOrdering + | | +-order_by_item_list= + | | +-OrderByItem + | | +-column_ref= + | | +-ColumnRef(type=STRING, column=$orderby.$orderbycol1#24) + | +-analytic_function_list= + | +-$analytic1#23 := AnalyticFunctionCall(ZetaSQL:rank() -> INT64) + +-filter_expr= + +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) + +-ColumnRef(type=INT64, column=$aggregate.$agg1#22) + +-ColumnRef(type=INT64, column=$analytic.$analytic1#23) +[REPLACED_LITERALS] +select with differential_privacy @_p0_INT64 +from SimpleTypes +qualify sum(int64) > rank() OVER (ORDER BY @_p1_STRING); + +Rewrite ERROR: A SELECT WITH DIFFERENTIAL_PRIVACY query must query data with a specified privacy unit column [at 1:1] +select with differential_privacy 1 +^ +== + [enabled_ast_rewrites=DEFAULTS,+ANONYMIZATION] select WITH DIFFERENTIAL_PRIVACY sum(int64), max(int64) from 
SimpleTypesWithAnonymizationUid; @@ -2074,9 +2141,14 @@ QueryStmt select with differential_privacy count(string, contribution_bounds_per_group => (0,100), @test_param_double) from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator COUNT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: STRING, STRUCT, DOUBLE. Supported signatures: COUNT(T2, [contribution_bounds_per_group => STRUCT]); COUNT(T2, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="JSON"*/, [contribution_bounds_per_group => STRUCT]); COUNT(T2, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="PROTO"*/, [contribution_bounds_per_group => STRUCT]) [at 1:34] +ERROR: No matching signature for aggregate operator COUNT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: STRING, STRUCT, DOUBLE. Supported signatures: COUNT(T2, [contribution_bounds_per_group => STRUCT]) -> INT64; COUNT(T2, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, [contribution_bounds_per_group => STRUCT]) -> JSON; COUNT(T2, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport [at 1:34] select with differential_privacy count(string, contribution_bounds_per_group ... ^ +-- +Signature Mismatch Details: +ERROR: Call to function ZetaSQL:$differential_privacy_count must not specify positional arguments after named arguments; named arguments must be specified last in the argument list [at 1:48] +select with differential_privacy count(string, contribution_bounds_per_group ... + ^ == [enabled_ast_rewrites=DEFAULTS,+ANONYMIZATION] @@ -7836,7 +7908,18 @@ select with differential_privacy APPROX_QUANTILES(*, 4, contribution_bounds_p... 
select with differential_privacy APPROX_QUANTILES(int64, contribution_bounds_per_row =>(4, bool), report_format => "PROTO") from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRUCT, STRING. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT); APPROX_QUANTILES(DOUBLE, INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="JSON"*/, contribution_bounds_per_row => STRUCT); APPROX_QUANTILES(DOUBLE, INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="PROTO"*/, contribution_bounds_per_row => STRUCT) [at 1:34] +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRUCT, STRING. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY; APPROX_QUANTILES(DOUBLE, INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, contribution_bounds_per_row => STRUCT) -> JSON; APPROX_QUANTILES(DOUBLE, INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, contribution_bounds_per_row => STRUCT) -> zetasql.functions.DifferentialPrivacyOutputWithReport [at 1:34] +select with differential_privacy APPROX_QUANTILES(int64, contribution_bounds_... 
+ ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRUCT, STRING + Signature: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY + Named argument `report_format` does not exist in signature + Signature: APPROX_QUANTILES(DOUBLE, INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, contribution_bounds_per_row => STRUCT) -> JSON + Signature requires at least 4 arguments, found 3 arguments + Signature: APPROX_QUANTILES(DOUBLE, INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, contribution_bounds_per_row => STRUCT) -> zetasql.functions.DifferentialPrivacyOutputWithReport + Signature requires at least 4 arguments, found 3 arguments [at 1:34] select with differential_privacy APPROX_QUANTILES(int64, contribution_bounds_... ^ == @@ -7845,7 +7928,18 @@ select with differential_privacy APPROX_QUANTILES(int64, contribution_bounds_... select with differential_privacy APPROX_QUANTILES(double, contribution_bounds_per_row => (bool, 3), report_format => "PROTO") from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, STRUCT, STRING. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT); APPROX_QUANTILES(DOUBLE, INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="JSON"*/, contribution_bounds_per_row => STRUCT); APPROX_QUANTILES(DOUBLE, INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="PROTO"*/, contribution_bounds_per_row => STRUCT) [at 1:34] +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, STRUCT, STRING. 
Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY; APPROX_QUANTILES(DOUBLE, INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, contribution_bounds_per_row => STRUCT) -> JSON; APPROX_QUANTILES(DOUBLE, INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, contribution_bounds_per_row => STRUCT) -> zetasql.functions.DifferentialPrivacyOutputWithReport [at 1:34] +select with differential_privacy APPROX_QUANTILES(double, contribution_bounds... + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, STRUCT, STRING + Signature: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY + Named argument `report_format` does not exist in signature + Signature: APPROX_QUANTILES(DOUBLE, INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, contribution_bounds_per_row => STRUCT) -> JSON + Signature requires at least 4 arguments, found 3 arguments + Signature: APPROX_QUANTILES(DOUBLE, INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, contribution_bounds_per_row => STRUCT) -> zetasql.functions.DifferentialPrivacyOutputWithReport + Signature requires at least 4 arguments, found 3 arguments [at 1:34] select with differential_privacy APPROX_QUANTILES(double, contribution_bounds... ^ == @@ -9251,3 +9345,235 @@ GROUP BY int32 Rewrite ERROR: Reading the table SimpleTypesWithAnonymizationUid containing user data in expression subqueries is not allowed [at 7:31] EXISTS(SELECT 1 FROM SimpleTypesWithAnonymizationUid), ^ +== + +# GROUPING function is unsupported in with DP clause. 
+[language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,NUMERIC_TYPE,V_1_4_GROUPING_BUILTIN] +[enabled_ast_rewrites=DEFAULTS,+ANONYMIZATION] +SELECT WITH DIFFERENTIAL_PRIVACY GROUPING(int64), int64 +FROM SimpleTypesWithAnonymizationUid +GROUP BY int64, double; +-- +ERROR: GROUPING function is not supported in differential privacy queries [at 1:34] +SELECT WITH DIFFERENTIAL_PRIVACY GROUPING(int64), int64 + ^ +== + +# GROUPING function is unsupported in the same select list with DP functions. +[language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,NUMERIC_TYPE,V_1_4_GROUPING_BUILTIN] +[enabled_ast_rewrites=DEFAULTS,+ANONYMIZATION] +SELECT WITH DIFFERENTIAL_PRIVACY int64, GROUPING(int64) +FROM SimpleTypesWithAnonymizationUid +GROUP BY int64, double; +-- +ERROR: GROUPING function is not supported in differential privacy queries [at 1:41] +SELECT WITH DIFFERENTIAL_PRIVACY int64, GROUPING(int64) + ^ +== + +# GROUPING function is unsupported in the same query with the DP query. +[language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,NUMERIC_TYPE,V_1_4_GROUPING_BUILTIN] +[enabled_ast_rewrites=DEFAULTS,+ANONYMIZATION] +SELECT WITH DIFFERENTIAL_PRIVACY int64 +FROM SimpleTypesWithAnonymizationUid +GROUP BY int64, double HAVING GROUPING(int64) = 0; +-- +ERROR: GROUPING function is not supported in differential privacy queries [at 3:31] +GROUP BY int64, double HAVING GROUPING(int64) = 0; + ^ +== + +# GROUPING function should be allowed in a subquery. 
+[language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,NUMERIC_TYPE,V_1_4_GROUPING_BUILTIN] +[enabled_ast_rewrites=DEFAULTS,+ANONYMIZATION] +SELECT + WITH DIFFERENTIAL_PRIVACY int64, + EXISTS(SELECT GROUPING(key) FROM KeyValue GROUP BY key) +FROM SimpleTypesWithAnonymizationUid +GROUP BY int64, double; +-- +QueryStmt ++-output_column_list= +| +-$groupby.int64#18 AS int64 [INT64] +| +-$query.$col2#25 AS `$col2` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$groupby.int64#18, $query.$col2#25] + +-expr_list= + | +-$col2#25 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=EXISTS + | +-subquery= + | +-ProjectScan + | +-column_list=[$grouping_call.$grouping_call1#24] + | +-input_scan= + | +-AggregateScan + | +-column_list=[$groupby.key#23, $grouping_call.$grouping_call1#24] + | +-input_scan= + | | +-TableScan(column_list=[KeyValue.Key#20], table=KeyValue, column_index_list=[0]) + | +-group_by_list= + | | +-key#23 := ColumnRef(type=INT64, column=KeyValue.Key#20) + | +-grouping_call_list= + | +-GroupingCall + | +-group_by_column= + | | +-ColumnRef(type=INT64, column=$groupby.key#23) + | +-output_column=$grouping_call.$grouping_call1#24 + +-input_scan= + +-DifferentialPrivacyAggregateScan + +-column_list=[$groupby.int64#18] + +-input_scan= + | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#2, double#9], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 8]) + +-group_by_list= + +-int64#18 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + +-double#19 := ColumnRef(type=DOUBLE, column=SimpleTypesWithAnonymizationUid.double#9) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$groupby.int64#18 AS int64 [INT64] +| +-$query.$col2#25 AS `$col2` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$groupby.int64#18, $query.$col2#25] + +-expr_list= + | +-$col2#25 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=EXISTS + | +-subquery= + | +-ProjectScan + | +-column_list=[$grouping_call.$grouping_call1#24] 
+ | +-input_scan= + | +-AggregateScan + | +-column_list=[$groupby.key#23, $grouping_call.$grouping_call1#24] + | +-input_scan= + | | +-TableScan(column_list=[KeyValue.Key#20], table=KeyValue, column_index_list=[0]) + | +-group_by_list= + | | +-key#23 := ColumnRef(type=INT64, column=KeyValue.Key#20) + | +-grouping_call_list= + | +-GroupingCall + | +-group_by_column= + | | +-ColumnRef(type=INT64, column=$groupby.key#23) + | +-output_column=$grouping_call.$grouping_call1#24 + +-input_scan= + +-DifferentialPrivacyAggregateScan + +-column_list=[$groupby.int64#18] + +-input_scan= + | +-AggregateScan + | +-column_list=[$groupby.int64_partial#27, $groupby.double_partial#28, $group_by.$uid#29] + | +-input_scan= + | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#2, double#9, uid#26], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 8, 10]) + | +-group_by_list= + | +-int64_partial#27 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | +-double_partial#28 := ColumnRef(type=DOUBLE, column=SimpleTypesWithAnonymizationUid.double#9) + | +-$uid#29 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#26) + +-group_by_list= + | +-int64#18 := ColumnRef(type=INT64, column=$groupby.int64_partial#27) + | +-double#19 := ColumnRef(type=DOUBLE, column=$groupby.double_partial#28) + +-aggregate_list= + | +-$group_selection_threshold_col#31 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-Literal(type=INT64, value=1) + | +-Literal(type=STRUCT, value={0, 1}) + +-group_selection_threshold_expr= + +-ColumnRef(type=INT64, column=$differential_privacy.$group_selection_threshold_col#31) +== + +# GROUPING function should be allowed in the with clause. 
+[language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,NUMERIC_TYPE,V_1_4_GROUPING_BUILTIN] +[enabled_ast_rewrites=DEFAULTS,+ANONYMIZATION] +WITH T AS ( + SELECT uid, int64, GROUPING(int64) AS grp + FROM SimpleTypesWithAnonymizationUid + GROUP BY uid, int64 +) +SELECT WITH DIFFERENTIAL_PRIVACY int64 +FROM T +GROUP BY int64; +-- +QueryStmt ++-output_column_list= +| +-$groupby.int64#20 AS int64 [INT64] ++-query= + +-WithScan + +-column_list=[$groupby.int64#20] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="T" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.uid#14, $groupby.int64#15, $grouping_call.$grouping_call1#16] + | +-input_scan= + | +-AggregateScan + | +-column_list=[$groupby.uid#14, $groupby.int64#15, $grouping_call.$grouping_call1#16] + | +-input_scan= + | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#2, uid#11], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | +-group_by_list= + | | +-uid#14 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | +-int64#15 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | +-grouping_call_list= + | +-GroupingCall + | +-group_by_column= + | | +-ColumnRef(type=INT64, column=$groupby.int64#15) + | +-output_column=$grouping_call.$grouping_call1#16 + +-query= + +-ProjectScan + +-column_list=[$groupby.int64#20] + +-input_scan= + +-DifferentialPrivacyAggregateScan + +-column_list=[$groupby.int64#20] + +-input_scan= + | +-WithRefScan(column_list=T.[uid#17, int64#18, grp#19], with_query_name="T") + +-group_by_list= + +-int64#20 := ColumnRef(type=INT64, column=T.int64#18) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$groupby.int64#20 AS int64 [INT64] ++-query= + +-WithScan + +-column_list=[$groupby.int64#20] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="T" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.uid#14, $groupby.int64#15, $grouping_call.$grouping_call1#16] + | 
+-input_scan= + | +-AggregateScan + | +-column_list=[$groupby.uid#14, $groupby.int64#15, $grouping_call.$grouping_call1#16] + | +-input_scan= + | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#2, uid#11], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | +-group_by_list= + | | +-uid#14 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | +-int64#15 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | +-grouping_call_list= + | +-GroupingCall + | +-group_by_column= + | | +-ColumnRef(type=INT64, column=$groupby.int64#15) + | +-output_column=$grouping_call.$grouping_call1#16 + +-query= + +-ProjectScan + +-column_list=[$groupby.int64#20] + +-input_scan= + +-DifferentialPrivacyAggregateScan + +-column_list=[$groupby.int64#20] + +-input_scan= + | +-AggregateScan + | +-column_list=[$groupby.int64_partial#21, $group_by.$uid#22] + | +-input_scan= + | | +-WithRefScan(column_list=T.[uid#17, int64#18, grp#19], with_query_name="T") + | +-group_by_list= + | +-int64_partial#21 := ColumnRef(type=INT64, column=T.int64#18) + | +-$uid#22 := ColumnRef(type=INT64, column=T.uid#17) + +-group_by_list= + | +-int64#20 := ColumnRef(type=INT64, column=$groupby.int64_partial#21) + +-aggregate_list= + | +-$group_selection_threshold_col#24 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-Literal(type=INT64, value=1) + | +-Literal(type=STRUCT, value={0, 1}) + +-group_selection_threshold_expr= + +-ColumnRef(type=INT64, column=$differential_privacy.$group_selection_threshold_col#24) diff --git a/zetasql/analyzer/testdata/differential_privacy_group_selection_strategy.test b/zetasql/analyzer/testdata/differential_privacy_group_selection_strategy.test index 5d552e60d..6b9a4175b 100644 --- a/zetasql/analyzer/testdata/differential_privacy_group_selection_strategy.test +++ 
b/zetasql/analyzer/testdata/differential_privacy_group_selection_strategy.test @@ -1,5 +1,5 @@ [default also_show_signature_mismatch_details] -[default language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS] +[default language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,V_1_1_WITH_ON_SUBQUERY] [default enabled_ast_rewrites=DEFAULTS,+ANONYMIZATION] [default no_run_unparser] # When no group_selection_strategy is set, default to laplace thresholding. @@ -115,7 +115,7 @@ QueryStmt +-group_selection_strategy := Literal(type=ENUM, value=LAPLACE_THRESHOLD) == -[language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,TABLESAMPLE] +[language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,V_1_1_WITH_ON_SUBQUERY,TABLESAMPLE] # Forbidden operations after public groups joins are allowed outside of the # dp aggregate scan with per-group contribution bounding. SELECT string, anon_users @@ -172,9 +172,10 @@ QueryStmt | | | | | +-group_by_list= | | | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) | | | +-aggregate_list= @@ -236,9 +237,10 @@ QueryStmt | | | | | | +-group_by_list= | | | | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | | | | +-join_expr= - | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | 
| +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-has_using=TRUE | | | | +-group_by_list= | | | | | +-string_partial#37 := ColumnRef(type=STRING, column=$distinct.string#31) | | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) @@ -430,6 +432,35 @@ QueryStmt Rewrite ERROR: Differential privacy option group_selection_strategy PUBLIC_GROUPS has not been enabled == +# Public groups require the V_1_1_WITH_ON_SUBQUERY feature. +[language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS] +select with differential_privacy options(group_selection_strategy=PUBLIC_GROUPS) +count(*) +from SimpleTypesWithAnonymizationUid; +-- +[PRE-REWRITE AST] +QueryStmt ++-output_column_list= +| +-$aggregate.$agg1#13 AS `$col1` [INT64] ++-query= + +-ProjectScan + +-column_list=[$aggregate.$agg1#13] + +-input_scan= + +-DifferentialPrivacyAggregateScan + +-column_list=[$aggregate.$agg1#13] + +-input_scan= + | +-TableScan(table=SimpleTypesWithAnonymizationUid) + +-aggregate_list= + | +-$agg1#13 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-Literal(type=STRUCT, value=NULL) + +-option_list= + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + + +Rewrite ERROR: Differential privacy option group_selection_strategy PUBLIC_GROUPS is not supported without support for WITH subqueries +== + # Happy path without max_groups_contributed and no GROUP BY.
select with differential_privacy options(group_selection_strategy=PUBLIC_GROUPS) count(*) @@ -587,9 +618,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=$distinct.int64#31) +-aggregate_list= @@ -627,9 +659,10 @@ QueryStmt | | | +-group_by_list= | | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-has_using=TRUE | +-group_by_list= | | +-int64_partial#37 := ColumnRef(type=INT64, column=$distinct.int64#31) | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) @@ -684,9 +717,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | 
+-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=$distinct.int64#31) +-aggregate_list= @@ -703,22 +737,149 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#32 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.int64#41] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) - | +-group_by_list= - | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan + +-column_list=[$aggregate.$agg1#32] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#41] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) + +-query= + +-DifferentialPrivacyAggregateScan + +-column_list=[$aggregate.$agg1#32] + +-input_scan= + | +-JoinScan + | +-column_list=[$public_groups0.int64#40, $aggregate.$agg1_partial#36, $groupby.int64_partial#37, $group_by.$uid#38] + | +-join_type=RIGHT + | +-left_scan= + | | +-SampleScan + | | +-column_list=[$aggregate.$agg1_partial#36, $groupby.int64_partial#37, $group_by.$uid#38] + | | +-input_scan= + | | | +-AggregateScan + | | | +-column_list=[$aggregate.$agg1_partial#36, $groupby.int64_partial#37, $group_by.$uid#38] + | | | +-input_scan= + | | | | +-JoinScan + | | | | +-column_list=[SimpleTypesWithAnonymizationUid.int64#2, $distinct.int64#31, SimpleTypesWithAnonymizationUid.uid#34] + | | | | +-left_scan= + | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#2, uid#34], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | | | | 
+-right_scan= + | | | | | +-WithRefScan(column_list=[$distinct.int64#31], with_query_name="$public_groups0") + | | | | +-join_expr= + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | +-has_using=TRUE + | | | +-group_by_list= + | | | | +-int64_partial#37 := ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) + | | | +-aggregate_list= + | | | +-$agg1_partial#36 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | +-method="RESERVOIR" + | | +-size= + | | | +-Literal(type=INT64, value=3) + | | +-unit=ROWS + | | +-partition_by_list= + | | +-ColumnRef(type=INT64, column=$group_by.$uid#38) + | +-right_scan= + | | +-WithRefScan(column_list=[$public_groups0.int64#40], with_query_name="$public_groups0") + | +-join_expr= + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=$groupby.int64_partial#37) + | +-ColumnRef(type=INT64, column=$public_groups0.int64#40) + +-group_by_list= + | +-int64#33 := ColumnRef(type=INT64, column=$public_groups0.int64#40) + +-aggregate_list= + | +-$agg1#32 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#38) + | | +-Literal(type=INT64, value=NULL) + | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#36) + | +-Literal(type=STRUCT, value=NULL) + +-option_list= + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + +-max_groups_contributed := Literal(type=INT64, value=3) +== + +# Happy path with max_groups_contributed set via default value, GROUP BY, and +# RIGHT OUTER JOIN. 
This introduces an additional join and adds a WithScan for +# the public groups. +[default_anon_kappa_value=3] +select with differential_privacy options( + group_selection_strategy=PUBLIC_GROUPS +) count(*) +from SimpleTypesWithAnonymizationUid +right outer join (select distinct int64 from SimpleTypes) + using (int64) +group by int64; +-- +QueryStmt ++-output_column_list= +| +-$aggregate.$agg1#32 AS `$col1` [INT64] ++-query= + +-ProjectScan + +-column_list=[$aggregate.$agg1#32] + +-input_scan= + +-DifferentialPrivacyAggregateScan +-column_list=[$aggregate.$agg1#32] +-input_scan= + | +-JoinScan + | +-column_list=[SimpleTypesWithAnonymizationUid.int64#2, $distinct.int64#31] + | +-join_type=RIGHT + | +-left_scan= + | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#2], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) + | +-right_scan= + | | +-AggregateScan + | | +-column_list=[$distinct.int64#31] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.int64#14], table=SimpleTypes, column_index_list=[1]) + | | +-group_by_list= + | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) + | +-join_expr= + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE + +-group_by_list= + | +-int64#33 := ColumnRef(type=INT64, column=$distinct.int64#31) + +-aggregate_list= + | +-$agg1#32 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-Literal(type=STRUCT, value=NULL) + +-option_list= + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$aggregate.$agg1#32 AS `$col1` [INT64] ++-query= + +-ProjectScan + +-column_list=[$aggregate.$agg1#32] + +-input_scan= + +-WithScan + +-column_list=[$aggregate.$agg1#32] + 
+-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#41] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$aggregate.$agg1#32] +-input_scan= @@ -739,9 +900,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.int64#31], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-int64_partial#37 := ColumnRef(type=INT64, column=$distinct.int64#31) | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) @@ -775,6 +937,108 @@ QueryStmt +-max_groups_contributed := Literal(type=INT64, value=3) == +# Happy path with max_groups_contributed explicitly unset by the user, despite +# a default value. This should *not* add a second join.
+[default_anon_kappa_value=3] +select with differential_privacy options( + group_selection_strategy=PUBLIC_GROUPS, + max_groups_contributed=NULL +) count(*) +from SimpleTypesWithAnonymizationUid +right outer join (select distinct int64 from SimpleTypes) + using (int64) +group by int64; +-- +QueryStmt ++-output_column_list= +| +-$aggregate.$agg1#32 AS `$col1` [INT64] ++-query= + +-ProjectScan + +-column_list=[$aggregate.$agg1#32] + +-input_scan= + +-DifferentialPrivacyAggregateScan + +-column_list=[$aggregate.$agg1#32] + +-input_scan= + | +-JoinScan + | +-column_list=[SimpleTypesWithAnonymizationUid.int64#2, $distinct.int64#31] + | +-join_type=RIGHT + | +-left_scan= + | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#2], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) + | +-right_scan= + | | +-AggregateScan + | | +-column_list=[$distinct.int64#31] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.int64#14], table=SimpleTypes, column_index_list=[1]) + | | +-group_by_list= + | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) + | +-join_expr= + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE + +-group_by_list= + | +-int64#33 := ColumnRef(type=INT64, column=$distinct.int64#31) + +-aggregate_list= + | +-$agg1#32 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-Literal(type=STRUCT, value=NULL) + +-option_list= + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + +-max_groups_contributed := Literal(type=INT64, value=NULL) + + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$aggregate.$agg1#32 AS `$col1` [INT64] ++-query= + +-ProjectScan + +-column_list=[$aggregate.$agg1#32] + +-input_scan= + +-DifferentialPrivacyAggregateScan + 
+-column_list=[$aggregate.$agg1#32] + +-input_scan= + | +-AggregateScan + | +-column_list=[$aggregate.$agg1_partial#36, $groupby.int64_partial#37, $group_by.$uid#38] + | +-input_scan= + | | +-JoinScan + | | +-column_list=[SimpleTypesWithAnonymizationUid.int64#2, $distinct.int64#31, SimpleTypesWithAnonymizationUid.uid#34] + | | +-join_type=RIGHT + | | +-left_scan= + | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#2, uid#34], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | | +-right_scan= + | | | +-AggregateScan + | | | +-column_list=[$distinct.int64#31] + | | | +-input_scan= + | | | | +-TableScan(column_list=[SimpleTypes.int64#14], table=SimpleTypes, column_index_list=[1]) + | | | +-group_by_list= + | | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) + | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-has_using=TRUE + | +-group_by_list= + | | +-int64_partial#37 := ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) + | +-aggregate_list= + | +-$agg1_partial#36 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + +-group_by_list= + | +-int64#33 := ColumnRef(type=INT64, column=$groupby.int64_partial#37) + +-aggregate_list= + | +-$agg1#32 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#38) + | | +-Literal(type=INT64, value=NULL) + | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#36) + | +-Literal(type=STRUCT, value=NULL) + +-option_list= + +-group_selection_strategy := Literal(type=ENUM, 
value=PUBLIC_GROUPS) + +-max_groups_contributed := Literal(type=INT64, value=NULL) +== + # Using the uid column as a public groups column is allowed, but does not # provide any useful results. select with differential_privacy options( @@ -809,9 +1073,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE +-group_by_list= | +-uid#33 := ColumnRef(type=INT64, column=$distinct.int64#31) +-aggregate_list= @@ -828,22 +1093,22 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#32 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.int64#40] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#41], table=SimpleTypes, column_index_list=[1]) - | +-group_by_list= - | +-int64#40 := ColumnRef(type=INT64, column=SimpleTypes.int64#41) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#40] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#41], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#40 := ColumnRef(type=INT64, column=SimpleTypes.int64#41) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$aggregate.$agg1#32] +-input_scan= @@ -864,9 +1129,10 @@ 
QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.int64#31], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) - | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-uid_partial#36 := ColumnRef(type=INT64, column=$distinct.int64#31) | | | | +-$uid#37 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) @@ -935,9 +1201,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#21], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$distinct.int64#19) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$distinct.int64#19) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=$distinct.int64#19) +-aggregate_list= @@ -954,22 +1221,22 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#32 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.int64#41] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) - | +-group_by_list= - | +-int64#41 := ColumnRef(type=INT64, 
column=SimpleTypes.int64#42) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#41] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$aggregate.$agg1#32] +-input_scan= @@ -990,9 +1257,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#21, uid#34], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | +-ColumnRef(type=INT64, column=$distinct.int64#19) - | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#19) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-int64_partial#37 := ColumnRef(type=INT64, column=$distinct.int64#19) | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) @@ -1064,9 +1332,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#21], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$groupby.int64#19) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$groupby.int64#19) + | | +-ColumnRef(type=INT64, 
column=SimpleTypesWithAnonymizationUid.int64#21) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=$groupby.int64#19) +-aggregate_list= @@ -1084,25 +1353,25 @@ QueryStmt | +-$groupby.int64#33 AS int64 [INT64] | +-$aggregate.$agg1#32 AS `$col2` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.int64#33, $aggregate.$agg1#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-ProjectScan - | +-column_list=[$groupby.int64#41] - | +-input_scan= - | +-AggregateScan - | +-column_list=[$groupby.int64#41] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) - | +-group_by_list= - | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.int64#33, $aggregate.$agg1#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.int64#41] + | +-input_scan= + | +-AggregateScan + | +-column_list=[$groupby.int64#41] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$groupby.int64#33, $aggregate.$agg1#32] +-input_scan= @@ -1123,9 +1392,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#21, uid#34], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | +-ColumnRef(type=INT64, column=$groupby.int64#19) - | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | 
+-ColumnRef(type=INT64, column=$groupby.int64#19) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-int64_partial#37 := ColumnRef(type=INT64, column=$groupby.int64#19) | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) @@ -1200,9 +1470,10 @@ QueryStmt | | +-right_scan= | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#14, uid#23], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10], alias="b") | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | | +-has_using=TRUE | +-right_scan= | | +-AggregateScan | | +-column_list=[$distinct.int64#43] @@ -1211,9 +1482,10 @@ QueryStmt | | +-group_by_list= | | +-int64#43 := ColumnRef(type=INT64, column=SimpleTypes.int64#26) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#14) - | +-ColumnRef(type=INT64, column=$distinct.int64#43) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#14) + | | +-ColumnRef(type=INT64, column=$distinct.int64#43) + | +-has_using=TRUE +-group_by_list= | +-int64#45 := ColumnRef(type=INT64, column=$distinct.int64#43) +-aggregate_list= @@ -1231,22 +1503,22 @@ QueryStmt | +-$groupby.int64#45 AS int64 [INT64] | +-$aggregate.$agg1#44 AS `$col2` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.int64#45, $aggregate.$agg1#44] - +-with_entry_list= - | +-WithEntry 
- | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.int64#52] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#53], table=SimpleTypes, column_index_list=[1]) - | +-group_by_list= - | +-int64#52 := ColumnRef(type=INT64, column=SimpleTypes.int64#53) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.int64#45, $aggregate.$agg1#44] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#52] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#53], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#52 := ColumnRef(type=INT64, column=SimpleTypes.int64#53) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$groupby.int64#45, $aggregate.$agg1#44] +-input_scan= @@ -1274,15 +1546,17 @@ QueryStmt | | | | | +-right_scan= | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#14, uid#23], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10], alias="b") | | | | | +-join_expr= - | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) - | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | | | | | +-has_using=TRUE | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.int64#43], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#14) - | | | | +-ColumnRef(type=INT64, 
column=$distinct.int64#43) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#14) + | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#43) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-int64_partial#48 := ColumnRef(type=INT64, column=$distinct.int64#43) | | | | +-$uid#49 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) @@ -1350,13 +1624,14 @@ QueryStmt | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | | +-string#32 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | +-join_expr= - | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | +-ColumnRef(type=INT64, column=$distinct.int64#31) - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#32) + | | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#32) + | +-has_using=TRUE +-group_by_list= | +-int64#34 := ColumnRef(type=INT64, column=$distinct.int64#31) | +-string#35 := ColumnRef(type=STRING, column=$distinct.string#32) @@ -1374,23 +1649,23 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#33 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#33] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | 
+-with_subquery= - | +-AggregateScan - | +-column_list=$distinct.[int64#45, string#46] - | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int64#47, string#48], table=SimpleTypes, column_index_list=[1, 4]) - | +-group_by_list= - | +-int64#45 := ColumnRef(type=INT64, column=SimpleTypes.int64#47) - | +-string#46 := ColumnRef(type=STRING, column=SimpleTypes.string#48) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#33] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=$distinct.[int64#45, string#46] + | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[int64#47, string#48], table=SimpleTypes, column_index_list=[1, 4]) + | +-group_by_list= + | +-int64#45 := ColumnRef(type=INT64, column=SimpleTypes.int64#47) + | +-string#46 := ColumnRef(type=STRING, column=SimpleTypes.string#48) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$aggregate.$agg1#33] +-input_scan= @@ -1411,13 +1686,14 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=$distinct.[int64#31, string#32], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | +-ColumnRef(type=STRING, column=$distinct.string#32) + | | | | | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | | | | +-ColumnRef(type=INT64, 
column=$distinct.int64#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#32) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-int64_partial#39 := ColumnRef(type=INT64, column=$distinct.int64#31) | | | | +-string_partial#40 := ColumnRef(type=STRING, column=$distinct.string#32) @@ -1499,9 +1775,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=$distinct.int64#31) +-aggregate_list= @@ -1539,9 +1816,10 @@ QueryStmt | | | +-group_by_list= | | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-has_using=TRUE | +-group_by_list= | | +-int64_partial#37 := ColumnRef(type=INT64, column=$distinct.int64#31) | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) @@ -1593,9 +1871,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | 
+-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=$distinct.int64#31) +-aggregate_list= @@ -1701,9 +1980,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE +-group_by_list= | +-int64#34 := ColumnRef(type=INT64, column=$full_join.int64#32) +-aggregate_list= @@ -1749,9 +2029,10 @@ QueryStmt | | +-group_by_list= | | +-uid#25 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) - | +-ColumnRef(type=INT64, column=$distinct.uid#25) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | +-ColumnRef(type=INT64, column=$distinct.uid#25) + | +-has_using=TRUE +-group_by_list= | +-uid#27 := ColumnRef(type=INT64, column=$distinct.uid#25) +-aggregate_list= @@ -1910,9 +2191,10 @@ QueryStmt | | +-group_by_list= | | +-int64#26 := ColumnRef(type=INT64, column=$distinct.int64#25) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | 
+-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#26) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#26) + | +-has_using=TRUE +-group_by_list= | +-int64#28 := ColumnRef(type=INT64, column=$distinct.int64#26) +-aggregate_list= @@ -1953,9 +2235,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[SimpleTypes.int64#14], table=SimpleTypes, column_index_list=[1]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=SimpleTypes.int64#14) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=SimpleTypes.int64#14) + | +-has_using=TRUE +-group_by_list= | +-int64#32 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) +-aggregate_list= @@ -1996,9 +2279,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#20], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#20) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#20) + | +-has_using=TRUE +-group_by_list= | +-int64#32 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) +-aggregate_list= @@ -2042,9 +2326,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#20], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) | 
+-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#20) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#20) + | +-has_using=TRUE +-group_by_list= | +-int64#32 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) +-aggregate_list= @@ -2091,9 +2376,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=$distinct.int64#31) +-aggregate_list= @@ -2110,22 +2396,22 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#32 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.int64#41] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) - | +-group_by_list= - | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#41] + | +-input_scan= + | | 
+-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$aggregate.$agg1#32] +-input_scan= @@ -2146,9 +2432,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.int64#31], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-int64_partial#37 := ColumnRef(type=INT64, column=$distinct.int64#31) | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) @@ -2234,22 +2521,22 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#32 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.int64#41] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) - | +-group_by_list= - | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#41] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#42], table=SimpleTypes, column_index_list=[1]) + | 
+-group_by_list= + | +-int64#41 := ColumnRef(type=INT64, column=SimpleTypes.int64#42) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$aggregate.$agg1#32] +-input_scan= @@ -2373,28 +2660,28 @@ QueryStmt | +-$groupby.public_dayofweek#16 AS public_dayofweek [INT64] | +-$aggregate.$agg1#15 AS `$col2` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.public_dayofweek#16, $aggregate.$agg1#15] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.public_dayofweek#24] - | +-input_scan= - | | +-ArrayScan - | | +-column_list=[$array.public_dayofweek#25] - | | +-array_expr_list= - | | | +-FunctionCall(ZetaSQL:generate_array(INT64, INT64, optional(0) INT64) -> ARRAY) - | | | +-Literal(type=INT64, value=1) - | | | +-Literal(type=INT64, value=7) - | | +-element_column_list=[$array.public_dayofweek#25] - | +-group_by_list= - | +-public_dayofweek#24 := ColumnRef(type=INT64, column=$array.public_dayofweek#25) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.public_dayofweek#16, $aggregate.$agg1#15] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.public_dayofweek#24] + | +-input_scan= + | | +-ArrayScan + | | +-column_list=[$array.public_dayofweek#25] + | | +-array_expr_list= + | | | +-FunctionCall(ZetaSQL:generate_array(INT64, INT64, optional(0) INT64) -> ARRAY) + | | | +-Literal(type=INT64, value=1) + | | | +-Literal(type=INT64, value=7) + | | +-element_column_list=[$array.public_dayofweek#25] + | +-group_by_list= + | +-public_dayofweek#24 := ColumnRef(type=INT64, column=$array.public_dayofweek#25) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$groupby.public_dayofweek#16, $aggregate.$agg1#15] +-input_scan= @@ -2523,28 +2810,28 @@ QueryStmt | +-$groupby.public_dayofweek#16 AS public_dayofweek 
[INT64] | +-$aggregate.$agg1#15 AS `$col2` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.public_dayofweek#16, $aggregate.$agg1#15] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.public_dayofweek#24] - | +-input_scan= - | | +-ArrayScan - | | +-column_list=[$array.public_dayofweek#25] - | | +-array_expr_list= - | | | +-FunctionCall(ZetaSQL:generate_array(INT64, INT64, optional(0) INT64) -> ARRAY) - | | | +-Literal(type=INT64, value=1) - | | | +-Literal(type=INT64, value=7) - | | +-element_column_list=[$array.public_dayofweek#25] - | +-group_by_list= - | +-public_dayofweek#24 := ColumnRef(type=INT64, column=$array.public_dayofweek#25) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.public_dayofweek#16, $aggregate.$agg1#15] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.public_dayofweek#24] + | +-input_scan= + | | +-ArrayScan + | | +-column_list=[$array.public_dayofweek#25] + | | +-array_expr_list= + | | | +-FunctionCall(ZetaSQL:generate_array(INT64, INT64, optional(0) INT64) -> ARRAY) + | | | +-Literal(type=INT64, value=1) + | | | +-Literal(type=INT64, value=7) + | | +-element_column_list=[$array.public_dayofweek#25] + | +-group_by_list= + | +-public_dayofweek#24 := ColumnRef(type=INT64, column=$array.public_dayofweek#25) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$groupby.public_dayofweek#16, $aggregate.$agg1#15] +-input_scan= @@ -2642,9 +2929,10 @@ QueryStmt | | +-group_by_list= | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | 
+-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | +-has_using=TRUE +-group_by_list= | +-$groupbycol1#33 := | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) @@ -2701,9 +2989,10 @@ QueryStmt | | | +-group_by_list= | | | +-int64#31 := ColumnRef(type=INT64, column=SimpleTypes.int64#14) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | +-has_using=TRUE | +-right_scan= | | +-AggregateScan | | +-column_list=[$distinct.string#50] @@ -2712,9 +3001,10 @@ QueryStmt | | +-group_by_list= | | +-string#50 := ColumnRef(type=STRING, column=SimpleTypes.string#36) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#50) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#50) + | +-has_using=TRUE +-group_by_list= | +-int64#52 := ColumnRef(type=INT64, column=$distinct.int64#31) | +-string#53 := ColumnRef(type=STRING, column=$distinct.string#50) @@ -2732,31 +3022,31 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#51 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#51] - +-with_entry_list= - | +-WithEntry - | | +-with_query_name="$public_groups0" - | | +-with_subquery= - | | +-AggregateScan - | | +-column_list=[$distinct.int64#63] - | | 
+-input_scan= - | | | +-TableScan(column_list=[SimpleTypes.int64#64], table=SimpleTypes, column_index_list=[1]) - | | +-group_by_list= - | | +-int64#63 := ColumnRef(type=INT64, column=SimpleTypes.int64#64) - | +-WithEntry - | +-with_query_name="$public_groups1" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.string#65] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.string#66], table=SimpleTypes, column_index_list=[4]) - | +-group_by_list= - | +-string#65 := ColumnRef(type=STRING, column=SimpleTypes.string#66) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#51] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="$public_groups0" + | | +-with_subquery= + | | +-AggregateScan + | | +-column_list=[$distinct.int64#63] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.int64#64], table=SimpleTypes, column_index_list=[1]) + | | +-group_by_list= + | | +-int64#63 := ColumnRef(type=INT64, column=SimpleTypes.int64#64) + | +-WithEntry + | +-with_query_name="$public_groups1" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.string#65] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.string#66], table=SimpleTypes, column_index_list=[4]) + | +-group_by_list= + | +-string#65 := ColumnRef(type=STRING, column=SimpleTypes.string#66) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$aggregate.$agg1#51] +-input_scan= @@ -2784,15 +3074,17 @@ QueryStmt | | | | | | +-right_scan= | | | | | | | +-WithRefScan(column_list=[$distinct.int64#31], with_query_name="$public_groups0") | | | | | | +-join_expr= - | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | +-ColumnRef(type=INT64, 
column=SimpleTypesWithAnonymizationUid.int64#2) + | | | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#31) + | | | | | | +-has_using=TRUE | | | | | +-right_scan= | | | | | | +-WithRefScan(column_list=[$distinct.string#50], with_query_name="$public_groups1") | | | | | +-join_expr= - | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | | +-ColumnRef(type=STRING, column=$distinct.string#50) + | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#50) + | | | | | +-has_using=TRUE | | | | +-group_by_list= | | | | | +-int64_partial#57 := ColumnRef(type=INT64, column=$distinct.int64#31) | | | | | +-string_partial#58 := ColumnRef(type=STRING, column=$distinct.string#50) @@ -2874,9 +3166,10 @@ QueryStmt | | +-right_scan= | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#21, string#24], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 4]) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=$distinct.int64#19) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$distinct.int64#19) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | +-has_using=TRUE | +-right_scan= | | +-AggregateScan | | +-column_list=[$distinct.string#50] @@ -2885,9 +3178,10 @@ QueryStmt | | +-group_by_list= | | +-string#50 := ColumnRef(type=STRING, column=SimpleTypes.string#36) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#24) - | +-ColumnRef(type=STRING, column=$distinct.string#50) 
+ | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#24) + | | +-ColumnRef(type=STRING, column=$distinct.string#50) + | +-has_using=TRUE +-group_by_list= | +-int64#52 := ColumnRef(type=INT64, column=$distinct.int64#19) | +-string#53 := ColumnRef(type=STRING, column=$distinct.string#50) @@ -2905,31 +3199,31 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#51 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#51] - +-with_entry_list= - | +-WithEntry - | | +-with_query_name="$public_groups0" - | | +-with_subquery= - | | +-AggregateScan - | | +-column_list=[$distinct.int64#63] - | | +-input_scan= - | | | +-TableScan(column_list=[SimpleTypes.int64#64], table=SimpleTypes, column_index_list=[1]) - | | +-group_by_list= - | | +-int64#63 := ColumnRef(type=INT64, column=SimpleTypes.int64#64) - | +-WithEntry - | +-with_query_name="$public_groups1" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.string#65] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.string#66], table=SimpleTypes, column_index_list=[4]) - | +-group_by_list= - | +-string#65 := ColumnRef(type=STRING, column=SimpleTypes.string#66) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#51] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="$public_groups0" + | | +-with_subquery= + | | +-AggregateScan + | | +-column_list=[$distinct.int64#63] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.int64#64], table=SimpleTypes, column_index_list=[1]) + | | +-group_by_list= + | | +-int64#63 := ColumnRef(type=INT64, column=SimpleTypes.int64#64) + | +-WithEntry + | +-with_query_name="$public_groups1" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.string#65] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.string#66], table=SimpleTypes, 
column_index_list=[4]) + | +-group_by_list= + | +-string#65 := ColumnRef(type=STRING, column=SimpleTypes.string#66) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$aggregate.$agg1#51] +-input_scan= @@ -2957,15 +3251,17 @@ QueryStmt | | | | | | +-right_scan= | | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#21, string#24, uid#54], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 4, 10]) | | | | | | +-join_expr= - | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#19) - | | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#19) + | | | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | | | | +-has_using=TRUE | | | | | +-right_scan= | | | | | | +-WithRefScan(column_list=[$distinct.string#50], with_query_name="$public_groups1") | | | | | +-join_expr= - | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#24) - | | | | | +-ColumnRef(type=STRING, column=$distinct.string#50) + | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#24) + | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#50) + | | | | | +-has_using=TRUE | | | | +-group_by_list= | | | | | +-int64_partial#57 := ColumnRef(type=INT64, column=$distinct.int64#19) | | | | | +-string_partial#58 := ColumnRef(type=STRING, column=$distinct.string#50) @@ -3055,13 +3351,15 @@ QueryStmt | | +-right_scan= | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#40, string#43], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 4]) | | +-join_expr= - | | 
+-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=$distinct.string#38) - | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#43) + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$distinct.string#38) + | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#43) + | | +-has_using=TRUE | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$distinct.int64#19) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#40) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$distinct.int64#19) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#40) + | +-has_using=TRUE +-group_by_list= | +-int64#52 := ColumnRef(type=INT64, column=$distinct.int64#19) | +-string#53 := ColumnRef(type=STRING, column=$distinct.string#38) @@ -3079,31 +3377,31 @@ QueryStmt +-output_column_list= | +-$aggregate.$agg1#51 AS `$col1` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$aggregate.$agg1#51] - +-with_entry_list= - | +-WithEntry - | | +-with_query_name="$public_groups0" - | | +-with_subquery= - | | +-AggregateScan - | | +-column_list=[$distinct.string#63] - | | +-input_scan= - | | | +-TableScan(column_list=[SimpleTypes.string#64], table=SimpleTypes, column_index_list=[4]) - | | +-group_by_list= - | | +-string#63 := ColumnRef(type=STRING, column=SimpleTypes.string#64) - | +-WithEntry - | +-with_query_name="$public_groups1" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.int64#65] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.int64#66], table=SimpleTypes, column_index_list=[1]) - | +-group_by_list= - | +-int64#65 := ColumnRef(type=INT64, column=SimpleTypes.int64#66) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$aggregate.$agg1#51] - +-input_scan= + 
+-with_entry_list= + | +-WithEntry + | | +-with_query_name="$public_groups0" + | | +-with_subquery= + | | +-AggregateScan + | | +-column_list=[$distinct.string#63] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.string#64], table=SimpleTypes, column_index_list=[4]) + | | +-group_by_list= + | | +-string#63 := ColumnRef(type=STRING, column=SimpleTypes.string#64) + | +-WithEntry + | +-with_query_name="$public_groups1" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#65] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#66], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#65 := ColumnRef(type=INT64, column=SimpleTypes.int64#66) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$aggregate.$agg1#51] +-input_scan= @@ -3133,13 +3431,15 @@ QueryStmt | | | | | | +-right_scan= | | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#40, string#43, uid#54], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 4, 10]) | | | | | | +-join_expr= - | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#38) - | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#43) + | | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#38) + | | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#43) + | | | | | | +-has_using=TRUE | | | | | +-join_expr= - | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#19) - | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#40) + | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#19) + | | | | | | +-ColumnRef(type=INT64, 
column=SimpleTypesWithAnonymizationUid.int64#40) + | | | | | +-has_using=TRUE | | | | +-group_by_list= | | | | | +-int64_partial#57 := ColumnRef(type=INT64, column=$distinct.int64#19) | | | | | +-string_partial#58 := ColumnRef(type=STRING, column=$distinct.string#38) @@ -3254,9 +3554,10 @@ QueryStmt | | +-group_by_list= | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) | +-string#34 := ColumnRef(type=STRING, column=$distinct.string#31) @@ -3314,9 +3615,10 @@ QueryStmt | | | +-group_by_list= | | | +-int64#25 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#14) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | | +-ColumnRef(type=INT64, column=$distinct.int64#25) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | +-ColumnRef(type=INT64, column=$distinct.int64#25) + | | +-has_using=TRUE | +-right_scan= | | +-AggregateScan | | +-column_list=[$distinct.string#44] @@ -3325,9 +3627,10 @@ QueryStmt | | +-group_by_list= | | +-string#44 := ColumnRef(type=STRING, column=SimpleTypes.string#30) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#44) + | | 
+-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#44) + | +-has_using=TRUE +-group_by_list= | +-int64#46 := ColumnRef(type=INT64, column=$distinct.int64#25) | +-string#47 := ColumnRef(type=STRING, column=$distinct.string#44) @@ -3386,9 +3689,10 @@ QueryStmt | +-right_scan= | | +-WithRefScan(column_list=[public_int64s.int64#32], with_query_name="public_int64s") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) - | +-ColumnRef(type=INT64, column=public_int64s.int64#32) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | +-ColumnRef(type=INT64, column=public_int64s.int64#32) + | +-has_using=TRUE +-group_by_list= | +-int64#34 := ColumnRef(type=INT64, column=public_int64s.int64#32) +-aggregate_list= @@ -3441,9 +3745,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[public_int64s.int64#32], with_query_name="public_int64s") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) - | | | | +-ColumnRef(type=INT64, column=public_int64s.int64#32) + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#21) + | | | | | +-ColumnRef(type=INT64, column=public_int64s.int64#32) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-int64_partial#38 := ColumnRef(type=INT64, column=public_int64s.int64#32) | | | | +-$uid#39 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#35) @@ -3519,9 +3824,10 @@ QueryStmt | +-right_scan= | | +-WithRefScan(column_list=[public_int64s.int64#31], with_query_name="public_int64s") | 
+-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#20) - | +-ColumnRef(type=INT64, column=public_int64s.int64#31) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#20) + | | +-ColumnRef(type=INT64, column=public_int64s.int64#31) + | +-has_using=TRUE +-group_by_list= | +-int64#33 := ColumnRef(type=INT64, column=public_int64s.int64#31) +-aggregate_list= @@ -3569,9 +3875,10 @@ QueryStmt | | +-group_by_list= | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | +-has_using=TRUE +-group_by_list= | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) +-aggregate_list= @@ -3590,22 +3897,22 @@ QueryStmt | +-$groupby.string#33 AS string [STRING] | +-$aggregate.$agg1#32 AS `$col2` [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.string#33, $aggregate.$agg1#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.string#41] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.string#42], table=SimpleTypes, column_index_list=[4]) - | +-group_by_list= - | +-string#41 := ColumnRef(type=STRING, column=SimpleTypes.string#42) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.string#33, $aggregate.$agg1#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | 
+-AggregateScan + | +-column_list=[$distinct.string#41] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.string#42], table=SimpleTypes, column_index_list=[4]) + | +-group_by_list= + | +-string#41 := ColumnRef(type=STRING, column=SimpleTypes.string#42) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$groupby.string#33, $aggregate.$agg1#32] +-input_scan= @@ -3626,9 +3933,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-string_partial#37 := ColumnRef(type=STRING, column=$distinct.string#31) | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) @@ -3706,15 +4014,17 @@ QueryStmt | | | +-group_by_list= | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-has_using=TRUE | +-right_scan= | | +-TableScan(column_list=[SimpleTypes.string#36], table=SimpleTypes, column_index_list=[4]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, 
column=$distinct.string#31) - | +-ColumnRef(type=STRING, column=SimpleTypes.string#36) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-ColumnRef(type=STRING, column=SimpleTypes.string#36) + | +-has_using=TRUE +-group_by_list= | +-string#51 := ColumnRef(type=STRING, column=$distinct.string#31) +-aggregate_list= @@ -3770,9 +4080,10 @@ QueryStmt | | | +-group_by_list= | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-has_using=TRUE | +-filter_expr= | +-FunctionCall(ZetaSQL:starts_with(STRING, STRING) -> BOOL) | +-ColumnRef(type=STRING, column=$distinct.string#31) @@ -3831,9 +4142,10 @@ QueryStmt | | | +-group_by_list= | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-has_using=TRUE | +-group_by_list= | | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) | +-aggregate_list= @@ -3855,83 +4167,84 @@ QueryStmt | +-$groupby.string#33 AS string [STRING] | +-$aggregate.anon_users#32 AS anon_users [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.string#33, 
$aggregate.anon_users#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.string#41] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.string#42], table=SimpleTypes, column_index_list=[4]) - | +-group_by_list= - | +-string#41 := ColumnRef(type=STRING, column=SimpleTypes.string#42) - +-query= - +-ProjectScan + +-input_scan= + +-FilterScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] +-input_scan= - +-FilterScan - +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-input_scan= - | +-DifferentialPrivacyAggregateScan - | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - | +-input_scan= - | | +-JoinScan - | | +-column_list=[$public_groups0.string#40, $aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] - | | +-join_type=RIGHT - | | +-left_scan= - | | | +-SampleScan - | | | +-column_list=[$aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] - | | | +-input_scan= - | | | | +-AggregateScan - | | | | +-column_list=[$aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] - | | | | +-input_scan= - | | | | | +-JoinScan - | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31, SimpleTypesWithAnonymizationUid.uid#34] - | | | | | +-left_scan= - | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#5, uid#34], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) - | | | | | +-right_scan= - | | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") - | | | | | +-join_expr= - | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) - | | | | +-group_by_list= - | | | | | +-string_partial#37 := 
ColumnRef(type=STRING, column=$distinct.string#31) - | | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) - | | | | +-aggregate_list= - | | | | +-anon_users_partial#36 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) - | | | +-method="RESERVOIR" - | | | +-size= - | | | | +-Literal(type=INT64, value=3) - | | | +-unit=ROWS - | | | +-partition_by_list= - | | | +-ColumnRef(type=INT64, column=$group_by.$uid#38) - | | +-right_scan= - | | | +-WithRefScan(column_list=[$public_groups0.string#40], with_query_name="$public_groups0") - | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=$groupby.string_partial#37) - | | +-ColumnRef(type=STRING, column=$public_groups0.string#40) - | +-group_by_list= - | | +-string#33 := ColumnRef(type=STRING, column=$public_groups0.string#40) - | +-aggregate_list= - | | +-anon_users#32 := - | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) - | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) - | | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) - | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#38) - | | | +-Literal(type=INT64, value=NULL) - | | | +-ColumnRef(type=INT64, column=$aggregate.anon_users_partial#36) - | | +-Literal(type=STRUCT, value=NULL) - | +-option_list= - | +-max_groups_contributed := Literal(type=INT64, value=3) - | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) - +-filter_expr= - +-FunctionCall(ZetaSQL:starts_with(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, column=$groupby.string#33) - +-Literal(type=STRING, value="abc") + | +-WithScan + | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="$public_groups0" + | | +-with_subquery= + | | +-AggregateScan + | | +-column_list=[$distinct.string#41] + | | +-input_scan= + | | | 
+-TableScan(column_list=[SimpleTypes.string#42], table=SimpleTypes, column_index_list=[4]) + | | +-group_by_list= + | | +-string#41 := ColumnRef(type=STRING, column=SimpleTypes.string#42) + | +-query= + | +-DifferentialPrivacyAggregateScan + | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | +-input_scan= + | | +-JoinScan + | | +-column_list=[$public_groups0.string#40, $aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] + | | +-join_type=RIGHT + | | +-left_scan= + | | | +-SampleScan + | | | +-column_list=[$aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] + | | | +-input_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] + | | | | +-input_scan= + | | | | | +-JoinScan + | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31, SimpleTypesWithAnonymizationUid.uid#34] + | | | | | +-left_scan= + | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#5, uid#34], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) + | | | | | +-right_scan= + | | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") + | | | | | +-join_expr= + | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-has_using=TRUE + | | | | +-group_by_list= + | | | | | +-string_partial#37 := ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) + | | | | +-aggregate_list= + | | | | +-anon_users_partial#36 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | +-method="RESERVOIR" + | | | +-size= + | | | | +-Literal(type=INT64, value=3) + | | | +-unit=ROWS + | | | 
+-partition_by_list= + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#38) + | | +-right_scan= + | | | +-WithRefScan(column_list=[$public_groups0.string#40], with_query_name="$public_groups0") + | | +-join_expr= + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$groupby.string_partial#37) + | | +-ColumnRef(type=STRING, column=$public_groups0.string#40) + | +-group_by_list= + | | +-string#33 := ColumnRef(type=STRING, column=$public_groups0.string#40) + | +-aggregate_list= + | | +-anon_users#32 := + | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#38) + | | | +-Literal(type=INT64, value=NULL) + | | | +-ColumnRef(type=INT64, column=$aggregate.anon_users_partial#36) + | | +-Literal(type=STRUCT, value=NULL) + | +-option_list= + | +-max_groups_contributed := Literal(type=INT64, value=3) + | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + +-filter_expr= + +-FunctionCall(ZetaSQL:starts_with(STRING, STRING) -> BOOL) + +-ColumnRef(type=STRING, column=$groupby.string#33) + +-Literal(type=STRING, value="abc") == # Filter scans in input scans of public group joins are allowed. 
@@ -3987,9 +4300,10 @@ QueryStmt | | +-group_by_list= | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | +-has_using=TRUE +-group_by_list= | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) +-aggregate_list= @@ -4007,29 +4321,29 @@ QueryStmt | +-$groupby.string#33 AS string [STRING] | +-$aggregate.anon_users#32 AS anon_users [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.string#40] - | +-input_scan= - | | +-FilterScan - | | +-column_list=[SimpleTypes.string#41] - | | +-input_scan= - | | | +-TableScan(column_list=[SimpleTypes.string#41], table=SimpleTypes, column_index_list=[4]) - | | +-filter_expr= - | | +-FunctionCall(ZetaSQL:starts_with(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=SimpleTypes.string#41) - | | +-Literal(type=STRING, value="abc") - | +-group_by_list= - | +-string#40 := ColumnRef(type=STRING, column=SimpleTypes.string#41) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.string#40] + | +-input_scan= + | | +-FilterScan + | | +-column_list=[SimpleTypes.string#41] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.string#41], table=SimpleTypes, 
column_index_list=[4]) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:starts_with(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypes.string#41) + | | +-Literal(type=STRING, value="abc") + | +-group_by_list= + | +-string#40 := ColumnRef(type=STRING, column=SimpleTypes.string#41) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] +-input_scan= @@ -4060,9 +4374,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-string_partial#36 := ColumnRef(type=STRING, column=$distinct.string#31) | | | | +-$uid#37 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) @@ -4141,9 +4456,10 @@ QueryStmt | | | +-group_by_list= | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-has_using=TRUE | +-limit= | +-Literal(type=INT64, value=10) +-group_by_list= @@ -4209,9 +4525,10 @@ QueryStmt | | +-limit= | | +-Literal(type=INT64, value=11) | +-join_expr= - | 
+-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | +-has_using=TRUE +-group_by_list= | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) +-aggregate_list= @@ -4229,27 +4546,27 @@ QueryStmt | +-$groupby.string#33 AS string [STRING] | +-$aggregate.anon_users#32 AS anon_users [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-LimitOffsetScan - | +-column_list=[$distinct.string#40] - | +-input_scan= - | | +-AggregateScan - | | +-column_list=[$distinct.string#40] - | | +-input_scan= - | | | +-TableScan(column_list=[SimpleTypes.string#41], table=SimpleTypes, column_index_list=[4]) - | | +-group_by_list= - | | +-string#40 := ColumnRef(type=STRING, column=SimpleTypes.string#41) - | +-limit= - | +-Literal(type=INT64, value=11) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-LimitOffsetScan + | +-column_list=[$distinct.string#40] + | +-input_scan= + | | +-AggregateScan + | | +-column_list=[$distinct.string#40] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.string#41], table=SimpleTypes, column_index_list=[4]) + | | +-group_by_list= + | | +-string#40 := ColumnRef(type=STRING, column=SimpleTypes.string#41) + | +-limit= + | +-Literal(type=INT64, value=11) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] 
+-input_scan= @@ -4278,9 +4595,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-string_partial#36 := ColumnRef(type=STRING, column=$distinct.string#31) | | | | +-$uid#37 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) @@ -4314,7 +4632,7 @@ QueryStmt +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) == -[language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,TABLESAMPLE] +[language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,V_1_1_WITH_ON_SUBQUERY,TABLESAMPLE] # Sample scans between public groups join and aggregation are not allowed. 
SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS ( max_groups_contributed=3, @@ -4360,9 +4678,10 @@ QueryStmt | | | +-group_by_list= | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-has_using=TRUE | +-method="bernoulli" | +-size= | | +-Literal(type=INT64, value=1) @@ -4381,7 +4700,7 @@ QueryStmt Rewrite ERROR: group_selection_strategy = PUBLIC_GROUPS does not allow operations between the public groups join and the aggregation, because they could suppress public groups from the result. Try moving the operation to an input subquery of the public groups join. == -[language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,TABLESAMPLE] +[language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,V_1_1_WITH_ON_SUBQUERY,TABLESAMPLE] # Sample scans in input to public groups joins are allowed. 
SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS ( max_groups_contributed=3, @@ -4439,9 +4758,10 @@ QueryStmt | | | +-Literal(type=INT64, value=2) | | +-unit=PERCENT | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | +-has_using=TRUE +-group_by_list= | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) +-aggregate_list= @@ -4459,29 +4779,29 @@ QueryStmt | +-$groupby.string#33 AS string [STRING] | +-$aggregate.anon_users#32 AS anon_users [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-SampleScan - | +-column_list=[$distinct.string#40] - | +-input_scan= - | | +-AggregateScan - | | +-column_list=[$distinct.string#40] - | | +-input_scan= - | | | +-TableScan(column_list=[SimpleTypes.string#41], table=SimpleTypes, column_index_list=[4]) - | | +-group_by_list= - | | +-string#40 := ColumnRef(type=STRING, column=SimpleTypes.string#41) - | +-method="bernoulli" - | +-size= - | | +-Literal(type=INT64, value=2) - | +-unit=PERCENT - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-SampleScan + | +-column_list=[$distinct.string#40] + | +-input_scan= + | | +-AggregateScan + | | +-column_list=[$distinct.string#40] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.string#41], table=SimpleTypes, column_index_list=[4]) + | | +-group_by_list= + | | +-string#40 := 
ColumnRef(type=STRING, column=SimpleTypes.string#41) + | +-method="bernoulli" + | +-size= + | | +-Literal(type=INT64, value=2) + | +-unit=PERCENT + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] +-input_scan= @@ -4512,9 +4832,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-string_partial#36 := ColumnRef(type=STRING, column=$distinct.string#31) | | | | +-$uid#37 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) @@ -4597,9 +4918,10 @@ QueryStmt | | +-group_by_list= | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | +-has_using=TRUE +-group_by_list= | +-string#34 := ColumnRef(type=STRING, column=$distinct.string#31) +-aggregate_list= @@ -4656,9 +4978,10 @@ QueryStmt | | +-group_by_list= | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, 
column=SimpleTypesWithAnonymizationUid.string#5) - | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | +-has_using=TRUE +-group_by_list= | +-string2#33 := ColumnRef(type=STRING, column=$distinct.string#31) +-aggregate_list= @@ -4676,22 +4999,22 @@ QueryStmt | +-$groupby.string2#33 AS string2 [STRING] | +-$aggregate.anon_users#32 AS anon_users [INT64] +-query= - +-WithScan + +-ProjectScan +-column_list=[$groupby.string2#33, $aggregate.anon_users#32] - +-with_entry_list= - | +-WithEntry - | +-with_query_name="$public_groups0" - | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.string#40] - | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.string#41], table=SimpleTypes, column_index_list=[4]) - | +-group_by_list= - | +-string#40 := ColumnRef(type=STRING, column=SimpleTypes.string#41) - +-query= - +-ProjectScan + +-input_scan= + +-WithScan +-column_list=[$groupby.string2#33, $aggregate.anon_users#32] - +-input_scan= + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.string#40] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.string#41], table=SimpleTypes, column_index_list=[4]) + | +-group_by_list= + | +-string#40 := ColumnRef(type=STRING, column=SimpleTypes.string#41) + +-query= +-DifferentialPrivacyAggregateScan +-column_list=[$groupby.string2#33, $aggregate.anon_users#32] +-input_scan= @@ -4715,9 +5038,10 @@ QueryStmt | | | | +-right_scan= | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | 
+-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-string2_partial#36 := ColumnRef(type=STRING, column=$distinct.string#31) | | | | +-$uid#37 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) @@ -4801,18 +5125,20 @@ QueryStmt | | | | +-group_by_list= | | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | | +-join_expr= - | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-has_using=TRUE | | +-right_scan= | | | +-ProjectScan | | | +-column_list=[SimpleTypes.int32#32] | | | +-input_scan= | | | +-TableScan(column_list=[SimpleTypes.int32#32], table=SimpleTypes, column_index_list=[0]) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) - | | +-ColumnRef(type=INT32, column=SimpleTypesWithAnonymizationUid.int32#1) - | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#32) + | | | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | | | +-ColumnRef(type=INT32, column=SimpleTypesWithAnonymizationUid.int32#1) + | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#32) + | | +-has_using=TRUE | +-right_scan= | | +-AggregateScan | | +-column_list=[$distinct.int64#68] @@ -4821,9 +5147,10 @@ QueryStmt | | +-group_by_list= | | +-int64#68 := ColumnRef(type=INT64, column=SimpleTypes.int64#51) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | 
+-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#68) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#68) + | +-has_using=TRUE +-group_by_list= | +-string#70 := ColumnRef(type=STRING, column=$distinct.string#31) | +-int64#71 := ColumnRef(type=INT64, column=$distinct.int64#68) @@ -4890,18 +5217,20 @@ QueryStmt | | | | +-group_by_list= | | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | | +-join_expr= - | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-has_using=TRUE | | +-right_scan= | | | +-ProjectScan | | | +-column_list=[SimpleTypes.int32#32] | | | +-input_scan= | | | +-TableScan(column_list=[SimpleTypes.int32#32], table=SimpleTypes, column_index_list=[0]) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) - | | +-ColumnRef(type=INT32, column=SimpleTypesWithAnonymizationUid.int32#1) - | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#32) + | | | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | | | +-ColumnRef(type=INT32, column=SimpleTypesWithAnonymizationUid.int32#1) + | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#32) + | | +-has_using=TRUE | +-right_scan= | | +-AggregateScan | | +-column_list=[$distinct.int64#68] @@ -4910,9 +5239,10 @@ QueryStmt | | +-group_by_list= | | +-int64#68 := ColumnRef(type=INT64, column=SimpleTypes.int64#51) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | 
+-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) - | +-ColumnRef(type=INT64, column=$distinct.int64#68) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-ColumnRef(type=INT64, column=$distinct.int64#68) + | +-has_using=TRUE +-group_by_list= | +-string#70 := ColumnRef(type=STRING, column=$distinct.string#31) | +-int64#71 := ColumnRef(type=INT64, column=$distinct.int64#68) @@ -4928,7 +5258,7 @@ QueryStmt Rewrite ERROR: group_selection_strategy = PUBLIC_GROUPS does not allow JOIN operations between the public groups join and the aggregation, because they could suppress public groups from the result. Try moving the operation to an input subquery of the public groups join. == -[language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,TABLESAMPLE] +[language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,V_1_1_WITH_ON_SUBQUERY,TABLESAMPLE] # Forbidden operations after public groups joins are allowed outside of the # dp aggregate scan. 
SELECT string, anon_users @@ -4985,9 +5315,10 @@ QueryStmt | | | | | +-group_by_list= | | | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | +-has_using=TRUE | | | +-group_by_list= | | | | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) | | | +-aggregate_list= @@ -5015,95 +5346,1704 @@ QueryStmt | +-$groupby.string#33 AS string [STRING] | +-$aggregate.anon_users#32 AS anon_users [INT64] +-query= - +-WithScan + +-LimitOffsetScan +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + +-input_scan= + | +-ProjectScan + | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | +-input_scan= + | +-FilterScan + | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | +-input_scan= + | | +-SampleScan + | | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | | | +-input_scan= + | | | +-WithScan + | | | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | | | +-with_entry_list= + | | | | +-WithEntry + | | | | +-with_query_name="$public_groups0" + | | | | +-with_subquery= + | | | | +-AggregateScan + | | | | +-column_list=[$distinct.string#41] + | | | | +-input_scan= + | | | | | +-TableScan(column_list=[SimpleTypes.string#42], table=SimpleTypes, column_index_list=[4]) + | | | | +-group_by_list= + | | | | +-string#41 := ColumnRef(type=STRING, column=SimpleTypes.string#42) + | | | +-query= + | | | 
+-DifferentialPrivacyAggregateScan + | | | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | | | +-input_scan= + | | | | +-JoinScan + | | | | +-column_list=[$public_groups0.string#40, $aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] + | | | | +-join_type=RIGHT + | | | | +-left_scan= + | | | | | +-SampleScan + | | | | | +-column_list=[$aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] + | | | | | +-input_scan= + | | | | | | +-AggregateScan + | | | | | | +-column_list=[$aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] + | | | | | | +-input_scan= + | | | | | | | +-JoinScan + | | | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31, SimpleTypesWithAnonymizationUid.uid#34] + | | | | | | | +-left_scan= + | | | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#5, uid#34], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) + | | | | | | | +-right_scan= + | | | | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") + | | | | | | | +-join_expr= + | | | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | | | +-has_using=TRUE + | | | | | | +-group_by_list= + | | | | | | | +-string_partial#37 := ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) + | | | | | | +-aggregate_list= + | | | | | | +-anon_users_partial#36 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | | | +-method="RESERVOIR" + | | | | | +-size= + | | | | | | +-Literal(type=INT64, value=3) + | | | | | +-unit=ROWS + | | | | | +-partition_by_list= + | | | | | +-ColumnRef(type=INT64, 
column=$group_by.$uid#38) + | | | | +-right_scan= + | | | | | +-WithRefScan(column_list=[$public_groups0.string#40], with_query_name="$public_groups0") + | | | | +-join_expr= + | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=$groupby.string_partial#37) + | | | | +-ColumnRef(type=STRING, column=$public_groups0.string#40) + | | | +-group_by_list= + | | | | +-string#33 := ColumnRef(type=STRING, column=$public_groups0.string#40) + | | | +-aggregate_list= + | | | | +-anon_users#32 := + | | | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#38) + | | | | | +-Literal(type=INT64, value=NULL) + | | | | | +-ColumnRef(type=INT64, column=$aggregate.anon_users_partial#36) + | | | | +-Literal(type=STRUCT, value=NULL) + | | | +-option_list= + | | | +-max_groups_contributed := Literal(type=INT64, value=3) + | | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | | +-method="bernoulli" + | | +-size= + | | | +-Literal(type=INT64, value=1) + | | +-unit=PERCENT + | +-filter_expr= + | +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=$aggregate.anon_users#32) + | +-Literal(type=INT64, value=10) + +-limit= + +-Literal(type=INT64, value=11) +== + +# Multiple dp aggregate scans. 
+WITH + with1 AS ( + SELECT + WITH DIFFERENTIAL_PRIVACY OPTIONS (max_groups_contributed=3) + int64 + FROM SimpleTypesWithAnonymizationUid + GROUP BY int64), + with2 AS ( + SELECT WITH DIFFERENTIAL_PRIVACY + int64 + FROM SimpleTypesWithAnonymizationUid + GROUP BY int64) +SELECT + WITH DIFFERENTIAL_PRIVACY OPTIONS ( + max_groups_contributed = 5, + group_selection_strategy = PUBLIC_GROUPS) + string, + COUNT(*) +FROM SimpleTypesWithAnonymizationUid +RIGHT OUTER JOIN (SELECT DISTINCT string FROM SimpleTypes) + USING (string) +GROUP BY string +-- + + +QueryStmt ++-output_column_list= +| +-$groupby.string#59 AS string [STRING] +| +-$aggregate.$agg1#58 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=[$groupby.string#59, $aggregate.$agg1#58] +-with_entry_list= | +-WithEntry - | +-with_query_name="$public_groups0" + | | +-with_query_name="with1" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | +-DifferentialPrivacyAggregateScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#2], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) + | | +-group_by_list= + | | | +-int64#13 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=3) + | +-WithEntry + | +-with_query_name="with2" | +-with_subquery= - | +-AggregateScan - | +-column_list=[$distinct.string#41] + | +-ProjectScan + | +-column_list=[$groupby.int64#26] | +-input_scan= - | | +-TableScan(column_list=[SimpleTypes.string#42], table=SimpleTypes, column_index_list=[4]) - | +-group_by_list= - | +-string#41 := ColumnRef(type=STRING, column=SimpleTypes.string#42) + | +-DifferentialPrivacyAggregateScan + | +-column_list=[$groupby.int64#26] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#15], table=SimpleTypesWithAnonymizationUid, 
column_index_list=[1]) + | +-group_by_list= + | +-int64#26 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#15) +-query= - +-LimitOffsetScan - +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + +-ProjectScan + +-column_list=[$groupby.string#59, $aggregate.$agg1#58] +-input_scan= + +-DifferentialPrivacyAggregateScan + +-column_list=[$groupby.string#59, $aggregate.$agg1#58] + +-input_scan= + | +-JoinScan + | +-column_list=[SimpleTypesWithAnonymizationUid.string#31, $distinct.string#57] + | +-join_type=RIGHT + | +-left_scan= + | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.string#31], table=SimpleTypesWithAnonymizationUid, column_index_list=[4]) + | +-right_scan= + | | +-AggregateScan + | | +-column_list=[$distinct.string#57] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.string#43], table=SimpleTypes, column_index_list=[4]) + | | +-group_by_list= + | | +-string#57 := ColumnRef(type=STRING, column=SimpleTypes.string#43) + | +-join_expr= + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#31) + | | +-ColumnRef(type=STRING, column=$distinct.string#57) + | +-has_using=TRUE + +-group_by_list= + | +-string#59 := ColumnRef(type=STRING, column=$distinct.string#57) + +-aggregate_list= + | +-$agg1#58 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-Literal(type=STRUCT, value=NULL) + +-option_list= + +-max_groups_contributed := Literal(type=INT64, value=5) + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$groupby.string#59 AS string [STRING] +| +-$aggregate.$agg1#58 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=[$groupby.string#59, $aggregate.$agg1#58] + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="with1" + | | +-with_subquery= + | | 
+-ProjectScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | +-DifferentialPrivacyAggregateScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | | +-SampleScan + | | | +-column_list=[$groupby.int64_partial#70, $group_by.$uid#71] + | | | +-input_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$groupby.int64_partial#70, $group_by.$uid#71] + | | | | +-input_scan= + | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#2, uid#69], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | | | | +-group_by_list= + | | | | +-int64_partial#70 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | | +-$uid#71 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#69) + | | | +-method="RESERVOIR" + | | | +-size= + | | | | +-Literal(type=INT64, value=3) + | | | +-unit=ROWS + | | | +-partition_by_list= + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#71) + | | +-group_by_list= + | | | +-int64#13 := ColumnRef(type=INT64, column=$groupby.int64_partial#70) + | | +-aggregate_list= + | | | +-$group_selection_threshold_col#73 := + | | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | | +-Literal(type=INT64, value=1) + | | | +-Literal(type=STRUCT, value={0, 1}) + | | +-group_selection_threshold_expr= + | | | +-ColumnRef(type=INT64, column=$differential_privacy.$group_selection_threshold_col#73) + | | +-option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=3) + | | +-group_selection_strategy := Literal(type=ENUM, value=LAPLACE_THRESHOLD) + | +-WithEntry + | +-with_query_name="with2" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.int64#26] + | +-input_scan= + | +-DifferentialPrivacyAggregateScan + | +-column_list=[$groupby.int64#26] + | +-input_scan= + | | +-AggregateScan + | | +-column_list=[$groupby.int64_partial#75, $group_by.$uid#76] + | | 
+-input_scan= + | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#15, uid#74], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | | +-group_by_list= + | | +-int64_partial#75 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#15) + | | +-$uid#76 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#74) + | +-group_by_list= + | | +-int64#26 := ColumnRef(type=INT64, column=$groupby.int64_partial#75) + | +-aggregate_list= + | | +-$group_selection_threshold_col#78 := + | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | +-Literal(type=INT64, value=1) + | | +-Literal(type=STRUCT, value={0, 1}) + | +-group_selection_threshold_expr= + | | +-ColumnRef(type=INT64, column=$differential_privacy.$group_selection_threshold_col#78) + | +-option_list= + | +-group_selection_strategy := Literal(type=ENUM, value=LAPLACE_THRESHOLD) + +-query= + +-ProjectScan + +-column_list=[$groupby.string#59, $aggregate.$agg1#58] + +-input_scan= + +-WithScan + +-column_list=[$groupby.string#59, $aggregate.$agg1#58] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.string#67] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.string#68], table=SimpleTypes, column_index_list=[4]) + | +-group_by_list= + | +-string#67 := ColumnRef(type=STRING, column=SimpleTypes.string#68) + +-query= + +-DifferentialPrivacyAggregateScan + +-column_list=[$groupby.string#59, $aggregate.$agg1#58] + +-input_scan= + | +-JoinScan + | +-column_list=[$public_groups0.string#66, $aggregate.$agg1_partial#62, $groupby.string_partial#63, $group_by.$uid#64] + | +-join_type=RIGHT + | +-left_scan= + | | +-SampleScan + | | +-column_list=[$aggregate.$agg1_partial#62, $groupby.string_partial#63, $group_by.$uid#64] + | | +-input_scan= + | | | +-AggregateScan + | | | 
+-column_list=[$aggregate.$agg1_partial#62, $groupby.string_partial#63, $group_by.$uid#64] + | | | +-input_scan= + | | | | +-JoinScan + | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#31, $distinct.string#57, SimpleTypesWithAnonymizationUid.uid#60] + | | | | +-left_scan= + | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#31, uid#60], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) + | | | | +-right_scan= + | | | | | +-WithRefScan(column_list=[$distinct.string#57], with_query_name="$public_groups0") + | | | | +-join_expr= + | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#31) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#57) + | | | | +-has_using=TRUE + | | | +-group_by_list= + | | | | +-string_partial#63 := ColumnRef(type=STRING, column=$distinct.string#57) + | | | | +-$uid#64 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#60) + | | | +-aggregate_list= + | | | +-$agg1_partial#62 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | +-method="RESERVOIR" + | | +-size= + | | | +-Literal(type=INT64, value=5) + | | +-unit=ROWS + | | +-partition_by_list= + | | +-ColumnRef(type=INT64, column=$group_by.$uid#64) + | +-right_scan= + | | +-WithRefScan(column_list=[$public_groups0.string#66], with_query_name="$public_groups0") + | +-join_expr= + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, column=$groupby.string_partial#63) + | +-ColumnRef(type=STRING, column=$public_groups0.string#66) + +-group_by_list= + | +-string#59 := ColumnRef(type=STRING, column=$public_groups0.string#66) + +-aggregate_list= + | +-$agg1#58 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | 
+-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#64) + | | +-Literal(type=INT64, value=NULL) + | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#62) + | +-Literal(type=STRUCT, value=NULL) + +-option_list= + +-max_groups_contributed := Literal(type=INT64, value=5) + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) +== + +# Multiple dp aggregate scans when the public groups query is in a with entry. +WITH + with1 AS ( + SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS (max_groups_contributed=3) + int64 + FROM SimpleTypesWithAnonymizationUid + GROUP BY int64 + ), + with2 AS ( + SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS ( + max_groups_contributed=4, + group_selection_strategy=PUBLIC_GROUPS) + int64, COUNT(*) + FROM SimpleTypesWithAnonymizationUid + RIGHT OUTER JOIN ( + SELECT DISTINCT int64 FROM SimpleTypes + ) USING (int64) + GROUP BY int64 + ), + with3 AS ( + SELECT WITH DIFFERENTIAL_PRIVACY + int64 + FROM SimpleTypesWithAnonymizationUid + GROUP BY int64) +SELECT + WITH DIFFERENTIAL_PRIVACY OPTIONS ( + max_groups_contributed = 5) + string, + COUNT(*) +FROM SimpleTypesWithAnonymizationUid +GROUP BY string +-- +QueryStmt ++-output_column_list= +| +-$groupby.string#73 AS string [STRING] +| +-$aggregate.$agg1#72 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=[$groupby.string#73, $aggregate.$agg1#72] + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="with1" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | +-DifferentialPrivacyAggregateScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#2], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) + | | +-group_by_list= + | | | +-int64#13 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | +-option_list= + | | +-max_groups_contributed := 
Literal(type=INT64, value=3) + | +-WithEntry + | | +-with_query_name="with2" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[$groupby.int64#46, $aggregate.$agg1#45] + | | +-input_scan= + | | +-DifferentialPrivacyAggregateScan + | | +-column_list=[$groupby.int64#46, $aggregate.$agg1#45] + | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[SimpleTypesWithAnonymizationUid.int64#15, $distinct.int64#44] + | | | +-join_type=RIGHT + | | | +-left_scan= + | | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#15], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) + | | | +-right_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$distinct.int64#44] + | | | | +-input_scan= + | | | | | +-TableScan(column_list=[SimpleTypes.int64#27], table=SimpleTypes, column_index_list=[1]) + | | | | +-group_by_list= + | | | | +-int64#44 := ColumnRef(type=INT64, column=SimpleTypes.int64#27) + | | | +-join_expr= + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#15) + | | | | +-ColumnRef(type=INT64, column=$distinct.int64#44) + | | | +-has_using=TRUE + | | +-group_by_list= + | | | +-int64#46 := ColumnRef(type=INT64, column=$distinct.int64#44) + | | +-aggregate_list= + | | | +-$agg1#45 := + | | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | | +-Literal(type=STRUCT, value=NULL) + | | +-option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=4) + | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | +-WithEntry + | +-with_query_name="with3" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.int64#59] + | +-input_scan= + | +-DifferentialPrivacyAggregateScan + | +-column_list=[$groupby.int64#59] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#48], 
table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) + | +-group_by_list= + | +-int64#59 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#48) + +-query= + +-ProjectScan + +-column_list=[$groupby.string#73, $aggregate.$agg1#72] + +-input_scan= + +-DifferentialPrivacyAggregateScan + +-column_list=[$groupby.string#73, $aggregate.$agg1#72] + +-input_scan= + | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.string#64], table=SimpleTypesWithAnonymizationUid, column_index_list=[4]) + +-group_by_list= + | +-string#73 := ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#64) + +-aggregate_list= + | +-$agg1#72 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-Literal(type=STRUCT, value=NULL) + +-option_list= + +-max_groups_contributed := Literal(type=INT64, value=5) + + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$groupby.string#73 AS string [STRING] +| +-$aggregate.$agg1#72 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=[$groupby.string#73, $aggregate.$agg1#72] + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="with1" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | +-DifferentialPrivacyAggregateScan + | | +-column_list=[$groupby.int64#13] + | | +-input_scan= + | | | +-SampleScan + | | | +-column_list=[$groupby.int64_partial#83, $group_by.$uid#84] + | | | +-input_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$groupby.int64_partial#83, $group_by.$uid#84] + | | | | +-input_scan= + | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#2, uid#82], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | | | | +-group_by_list= + | | | | +-int64_partial#83 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#2) + | | | | +-$uid#84 := ColumnRef(type=INT64, 
column=SimpleTypesWithAnonymizationUid.uid#82) + | | | +-method="RESERVOIR" + | | | +-size= + | | | | +-Literal(type=INT64, value=3) + | | | +-unit=ROWS + | | | +-partition_by_list= + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#84) + | | +-group_by_list= + | | | +-int64#13 := ColumnRef(type=INT64, column=$groupby.int64_partial#83) + | | +-aggregate_list= + | | | +-$group_selection_threshold_col#86 := + | | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | | +-Literal(type=INT64, value=1) + | | | +-Literal(type=STRUCT, value={0, 1}) + | | +-group_selection_threshold_expr= + | | | +-ColumnRef(type=INT64, column=$differential_privacy.$group_selection_threshold_col#86) + | | +-option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=3) + | | +-group_selection_strategy := Literal(type=ENUM, value=LAPLACE_THRESHOLD) + | +-WithEntry + | | +-with_query_name="with2" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[$groupby.int64#46, $aggregate.$agg1#45] + | | +-input_scan= + | | +-WithScan + | | +-column_list=[$groupby.int64#46, $aggregate.$agg1#45] + | | +-with_entry_list= + | | | +-WithEntry + | | | +-with_query_name="$public_groups0" + | | | +-with_subquery= + | | | +-AggregateScan + | | | +-column_list=[$distinct.int64#94] + | | | +-input_scan= + | | | | +-TableScan(column_list=[SimpleTypes.int64#95], table=SimpleTypes, column_index_list=[1]) + | | | +-group_by_list= + | | | +-int64#94 := ColumnRef(type=INT64, column=SimpleTypes.int64#95) + | | +-query= + | | +-DifferentialPrivacyAggregateScan + | | +-column_list=[$groupby.int64#46, $aggregate.$agg1#45] + | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$public_groups0.int64#93, $aggregate.$agg1_partial#89, $groupby.int64_partial#90, $group_by.$uid#91] + | | | +-join_type=RIGHT + | | | +-left_scan= + | | | | +-SampleScan + | | | | +-column_list=[$aggregate.$agg1_partial#89, 
$groupby.int64_partial#90, $group_by.$uid#91] + | | | | +-input_scan= + | | | | | +-AggregateScan + | | | | | +-column_list=[$aggregate.$agg1_partial#89, $groupby.int64_partial#90, $group_by.$uid#91] + | | | | | +-input_scan= + | | | | | | +-JoinScan + | | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.int64#15, $distinct.int64#44, SimpleTypesWithAnonymizationUid.uid#87] + | | | | | | +-left_scan= + | | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#15, uid#87], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | | | | | | +-right_scan= + | | | | | | | +-WithRefScan(column_list=[$distinct.int64#44], with_query_name="$public_groups0") + | | | | | | +-join_expr= + | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#15) + | | | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#44) + | | | | | | +-has_using=TRUE + | | | | | +-group_by_list= + | | | | | | +-int64_partial#90 := ColumnRef(type=INT64, column=$distinct.int64#44) + | | | | | | +-$uid#91 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#87) + | | | | | +-aggregate_list= + | | | | | +-$agg1_partial#89 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | | +-method="RESERVOIR" + | | | | +-size= + | | | | | +-Literal(type=INT64, value=4) + | | | | +-unit=ROWS + | | | | +-partition_by_list= + | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#91) + | | | +-right_scan= + | | | | +-WithRefScan(column_list=[$public_groups0.int64#93], with_query_name="$public_groups0") + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$groupby.int64_partial#90) + | | | +-ColumnRef(type=INT64, column=$public_groups0.int64#93) + | | +-group_by_list= + | | | +-int64#46 := ColumnRef(type=INT64, column=$public_groups0.int64#93) + | | +-aggregate_list= + | | | +-$agg1#45 := + | | | 
+-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#91) + | | | | +-Literal(type=INT64, value=NULL) + | | | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#89) + | | | +-Literal(type=STRUCT, value=NULL) + | | +-option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=4) + | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | +-WithEntry + | +-with_query_name="with3" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.int64#59] + | +-input_scan= + | +-DifferentialPrivacyAggregateScan + | +-column_list=[$groupby.int64#59] + | +-input_scan= + | | +-AggregateScan + | | +-column_list=[$groupby.int64_partial#97, $group_by.$uid#98] + | | +-input_scan= + | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#48, uid#96], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | | +-group_by_list= + | | +-int64_partial#97 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#48) + | | +-$uid#98 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#96) + | +-group_by_list= + | | +-int64#59 := ColumnRef(type=INT64, column=$groupby.int64_partial#97) + | +-aggregate_list= + | | +-$group_selection_threshold_col#100 := + | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | +-Literal(type=INT64, value=1) + | | +-Literal(type=STRUCT, value={0, 1}) + | +-group_selection_threshold_expr= + | | +-ColumnRef(type=INT64, column=$differential_privacy.$group_selection_threshold_col#100) + | +-option_list= + | +-group_selection_strategy := Literal(type=ENUM, value=LAPLACE_THRESHOLD) + +-query= + +-ProjectScan + 
+-column_list=[$groupby.string#73, $aggregate.$agg1#72] + +-input_scan= + +-DifferentialPrivacyAggregateScan + +-column_list=[$groupby.string#73, $aggregate.$agg1#72] + +-input_scan= + | +-SampleScan + | +-column_list=[$aggregate.$agg1_partial#76, $groupby.string_partial#77, $group_by.$uid#78] + | +-input_scan= + | | +-AggregateScan + | | +-column_list=[$aggregate.$agg1_partial#76, $groupby.string_partial#77, $group_by.$uid#78] + | | +-input_scan= + | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#64, uid#74], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) + | | +-group_by_list= + | | | +-string_partial#77 := ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#64) + | | | +-$uid#78 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#74) + | | +-aggregate_list= + | | +-$agg1_partial#76 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | +-method="RESERVOIR" + | +-size= + | | +-Literal(type=INT64, value=5) + | +-unit=ROWS + | +-partition_by_list= + | +-ColumnRef(type=INT64, column=$group_by.$uid#78) + +-group_by_list= + | +-string#73 := ColumnRef(type=STRING, column=$groupby.string_partial#77) + +-aggregate_list= + | +-$agg1#72 := + | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#76) + | | +-Literal(type=STRUCT, value=NULL) + | +-$group_selection_threshold_col#81 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-Literal(type=INT64, value=1) + | +-Literal(type=STRUCT, value={0, 1}) + +-group_selection_threshold_expr= + | +-ColumnRef(type=INT64, column=$differential_privacy.$group_selection_threshold_col#81) + +-option_list= + +-max_groups_contributed := Literal(type=INT64, value=5) + +-group_selection_strategy := Literal(type=ENUM, value=LAPLACE_THRESHOLD) +== 
+ +# WithEntry names added by public groups are unique across multiple dp +# aggregate scans +WITH + withString AS ( + SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS ( + max_groups_contributed=3, + group_selection_strategy=PUBLIC_GROUPS) + string, COUNT(*) + FROM SimpleTypesWithAnonymizationUid + RIGHT OUTER JOIN ( + SELECT DISTINCT string FROM SimpleTypes + ) USING (string) + GROUP BY string + ), + withInt32 AS ( + SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS ( + max_groups_contributed=4, + group_selection_strategy=PUBLIC_GROUPS) + int32, COUNT(*) + FROM SimpleTypesWithAnonymizationUid + RIGHT OUTER JOIN ( + SELECT DISTINCT int32 FROM SimpleTypes + ) USING (int32) + GROUP BY int32 + ) +SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS ( + max_groups_contributed=5, + group_selection_strategy=PUBLIC_GROUPS) + int64, COUNT(*) +FROM SimpleTypesWithAnonymizationUid + RIGHT OUTER JOIN ( + SELECT DISTINCT int64 FROM SimpleTypes + ) USING (int64) +GROUP BY int64 +-- +QueryStmt ++-output_column_list= +| +-$groupby.int64#99 AS int64 [INT64] +| +-$aggregate.$agg1#98 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=[$groupby.int64#99, $aggregate.$agg1#98] + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="withString" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[$groupby.string#33, $aggregate.$agg1#32] + | | +-input_scan= + | | +-DifferentialPrivacyAggregateScan + | | +-column_list=[$groupby.string#33, $aggregate.$agg1#32] + | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31] + | | | +-join_type=RIGHT + | | | +-left_scan= + | | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.string#5], table=SimpleTypesWithAnonymizationUid, column_index_list=[4]) + | | | +-right_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$distinct.string#31] + | | | | +-input_scan= + | | | | | +-TableScan(column_list=[SimpleTypes.string#17], table=SimpleTypes, column_index_list=[4]) + | | | | 
+-group_by_list= + | | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) + | | | +-join_expr= + | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-has_using=TRUE + | | +-group_by_list= + | | | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) + | | +-aggregate_list= + | | | +-$agg1#32 := + | | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | | +-Literal(type=STRUCT, value=NULL) + | | +-option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=3) + | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | +-WithEntry + | +-with_query_name="withInt32" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.int32#66, $aggregate.$agg1#65] + | +-input_scan= + | +-DifferentialPrivacyAggregateScan + | +-column_list=[$groupby.int32#66, $aggregate.$agg1#65] + | +-input_scan= + | | +-JoinScan + | | +-column_list=[SimpleTypesWithAnonymizationUid.int32#34, $distinct.int32#64] + | | +-join_type=RIGHT + | | +-left_scan= + | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int32#34], table=SimpleTypesWithAnonymizationUid, column_index_list=[0]) + | | +-right_scan= + | | | +-AggregateScan + | | | +-column_list=[$distinct.int32#64] + | | | +-input_scan= + | | | | +-TableScan(column_list=[SimpleTypes.int32#46], table=SimpleTypes, column_index_list=[0]) + | | | +-group_by_list= + | | | +-int32#64 := ColumnRef(type=INT32, column=SimpleTypes.int32#46) + | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | | | +-ColumnRef(type=INT32, column=SimpleTypesWithAnonymizationUid.int32#34) + | | | +-ColumnRef(type=INT32, column=$distinct.int32#64) + | | +-has_using=TRUE + | +-group_by_list= + | | +-int32#66 := 
ColumnRef(type=INT32, column=$distinct.int32#64) + | +-aggregate_list= + | | +-$agg1#65 := + | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | +-Literal(type=STRUCT, value=NULL) + | +-option_list= + | +-max_groups_contributed := Literal(type=INT64, value=4) + | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + +-query= + +-ProjectScan + +-column_list=[$groupby.int64#99, $aggregate.$agg1#98] + +-input_scan= + +-DifferentialPrivacyAggregateScan + +-column_list=[$groupby.int64#99, $aggregate.$agg1#98] + +-input_scan= + | +-JoinScan + | +-column_list=[SimpleTypesWithAnonymizationUid.int64#68, $distinct.int64#97] + | +-join_type=RIGHT + | +-left_scan= + | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.int64#68], table=SimpleTypesWithAnonymizationUid, column_index_list=[1]) + | +-right_scan= + | | +-AggregateScan + | | +-column_list=[$distinct.int64#97] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.int64#80], table=SimpleTypes, column_index_list=[1]) + | | +-group_by_list= + | | +-int64#97 := ColumnRef(type=INT64, column=SimpleTypes.int64#80) + | +-join_expr= + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#68) + | | +-ColumnRef(type=INT64, column=$distinct.int64#97) + | +-has_using=TRUE + +-group_by_list= + | +-int64#99 := ColumnRef(type=INT64, column=$distinct.int64#97) + +-aggregate_list= + | +-$agg1#98 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-Literal(type=STRUCT, value=NULL) + +-option_list= + +-max_groups_contributed := Literal(type=INT64, value=5) + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$groupby.int64#99 AS int64 [INT64] +| +-$aggregate.$agg1#98 AS `$col2` 
[INT64] ++-query= + +-WithScan + +-column_list=[$groupby.int64#99, $aggregate.$agg1#98] + +-with_entry_list= + | +-WithEntry + | | +-with_query_name="withString" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[$groupby.string#33, $aggregate.$agg1#32] + | | +-input_scan= + | | +-WithScan + | | +-column_list=[$groupby.string#33, $aggregate.$agg1#32] + | | +-with_entry_list= + | | | +-WithEntry + | | | +-with_query_name="$public_groups1" + | | | +-with_subquery= + | | | +-AggregateScan + | | | +-column_list=[$distinct.string#116] + | | | +-input_scan= + | | | | +-TableScan(column_list=[SimpleTypes.string#117], table=SimpleTypes, column_index_list=[4]) + | | | +-group_by_list= + | | | +-string#116 := ColumnRef(type=STRING, column=SimpleTypes.string#117) + | | +-query= + | | +-DifferentialPrivacyAggregateScan + | | +-column_list=[$groupby.string#33, $aggregate.$agg1#32] + | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$public_groups1.string#115, $aggregate.$agg1_partial#111, $groupby.string_partial#112, $group_by.$uid#113] + | | | +-join_type=RIGHT + | | | +-left_scan= + | | | | +-SampleScan + | | | | +-column_list=[$aggregate.$agg1_partial#111, $groupby.string_partial#112, $group_by.$uid#113] + | | | | +-input_scan= + | | | | | +-AggregateScan + | | | | | +-column_list=[$aggregate.$agg1_partial#111, $groupby.string_partial#112, $group_by.$uid#113] + | | | | | +-input_scan= + | | | | | | +-JoinScan + | | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31, SimpleTypesWithAnonymizationUid.uid#109] + | | | | | | +-left_scan= + | | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#5, uid#109], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) + | | | | | | +-right_scan= + | | | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups1") + | | | | | | +-join_expr= + | | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | 
| | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | | +-has_using=TRUE + | | | | | +-group_by_list= + | | | | | | +-string_partial#112 := ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | | +-$uid#113 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#109) + | | | | | +-aggregate_list= + | | | | | +-$agg1_partial#111 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | | +-method="RESERVOIR" + | | | | +-size= + | | | | | +-Literal(type=INT64, value=3) + | | | | +-unit=ROWS + | | | | +-partition_by_list= + | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#113) + | | | +-right_scan= + | | | | +-WithRefScan(column_list=[$public_groups1.string#115], with_query_name="$public_groups1") + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$groupby.string_partial#112) + | | | +-ColumnRef(type=STRING, column=$public_groups1.string#115) + | | +-group_by_list= + | | | +-string#33 := ColumnRef(type=STRING, column=$public_groups1.string#115) + | | +-aggregate_list= + | | | +-$agg1#32 := + | | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#113) + | | | | +-Literal(type=INT64, value=NULL) + | | | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#111) + | | | +-Literal(type=STRUCT, value=NULL) + | | +-option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=3) + | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | +-WithEntry + | +-with_query_name="withInt32" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[$groupby.int32#66, 
$aggregate.$agg1#65] + | +-input_scan= + | +-WithScan + | +-column_list=[$groupby.int32#66, $aggregate.$agg1#65] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="$public_groups2" + | | +-with_subquery= + | | +-AggregateScan + | | +-column_list=[$distinct.int32#125] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.int32#126], table=SimpleTypes, column_index_list=[0]) + | | +-group_by_list= + | | +-int32#125 := ColumnRef(type=INT32, column=SimpleTypes.int32#126) + | +-query= + | +-DifferentialPrivacyAggregateScan + | +-column_list=[$groupby.int32#66, $aggregate.$agg1#65] + | +-input_scan= + | | +-JoinScan + | | +-column_list=[$public_groups2.int32#124, $aggregate.$agg1_partial#120, $groupby.int32_partial#121, $group_by.$uid#122] + | | +-join_type=RIGHT + | | +-left_scan= + | | | +-SampleScan + | | | +-column_list=[$aggregate.$agg1_partial#120, $groupby.int32_partial#121, $group_by.$uid#122] + | | | +-input_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$aggregate.$agg1_partial#120, $groupby.int32_partial#121, $group_by.$uid#122] + | | | | +-input_scan= + | | | | | +-JoinScan + | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.int32#34, $distinct.int32#64, SimpleTypesWithAnonymizationUid.uid#118] + | | | | | +-left_scan= + | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int32#34, uid#118], table=SimpleTypesWithAnonymizationUid, column_index_list=[0, 10]) + | | | | | +-right_scan= + | | | | | | +-WithRefScan(column_list=[$distinct.int32#64], with_query_name="$public_groups2") + | | | | | +-join_expr= + | | | | | | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | | | | | | +-ColumnRef(type=INT32, column=SimpleTypesWithAnonymizationUid.int32#34) + | | | | | | +-ColumnRef(type=INT32, column=$distinct.int32#64) + | | | | | +-has_using=TRUE + | | | | +-group_by_list= + | | | | | +-int32_partial#121 := ColumnRef(type=INT32, column=$distinct.int32#64) + | | | | | +-$uid#122 := ColumnRef(type=INT64, 
column=SimpleTypesWithAnonymizationUid.uid#118) + | | | | +-aggregate_list= + | | | | +-$agg1_partial#120 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | +-method="RESERVOIR" + | | | +-size= + | | | | +-Literal(type=INT64, value=4) + | | | +-unit=ROWS + | | | +-partition_by_list= + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#122) + | | +-right_scan= + | | | +-WithRefScan(column_list=[$public_groups2.int32#124], with_query_name="$public_groups2") + | | +-join_expr= + | | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | | +-ColumnRef(type=INT32, column=$groupby.int32_partial#121) + | | +-ColumnRef(type=INT32, column=$public_groups2.int32#124) + | +-group_by_list= + | | +-int32#66 := ColumnRef(type=INT32, column=$public_groups2.int32#124) + | +-aggregate_list= + | | +-$agg1#65 := + | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#122) + | | | +-Literal(type=INT64, value=NULL) + | | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#120) + | | +-Literal(type=STRUCT, value=NULL) + | +-option_list= + | +-max_groups_contributed := Literal(type=INT64, value=4) + | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + +-query= + +-ProjectScan + +-column_list=[$groupby.int64#99, $aggregate.$agg1#98] + +-input_scan= + +-WithScan + +-column_list=[$groupby.int64#99, $aggregate.$agg1#98] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.int64#107] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.int64#108], table=SimpleTypes, column_index_list=[1]) + | +-group_by_list= + | +-int64#107 := ColumnRef(type=INT64, column=SimpleTypes.int64#108) + +-query= + 
+-DifferentialPrivacyAggregateScan + +-column_list=[$groupby.int64#99, $aggregate.$agg1#98] + +-input_scan= + | +-JoinScan + | +-column_list=[$public_groups0.int64#106, $aggregate.$agg1_partial#102, $groupby.int64_partial#103, $group_by.$uid#104] + | +-join_type=RIGHT + | +-left_scan= + | | +-SampleScan + | | +-column_list=[$aggregate.$agg1_partial#102, $groupby.int64_partial#103, $group_by.$uid#104] + | | +-input_scan= + | | | +-AggregateScan + | | | +-column_list=[$aggregate.$agg1_partial#102, $groupby.int64_partial#103, $group_by.$uid#104] + | | | +-input_scan= + | | | | +-JoinScan + | | | | +-column_list=[SimpleTypesWithAnonymizationUid.int64#68, $distinct.int64#97, SimpleTypesWithAnonymizationUid.uid#100] + | | | | +-left_scan= + | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[int64#68, uid#100], table=SimpleTypesWithAnonymizationUid, column_index_list=[1, 10]) + | | | | +-right_scan= + | | | | | +-WithRefScan(column_list=[$distinct.int64#97], with_query_name="$public_groups0") + | | | | +-join_expr= + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.int64#68) + | | | | | +-ColumnRef(type=INT64, column=$distinct.int64#97) + | | | | +-has_using=TRUE + | | | +-group_by_list= + | | | | +-int64_partial#103 := ColumnRef(type=INT64, column=$distinct.int64#97) + | | | | +-$uid#104 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#100) + | | | +-aggregate_list= + | | | +-$agg1_partial#102 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | +-method="RESERVOIR" + | | +-size= + | | | +-Literal(type=INT64, value=5) + | | +-unit=ROWS + | | +-partition_by_list= + | | +-ColumnRef(type=INT64, column=$group_by.$uid#104) + | +-right_scan= + | | +-WithRefScan(column_list=[$public_groups0.int64#106], with_query_name="$public_groups0") + | +-join_expr= + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, 
column=$groupby.int64_partial#103) + | +-ColumnRef(type=INT64, column=$public_groups0.int64#106) + +-group_by_list= + | +-int64#99 := ColumnRef(type=INT64, column=$public_groups0.int64#106) + +-aggregate_list= + | +-$agg1#98 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#104) + | | +-Literal(type=INT64, value=NULL) + | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#102) + | +-Literal(type=STRUCT, value=NULL) + +-option_list= + +-max_groups_contributed := Literal(type=INT64, value=5) + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) +== + +# Nested WithScan for public groups input. +WITH + res1 AS ( + WITH + public1 AS ( + SELECT string FROM SimpleTypes + ) + SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS ( + group_selection_strategy=PUBLIC_GROUPS, + max_groups_contributed=2) + string, COUNT(*) + FROM SimpleTypesWithAnonymizationUid + RIGHT OUTER JOIN ( + SELECT DISTINCT string FROM public1 + ) USING (string) + GROUP BY string + ) +SELECT * +FROM res1; +-- +QueryStmt ++-output_column_list= +| +-res1.string#35 AS string [STRING] +| +-res1.$col2#36 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=res1.[string#35, $col2#36] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="res1" + | +-with_subquery= + | +-WithScan + | +-column_list=[$groupby.string#34, $aggregate.$agg1#33] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="public1" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[SimpleTypes.string#5] + | | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.string#5], table=SimpleTypes, column_index_list=[4]) + | +-query= + | +-ProjectScan + | +-column_list=[$groupby.string#34, $aggregate.$agg1#33] + | +-input_scan= + | 
+-DifferentialPrivacyAggregateScan + | +-column_list=[$groupby.string#34, $aggregate.$agg1#33] + | +-input_scan= + | | +-JoinScan + | | +-column_list=[SimpleTypesWithAnonymizationUid.string#23, $distinct.string#32] + | | +-join_type=RIGHT + | | +-left_scan= + | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.string#23], table=SimpleTypesWithAnonymizationUid, column_index_list=[4]) + | | +-right_scan= + | | | +-AggregateScan + | | | +-column_list=[$distinct.string#32] + | | | +-input_scan= + | | | | +-WithRefScan(column_list=[public1.string#31], with_query_name="public1") + | | | +-group_by_list= + | | | +-string#32 := ColumnRef(type=STRING, column=public1.string#31) + | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#23) + | | | +-ColumnRef(type=STRING, column=$distinct.string#32) + | | +-has_using=TRUE + | +-group_by_list= + | | +-string#34 := ColumnRef(type=STRING, column=$distinct.string#32) + | +-aggregate_list= + | | +-$agg1#33 := + | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | +-Literal(type=STRUCT, value=NULL) + | +-option_list= + | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | +-max_groups_contributed := Literal(type=INT64, value=2) + +-query= + +-ProjectScan + +-column_list=res1.[string#35, $col2#36] + +-input_scan= + +-WithRefScan(column_list=res1.[string#35, $col2#36], with_query_name="res1") + + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-res1.string#35 AS string [STRING] +| +-res1.$col2#36 AS `$col2` [INT64] ++-query= + +-WithScan + +-column_list=res1.[string#35, $col2#36] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="res1" + | +-with_subquery= + | +-WithScan + | +-column_list=[$groupby.string#34, $aggregate.$agg1#33] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="public1" 
+ | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[SimpleTypes.string#5] + | | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.string#5], table=SimpleTypes, column_index_list=[4]) + | +-query= + | +-ProjectScan + | +-column_list=[$groupby.string#34, $aggregate.$agg1#33] + | +-input_scan= + | +-WithScan + | +-column_list=[$groupby.string#34, $aggregate.$agg1#33] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="$public_groups0" + | | +-with_subquery= + | | +-AggregateScan + | | +-column_list=[$distinct.string#44] + | | +-input_scan= + | | | +-WithRefScan(column_list=[public1.string#45], with_query_name="public1") + | | +-group_by_list= + | | +-string#44 := ColumnRef(type=STRING, column=public1.string#45) + | +-query= + | +-DifferentialPrivacyAggregateScan + | +-column_list=[$groupby.string#34, $aggregate.$agg1#33] + | +-input_scan= + | | +-JoinScan + | | +-column_list=[$public_groups0.string#43, $aggregate.$agg1_partial#39, $groupby.string_partial#40, $group_by.$uid#41] + | | +-join_type=RIGHT + | | +-left_scan= + | | | +-SampleScan + | | | +-column_list=[$aggregate.$agg1_partial#39, $groupby.string_partial#40, $group_by.$uid#41] + | | | +-input_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$aggregate.$agg1_partial#39, $groupby.string_partial#40, $group_by.$uid#41] + | | | | +-input_scan= + | | | | | +-JoinScan + | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#23, $distinct.string#32, SimpleTypesWithAnonymizationUid.uid#37] + | | | | | +-left_scan= + | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#23, uid#37], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) + | | | | | +-right_scan= + | | | | | | +-WithRefScan(column_list=[$distinct.string#32], with_query_name="$public_groups0") + | | | | | +-join_expr= + | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | +-ColumnRef(type=STRING, 
column=SimpleTypesWithAnonymizationUid.string#23) + | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#32) + | | | | | +-has_using=TRUE + | | | | +-group_by_list= + | | | | | +-string_partial#40 := ColumnRef(type=STRING, column=$distinct.string#32) + | | | | | +-$uid#41 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#37) + | | | | +-aggregate_list= + | | | | +-$agg1_partial#39 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | +-method="RESERVOIR" + | | | +-size= + | | | | +-Literal(type=INT64, value=2) + | | | +-unit=ROWS + | | | +-partition_by_list= + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#41) + | | +-right_scan= + | | | +-WithRefScan(column_list=[$public_groups0.string#43], with_query_name="$public_groups0") + | | +-join_expr= + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$groupby.string_partial#40) + | | +-ColumnRef(type=STRING, column=$public_groups0.string#43) + | +-group_by_list= + | | +-string#34 := ColumnRef(type=STRING, column=$public_groups0.string#43) + | +-aggregate_list= + | | +-$agg1#33 := + | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#41) + | | | +-Literal(type=INT64, value=NULL) + | | | +-ColumnRef(type=INT64, column=$aggregate.$agg1_partial#39) + | | +-Literal(type=STRUCT, value=NULL) + | +-option_list= + | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | +-max_groups_contributed := Literal(type=INT64, value=2) + +-query= + +-ProjectScan + +-column_list=res1.[string#35, $col2#36] + +-input_scan= + +-WithRefScan(column_list=res1.[string#35, $col2#36], with_query_name="res1") +== + +# UNION ALL of multiple public group queries. 
+SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS ( + max_groups_contributed=2, + group_selection_strategy=PUBLIC_GROUPS) + string, COUNT(*) AS anon_count +FROM SimpleTypesWithAnonymizationUid + RIGHT OUTER JOIN (SELECT DISTINCT string FROM SimpleTypes) USING (string) +GROUP BY string +UNION ALL +SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS ( + max_groups_contributed=3, + group_selection_strategy=PUBLIC_GROUPS) + string, COUNT(*) AS anon_count +FROM SimpleTypesWithAnonymizationUid + RIGHT OUTER JOIN (SELECT DISTINCT string FROM SimpleTypes) USING (string) +GROUP BY STRING; +-- +QueryStmt ++-output_column_list= +| +-$union_all.string#67 AS string [STRING] +| +-$union_all.anon_count#68 AS anon_count [INT64] ++-query= + +-SetOperationScan + +-column_list=$union_all.[string#67, anon_count#68] + +-op_type=UNION_ALL + +-input_item_list= + +-SetOperationItem + | +-scan= + | | +-ProjectScan + | | +-column_list=[$groupby.string#33, $aggregate.anon_count#32] + | | +-input_scan= + | | +-DifferentialPrivacyAggregateScan + | | +-column_list=[$groupby.string#33, $aggregate.anon_count#32] + | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31] + | | | +-join_type=RIGHT + | | | +-left_scan= + | | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.string#5], table=SimpleTypesWithAnonymizationUid, column_index_list=[4]) + | | | +-right_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$distinct.string#31] + | | | | +-input_scan= + | | | | | +-TableScan(column_list=[SimpleTypes.string#17], table=SimpleTypes, column_index_list=[4]) + | | | | +-group_by_list= + | | | | +-string#31 := ColumnRef(type=STRING, column=SimpleTypes.string#17) + | | | +-join_expr= + | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | +-has_using=TRUE + | | +-group_by_list= + 
| | | +-string#33 := ColumnRef(type=STRING, column=$distinct.string#31) + | | +-aggregate_list= + | | | +-anon_count#32 := + | | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | | +-Literal(type=STRUCT, value=NULL) + | | +-option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=2) + | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + | +-output_column_list=[$groupby.string#33, $aggregate.anon_count#32] + +-SetOperationItem + +-scan= | +-ProjectScan - | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | +-column_list=[$groupby.string#66, $aggregate.anon_count#65] | +-input_scan= - | +-FilterScan - | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | +-DifferentialPrivacyAggregateScan + | +-column_list=[$groupby.string#66, $aggregate.anon_count#65] | +-input_scan= - | | +-SampleScan - | | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - | | +-input_scan= - | | | +-ProjectScan - | | | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] + | | +-JoinScan + | | +-column_list=[SimpleTypesWithAnonymizationUid.string#38, $distinct.string#64] + | | +-join_type=RIGHT + | | +-left_scan= + | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.string#38], table=SimpleTypesWithAnonymizationUid, column_index_list=[4]) + | | +-right_scan= + | | | +-AggregateScan + | | | +-column_list=[$distinct.string#64] | | | +-input_scan= - | | | +-DifferentialPrivacyAggregateScan - | | | +-column_list=[$groupby.string#33, $aggregate.anon_users#32] - | | | +-input_scan= - | | | | +-JoinScan - | | | | +-column_list=[$public_groups0.string#40, $aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] - | | | | +-join_type=RIGHT - | | | | +-left_scan= - | | | | | +-SampleScan - | | | | | +-column_list=[$aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] - | | | | | 
+-input_scan= - | | | | | | +-AggregateScan - | | | | | | +-column_list=[$aggregate.anon_users_partial#36, $groupby.string_partial#37, $group_by.$uid#38] - | | | | | | +-input_scan= - | | | | | | | +-JoinScan - | | | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31, SimpleTypesWithAnonymizationUid.uid#34] - | | | | | | | +-left_scan= - | | | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#5, uid#34], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) - | | | | | | | +-right_scan= - | | | | | | | | +-WithRefScan(column_list=[$distinct.string#31], with_query_name="$public_groups0") - | | | | | | | +-join_expr= - | | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) - | | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) - | | | | | | +-group_by_list= - | | | | | | | +-string_partial#37 := ColumnRef(type=STRING, column=$distinct.string#31) - | | | | | | | +-$uid#38 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#34) - | | | | | | +-aggregate_list= - | | | | | | +-anon_users_partial#36 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) - | | | | | +-method="RESERVOIR" - | | | | | +-size= - | | | | | | +-Literal(type=INT64, value=3) - | | | | | +-unit=ROWS - | | | | | +-partition_by_list= - | | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#38) - | | | | +-right_scan= - | | | | | +-WithRefScan(column_list=[$public_groups0.string#40], with_query_name="$public_groups0") - | | | | +-join_expr= - | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | | | +-ColumnRef(type=STRING, column=$groupby.string_partial#37) - | | | | +-ColumnRef(type=STRING, column=$public_groups0.string#40) - | | | +-group_by_list= - | | | | +-string#33 := ColumnRef(type=STRING, column=$public_groups0.string#40) - | | | +-aggregate_list= - | | | | +-anon_users#32 
:= - | | | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) - | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) - | | | | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) - | | | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#38) - | | | | | +-Literal(type=INT64, value=NULL) - | | | | | +-ColumnRef(type=INT64, column=$aggregate.anon_users_partial#36) - | | | | +-Literal(type=STRUCT, value=NULL) - | | | +-option_list= - | | | +-max_groups_contributed := Literal(type=INT64, value=3) - | | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) - | | +-method="bernoulli" - | | +-size= - | | | +-Literal(type=INT64, value=1) - | | +-unit=PERCENT - | +-filter_expr= - | +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$aggregate.anon_users#32) - | +-Literal(type=INT64, value=10) - +-limit= - +-Literal(type=INT64, value=11) + | | | | +-TableScan(column_list=[SimpleTypes.string#50], table=SimpleTypes, column_index_list=[4]) + | | | +-group_by_list= + | | | +-string#64 := ColumnRef(type=STRING, column=SimpleTypes.string#50) + | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#38) + | | | +-ColumnRef(type=STRING, column=$distinct.string#64) + | | +-has_using=TRUE + | +-group_by_list= + | | +-string#66 := ColumnRef(type=STRING, column=$distinct.string#64) + | +-aggregate_list= + | | +-anon_count#65 := + | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | +-Literal(type=STRUCT, value=NULL) + | +-option_list= + | +-max_groups_contributed := Literal(type=INT64, value=3) + | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + +-output_column_list=[$groupby.string#66, $aggregate.anon_count#65] + + +[REWRITTEN AST] 
+QueryStmt ++-output_column_list= +| +-$union_all.string#67 AS string [STRING] +| +-$union_all.anon_count#68 AS anon_count [INT64] ++-query= + +-SetOperationScan + +-column_list=$union_all.[string#67, anon_count#68] + +-op_type=UNION_ALL + +-input_item_list= + +-SetOperationItem + | +-scan= + | | +-ProjectScan + | | +-column_list=[$groupby.string#33, $aggregate.anon_count#32] + | | +-input_scan= + | | +-WithScan + | | +-column_list=[$groupby.string#33, $aggregate.anon_count#32] + | | +-with_entry_list= + | | | +-WithEntry + | | | +-with_query_name="$public_groups0" + | | | +-with_subquery= + | | | +-AggregateScan + | | | +-column_list=[$distinct.string#76] + | | | +-input_scan= + | | | | +-TableScan(column_list=[SimpleTypes.string#77], table=SimpleTypes, column_index_list=[4]) + | | | +-group_by_list= + | | | +-string#76 := ColumnRef(type=STRING, column=SimpleTypes.string#77) + | | +-query= + | | +-DifferentialPrivacyAggregateScan + | | +-column_list=[$groupby.string#33, $aggregate.anon_count#32] + | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$public_groups0.string#75, $aggregate.anon_count_partial#71, $groupby.string_partial#72, $group_by.$uid#73] + | | | +-join_type=RIGHT + | | | +-left_scan= + | | | | +-SampleScan + | | | | +-column_list=[$aggregate.anon_count_partial#71, $groupby.string_partial#72, $group_by.$uid#73] + | | | | +-input_scan= + | | | | | +-AggregateScan + | | | | | +-column_list=[$aggregate.anon_count_partial#71, $groupby.string_partial#72, $group_by.$uid#73] + | | | | | +-input_scan= + | | | | | | +-JoinScan + | | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#5, $distinct.string#31, SimpleTypesWithAnonymizationUid.uid#69] + | | | | | | +-left_scan= + | | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#5, uid#69], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) + | | | | | | +-right_scan= + | | | | | | | +-WithRefScan(column_list=[$distinct.string#31], 
with_query_name="$public_groups0") + | | | | | | +-join_expr= + | | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#5) + | | | | | | | +-ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | | +-has_using=TRUE + | | | | | +-group_by_list= + | | | | | | +-string_partial#72 := ColumnRef(type=STRING, column=$distinct.string#31) + | | | | | | +-$uid#73 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#69) + | | | | | +-aggregate_list= + | | | | | +-anon_count_partial#71 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | | +-method="RESERVOIR" + | | | | +-size= + | | | | | +-Literal(type=INT64, value=2) + | | | | +-unit=ROWS + | | | | +-partition_by_list= + | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#73) + | | | +-right_scan= + | | | | +-WithRefScan(column_list=[$public_groups0.string#75], with_query_name="$public_groups0") + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$groupby.string_partial#72) + | | | +-ColumnRef(type=STRING, column=$public_groups0.string#75) + | | +-group_by_list= + | | | +-string#33 := ColumnRef(type=STRING, column=$public_groups0.string#75) + | | +-aggregate_list= + | | | +-anon_count#32 := + | | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#73) + | | | | +-Literal(type=INT64, value=NULL) + | | | | +-ColumnRef(type=INT64, column=$aggregate.anon_count_partial#71) + | | | +-Literal(type=STRUCT, value=NULL) + | | +-option_list= + | | +-max_groups_contributed := Literal(type=INT64, value=2) + | | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + 
| +-output_column_list=[$groupby.string#33, $aggregate.anon_count#32] + +-SetOperationItem + +-scan= + | +-ProjectScan + | +-column_list=[$groupby.string#66, $aggregate.anon_count#65] + | +-input_scan= + | +-WithScan + | +-column_list=[$groupby.string#66, $aggregate.anon_count#65] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="$public_groups1" + | | +-with_subquery= + | | +-AggregateScan + | | +-column_list=[$distinct.string#85] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.string#86], table=SimpleTypes, column_index_list=[4]) + | | +-group_by_list= + | | +-string#85 := ColumnRef(type=STRING, column=SimpleTypes.string#86) + | +-query= + | +-DifferentialPrivacyAggregateScan + | +-column_list=[$groupby.string#66, $aggregate.anon_count#65] + | +-input_scan= + | | +-JoinScan + | | +-column_list=[$public_groups1.string#84, $aggregate.anon_count_partial#80, $groupby.string_partial#81, $group_by.$uid#82] + | | +-join_type=RIGHT + | | +-left_scan= + | | | +-SampleScan + | | | +-column_list=[$aggregate.anon_count_partial#80, $groupby.string_partial#81, $group_by.$uid#82] + | | | +-input_scan= + | | | | +-AggregateScan + | | | | +-column_list=[$aggregate.anon_count_partial#80, $groupby.string_partial#81, $group_by.$uid#82] + | | | | +-input_scan= + | | | | | +-JoinScan + | | | | | +-column_list=[SimpleTypesWithAnonymizationUid.string#38, $distinct.string#64, SimpleTypesWithAnonymizationUid.uid#78] + | | | | | +-left_scan= + | | | | | | +-TableScan(column_list=SimpleTypesWithAnonymizationUid.[string#38, uid#78], table=SimpleTypesWithAnonymizationUid, column_index_list=[4, 10]) + | | | | | +-right_scan= + | | | | | | +-WithRefScan(column_list=[$distinct.string#64], with_query_name="$public_groups1") + | | | | | +-join_expr= + | | | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | | +-ColumnRef(type=STRING, column=SimpleTypesWithAnonymizationUid.string#38) + | | | | | | +-ColumnRef(type=STRING, 
column=$distinct.string#64) + | | | | | +-has_using=TRUE + | | | | +-group_by_list= + | | | | | +-string_partial#81 := ColumnRef(type=STRING, column=$distinct.string#64) + | | | | | +-$uid#82 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#78) + | | | | +-aggregate_list= + | | | | +-anon_count_partial#80 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | | +-method="RESERVOIR" + | | | +-size= + | | | | +-Literal(type=INT64, value=3) + | | | +-unit=ROWS + | | | +-partition_by_list= + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#82) + | | +-right_scan= + | | | +-WithRefScan(column_list=[$public_groups1.string#84], with_query_name="$public_groups1") + | | +-join_expr= + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$groupby.string_partial#81) + | | +-ColumnRef(type=STRING, column=$public_groups1.string#84) + | +-group_by_list= + | | +-string#66 := ColumnRef(type=STRING, column=$public_groups1.string#84) + | +-aggregate_list= + | | +-anon_count#65 := + | | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=$group_by.$uid#82) + | | | +-Literal(type=INT64, value=NULL) + | | | +-ColumnRef(type=INT64, column=$aggregate.anon_count_partial#80) + | | +-Literal(type=STRUCT, value=NULL) + | +-option_list= + | +-max_groups_contributed := Literal(type=INT64, value=3) + | +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + +-output_column_list=[$groupby.string#66, $aggregate.anon_count#65] +== + +# privacy_unit_column is supported with max_groups_contributed = NULL. 
+SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS ( + privacy_unit_column = `int64`, + max_groups_contributed = NULL, + group_selection_strategy = PUBLIC_GROUPS +) string, COUNT(*) AS anon_users +FROM + SimpleTypes + RIGHT OUTER JOIN + (SELECT DISTINCT string FROM SimpleTypes) + USING (string) +GROUP BY string +-- +QueryStmt ++-output_column_list= +| +-$groupby.string#39 AS string [STRING] +| +-$aggregate.anon_users#38 AS anon_users [INT64] ++-query= + +-ProjectScan + +-column_list=[$groupby.string#39, $aggregate.anon_users#38] + +-input_scan= + +-DifferentialPrivacyAggregateScan + +-column_list=[$groupby.string#39, $aggregate.anon_users#38] + +-input_scan= + | +-JoinScan + | +-column_list=[SimpleTypes.int64#2, SimpleTypes.string#5, $distinct.string#37] + | +-join_type=RIGHT + | +-left_scan= + | | +-TableScan(column_list=SimpleTypes.[int64#2, string#5], table=SimpleTypes, column_index_list=[1, 4]) + | +-right_scan= + | | +-AggregateScan + | | +-column_list=[$distinct.string#37] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.string#23], table=SimpleTypes, column_index_list=[4]) + | | +-group_by_list= + | | +-string#37 := ColumnRef(type=STRING, column=SimpleTypes.string#23) + | +-join_expr= + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypes.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#37) + | +-has_using=TRUE + +-group_by_list= + | +-string#39 := ColumnRef(type=STRING, column=$distinct.string#37) + +-aggregate_list= + | +-anon_users#38 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-Literal(type=STRUCT, value=NULL) + +-option_list= + +-privacy_unit_column := ColumnRef(type=INT64, column=SimpleTypes.int64#2) + +-max_groups_contributed := Literal(type=INT64, value=NULL) + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + + +[REWRITTEN AST] +QueryStmt 
++-output_column_list= +| +-$groupby.string#39 AS string [STRING] +| +-$aggregate.anon_users#38 AS anon_users [INT64] ++-query= + +-ProjectScan + +-column_list=[$groupby.string#39, $aggregate.anon_users#38] + +-input_scan= + +-DifferentialPrivacyAggregateScan + +-column_list=[$groupby.string#39, $aggregate.anon_users#38] + +-input_scan= + | +-AggregateScan + | +-column_list=[$aggregate.anon_users_partial#41, $groupby.string_partial#42, $group_by.$uid#43] + | +-input_scan= + | | +-JoinScan + | | +-column_list=[SimpleTypes.int64#2, SimpleTypes.string#5, $distinct.string#37] + | | +-join_type=RIGHT + | | +-left_scan= + | | | +-TableScan(column_list=SimpleTypes.[int64#2, string#5], table=SimpleTypes, column_index_list=[1, 4]) + | | +-right_scan= + | | | +-AggregateScan + | | | +-column_list=[$distinct.string#37] + | | | +-input_scan= + | | | | +-TableScan(column_list=[SimpleTypes.string#23], table=SimpleTypes, column_index_list=[4]) + | | | +-group_by_list= + | | | +-string#37 := ColumnRef(type=STRING, column=SimpleTypes.string#23) + | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=SimpleTypes.string#5) + | | | +-ColumnRef(type=STRING, column=$distinct.string#37) + | | +-has_using=TRUE + | +-group_by_list= + | | +-string_partial#42 := ColumnRef(type=STRING, column=$distinct.string#37) + | | +-$uid#43 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) + | +-aggregate_list= + | +-anon_users_partial#41 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + +-group_by_list= + | +-string#39 := ColumnRef(type=STRING, column=$groupby.string_partial#42) + +-aggregate_list= + | +-anon_users#38 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#43) + | | 
+-Literal(type=INT64, value=NULL) + | | +-ColumnRef(type=INT64, column=$aggregate.anon_users_partial#41) + | +-Literal(type=STRUCT, value=NULL) + +-option_list= + +-max_groups_contributed := Literal(type=INT64, value=NULL) + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) +== + +# privacy_unit_column is supported with max_groups_contributed > 0. +SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS ( + privacy_unit_column = `int64`, + max_groups_contributed = 3, + group_selection_strategy = PUBLIC_GROUPS +) string, COUNT(*) AS anon_users +FROM + SimpleTypes + RIGHT OUTER JOIN + (SELECT DISTINCT string FROM SimpleTypes) + USING (string) +GROUP BY string +-- +QueryStmt ++-output_column_list= +| +-$groupby.string#39 AS string [STRING] +| +-$aggregate.anon_users#38 AS anon_users [INT64] ++-query= + +-ProjectScan + +-column_list=[$groupby.string#39, $aggregate.anon_users#38] + +-input_scan= + +-DifferentialPrivacyAggregateScan + +-column_list=[$groupby.string#39, $aggregate.anon_users#38] + +-input_scan= + | +-JoinScan + | +-column_list=[SimpleTypes.int64#2, SimpleTypes.string#5, $distinct.string#37] + | +-join_type=RIGHT + | +-left_scan= + | | +-TableScan(column_list=SimpleTypes.[int64#2, string#5], table=SimpleTypes, column_index_list=[1, 4]) + | +-right_scan= + | | +-AggregateScan + | | +-column_list=[$distinct.string#37] + | | +-input_scan= + | | | +-TableScan(column_list=[SimpleTypes.string#23], table=SimpleTypes, column_index_list=[4]) + | | +-group_by_list= + | | +-string#37 := ColumnRef(type=STRING, column=SimpleTypes.string#23) + | +-join_expr= + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=SimpleTypes.string#5) + | | +-ColumnRef(type=STRING, column=$distinct.string#37) + | +-has_using=TRUE + +-group_by_list= + | +-string#39 := ColumnRef(type=STRING, column=$distinct.string#37) + +-aggregate_list= + | +-anon_users#38 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) 
STRUCT contribution_bounds_per_group) -> INT64) + | +-Literal(type=STRUCT, value=NULL) + +-option_list= + +-privacy_unit_column := ColumnRef(type=INT64, column=SimpleTypes.int64#2) + +-max_groups_contributed := Literal(type=INT64, value=3) + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) + + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$groupby.string#39 AS string [STRING] +| +-$aggregate.anon_users#38 AS anon_users [INT64] ++-query= + +-ProjectScan + +-column_list=[$groupby.string#39, $aggregate.anon_users#38] + +-input_scan= + +-WithScan + +-column_list=[$groupby.string#39, $aggregate.anon_users#38] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="$public_groups0" + | +-with_subquery= + | +-AggregateScan + | +-column_list=[$distinct.string#46] + | +-input_scan= + | | +-TableScan(column_list=[SimpleTypes.string#47], table=SimpleTypes, column_index_list=[4]) + | +-group_by_list= + | +-string#46 := ColumnRef(type=STRING, column=SimpleTypes.string#47) + +-query= + +-DifferentialPrivacyAggregateScan + +-column_list=[$groupby.string#39, $aggregate.anon_users#38] + +-input_scan= + | +-JoinScan + | +-column_list=[$public_groups0.string#45, $aggregate.anon_users_partial#41, $groupby.string_partial#42, $group_by.$uid#43] + | +-join_type=RIGHT + | +-left_scan= + | | +-SampleScan + | | +-column_list=[$aggregate.anon_users_partial#41, $groupby.string_partial#42, $group_by.$uid#43] + | | +-input_scan= + | | | +-AggregateScan + | | | +-column_list=[$aggregate.anon_users_partial#41, $groupby.string_partial#42, $group_by.$uid#43] + | | | +-input_scan= + | | | | +-JoinScan + | | | | +-column_list=[SimpleTypes.int64#2, SimpleTypes.string#5, $distinct.string#37] + | | | | +-left_scan= + | | | | | +-TableScan(column_list=SimpleTypes.[int64#2, string#5], table=SimpleTypes, column_index_list=[1, 4]) + | | | | +-right_scan= + | | | | | +-WithRefScan(column_list=[$distinct.string#37], with_query_name="$public_groups0") + | | | | +-join_expr= + 
| | | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | | | +-ColumnRef(type=STRING, column=SimpleTypes.string#5) + | | | | | +-ColumnRef(type=STRING, column=$distinct.string#37) + | | | | +-has_using=TRUE + | | | +-group_by_list= + | | | | +-string_partial#42 := ColumnRef(type=STRING, column=$distinct.string#37) + | | | | +-$uid#43 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) + | | | +-aggregate_list= + | | | +-anon_users_partial#41 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + | | +-method="RESERVOIR" + | | +-size= + | | | +-Literal(type=INT64, value=3) + | | +-unit=ROWS + | | +-partition_by_list= + | | +-ColumnRef(type=INT64, column=$group_by.$uid#43) + | +-right_scan= + | | +-WithRefScan(column_list=[$public_groups0.string#45], with_query_name="$public_groups0") + | +-join_expr= + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, column=$groupby.string_partial#42) + | +-ColumnRef(type=STRING, column=$public_groups0.string#45) + +-group_by_list= + | +-string#39 := ColumnRef(type=STRING, column=$public_groups0.string#45) + +-aggregate_list= + | +-anon_users#38 := + | +-AggregateFunctionCall(ZetaSQL:$differential_privacy_sum(INT64, optional(1) STRUCT contribution_bounds_per_group) -> INT64) + | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$is_null(INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$group_by.$uid#43) + | | +-Literal(type=INT64, value=NULL) + | | +-ColumnRef(type=INT64, column=$aggregate.anon_users_partial#41) + | +-Literal(type=STRUCT, value=NULL) + +-option_list= + +-max_groups_contributed := Literal(type=INT64, value=3) + +-group_selection_strategy := Literal(type=ENUM, value=PUBLIC_GROUPS) diff --git a/zetasql/analyzer/testdata/differential_privacy_join.test b/zetasql/analyzer/testdata/differential_privacy_join.test index b7ff4804d..2c8ab0a21 100644 --- a/zetasql/analyzer/testdata/differential_privacy_join.test +++ 
b/zetasql/analyzer/testdata/differential_privacy_join.test @@ -345,9 +345,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.uid#23], table=SimpleTypesWithAnonymizationUid, column_index_list=[10], alias="b") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) - | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | +-has_using=TRUE +-aggregate_list= +-$agg1#25 := +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) @@ -374,9 +375,10 @@ QueryStmt | | +-right_scan= | | | +-TableScan(column_list=[SimpleTypesWithAnonymizationUid.uid#23], table=SimpleTypesWithAnonymizationUid, column_index_list=[10], alias="b") | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) - | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) + | | | +-ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#23) + | | +-has_using=TRUE | +-group_by_list= | | +-$uid#28 := ColumnRef(type=INT64, column=SimpleTypesWithAnonymizationUid.uid#11) | +-aggregate_list= @@ -2403,9 +2405,10 @@ QueryStmt | | +-input_scan= | | +-TableScan(parse_location=84-112, column_list=[KitchenSinkWithUidValueTable.value#2], table=KitchenSinkWithUidValueTable, column_index_list=[0], alias="t2") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, 
column=$join_left.string_val#3) - | +-ColumnRef(type=STRING, column=$join_right.string_val#4) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$join_left.string_val#3) + | | +-ColumnRef(type=STRING, column=$join_right.string_val#4) + | +-has_using=TRUE +-aggregate_list= +-$agg1#5 := +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) @@ -2472,9 +2475,10 @@ QueryStmt | | | +-input_scan= | | | +-TableScan(parse_location=84-112, column_list=[KitchenSinkWithUidValueTable.value#2], table=KitchenSinkWithUidValueTable, column_index_list=[0], alias="t2") | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=$join_left.string_val#3) - | | +-ColumnRef(type=STRING, column=$join_right.string_val#4) + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$join_left.string_val#3) + | | | +-ColumnRef(type=STRING, column=$join_right.string_val#4) + | | +-has_using=TRUE | +-group_by_list= | | +-$uid#10 := ColumnRef(type=STRING, column=$join_left.string_val#3) | +-aggregate_list= @@ -2547,9 +2551,10 @@ QueryStmt | | +-input_scan= | | +-TableScan(parse_location=84-112, column_list=[KitchenSinkWithUidValueTable.value#2], table=KitchenSinkWithUidValueTable, column_index_list=[0], alias="t2") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$join_left.int64_val#3) - | +-ColumnRef(type=INT64, column=$join_right.int64_val#4) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$join_left.int64_val#3) + | | +-ColumnRef(type=INT64, column=$join_right.int64_val#4) + | +-has_using=TRUE +-aggregate_list= +-$agg1#5 := +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) @@ -2636,9 +2641,10 @@ 
QueryStmt | | +-input_scan= | | +-TableScan(parse_location=177-205, column_list=[KitchenSinkWithUidValueTable.value#4], table=KitchenSinkWithUidValueTable, column_index_list=[0]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=t1.x#2) - | +-ColumnRef(type=STRING, column=t2.x#5) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=t1.x#2) + | | +-ColumnRef(type=STRING, column=t2.x#5) + | +-has_using=TRUE +-aggregate_list= +-$agg1#7 := +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT contribution_bounds_per_group) -> INT64) @@ -2723,9 +2729,10 @@ QueryStmt | | | +-input_scan= | | | +-TableScan(parse_location=177-205, column_list=[KitchenSinkWithUidValueTable.value#4], table=KitchenSinkWithUidValueTable, column_index_list=[0]) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | | +-ColumnRef(type=STRING, column=t1.x#2) - | | +-ColumnRef(type=STRING, column=t2.x#5) + | | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=t1.x#2) + | | | +-ColumnRef(type=STRING, column=t2.x#5) + | | +-has_using=TRUE | +-group_by_list= | | +-$uid#12 := ColumnRef(type=STRING, column=t1.x#2) | +-aggregate_list= @@ -2814,9 +2821,10 @@ QueryStmt | | +-input_scan= | | +-TableScan(parse_location=177-205, column_list=[KitchenSinkWithUidValueTable.value#4], table=KitchenSinkWithUidValueTable, column_index_list=[0]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=t1.y#3) - | +-ColumnRef(type=STRING, column=t2.y#6) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=t1.y#3) + | | +-ColumnRef(type=STRING, column=t2.y#6) + | +-has_using=TRUE +-aggregate_list= +-$agg1#7 := +-AggregateFunctionCall(ZetaSQL:$differential_privacy_count_star(optional(1) STRUCT 
contribution_bounds_per_group) -> INT64) diff --git a/zetasql/analyzer/testdata/dml_insert.test b/zetasql/analyzer/testdata/dml_insert.test index 29fe0a625..90e8c31e7 100644 --- a/zetasql/analyzer/testdata/dml_insert.test +++ b/zetasql/analyzer/testdata/dml_insert.test @@ -2083,6 +2083,7 @@ INSERT UpdateToDefaultTable (readonly_settable_to_default) VALUES (DEFAULT); # TODO: Add java support for generated expressions. [no_java] +[enabled_ast_rewrites=DEFAULTS,+INSERT_DML_VALUES] INSERT INTO TableWithGeneratedColumn (C) VALUES(3) -- InsertStmt @@ -2097,6 +2098,114 @@ InsertStmt | +-Literal(type=INT64, value=3) +-column_access_list=READ,READ,READ_WRITE +-topologically_sorted_generated_column_id_list=[2, 1, 4] ++-generated_column_expr_list= + +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.C#3) + | +-Literal(type=INT64, value=1) + +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.B#2) + | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.C#3) + +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.A#1) + | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.B#2) + +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.C#3) + +[REWRITTEN AST] +InsertStmt ++-table_scan= +| +-TableScan(column_list=TableWithGeneratedColumn.[A#1, B#2, C#3], table=TableWithGeneratedColumn, column_index_list=[0, 1, 2]) ++-insert_column_list=[TableWithGeneratedColumn.C#3] ++-query= +| +-ProjectScan +| +-column_list=[TableWithGeneratedColumn.$col#5] +| +-expr_list= +| | +-$col#5 := Literal(type=INT64, value=3) +| +-input_scan= +| +-SingleRowScan ++-query_output_column_list=[TableWithGeneratedColumn.$col#5] ++-column_access_list=READ,READ,READ_WRITE ++-topologically_sorted_generated_column_id_list=[2, 1, 4] ++-generated_column_expr_list= + 
+-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.C#3) + | +-Literal(type=INT64, value=1) + +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.B#2) + | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.C#3) + +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.A#1) + | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.B#2) + +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.C#3) +== + +# TODO: Add java support for generated expressions. +[no_java] +[enabled_ast_rewrites=DEFAULTS,+INSERT_DML_VALUES] +INSERT INTO TableWithGeneratedColumn (C) VALUES(3),(4) +-- +InsertStmt ++-table_scan= +| +-TableScan(column_list=TableWithGeneratedColumn.[A#1, B#2, C#3], table=TableWithGeneratedColumn, column_index_list=[0, 1, 2]) ++-insert_column_list=[TableWithGeneratedColumn.C#3] ++-row_list= +| +-InsertRow +| | +-value_list= +| | +-DMLValue +| | +-value= +| | +-Literal(type=INT64, value=3) +| +-InsertRow +| +-value_list= +| +-DMLValue +| +-value= +| +-Literal(type=INT64, value=4) ++-column_access_list=READ,READ,READ_WRITE ++-topologically_sorted_generated_column_id_list=[2, 1, 4] ++-generated_column_expr_list= + +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.C#3) + | +-Literal(type=INT64, value=1) + +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.B#2) + | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.C#3) + +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.A#1) + | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.B#2) + +-ColumnRef(type=INT64, 
column=TableWithGeneratedColumn.C#3) + +[REWRITTEN AST] +InsertStmt ++-table_scan= +| +-TableScan(column_list=TableWithGeneratedColumn.[A#1, B#2, C#3], table=TableWithGeneratedColumn, column_index_list=[0, 1, 2]) ++-insert_column_list=[TableWithGeneratedColumn.C#3] ++-query= +| +-SetOperationScan +| +-column_list=[TableWithGeneratedColumn.$col#7] +| +-op_type=UNION_ALL +| +-input_item_list= +| +-SetOperationItem +| | +-scan= +| | | +-ProjectScan +| | | +-column_list=[TableWithGeneratedColumn.$col#5] +| | | +-expr_list= +| | | | +-$col#5 := Literal(type=INT64, value=3) +| | | +-input_scan= +| | | +-SingleRowScan +| | +-output_column_list=[TableWithGeneratedColumn.$col#5] +| +-SetOperationItem +| +-scan= +| | +-ProjectScan +| | +-column_list=[TableWithGeneratedColumn.$col#6] +| | +-expr_list= +| | | +-$col#6 := Literal(type=INT64, value=4) +| | +-input_scan= +| | +-SingleRowScan +| +-output_column_list=[TableWithGeneratedColumn.$col#6] ++-query_output_column_list=[TableWithGeneratedColumn.$col#7] ++-column_access_list=READ,READ,READ_WRITE ++-topologically_sorted_generated_column_id_list=[2, 1, 4] +-generated_column_expr_list= +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) | +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.C#3) @@ -2141,6 +2250,37 @@ InsertStmt +-ColumnRef(type=INT64, column=TableWithGeneratedColumn.C#3) == +[enabled_ast_rewrites=DEFAULTS,+INSERT_DML_VALUES] +INSERT INTO TableWithDefaultColumn (default_col) VALUES(DEFAULT) +-- +InsertStmt ++-table_scan= +| +-TableScan(column_list=[TableWithDefaultColumn.default_col#3], table=TableWithDefaultColumn, column_index_list=[2]) ++-insert_column_list=[TableWithDefaultColumn.default_col#3] ++-row_list= +| +-InsertRow +| +-value_list= +| +-DMLValue +| +-value= +| +-DMLDefault(type=INT64) ++-column_access_list=WRITE + +[REWRITTEN AST] +InsertStmt ++-table_scan= +| +-TableScan(column_list=[TableWithDefaultColumn.default_col#3], table=TableWithDefaultColumn, column_index_list=[2]) 
++-insert_column_list=[TableWithDefaultColumn.default_col#3] ++-query= +| +-ProjectScan +| +-column_list=[TableWithDefaultColumn.$col#4] +| +-expr_list= +| | +-$col#4 := Literal(type=INT64, value=10) +| +-input_scan= +| +-SingleRowScan ++-query_output_column_list=[TableWithDefaultColumn.$col#4] ++-column_access_list=WRITE +== + INSERT INTO KeyValue@{hint_name=hint_value} (key, value) VALUES (1, "one"); -- InsertStmt diff --git a/zetasql/analyzer/testdata/dp_functions.test b/zetasql/analyzer/testdata/dp_functions.test index 218c762af..f9bb8d1d1 100644 --- a/zetasql/analyzer/testdata/dp_functions.test +++ b/zetasql/analyzer/testdata/dp_functions.test @@ -31,7 +31,14 @@ select with differential_privacy COUNT(string, 1.0, contribution_bounds_per_grou -- -ERROR: Number of arguments does not match for aggregate operator COUNT. Supported signature: COUNT(T2, [contribution_bounds_per_group => STRUCT]) [at 1:34] +ERROR: Number of arguments does not match for aggregate operator COUNT. Supported signature: COUNT(T2, [contribution_bounds_per_group => STRUCT]) -> INT64 [at 1:34] +select with differential_privacy COUNT(string, 1.0, contribution_bounds_per_g... + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator COUNT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: STRING, DOUBLE, STRUCT + Signature: COUNT(T2, [contribution_bounds_per_group => STRUCT]) -> INT64 + Signature accepts at most 2 arguments, found 3 arguments [at 1:34] select with differential_privacy COUNT(string, 1.0, contribution_bounds_per_g... ^ == @@ -41,7 +48,14 @@ select with differential_privacy COUNT(string, 1.0, contribution_bounds_per_g... select with differential_privacy COUNT(string, 1.0, string, contriubtion_bounds_per_group => (0, 1)), COUNT(*, 1.0, string, contribution_bounds_per_group => (0, 1)) from SimpleTypes -- -ERROR: Number of arguments does not match for aggregate operator COUNT. 
Supported signature: COUNT(T2, [contribution_bounds_per_group => STRUCT]) [at 1:34] +ERROR: Number of arguments does not match for aggregate operator COUNT. Supported signature: COUNT(T2, [contribution_bounds_per_group => STRUCT]) -> INT64 [at 1:34] +select with differential_privacy COUNT(string, 1.0, string, contriubtion_boun... + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator COUNT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: STRING, DOUBLE, STRING, STRUCT + Signature: COUNT(T2, [contribution_bounds_per_group => STRUCT]) -> INT64 + Signature accepts at most 2 arguments, found 4 arguments [at 1:34] select with differential_privacy COUNT(string, 1.0, string, contriubtion_boun... ^ == @@ -54,6 +68,13 @@ select with differential_privacy COUNT(string, (0, 1)) from SimpleTypes ERROR: Positional argument is invalid because this function restricts that this argument is referred to by name "contribution_bounds_per_group" only [at 1:48] select with differential_privacy COUNT(string, (0, 1)) from SimpleTypes ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator COUNT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: STRING, STRUCT + Signature: COUNT(T2, [contribution_bounds_per_group => STRUCT]) -> INT64 + Positional argument at 2 is invalid because argument `contribution_bounds_per_group` can only be referred to by name [at 1:34] +select with differential_privacy COUNT(string, (0, 1)) from SimpleTypes + ^ == # Reject COUNT() with no arguments @@ -61,7 +82,14 @@ select with differential_privacy COUNT(string, (0, 1)) from SimpleTypes select with differential_privacy COUNT() from SimpleTypes -- -ERROR: Number of arguments does not match for aggregate operator COUNT. Supported signature: COUNT(T2, [contribution_bounds_per_group => STRUCT]) [at 1:34] +ERROR: Number of arguments does not match for aggregate operator COUNT. 
Supported signature: COUNT(T2, [contribution_bounds_per_group => STRUCT]) -> INT64 [at 1:34] +select with differential_privacy COUNT() from SimpleTypes + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator COUNT in SELECT WITH DIFFERENTIAL_PRIVACY context with no arguments + Signature: COUNT(T2, [contribution_bounds_per_group => STRUCT]) -> INT64 + Signature requires at least 1 argument, found 0 arguments [at 1:34] select with differential_privacy COUNT() from SimpleTypes ^ == @@ -246,7 +274,18 @@ QueryStmt [language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS] select with differential_privacy SUM(string, contribution_bounds_per_group => (0, 1)) from SimpleTypes -- -ERROR: No matching signature for aggregate operator SUM in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: STRING, STRUCT. Supported signatures: SUM(INT64, [contribution_bounds_per_group => STRUCT]); SUM(UINT64, [contribution_bounds_per_group => STRUCT]); SUM(DOUBLE, [contribution_bounds_per_group => STRUCT]) [at 1:34] +ERROR: No matching signature for aggregate operator SUM in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: STRING, STRUCT. Supported signatures: SUM(INT64, [contribution_bounds_per_group => STRUCT]) -> INT64; SUM(UINT64, [contribution_bounds_per_group => STRUCT]) -> UINT64; SUM(DOUBLE, [contribution_bounds_per_group => STRUCT]) -> DOUBLE [at 1:34] +select with differential_privacy SUM(string, contribution_bounds_per_group =>... 
+ ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator SUM in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: STRING, STRUCT + Signature: SUM(INT64, [contribution_bounds_per_group => STRUCT]) -> INT64 + Argument 1: Unable to coerce type STRING to expected type INT64 + Signature: SUM(UINT64, [contribution_bounds_per_group => STRUCT]) -> UINT64 + Argument 1: Unable to coerce type STRING to expected type UINT64 + Signature: SUM(DOUBLE, [contribution_bounds_per_group => STRUCT]) -> DOUBLE + Argument 1: Unable to coerce type STRING to expected type DOUBLE [at 1:34] select with differential_privacy SUM(string, contribution_bounds_per_group =>... ^ == @@ -255,7 +294,14 @@ select with differential_privacy SUM(string, contribution_bounds_per_group =>... [language_features=DIFFERENTIAL_PRIVACY,NAMED_ARGUMENTS] select with differential_privacy AVG(string, contribution_bounds_per_group => (0, 1)) from SimpleTypes -- -ERROR: No matching signature for aggregate operator AVG in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: STRING, STRUCT. Supported signature: AVG(DOUBLE, [contribution_bounds_per_group => STRUCT]) [at 1:34] +ERROR: No matching signature for aggregate operator AVG in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: STRING, STRUCT. Supported signature: AVG(DOUBLE, [contribution_bounds_per_group => STRUCT]) -> DOUBLE [at 1:34] +select with differential_privacy AVG(string, contribution_bounds_per_group =>... + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator AVG in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: STRING, STRUCT + Signature: AVG(DOUBLE, [contribution_bounds_per_group => STRUCT]) -> DOUBLE + Argument 1: Unable to coerce type STRING to expected type DOUBLE [at 1:34] select with differential_privacy AVG(string, contribution_bounds_per_group =>... 
^ == @@ -504,7 +550,14 @@ QueryStmt select with differential_privacy VAR_POP(double_array) from ArrayWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator VAR_POP in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: ARRAY. Supported signatures: VAR_POP(DOUBLE, [contribution_bounds_per_row => STRUCT]) [at 1:34] +ERROR: No matching signature for aggregate operator VAR_POP in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: ARRAY. Supported signatures: VAR_POP(DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE [at 1:34] +select with differential_privacy VAR_POP(double_array) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator VAR_POP in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: ARRAY + Signature: VAR_POP(DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE + Argument 1: Unable to coerce type ARRAY to expected type DOUBLE [at 1:34] select with differential_privacy VAR_POP(double_array) ^ == @@ -515,7 +568,14 @@ select with differential_privacy VAR_POP(double_array) select with differential_privacy STDDEV_POP(double_array) from ArrayWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator STDDEV_POP in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: ARRAY. Supported signatures: STDDEV_POP(DOUBLE, [contribution_bounds_per_row => STRUCT]) [at 1:34] +ERROR: No matching signature for aggregate operator STDDEV_POP in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: ARRAY. 
Supported signatures: STDDEV_POP(DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE [at 1:34] +select with differential_privacy STDDEV_POP(double_array) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator STDDEV_POP in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: ARRAY + Signature: STDDEV_POP(DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE + Argument 1: Unable to coerce type ARRAY to expected type DOUBLE [at 1:34] select with differential_privacy STDDEV_POP(double_array) ^ == @@ -526,7 +586,14 @@ select with differential_privacy STDDEV_POP(double_array) select with differential_privacy PERCENTILE_CONT(double_array, 0.4) from ArrayWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: ARRAY, DOUBLE. Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) [at 1:34] +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: ARRAY, DOUBLE. 
Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE [at 1:34] +select with differential_privacy PERCENTILE_CONT(double_array, 0.4) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: ARRAY, DOUBLE + Signature: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE + Argument 1: Unable to coerce type ARRAY to expected type DOUBLE [at 1:34] select with differential_privacy PERCENTILE_CONT(double_array, 0.4) ^ == @@ -538,7 +605,14 @@ select with differential_privacy PERCENTILE_CONT(double) from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE. Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) [at 1:34] +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE. Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE [at 1:34] +select with differential_privacy PERCENTILE_CONT(double) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE + Signature: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE + Signature requires at least 2 arguments, found 1 argument [at 1:34] select with differential_privacy PERCENTILE_CONT(double) ^ == @@ -549,7 +623,14 @@ select with differential_privacy PERCENTILE_CONT(double, double, double) from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, DOUBLE, DOUBLE. 
Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) [at 1:34] +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, DOUBLE, DOUBLE. Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE [at 1:34] +select with differential_privacy PERCENTILE_CONT(double, double, double) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, DOUBLE, DOUBLE + Signature: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE + Positional argument at 3 is invalid because argument `contribution_bounds_per_row` can only be referred to by name [at 1:34] select with differential_privacy PERCENTILE_CONT(double, double, double) ^ == @@ -570,7 +651,14 @@ select with differential_privacy PERCENTILE_CONT(double, 1.5, 1.5) from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, DOUBLE, DOUBLE. Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) [at 1:34] +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, DOUBLE, DOUBLE. 
Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE [at 1:34] +select with differential_privacy PERCENTILE_CONT(double, 1.5, 1.5) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, DOUBLE, DOUBLE + Signature: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE + Positional argument at 3 is invalid because argument `contribution_bounds_per_row` can only be referred to by name [at 1:34] select with differential_privacy PERCENTILE_CONT(double, 1.5, 1.5) ^ == @@ -581,7 +669,14 @@ select with differential_privacy PERCENTILE_CONT(double) from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE. Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) [at 1:34] +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE. 
Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE [at 1:34] +select with differential_privacy PERCENTILE_CONT(double) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE + Signature: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE + Signature requires at least 2 arguments, found 1 argument [at 1:34] select with differential_privacy PERCENTILE_CONT(double) ^ == @@ -592,7 +687,14 @@ select with differential_privacy from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, DOUBLE, DOUBLE, STRUCT. Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) [at 2:5] +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, DOUBLE, DOUBLE, STRUCT. Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE [at 2:5] + PERCENTILE_CONT(int64, 0.4, 0.4, contribution_bounds_per_group => (2, 3)) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, DOUBLE, DOUBLE, STRUCT + Signature: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE + Signature accepts at most 3 arguments, found 4 arguments [at 2:5] PERCENTILE_CONT(int64, 0.4, 0.4, contribution_bounds_per_group => (2, 3)) ^ == @@ -603,7 +705,14 @@ select with differential_privacy from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRUCT. 
Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) [at 2:5] +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRUCT. Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE [at 2:5] + PERCENTILE_CONT(int64, contribution_bounds_per_group => (2, 3)) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRUCT + Signature: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE + Named argument `contribution_bounds_per_group` does not exist in signature [at 2:5] PERCENTILE_CONT(int64, contribution_bounds_per_group => (2, 3)) ^ == @@ -613,7 +722,14 @@ select with differential_privacy PERCENTILE_CONT(int64, double, contribution_bounds_per_group => (2, 3)) from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, DOUBLE, STRUCT. Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) [at 2:5] +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, DOUBLE, STRUCT. 
Supported signatures: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE [at 2:5] + PERCENTILE_CONT(int64, double, contribution_bounds_per_group => (2, 3)) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator PERCENTILE_CONT in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, DOUBLE, STRUCT + Signature: PERCENTILE_CONT(DOUBLE, DOUBLE, [contribution_bounds_per_row => STRUCT]) -> DOUBLE + Named argument `contribution_bounds_per_group` does not exist in signature [at 2:5] PERCENTILE_CONT(int64, double, contribution_bounds_per_group => (2, 3)) ^ == @@ -646,7 +762,14 @@ ERROR: Syntax error: Expected ")" but got identifier "contribution_bounds_per_gr select with differential_privacy APPROX_QUANTILES(double_array, 4) from ArrayWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: ARRAY, INT64. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) [at 1:34] +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: ARRAY, INT64. 
Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY [at 1:34] +select with differential_privacy APPROX_QUANTILES(double_array, 4) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: ARRAY, INT64 + Signature: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY + Signature requires at least 3 arguments, found 2 arguments [at 1:34] select with differential_privacy APPROX_QUANTILES(double_array, 4) ^ == @@ -658,7 +781,14 @@ select with differential_privacy APPROX_QUANTILES(double) from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) [at 1:34] +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY [at 1:34] +select with differential_privacy APPROX_QUANTILES(double) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE + Signature: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY + Signature requires at least 3 arguments, found 1 argument [at 1:34] select with differential_privacy APPROX_QUANTILES(double) ^ == @@ -669,7 +799,14 @@ select with differential_privacy APPROX_QUANTILES(double, int64, double) from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, INT64, DOUBLE. 
Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) [at 1:34] +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, INT64, DOUBLE. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY [at 1:34] +select with differential_privacy APPROX_QUANTILES(double, int64, double) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, INT64, DOUBLE + Signature: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY + Positional argument at 3 is invalid because argument `contribution_bounds_per_row` can only be referred to by name [at 1:34] select with differential_privacy APPROX_QUANTILES(double, int64, double) ^ == @@ -679,7 +816,14 @@ select with differential_privacy APPROX_QUANTILES(double, int64, double) select with differential_privacy APPROX_QUANTILES(double, double) from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, DOUBLE. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) [at 1:34] +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, DOUBLE. 
Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY [at 1:34] +select with differential_privacy APPROX_QUANTILES(double, double) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, DOUBLE + Signature: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY + Signature requires at least 3 arguments, found 2 arguments [at 1:34] select with differential_privacy APPROX_QUANTILES(double, double) ^ == @@ -689,7 +833,14 @@ select with differential_privacy APPROX_QUANTILES(double, double) select with differential_privacy APPROX_QUANTILES(double, int64) from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, INT64. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) [at 1:34] +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, INT64. 
Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY [at 1:34] +select with differential_privacy APPROX_QUANTILES(double, int64) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: DOUBLE, INT64 + Signature: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY + Signature requires at least 3 arguments, found 2 arguments [at 1:34] select with differential_privacy APPROX_QUANTILES(double, int64) ^ == @@ -700,7 +851,14 @@ select with differential_privacy from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, INT64, DOUBLE, STRUCT. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) [at 2:5] +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, INT64, DOUBLE, STRUCT. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY [at 2:5] + APPROX_QUANTILES(int64, 4, 0.4, contribution_bounds_per_group => (2, 3)) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, INT64, DOUBLE, STRUCT + Signature: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY + Signature accepts at most 3 arguments, found 4 arguments [at 2:5] APPROX_QUANTILES(int64, 4, 0.4, contribution_bounds_per_group => (2, 3)) ^ == @@ -711,7 +869,14 @@ select with differential_privacy from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRUCT. 
Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) [at 2:5] +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRUCT. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY [at 2:5] + APPROX_QUANTILES(int64, contribution_bounds_per_group => (2, 3)) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRUCT + Signature: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY + Signature requires at least 3 arguments, found 2 arguments [at 2:5] APPROX_QUANTILES(int64, contribution_bounds_per_group => (2, 3)) ^ == @@ -721,7 +886,14 @@ select with differential_privacy APPROX_QUANTILES(int64, int64, contribution_bounds_per_group => (2, 3)) from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, INT64, STRUCT. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) [at 2:5] +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, INT64, STRUCT. 
Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY [at 2:5] + APPROX_QUANTILES(int64, int64, contribution_bounds_per_group => (2, 3)) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, INT64, STRUCT + Signature: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY + Named argument `contribution_bounds_per_group` does not exist in signature [at 2:5] APPROX_QUANTILES(int64, int64, contribution_bounds_per_group => (2, 3)) ^ == @@ -732,7 +904,14 @@ select with differential_privacy from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, INT64, STRUCT. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) [at 2:5] +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, INT64, STRUCT. 
Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY [at 2:5] + APPROX_QUANTILES(int64, 4, contribution_bounds_per_group => (double, 3)) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, INT64, STRUCT + Signature: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY + Named argument `contribution_bounds_per_group` does not exist in signature [at 2:5] APPROX_QUANTILES(int64, 4, contribution_bounds_per_group => (double, 3)) ^ == @@ -743,7 +922,14 @@ select with differential_privacy from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, INT64, STRUCT. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) [at 2:5] +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, INT64, STRUCT. 
Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY [at 2:5] + APPROX_QUANTILES(int64, 4, contribution_bounds_per_group => (2, double)) + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, INT64, STRUCT + Signature: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY + Named argument `contribution_bounds_per_group` does not exist in signature [at 2:5] APPROX_QUANTILES(int64, 4, contribution_bounds_per_group => (2, double)) ^ == @@ -753,7 +939,14 @@ select with differential_privacy APPROX_QUANTILES(int64, @test_param_int64, contribution_bounds_per_group => (@test_param_double, @test_param_double)) from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, INT64, STRUCT. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) [at 2:5] +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, INT64, STRUCT. Supported signatures: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY [at 2:5] + APPROX_QUANTILES(int64, @test_param_int64, contribution_bounds_per_group ... + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator APPROX_QUANTILES in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, INT64, STRUCT + Signature: APPROX_QUANTILES(DOUBLE, INT64, contribution_bounds_per_row => STRUCT) -> ARRAY + Named argument `contribution_bounds_per_group` does not exist in signature [at 2:5] APPROX_QUANTILES(int64, @test_param_int64, contribution_bounds_per_group ... 
^ == @@ -789,7 +982,30 @@ select with differential_privacy from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator SUM in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRING. Supported signatures: SUM(INT64, [contribution_bounds_per_group => STRUCT]); SUM(UINT64, [contribution_bounds_per_group => STRUCT]); SUM(DOUBLE, [contribution_bounds_per_group => STRUCT]); SUM(INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="JSON"*/, [contribution_bounds_per_group => STRUCT]); SUM(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="JSON"*/, [contribution_bounds_per_group => STRUCT]); SUM(UINT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="JSON"*/, [contribution_bounds_per_group => STRUCT]); SUM(INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="PROTO"*/, [contribution_bounds_per_group => STRUCT]); SUM(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="PROTO"*/, [contribution_bounds_per_group => STRUCT]); SUM(UINT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="PROTO"*/, [contribution_bounds_per_group => STRUCT]) [at 2:5] +ERROR: No matching signature for aggregate operator SUM in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRING. 
Supported signatures: SUM(INT64, [contribution_bounds_per_group => STRUCT]) -> INT64; SUM(UINT64, [contribution_bounds_per_group => STRUCT]) -> UINT64; SUM(DOUBLE, [contribution_bounds_per_group => STRUCT]) -> DOUBLE; SUM(INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, [contribution_bounds_per_group => STRUCT]) -> JSON; SUM(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, [contribution_bounds_per_group => STRUCT]) -> JSON; SUM(UINT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, [contribution_bounds_per_group => STRUCT]) -> JSON; SUM(INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport; SUM(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport; SUM(UINT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport [at 2:5] + SUM(int64, report_format => "Invalid") as s + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator SUM in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRING + Signature: SUM(INT64, [contribution_bounds_per_group => STRUCT]) -> INT64 + Named argument `report_format` does not exist in signature + Signature: SUM(UINT64, [contribution_bounds_per_group => STRUCT]) -> UINT64 + Named argument `report_format` does not exist in signature + Signature: SUM(DOUBLE, [contribution_bounds_per_group => STRUCT]) -> DOUBLE + Named argument `report_format` does not exist in signature + Signature: SUM(INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, [contribution_bounds_per_group => STRUCT]) -> JSON + Invalid enum 
value: Invalid + Signature: SUM(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, [contribution_bounds_per_group => STRUCT]) -> JSON + Invalid enum value: Invalid + Signature: SUM(UINT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, [contribution_bounds_per_group => STRUCT]) -> JSON + Argument 1: Unable to coerce type INT64 to expected type UINT64 + Signature: SUM(INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport + Invalid enum value: Invalid + Signature: SUM(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport + Invalid enum value: Invalid + Signature: SUM(UINT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport + Argument 1: Unable to coerce type INT64 to expected type UINT64 [at 2:5] SUM(int64, report_format => "Invalid") as s ^ == @@ -800,7 +1016,30 @@ select with differential_privacy from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator SUM in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRING. 
Supported signatures: SUM(INT64, [contribution_bounds_per_group => STRUCT]); SUM(UINT64, [contribution_bounds_per_group => STRUCT]); SUM(DOUBLE, [contribution_bounds_per_group => STRUCT]); SUM(INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="JSON"*/, [contribution_bounds_per_group => STRUCT]); SUM(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="JSON"*/, [contribution_bounds_per_group => STRUCT]); SUM(UINT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="JSON"*/, [contribution_bounds_per_group => STRUCT]); SUM(INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="PROTO"*/, [contribution_bounds_per_group => STRUCT]); SUM(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="PROTO"*/, [contribution_bounds_per_group => STRUCT]); SUM(UINT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="PROTO"*/, [contribution_bounds_per_group => STRUCT]) [at 2:5] +ERROR: No matching signature for aggregate operator SUM in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRING. 
Supported signatures: SUM(INT64, [contribution_bounds_per_group => STRUCT]) -> INT64; SUM(UINT64, [contribution_bounds_per_group => STRUCT]) -> UINT64; SUM(DOUBLE, [contribution_bounds_per_group => STRUCT]) -> DOUBLE; SUM(INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, [contribution_bounds_per_group => STRUCT]) -> JSON; SUM(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, [contribution_bounds_per_group => STRUCT]) -> JSON; SUM(UINT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, [contribution_bounds_per_group => STRUCT]) -> JSON; SUM(INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport; SUM(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport; SUM(UINT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport [at 2:5] + SUM(int64, report_format => @test_param_string) as s + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator SUM in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRING + Signature: SUM(INT64, [contribution_bounds_per_group => STRUCT]) -> INT64 + Named argument `report_format` does not exist in signature + Signature: SUM(UINT64, [contribution_bounds_per_group => STRUCT]) -> UINT64 + Named argument `report_format` does not exist in signature + Signature: SUM(DOUBLE, [contribution_bounds_per_group => STRUCT]) -> DOUBLE + Named argument `report_format` does not exist in signature + Signature: SUM(INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, [contribution_bounds_per_group => STRUCT]) -> JSON + 
literal value is required at 2 + Signature: SUM(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, [contribution_bounds_per_group => STRUCT]) -> JSON + literal value is required at 2 + Signature: SUM(UINT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "JSON"*/, [contribution_bounds_per_group => STRUCT]) -> JSON + Argument 1: Unable to coerce type INT64 to expected type UINT64 + Signature: SUM(INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport + literal value is required at 2 + Signature: SUM(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport + literal value is required at 2 + Signature: SUM(UINT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport + Argument 1: Unable to coerce type INT64 to expected type UINT64 [at 2:5] SUM(int64, report_format => @test_param_string) as s ^ == @@ -810,7 +1049,18 @@ select with differential_privacy SUM(int64, report_format => "JSON") as s from SimpleTypesWithAnonymizationUid; -- -ERROR: No matching signature for aggregate operator SUM in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRING. Supported signatures: SUM(INT64, [contribution_bounds_per_group => STRUCT]); SUM(UINT64, [contribution_bounds_per_group => STRUCT]); SUM(DOUBLE, [contribution_bounds_per_group => STRUCT]) [at 2:5] +ERROR: No matching signature for aggregate operator SUM in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRING. 
Supported signatures: SUM(INT64, [contribution_bounds_per_group => STRUCT]) -> INT64; SUM(UINT64, [contribution_bounds_per_group => STRUCT]) -> UINT64; SUM(DOUBLE, [contribution_bounds_per_group => STRUCT]) -> DOUBLE [at 2:5] + SUM(int64, report_format => "JSON") as s + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator SUM in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: INT64, STRING + Signature: SUM(INT64, [contribution_bounds_per_group => STRUCT]) -> INT64 + Named argument `report_format` does not exist in signature + Signature: SUM(UINT64, [contribution_bounds_per_group => STRUCT]) -> UINT64 + Named argument `report_format` does not exist in signature + Signature: SUM(DOUBLE, [contribution_bounds_per_group => STRUCT]) -> DOUBLE + Named argument `report_format` does not exist in signature [at 2:5] SUM(int64, report_format => "JSON") as s ^ == @@ -822,7 +1072,26 @@ SUM(numeric, report_format => "JSON") FROM SimpleTypes -- -ERROR: No matching signature for aggregate operator SUM in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: NUMERIC, STRING. Supported signatures: SUM(INT64, [contribution_bounds_per_group => STRUCT]); SUM(UINT64, [contribution_bounds_per_group => STRUCT]); SUM(DOUBLE, [contribution_bounds_per_group => STRUCT]); SUM(NUMERIC, [contribution_bounds_per_group => STRUCT]); SUM(INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="PROTO"*/, [contribution_bounds_per_group => STRUCT]); SUM(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="PROTO"*/, [contribution_bounds_per_group => STRUCT]); SUM(UINT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="PROTO"*/, [contribution_bounds_per_group => STRUCT]) [at 2:1] +ERROR: No matching signature for aggregate operator SUM in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: NUMERIC, STRING. 
Supported signatures: SUM(INT64, [contribution_bounds_per_group => STRUCT]) -> INT64; SUM(UINT64, [contribution_bounds_per_group => STRUCT]) -> UINT64; SUM(DOUBLE, [contribution_bounds_per_group => STRUCT]) -> DOUBLE; SUM(NUMERIC, [contribution_bounds_per_group => STRUCT]) -> NUMERIC; SUM(INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport; SUM(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport; SUM(UINT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport [at 2:1] +SUM(numeric, report_format => "JSON") +^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator SUM in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: NUMERIC, STRING + Signature: SUM(INT64, [contribution_bounds_per_group => STRUCT]) -> INT64 + Named argument `report_format` does not exist in signature + Signature: SUM(UINT64, [contribution_bounds_per_group => STRUCT]) -> UINT64 + Named argument `report_format` does not exist in signature + Signature: SUM(DOUBLE, [contribution_bounds_per_group => STRUCT]) -> DOUBLE + Named argument `report_format` does not exist in signature + Signature: SUM(NUMERIC, [contribution_bounds_per_group => STRUCT]) -> NUMERIC + Named argument `report_format` does not exist in signature + Signature: SUM(INT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport + Argument 1: Unable to coerce type NUMERIC to expected type INT64 + Signature: SUM(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, 
[contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport + Found: JSON expecting: PROTO + Signature: SUM(UINT64, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport + Argument 1: Unable to coerce type NUMERIC to expected type UINT64 [at 2:1] SUM(numeric, report_format => "JSON") ^ == @@ -834,6 +1103,17 @@ AVG(numeric, report_format => "JSON") FROM SimpleTypes -- -ERROR: No matching signature for aggregate operator AVG in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: NUMERIC, STRING. Supported signatures: AVG(DOUBLE, [contribution_bounds_per_group => STRUCT]); AVG(NUMERIC, [contribution_bounds_per_group => STRUCT]); AVG(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*required_value="PROTO"*/, [contribution_bounds_per_group => STRUCT]) [at 2:1] +ERROR: No matching signature for aggregate operator AVG in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: NUMERIC, STRING. 
Supported signatures: AVG(DOUBLE, [contribution_bounds_per_group => STRUCT]) -> DOUBLE; AVG(NUMERIC, [contribution_bounds_per_group => STRUCT]) -> NUMERIC; AVG(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport [at 2:1] +AVG(numeric, report_format => "JSON") +^ +-- +Signature Mismatch Details: +ERROR: No matching signature for aggregate operator AVG in SELECT WITH DIFFERENTIAL_PRIVACY context for argument types: NUMERIC, STRING + Signature: AVG(DOUBLE, [contribution_bounds_per_group => STRUCT]) -> DOUBLE + Named argument `report_format` does not exist in signature + Signature: AVG(NUMERIC, [contribution_bounds_per_group => STRUCT]) -> NUMERIC + Named argument `report_format` does not exist in signature + Signature: AVG(DOUBLE, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT/*with value "PROTO"*/, [contribution_bounds_per_group => STRUCT]) -> zetasql.functions.DifferentialPrivacyOutputWithReport + Found: JSON expecting: PROTO [at 2:1] AVG(numeric, report_format => "JSON") ^ diff --git a/zetasql/analyzer/testdata/drop.test b/zetasql/analyzer/testdata/drop.test index ec4bac9fe..f1419778d 100644 --- a/zetasql/analyzer/testdata/drop.test +++ b/zetasql/analyzer/testdata/drop.test @@ -8,6 +8,16 @@ DROP TABLE namespace.foo; DropStmt(object_type="TABLE", is_if_exists=FALSE, name_path=namespace.foo) == +DROP EXTERNAL TABLE foo; +-- +DropStmt(object_type="EXTERNAL TABLE", is_if_exists=FALSE, name_path=foo) +== + +DROP EXTERNAL TABLE namespace.foo; +-- +DropStmt(object_type="EXTERNAL TABLE", is_if_exists=FALSE, name_path=namespace.foo) +== + DROP VIEW bar; -- @@ -49,3 +59,38 @@ DropStmt(object_type="SCHEMA", is_if_exists=TRUE, name_path=foo.bar, drop_mode=R ALTERNATION GROUP: IF EXISTS,CASCADE -- DropStmt(object_type="SCHEMA", is_if_exists=TRUE, name_path=foo.bar, drop_mode=CASCADE) +== + +DROP EXTERNAL SCHEMA {{|IF EXISTS}} foo.bar 
{{|RESTRICT|CASCADE}}; +-- +ALTERNATION GROUP: +-- +DropStmt(object_type="EXTERNAL SCHEMA", is_if_exists=FALSE, name_path=foo.bar) +-- +ALTERNATION GROUP: RESTRICT +-- +ERROR: Syntax error: 'RESTRICT' is not supported for DROP EXTERNAL SCHEMA [at 1:31] +DROP EXTERNAL SCHEMA foo.bar RESTRICT; + ^ +-- +ALTERNATION GROUP: CASCADE +-- +ERROR: Syntax error: 'CASCADE' is not supported for DROP EXTERNAL SCHEMA [at 1:31] +DROP EXTERNAL SCHEMA foo.bar CASCADE; + ^ +-- +ALTERNATION GROUP: IF EXISTS, +-- +DropStmt(object_type="EXTERNAL SCHEMA", is_if_exists=TRUE, name_path=foo.bar) +-- +ALTERNATION GROUP: IF EXISTS,RESTRICT +-- +ERROR: Syntax error: 'RESTRICT' is not supported for DROP EXTERNAL SCHEMA [at 1:40] +DROP EXTERNAL SCHEMA IF EXISTS foo.bar RESTRICT; + ^ +-- +ALTERNATION GROUP: IF EXISTS,CASCADE +-- +ERROR: Syntax error: 'CASCADE' is not supported for DROP EXTERNAL SCHEMA [at 1:40] +DROP EXTERNAL SCHEMA IF EXISTS foo.bar CASCADE; + ^ diff --git a/zetasql/analyzer/testdata/duplicate_alias.test b/zetasql/analyzer/testdata/duplicate_alias.test index e6a815240..b52e89433 100644 --- a/zetasql/analyzer/testdata/duplicate_alias.test +++ b/zetasql/analyzer/testdata/duplicate_alias.test @@ -42,9 +42,10 @@ QueryStmt | +-right_scan= | | +-WithRefScan(column_list=w1.[item#5, event#6], with_query_name="w1") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=w1.event#4) - | +-ColumnRef(type=STRING, column=w1.event#6) + | | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=w1.event#4) + | | +-ColumnRef(type=STRING, column=w1.event#6) + | +-has_using=TRUE +-query= +-ProjectScan +-column_list=w2.[c1#7, c2#8] @@ -87,9 +88,10 @@ QueryStmt +-right_scan= | +-WithRefScan(column_list=w1.[item#5, event#6], with_query_name="w1") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, column=w1.event#4) - +-ColumnRef(type=STRING, column=w1.event#6) + 
| +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, column=w1.event#4) + | +-ColumnRef(type=STRING, column=w1.event#6) + +-has_using=TRUE == WITH w1 AS ( @@ -132,9 +134,10 @@ QueryStmt +-right_scan= | +-WithRefScan(column_list=w1.[item#5, event#6], with_query_name="w1") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, column=w1.event#4) - +-ColumnRef(type=STRING, column=w1.event#6) + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, column=w1.event#4) + | +-ColumnRef(type=STRING, column=w1.event#6) + +-has_using=TRUE == select * @@ -192,9 +195,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=KeyValue.[Key#3, Value#4], table=KeyValue, column_index_list=[0, 1], alias="m2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, column=KeyValue.Value#2) - +-ColumnRef(type=STRING, column=KeyValue.Value#4) + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, column=KeyValue.Value#2) + | +-ColumnRef(type=STRING, column=KeyValue.Value#4) + +-has_using=TRUE == select * from ( @@ -220,6 +224,7 @@ QueryStmt +-right_scan= | +-TableScan(column_list=KeyValue.[Key#3, Value#4], table=KeyValue, column_index_list=[0, 1], alias="m2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, column=KeyValue.Value#2) - +-ColumnRef(type=STRING, column=KeyValue.Value#4) + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, column=KeyValue.Value#2) + | +-ColumnRef(type=STRING, column=KeyValue.Value#4) + +-has_using=TRUE diff --git a/zetasql/analyzer/testdata/expr_subquery.test b/zetasql/analyzer/testdata/expr_subquery.test index e5781be55..1ddf0541c 100644 --- a/zetasql/analyzer/testdata/expr_subquery.test +++ b/zetasql/analyzer/testdata/expr_subquery.test @@ -332,9 +332,10 @@ QueryStmt | | +-right_scan= | | | 
+-TableScan(column_list=[KeyValue.Key#20], table=KeyValue, column_index_list=[0], alias="kv2") | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=KeyValue.Key#18) - | | +-ColumnRef(type=INT64, column=KeyValue.Key#20) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=KeyValue.Key#18) + | | | +-ColumnRef(type=INT64, column=KeyValue.Key#20) + | | +-has_using=TRUE | +-$col6#30 := | +-SubqueryExpr | +-type=ARRAY diff --git a/zetasql/analyzer/testdata/geography.test b/zetasql/analyzer/testdata/geography.test index 7f35f1391..3f6bb2108 100644 --- a/zetasql/analyzer/testdata/geography.test +++ b/zetasql/analyzer/testdata/geography.test @@ -752,11 +752,13 @@ select st_geogfromtext('point(0 0)', planar => true, make_valid => false, oriented => true), st_geogfromtext('point(0 0)', oriented => true, planar => true, make_valid => false), st_geogfromtext('point(0 0)', make_valid => false, oriented => true, planar => true), - # Some named arguments are missing - become NULLs in resolved AST. - # Note: this might change when (broken link) is implemented. 
st_geogfromtext('point(0 0)', oriented => false, planar => true), st_geogfromtext('point(0 0)', planar => true), - st_geogfromtext('point(0 0)', make_valid => true, oriented => true) + st_geogfromtext('point(0 0)', make_valid => true, oriented => true), + st_geogfromwkb(b'0123456789', oriented => false), + st_geogfromwkb(b'0123456789', oriented => true), + st_geogfromwkb(b'0123456789', planar => true), + st_geogfromwkb(b'0123456789', make_valid => true) -- QueryStmt +-output_column_list= @@ -766,9 +768,13 @@ QueryStmt | +-$query.$col4#4 AS `$col4` [GEOGRAPHY] | +-$query.$col5#5 AS `$col5` [GEOGRAPHY] | +-$query.$col6#6 AS `$col6` [GEOGRAPHY] +| +-$query.$col7#7 AS `$col7` [GEOGRAPHY] +| +-$query.$col8#8 AS `$col8` [GEOGRAPHY] +| +-$query.$col9#9 AS `$col9` [GEOGRAPHY] +| +-$query.$col10#10 AS `$col10` [GEOGRAPHY] +-query= +-ProjectScan - +-column_list=$query.[$col1#1, $col2#2, $col3#3, $col4#4, $col5#5, $col6#6] + +-column_list=$query.[$col1#1, $col2#2, $col3#3, $col4#4, $col5#5, $col6#6, $col7#7, $col8#8, $col9#9, $col10#10] +-expr_list= | +-$col1#1 := | | +-FunctionCall(ZetaSQL:st_geogfromtext(STRING, optional(1) BOOL oriented, optional(1) BOOL planar, optional(1) BOOL make_valid) -> GEOGRAPHY) @@ -801,9 +807,33 @@ QueryStmt | | +-Literal(type=BOOL, value=true) | | +-Literal(type=BOOL, value=false) | +-$col6#6 := - | +-FunctionCall(ZetaSQL:st_geogfromtext(STRING, optional(1) BOOL oriented, optional(1) BOOL planar, optional(1) BOOL make_valid) -> GEOGRAPHY) - | +-Literal(type=STRING, value="point(0 0)") - | +-Literal(type=BOOL, value=true) + | | +-FunctionCall(ZetaSQL:st_geogfromtext(STRING, optional(1) BOOL oriented, optional(1) BOOL planar, optional(1) BOOL make_valid) -> GEOGRAPHY) + | | +-Literal(type=STRING, value="point(0 0)") + | | +-Literal(type=BOOL, value=true) + | | +-Literal(type=BOOL, value=false) + | | +-Literal(type=BOOL, value=true) + | +-$col7#7 := + | | +-FunctionCall(ZetaSQL:st_geogfromwkb(BYTES, optional(1) BOOL oriented, optional(1) BOOL 
planar, optional(1) BOOL make_valid) -> GEOGRAPHY) + | | +-Literal(type=BYTES, value=b"0123456789") + | | +-Literal(type=BOOL, value=false) + | | +-Literal(type=BOOL, value=false) + | | +-Literal(type=BOOL, value=false) + | +-$col8#8 := + | | +-FunctionCall(ZetaSQL:st_geogfromwkb(BYTES, optional(1) BOOL oriented, optional(1) BOOL planar, optional(1) BOOL make_valid) -> GEOGRAPHY) + | | +-Literal(type=BYTES, value=b"0123456789") + | | +-Literal(type=BOOL, value=true) + | | +-Literal(type=BOOL, value=false) + | | +-Literal(type=BOOL, value=false) + | +-$col9#9 := + | | +-FunctionCall(ZetaSQL:st_geogfromwkb(BYTES, optional(1) BOOL oriented, optional(1) BOOL planar, optional(1) BOOL make_valid) -> GEOGRAPHY) + | | +-Literal(type=BYTES, value=b"0123456789") + | | +-Literal(type=BOOL, value=false) + | | +-Literal(type=BOOL, value=true) + | | +-Literal(type=BOOL, value=false) + | +-$col10#10 := + | +-FunctionCall(ZetaSQL:st_geogfromwkb(BYTES, optional(1) BOOL oriented, optional(1) BOOL planar, optional(1) BOOL make_valid) -> GEOGRAPHY) + | +-Literal(type=BYTES, value=b"0123456789") + | +-Literal(type=BOOL, value=false) | +-Literal(type=BOOL, value=false) | +-Literal(type=BOOL, value=true) +-input_scan= diff --git a/zetasql/analyzer/testdata/group_by_all.test b/zetasql/analyzer/testdata/group_by_all.test index 365f81c6f..a87ad991d 100644 --- a/zetasql/analyzer/testdata/group_by_all.test +++ b/zetasql/analyzer/testdata/group_by_all.test @@ -490,6 +490,110 @@ QueryStmt +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) == +# Repro for b/323439034: When SELECT clause does not contain aggregate column +# and no grouping keys are chosen, it should produce AggregateScan with no +# aggregate_list and no group_by_list. 
+[language_features=V_1_4_GROUP_BY_ALL,V_1_4_GROUPING_SETS] +SELECT 'a' AS x +FROM KeyValue +GROUP BY {{ALL|()}} +-- +QueryStmt ++-output_column_list= +| +-$query.x#3 AS x [STRING] ++-query= + +-ProjectScan + +-column_list=[$query.x#3] + +-expr_list= + | +-x#3 := Literal(type=STRING, value="a") + +-input_scan= + +-AggregateScan + +-input_scan= + +-TableScan(table=KeyValue) +== + +# Repro for b/323567303: +# When SELECT clause does not contain aggregate column and no grouping keys are +# chosen, HAVING should not inject group_by_list. +[language_features=V_1_4_GROUP_BY_ALL,V_1_4_GROUPING_SETS] +SELECT 'a' AS x +FROM KeyValue +GROUP BY {{ALL|()}} +HAVING TRUE +-- +QueryStmt ++-output_column_list= +| +-$query.x#4 AS x [STRING] ++-query= + +-ProjectScan + +-column_list=[$query.x#4] + +-input_scan= + +-FilterScan + +-column_list=[$query.x#4] + +-input_scan= + | +-ProjectScan + | +-column_list=[$query.x#4] + | +-expr_list= + | | +-x#4 := Literal(type=STRING, value="a") + | +-input_scan= + | +-AggregateScan + | +-input_scan= + | +-ProjectScan + | +-column_list=[$pre_groupby.x#3] + | +-expr_list= + | | +-x#3 := Literal(type=STRING, value="a") + | +-input_scan= + | +-TableScan(table=KeyValue) + +-filter_expr= + +-Literal(type=BOOL, value=true) +== + +# Repro for b/323567303: +# When SELECT clause contains only constant computed exprs and no grouping keys +# are chosen, HAVING should not inject group_by_list. 
+[language_features=V_1_4_GROUP_BY_ALL,V_1_4_GROUPING_SETS] +SELECT (1 + 3) AS x, COUNT(*) +FROM KeyValue +GROUP BY ALL +HAVING TRUE +-- +QueryStmt ++-output_column_list= +| +-$query.x#5 AS x [INT64] +| +-$aggregate.$agg1#3 AS `$col2` [INT64] ++-query= + +-ProjectScan + +-column_list=[$query.x#5, $aggregate.$agg1#3] + +-input_scan= + +-FilterScan + +-column_list=[$aggregate.$agg1#3, $query.x#5] + +-input_scan= + | +-ProjectScan + | +-column_list=[$aggregate.$agg1#3, $query.x#5] + | +-expr_list= + | | +-x#5 := + | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | | +-Literal(type=INT64, value=1) + | | +-Literal(type=INT64, value=3) + | +-input_scan= + | +-AggregateScan + | +-column_list=[$aggregate.$agg1#3] + | +-input_scan= + | | +-ProjectScan + | | +-column_list=[$pre_groupby.x#4] + | | +-expr_list= + | | | +-x#4 := + | | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | | | +-Literal(type=INT64, value=1) + | | | +-Literal(type=INT64, value=3) + | | +-input_scan= + | | +-TableScan(table=KeyValue) + | +-aggregate_list= + | +-$agg1#3 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + +-filter_expr= + +-Literal(type=BOOL, value=true) +== + # SELECT clause only contains aggregate column. # GROUP BY ALL behaves as if no GROUP BY clause is specified. 
select sum(`int32`) @@ -1159,11 +1263,13 @@ QueryStmt | +-AnalyticScan | +-column_list=[$analytic.a#4] | +-input_scan= - | | +-FilterScan + | | +-AggregateScan | | +-input_scan= - | | | +-TableScan(table=KeyValue) - | | +-filter_expr= - | | +-Literal(type=BOOL, value=true) + | | +-FilterScan + | | +-input_scan= + | | | +-TableScan(table=KeyValue) + | | +-filter_expr= + | | +-Literal(type=BOOL, value=true) | +-function_group_list= | +-AnalyticFunctionGroup | +-analytic_function_list= @@ -1570,16 +1676,18 @@ QueryStmt +-column_list=[$distinct.$col1#3] +-input_scan= | +-ProjectScan - | +-column_list=[$subquery1.a#1, $query.$col1#2] + | +-column_list=[$query.$col1#2] | +-expr_list= | | +-$col1#2 := Literal(type=INT64, value=1) | +-input_scan= - | +-ProjectScan - | +-column_list=[$subquery1.a#1] - | +-expr_list= - | | +-a#1 := Literal(type=INT64, value=1) + | +-AggregateScan | +-input_scan= - | +-SingleRowScan + | +-ProjectScan + | +-column_list=[$subquery1.a#1] + | +-expr_list= + | | +-a#1 := Literal(type=INT64, value=1) + | +-input_scan= + | +-SingleRowScan +-group_by_list= +-$col1#3 := ColumnRef(type=INT64, column=$query.$col1#2) == @@ -2415,3 +2523,210 @@ QueryStmt +-ColumnRef(type=INT64, column=T.y#4) == +# Regression test for b/310705631 +[language_features=V_1_2_GROUP_BY_ARRAY,V_1_4_GROUP_BY_ALL] +WITH Table5 AS (SELECT [1, 2, 3] AS arr_col) +SELECT + ( SELECT e FROM t5.arr_col AS e ) AS scalar_subq_col, + t5.arr_col +FROM Table5 AS t5 +GROUP BY {{1,2|ALL}} +-- +ALTERNATION GROUP: 1,2 +-- +QueryStmt ++-output_column_list= +| +-$groupby.scalar_subq_col#4 AS scalar_subq_col [INT64] +| +-$groupby.arr_col#5 AS arr_col [ARRAY] ++-query= + +-WithScan + +-column_list=$groupby.[scalar_subq_col#4, arr_col#5] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="Table5" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[Table5.arr_col#1] + | +-expr_list= + | | +-arr_col#1 := Literal(type=ARRAY, value=[1, 2, 3]) + | +-input_scan= + | +-SingleRowScan + +-query= 
+ +-ProjectScan + +-column_list=$groupby.[scalar_subq_col#4, arr_col#5] + +-input_scan= + +-AggregateScan + +-column_list=$groupby.[scalar_subq_col#4, arr_col#5] + +-input_scan= + | +-WithRefScan(column_list=[Table5.arr_col#2], with_query_name="Table5") + +-group_by_list= + +-scalar_subq_col#4 := + | +-SubqueryExpr + | +-type=INT64 + | +-subquery_type=SCALAR + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=Table5.arr_col#2) + | +-subquery= + | +-ProjectScan + | +-column_list=[$array.e#3] + | +-input_scan= + | +-ArrayScan + | +-column_list=[$array.e#3] + | +-array_expr_list= + | | +-ColumnRef(type=ARRAY, column=Table5.arr_col#2, is_correlated=TRUE) + | +-element_column_list=[$array.e#3] + +-arr_col#5 := ColumnRef(type=ARRAY, column=Table5.arr_col#2) +-- +ALTERNATION GROUP: ALL +-- +QueryStmt ++-output_column_list= +| +-$query.scalar_subq_col#6 AS scalar_subq_col [INT64] +| +-$groupby.arr_col#4 AS arr_col [ARRAY] ++-query= + +-WithScan + +-column_list=[$query.scalar_subq_col#6, $groupby.arr_col#4] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="Table5" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[Table5.arr_col#1] + | +-expr_list= + | | +-arr_col#1 := Literal(type=ARRAY, value=[1, 2, 3]) + | +-input_scan= + | +-SingleRowScan + +-query= + +-ProjectScan + +-column_list=[$query.scalar_subq_col#6, $groupby.arr_col#4] + +-expr_list= + | +-scalar_subq_col#6 := + | +-SubqueryExpr + | +-type=INT64 + | +-subquery_type=SCALAR + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$groupby.arr_col#4) + | +-subquery= + | +-ProjectScan + | +-column_list=[$array.e#5] + | +-input_scan= + | +-ArrayScan + | +-column_list=[$array.e#5] + | +-array_expr_list= + | | +-ColumnRef(type=ARRAY, column=$groupby.arr_col#4, is_correlated=TRUE) + | +-element_column_list=[$array.e#5] + +-input_scan= + +-AggregateScan + +-column_list=[$groupby.arr_col#4] + +-input_scan= + | +-WithRefScan(column_list=[Table5.arr_col#2], with_query_name="Table5") + 
+-group_by_list= + +-arr_col#4 := ColumnRef(type=ARRAY, column=Table5.arr_col#2) +== + +# Regression test for b/310705631 +[language_features=V_1_2_GROUP_BY_ARRAY,V_1_4_GROUP_BY_ALL] +WITH Table5 AS (SELECT [1, 2, 3] AS arr_col) +SELECT + ( SELECT t5.arr_col ) AS scalar_subq_col, + t5.arr_col +FROM Table5 AS t5 +GROUP BY 2 +-- +QueryStmt ++-output_column_list= +| +-$query.scalar_subq_col#6 AS scalar_subq_col [ARRAY] +| +-$groupby.arr_col#4 AS arr_col [ARRAY] ++-query= + +-WithScan + +-column_list=[$query.scalar_subq_col#6, $groupby.arr_col#4] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="Table5" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[Table5.arr_col#1] + | +-expr_list= + | | +-arr_col#1 := Literal(type=ARRAY, value=[1, 2, 3]) + | +-input_scan= + | +-SingleRowScan + +-query= + +-ProjectScan + +-column_list=[$query.scalar_subq_col#6, $groupby.arr_col#4] + +-expr_list= + | +-scalar_subq_col#6 := + | +-SubqueryExpr + | +-type=ARRAY + | +-subquery_type=SCALAR + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$groupby.arr_col#4) + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.arr_col#5] + | +-expr_list= + | | +-arr_col#5 := ColumnRef(type=ARRAY, column=$groupby.arr_col#4, is_correlated=TRUE) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-AggregateScan + +-column_list=[$groupby.arr_col#4] + +-input_scan= + | +-WithRefScan(column_list=[Table5.arr_col#2], with_query_name="Table5") + +-group_by_list= + +-arr_col#4 := ColumnRef(type=ARRAY, column=Table5.arr_col#2) +== + +# Regression test for b/310705631 +[language_features=V_1_2_GROUP_BY_ARRAY,V_1_4_GROUP_BY_ALL] +WITH Table5 AS (SELECT [1, 2, 3] AS arr_col) +SELECT + ( SELECT e FROM UNNEST(t5.arr_col) AS e ) AS scalar_subq_col, + t5.arr_col +FROM Table5 AS t5 +GROUP BY 2 +-- +QueryStmt ++-output_column_list= +| +-$query.scalar_subq_col#6 AS scalar_subq_col [INT64] +| +-$groupby.arr_col#4 AS arr_col [ARRAY] ++-query= + +-WithScan + 
+-column_list=[$query.scalar_subq_col#6, $groupby.arr_col#4] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="Table5" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[Table5.arr_col#1] + | +-expr_list= + | | +-arr_col#1 := Literal(type=ARRAY, value=[1, 2, 3]) + | +-input_scan= + | +-SingleRowScan + +-query= + +-ProjectScan + +-column_list=[$query.scalar_subq_col#6, $groupby.arr_col#4] + +-expr_list= + | +-scalar_subq_col#6 := + | +-SubqueryExpr + | +-type=INT64 + | +-subquery_type=SCALAR + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$groupby.arr_col#4) + | +-subquery= + | +-ProjectScan + | +-column_list=[$array.e#5] + | +-input_scan= + | +-ArrayScan + | +-column_list=[$array.e#5] + | +-array_expr_list= + | | +-ColumnRef(type=ARRAY, column=$groupby.arr_col#4, is_correlated=TRUE) + | +-element_column_list=[$array.e#5] + +-input_scan= + +-AggregateScan + +-column_list=[$groupby.arr_col#4] + +-input_scan= + | +-WithRefScan(column_list=[Table5.arr_col#2], with_query_name="Table5") + +-group_by_list= + +-arr_col#4 := ColumnRef(type=ARRAY, column=Table5.arr_col#2) +== + diff --git a/zetasql/analyzer/testdata/hints.test b/zetasql/analyzer/testdata/hints.test index 7c4db7a5d..1ea6e0f6a 100644 --- a/zetasql/analyzer/testdata/hints.test +++ b/zetasql/analyzer/testdata/hints.test @@ -765,9 +765,10 @@ QueryStmt | | | +-right_scan= | | | | +-TableScan(column_list=[KeyValue.Key#3], table=KeyValue, column_index_list=[0], alias="kv2") | | | +-join_expr= - | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) - | | | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | | | | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | | | +-has_using=TRUE | | +-right_scan= | | | +-TableScan(table=KeyValue, alias="kv3") | | +-join_expr= diff --git 
a/zetasql/analyzer/testdata/interval.test b/zetasql/analyzer/testdata/interval.test index 9e6b1cf68..a3171848e 100644 --- a/zetasql/analyzer/testdata/interval.test +++ b/zetasql/analyzer/testdata/interval.test @@ -1,5 +1,6 @@ # TODO: Figure out how to make all tests work in Java. [default no_java] +[default also_show_signature_mismatch_details] [language_features={{|INTERVAL_TYPE}}] SELECT INTERVAL "1" YEAR @@ -285,6 +286,54 @@ SELECT INTERVAL '17:23.123456789' MINUTE TO SECOND ^ == +# MILLISECOND date part +[language_features={{|INTERVAL_TYPE}}] +SELECT INTERVAL '7684416' MILLISECOND +-- +ALTERNATION GROUP: +-- +ERROR: Unexpected INTERVAL expression [at 1:8] +SELECT INTERVAL '7684416' MILLISECOND + ^ +-- +ALTERNATION GROUP: INTERVAL_TYPE +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [INTERVAL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=INTERVAL, value=0-0 0 2:8:4.416) + +-input_scan= + +-SingleRowScan +== + +# MICROSECOND date part +[language_features={{|INTERVAL_TYPE}}] +SELECT INTERVAL '7684416539' MICROSECOND +-- +ALTERNATION GROUP: +-- +ERROR: Unexpected INTERVAL expression [at 1:8] +SELECT INTERVAL '7684416539' MICROSECOND + ^ +-- +ALTERNATION GROUP: INTERVAL_TYPE +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [INTERVAL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=INTERVAL, value=0-0 0 2:8:4.416539) + +-input_scan= + +-SingleRowScan +== + [language_features={{|INTERVAL_TYPE}}] select interval 1 year -- @@ -403,6 +452,22 @@ SELECT INTERVAL -'2' HOUR ALTERNATION GROUP: INTERVAL_TYPE -- ERROR: No matching signature for operator - for argument types: STRING. 
Supported signatures: -(INT32); -(INT64); -(FLOAT); -(DOUBLE); -(INTERVAL) [at 1:17] +SELECT INTERVAL -'2' HOUR + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for operator - + Argument types: STRING + Signature: -(INT32) + Argument 1: Unable to coerce type STRING to expected type INT32 + Signature: -(INT64) + Argument 1: Unable to coerce type STRING to expected type INT64 + Signature: -(FLOAT) + Argument 1: Unable to coerce type STRING to expected type FLOAT + Signature: -(DOUBLE) + Argument 1: Unable to coerce type STRING to expected type DOUBLE + Signature: -(INTERVAL) + Argument 1: Unable to coerce type STRING to expected type INTERVAL [at 1:17] SELECT INTERVAL -'2' HOUR ^ == @@ -464,6 +529,18 @@ ERROR: No matching signature for operator - for argument types: DATE, DATE. Supp SELECT CURRENT_DATE - DATE '2000-01-01' ^ -- +Signature Mismatch Details: +ERROR: No matching signature for operator - + Argument types: DATE, DATE + Signature: INT64 - INT64 + Argument 1: Unable to coerce type DATE to expected type INT64 + Signature: UINT64 - UINT64 + Argument 1: Unable to coerce type DATE to expected type UINT64 + Signature: DOUBLE - DOUBLE + Argument 1: Unable to coerce type DATE to expected type DOUBLE [at 1:8] +SELECT CURRENT_DATE - DATE '2000-01-01' + ^ +-- ALTERNATION GROUP: INTERVAL_TYPE -- QueryStmt @@ -558,6 +635,26 @@ QueryStmt SELECT CURRENT_TIMESTAMP - CURRENT_DATE -- ERROR: No matching signature for operator - for argument types: TIMESTAMP, DATE. 
Supported signatures: INT64 - INT64; UINT64 - UINT64; DOUBLE - DOUBLE; DATE - DATE; TIMESTAMP - TIMESTAMP; TIMESTAMP - INTERVAL; INTERVAL - INTERVAL [at 1:8] +SELECT CURRENT_TIMESTAMP - CURRENT_DATE + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for operator - + Argument types: TIMESTAMP, DATE + Signature: INT64 - INT64 + Argument 1: Unable to coerce type TIMESTAMP to expected type INT64 + Signature: UINT64 - UINT64 + Argument 1: Unable to coerce type TIMESTAMP to expected type UINT64 + Signature: DOUBLE - DOUBLE + Argument 1: Unable to coerce type TIMESTAMP to expected type DOUBLE + Signature: DATE - DATE + Argument 1: Unable to coerce type TIMESTAMP to expected type DATE + Signature: TIMESTAMP - TIMESTAMP + Argument 2: Unable to coerce type DATE to expected type TIMESTAMP + Signature: TIMESTAMP - INTERVAL + Argument 2: Unable to coerce type DATE to expected type INTERVAL + Signature: INTERVAL - INTERVAL + Argument 1: Unable to coerce type TIMESTAMP to expected type INTERVAL [at 1:8] SELECT CURRENT_TIMESTAMP - CURRENT_DATE ^ == @@ -596,6 +693,26 @@ QueryStmt SELECT INTERVAL 1 YEAR - CURRENT_TIMESTAMP -- ERROR: No matching signature for operator - for argument types: INTERVAL, TIMESTAMP. 
Supported signatures: INT64 - INT64; UINT64 - UINT64; DOUBLE - DOUBLE; DATE - DATE; TIMESTAMP - TIMESTAMP; TIMESTAMP - INTERVAL; INTERVAL - INTERVAL [at 1:8] +SELECT INTERVAL 1 YEAR - CURRENT_TIMESTAMP + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for operator - + Argument types: INTERVAL, TIMESTAMP + Signature: INT64 - INT64 + Argument 1: Unable to coerce type INTERVAL to expected type INT64 + Signature: UINT64 - UINT64 + Argument 1: Unable to coerce type INTERVAL to expected type UINT64 + Signature: DOUBLE - DOUBLE + Argument 1: Unable to coerce type INTERVAL to expected type DOUBLE + Signature: DATE - DATE + Argument 1: Unable to coerce type INTERVAL to expected type DATE + Signature: TIMESTAMP - TIMESTAMP + Argument 1: Unable to coerce type INTERVAL to expected type TIMESTAMP + Signature: TIMESTAMP - INTERVAL + Argument 1: Unable to coerce type INTERVAL to expected type TIMESTAMP + Signature: INTERVAL - INTERVAL + Argument 2: Unable to coerce type TIMESTAMP to expected type INTERVAL [at 1:8] SELECT INTERVAL 1 YEAR - CURRENT_TIMESTAMP ^ == @@ -611,6 +728,24 @@ ERROR: No matching signature for operator + for argument types: DATE, INTERVAL. 
SELECT CURRENT_DATE + INTERVAL '1-2 3 4:5:6.789' YEAR TO SECOND, ^ -- +Signature Mismatch Details: +ERROR: No matching signature for operator + + Argument types: DATE, INTERVAL + Signature: INT64 + INT64 + Argument 1: Unable to coerce type DATE to expected type INT64 + Signature: UINT64 + UINT64 + Argument 1: Unable to coerce type DATE to expected type UINT64 + Signature: DOUBLE + DOUBLE + Argument 1: Unable to coerce type DATE to expected type DOUBLE + Signature: TIMESTAMP + INTERVAL + Argument 1: Unable to coerce type DATE to expected type TIMESTAMP + Signature: INTERVAL + TIMESTAMP + Argument 1: Unable to coerce type DATE to expected type INTERVAL + Signature: INTERVAL + INTERVAL + Argument 1: Unable to coerce type DATE to expected type INTERVAL [at 1:8] +SELECT CURRENT_DATE + INTERVAL '1-2 3 4:5:6.789' YEAR TO SECOND, + ^ +-- ALTERNATION GROUP: ,V_1_2_CIVIL_TIME -- QueryStmt @@ -642,6 +777,34 @@ QueryStmt SELECT INTERVAL 1 YEAR - CURRENT_DATE -- ERROR: No matching signature for operator - for argument types: INTERVAL, DATE. 
Supported signatures: INT64 - INT64; UINT64 - UINT64; DOUBLE - DOUBLE; DATE - DATE; TIMESTAMP - TIMESTAMP; DATETIME - DATETIME; TIME - TIME; TIMESTAMP - INTERVAL; DATE - INTERVAL; DATETIME - INTERVAL; INTERVAL - INTERVAL [at 1:8] +SELECT INTERVAL 1 YEAR - CURRENT_DATE + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for operator - + Argument types: INTERVAL, DATE + Signature: INT64 - INT64 + Argument 1: Unable to coerce type INTERVAL to expected type INT64 + Signature: UINT64 - UINT64 + Argument 1: Unable to coerce type INTERVAL to expected type UINT64 + Signature: DOUBLE - DOUBLE + Argument 1: Unable to coerce type INTERVAL to expected type DOUBLE + Signature: DATE - DATE + Argument 1: Unable to coerce type INTERVAL to expected type DATE + Signature: TIMESTAMP - TIMESTAMP + Argument 1: Unable to coerce type INTERVAL to expected type TIMESTAMP + Signature: DATETIME - DATETIME + Argument 1: Unable to coerce type INTERVAL to expected type DATETIME + Signature: TIME - TIME + Argument 1: Unable to coerce type INTERVAL to expected type TIME + Signature: TIMESTAMP - INTERVAL + Argument 1: Unable to coerce type INTERVAL to expected type TIMESTAMP + Signature: DATE - INTERVAL + Argument 1: Unable to coerce type INTERVAL to expected type DATE + Signature: DATETIME - INTERVAL + Argument 1: Unable to coerce type INTERVAL to expected type DATETIME + Signature: INTERVAL - INTERVAL + Argument 2: Unable to coerce type DATE to expected type INTERVAL [at 1:8] SELECT INTERVAL 1 YEAR - CURRENT_DATE ^ == @@ -680,6 +843,34 @@ QueryStmt SELECT INTERVAL 1 YEAR - CURRENT_DATETIME -- ERROR: No matching signature for operator - for argument types: INTERVAL, DATETIME. 
Supported signatures: INT64 - INT64; UINT64 - UINT64; DOUBLE - DOUBLE; DATE - DATE; TIMESTAMP - TIMESTAMP; DATETIME - DATETIME; TIME - TIME; TIMESTAMP - INTERVAL; DATE - INTERVAL; DATETIME - INTERVAL; INTERVAL - INTERVAL [at 1:8] +SELECT INTERVAL 1 YEAR - CURRENT_DATETIME + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for operator - + Argument types: INTERVAL, DATETIME + Signature: INT64 - INT64 + Argument 1: Unable to coerce type INTERVAL to expected type INT64 + Signature: UINT64 - UINT64 + Argument 1: Unable to coerce type INTERVAL to expected type UINT64 + Signature: DOUBLE - DOUBLE + Argument 1: Unable to coerce type INTERVAL to expected type DOUBLE + Signature: DATE - DATE + Argument 1: Unable to coerce type INTERVAL to expected type DATE + Signature: TIMESTAMP - TIMESTAMP + Argument 1: Unable to coerce type INTERVAL to expected type TIMESTAMP + Signature: DATETIME - DATETIME + Argument 1: Unable to coerce type INTERVAL to expected type DATETIME + Signature: TIME - TIME + Argument 1: Unable to coerce type INTERVAL to expected type TIME + Signature: TIMESTAMP - INTERVAL + Argument 1: Unable to coerce type INTERVAL to expected type TIMESTAMP + Signature: DATE - INTERVAL + Argument 1: Unable to coerce type INTERVAL to expected type DATE + Signature: DATETIME - INTERVAL + Argument 1: Unable to coerce type INTERVAL to expected type DATETIME + Signature: INTERVAL - INTERVAL + Argument 2: Unable to coerce type DATETIME to expected type INTERVAL [at 1:8] SELECT INTERVAL 1 YEAR - CURRENT_DATETIME ^ == @@ -752,6 +943,16 @@ QueryStmt select 10 / INTERVAL '10' YEAR -- ERROR: No matching signature for operator / for argument types: INT64, INTERVAL. 
Supported signatures: DOUBLE / DOUBLE; INTERVAL / INT64 [at 1:8] +select 10 / INTERVAL '10' YEAR + ^ +-- +Signature Mismatch Details: +ERROR: No matching signature for operator / + Argument types: INT64, INTERVAL + Signature: DOUBLE / DOUBLE + Argument 2: Unable to coerce type INTERVAL to expected type DOUBLE + Signature: INTERVAL / INT64 + Argument 1: Unable to coerce type INT64 to expected type INTERVAL [at 1:8] select 10 / INTERVAL '10' YEAR ^ == @@ -960,6 +1161,13 @@ ERROR: No matching signature for function EXTRACT for argument types: DATE FROM SELECT EXTRACT(DATE FROM INTERVAL '0' YEAR) ^ -- +Signature Mismatch Details: +ERROR: No matching signature for function EXTRACT for argument types: DATE FROM INTERVAL + Signature: EXTRACT(DATE FROM TIMESTAMP [AT TIME ZONE STRING]) + Argument 1: Unable to coerce type INTERVAL to expected type TIMESTAMP [at 1:8] +SELECT EXTRACT(DATE FROM INTERVAL '0' YEAR) + ^ +-- ALTERNATION GROUP: QUARTER -- QueryStmt diff --git a/zetasql/analyzer/testdata/join.test b/zetasql/analyzer/testdata/join.test index 29d97000f..3d79f7aea 100644 --- a/zetasql/analyzer/testdata/join.test +++ b/zetasql/analyzer/testdata/join.test @@ -184,9 +184,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=[KeyValue.Key#3], table=KeyValue, column_index_list=[0], alias="kv2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=KeyValue.Key#1) - +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + +-has_using=TRUE == select 1 from KeyValue kv1 natural join KeyValue kv2 using (key); @@ -430,9 +431,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=KeyValue2.[Key#3, Value2#4], table=KeyValue2, column_index_list=[0, 1], alias="k2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=KeyValue.Key#1) - 
+-ColumnRef(type=INT64, column=KeyValue2.Key#3) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | +-ColumnRef(type=INT64, column=KeyValue2.Key#3) + +-has_using=TRUE == SELECT * @@ -464,9 +466,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=KeyValue2.[Key#3, Value2#4], table=KeyValue2, column_index_list=[0, 1], alias="k2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=KeyValue.Key#1) - +-ColumnRef(type=INT64, column=KeyValue2.Key#3) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | +-ColumnRef(type=INT64, column=KeyValue2.Key#3) + +-has_using=TRUE == # Test various combinations of join types, mixing comma and JOIN. @@ -668,9 +671,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=KeyValueNested.[Key#3, Value#4], table=nested_catalog.KeyValueNested, column_index_list=[0, 1]) +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=KeyValue.Key#1) - +-ColumnRef(type=INT64, column=KeyValueNested.Key#3) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | +-ColumnRef(type=INT64, column=KeyValueNested.Key#3) + +-has_using=TRUE == [no_run_unparser] @@ -701,18 +705,21 @@ QueryStmt | +-ArrayTypes.ProtoArray#15 AS ProtoArray [ARRAY>] | +-ArrayTypes.StructArray#16 AS StructArray [ARRAY>] | +-ArrayTypes.JsonArray#17 AS JsonArray [ARRAY] -| +-Int32Array.Int32Array#18 AS Int32Array [ARRAY] +| +-ArrayTypes.NumericArray#18 AS NumericArray [ARRAY] +| +-ArrayTypes.BigNumericArray#19 AS BigNumericArray [ARRAY] +| +-ArrayTypes.IntervalArray#20 AS IntervalArray [ARRAY] +| +-Int32Array.Int32Array#21 AS Int32Array [ARRAY] +-query= +-ProjectScan - +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.Int64Array#2, ArrayTypes.UInt32Array#3, ArrayTypes.UInt64Array#4, ArrayTypes.StringArray#5, 
ArrayTypes.BytesArray#6, ArrayTypes.BoolArray#7, ArrayTypes.FloatArray#8, ArrayTypes.DoubleArray#9, ArrayTypes.DateArray#10, ArrayTypes.TimestampSecondsArray#11, ArrayTypes.TimestampMillisArray#12, ArrayTypes.TimestampMicrosArray#13, ArrayTypes.TimestampArray#14, ArrayTypes.ProtoArray#15, ArrayTypes.StructArray#16, ArrayTypes.JsonArray#17, Int32Array.Int32Array#18] + +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.Int64Array#2, ArrayTypes.UInt32Array#3, ArrayTypes.UInt64Array#4, ArrayTypes.StringArray#5, ArrayTypes.BytesArray#6, ArrayTypes.BoolArray#7, ArrayTypes.FloatArray#8, ArrayTypes.DoubleArray#9, ArrayTypes.DateArray#10, ArrayTypes.TimestampSecondsArray#11, ArrayTypes.TimestampMillisArray#12, ArrayTypes.TimestampMicrosArray#13, ArrayTypes.TimestampArray#14, ArrayTypes.ProtoArray#15, ArrayTypes.StructArray#16, ArrayTypes.JsonArray#17, ArrayTypes.NumericArray#18, ArrayTypes.BigNumericArray#19, ArrayTypes.IntervalArray#20, Int32Array.Int32Array#21] +-input_scan= +-JoinScan - +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.Int64Array#2, ArrayTypes.UInt32Array#3, ArrayTypes.UInt64Array#4, ArrayTypes.StringArray#5, ArrayTypes.BytesArray#6, ArrayTypes.BoolArray#7, ArrayTypes.FloatArray#8, ArrayTypes.DoubleArray#9, ArrayTypes.DateArray#10, ArrayTypes.TimestampSecondsArray#11, ArrayTypes.TimestampMillisArray#12, ArrayTypes.TimestampMicrosArray#13, ArrayTypes.TimestampArray#14, ArrayTypes.ProtoArray#15, ArrayTypes.StructArray#16, ArrayTypes.JsonArray#17, Int32Array.Int32Array#18] + +-column_list=[ArrayTypes.Int32Array#1, ArrayTypes.Int64Array#2, ArrayTypes.UInt32Array#3, ArrayTypes.UInt64Array#4, ArrayTypes.StringArray#5, ArrayTypes.BytesArray#6, ArrayTypes.BoolArray#7, ArrayTypes.FloatArray#8, ArrayTypes.DoubleArray#9, ArrayTypes.DateArray#10, ArrayTypes.TimestampSecondsArray#11, ArrayTypes.TimestampMillisArray#12, ArrayTypes.TimestampMicrosArray#13, ArrayTypes.TimestampArray#14, ArrayTypes.ProtoArray#15, ArrayTypes.StructArray#16, ArrayTypes.JsonArray#17, 
ArrayTypes.NumericArray#18, ArrayTypes.BigNumericArray#19, ArrayTypes.IntervalArray#20, Int32Array.Int32Array#21] +-join_type=RIGHT +-left_scan= - | +-TableScan(column_list=ArrayTypes.[Int32Array#1, Int64Array#2, UInt32Array#3, UInt64Array#4, StringArray#5, BytesArray#6, BoolArray#7, FloatArray#8, DoubleArray#9, DateArray#10, TimestampSecondsArray#11, TimestampMillisArray#12, TimestampMicrosArray#13, TimestampArray#14, ProtoArray#15, StructArray#16, JsonArray#17], table=ArrayTypes, column_index_list=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]) + | +-TableScan(column_list=ArrayTypes.[Int32Array#1, Int64Array#2, UInt32Array#3, UInt64Array#4, StringArray#5, BytesArray#6, BoolArray#7, FloatArray#8, DoubleArray#9, DateArray#10, TimestampSecondsArray#11, TimestampMillisArray#12, TimestampMicrosArray#13, TimestampArray#14, ProtoArray#15, StructArray#16, JsonArray#17, NumericArray#18, BigNumericArray#19, IntervalArray#20], table=ArrayTypes, column_index_list=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]) +-right_scan= - | +-TableScan(column_list=[Int32Array.Int32Array#18], table=ArrayTableOrCatalog.Int32Array, column_index_list=[0]) + | +-TableScan(column_list=[Int32Array.Int32Array#21], table=ArrayTableOrCatalog.Int32Array, column_index_list=[0]) +-join_expr= +-Literal(type=BOOL, value=true) == @@ -745,17 +752,20 @@ QueryStmt | +-ArrayTypes.ProtoArray#16 AS ProtoArray [ARRAY>] | +-ArrayTypes.StructArray#17 AS StructArray [ARRAY>] | +-ArrayTypes.JsonArray#18 AS JsonArray [ARRAY] +| +-ArrayTypes.NumericArray#19 AS NumericArray [ARRAY] +| +-ArrayTypes.BigNumericArray#20 AS BigNumericArray [ARRAY] +| +-ArrayTypes.IntervalArray#21 AS IntervalArray [ARRAY] +-query= +-ProjectScan - +-column_list=[Int32Array.Int32Array#1, ArrayTypes.Int32Array#2, ArrayTypes.Int64Array#3, ArrayTypes.UInt32Array#4, ArrayTypes.UInt64Array#5, ArrayTypes.StringArray#6, ArrayTypes.BytesArray#7, ArrayTypes.BoolArray#8, ArrayTypes.FloatArray#9, 
ArrayTypes.DoubleArray#10, ArrayTypes.DateArray#11, ArrayTypes.TimestampSecondsArray#12, ArrayTypes.TimestampMillisArray#13, ArrayTypes.TimestampMicrosArray#14, ArrayTypes.TimestampArray#15, ArrayTypes.ProtoArray#16, ArrayTypes.StructArray#17, ArrayTypes.JsonArray#18] + +-column_list=[Int32Array.Int32Array#1, ArrayTypes.Int32Array#2, ArrayTypes.Int64Array#3, ArrayTypes.UInt32Array#4, ArrayTypes.UInt64Array#5, ArrayTypes.StringArray#6, ArrayTypes.BytesArray#7, ArrayTypes.BoolArray#8, ArrayTypes.FloatArray#9, ArrayTypes.DoubleArray#10, ArrayTypes.DateArray#11, ArrayTypes.TimestampSecondsArray#12, ArrayTypes.TimestampMillisArray#13, ArrayTypes.TimestampMicrosArray#14, ArrayTypes.TimestampArray#15, ArrayTypes.ProtoArray#16, ArrayTypes.StructArray#17, ArrayTypes.JsonArray#18, ArrayTypes.NumericArray#19, ArrayTypes.BigNumericArray#20, ArrayTypes.IntervalArray#21] +-input_scan= +-JoinScan - +-column_list=[Int32Array.Int32Array#1, ArrayTypes.Int32Array#2, ArrayTypes.Int64Array#3, ArrayTypes.UInt32Array#4, ArrayTypes.UInt64Array#5, ArrayTypes.StringArray#6, ArrayTypes.BytesArray#7, ArrayTypes.BoolArray#8, ArrayTypes.FloatArray#9, ArrayTypes.DoubleArray#10, ArrayTypes.DateArray#11, ArrayTypes.TimestampSecondsArray#12, ArrayTypes.TimestampMillisArray#13, ArrayTypes.TimestampMicrosArray#14, ArrayTypes.TimestampArray#15, ArrayTypes.ProtoArray#16, ArrayTypes.StructArray#17, ArrayTypes.JsonArray#18] + +-column_list=[Int32Array.Int32Array#1, ArrayTypes.Int32Array#2, ArrayTypes.Int64Array#3, ArrayTypes.UInt32Array#4, ArrayTypes.UInt64Array#5, ArrayTypes.StringArray#6, ArrayTypes.BytesArray#7, ArrayTypes.BoolArray#8, ArrayTypes.FloatArray#9, ArrayTypes.DoubleArray#10, ArrayTypes.DateArray#11, ArrayTypes.TimestampSecondsArray#12, ArrayTypes.TimestampMillisArray#13, ArrayTypes.TimestampMicrosArray#14, ArrayTypes.TimestampArray#15, ArrayTypes.ProtoArray#16, ArrayTypes.StructArray#17, ArrayTypes.JsonArray#18, ArrayTypes.NumericArray#19, ArrayTypes.BigNumericArray#20, 
ArrayTypes.IntervalArray#21] +-join_type=RIGHT +-left_scan= | +-TableScan(column_list=[Int32Array.Int32Array#1], table=ArrayTableOrCatalog.Int32Array, column_index_list=[0]) +-right_scan= - | +-TableScan(column_list=ArrayTypes.[Int32Array#2, Int64Array#3, UInt32Array#4, UInt64Array#5, StringArray#6, BytesArray#7, BoolArray#8, FloatArray#9, DoubleArray#10, DateArray#11, TimestampSecondsArray#12, TimestampMillisArray#13, TimestampMicrosArray#14, TimestampArray#15, ProtoArray#16, StructArray#17, JsonArray#18], table=ArrayTypes, column_index_list=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]) + | +-TableScan(column_list=ArrayTypes.[Int32Array#2, Int64Array#3, UInt32Array#4, UInt64Array#5, StringArray#6, BytesArray#7, BoolArray#8, FloatArray#9, DoubleArray#10, DateArray#11, TimestampSecondsArray#12, TimestampMillisArray#13, TimestampMicrosArray#14, TimestampArray#15, ProtoArray#16, StructArray#17, JsonArray#18, NumericArray#19, BigNumericArray#20, IntervalArray#21], table=ArrayTypes, column_index_list=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]) +-join_expr= +-Literal(type=BOOL, value=true) == diff --git a/zetasql/analyzer/testdata/join_parenthesized.test b/zetasql/analyzer/testdata/join_parenthesized.test index 64b11642a..68dbf9a89 100644 --- a/zetasql/analyzer/testdata/join_parenthesized.test +++ b/zetasql/analyzer/testdata/join_parenthesized.test @@ -133,9 +133,10 @@ QueryStmt | | +-right_scan= | | | +-TableScan(column_list=KeyValue.[Key#3, Value#4], table=KeyValue, column_index_list=[0, 1], alias="kv2") | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) - | | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | | | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | | +-has_using=TRUE | +-right_scan= | | +-JoinScan | | 
+-column_list=KeyValue.[Key#5, Value#6, Key#7, Value#8] @@ -145,13 +146,15 @@ QueryStmt | | +-right_scan= | | | +-TableScan(column_list=KeyValue.[Key#7, Value#8], table=KeyValue, column_index_list=[0, 1], alias="kv4") | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=KeyValue.Key#5) - | | +-ColumnRef(type=INT64, column=KeyValue.Key#7) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=KeyValue.Key#5) + | | | +-ColumnRef(type=INT64, column=KeyValue.Key#7) + | | +-has_using=TRUE | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=KeyValue.Key#3) - | +-ColumnRef(type=INT64, column=KeyValue.Key#5) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#5) + | +-has_using=TRUE +-filter_expr= +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) @@ -188,15 +191,17 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=KeyValue2.[Key#3, Value2#4], table=KeyValue2, column_index_list=[0, 1]) | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=KeyValue.Key#1) - | +-ColumnRef(type=INT64, column=KeyValue2.Key#3) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | | +-ColumnRef(type=INT64, column=KeyValue2.Key#3) + | +-has_using=TRUE +-right_scan= | +-TableScan(column_list=KeyValue.[Key#5, Value#6], table=KeyValue, column_index_list=[0, 1], alias="kv3") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=KeyValue.Key#1) - +-ColumnRef(type=INT64, column=KeyValue.Key#5) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | 
+-ColumnRef(type=INT64, column=KeyValue.Key#5) + +-has_using=TRUE == select 1 @@ -343,9 +348,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=[KeyValue.Value#7], table=KeyValue, column_index_list=[1], alias="C") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, column=KeyValue.Value#5) - +-ColumnRef(type=STRING, column=KeyValue.Value#7) + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, column=KeyValue.Value#5) + | +-ColumnRef(type=STRING, column=KeyValue.Value#7) + +-has_using=TRUE == # Fix the previous query using parentheses. @@ -374,6 +380,7 @@ QueryStmt +-right_scan= | +-TableScan(column_list=[KeyValue.Key#6], table=KeyValue, column_index_list=[0], alias="C") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=KeyValue.Key#4) - +-ColumnRef(type=INT64, column=KeyValue.Key#6) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=KeyValue.Key#4) + | +-ColumnRef(type=INT64, column=KeyValue.Key#6) + +-has_using=TRUE diff --git a/zetasql/analyzer/testdata/join_using.test b/zetasql/analyzer/testdata/join_using.test index 4dbd65d89..d502b5bea 100644 --- a/zetasql/analyzer/testdata/join_using.test +++ b/zetasql/analyzer/testdata/join_using.test @@ -17,9 +17,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=KeyValue.[Key#3, Value#4], table=KeyValue, column_index_list=[0, 1], alias="kv2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, column=KeyValue.Value#2) - +-ColumnRef(type=STRING, column=KeyValue.Value#4) + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, column=KeyValue.Value#2) + | +-ColumnRef(type=STRING, column=KeyValue.Value#4) + +-has_using=TRUE == select * @@ -42,9 +43,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=KeyValue.[Key#3, Value#4], table=KeyValue, column_index_list=[0, 1], 
alias="kv2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, column=KeyValue.Value#2) - +-ColumnRef(type=STRING, column=KeyValue.Value#4) + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, column=KeyValue.Value#2) + | +-ColumnRef(type=STRING, column=KeyValue.Value#4) + +-has_using=TRUE == # Things to note, and compare to other versions: @@ -71,9 +73,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=KeyValue.[Key#3, Value#4], table=KeyValue, column_index_list=[0, 1], alias="kv2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, column=KeyValue.Value#2) - +-ColumnRef(type=STRING, column=KeyValue.Value#4) + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, column=KeyValue.Value#2) + | +-ColumnRef(type=STRING, column=KeyValue.Value#4) + +-has_using=TRUE == # FULL JOIN with USING returns a value column made from the @@ -106,9 +109,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=KeyValue.[Key#3, Value#4], table=KeyValue, column_index_list=[0, 1], alias="kv2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, column=KeyValue.Value#2) - +-ColumnRef(type=STRING, column=KeyValue.Value#4) + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, column=KeyValue.Value#2) + | +-ColumnRef(type=STRING, column=KeyValue.Value#4) + +-has_using=TRUE == select kv1.*, '---', kv2.* @@ -135,9 +139,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=KeyValue.[Key#3, Value#4], table=KeyValue, column_index_list=[0, 1], alias="kv2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, column=KeyValue.Value#2) - +-ColumnRef(type=STRING, column=KeyValue.Value#4) + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, column=KeyValue.Value#2) + | 
+-ColumnRef(type=STRING, column=KeyValue.Value#4) + +-has_using=TRUE == select key @@ -187,9 +192,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=KeyValue.[Key#3, Value#4], table=KeyValue, column_index_list=[0, 1], alias="kv2") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=KeyValue.Key#1) - | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | +-has_using=TRUE +-filter_expr= +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) +-ColumnRef(type=INT64, column=KeyValue.Key#3) @@ -213,9 +219,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=[KeyValue.Key#3], table=KeyValue, column_index_list=[0], alias="kv2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=KeyValue.Key#1) - +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + +-has_using=TRUE == # `key` is the column from the left with LEFT JOIN USING (key). 
@@ -236,9 +243,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=[KeyValue.Key#3], table=KeyValue, column_index_list=[0], alias="kv2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=KeyValue.Key#1) - +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + +-has_using=TRUE == select key from KeyValue kv1 full join KeyValue kv2 using (key) @@ -266,9 +274,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=[KeyValue.Key#3], table=KeyValue, column_index_list=[0], alias="kv2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=KeyValue.Key#1) - +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + +-has_using=TRUE == # Note that {key, kv1.key, kv2.key} produce three distinct output columns here. 
@@ -315,13 +324,14 @@ QueryStmt +-right_scan= | +-TableScan(column_list=KeyValue.[Key#3, Value#4], table=KeyValue, column_index_list=[0, 1], alias="kv2") +-join_expr= - +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=KeyValue.Value#2) - | +-ColumnRef(type=STRING, column=KeyValue.Value#4) - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=KeyValue.Key#1) - +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=KeyValue.Value#2) + | | +-ColumnRef(type=STRING, column=KeyValue.Value#4) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + +-has_using=TRUE == # Note that the COALESCE generated for FULL JOIN USING includes the implicit @@ -367,10 +377,11 @@ QueryStmt | +-input_scan= | +-SingleRowScan +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-Cast(INT32 -> INT64) - | +-ColumnRef(type=INT32, column=s1.key#1) - +-ColumnRef(type=INT64, column=s2.key#2) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-Cast(INT32 -> INT64) + | | +-ColumnRef(type=INT32, column=s1.key#1) + | +-ColumnRef(type=INT64, column=s2.key#2) + +-has_using=TRUE == select 1 from KeyValue join TestTable using (value) @@ -407,10 +418,11 @@ QueryStmt +-right_scan= | +-TableScan(column_list=[TestTable.key#3], table=TestTable, column_index_list=[0]) +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=KeyValue.Key#1) - +-Cast(INT32 -> INT64) - +-ColumnRef(type=INT32, column=TestTable.key#3) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | +-Cast(INT32 -> INT64) + 
| +-ColumnRef(type=INT32, column=TestTable.key#3) + +-has_using=TRUE == select 1 @@ -458,20 +470,21 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=SimpleTypes.[uint32#21, double#27], table=SimpleTypes, column_index_list=[2, 8]) +-join_expr= - +-FunctionCall(ZetaSQL:$and(BOOL, repeated(2) BOOL) -> BOOL) - +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) - | +-Cast(UINT32 -> UINT64) - | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#21) - +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) - | +-Cast(INT32 -> INT64) - | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) - | +-Cast(UINT32 -> UINT64) - | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#21) - +-FunctionCall(ZetaSQL:$equal(DOUBLE, DOUBLE) -> BOOL) - +-Cast(FLOAT -> DOUBLE) - | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#8) - +-ColumnRef(type=DOUBLE, column=SimpleTypes.double#27) + | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(2) BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) + | | +-Cast(UINT32 -> UINT64) + | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#21) + | +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) + | | +-Cast(INT32 -> INT64) + | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) + | | +-Cast(UINT32 -> UINT64) + | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#21) + | +-FunctionCall(ZetaSQL:$equal(DOUBLE, DOUBLE) -> BOOL) + | +-Cast(FLOAT -> DOUBLE) + | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#8) + | +-ColumnRef(type=DOUBLE, column=SimpleTypes.double#27) + +-has_using=TRUE -- ALTERNATION GROUP: left -- @@ -498,20 +511,21 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=SimpleTypes.[uint32#21, double#27], table=SimpleTypes, column_index_list=[2, 8]) +-join_expr= - +-FunctionCall(ZetaSQL:$and(BOOL, repeated(2) BOOL) -> BOOL) - +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) - | 
+-ColumnRef(type=INT64, column=SimpleTypes.int64#2) - | +-Cast(UINT32 -> UINT64) - | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#21) - +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) - | +-Cast(INT32 -> INT64) - | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) - | +-Cast(UINT32 -> UINT64) - | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#21) - +-FunctionCall(ZetaSQL:$equal(DOUBLE, DOUBLE) -> BOOL) - +-Cast(FLOAT -> DOUBLE) - | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#8) - +-ColumnRef(type=DOUBLE, column=SimpleTypes.double#27) + | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(2) BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) + | | +-Cast(UINT32 -> UINT64) + | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#21) + | +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) + | | +-Cast(INT32 -> INT64) + | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) + | | +-Cast(UINT32 -> UINT64) + | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#21) + | +-FunctionCall(ZetaSQL:$equal(DOUBLE, DOUBLE) -> BOOL) + | +-Cast(FLOAT -> DOUBLE) + | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#8) + | +-ColumnRef(type=DOUBLE, column=SimpleTypes.double#27) + +-has_using=TRUE -- ALTERNATION GROUP: right -- @@ -538,20 +552,21 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=SimpleTypes.[uint32#21, double#27], table=SimpleTypes, column_index_list=[2, 8]) +-join_expr= - +-FunctionCall(ZetaSQL:$and(BOOL, repeated(2) BOOL) -> BOOL) - +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) - | +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) - | +-Cast(UINT32 -> UINT64) - | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#21) - +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) - | +-Cast(INT32 -> INT64) - | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) - | +-Cast(UINT32 -> UINT64) - | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#21) - 
+-FunctionCall(ZetaSQL:$equal(DOUBLE, DOUBLE) -> BOOL) - +-Cast(FLOAT -> DOUBLE) - | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#8) - +-ColumnRef(type=DOUBLE, column=SimpleTypes.double#27) + | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(2) BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=SimpleTypes.int64#2) + | | +-Cast(UINT32 -> UINT64) + | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#21) + | +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) + | | +-Cast(INT32 -> INT64) + | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#1) + | | +-Cast(UINT32 -> UINT64) + | | +-ColumnRef(type=UINT32, column=SimpleTypes.uint32#21) + | +-FunctionCall(ZetaSQL:$equal(DOUBLE, DOUBLE) -> BOOL) + | +-Cast(FLOAT -> DOUBLE) + | | +-ColumnRef(type=FLOAT, column=SimpleTypes.float#8) + | +-ColumnRef(type=DOUBLE, column=SimpleTypes.double#27) + +-has_using=TRUE == select * @@ -599,16 +614,17 @@ QueryStmt | +-input_scan= | +-SingleRowScan +-join_expr= - +-FunctionCall(ZetaSQL:$and(BOOL, repeated(2) BOOL) -> BOOL) - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=q1.a3#3) - | +-ColumnRef(type=INT64, column=q2.a3#5) - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=q1.a1#1) - | +-ColumnRef(type=STRING, column=q2.a1#8) - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - +-ColumnRef(type=STRING, column=q1.a4#4) - +-ColumnRef(type=STRING, column=q2.a4#10) + | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(2) BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=q1.a3#3) + | | +-ColumnRef(type=INT64, column=q2.a3#5) + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=q1.a1#1) + | | +-ColumnRef(type=STRING, column=q2.a1#8) + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | +-ColumnRef(type=STRING, column=q1.a4#4) + | 
+-ColumnRef(type=STRING, column=q2.a4#10) + +-has_using=TRUE == # TODO: Should produce an error message that more explicitly calls @@ -698,15 +714,17 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=KeyValue.[Key#3, Value#4], table=KeyValue, column_index_list=[0, 1], alias="kv2") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=KeyValue.Key#1) - | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | +-has_using=TRUE +-right_scan= | +-TableScan(column_list=KeyValue.[Key#5, Value#6], table=KeyValue, column_index_list=[0, 1], alias="kv3") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=KeyValue.Key#1) - +-ColumnRef(type=INT64, column=KeyValue.Key#5) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | +-ColumnRef(type=INT64, column=KeyValue.Key#5) + +-has_using=TRUE == # `key` is the column from the third join since the inner joins take @@ -742,21 +760,24 @@ QueryStmt | | | +-right_scan= | | | | +-TableScan(column_list=[KeyValue.Key#3], table=KeyValue, column_index_list=[0], alias="kv2") | | | +-join_expr= - | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) - | | | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | | | | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | | | +-has_using=TRUE | | +-right_scan= | | | +-TableScan(column_list=[KeyValue.Key#5], table=KeyValue, column_index_list=[0], alias="kv3") | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) - | | 
+-ColumnRef(type=INT64, column=KeyValue.Key#5) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | | | +-ColumnRef(type=INT64, column=KeyValue.Key#5) + | | +-has_using=TRUE | +-right_scan= | | +-TableScan(column_list=[KeyValue.Key#7], table=KeyValue, column_index_list=[0], alias="kv4") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=KeyValue.Key#5) - | +-ColumnRef(type=INT64, column=KeyValue.Key#7) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#5) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#7) + | +-has_using=TRUE +-filter_expr= +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) +-ColumnRef(type=INT64, column=KeyValue.Key#5) @@ -792,19 +813,20 @@ QueryStmt +-right_scan= | +-TableScan(column_list=KeyValue.[Key#3, Value#4], table=KeyValue, column_index_list=[0, 1], alias="kv2") +-join_expr= - +-FunctionCall(ZetaSQL:$and(BOOL, repeated(3) BOOL) -> BOOL) - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=KeyValue.Key#1) - | +-ColumnRef(type=INT64, column=KeyValue.Key#3) - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=KeyValue.Key#1) - | +-ColumnRef(type=INT64, column=KeyValue.Key#3) - +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) - | +-ColumnRef(type=STRING, column=KeyValue.Value#2) - | +-ColumnRef(type=STRING, column=KeyValue.Value#4) - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=KeyValue.Key#1) - +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(3) BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | 
+-ColumnRef(type=INT64, column=KeyValue.Key#1) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + | +-FunctionCall(ZetaSQL:$equal(STRING, STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=KeyValue.Value#2) + | | +-ColumnRef(type=STRING, column=KeyValue.Value#4) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | +-ColumnRef(type=INT64, column=KeyValue.Key#3) + +-has_using=TRUE == # We resolve `key` on the lhs of the join, opaquely, and it resolves to a scan. @@ -920,10 +942,11 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=[TestExtraValueTable.value#3], table=TestExtraValueTable, column_index_list=[0], alias="rhs") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=lhs.int32_val1#2) - +-Cast(INT32 -> INT64) - +-ColumnRef(type=INT32, column=$join_right.int32_val1#6) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=lhs.int32_val1#2) + | +-Cast(INT32 -> INT64) + | +-ColumnRef(type=INT32, column=$join_right.int32_val1#6) + +-has_using=TRUE == # JOIN USING between two proto value tables. 
@@ -1000,16 +1023,17 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=[KitchenSinkValueTable.value#2], table=KitchenSinkValueTable, column_index_list=[0], alias="k2") +-join_expr= - +-FunctionCall(ZetaSQL:$and(BOOL, repeated(2) BOOL) -> BOOL) - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$join_left.int64_key_1#3) - | +-ColumnRef(type=INT64, column=$join_right.int64_key_1#4) - +-FunctionCall(ZetaSQL:$equal(DATE, DATE) -> BOOL) - | +-ColumnRef(type=DATE, column=$join_left.date#5) - | +-ColumnRef(type=DATE, column=$join_right.date#6) - +-FunctionCall(ZetaSQL:$equal(BOOL, BOOL) -> BOOL) - +-ColumnRef(type=BOOL, column=$join_left.bool_val#7) - +-ColumnRef(type=BOOL, column=$join_right.bool_val#8) + | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(2) BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$join_left.int64_key_1#3) + | | +-ColumnRef(type=INT64, column=$join_right.int64_key_1#4) + | +-FunctionCall(ZetaSQL:$equal(DATE, DATE) -> BOOL) + | | +-ColumnRef(type=DATE, column=$join_left.date#5) + | | +-ColumnRef(type=DATE, column=$join_right.date#6) + | +-FunctionCall(ZetaSQL:$equal(BOOL, BOOL) -> BOOL) + | +-ColumnRef(type=BOOL, column=$join_left.bool_val#7) + | +-ColumnRef(type=BOOL, column=$join_right.bool_val#8) + +-has_using=TRUE == select 1 @@ -1084,13 +1108,14 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=[KitchenSinkValueTable.value#2], table=KitchenSinkValueTable, column_index_list=[0], alias="k2") +-join_expr= - +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$join_left.int64_key_1#3) - | +-ColumnRef(type=INT64, column=$join_right.int64_key_1#4) - +-FunctionCall(ZetaSQL:$equal(ARRAY, ARRAY) -> BOOL) - +-ColumnRef(type=ARRAY, column=$join_left.repeated_bool_val#5) - +-ColumnRef(type=ARRAY, column=$join_right.repeated_bool_val#6) + | 
+-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$join_left.int64_key_1#3) + | | +-ColumnRef(type=INT64, column=$join_right.int64_key_1#4) + | +-FunctionCall(ZetaSQL:$equal(ARRAY, ARRAY) -> BOOL) + | +-ColumnRef(type=ARRAY, column=$join_left.repeated_bool_val#5) + | +-ColumnRef(type=ARRAY, column=$join_right.repeated_bool_val#6) + +-has_using=TRUE == select 1 @@ -1248,9 +1273,10 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=[TestExtraValueTable.value#4], table=TestExtraValueTable, column_index_list=[0], alias="t2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) - +-ColumnRef(type=INT32, column=$join_left.int32_val1#7) - +-ColumnRef(type=INT32, column=$join_right.int32_val1#8) + | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | +-ColumnRef(type=INT32, column=$join_left.int32_val1#7) + | +-ColumnRef(type=INT32, column=$join_right.int32_val1#8) + +-has_using=TRUE == select int32_val1 @@ -1302,9 +1328,10 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=[TestExtraValueTable.value#4], table=TestExtraValueTable, column_index_list=[0], alias="t2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) - +-ColumnRef(type=INT32, column=$join_left.int32_val1#7) - +-ColumnRef(type=INT32, column=$join_right.int32_val1#8) + | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | +-ColumnRef(type=INT32, column=$join_left.int32_val1#7) + | +-ColumnRef(type=INT32, column=$join_right.int32_val1#8) + +-has_using=TRUE == # The select * produces duplicate columns called int32_val1. 
@@ -1383,9 +1410,10 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=[TestExtraValueTable.value#4], table=TestExtraValueTable, column_index_list=[0], alias="t2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) - +-ColumnRef(type=INT32, column=$join_left.int32_val1#7) - +-ColumnRef(type=INT32, column=$join_right.int32_val1#8) + | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | +-ColumnRef(type=INT32, column=$join_left.int32_val1#7) + | +-ColumnRef(type=INT32, column=$join_right.int32_val1#8) + +-has_using=TRUE == # The select * (above) only produces the JOIN USING int32_val1 column, so we can @@ -1464,9 +1492,10 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=[TestExtraValueTable.value#4], table=TestExtraValueTable, column_index_list=[0], alias="t2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) - +-ColumnRef(type=INT32, column=$join_left.int32_val1#7) - +-ColumnRef(type=INT32, column=$join_right.int32_val1#8) + | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | +-ColumnRef(type=INT32, column=$join_left.int32_val1#7) + | +-ColumnRef(type=INT32, column=$join_right.int32_val1#8) + +-has_using=TRUE == # As per the specification, SELECT * after JOIN USING returns the USING @@ -1621,9 +1650,10 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=[TestExtraValueTable.value#4], table=TestExtraValueTable, column_index_list=[0], alias="t2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) - +-ColumnRef(type=INT32, column=$join_left.int32_val1#7) - +-ColumnRef(type=INT32, column=$join_right.int32_val1#8) + | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | +-ColumnRef(type=INT32, column=$join_left.int32_val1#7) + | +-ColumnRef(type=INT32, column=$join_right.int32_val1#8) + +-has_using=TRUE == # We can chain an implicit field name through multiple steps of @@ -1673,9 +1703,10 @@ QueryStmt | | +-input_scan= | | +-TableScan(column_list=[TestExtraValueTable.value#4], 
table=TestExtraValueTable, column_index_list=[0], alias="t2") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) - | +-ColumnRef(type=INT32, column=$join_left.int32_val1#7) - | +-ColumnRef(type=INT32, column=$join_right.int32_val1#8) + | | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | | +-ColumnRef(type=INT32, column=$join_left.int32_val1#7) + | | +-ColumnRef(type=INT32, column=$join_right.int32_val1#8) + | +-has_using=TRUE +-right_scan= | +-ProjectScan | +-column_list=[TestExtraValueTable.value#9, $join_right.int32_val1#12] @@ -1690,9 +1721,10 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=[TestExtraValueTable.value#9], table=TestExtraValueTable, column_index_list=[0], alias="t3") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) - +-ColumnRef(type=INT32, column=$join_left.int32_val1#7) - +-ColumnRef(type=INT32, column=$join_right.int32_val1#12) + | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | +-ColumnRef(type=INT32, column=$join_left.int32_val1#7) + | +-ColumnRef(type=INT32, column=$join_right.int32_val1#12) + +-has_using=TRUE == # JOIN USING with a scalar value table. 
@@ -1713,9 +1745,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=[Int32ValueTable.value#2], table=Int32ValueTable, column_index_list=[0]) +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) - +-ColumnRef(type=INT32, column=Int32ValueTable.value#1) - +-ColumnRef(type=INT32, column=Int32ValueTable.value#2) + | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | +-ColumnRef(type=INT32, column=Int32ValueTable.value#1) + | +-ColumnRef(type=INT32, column=Int32ValueTable.value#2) + +-has_using=TRUE == # Here, we join an int32 scalar value table to proto value table with an @@ -1750,10 +1783,11 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=[KitchenSinkValueTable.value#2], table=KitchenSinkValueTable, column_index_list=[0]) +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-Cast(INT32 -> INT64) - | +-ColumnRef(type=INT32, column=Int32ValueTable.value#1) - +-ColumnRef(type=INT64, column=$join_right.int64_key_1#3) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-Cast(INT32 -> INT64) + | | +-ColumnRef(type=INT32, column=Int32ValueTable.value#1) + | +-ColumnRef(type=INT64, column=$join_right.int64_key_1#3) + +-has_using=TRUE -- ALTERNATION GROUP: left -- @@ -1783,10 +1817,11 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=[KitchenSinkValueTable.value#2], table=KitchenSinkValueTable, column_index_list=[0]) +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-Cast(INT32 -> INT64) - | +-ColumnRef(type=INT32, column=Int32ValueTable.value#1) - +-ColumnRef(type=INT64, column=$join_right.int64_key_1#3) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-Cast(INT32 -> INT64) + | | +-ColumnRef(type=INT32, column=Int32ValueTable.value#1) + | +-ColumnRef(type=INT64, column=$join_right.int64_key_1#3) + +-has_using=TRUE == select int64_key_1, KitchenSinkValueTable int64_key_1 @@ -1840,10 +1875,11 @@ QueryStmt | +-input_scan= | +-SingleRowScan +-join_expr= - 
+-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-Cast(INT32 -> INT64) - | +-ColumnRef(type=INT32, column=Int32ValueTable.value#1) - +-ColumnRef(type=INT64, column=t2.t1#3) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-Cast(INT32 -> INT64) + | | +-ColumnRef(type=INT32, column=Int32ValueTable.value#1) + | +-ColumnRef(type=INT64, column=t2.t1#3) + +-has_using=TRUE == # JOIN USING on UNNESTs, which act like value tables. @@ -1981,9 +2017,10 @@ QueryStmt | +-input_scan= | +-SingleRowScan +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$join_left.b#7) - +-ColumnRef(type=INT64, column=$subquery1.b#6) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=$join_left.b#7) + | +-ColumnRef(type=INT64, column=$subquery1.b#6) + +-has_using=TRUE == # USING clause with column reference that has different (but @@ -2023,13 +2060,14 @@ QueryStmt | +-input_scan= | +-SingleRowScan +-join_expr= - +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) - +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) - | +-ColumnRef(type=INT64, column=a.x1#1) - | +-ColumnRef(type=UINT64, column=b.x1#3) - +-FunctionCall(ZetaSQL:$equal(UINT64, INT64) -> BOOL) - +-ColumnRef(type=UINT64, column=a.x2#2) - +-ColumnRef(type=INT64, column=b.x2#4) + | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=a.x1#1) + | | +-ColumnRef(type=UINT64, column=b.x1#3) + | +-FunctionCall(ZetaSQL:$equal(UINT64, INT64) -> BOOL) + | +-ColumnRef(type=UINT64, column=a.x2#2) + | +-ColumnRef(type=INT64, column=b.x2#4) + +-has_using=TRUE == # This tests INT64/UINT32, LEFT OUTER join. 
The output columns are from @@ -2068,15 +2106,16 @@ QueryStmt | +-input_scan= | +-SingleRowScan +-join_expr= - +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) - +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) - | +-ColumnRef(type=INT64, column=a.x1#1) - | +-Cast(UINT32 -> UINT64) - | +-ColumnRef(type=UINT32, column=b.x1#3) - +-FunctionCall(ZetaSQL:$equal(UINT64, INT64) -> BOOL) - +-Cast(UINT32 -> UINT64) - | +-ColumnRef(type=UINT32, column=a.x2#2) - +-ColumnRef(type=INT64, column=b.x2#4) + | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=a.x1#1) + | | +-Cast(UINT32 -> UINT64) + | | +-ColumnRef(type=UINT32, column=b.x1#3) + | +-FunctionCall(ZetaSQL:$equal(UINT64, INT64) -> BOOL) + | +-Cast(UINT32 -> UINT64) + | | +-ColumnRef(type=UINT32, column=a.x2#2) + | +-ColumnRef(type=INT64, column=b.x2#4) + +-has_using=TRUE == # This tests INT32/UINT64, RIGHT OUTER join. The output columns are from @@ -2115,15 +2154,16 @@ QueryStmt | +-input_scan= | +-SingleRowScan +-join_expr= - +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) - +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) - | +-Cast(INT32 -> INT64) - | | +-ColumnRef(type=INT32, column=a.x1#1) - | +-ColumnRef(type=UINT64, column=b.x1#3) - +-FunctionCall(ZetaSQL:$equal(UINT64, INT64) -> BOOL) - +-ColumnRef(type=UINT64, column=a.x2#2) - +-Cast(INT32 -> INT64) - +-ColumnRef(type=INT32, column=b.x2#4) + | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) + | | +-Cast(INT32 -> INT64) + | | | +-ColumnRef(type=INT32, column=a.x1#1) + | | +-ColumnRef(type=UINT64, column=b.x1#3) + | +-FunctionCall(ZetaSQL:$equal(UINT64, INT64) -> BOOL) + | +-ColumnRef(type=UINT64, column=a.x2#2) + | +-Cast(INT32 -> INT64) + | +-ColumnRef(type=INT32, column=b.x2#4) + +-has_using=TRUE == # This tests INT32/UINT32, FULL OUTER join. 
The output columns are the @@ -2178,17 +2218,18 @@ QueryStmt | +-input_scan= | +-SingleRowScan +-join_expr= - +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) - +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) - | +-Cast(INT32 -> INT64) - | | +-ColumnRef(type=INT32, column=a.x1#1) - | +-Cast(UINT32 -> UINT64) - | +-ColumnRef(type=UINT32, column=b.x1#3) - +-FunctionCall(ZetaSQL:$equal(UINT64, INT64) -> BOOL) - +-Cast(UINT32 -> UINT64) - | +-ColumnRef(type=UINT32, column=a.x2#2) - +-Cast(INT32 -> INT64) - +-ColumnRef(type=INT32, column=b.x2#4) + | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$equal(INT64, UINT64) -> BOOL) + | | +-Cast(INT32 -> INT64) + | | | +-ColumnRef(type=INT32, column=a.x1#1) + | | +-Cast(UINT32 -> UINT64) + | | +-ColumnRef(type=UINT32, column=b.x1#3) + | +-FunctionCall(ZetaSQL:$equal(UINT64, INT64) -> BOOL) + | +-Cast(UINT32 -> UINT64) + | | +-ColumnRef(type=UINT32, column=a.x2#2) + | +-Cast(INT32 -> INT64) + | +-ColumnRef(type=INT32, column=b.x2#4) + +-has_using=TRUE == # If FULL OUTER join and the USING columns do not have a common supertype, @@ -2255,9 +2296,10 @@ QueryStmt | +-input_scan= | +-TableScan(column_list=[KitchenSinkValueTable.value#2], table=KitchenSinkValueTable, column_index_list=[0], alias="t2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$join_left.int64_val#3) - +-ColumnRef(type=INT64, column=$join_right.int64_val#4) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=$join_left.int64_val#3) + | +-ColumnRef(type=INT64, column=$join_right.int64_val#4) + +-has_using=TRUE == SELECT t1, '---' as separator1, *, '---' as separator2, t2, t2.t1, t2.t3, t3 @@ -2421,9 +2463,10 @@ QueryStmt | | +-input_scan= | | +-SingleRowScan | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=t1.a#1) - | +-ColumnRef(type=INT64, column=t2.a#3) 
+ | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=t1.a#1) + | | +-ColumnRef(type=INT64, column=t2.a#3) + | +-has_using=TRUE +-right_scan= | +-ProjectScan | +-column_list=t3.[b#5, d#6] @@ -2433,9 +2476,10 @@ QueryStmt | +-input_scan= | +-SingleRowScan +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=t1.b#2) - +-ColumnRef(type=INT64, column=t3.b#5) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=t1.b#2) + | +-ColumnRef(type=INT64, column=t3.b#5) + +-has_using=TRUE == # Same as previous, with value tables, where we have to manage exclusions. @@ -2543,9 +2587,10 @@ QueryStmt | | +-input_scan= | | +-SingleRowScan | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$join_left.a#7) - | +-ColumnRef(type=INT64, column=$join_right.a#8) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$join_left.a#7) + | | +-ColumnRef(type=INT64, column=$join_right.a#8) + | +-has_using=TRUE +-right_scan= | +-ProjectScan | +-column_list=[$make_struct.$struct#11, $join_right.b#13] @@ -2575,9 +2620,10 @@ QueryStmt | +-input_scan= | +-SingleRowScan +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$join_left.b#12) - +-ColumnRef(type=INT64, column=$join_right.b#13) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=$join_left.b#12) + | +-ColumnRef(type=INT64, column=$join_right.b#13) + +-has_using=TRUE == [language_features={{|V_1_1_ARRAY_EQUALITY}}] @@ -2608,9 +2654,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=[ComplexTypes.Int32Array#10], table=ComplexTypes, column_index_list=[3], alias="b") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(ARRAY, ARRAY) -> BOOL) - +-ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#4) - +-ColumnRef(type=ARRAY, 
column=ComplexTypes.Int32Array#10) + | +-FunctionCall(ZetaSQL:$equal(ARRAY, ARRAY) -> BOOL) + | +-ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#4) + | +-ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#10) + +-has_using=TRUE == # This is the simplest form of the query from b/63513175, which originally @@ -3830,9 +3877,10 @@ QueryStmt | +-input_scan= | +-SingleRowScan +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=t1.a#1) - +-ColumnRef(type=INT64, column=t2.a#2) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=t1.a#1) + | +-ColumnRef(type=INT64, column=t2.a#2) + +-has_using=TRUE == # Similar to the previous test cases, but using subqueries instead of @@ -3873,13 +3921,14 @@ QueryStmt | +-input_scan= | +-SingleRowScan +-join_expr= - +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=t1.a#1) - | +-ColumnRef(type=INT64, column=t2.a#4) - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=t1.b#2) - +-ColumnRef(type=INT64, column=t2.b#5) + | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=t1.a#1) + | | +-ColumnRef(type=INT64, column=t2.a#4) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=t1.b#2) + | +-ColumnRef(type=INT64, column=t2.b#5) + +-has_using=TRUE == # JOIN USING for array scans where the USING column must be coerced to @@ -4222,10 +4271,11 @@ QueryStmt | | +-Literal(type=INT64, value=4) | +-element_column_list=[$array.t2#2] +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$join_left.a#3) - +-Cast(INT32 -> INT64) - +-ColumnRef(type=INT32, column=$join_right.a#4) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | 
+-ColumnRef(type=INT64, column=$join_left.a#3) + | +-Cast(INT32 -> INT64) + | +-ColumnRef(type=INT32, column=$join_right.a#4) + +-has_using=TRUE -- ALTERNATION GROUP: FULL -- @@ -4320,10 +4370,11 @@ QueryStmt | | +-Literal(type=INT64, value=4) | +-element_column_list=[$array.t2#2] +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$join_left.a#3) - +-Cast(INT32 -> INT64) - +-ColumnRef(type=INT32, column=$join_right.a#4) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=$join_left.a#3) + | +-Cast(INT32 -> INT64) + | +-ColumnRef(type=INT32, column=$join_right.a#4) + +-has_using=TRUE == # Same as the previous but reversing where the coercion occurs. @@ -4497,10 +4548,11 @@ QueryStmt | | +-Literal(type=INT64, value=4) | +-element_column_list=[$array.t2#2] +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-Cast(INT32 -> INT64) - | +-ColumnRef(type=INT32, column=$join_left.a#3) - +-ColumnRef(type=INT64, column=$join_right.a#4) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-Cast(INT32 -> INT64) + | | +-ColumnRef(type=INT32, column=$join_left.a#3) + | +-ColumnRef(type=INT64, column=$join_right.a#4) + +-has_using=TRUE -- ALTERNATION GROUP: FULL -- @@ -4595,8 +4647,8 @@ QueryStmt | | +-Literal(type=INT64, value=4) | +-element_column_list=[$array.t2#2] +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-Cast(INT32 -> INT64) - | +-ColumnRef(type=INT32, column=$join_left.a#3) - +-ColumnRef(type=INT64, column=$join_right.a#4) -== + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-Cast(INT32 -> INT64) + | | +-ColumnRef(type=INT32, column=$join_left.a#3) + | +-ColumnRef(type=INT64, column=$join_right.a#4) + +-has_using=TRUE diff --git a/zetasql/analyzer/testdata/json.test b/zetasql/analyzer/testdata/json.test index 9bc5b703b..f44790c1e 100644 --- a/zetasql/analyzer/testdata/json.test +++ 
b/zetasql/analyzer/testdata/json.test @@ -90,25 +90,25 @@ SELECT json_val.field FROM ArrayTypes, UNNEST(JsonArray) AS json_val -- QueryStmt +-output_column_list= -| +-$query.field#19 AS field [JSON] +| +-$query.field#22 AS field [JSON] +-query= +-ProjectScan - +-column_list=[$query.field#19] + +-column_list=[$query.field#22] +-expr_list= - | +-field#19 := + | +-field#22 := | +-GetJsonField | +-type=JSON | +-expr= - | | +-ColumnRef(type=JSON, column=$array.json_val#18) + | | +-ColumnRef(type=JSON, column=$array.json_val#21) | +-field_name="field" +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.JsonArray#17, $array.json_val#18] + +-column_list=[ArrayTypes.JsonArray#17, $array.json_val#21] +-input_scan= | +-TableScan(column_list=[ArrayTypes.JsonArray#17], table=ArrayTypes, column_index_list=[16]) +-array_expr_list= | +-ColumnRef(type=ARRAY, column=ArrayTypes.JsonArray#17) - +-element_column_list=[$array.json_val#18] + +-element_column_list=[$array.json_val#21] == SELECT json_col['field'] FROM JsonTable; diff --git a/zetasql/analyzer/testdata/like_any_some_all.test b/zetasql/analyzer/testdata/like_any_some_all.test index 0a18e0777..aaa6ec18e 100644 --- a/zetasql/analyzer/testdata/like_any_some_all.test +++ b/zetasql/analyzer/testdata/like_any_some_all.test @@ -250,6 +250,7 @@ select true LIKE ANY ('abc') ^ == +[language_features=V_1_3_LIKE_ANY_SOME_ALL] select 'abc' NOT LIKE ANY ('abc') -- QueryStmt @@ -3522,14 +3523,105 @@ QueryStmt [no_java] [language_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] -# Collation is disallowed on the function arguments +# Collation test for like all unnest select string_ci LIKE ALL unnest(array_with_string_ci), from CollatedTable -- -ERROR: Collation is not allowed on argument 1 ("und:ci"). 
Use COLLATE(arg, '') to remove collation [at 3:13] - string_ci LIKE ALL unnest(array_with_string_ci), - ^ +QueryStmt ++-output_column_list= +| +-$query.$col1#5 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#5] + +-expr_list= + | +-$col1#5 := + | +-FunctionCall(ZetaSQL:$like_all_array(STRING, ARRAY) -> BOOL) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=CollatedTable.string_ci#1{Collation:"und:ci"}) + | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=CollatedTable.array_with_string_ci#4[{Collation:"und:ci"}]) + | +-collation_list=[und:ci] + +-input_scan= + +-TableScan(column_list=CollatedTable.[string_ci#1, array_with_string_ci#4], table=CollatedTable, column_index_list=[0, 3]) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#5 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#5] + +-expr_list= + | +-$col1#5 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-parameter_list= + | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=CollatedTable.string_ci#1{Collation:"und:ci"}) + | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=CollatedTable.array_with_string_ci#4[{Collation:"und:ci"}]) + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#12] + | +-expr_list= + | | +-$col1#12 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#6{Collation:"und:ci"}) + | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}]) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#11] + | | +-expr_list= + | | | +-$col1#11 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | 
+-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=true) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#6{Collation:"und:ci"}, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#9) + | | | +-Literal(type=BOOL, value=false) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#10) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=true) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#9, $agg2#10] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#8{Collation:"und:ci"}] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}], is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#8{Collation:"und:ci"}] + | | +-aggregate_list= + | | +-$agg1#9 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_and(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#6{Collation:"und:ci"}, is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, 
column=$array.pattern#8{Collation:"und:ci"}) + | | | +-collation_list=[und:ci] + | | +-$agg2#10 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$array.pattern#8{Collation:"und:ci"}) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#6, patterns#7] + | +-expr_list= + | | +-input#6 := ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=CollatedTable.string_ci#1{Collation:"und:ci"}, is_correlated=TRUE) + | | +-patterns#7 := ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=CollatedTable.array_with_string_ci#4[{Collation:"und:ci"}], is_correlated=TRUE) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-TableScan(column_list=CollatedTable.[string_ci#1, array_with_string_ci#4], table=CollatedTable, column_index_list=[0, 3]) == [no_java] @@ -3672,79 +3764,2215 @@ ERROR: Collation conflict: "binary" vs. "und:ci". Collation on argument 2 ("bina # TODO: Collation/Annotation specific ZetaSQL analyzer tests are not working as expected while executed in java mode [no_java] [language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] -# Collation is disallowed on the function arguments -# TODO: As per discussion with shaokunz@, we might be able to allow collation on function arguments now +# Collation test for like any unnest SELECT string_ci LIKE ANY UNNEST(array_with_string_ci) FROM COLLATEdTable -- -ERROR: Collation is not allowed on argument 1 ("und:ci"). 
Use COLLATE(arg, '') to remove collation [at 4:13] - string_ci LIKE ANY UNNEST(array_with_string_ci) - ^ +QueryStmt ++-output_column_list= +| +-$query.$col1#5 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#5] + +-expr_list= + | +-$col1#5 := + | +-FunctionCall(ZetaSQL:$like_any_array(STRING, ARRAY) -> BOOL) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=CollatedTable.string_ci#1{Collation:"und:ci"}) + | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=CollatedTable.array_with_string_ci#4[{Collation:"und:ci"}]) + | +-collation_list=[und:ci] + +-input_scan= + +-TableScan(column_list=CollatedTable.[string_ci#1, array_with_string_ci#4], table=CollatedTable, column_index_list=[0, 3]) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#5 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#5] + +-expr_list= + | +-$col1#5 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-parameter_list= + | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=CollatedTable.string_ci#1{Collation:"und:ci"}) + | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=CollatedTable.array_with_string_ci#4[{Collation:"und:ci"}]) + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#12] + | +-expr_list= + | | +-$col1#12 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#6{Collation:"und:ci"}) + | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}]) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#11] + | | +-expr_list= + | | | +-$col1#11 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | 
+-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=false) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#6{Collation:"und:ci"}, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#9) + | | | +-Literal(type=BOOL, value=true) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#10) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=false) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#9, $agg2#10] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#8{Collation:"und:ci"}] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}], is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#8{Collation:"und:ci"}] + | | +-aggregate_list= + | | +-$agg1#9 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#6{Collation:"und:ci"}, is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$array.pattern#8{Collation:"und:ci"}) + | | | 
+-collation_list=[und:ci] + | | +-$agg2#10 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$array.pattern#8{Collation:"und:ci"}) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#6, patterns#7] + | +-expr_list= + | | +-input#6 := ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=CollatedTable.string_ci#1{Collation:"und:ci"}, is_correlated=TRUE) + | | +-patterns#7 := ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=CollatedTable.array_with_string_ci#4[{Collation:"und:ci"}], is_correlated=TRUE) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-TableScan(column_list=CollatedTable.[string_ci#1, array_with_string_ci#4], table=CollatedTable, column_index_list=[0, 3]) == # TODO: Collation/Annotation specific ZetaSQL analyzer tests are not working as expected while executed in java mode [no_java] -[language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] -# Collation is disallowed on the function arguments -# TODO: As per discussion with shaokunz@, we might be able to allow collation on function arguments now +[language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +# Collation for not like all unnest SELECT - string_ci LIKE ALL UNNEST(array_with_string_ci) + string_ci NOT LIKE ALL UNNEST(array_with_string_ci) FROM COLLATEdTable -- -ERROR: Collation is not allowed on argument 1 ("und:ci"). 
Use COLLATE(arg, '') to remove collation [at 4:13] - string_ci LIKE ALL UNNEST(array_with_string_ci) - ^ +QueryStmt ++-output_column_list= +| +-$query.$col1#5 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#5] + +-expr_list= + | +-$col1#5 := + | +-FunctionCall(ZetaSQL:$not_like_all_array(STRING, ARRAY) -> BOOL) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=CollatedTable.string_ci#1{Collation:"und:ci"}) + | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=CollatedTable.array_with_string_ci#4[{Collation:"und:ci"}]) + | +-collation_list=[und:ci] + +-input_scan= + +-TableScan(column_list=CollatedTable.[string_ci#1, array_with_string_ci#4], table=CollatedTable, column_index_list=[0, 3]) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#5 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#5] + +-expr_list= + | +-$col1#5 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-parameter_list= + | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=CollatedTable.string_ci#1{Collation:"und:ci"}) + | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=CollatedTable.array_with_string_ci#4[{Collation:"und:ci"}]) + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#12] + | +-expr_list= + | | +-$col1#12 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#6{Collation:"und:ci"}) + | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}]) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#11] + | | +-expr_list= + | | | +-$col1#11 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | 
+-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=true) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#6{Collation:"und:ci"}, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#9) + | | | +-Literal(type=BOOL, value=false) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#10) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=true) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#9, $agg2#10] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#8{Collation:"und:ci"}] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}], is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#8{Collation:"und:ci"}] + | | +-aggregate_list= + | | +-$agg1#9 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_and(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#6{Collation:"und:ci"}, is_correlated=TRUE) + | | | 
+-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$array.pattern#8{Collation:"und:ci"}) + | | | +-collation_list=[und:ci] + | | +-$agg2#10 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$array.pattern#8{Collation:"und:ci"}) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#6, patterns#7] + | +-expr_list= + | | +-input#6 := ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=CollatedTable.string_ci#1{Collation:"und:ci"}, is_correlated=TRUE) + | | +-patterns#7 := ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=CollatedTable.array_with_string_ci#4[{Collation:"und:ci"}], is_correlated=TRUE) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-TableScan(column_list=CollatedTable.[string_ci#1, array_with_string_ci#4], table=CollatedTable, column_index_list=[0, 3]) == -# TODO: Collation/Annotation specific ZetaSQL analyzer tests are not working as expected while executed in java mode [no_java] -[language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] -# Collation is disallowed on the function arguments -# TODO: As per discussion with shaokunz@, we might be able to allow collation on function arguments now -SELECT 'a' LIKE {{ANY|SOME|ALL}} UNNEST(['a', COLLATE('A', 'und:ci')]) --- -ALTERNATION GROUP: ANY --- -ERROR: Collation is not allowed on argument 2 (["und:ci"]) [at 3:12] -SELECT 'a' LIKE ANY UNNEST(['a', COLLATE('A', 'und:ci')]) - ^ --- -ALTERNATION GROUP: SOME --- -ERROR: Collation is not allowed on argument 2 (["und:ci"]) [at 3:12] -SELECT 'a' LIKE SOME UNNEST(['a', COLLATE('A', 'und:ci')]) - ^ --- -ALTERNATION GROUP: ALL 
+[language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +# Collation for not like any unnest +SELECT + string_ci NOT LIKE ANY UNNEST(array_with_string_ci) +FROM COLLATEdTable -- -ERROR: Collation is not allowed on argument 2 (["und:ci"]) [at 3:12] -SELECT 'a' LIKE ALL UNNEST(['a', COLLATE('A', 'und:ci')]) - ^ -== +QueryStmt ++-output_column_list= +| +-$query.$col1#5 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#5] + +-expr_list= + | +-$col1#5 := + | +-FunctionCall(ZetaSQL:$not_like_any_array(STRING, ARRAY) -> BOOL) + | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=CollatedTable.string_ci#1{Collation:"und:ci"}) + | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=CollatedTable.array_with_string_ci#4[{Collation:"und:ci"}]) + | +-collation_list=[und:ci] + +-input_scan= + +-TableScan(column_list=CollatedTable.[string_ci#1, array_with_string_ci#4], table=CollatedTable, column_index_list=[0, 3]) -# TODO: Collation/Annotation specific ZetaSQL analyzer tests are not working as expected while executed in java mode -[no_java] -[language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] -# Collation is disallowed on the function arguments -# TODO: As per discussion with shaokunz@, we might be able to allow collation on function arguments now +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#5 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#5] + +-expr_list= + | +-$col1#5 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-parameter_list= + | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=CollatedTable.string_ci#1{Collation:"und:ci"}) + | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], 
column=CollatedTable.array_with_string_ci#4[{Collation:"und:ci"}]) + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#12] + | +-expr_list= + | | +-$col1#12 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#6{Collation:"und:ci"}) + | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}]) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#11] + | | +-expr_list= + | | | +-$col1#11 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=false) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#6{Collation:"und:ci"}, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#9) + | | | +-Literal(type=BOOL, value=true) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#10) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=false) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#9, $agg2#10] + | | +-input_scan= + | | | 
+-ArrayScan + | | | +-column_list=[$array.pattern#8{Collation:"und:ci"}] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#7[{Collation:"und:ci"}], is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#8{Collation:"und:ci"}] + | | +-aggregate_list= + | | +-$agg1#9 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#6{Collation:"und:ci"}, is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$array.pattern#8{Collation:"und:ci"}) + | | | +-collation_list=[und:ci] + | | +-$agg2#10 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$array.pattern#8{Collation:"und:ci"}) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#6, patterns#7] + | +-expr_list= + | | +-input#6 := ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=CollatedTable.string_ci#1{Collation:"und:ci"}, is_correlated=TRUE) + | | +-patterns#7 := ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=CollatedTable.array_with_string_ci#4[{Collation:"und:ci"}], is_correlated=TRUE) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-TableScan(column_list=CollatedTable.[string_ci#1, array_with_string_ci#4], table=CollatedTable, column_index_list=[0, 3]) +== + +# TODO: Collation/Annotation specific ZetaSQL analyzer tests are not working as expected while executed in java mode +[no_java] +[language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +# Collation for like 
any/some/all with mix of collated and non collated arguments in the patterns array +SELECT 'a' LIKE {{ANY|SOME|ALL}} UNNEST(['a', COLLATE('A', 'und:ci')]) +-- +ALTERNATION GROUPS: + ANY + SOME +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$like_any_array(STRING, ARRAY) -> BOOL) + | +-Literal(type=STRING, value="a") + | +-FunctionCall(ZetaSQL:$make_array(repeated(2) STRING) -> ARRAY) + | +-type_annotation_map=[{Collation:"und:ci"}] + | +-Literal(type=STRING, value="a") + | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | +-type_annotation_map={Collation:"und:ci"} + | +-Literal(type=STRING, value="A") + | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | +-collation_list=[und:ci] + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2) + | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#3[{Collation:"und:ci"}]) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], 
column=$subquery1.patterns#3[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#3[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=false) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=true) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=false) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4{Collation:"und:ci"}] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#3[{Collation:"und:ci"}], is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4{Collation:"und:ci"}] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$array.pattern#4{Collation:"und:ci"}) + | | | +-collation_list=[und:ci] + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$array.pattern#4{Collation:"und:ci"}) + | +-input_scan= + | +-ProjectScan + | 
+-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := Literal(type=STRING, value="a") + | | +-patterns#3 := + | | +-FunctionCall(ZetaSQL:$make_array(repeated(2) STRING) -> ARRAY) + | | +-type_annotation_map=[{Collation:"und:ci"}] + | | +-Literal(type=STRING, value="a") + | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | | +-type_annotation_map={Collation:"und:ci"} + | | +-Literal(type=STRING, value="A") + | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +-- +ALTERNATION GROUP: ALL +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$like_all_array(STRING, ARRAY) -> BOOL) + | +-Literal(type=STRING, value="a") + | +-FunctionCall(ZetaSQL:$make_array(repeated(2) STRING) -> ARRAY) + | +-type_annotation_map=[{Collation:"und:ci"}] + | +-Literal(type=STRING, value="a") + | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | +-type_annotation_map={Collation:"und:ci"} + | +-Literal(type=STRING, value="A") + | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | +-collation_list=[und:ci] + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2) + | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#3[{Collation:"und:ci"}]) + | | 
+-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#3[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#3[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=true) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=false) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=true) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4{Collation:"und:ci"}] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#3[{Collation:"und:ci"}], is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4{Collation:"und:ci"}] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_and(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, 
is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$array.pattern#4{Collation:"und:ci"}) + | | | +-collation_list=[und:ci] + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$array.pattern#4{Collation:"und:ci"}) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := Literal(type=STRING, value="a") + | | +-patterns#3 := + | | +-FunctionCall(ZetaSQL:$make_array(repeated(2) STRING) -> ARRAY) + | | +-type_annotation_map=[{Collation:"und:ci"}] + | | +-Literal(type=STRING, value="a") + | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | | +-type_annotation_map={Collation:"und:ci"} + | | +-Literal(type=STRING, value="A") + | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +== + +# TODO: Collation/Annotation specific ZetaSQL analyzer tests are not working as expected while executed in java mode +[no_java] +[language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +# Collation for like any/some/all unnest with collation only present for search value SELECT COLLATE('a', 'und:ci') LIKE {{ANY|SOME|ALL}} UNNEST(['a', 'A']) -- +ALTERNATION GROUPS: + ANY + SOME +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$like_any_array(STRING, ARRAY) -> BOOL) + | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | | +-type_annotation_map={Collation:"und:ci"} + | | +-Literal(type=STRING, value="a") + | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | 
+-Literal(type=ARRAY, value=["a", "A"]) + | +-collation_list=[und:ci] + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#2{Collation:"und:ci"}) + | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=false) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#2{Collation:"und:ci"}, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=true) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=false) + | | +-input_scan= + | | +-AggregateScan + | | 
+-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#2{Collation:"und:ci"}, is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | | | +-collation_list=[und:ci] + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := + | | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | | | +-type_annotation_map={Collation:"und:ci"} + | | | +-Literal(type=STRING, value="a") + | | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | | +-patterns#3 := Literal(type=ARRAY, value=["a", "A"]) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +-- +ALTERNATION GROUP: ALL +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$like_all_array(STRING, ARRAY) -> BOOL) + | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | | +-type_annotation_map={Collation:"und:ci"} + | | +-Literal(type=STRING, value="a") + | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | +-Literal(type=ARRAY, value=["a", "A"]) + | +-collation_list=[und:ci] + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] 
+QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#2{Collation:"und:ci"}) + | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=true) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#2{Collation:"und:ci"}, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=false) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=true) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | 
+-ArrayScan + | | | +-column_list=[$array.pattern#4] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_and(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#2{Collation:"und:ci"}, is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | | | +-collation_list=[und:ci] + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := + | | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | | | +-type_annotation_map={Collation:"und:ci"} + | | | +-Literal(type=STRING, value="a") + | | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | | +-patterns#3 := Literal(type=ARRAY, value=["a", "A"]) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +== + +[no_java] +[language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +# Collation for like any/some/all unnest with mix of collated and non collated arguments in the patterns array +SELECT 'a' NOT LIKE {{ANY|SOME|ALL}} UNNEST(['a', COLLATE('A', 'und:ci')]) +-- +ALTERNATION GROUPS: + ANY + SOME +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not_like_any_array(STRING, ARRAY) -> BOOL) + | +-Literal(type=STRING, 
value="a") + | +-FunctionCall(ZetaSQL:$make_array(repeated(2) STRING) -> ARRAY) + | +-type_annotation_map=[{Collation:"und:ci"}] + | +-Literal(type=STRING, value="a") + | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | +-type_annotation_map={Collation:"und:ci"} + | +-Literal(type=STRING, value="A") + | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | +-collation_list=[und:ci] + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2) + | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#3[{Collation:"und:ci"}]) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#3[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#3[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=false) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) 
-> BOOL) + | | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=true) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=false) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4{Collation:"und:ci"}] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#3[{Collation:"und:ci"}], is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4{Collation:"und:ci"}] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$array.pattern#4{Collation:"und:ci"}) + | | | +-collation_list=[und:ci] + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$array.pattern#4{Collation:"und:ci"}) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := Literal(type=STRING, value="a") + | | +-patterns#3 := + | | +-FunctionCall(ZetaSQL:$make_array(repeated(2) STRING) -> ARRAY) + | | +-type_annotation_map=[{Collation:"und:ci"}] + | | +-Literal(type=STRING, value="a") + | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | | +-type_annotation_map={Collation:"und:ci"} + | | 
+-Literal(type=STRING, value="A") + | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +-- +ALTERNATION GROUP: ALL +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not_like_all_array(STRING, ARRAY) -> BOOL) + | +-Literal(type=STRING, value="a") + | +-FunctionCall(ZetaSQL:$make_array(repeated(2) STRING) -> ARRAY) + | +-type_annotation_map=[{Collation:"und:ci"}] + | +-Literal(type=STRING, value="a") + | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | +-type_annotation_map={Collation:"und:ci"} + | +-Literal(type=STRING, value="A") + | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | +-collation_list=[und:ci] + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2) + | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#3[{Collation:"und:ci"}]) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], 
column=$subquery1.patterns#3[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#3[{Collation:"und:ci"}], is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=true) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=false) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=true) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4{Collation:"und:ci"}] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, type_annotation_map=[{Collation:"und:ci"}], column=$subquery1.patterns#3[{Collation:"und:ci"}], is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4{Collation:"und:ci"}] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_and(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$array.pattern#4{Collation:"und:ci"}) + | | | +-collation_list=[und:ci] + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, 
type_annotation_map={Collation:"und:ci"}, column=$array.pattern#4{Collation:"und:ci"}) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := Literal(type=STRING, value="a") + | | +-patterns#3 := + | | +-FunctionCall(ZetaSQL:$make_array(repeated(2) STRING) -> ARRAY) + | | +-type_annotation_map=[{Collation:"und:ci"}] + | | +-Literal(type=STRING, value="a") + | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | | +-type_annotation_map={Collation:"und:ci"} + | | +-Literal(type=STRING, value="A") + | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +== + +[no_java] +[language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +# Collation for 'not like any/some/all' with arrays +SELECT COLLATE('a', 'und:ci') NOT LIKE {{ANY|SOME|ALL}} UNNEST(['a', 'A']) +-- +ALTERNATION GROUPS: + ANY + SOME +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not_like_any_array(STRING, ARRAY) -> BOOL) + | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | | +-type_annotation_map={Collation:"und:ci"} + | | +-Literal(type=STRING, value="a") + | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | +-Literal(type=ARRAY, value=["a", "A"]) + | +-collation_list=[und:ci] + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | 
+-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#2{Collation:"und:ci"}) + | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=false) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#2{Collation:"und:ci"}, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=true) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=false) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | | 
+-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#2{Collation:"und:ci"}, is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | | | +-collation_list=[und:ci] + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := + | | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | | | +-type_annotation_map={Collation:"und:ci"} + | | | +-Literal(type=STRING, value="a") + | | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | | +-patterns#3 := Literal(type=ARRAY, value=["a", "A"]) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +-- +ALTERNATION GROUP: ALL +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not_like_all_array(STRING, ARRAY) -> BOOL) + | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | | +-type_annotation_map={Collation:"und:ci"} + | | +-Literal(type=STRING, value="a") + | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | +-Literal(type=ARRAY, value=["a", "A"]) + | +-collation_list=[und:ci] + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | 
+-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#2{Collation:"und:ci"}) + | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=true) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#2{Collation:"und:ci"}, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=false) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=true) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_and(BOOL) -> BOOL) + | | | 
+-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, type_annotation_map={Collation:"und:ci"}, column=$subquery1.input#2{Collation:"und:ci"}, is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | | | +-collation_list=[und:ci] + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := + | | | +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + | | | +-type_annotation_map={Collation:"und:ci"} + | | | +-Literal(type=STRING, value="a") + | | | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) + | | +-patterns#3 := Literal(type=ARRAY, value=["a", "A"]) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +== + +[language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +# Test NOT LIKE ANY|SOME|ALL with string arguments +select 'a' NOT LIKE {{ANY|SOME|ALL}} ('a', 'b'); +-- +ALTERNATION GROUPS: + ANY + SOME +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not_like_any(STRING, repeated(2) STRING) -> BOOL) + | +-Literal(type=STRING, value="a") + | +-Literal(type=STRING, value="a") + | +-Literal(type=STRING, value="b") + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + 
| | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2) + | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=false) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=true) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=false) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, 
is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := Literal(type=STRING, value="a") + | | +-patterns#3 := + | | +-FunctionCall(ZetaSQL:$make_array(repeated(2) STRING) -> ARRAY) + | | +-Literal(type=STRING, value="a") + | | +-Literal(type=STRING, value="b") + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +-- +ALTERNATION GROUP: ALL +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not_like_all(STRING, repeated(2) STRING) -> BOOL) + | +-Literal(type=STRING, value="a") + | +-Literal(type=STRING, value="a") + | +-Literal(type=STRING, value="b") + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2) + | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | 
+-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=true) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=false) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=true) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_and(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := Literal(type=STRING, value="a") + | | +-patterns#3 := + | | 
+-FunctionCall(ZetaSQL:$make_array(repeated(2) STRING) -> ARRAY) + | | +-Literal(type=STRING, value="a") + | | +-Literal(type=STRING, value="b") + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan + +== + +[language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +# Test NOT LIKE ANY|SOME|ALL with binary arguments +select b'a' NOT LIKE {{ANY|SOME|ALL}} (b'a', b'b'); +-- +ALTERNATION GROUPS: + ANY + SOME +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not_like_any(BYTES, repeated(2) BYTES) -> BOOL) + | +-Literal(type=BYTES, value=b"a") + | +-Literal(type=BYTES, value=b"a") + | +-Literal(type=BYTES, value=b"b") + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=BYTES, column=$subquery1.input#2) + | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | 
+-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=false) + | | | +-FunctionCall(ZetaSQL:$is_null(BYTES) -> BOOL) + | | | | +-ColumnRef(type=BYTES, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=true) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=false) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(BYTES, BYTES) -> BOOL) + | | | +-ColumnRef(type=BYTES, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-ColumnRef(type=BYTES, column=$array.pattern#4) + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(BYTES) -> BOOL) + | | +-ColumnRef(type=BYTES, column=$array.pattern#4) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := Literal(type=BYTES, value=b"a") + | | +-patterns#3 := + | | +-FunctionCall(ZetaSQL:$make_array(repeated(2) BYTES) -> ARRAY) + | | +-Literal(type=BYTES, value=b"a") + | | +-Literal(type=BYTES, value=b"b") + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan + +-- +ALTERNATION GROUP: ALL +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + 
+-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not_like_all(BYTES, repeated(2) BYTES) -> BOOL) + | +-Literal(type=BYTES, value=b"a") + | +-Literal(type=BYTES, value=b"a") + | +-Literal(type=BYTES, value=b"b") + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=BYTES, column=$subquery1.input#2) + | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=true) + | | | +-FunctionCall(ZetaSQL:$is_null(BYTES) -> BOOL) + | | | | +-ColumnRef(type=BYTES, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=false) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | 
+-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=true) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_and(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(BYTES, BYTES) -> BOOL) + | | | +-ColumnRef(type=BYTES, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-ColumnRef(type=BYTES, column=$array.pattern#4) + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(BYTES) -> BOOL) + | | +-ColumnRef(type=BYTES, column=$array.pattern#4) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := Literal(type=BYTES, value=b"a") + | | +-patterns#3 := + | | +-FunctionCall(ZetaSQL:$make_array(repeated(2) BYTES) -> ARRAY) + | | +-Literal(type=BYTES, value=b"a") + | | +-Literal(type=BYTES, value=b"b") + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan + +== + +[language_features=V_1_3_LIKE_ANY_SOME_ALL] +# Test NOT (LIKE ANY|SOME|ALL) to verify it is differently interpreted than (NOT LIKE ANY|SOME|ALL) +select NOT ('a' LIKE {{ANY|SOME|ALL}} ('b')); +-- +ALTERNATION GROUPS: + ANY + SOME +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$like_any(STRING, repeated(1) STRING) -> BOOL) + | +-Literal(type=STRING, value="a") + | +-Literal(type=STRING, value="b") + +-input_scan= + +-SingleRowScan + 
+[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2) + | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=false) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=true) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=false) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, 
column=$subquery1.patterns#3, is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := Literal(type=STRING, value="a") + | | +-patterns#3 := + | | +-FunctionCall(ZetaSQL:$make_array(repeated(1) STRING) -> ARRAY) + | | +-Literal(type=STRING, value="b") + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +-- +ALTERNATION GROUP: ALL +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$like_all(STRING, repeated(1) STRING) -> BOOL) + | +-Literal(type=STRING, value="a") + | +-Literal(type=STRING, value="b") + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2) + | | | +-ColumnRef(type=ARRAY, 
column=$subquery1.patterns#3) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=true) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=false) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=true) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_and(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | 
+-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := Literal(type=STRING, value="a") + | | +-patterns#3 := + | | +-FunctionCall(ZetaSQL:$make_array(repeated(1) STRING) -> ARRAY) + | | +-Literal(type=STRING, value="b") + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan + +== + +[language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +# Test NOT LIKE ANY|SOME|ALL with array of string arguments +select 'a' NOT LIKE {{ANY|SOME|ALL}} UNNEST(['a','b','c']); +-- +ALTERNATION GROUPS: + ANY + SOME +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not_like_any_array(STRING, ARRAY) -> BOOL) + | +-Literal(type=STRING, value="a") + | +-Literal(type=ARRAY, value=["a", "b", "c"]) + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2) + | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | 
+-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=false) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=true) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=false) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := Literal(type=STRING, value="a") + | | +-patterns#3 := Literal(type=ARRAY, value=["a", "b", "c"]) + | +-input_scan= + | +-SingleRowScan + 
+-input_scan= + +-SingleRowScan + +-- +ALTERNATION GROUP: ALL +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not_like_all_array(STRING, ARRAY) -> BOOL) + | +-Literal(type=STRING, value="a") + | +-Literal(type=ARRAY, value=["a", "b", "c"]) + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2) + | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=true) + | | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | | +-ColumnRef(type=BOOL, 
column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=false) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=true) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_and(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(STRING, STRING) -> BOOL) + | | | +-ColumnRef(type=STRING, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(STRING) -> BOOL) + | | +-ColumnRef(type=STRING, column=$array.pattern#4) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := Literal(type=STRING, value="a") + | | +-patterns#3 := Literal(type=ARRAY, value=["a", "b", "c"]) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan + +== + +[language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +# Test NOT LIKE ANY|SOME|ALL with array of binary arguments +select b'a' NOT LIKE {{ANY|SOME|ALL}} UNNEST([b'a',b'b',b'c']); +-- +ALTERNATION GROUPS: + ANY + SOME + +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not_like_any_array(BYTES, ARRAY) -> BOOL) + | +-Literal(type=BYTES, value=b"a") + | +-Literal(type=ARRAY, value=[b"a", 
b"b", b"c"]) + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=BYTES, column=$subquery1.input#2) + | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | +-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=false) + | | | +-FunctionCall(ZetaSQL:$is_null(BYTES) -> BOOL) + | | | | +-ColumnRef(type=BYTES, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=true) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=false) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, 
column=$subquery1.patterns#3, is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(BYTES, BYTES) -> BOOL) + | | | +-ColumnRef(type=BYTES, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-ColumnRef(type=BYTES, column=$array.pattern#4) + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(BYTES) -> BOOL) + | | +-ColumnRef(type=BYTES, column=$array.pattern#4) + | +-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := Literal(type=BYTES, value=b"a") + | | +-patterns#3 := Literal(type=ARRAY, value=[b"a", b"b", b"c"]) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan + +-- +ALTERNATION GROUP: ALL +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$not_like_all_array(BYTES, ARRAY) -> BOOL) + | +-Literal(type=BYTES, value=b"a") + | +-Literal(type=ARRAY, value=[b"a", b"b", b"c"]) + +-input_scan= + +-SingleRowScan + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#8] + | +-expr_list= + | | +-$col1#8 := + | | +-SubqueryExpr + | | +-type=BOOL + | | +-subquery_type=SCALAR + | | +-parameter_list= + | | | +-ColumnRef(type=BYTES, column=$subquery1.input#2) + | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#7] + | | 
+-expr_list= + | | | +-$col1#7 := + | | | +-FunctionCall(ZetaSQL:$case_no_value(repeated(4) BOOL, repeated(4) BOOL, BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$or(BOOL, repeated(1) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | | +-Literal(type=INT64, value=0) + | | | +-Literal(type=BOOL, value=true) + | | | +-FunctionCall(ZetaSQL:$is_null(BYTES) -> BOOL) + | | | | +-ColumnRef(type=BYTES, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg1#5) + | | | +-Literal(type=BOOL, value=false) + | | | +-ColumnRef(type=BOOL, column=$aggregate.$agg2#6) + | | | +-Literal(type=BOOL, value=NULL) + | | | +-Literal(type=BOOL, value=true) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#5, $agg2#6] + | | +-input_scan= + | | | +-ArrayScan + | | | +-column_list=[$array.pattern#4] + | | | +-array_expr_list= + | | | | +-ColumnRef(type=ARRAY, column=$subquery1.patterns#3, is_correlated=TRUE) + | | | +-element_column_list=[$array.pattern#4] + | | +-aggregate_list= + | | +-$agg1#5 := + | | | +-AggregateFunctionCall(ZetaSQL:logical_and(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$not(BOOL) -> BOOL) + | | | +-FunctionCall(ZetaSQL:$like(BYTES, BYTES) -> BOOL) + | | | +-ColumnRef(type=BYTES, column=$subquery1.input#2, is_correlated=TRUE) + | | | +-ColumnRef(type=BYTES, column=$array.pattern#4) + | | +-$agg2#6 := + | | +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + | | +-FunctionCall(ZetaSQL:$is_null(BYTES) -> BOOL) + | | +-ColumnRef(type=BYTES, column=$array.pattern#4) + | 
+-input_scan= + | +-ProjectScan + | +-column_list=$subquery1.[input#2, patterns#3] + | +-expr_list= + | | +-input#2 := Literal(type=BYTES, value=b"a") + | | +-patterns#3 := Literal(type=ARRAY, value=[b"a", b"b", b"c"]) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan + + +== + +[language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +# Test NOT LIKE ANY|SOME|ALL with type mismatch on lhs and rhs +select b'a' NOT LIKE {{ANY|SOME|ALL}} ('a', 'x', 'y') +-- +ALTERNATION GROUP: ANY +-- +ERROR: No matching signature for operator LIKE ANY|SOME for argument types literal BYTES and {STRING}. STRING and BYTES are different types that are not directly comparable. To write a BYTES literal, use a b-prefixed literal such as b'bytes value' [at 2:17] +select b'a' NOT LIKE ANY ('a', 'x', 'y') + ^ +-- +ALTERNATION GROUP: SOME +-- +ERROR: No matching signature for operator LIKE ANY|SOME for argument types literal BYTES and {STRING}. STRING and BYTES are different types that are not directly comparable. To write a BYTES literal, use a b-prefixed literal such as b'bytes value' [at 2:17] +select b'a' NOT LIKE SOME ('a', 'x', 'y') + ^ +-- +ALTERNATION GROUP: ALL +-- +ERROR: No matching signature for operator NOT LIKE ALL for argument types literal BYTES and {STRING}. STRING and BYTES are different types that are not directly comparable. To write a BYTES literal, use a b-prefixed literal such as b'bytes value' [at 2:17] +select b'a' NOT LIKE ALL ('a', 'x', 'y') + ^ + +== + +[language_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +# Test NOT LIKE ANY|SOME|ALL with type mismatch in rhs list +select 'a' NOT LIKE {{ANY|SOME|ALL}} ('a', b'x', 'y') +-- ALTERNATION GROUP: ANY -- -ERROR: Collation is not allowed on argument 1 ("und:ci"). 
Use COLLATE(arg, '') to remove collation [at 3:31] -SELECT COLLATE('a', 'und:ci') LIKE ANY UNNEST(['a', 'A']) - ^ +ERROR: No matching signature for operator LIKE ANY|SOME for argument types literal STRING and {STRING, BYTES}. STRING and BYTES are different types that are not directly comparable. To write a BYTES literal, use a b-prefixed literal such as b'bytes value' [at 2:16] +select 'a' NOT LIKE ANY ('a', b'x', 'y') + ^ -- ALTERNATION GROUP: SOME -- -ERROR: Collation is not allowed on argument 1 ("und:ci"). Use COLLATE(arg, '') to remove collation [at 3:31] -SELECT COLLATE('a', 'und:ci') LIKE SOME UNNEST(['a', 'A']) - ^ +ERROR: No matching signature for operator LIKE ANY|SOME for argument types literal STRING and {STRING, BYTES}. STRING and BYTES are different types that are not directly comparable. To write a BYTES literal, use a b-prefixed literal such as b'bytes value' [at 2:16] +select 'a' NOT LIKE SOME ('a', b'x', 'y') + ^ -- ALTERNATION GROUP: ALL -- -ERROR: Collation is not allowed on argument 1 ("und:ci"). Use COLLATE(arg, '') to remove collation [at 3:31] -SELECT COLLATE('a', 'und:ci') LIKE ALL UNNEST(['a', 'A']) - ^ +ERROR: No matching signature for operator NOT LIKE ALL for argument types literal STRING and {STRING, BYTES}. STRING and BYTES are different types that are not directly comparable. 
To write a BYTES literal, use a b-prefixed literal such as b'bytes value' [at 2:16] +select 'a' NOT LIKE ALL ('a', b'x', 'y') + ^ diff --git a/zetasql/analyzer/testdata/limit.test b/zetasql/analyzer/testdata/limit.test index 66278f33c..c40c91b47 100644 --- a/zetasql/analyzer/testdata/limit.test +++ b/zetasql/analyzer/testdata/limit.test @@ -75,24 +75,87 @@ select key from KeyValue order by 1 offset 5; ^ == +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] SELECT Key, Value FROM KeyValue LIMIT -1; -- -ERROR: Syntax error: Unexpected "-" [at 3:7] +ALTERNATION GROUP: V_1_4_LIMIT_OFFSET_EXPRESSIONS +-- +QueryStmt ++-output_column_list= +| +-KeyValue.Key#1 AS Key [INT64] +| +-KeyValue.Value#2 AS Value [STRING] ++-query= + +-LimitOffsetScan + +-column_list=KeyValue.[Key#1, Value#2] + +-input_scan= + | +-ProjectScan + | +-column_list=KeyValue.[Key#1, Value#2] + | +-input_scan= + | +-TableScan(column_list=KeyValue.[Key#1, Value#2], table=KeyValue, column_index_list=[0, 1]) + +-limit= + +-Literal(type=INT64, value=-1) +-- +ALTERNATION GROUP: +-- +ERROR: LIMIT expects a non-negative integer literal or parameter [at 3:7] LIMIT -1; ^ == +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] SELECT Key, Value FROM KeyValue LIMIT 0 OFFSET -1; -- -ERROR: Syntax error: Unexpected "-" [at 3:16] +ALTERNATION GROUP: V_1_4_LIMIT_OFFSET_EXPRESSIONS +-- +QueryStmt ++-output_column_list= +| +-KeyValue.Key#1 AS Key [INT64] +| +-KeyValue.Value#2 AS Value [STRING] ++-query= + +-LimitOffsetScan + +-column_list=KeyValue.[Key#1, Value#2] + +-input_scan= + | +-ProjectScan + | +-column_list=KeyValue.[Key#1, Value#2] + | +-input_scan= + | +-TableScan(column_list=KeyValue.[Key#1, Value#2], table=KeyValue, column_index_list=[0, 1]) + +-limit= + | +-Literal(type=INT64, value=0) + +-offset= + +-Literal(type=INT64, value=-1) +-- +ALTERNATION GROUP: +-- +ERROR: OFFSET expects a non-negative integer literal or parameter [at 3:16] LIMIT 0 OFFSET -1; ^ == 
+[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] +SELECT Key, Value +FROM KeyValue +LIMIT 9223372036854775808; +-- +ERROR: Could not cast literal 9223372036854775808 to type INT64 [at 3:7] +LIMIT 9223372036854775808; + ^ +== + +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] +SELECT Key, Value +FROM KeyValue +LIMIT 1 OFFSET 9223372036854775808; +-- +ERROR: Could not cast literal 9223372036854775808 to type INT64 [at 3:16] +LIMIT 1 OFFSET 9223372036854775808; + ^ +== + +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] SELECT 1 LIMIT 1 OFFSET 1; -- QueryStmt @@ -114,6 +177,7 @@ QueryStmt +-Literal(type=INT64, value=1) == +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] SELECT 1 LIMIT @test_param_int32; -- QueryStmt @@ -134,6 +198,7 @@ QueryStmt +-Parameter(type=INT32, name="test_param_int32") == +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] SELECT 1 LIMIT @test_param_int64; -- QueryStmt @@ -153,6 +218,7 @@ QueryStmt +-Parameter(type=INT64, name="test_param_int64") == +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] SELECT 1 LIMIT @test_param_uint32; -- QueryStmt @@ -173,7 +239,16 @@ QueryStmt +-Parameter(type=UINT32, name="test_param_uint32") == +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] +SELECT 1 LIMIT @test_param_uint64; +-- +ALTERNATION GROUP: V_1_4_LIMIT_OFFSET_EXPRESSIONS +-- +ERROR: LIMIT ... OFFSET ... 
expects INT64, got UINT64 [at 1:16] SELECT 1 LIMIT @test_param_uint64; + ^ +-- +ALTERNATION GROUP: -- QueryStmt +-output_column_list= @@ -193,6 +268,7 @@ QueryStmt +-Parameter(type=UINT64, name="test_param_uint64") == +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] SELECT 1 LIMIT 1 OFFSET @test_param_int32; -- QueryStmt @@ -215,6 +291,7 @@ QueryStmt +-Parameter(type=INT32, name="test_param_int32") == +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] SELECT 1 LIMIT 1 OFFSET @test_param_int64; -- QueryStmt @@ -236,6 +313,7 @@ QueryStmt +-Parameter(type=INT64, name="test_param_int64") == +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] SELECT 1 LIMIT 1 OFFSET @test_param_uint32; -- QueryStmt @@ -258,7 +336,16 @@ QueryStmt +-Parameter(type=UINT32, name="test_param_uint32") == +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] +SELECT 1 LIMIT 1 OFFSET @test_param_uint64; +-- +ALTERNATION GROUP: V_1_4_LIMIT_OFFSET_EXPRESSIONS +-- +ERROR: LIMIT ... OFFSET ... expects INT64, got UINT64 [at 1:25] SELECT 1 LIMIT 1 OFFSET @test_param_uint64; + ^ +-- +ALTERNATION GROUP: -- QueryStmt +-output_column_list= @@ -330,13 +417,244 @@ select 1 limit cast(@test_param_int64 as int32) ^ == -# Casting of other expressions does not work. -select 1 limit cast(`key` as int64) -from KeyValue +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] +# Expressions may or may not be allowed as arguments to LIMIT/OFFSET depending +# on V_1_4_LIMIT_OFFSET_EXPRESSIONS. 
+select 1 limit mod(1, 10) +-- +ALTERNATION GROUP: V_1_4_LIMIT_OFFSET_EXPRESSIONS +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [INT64] ++-query= + +-LimitOffsetScan + +-column_list=[$query.$col1#1] + +-input_scan= + | +-ProjectScan + | +-column_list=[$query.$col1#1] + | +-expr_list= + | | +-$col1#1 := Literal(type=INT64, value=1) + | +-input_scan= + | +-SingleRowScan + +-limit= + +-FunctionCall(ZetaSQL:mod(INT64, INT64) -> INT64) + +-Literal(type=INT64, value=1) + +-Literal(type=INT64, value=10) +-- +ALTERNATION GROUP: +-- +ERROR: LIMIT expects an integer literal or parameter [at 3:16] +select 1 limit mod(1, 10) + ^ +== + +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] +select 1 limit 1 offset mod(1, 10) +-- +ALTERNATION GROUP: V_1_4_LIMIT_OFFSET_EXPRESSIONS +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [INT64] ++-query= + +-LimitOffsetScan + +-column_list=[$query.$col1#1] + +-input_scan= + | +-ProjectScan + | +-column_list=[$query.$col1#1] + | +-expr_list= + | | +-$col1#1 := Literal(type=INT64, value=1) + | +-input_scan= + | +-SingleRowScan + +-limit= + | +-Literal(type=INT64, value=1) + +-offset= + +-FunctionCall(ZetaSQL:mod(INT64, INT64) -> INT64) + +-Literal(type=INT64, value=1) + +-Literal(type=INT64, value=10) +-- +ALTERNATION GROUP: +-- +ERROR: OFFSET expects an integer literal or parameter [at 1:25] +select 1 limit 1 offset mod(1, 10) + ^ +== + +# Null arguments to LIMIT & OFFSET are not allowed. 
+[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] +select 1 limit null +-- +ALTERNATION GROUP: V_1_4_LIMIT_OFFSET_EXPRESSIONS +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [INT64] ++-query= + +-LimitOffsetScan + +-column_list=[$query.$col1#1] + +-input_scan= + | +-ProjectScan + | +-column_list=[$query.$col1#1] + | +-expr_list= + | | +-$col1#1 := Literal(type=INT64, value=1) + | +-input_scan= + | +-SingleRowScan + +-limit= + +-Literal(type=INT64, value=NULL) +-- +ALTERNATION GROUP: +-- +ERROR: LIMIT must not be null [at 1:16] +select 1 limit null + ^ +== + +# LIMIT & OFFSET can take expressions that reference query parameters. 
+[language_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +select 1 limit mod(@test_param_int64, 10) +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [INT64] ++-query= + +-LimitOffsetScan + +-column_list=[$query.$col1#1] + +-input_scan= + | +-ProjectScan + | +-column_list=[$query.$col1#1] + | +-expr_list= + | | +-$col1#1 := Literal(type=INT64, value=1) + | +-input_scan= + | +-SingleRowScan + +-limit= + +-FunctionCall(ZetaSQL:mod(INT64, INT64) -> INT64) + +-Parameter(type=INT64, name="test_param_int64") + +-Literal(type=INT64, value=10) +== + +[language_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +select 1 limit 1 offset mod(@test_param_int64, 10) +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [INT64] ++-query= + +-LimitOffsetScan + +-column_list=[$query.$col1#1] + +-input_scan= + | +-ProjectScan + | +-column_list=[$query.$col1#1] + | +-expr_list= + | | +-$col1#1 := Literal(type=INT64, value=1) + | +-input_scan= + | +-SingleRowScan + +-limit= + | +-Literal(type=INT64, value=1) + +-offset= + +-FunctionCall(ZetaSQL:mod(INT64, INT64) -> INT64) + +-Parameter(type=INT64, name="test_param_int64") + +-Literal(type=INT64, value=10) +== + +# LIMIT & OFFSET cannot use the current name scope. +[language_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +select Key, Value from KeyValue limit max(Key) +-- +ERROR: Unrecognized name: Key [at 1:43] +select Key, Value from KeyValue limit max(Key) + ^ +== + +[language_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +select Key, Value from KeyValue limit 1 offset max(Key) +-- +ERROR: Unrecognized name: Key [at 1:52] +select Key, Value from KeyValue limit 1 offset max(Key) + ^ +== + +# LIMIT & OFFSET can reference the parent name scope, but will still fail +# because a correlated reference is not constant. 
+[language_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +WITH sub AS (SELECT 1 AS lim) +SELECT lim FROM sub WHERE lim = (SELECT 1 LIMIT sub.lim) +-- +ERROR: LIMIT expression must be constant [at 2:49] +SELECT lim FROM sub WHERE lim = (SELECT 1 LIMIT sub.lim) + ^ +== + +[language_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +WITH sub AS (SELECT 0 AS lim) +SELECT lim FROM sub WHERE lim = (SELECT 0 LIMIT 1 OFFSET sub.lim) +-- +ERROR: OFFSET expression must be constant [at 2:58] +SELECT lim FROM sub WHERE lim = (SELECT 0 LIMIT 1 OFFSET sub.lim) + ^ +== + +# LIMIT & OFFSET cannot take correlated expression subqueries. +[language_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +select Key, Value from KeyValue limit (select max(Key)) +-- +ERROR: Unrecognized name: Key [at 1:51] +select Key, Value from KeyValue limit (select max(Key)) + ^ +== + +[language_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +select Key, Value from KeyValue limit 1 offset (select max(Key)) +-- +ERROR: Unrecognized name: Key [at 1:60] +select Key, Value from KeyValue limit 1 offset (select max(Key)) + ^ +== + +# LIMIT & OFFSET expressions must be INT64. +[language_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +select 1 limit cast(mod(1, 10) as uint64) +-- +ERROR: LIMIT ... OFFSET ... expects INT64, got UINT64 [at 1:16] +select 1 limit cast(mod(1, 10) as uint64) + ^ +== + +[language_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +select 1 limit 1 offset cast(mod(1, 10) as uint64) -- -ERROR: Syntax error: Expected "@" or "@@" or integer literal but got identifier `key` [at 1:21] -select 1 limit cast(`key` as int64) - ^ +ERROR: LIMIT ... OFFSET ... 
expects INT64, got UINT64 [at 1:25] +select 1 limit 1 offset cast(mod(1, 10) as uint64) + ^ == SELECT * @@ -498,15 +816,20 @@ select key from KeyValue order by 1 offset 5; ^ == +[language_features={{V_1_4_LIMIT_OFFSET_EXPRESSIONS|}}] select key FROM KeyValue ORDER BY 1 {{LIMIT|LIMIT 1 OFFSET}} 18446744073709551615; -- -ALTERNATION GROUP: LIMIT +ALTERNATION GROUPS: + V_1_4_LIMIT_OFFSET_EXPRESSIONS,LIMIT + LIMIT -- ERROR: Could not cast literal 18446744073709551615 to type INT64 [at 1:43] select key FROM KeyValue ORDER BY 1 LIMIT 18446744073709551615; ^ -- -ALTERNATION GROUP: LIMIT 1 OFFSET +ALTERNATION GROUPS: + V_1_4_LIMIT_OFFSET_EXPRESSIONS,LIMIT 1 OFFSET + LIMIT 1 OFFSET -- ERROR: Could not cast literal 18446744073709551615 to type INT64 [at 1:52] select key FROM KeyValue ORDER BY 1 LIMIT 1 OFFSET 18446744073709551615; diff --git a/zetasql/analyzer/testdata/literals.test b/zetasql/analyzer/testdata/literals.test index 8e1f0d6ad..9bd39caaf 100644 --- a/zetasql/analyzer/testdata/literals.test +++ b/zetasql/analyzer/testdata/literals.test @@ -2086,3 +2086,60 @@ QueryStmt | +-$col1#1 := Literal(type=INTERVAL, value=1-2 3 4:5:6.789) +-input_scan= +-SingleRowScan +== + +[show_unparsed] +[language_features={{|V_1_4_LITERAL_CONCATENATION}}] +select 'x' "y" +-- + +ALTERNATION GROUP: +-- +ERROR: Concatenation of subsequent string literals is not supported. Did you mean to use the || operator? [at 1:12] +select 'x' "y" + ^ +-- +ALTERNATION GROUP: V_1_4_LITERAL_CONCATENATION +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRING] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRING, value="xy") + +-input_scan= + +-SingleRowScan + +[UNPARSED_SQL] +SELECT + "xy" AS a_1; +== + +[show_unparsed] +[language_features={{|V_1_4_LITERAL_CONCATENATION}}] +select b'x' b"y" +-- +ALTERNATION GROUP: +-- +ERROR: Concatenation of subsequent bytes literals is not supported. Did you mean to use the || operator? 
[at 1:13] +select b'x' b"y" + ^ +-- +ALTERNATION GROUP: V_1_4_LITERAL_CONCATENATION +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [BYTES] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=BYTES, value=b"xy") + +-input_scan= + +-SingleRowScan + +[UNPARSED_SQL] +SELECT + b"xy" AS a_1; diff --git a/zetasql/analyzer/testdata/map_functions.test b/zetasql/analyzer/testdata/map_functions.test new file mode 100644 index 000000000..8fca29ea7 --- /dev/null +++ b/zetasql/analyzer/testdata/map_functions.test @@ -0,0 +1,177 @@ +# Without language feature V_1_4_MAP_TYPE, map functions are not defined. +SELECT MAP_FROM_ARRAY([("a", 1), ("b", 2)]); +-- +ERROR: Function not found: MAP_FROM_ARRAY [at 1:8] +SELECT MAP_FROM_ARRAY([("a", 1), ("b", 2)]); + ^ +== + +[default language_features=V_1_4_MAP_TYPE] + +SELECT MAP_FROM_ARRAY([("a", 1), ("b", 2)]); +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [MAP] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:map_from_array(ARRAY>) -> MAP) + | +-Literal(type=ARRAY>, value=[{"a", 1}, {"b", 2}]) + +-input_scan= + +-SingleRowScan +== + +SELECT MAP_FROM_ARRAY([STRUCT(1 AS foo, 2 AS bar), STRUCT(2, 4)]); +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [MAP] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:map_from_array(ARRAY>) -> MAP) + | +-Literal(type=ARRAY>, value=[{foo:1, bar:2}, {foo:2, bar:4}]) + +-input_scan= + +-SingleRowScan +== + +# Struct field names hold no special meaning; only field order matters. 
+SELECT MAP_FROM_ARRAY([STRUCT(1 AS value, "a" AS key), STRUCT(2, "b")]); +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [MAP] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:map_from_array(ARRAY>) -> MAP) + | +-Literal(type=ARRAY>, value=[{value:1, key:"a"}, {value:2, key:"b"}]) + +-input_scan= + +-SingleRowScan +== + +SELECT MAP_FROM_ARRAY(CAST(NULL AS ARRAY>)); +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [MAP] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:map_from_array(ARRAY>) -> MAP) + | +-Literal(type=ARRAY>, value=NULL, has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +SELECT MAP_FROM_ARRAY(CAST([] AS ARRAY>)); +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [MAP] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:map_from_array(ARRAY>) -> MAP) + | +-Literal(type=ARRAY>, value=[], has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +SELECT MAP_FROM_ARRAY({{|"a"| [("a", "b")], "extra_arg"}}) +-- +ALTERNATION GROUP: +-- +ERROR: No matching signature for function MAP_FROM_ARRAY. Supported signature: MAP_FROM_ARRAY(ARRAY>) [at 1:8] +SELECT MAP_FROM_ARRAY() + ^ +-- +ALTERNATION GROUP: "a" +-- +ERROR: No matching signature for function MAP_FROM_ARRAY for argument types: STRING. Supported signature: MAP_FROM_ARRAY(ARRAY) [at 1:8] +SELECT MAP_FROM_ARRAY("a") + ^ +-- +ALTERNATION GROUP: [("a", "b")], "extra_arg" +-- +ERROR: No matching signature for function MAP_FROM_ARRAY. Supported signature: MAP_FROM_ARRAY(ARRAY>) [at 1:8] +SELECT MAP_FROM_ARRAY( [("a", "b")], "extra_arg") + ^ +== + +SELECT MAP_FROM_ARRAY(NULL); +-- +ERROR: MAP_FROM_ARRAY result type cannot be determined from argument NULL. 
Consider casting the argument to ARRAY> so that key type T1 and value type T2 can be determined from the argument [at 1:8] +SELECT MAP_FROM_ARRAY(NULL); + ^ +== + +SELECT MAP_FROM_ARRAY([]); +-- +ERROR: MAP_FROM_ARRAY result type cannot be determined from argument []. Consider casting the argument to ARRAY> so that key type T1 and value type T2 can be determined from the argument [at 1:8] +SELECT MAP_FROM_ARRAY([]); + ^ +== + +SELECT MAP_FROM_ARRAY([1,2,3,4]); +-- +ERROR: MAP_FROM_ARRAY input argument must be an array of structs, but got type ARRAY [at 1:8] +SELECT MAP_FROM_ARRAY([1,2,3,4]); + ^ +== + +SELECT MAP_FROM_ARRAY([(true, true, "oops")]); +-- +ERROR: MAP_FROM_ARRAY input array must be of type ARRAY>, but found a struct member with 3 fields [at 1:8] +SELECT MAP_FROM_ARRAY([(true, true, "oops")]); + ^ +== + +# Error for non-groupable key +SELECT MAP_FROM_ARRAY([(new zetasql_test__.EmptyMessage(), true)]); +-- +ERROR: MAP_FROM_ARRAY expected a groupable key, but got a key of type `zetasql_test__.EmptyMessage`, which does not support grouping [at 1:8] +SELECT MAP_FROM_ARRAY([(new zetasql_test__.EmptyMessage(), true)]); + ^ +== + +# No error for non-groupable value +SELECT MAP_FROM_ARRAY([(true, new zetasql_test__.EmptyMessage())]); +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [MAP>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:map_from_array(ARRAY>>) -> MAP>) + | +-FunctionCall(ZetaSQL:$make_array(repeated(1) STRUCT>) -> ARRAY>>) + | +-MakeStruct + | +-type=STRUCT> + | +-field_list= + | +-Literal(type=BOOL, value=true) + | +-MakeProto(type=PROTO) + +-input_scan= + +-SingleRowScan +== + +[language_features=V_1_4_MAP_TYPE,V_1_3_COLLATION_SUPPORT,V_1_3_ANNOTATION_FRAMEWORK] +SELECT MAP_FROM_ARRAY([('a', true), (COLLATE('Z', 'und:ci'), false)]); +-- +ERROR: Collation is not allowed on argument 1 ([<"und:ci",_>]) [at 1:8] +SELECT MAP_FROM_ARRAY([('a', true), 
(COLLATE('Z', 'und:ci'), false)]); + ^ +== diff --git a/zetasql/analyzer/testdata/pivot.test b/zetasql/analyzer/testdata/pivot.test index 3a4de8ba4..b162b2a6b 100644 --- a/zetasql/analyzer/testdata/pivot.test +++ b/zetasql/analyzer/testdata/pivot.test @@ -1,16 +1,9 @@ -# ERROR: Pivot on value table via UNNEST [default language_features=V_1_3_PIVOT,NUMERIC_TYPE,BIGNUMERIC_TYPE,V_1_2_CIVIL_TIME,V_1_2_GROUP_BY_ARRAY,V_1_2_GROUP_BY_STRUCT,TABLESAMPLE,V_1_3_IS_DISTINCT,V_1_1_NULL_HANDLING_MODIFIER_IN_AGGREGATE,V_1_1_ORDER_BY_IN_AGGREGATE,V_1_1_LIMIT_IN_AGGREGATE,V_1_1_HAVING_IN_AGGREGATE] -SELECT * FROM UNNEST([1,2]) AS Value PIVOT(COUNT(Value) FOR Key IN (0, 1)); --- -ERROR: PIVOT is not allowed with array scans [at 1:38] -SELECT * FROM UNNEST([1,2]) AS Value PIVOT(COUNT(Value) FOR Key IN (0, 1)); - ^ -== # ERROR: Pivot on value table via UNNEST SELECT * FROM UNNEST([1,2]) AS Value PIVOT(COUNT(Value) FOR Key IN (0, 1)); -- -ERROR: PIVOT is not allowed with array scans [at 1:38] +ERROR: PIVOT is not allowed with array scans [at 2:38] SELECT * FROM UNNEST([1,2]) AS Value PIVOT(COUNT(Value) FOR Key IN (0, 1)); ^ == @@ -5001,9 +4994,10 @@ QueryStmt | +-PivotColumn(column=$pivot.sum_key_z#11, pivot_expr_index=0, pivot_value_index=0) | +-PivotColumn(column=$pivot.sum_key_o#12, pivot_expr_index=0, pivot_value_index=1) +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$pivot.sum_key_z#5) - +-ColumnRef(type=INT64, column=$pivot.sum_key_z#11) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=$pivot.sum_key_z#5) + | +-ColumnRef(type=INT64, column=$pivot.sum_key_z#11) + +-has_using=TRUE == # DISTINCT modifier in aggregate function used in PIVOT. 
diff --git a/zetasql/analyzer/testdata/proto_braced_constructors.test b/zetasql/analyzer/testdata/proto_braced_constructors.test new file mode 100644 index 000000000..a4e8b3625 --- /dev/null +++ b/zetasql/analyzer/testdata/proto_braced_constructors.test @@ -0,0 +1,1756 @@ +# Tests for resolving proto braced constructors. +# + +[default language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,V_1_1_CAST_DIFFERENT_ARRAY_TYPES,V_1_2_PROTO_EXTENSIONS_WITH_NEW,V_1_1_WITH_ON_SUBQUERY,V_1_3_WITH_RECURSIVE] + +[language_features=] +SELECT NEW zetasql_test__.TestExtraPB {} +-- +ERROR: Braced constructors are not supported [at 1:39] +SELECT NEW zetasql_test__.TestExtraPB {} + ^ +== + +[language_features=] +SELECT {} +-- +ERROR: Braced constructors are not supported [at 1:8] +SELECT {} + ^ +== + +SELECT NEW abc {} +-- +ERROR: Type not found: abc [at 1:12] +SELECT NEW abc {} + ^ +== + +SELECT NEW INT64{} +-- +ERROR: Braced NEW constructors are not allowed for type INT64 [at 1:12] +SELECT NEW INT64{} + ^ +== + +# The unparsed SQL uses the old syntax. +[show_unparsed] +SELECT NEW zetasql_test__.TestExtraPB {} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := MakeProto(type=PROTO) + +-input_scan= + +-SingleRowScan + +[UNPARSED_SQL] +SELECT + NEW `zetasql_test__.TestExtraPB`() AS a_1; +== + +# Error using braced constructors without an inferred type. +SELECT { abc: 1 } +-- +ERROR: Unable to infer a type for braced constructor [at 1:8] +SELECT { abc: 1 } + ^ +== + +# A simple field + string array example. 
+SELECT NEW zetasql_test__.TestExtraPB {int32_val2: 5 + str_value: ["abc", "def"]} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int32_val2 := Literal(type=INT32, value=5) + | +-str_value := Literal(type=ARRAY, value=["abc", "def"]) + +-input_scan= + +-SingleRowScan +== + +# An integer array example. +SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + repeated_int32_val: [1, 2] +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + | +-repeated_int32_val := Literal(type=ARRAY, value=[1, 2]) + +-input_scan= + +-SingleRowScan +== + +# Nested sub-message example. +SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_value { + nested_int64: 10 + } +} +-- + +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + | +-nested_value := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-nested_int64 := Literal(type=INT64, value=10) + +-input_scan= + +-SingleRowScan +== + +# Nested repeated sub-message example. 
+SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_repeated_value: [{ + nested_int64: 10 + nested_repeated_int64: [100, 200] + }, { + nested_int64: 20 + nested_repeated_int64: [300, 400] + }] +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + | +-nested_repeated_value := + | +-FunctionCall(ZetaSQL:$make_array(repeated(2) PROTO) -> ARRAY>) + | +-MakeProto + | | +-type=PROTO + | | +-field_list= + | | +-nested_int64 := Literal(type=INT64, value=10) + | | +-nested_repeated_int64 := Literal(type=ARRAY, value=[100, 200]) + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-nested_int64 := Literal(type=INT64, value=20) + | +-nested_repeated_int64 := Literal(type=ARRAY, value=[300, 400]) + +-input_scan= + +-SingleRowScan +== + +# Map fields example. +SELECT NEW zetasql_test__.MessageWithMapField { + string_int32_map: [{ + key: "foo" + value: 10 + }, { + key: "bar" + value: 20 + }] +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-string_int32_map := + | +-FunctionCall(ZetaSQL:$make_array(repeated(2) PROTO) -> ARRAY>) + | +-MakeProto + | | +-type=PROTO + | | +-field_list= + | | +-key := Literal(type=STRING, value="foo") + | | +-value := Literal(type=INT32, value=10) + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-key := Literal(type=STRING, value="bar") + | +-value := Literal(type=INT32, value=20) + +-input_scan= + +-SingleRowScan +== + +# Extensions test. 
+SELECT NEW zetasql_test__.TestExtraPB { + (zetasql_test__.TestExtraPBExtensionHolder.test_extra_proto_extension) { + ext_value: [1] + } +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-[zetasql_test__.TestExtraPBExtensionHolder.test_extra_proto_extension] := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-ext_value := Literal(type=ARRAY, value=[1]) + +-input_scan= + +-SingleRowScan +== + +# Another example with extensions, this time with a regular field too. +SELECT NEW zetasql_test__.TestExtraPB { + int32_val1: 5, + (zetasql_test__.TestExtraPBExtensionHolder.test_extra_proto_extension) { + ext_value: [1] + } +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int32_val1 := Literal(type=INT32, value=5) + | +-[zetasql_test__.TestExtraPBExtensionHolder.test_extra_proto_extension] := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-ext_value := Literal(type=ARRAY, value=[1]) + +-input_scan= + +-SingleRowScan +== + +# Another example with extensions, this time with two other regular fields. 
+SELECT NEW zetasql_test__.TestExtraPB { + str_value: ["foo"] + int32_val1: 5, + (zetasql_test__.TestExtraPBExtensionHolder.test_extra_proto_extension) { + ext_value: [1] + } +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-str_value := Literal(type=ARRAY, value=["foo"]) + | +-int32_val1 := Literal(type=INT32, value=5) + | +-[zetasql_test__.TestExtraPBExtensionHolder.test_extra_proto_extension] := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-ext_value := Literal(type=ARRAY, value=[1]) + +-input_scan= + +-SingleRowScan +== + +# Missing extension field name. This error is detected by shared code that is +# more thoroughly tested by the examples in proto_extensions.test. Here we are +# just testing that the code for NEW handles errors from that shared code +# correctly. +SELECT NEW zetasql_test__.TestExtraPB { + (zetasql_test__.TestExtraPBExtensionHolder) { + ext_value: 1 + } +} +-- +ERROR: Expected extension name of the form (MessageName.extension_field_name), but zetasql_test__.TestExtraPBExtensionHolder is a full message name. Add the extension field name. [at 2:6] + (zetasql_test__.TestExtraPBExtensionHolder) { + ^ +== + +# Example using expression for the leaf values. +SELECT NEW zetasql_test__.TestExtraPB {int32_val1: coalesce(4) + int32_val2: cast(4 as uint64)} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int32_val1 := + | | +-Cast(INT64 -> INT32) + | | +-FunctionCall(ZetaSQL:coalesce(repeated(1) INT64) -> INT64) + | | +-Literal(type=INT64, value=4) + | +-int32_val2 := Literal(type=INT32, value=4, has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +# Example using a sub-query. 
+SELECT NEW zetasql_test__.TestExtraPB { + int32_val1: (SELECT key FROM TestTable) +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#4 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#4] + +-expr_list= + | +-$col1#4 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int32_val1 := + | +-SubqueryExpr + | +-type=INT32 + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[TestTable.key#1] + | +-input_scan= + | +-TableScan(column_list=[TestTable.key#1], table=TestTable, column_index_list=[0]) + +-input_scan= + +-SingleRowScan +== + +# Non-scalar subquery. +SELECT NEW zetasql_test__.TestExtraPB { + int32_val1: (SELECT TestTable.* FROM TestTable) +} +-- +ERROR: Scalar subquery cannot have more than one column unless using SELECT AS STRUCT to build STRUCT values [at 2:15] + int32_val1: (SELECT TestTable.* FROM TestTable) + ^ +== + +# Invalid field test. +SELECT NEW zetasql_test__.KitchenSinkPB {xxxxx: 5} +-- +ERROR: Field 1 has name xxxxx which is not a field in proto zetasql_test__.KitchenSinkPB [at 1:42] +SELECT NEW zetasql_test__.KitchenSinkPB {xxxxx: 5} + ^ +== + +# Filling proto fields from an external query. 
+SELECT NEW zetasql_test__.TestExtraPB {int32_val1: t.int32_val1 + int32_val2: t.int32_val2} +from (SELECT 1 int32_val1, 2 int32_val2) t +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#3 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#3] + +-expr_list= + | +-$col1#3 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int32_val1 := + | | +-Cast(INT64 -> INT32) + | | +-ColumnRef(type=INT64, column=t.int32_val1#1) + | +-int32_val2 := + | +-Cast(INT64 -> INT32) + | +-ColumnRef(type=INT64, column=t.int32_val2#2) + +-input_scan= + +-ProjectScan + +-column_list=t.[int32_val1#1, int32_val2#2] + +-expr_list= + | +-int32_val1#1 := Literal(type=INT64, value=1) + | +-int32_val2#2 := Literal(type=INT64, value=2) + +-input_scan= + +-SingleRowScan +== + +# Mixing with aggregation. +# ANY_VALUE is necessary here because we don't detect it is the same +# expression as shows up in GROUP BY. +SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: ANY_VALUE(KitchenSink.int64_key_1) + int64_key_2: ANY_VALUE(KitchenSink.int64_key_2) + int64_val: count(*) + uint64_val: sum(length(KitchenSink.string_val)) + repeated_string_val: array_agg(KitchenSink.string_val)} +from TestTable +group by KitchenSink.int64_key_1, KitchenSink.int64_key_2 +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#11 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#11] + +-expr_list= + | +-$col1#11 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := ColumnRef(type=INT64, column=$aggregate.$agg1#4) + | +-int64_key_2 := ColumnRef(type=INT64, column=$aggregate.$agg2#5) + | +-int64_val := ColumnRef(type=INT64, column=$aggregate.$agg3#6) + | +-uint64_val := + | | +-Cast(INT64 -> UINT64) + | | +-ColumnRef(type=INT64, column=$aggregate.$agg4#7) + | +-repeated_string_val := ColumnRef(type=ARRAY, column=$aggregate.$agg5#8) + +-input_scan= + +-AggregateScan + +-column_list=$aggregate.[$agg1#4, $agg2#5, $agg3#6, $agg4#7, $agg5#8] 
+ +-input_scan= + | +-TableScan(column_list=[TestTable.KitchenSink#3], table=TestTable, column_index_list=[2]) + +-group_by_list= + | +-int64_key_1#9 := + | | +-GetProtoField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-field_descriptor=int64_key_1 + | +-int64_key_2#10 := + | +-GetProtoField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=int64_key_2 + +-aggregate_list= + +-$agg1#4 := + | +-AggregateFunctionCall(ZetaSQL:any_value(INT64) -> INT64) + | +-GetProtoField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=int64_key_1 + +-$agg2#5 := + | +-AggregateFunctionCall(ZetaSQL:any_value(INT64) -> INT64) + | +-GetProtoField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=int64_key_2 + +-$agg3#6 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + +-$agg4#7 := + | +-AggregateFunctionCall(ZetaSQL:sum(INT64) -> INT64) + | +-FunctionCall(ZetaSQL:length(STRING) -> INT64) + | +-GetProtoField + | +-type=STRING + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=string_val + | +-default_value="default_name" + +-$agg5#8 := + +-AggregateFunctionCall(ZetaSQL:array_agg(STRING) -> ARRAY) + +-GetProtoField + +-type=STRING + +-expr= + | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + +-field_descriptor=string_val + +-default_value="default_name" +== + +# Mixing with other expressions. 
+SELECT 1 + NEW zetasql_test__.TestExtraPB {int32_val1: 5}.int32_val1 +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [INT64] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | +-Literal(type=INT64, value=1) + | +-Cast(INT32 -> INT64) + | +-GetProtoField + | +-type=INT32 + | +-expr= + | | +-MakeProto + | | +-type=PROTO + | | +-field_list= + | | +-int32_val1 := Literal(type=INT32, value=5) + | +-field_descriptor=int32_val1 + | +-default_value=0 + +-input_scan= + +-SingleRowScan +== + +# Untyped constructor. +UPDATE TestTable SET KitchenSink = { + int64_key_1: 1 + int64_key_2: 2 + test_enum: 'TESTENUM1' +} +WHERE TRUE; +-- +UpdateStmt ++-table_scan= +| +-TableScan(column_list=[TestTable.KitchenSink#3], table=TestTable, column_index_list=[2]) ++-column_access_list=WRITE ++-where_expr= +| +-Literal(type=BOOL, value=true) ++-update_item_list= + +-UpdateItem + +-target= + | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + +-set_value= + +-DMLValue + +-value= + +-MakeProto + +-type=PROTO + +-field_list= + +-int64_key_1 := Literal(type=INT64, value=1) + +-int64_key_2 := Literal(type=INT64, value=2) + +-test_enum := Literal(type=ENUM, value=TESTENUM1) +== + +# Array constructor. +SELECT ARRAY[{ + str_value: ["foo", "bar"] + }, { + str_value: ["baz"] + }] +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [ARRAY>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$make_array(repeated(2) PROTO) -> ARRAY>) + | +-MakeProto + | | +-type=PROTO + | | +-field_list= + | | +-str_value := Literal(type=ARRAY, value=["foo", "bar"]) + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-str_value := Literal(type=ARRAY, value=["baz"]) + +-input_scan= + +-SingleRowScan +== + +# Untyped array constructor. 
+UPDATE ArrayTypes SET ProtoArray = [{ + str_value: ["foo", "bar"] }, { str_value: ["baz"] }, NULL, {}] +WHERE TRUE; +-- +UpdateStmt ++-table_scan= +| +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14]) ++-column_access_list=WRITE ++-where_expr= +| +-Literal(type=BOOL, value=true) ++-update_item_list= + +-UpdateItem + +-target= + | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) + +-set_value= + +-DMLValue + +-value= + +-FunctionCall(ZetaSQL:$make_array(repeated(4) PROTO) -> ARRAY>) + +-MakeProto + | +-type=PROTO + | +-field_list= + | +-str_value := Literal(type=ARRAY, value=["foo", "bar"]) + +-MakeProto + | +-type=PROTO + | +-field_list= + | +-str_value := Literal(type=ARRAY, value=["baz"]) + +-Literal(type=PROTO, value=NULL) + +-MakeProto(type=PROTO) +== + +# Non-array inferred type for array constructor, inferred type is ignored. +UPDATE TestTable SET KitchenSink = [{}] +WHERE TRUE; +-- +ERROR: Unable to infer a type for braced constructor [at 1:37] +UPDATE TestTable SET KitchenSink = [{}] + ^ +== + +# Non-array inferred type for array constructor, inferred type is ignored. +UPDATE TestTable SET KitchenSink = [1, 2] +WHERE TRUE; +-- +ERROR: Value of type ARRAY cannot be assigned to KitchenSink, which has type zetasql_test__.KitchenSinkPB [at 1:36] +UPDATE TestTable SET KitchenSink = [1, 2] + ^ +== + +# Simple STRUCT constructor. +SELECT STRUCT(1, { int64_key_1: 1 int64_key_2: 2 }) +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeStruct + | +-type=STRUCT> + | +-field_list= + | +-Literal(type=INT64, value=1, has_explicit_type=TRUE) + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + +-input_scan= + +-SingleRowScan +== + +# Struct constructors with type specified in nested constructor. 
+UPDATE StructWithKitchenSinkTable SET s = + STRUCT>( + { int64_key_1: 1 int64_key_2: 2 }, + STRUCT( + { int64_key_1: 10 int64_key_2: 20 })) +WHERE TRUE; +-- +UpdateStmt ++-table_scan= +| +-TableScan(column_list=[StructWithKitchenSinkTable.s#2], table=StructWithKitchenSinkTable, column_index_list=[1]) ++-column_access_list=WRITE ++-where_expr= +| +-Literal(type=BOOL, value=true) ++-update_item_list= + +-UpdateItem + +-target= + | +-ColumnRef(type=STRUCT, s STRUCT>>, column=StructWithKitchenSinkTable.s#2) + +-set_value= + +-DMLValue + +-value= + +-MakeStruct + +-type=STRUCT, s STRUCT>> + +-field_list= + +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + +-MakeStruct + +-type=STRUCT> + +-field_list= + +-MakeProto + +-type=PROTO + +-field_list= + +-int64_key_1 := Literal(type=INT64, value=10) + +-int64_key_2 := Literal(type=INT64, value=20) +== + +# Struct constructors with type not specified in nested constructor, it is +# inferred from the STRUCT field definition. +UPDATE StructWithKitchenSinkTable SET s = + STRUCT>( + { int64_key_1: 1 int64_key_2: 2 }, + STRUCT({ int64_key_1: 10 int64_key_2: 20 })) +WHERE TRUE; +-- +[SAME AS PREVIOUS] +== + +# Untyped nested struct constructor. +UPDATE StructWithKitchenSinkTable SET s = ( + { int64_key_1: 1 int64_key_2: 2 }, + STRUCT({ int64_key_1: 10 int64_key_2: 20 })) +WHERE TRUE; +-- +[SAME AS PREVIOUS] +== + +# Struct of array of struct. 
+UPDATE StructWithKitchenSinkTable SET t = + STRUCT>>( + 1, + [STRUCT({ int64_key_1: 1 int64_key_2: 2 }), + STRUCT({ int64_key_1: 10 int64_key_2: 20 })] + ) +WHERE TRUE; +-- +UpdateStmt ++-table_scan= +| +-TableScan(column_list=[StructWithKitchenSinkTable.t#3], table=StructWithKitchenSinkTable, column_index_list=[2]) ++-column_access_list=WRITE ++-where_expr= +| +-Literal(type=BOOL, value=true) ++-update_item_list= + +-UpdateItem + +-target= + | +-ColumnRef(type=STRUCT>>>, column=StructWithKitchenSinkTable.t#3) + +-set_value= + +-DMLValue + +-value= + +-MakeStruct + +-type=STRUCT>>> + +-field_list= + +-Literal(type=INT64, value=1, has_explicit_type=TRUE) + +-Cast(ARRAY>> -> ARRAY>>) + +-FunctionCall(ZetaSQL:$make_array(repeated(2) STRUCT>) -> ARRAY>>) + +-MakeStruct + | +-type=STRUCT> + | +-field_list= + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + +-MakeStruct + +-type=STRUCT> + +-field_list= + +-MakeProto + +-type=PROTO + +-field_list= + +-int64_key_1 := Literal(type=INT64, value=10) + +-int64_key_2 := Literal(type=INT64, value=20) +== + +# Untyped struct of array of struct. 
+UPDATE StructWithKitchenSinkTable SET t = ( + 1, + [STRUCT({ int64_key_1: 1 int64_key_2: 2 }), + STRUCT({ int64_key_1: 10 int64_key_2: 20 })] + ) +WHERE TRUE; +-- +UpdateStmt ++-table_scan= +| +-TableScan(column_list=[StructWithKitchenSinkTable.t#3], table=StructWithKitchenSinkTable, column_index_list=[2]) ++-column_access_list=WRITE ++-where_expr= +| +-Literal(type=BOOL, value=true) ++-update_item_list= + +-UpdateItem + +-target= + | +-ColumnRef(type=STRUCT>>>, column=StructWithKitchenSinkTable.t#3) + +-set_value= + +-DMLValue + +-value= + +-MakeStruct + +-type=STRUCT>>> + +-field_list= + +-Literal(type=INT64, value=1) + +-Cast(ARRAY>> -> ARRAY>>) + +-FunctionCall(ZetaSQL:$make_array(repeated(2) STRUCT>) -> ARRAY>>) + +-MakeStruct + | +-type=STRUCT> + | +-field_list= + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + +-MakeStruct + +-type=STRUCT> + +-field_list= + +-MakeProto + +-type=PROTO + +-field_list= + +-int64_key_1 := Literal(type=INT64, value=10) + +-int64_key_2 := Literal(type=INT64, value=20) +== + +# Inferred has too few elements for STRUCT constructor. +UPDATE StructWithKitchenSinkTable SET s = ( + { int64_key_1: 1 int64_key_2: 2 }, + STRUCT({ int64_key_1: 10 int64_key_2: 20 }), + 2, 3) +WHERE TRUE; +-- +ERROR: Value of type STRUCT, INT64, ...> cannot be assigned to s, which has type STRUCT> [at 1:43] +UPDATE StructWithKitchenSinkTable SET s = ( + ^ +== + +# Incompatible inferred type. +UPDATE StructWithKitchenSinkTable SET s = ( + 2, + STRUCT({ int64_key_1: 10 int64_key_2: 20 })) +WHERE TRUE; +-- +ERROR: Value of type STRUCT> cannot be assigned to s, which has type STRUCT> [at 1:43] +UPDATE StructWithKitchenSinkTable SET s = ( + ^ +== + +# Inferred type with struct containing proto. 
+UPDATE StructWithKitchenSinkTable SET s = ( + { kitchen_sink: {int64_key_1: 1, int64_key_2: 2 }, s:{kitchen_sink: {int64_key_1: 1, int64_key_2: 2 }}}) +WHERE TRUE; +-- +UpdateStmt ++-table_scan= +| +-TableScan(column_list=[StructWithKitchenSinkTable.s#2], table=StructWithKitchenSinkTable, column_index_list=[1]) ++-column_access_list=WRITE ++-where_expr= +| +-Literal(type=BOOL, value=true) ++-update_item_list= + +-UpdateItem + +-target= + | +-ColumnRef(type=STRUCT, s STRUCT>>, column=StructWithKitchenSinkTable.s#2) + +-set_value= + +-DMLValue + +-value= + +-MakeStruct + +-type=STRUCT, s STRUCT>> + +-field_list= + +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + +-MakeStruct + +-type=STRUCT> + +-field_list= + +-MakeProto + +-type=PROTO + +-field_list= + +-int64_key_1 := Literal(type=INT64, value=1) + +-int64_key_2 := Literal(type=INT64, value=2) +== + +# Mixing old and new proto constructors. +SELECT NEW zetasql_test__.KitchenSinkPB ( + 1 AS int64_key_1, + 2 AS int64_key_2, + { nested_int64: 10 } AS nested_value +) +-- + +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + | +-nested_value := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-nested_int64 := Literal(type=INT64, value=10) + +-input_scan= + +-SingleRowScan +== + +# Infer the type of submessages in REPLACE_FIELDS. 
+[language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,V_1_3_REPLACE_FIELDS] +SELECT + REPLACE_FIELDS( + KitchenSink, { nested_int64: 1 } AS nested_value + ) +FROM TestTable +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#4 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#4] + +-expr_list= + | +-$col1#4 := + | +-ReplaceField + | +-type=PROTO + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-replace_field_item_list= + | +-ReplaceFieldItem + | +-expr= + | | +-MakeProto + | | +-type=PROTO + | | +-field_list= + | | +-nested_int64 := Literal(type=INT64, value=1) + | +-proto_field_path=nested_value + +-input_scan= + +-TableScan(column_list=[TestTable.KitchenSink#3], table=TestTable, column_index_list=[2]) +== + +# Infer the type of the lhs when trying to CAST braced constructors. +SELECT CAST( { nested_int64: 1 } AS zetasql_test__.KitchenSinkPB.Nested); +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-nested_int64 := Literal(type=INT64, value=1) + +-input_scan= + +-SingleRowScan +== + +# CAST works with arrays of protos as well. +SELECT CAST( [{ nested_int64: 10 }, { nested_int64: 20 }] AS ARRAY); +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [ARRAY>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$make_array(repeated(2) PROTO) -> ARRAY>) + | +-MakeProto + | | +-type=PROTO + | | +-field_list= + | | +-nested_int64 := Literal(type=INT64, value=10) + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-nested_int64 := Literal(type=INT64, value=20) + +-input_scan= + +-SingleRowScan +== + +# Braced constructor type inferred in generated columns. 
+[language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,V_1_2_GENERATED_COLUMNS] +[no_enable_literal_replacement] +CREATE TABLE T ( + IntColumn INT32, + ProtoColumn zetasql_test__.TestExtraPB AS ({int32_val1: IntColumn int32_val2: 5}) +) +-- +CreateTableStmt ++-name_path=T ++-column_definition_list= + +-ColumnDefinition(name="IntColumn", type=INT32, column=T.IntColumn#1) + +-ColumnDefinition + +-name="ProtoColumn" + +-type=PROTO + +-column=T.ProtoColumn#2 + +-generated_column_info= + +-GeneratedColumnInfo + +-expression= + +-MakeProto + +-type=PROTO + +-field_list= + +-int32_val1 := ColumnRef(type=INT32, column=T.IntColumn#1) + +-int32_val2 := Literal(type=INT32, value=5) +== + +# Braced constructor types inferred in default column value. +[language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,V_1_3_COLUMN_DEFAULT_VALUE] +[no_enable_literal_replacement] +CREATE TABLE T ( + IntColumn INT32, + ProtoColumn zetasql_test__.TestExtraPB DEFAULT {int32_val1: 3 int32_val2: 5} +) +-- +CreateTableStmt ++-name_path=T ++-column_definition_list= + +-ColumnDefinition(name="IntColumn", type=INT32, column=T.IntColumn#1) + +-ColumnDefinition + +-name="ProtoColumn" + +-type=PROTO + +-column=T.ProtoColumn#2 + +-default_value= + +-ColumnDefaultValue + +-expression= + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int32_val1 := Literal(type=INT32, value=3) + | +-int32_val2 := Literal(type=INT32, value=5) + +-sql="{int32_val1: 3 int32_val2: 5}" +== + +# Braced constructor type inferred in SQL function body. 
+CREATE FUNCTION myfunc ( ) RETURNS zetasql_test__.TestExtraPB AS ({int32_val1: 3 int32_val2: 5}); +-- +CreateFunctionStmt ++-name_path=myfunc ++-has_explicit_return_type=TRUE ++-return_type=PROTO ++-signature=() -> PROTO ++-language="SQL" ++-code="{int32_val1: 3 int32_val2: 5}" ++-function_expression= + +-MakeProto + +-type=PROTO + +-field_list= + +-int32_val1 := Literal(type=INT32, value=3) + +-int32_val2 := Literal(type=INT32, value=5) +== + +# Braced constructor type without a return type in a SQL function body is an error. +CREATE FUNCTION myfunc ( ) AS ({int32_val1: 3 int32_val2: 5}); +-- +ERROR: Unable to infer a type for braced constructor [at 1:33] +CREATE FUNCTION myfunc ( ) AS ({int32_val1: 3 int32_val2: 5}); + ^ +== + +# Braced constructor type inferred in aggregate SQL function body. +[language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,CREATE_AGGREGATE_FUNCTION,TEMPLATE_FUNCTIONS] +CREATE AGGREGATE FUNCTION myfunc ( ) RETURNS zetasql_test__.TestExtraPB AS ({int32_val1: 3 int32_val2: 5}); +-- +CreateFunctionStmt ++-name_path=myfunc ++-has_explicit_return_type=TRUE ++-return_type=PROTO ++-signature=() -> PROTO ++-is_aggregate=TRUE ++-language="SQL" ++-code="{int32_val1: 3 int32_val2: 5}" ++-function_expression= + +-MakeProto + +-type=PROTO + +-field_list= + +-int32_val1 := Literal(type=INT32, value=3) + +-int32_val2 := Literal(type=INT32, value=5) +== + +# Braced constructor type without a return type in an aggregate SQL function body is an error. +[language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,CREATE_AGGREGATE_FUNCTION,TEMPLATE_FUNCTIONS] +CREATE AGGREGATE FUNCTION myfunc ( ) AS ({int32_val1: 3 int32_val2: 5}); +-- +ERROR: Unable to infer a type for braced constructor [at 1:43] +CREATE AGGREGATE FUNCTION myfunc ( ) AS ({int32_val1: 3 int32_val2: 5}); + ^ +== + +# Setting proto system variables works. 
+SET @@proto_system_variable = {int64_key_1: 1 int64_key_2: 2} +-- +AssignmentStmt ++-target= +| +-SystemVariable(proto_system_variable, type=PROTO) ++-expr= + +-MakeProto + +-type=PROTO + +-field_list= + +-int64_key_1 := Literal(type=INT64, value=1) + +-int64_key_2 := Literal(type=INT64, value=2) +== + +# Inferring through a scalar subquery works for array and non-array types. +SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_value: (SELECT { nested_int64: 5 }) + nested_repeated_value: (SELECT ARRAY[{ nested_int64: 6 },{ nested_int64: 7 }]) +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#3 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#3] + +-expr_list= + | +-$col1#3 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + | +-nested_value := + | | +-SubqueryExpr + | | +-type=PROTO + | | +-subquery_type=SCALAR + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#1] + | | +-expr_list= + | | | +-$col1#1 := + | | | +-MakeProto + | | | +-type=PROTO + | | | +-field_list= + | | | +-nested_int64 := Literal(type=INT64, value=5) + | | +-input_scan= + | | +-SingleRowScan + | +-nested_repeated_value := + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#2] + | +-expr_list= + | | +-$col1#2 := + | | +-FunctionCall(ZetaSQL:$make_array(repeated(2) PROTO) -> ARRAY>) + | | +-MakeProto + | | | +-type=PROTO + | | | +-field_list= + | | | +-nested_int64 := Literal(type=INT64, value=6) + | | +-MakeProto + | | +-type=PROTO + | | +-field_list= + | | +-nested_int64 := Literal(type=INT64, value=7) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +== + +# Inferring through a scalar subquery with the wrong type. 
+SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_value: (SELECT 1) +} +-- +ERROR: Could not store value with type INT64 into proto field zetasql_test__.KitchenSinkPB.nested_value which has SQL type zetasql_test__.KitchenSinkPB.Nested [at 4:3] + nested_value: (SELECT 1) + ^ +== + +# Inferring through a scalar subquery with the wrong protocol buffer type. +SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_value: (SELECT { nested: 5 }) +} +-- +ERROR: Field 1 has name nested which is not a field in proto zetasql_test__.KitchenSinkPB.Nested [at 4:27] + nested_value: (SELECT { nested: 5 }) + ^ +== + +# Inferring through a scalar subquery with GROUP BY. Note this does not fail on +# type inference and if grouping by proto is supported will work. +SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_value: (SELECT { nested_int64: 5 } FROM TestTable GROUP BY 1) +} +-- +ERROR: Grouping by expressions of type PROTO is not allowed [at 4:69] + nested_value: (SELECT { nested_int64: 5 } FROM TestTable GROUP BY 1) + ^ +== + +# Recursively inferring through a scalar subquery works. 
+SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_value: (SELECT (SELECT { nested_int64: 5 })) +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#3 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#3] + +-expr_list= + | +-$col1#3 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + | +-nested_value := + | +-SubqueryExpr + | +-type=PROTO + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#2] + | +-expr_list= + | | +-$col1#2 := + | | +-SubqueryExpr + | | +-type=PROTO + | | +-subquery_type=SCALAR + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#1] + | | +-expr_list= + | | | +-$col1#1 := + | | | +-MakeProto + | | | +-type=PROTO + | | | +-field_list= + | | | +-nested_int64 := Literal(type=INT64, value=5) + | | +-input_scan= + | | +-SingleRowScan + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +== + +# Inferring through an array subquery works. 
+SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_repeated_value: ARRAY( + SELECT {} FROM UNNEST(GENERATE_ARRAY(1, 2)) + ) +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#3 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#3] + +-expr_list= + | +-$col1#3 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + | +-nested_repeated_value := + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#2] + | +-expr_list= + | | +-$col1#2 := MakeProto(type=PROTO) + | +-input_scan= + | +-ArrayScan + | +-array_expr_list= + | | +-FunctionCall(ZetaSQL:generate_array(INT64, INT64, optional(0) INT64) -> ARRAY) + | | +-Literal(type=INT64, value=1) + | | +-Literal(type=INT64, value=2) + | +-element_column_list=[$array.$unnest1#1] + +-input_scan= + +-SingleRowScan +== + +# Inferring from the LHS to the RHS of an IN subquery works. We get an error +# because proto types are not comparable. +SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 +} IN (SELECT {int64_key_1: 1 int64_key_2: 2} ) +-- +ERROR: Cannot execute IN subquery with uncomparable types zetasql_test__.KitchenSinkPB and zetasql_test__.KitchenSinkPB [at 1:8] +SELECT NEW zetasql_test__.KitchenSinkPB { + ^ +== + +# Subquery type does not match inferred type. +SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_value: (SELECT 1) +} +-- +ERROR: Could not store value with type INT64 into proto field zetasql_test__.KitchenSinkPB.nested_value which has SQL type zetasql_test__.KitchenSinkPB.Nested [at 4:3] + nested_value: (SELECT 1) + ^ +== + +# Inferred type is not an array for an array subquery. 
+SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_value: ARRAY(SELECT { nested_int64: 5 }) +} +-- +ERROR: Unable to infer a type for braced constructor [at 4:30] + nested_value: ARRAY(SELECT { nested_int64: 5 }) + ^ +== + +# Not inferring through a EXISTS query. +SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_value: EXISTS(SELECT { nested_int64: 5 }) +} +-- +ERROR: Unable to infer a type for braced constructor [at 4:31] + nested_value: EXISTS(SELECT { nested_int64: 5 }) + ^ +== + +# Not inferring through a SELECT AS STRUCT query. +SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_value: (SELECT AS STRUCT { nested_int64: 5 }) +} +-- +ERROR: Unable to infer a type for braced constructor [at 4:35] + nested_value: (SELECT AS STRUCT { nested_int64: 5 }) + ^ +== + +# Not inferring through a SELECT AS PROTO query because of differing syntax. +SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_value: (SELECT AS zetasql_test__.KitchenSinkPB { nested_int64: 5 }) +} +-- +ERROR: Unable to infer a type for braced constructor [at 4:57] + nested_value: (SELECT AS zetasql_test__.KitchenSinkPB { nested_int64: 5 }) + ^ +== + +# Inference doesn't work inside SELECT AS PROTO query. +SELECT AS zetasql_test__.KitchenSinkPB + 1 AS int64_key_1, + 2 AS int64_key_2, + { nested_int64: 5 } AS nested_value +-- +ERROR: Unable to infer a type for braced constructor [at 4:3] + { nested_int64: 5 } AS nested_value + ^ +== + +# Inferring through a subquery using WITH does inference only on the returned +# SELECT column and not any select in the WITH. 
+SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_value: (WITH Foo AS (SELECT { nested_int64: 5 } AS x) SELECT Foo.x) +} +-- +ERROR: Unable to infer a type for braced constructor [at 4:38] + nested_value: (WITH Foo AS (SELECT { nested_int64: 5 } AS x) SELECT Foo.x) + ^ +== + +# Inferring through a subquery using WITH working on first non-WITH SELECT +# column. +SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_value: (WITH Foo AS (SELECT "foo" AS x) SELECT { nested_int64: 5 }) +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#3 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#3] + +-expr_list= + | +-$col1#3 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + | +-nested_value := + | +-SubqueryExpr + | +-type=PROTO + | +-subquery_type=SCALAR + | +-subquery= + | +-WithScan + | +-column_list=[$expr_subquery.$col1#2] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="Foo" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[Foo.x#1] + | | +-expr_list= + | | | +-x#1 := Literal(type=STRING, value="foo") + | | +-input_scan= + | | +-SingleRowScan + | +-query= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#2] + | +-expr_list= + | | +-$col1#2 := + | | +-MakeProto + | | +-type=PROTO + | | +-field_list= + | | +-nested_int64 := Literal(type=INT64, value=5) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +== + +# Inferring through multiple WITH clauses. 
+SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_value: (WITH Foo AS (SELECT "foo" AS x) (WITH Bar AS (SELECT "bar" AS y) SELECT { nested_int64: 5 })) +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#4 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#4] + +-expr_list= + | +-$col1#4 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + | +-nested_value := + | +-SubqueryExpr + | +-type=PROTO + | +-subquery_type=SCALAR + | +-subquery= + | +-WithScan + | +-column_list=[$expr_subquery.$col1#3] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="Foo" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[Foo.x#1] + | | +-expr_list= + | | | +-x#1 := Literal(type=STRING, value="foo") + | | +-input_scan= + | | +-SingleRowScan + | +-query= + | +-WithScan + | +-column_list=[$expr_subquery.$col1#3] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="Bar" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[Bar.y#2] + | | +-expr_list= + | | | +-y#2 := Literal(type=STRING, value="bar") + | | +-input_scan= + | | +-SingleRowScan + | +-query= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#3] + | +-expr_list= + | | +-$col1#3 := + | | +-MakeProto + | | +-type=PROTO + | | +-field_list= + | | +-nested_int64 := Literal(type=INT64, value=5) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +== + +# Inferring through a subquery which has UNION ALL. 
+SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1 + int64_key_2: 2 + nested_repeated_value: ARRAY(SELECT { nested_int64: 5 } UNION ALL + SELECT { nested_int64: 6 }) +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#4 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#4] + +-expr_list= + | +-$col1#4 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + | +-nested_repeated_value := + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-subquery= + | +-SetOperationScan + | +-column_list=[$union_all.$col1#3] + | +-op_type=UNION_ALL + | +-input_item_list= + | +-SetOperationItem + | | +-scan= + | | | +-ProjectScan + | | | +-column_list=[$union_all1.$col1#1] + | | | +-expr_list= + | | | | +-$col1#1 := + | | | | +-MakeProto + | | | | +-type=PROTO + | | | | +-field_list= + | | | | +-nested_int64 := Literal(type=INT64, value=5) + | | | +-input_scan= + | | | +-SingleRowScan + | | +-output_column_list=[$union_all1.$col1#1] + | +-SetOperationItem + | +-scan= + | | +-ProjectScan + | | +-column_list=[$union_all2.$col1#2] + | | +-expr_list= + | | | +-$col1#2 := + | | | +-MakeProto + | | | +-type=PROTO + | | | +-field_list= + | | | +-nested_int64 := Literal(type=INT64, value=6) + | | +-input_scan= + | | +-SingleRowScan + | +-output_column_list=[$union_all2.$col1#2] + +-input_scan= + +-SingleRowScan +== + +# The analyzer allows setting required fields to NULL, but the engine will +# give an error. 
+SELECT NEW zetasql_test__.KitchenSinkPB {int64_key_1: NULL int64_key_2: NULL} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=NULL) + | +-int64_key_2 := Literal(type=INT64, value=NULL) + +-input_scan= + +-SingleRowScan +== + +# The analyzer allows setting values of repeated fields to NULL, but the engine +# will give an error. +SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 10 int64_key_2: 20 + repeated_int64_val: [1, 2, NULL, 4] +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=10) + | +-int64_key_2 := Literal(type=INT64, value=20) + | +-repeated_int64_val := Literal(type=ARRAY, value=[1, 2, NULL, 4]) + +-input_scan= + +-SingleRowScan +== + +# This is a valid query because int32_val is an optional field. +SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_2: 1 + int64_key_1: 2 + int32_val: NULL} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_2 := Literal(type=INT64, value=1) + | +-int64_key_1 := Literal(type=INT64, value=2) + | +-int32_val := Literal(type=INT32, value=NULL) + +-input_scan= + +-SingleRowScan +== + +# Error using braced constructors without an inferred type. 
+SELECT ARRAY[{ str_value: ["foo", "bar"] }, { str_value: ["baz"] }] +-- +ERROR: Unable to infer a type for braced constructor [at 1:14] +SELECT ARRAY[{ str_value: ["foo", "bar"] }, { str_value: ["baz"] }] + ^ +== + +# Error using braced constructors for non-proto type. +SELECT ARRAY[{ str_value: ["foo", "bar"] }, { str_value: ["baz"] }] +-- +ERROR: Braced constructors are not allowed for type INT64 [at 1:21] +SELECT ARRAY[{ str_value: ["foo", "bar"] }, { str_value: ["baz"] }] + ^ +== + +# Setting NUMERIC fields when the type is unsupported results in error. +SELECT NEW zetasql_test__.FieldFormatsProto { b_numeric: 6 } +-- +ERROR: Proto field zetasql_test__.FieldFormatsProto.b_numeric has unsupported type NUMERIC [at 1:47] +SELECT NEW zetasql_test__.FieldFormatsProto { b_numeric: 6 } + ^ +== + +# The inferred type is not propagated to sibling array elements. +SELECT ARRAY[ + NEW zetasql_test__.TestExtraPB{ str_value: ["foo", "bar"] }, + { str_value: ["baz"] } +] +-- +ERROR: Unable to infer a type for braced constructor [at 3:3] + { str_value: ["baz"] } + ^ +== + +# Regression test against b/259000660. +# The templated TVF should be parsed with the V_1_3_BRACED_PROTO_CONSTRUCTORS +# enabled as well. 
+[language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,TABLE_VALUED_FUNCTIONS,CREATE_TABLE_FUNCTION,TEMPLATE_FUNCTIONS] +WITH + T AS ( + SELECT CAST(v as INT32) v + FROM UNNEST(GENERATE_ARRAY(2, 12)) AS v + ) +SELECT * +FROM templated_proto_braced_ctor_tvf(TABLE T) +ORDER BY dice_roll.int32_val1; +-- +QueryStmt ++-output_column_list= +| +-templated_proto_braced_ctor_tvf.dice_roll#4 AS dice_roll [PROTO] ++-query= + +-WithScan + +-column_list=[templated_proto_braced_ctor_tvf.dice_roll#4] + +-is_ordered=TRUE + +-with_entry_list= + | +-WithEntry + | +-with_query_name="T" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[T.v#2] + | +-expr_list= + | | +-v#2 := + | | +-Cast(INT64 -> INT32) + | | +-ColumnRef(type=INT64, column=$array.v#1) + | +-input_scan= + | +-ArrayScan + | +-column_list=[$array.v#1] + | +-array_expr_list= + | | +-FunctionCall(ZetaSQL:generate_array(INT64, INT64, optional(0) INT64) -> ARRAY) + | | +-Literal(type=INT64, value=2) + | | +-Literal(type=INT64, value=12) + | +-element_column_list=[$array.v#1] + +-query= + +-OrderByScan + +-column_list=[templated_proto_braced_ctor_tvf.dice_roll#4] + +-is_ordered=TRUE + +-input_scan= + | +-ProjectScan + | +-column_list=[templated_proto_braced_ctor_tvf.dice_roll#4, $orderby.$orderbycol1#5] + | +-expr_list= + | | +-$orderbycol1#5 := + | | +-GetProtoField + | | +-type=INT32 + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=templated_proto_braced_ctor_tvf.dice_roll#4) + | | +-field_descriptor=int32_val1 + | | +-default_value=0 + | +-input_scan= + | +-TVFScan + | +-column_list=[templated_proto_braced_ctor_tvf.dice_roll#4] + | +-tvf=templated_proto_braced_ctor_tvf((ANY TABLE) -> ANY TABLE) + | +-signature=(TABLE) -> TABLE> + | +-argument_list= + | | +-FunctionArgument + | | +-scan= + | | | +-WithRefScan(column_list=[T.v#3], with_query_name="T") + | | +-argument_column_list=[T.v#3] + | +-column_index_list=[0] + +-order_by_item_list= + +-OrderByItem + +-column_ref= + +-ColumnRef(type=INT32, 
column=$orderby.$orderbycol1#5) + +With Templated SQL TVF signature: + templated_proto_braced_ctor_tvf(TABLE) -> TABLE> +containing resolved templated query: +QueryStmt ++-output_column_list= +| +-$query.dice_roll#2 AS dice_roll [PROTO] ++-query= + +-ProjectScan + +-column_list=[$query.dice_roll#2] + +-expr_list= + | +-dice_roll#2 := + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int32_val1 := ColumnRef(type=INT32, column=T.v#1) + +-input_scan= + +-RelationArgumentScan(column_list=[T.v#1], name="T") diff --git a/zetasql/analyzer/testdata/pseudo_columns.test b/zetasql/analyzer/testdata/pseudo_columns.test index 1adc9688c..872622448 100644 --- a/zetasql/analyzer/testdata/pseudo_columns.test +++ b/zetasql/analyzer/testdata/pseudo_columns.test @@ -336,9 +336,10 @@ QueryStmt +-right_scan= | +-TableScan(column_list=[AllPseudoColumns.Key#3], table=AllPseudoColumns, column_index_list=[0], alias="t2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) - +-ColumnRef(type=INT32, column=AllPseudoColumns.Key#1) - +-ColumnRef(type=INT32, column=AllPseudoColumns.Key#3) + | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | +-ColumnRef(type=INT32, column=AllPseudoColumns.Key#1) + | +-ColumnRef(type=INT32, column=AllPseudoColumns.Key#3) + +-has_using=TRUE == # Pseudo-columns don't show up in GROUP BY ordinals. 
diff --git a/zetasql/analyzer/testdata/qualify.test b/zetasql/analyzer/testdata/qualify.test index 52a22cdc4..4060eb0a7 100644 --- a/zetasql/analyzer/testdata/qualify.test +++ b/zetasql/analyzer/testdata/qualify.test @@ -55,24 +55,24 @@ ALTERNATION GROUP: -- QueryStmt +-output_column_list= -| +-$query.a#5 AS a [INT64] +| +-$query.a#6 AS a [INT64] +-query= +-ProjectScan - +-column_list=[$query.a#5] + +-column_list=[$query.a#6] +-input_scan= +-FilterScan - +-column_list=[$analytic.$analytic1#4, $query.a#5] + +-column_list=[$analytic.$analytic1#5, $query.a#6] +-input_scan= | +-ProjectScan - | +-column_list=[$analytic.$analytic1#4, $query.a#5] + | +-column_list=[$analytic.$analytic1#5, $query.a#6] | +-expr_list= - | | +-a#5 := + | | +-a#6 := | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) - | | +-ColumnRef(type=INT64, column=$analytic.$analytic1#4) + | | +-ColumnRef(type=INT64, column=$analytic.$analytic1#5) | | +-Literal(type=INT64, value=1) | +-input_scan= | +-AnalyticScan - | +-column_list=[$analytic.$analytic1#4] + | +-column_list=[$analytic.$analytic1#5] | +-input_scan= | | +-FilterScan | | +-input_scan= @@ -82,34 +82,34 @@ QueryStmt | +-function_group_list= | +-AnalyticFunctionGroup | +-analytic_function_list= - | +-$analytic1#4 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) + | +-$analytic1#5 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) +-filter_expr= +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$query.a#5) + +-ColumnRef(type=INT64, column=$query.a#6) +-Literal(type=INT64, value=1) -- ALTERNATION GROUP: DISTINCT -- QueryStmt +-output_column_list= -| +-$distinct.a#6 AS a [INT64] +| +-$distinct.a#7 AS a [INT64] +-query= +-AggregateScan - +-column_list=[$distinct.a#6] + +-column_list=[$distinct.a#7] +-input_scan= | +-FilterScan - | +-column_list=[$analytic.$analytic1#4, $query.a#5] + | +-column_list=[$analytic.$analytic1#5, $query.a#6] | +-input_scan= | | +-ProjectScan - | | 
+-column_list=[$analytic.$analytic1#4, $query.a#5] + | | +-column_list=[$analytic.$analytic1#5, $query.a#6] | | +-expr_list= - | | | +-a#5 := + | | | +-a#6 := | | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) - | | | +-ColumnRef(type=INT64, column=$analytic.$analytic1#4) + | | | +-ColumnRef(type=INT64, column=$analytic.$analytic1#5) | | | +-Literal(type=INT64, value=1) | | +-input_scan= | | +-AnalyticScan - | | +-column_list=[$analytic.$analytic1#4] + | | +-column_list=[$analytic.$analytic1#5] | | +-input_scan= | | | +-FilterScan | | | +-input_scan= @@ -119,13 +119,13 @@ QueryStmt | | +-function_group_list= | | +-AnalyticFunctionGroup | | +-analytic_function_list= - | | +-$analytic1#4 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) + | | +-$analytic1#5 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) | +-filter_expr= | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$query.a#5) + | +-ColumnRef(type=INT64, column=$query.a#6) | +-Literal(type=INT64, value=1) +-group_by_list= - +-a#6 := ColumnRef(type=INT64, column=$query.a#5) + +-a#7 := ColumnRef(type=INT64, column=$query.a#6) == # Qualify expression is not a bool @@ -147,10 +147,10 @@ QueryStmt +-column_list=[KeyValue.Key#1] +-input_scan= +-FilterScan - +-column_list=[KeyValue.Key#1, $analytic.$analytic1#3] + +-column_list=[KeyValue.Key#1, $analytic.$analytic1#4] +-input_scan= | +-AnalyticScan - | +-column_list=[KeyValue.Key#1, $analytic.$analytic1#3] + | +-column_list=[KeyValue.Key#1, $analytic.$analytic1#4] | +-input_scan= | | +-FilterScan | | +-column_list=[KeyValue.Key#1] @@ -167,10 +167,10 @@ QueryStmt | | +-column_ref= | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) | +-analytic_function_list= - | +-$analytic1#3 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) + | +-$analytic1#4 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) +-filter_expr= +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, 
column=$analytic.$analytic1#3) + +-ColumnRef(type=INT64, column=$analytic.$analytic1#4) +-Literal(type=INT64, value=1) == @@ -186,10 +186,10 @@ QueryStmt +-column_list=[KeyValue.Key#1] +-input_scan= +-FilterScan - +-column_list=[KeyValue.Key#1, $analytic.$analytic1#3] + +-column_list=[KeyValue.Key#1, $analytic.$analytic1#4] +-input_scan= | +-AnalyticScan - | +-column_list=[KeyValue.Key#1, $analytic.$analytic1#3] + | +-column_list=[KeyValue.Key#1, $analytic.$analytic1#4] | +-input_scan= | | +-FilterScan | | +-column_list=[KeyValue.Key#1] @@ -206,10 +206,10 @@ QueryStmt | | +-column_ref= | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) | +-analytic_function_list= - | +-$analytic1#3 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) + | +-$analytic1#4 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) +-filter_expr= +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$analytic.$analytic1#3) + +-ColumnRef(type=INT64, column=$analytic.$analytic1#4) +-Literal(type=INT64, value=1) == @@ -225,10 +225,10 @@ QueryStmt +-column_list=[KeyValue.Key#1] +-input_scan= +-FilterScan - +-column_list=[KeyValue.Key#1, $analytic.$analytic1#3] + +-column_list=[KeyValue.Key#1, $analytic.$analytic1#4] +-input_scan= | +-AnalyticScan - | +-column_list=[KeyValue.Key#1, $analytic.$analytic1#3] + | +-column_list=[KeyValue.Key#1, $analytic.$analytic1#4] | +-input_scan= | | +-FilterScan | | +-column_list=[KeyValue.Key#1] @@ -245,10 +245,10 @@ QueryStmt | | +-column_ref= | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) | +-analytic_function_list= - | +-$analytic1#3 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) + | +-$analytic1#4 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) +-filter_expr= +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$analytic.$analytic1#3) + +-ColumnRef(type=INT64, column=$analytic.$analytic1#4) +-Literal(type=INT64, value=1) == @@ -339,21 +339,21 @@ select 1 from KeyValue 
WHERE true QUALIFY row_number() over () = 1 -- QueryStmt +-output_column_list= -| +-$query.$col1#3 AS `$col1` [INT64] +| +-$query.$col1#4 AS `$col1` [INT64] +-query= +-ProjectScan - +-column_list=[$query.$col1#3] + +-column_list=[$query.$col1#4] +-input_scan= +-FilterScan - +-column_list=[$query.$col1#3, $analytic.$analytic1#4] + +-column_list=[$query.$col1#4, $analytic.$analytic1#5] +-input_scan= | +-AnalyticScan - | +-column_list=[$query.$col1#3, $analytic.$analytic1#4] + | +-column_list=[$query.$col1#4, $analytic.$analytic1#5] | +-input_scan= | | +-ProjectScan - | | +-column_list=[$query.$col1#3] + | | +-column_list=[$query.$col1#4] | | +-expr_list= - | | | +-$col1#3 := Literal(type=INT64, value=1) + | | | +-$col1#4 := Literal(type=INT64, value=1) | | +-input_scan= | | +-FilterScan | | +-input_scan= @@ -363,10 +363,10 @@ QueryStmt | +-function_group_list= | +-AnalyticFunctionGroup | +-analytic_function_list= - | +-$analytic1#4 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) + | +-$analytic1#5 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) +-filter_expr= +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$analytic.$analytic1#4) + +-ColumnRef(type=INT64, column=$analytic.$analytic1#5) +-Literal(type=INT64, value=1) == @@ -382,10 +382,10 @@ QueryStmt +-column_list=MultipleColumns.[int_a#1, int_b#3] +-input_scan= +-FilterScan - +-column_list=[MultipleColumns.int_a#1, MultipleColumns.int_b#3, $analytic.$analytic1#7] + +-column_list=[MultipleColumns.int_a#1, MultipleColumns.int_b#3, $analytic.$analytic1#8] +-input_scan= | +-AnalyticScan - | +-column_list=[MultipleColumns.int_a#1, MultipleColumns.int_b#3, $analytic.$analytic1#7] + | +-column_list=[MultipleColumns.int_a#1, MultipleColumns.int_b#3, $analytic.$analytic1#8] | +-input_scan= | | +-FilterScan | | +-column_list=MultipleColumns.[int_a#1, int_b#3] @@ -402,7 +402,7 @@ QueryStmt | | +-column_ref= | | +-ColumnRef(type=INT64, column=MultipleColumns.int_a#1) | 
+-analytic_function_list= - | +-$analytic1#7 := + | +-$analytic1#8 := | +-AnalyticFunctionCall(ZetaSQL:avg(INT64) -> DOUBLE) | +-ColumnRef(type=INT64, column=MultipleColumns.int_b#3) | +-window_frame= @@ -415,7 +415,7 @@ QueryStmt | +-Literal(type=INT64, value=1) +-filter_expr= +-FunctionCall(ZetaSQL:$less(DOUBLE, DOUBLE) -> BOOL) - +-ColumnRef(type=DOUBLE, column=$analytic.$analytic1#7) + +-ColumnRef(type=DOUBLE, column=$analytic.$analytic1#8) +-Literal(type=DOUBLE, value=5) == @@ -454,22 +454,22 @@ select 1 AS x, row_number() over () as a from KeyValue WHERE true QUALIFY x = 1 -- QueryStmt +-output_column_list= -| +-$query.x#4 AS x [INT64] -| +-$analytic.a#5 AS a [INT64] +| +-$query.x#5 AS x [INT64] +| +-$analytic.a#6 AS a [INT64] +-query= +-ProjectScan - +-column_list=[$query.x#4, $analytic.a#5] + +-column_list=[$query.x#5, $analytic.a#6] +-input_scan= +-FilterScan - +-column_list=[$query.x#4, $analytic.a#5] + +-column_list=[$query.x#5, $analytic.a#6] +-input_scan= | +-AnalyticScan - | +-column_list=[$query.x#4, $analytic.a#5] + | +-column_list=[$query.x#5, $analytic.a#6] | +-input_scan= | | +-ProjectScan - | | +-column_list=[$query.x#4] + | | +-column_list=[$query.x#5] | | +-expr_list= - | | | +-x#4 := Literal(type=INT64, value=1) + | | | +-x#5 := Literal(type=INT64, value=1) | | +-input_scan= | | +-FilterScan | | +-input_scan= @@ -479,10 +479,10 @@ QueryStmt | +-function_group_list= | +-AnalyticFunctionGroup | +-analytic_function_list= - | +-a#5 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) + | +-a#6 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) +-filter_expr= +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$query.x#4) + +-ColumnRef(type=INT64, column=$query.x#5) +-Literal(type=INT64, value=1) == @@ -493,22 +493,22 @@ QUALIFY CHAR_LENGTH(x) < 4 -- QueryStmt +-output_column_list= -| +-$query.x#4 AS x [STRING] -| +-$analytic.a#5 AS a [INT64] +| +-$query.x#5 AS x [STRING] +| +-$analytic.a#6 AS a [INT64] 
+-query= +-ProjectScan - +-column_list=[$query.x#4, $analytic.a#5] + +-column_list=[$query.x#5, $analytic.a#6] +-input_scan= +-FilterScan - +-column_list=[$query.x#4, $analytic.a#5] + +-column_list=[$query.x#5, $analytic.a#6] +-input_scan= | +-AnalyticScan - | +-column_list=[$query.x#4, $analytic.a#5] + | +-column_list=[$query.x#5, $analytic.a#6] | +-input_scan= | | +-ProjectScan - | | +-column_list=[$query.x#4] + | | +-column_list=[$query.x#5] | | +-expr_list= - | | | +-x#4 := Literal(type=STRING, value="abc") + | | | +-x#5 := Literal(type=STRING, value="abc") | | +-input_scan= | | +-FilterScan | | +-input_scan= @@ -518,11 +518,11 @@ QueryStmt | +-function_group_list= | +-AnalyticFunctionGroup | +-analytic_function_list= - | +-a#5 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) + | +-a#6 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) +-filter_expr= +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) +-FunctionCall(ZetaSQL:char_length(STRING) -> INT64) - | +-ColumnRef(type=STRING, column=$query.x#4) + | +-ColumnRef(type=STRING, column=$query.x#5) +-Literal(type=INT64, value=4) == @@ -575,24 +575,24 @@ QUALIFY SUM(KeyValue.Key) > 1 and ROW_NUMBER() OVER () > 0; -- QueryStmt +-output_column_list= -| +-$query.$col1#3 AS `$col1` [INT64] +| +-$query.$col1#5 AS `$col1` [INT64] +-query= +-ProjectScan - +-column_list=[$query.$col1#3] + +-column_list=[$query.$col1#5] +-input_scan= +-FilterScan - +-column_list=[$query.$col1#3, $aggregate.$agg1#4, $analytic.$analytic1#5] + +-column_list=[$query.$col1#5, $aggregate.$agg1#6, $analytic.$analytic1#7] +-input_scan= | +-AnalyticScan - | +-column_list=[$query.$col1#3, $aggregate.$agg1#4, $analytic.$analytic1#5] + | +-column_list=[$query.$col1#5, $aggregate.$agg1#6, $analytic.$analytic1#7] | +-input_scan= | | +-ProjectScan - | | +-column_list=[$query.$col1#3, $aggregate.$agg1#4] + | | +-column_list=[$query.$col1#5, $aggregate.$agg1#6] | | +-expr_list= - | | | +-$col1#3 := Literal(type=INT64, value=1) + | | | +-$col1#5 
:= Literal(type=INT64, value=1) | | +-input_scan= | | +-AggregateScan - | | +-column_list=[$aggregate.$agg1#4] + | | +-column_list=[$aggregate.$agg1#6] | | +-input_scan= | | | +-FilterScan | | | +-column_list=[KeyValue.Key#1] @@ -601,20 +601,20 @@ QueryStmt | | | +-filter_expr= | | | +-Literal(type=BOOL, value=true) | | +-aggregate_list= - | | +-$agg1#4 := + | | +-$agg1#6 := | | +-AggregateFunctionCall(ZetaSQL:sum(INT64) -> INT64) | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) | +-function_group_list= | +-AnalyticFunctionGroup | +-analytic_function_list= - | +-$analytic1#5 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) + | +-$analytic1#7 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) +-filter_expr= +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$aggregate.$agg1#4) + | +-ColumnRef(type=INT64, column=$aggregate.$agg1#6) | +-Literal(type=INT64, value=1) +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$analytic.$analytic1#5) + +-ColumnRef(type=INT64, column=$analytic.$analytic1#7) +-Literal(type=INT64, value=0) == @@ -626,16 +626,16 @@ QUALIFY row_number() over () = 1 QueryStmt +-output_column_list= | +-KeyValue.Key#1 AS Key [INT64] -| +-$analytic.rank#4 AS rank [INT64] +| +-$analytic.rank#5 AS rank [INT64] +-query= +-ProjectScan - +-column_list=[KeyValue.Key#1, $analytic.rank#4] + +-column_list=[KeyValue.Key#1, $analytic.rank#5] +-input_scan= +-FilterScan - +-column_list=[KeyValue.Key#1, $analytic.rank#4, $analytic.$analytic2#5] + +-column_list=[KeyValue.Key#1, $analytic.rank#5, $analytic.$analytic2#6] +-input_scan= | +-AnalyticScan - | +-column_list=[KeyValue.Key#1, $analytic.rank#4, $analytic.$analytic2#5] + | +-column_list=[KeyValue.Key#1, $analytic.rank#5, $analytic.$analytic2#6] | +-input_scan= | | +-FilterScan | | +-column_list=[KeyValue.Key#1] @@ -652,13 +652,13 @@ QueryStmt | | | +-column_ref= | | | 
+-ColumnRef(type=INT64, column=KeyValue.Key#1) | | +-analytic_function_list= - | | +-rank#4 := AnalyticFunctionCall(ZetaSQL:rank() -> INT64) + | | +-rank#5 := AnalyticFunctionCall(ZetaSQL:rank() -> INT64) | +-AnalyticFunctionGroup | +-analytic_function_list= - | +-$analytic2#5 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) + | +-$analytic2#6 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) +-filter_expr= +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$analytic.$analytic2#5) + +-ColumnRef(type=INT64, column=$analytic.$analytic2#6) +-Literal(type=INT64, value=1) == @@ -670,12 +670,12 @@ FROM (SELECT Key from KeyValue) -- QueryStmt +-output_column_list= -| +-$query.$col1#6 AS `$col1` [INT64] +| +-$query.$col1#7 AS `$col1` [INT64] +-query= +-ProjectScan - +-column_list=[$query.$col1#6] + +-column_list=[$query.$col1#7] +-expr_list= - | +-$col1#6 := + | +-$col1#7 := | +-SubqueryExpr | +-type=INT64 | +-subquery_type=SCALAR @@ -686,10 +686,10 @@ QueryStmt | +-column_list=[$subquery2.x#3] | +-input_scan= | +-FilterScan - | +-column_list=[$subquery2.x#3, $subquery2.y#4, $analytic.$analytic1#5] + | +-column_list=[$subquery2.x#3, $subquery2.y#4, $analytic.$analytic1#6] | +-input_scan= | | +-AnalyticScan - | | +-column_list=[$subquery2.x#3, $subquery2.y#4, $analytic.$analytic1#5] + | | +-column_list=[$subquery2.x#3, $subquery2.y#4, $analytic.$analytic1#6] | | +-input_scan= | | | +-FilterScan | | | +-column_list=$subquery2.[x#3, y#4] @@ -712,10 +712,10 @@ QueryStmt | | | +-column_ref= | | | +-ColumnRef(type=INT64, column=KeyValue.Key#1, is_correlated=TRUE) | | +-analytic_function_list= - | | +-$analytic1#5 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) + | | +-$analytic1#6 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) | +-filter_expr= | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$analytic.$analytic1#5) + | +-ColumnRef(type=INT64, column=$analytic.$analytic1#6) | 
+-Literal(type=INT64, value=1) +-input_scan= +-ProjectScan @@ -732,12 +732,12 @@ FROM (SELECT Key from KeyValue) AS SubQuery -- QueryStmt +-output_column_list= -| +-$query.$col1#6 AS `$col1` [INT64] +| +-$query.$col1#7 AS `$col1` [INT64] +-query= +-ProjectScan - +-column_list=[$query.$col1#6] + +-column_list=[$query.$col1#7] +-expr_list= - | +-$col1#6 := + | +-$col1#7 := | +-SubqueryExpr | +-type=INT64 | +-subquery_type=SCALAR @@ -748,10 +748,10 @@ QueryStmt | +-column_list=[$subquery1.x#3] | +-input_scan= | +-FilterScan - | +-column_list=[$subquery1.x#3, $subquery1.y#4, $analytic.$analytic1#5] + | +-column_list=[$subquery1.x#3, $subquery1.y#4, $analytic.$analytic1#6] | +-input_scan= | | +-AnalyticScan - | | +-column_list=[$subquery1.x#3, $subquery1.y#4, $analytic.$analytic1#5] + | | +-column_list=[$subquery1.x#3, $subquery1.y#4, $analytic.$analytic1#6] | | +-input_scan= | | | +-FilterScan | | | +-column_list=$subquery1.[x#3, y#4] @@ -774,10 +774,10 @@ QueryStmt | | | +-column_ref= | | | +-ColumnRef(type=INT64, column=KeyValue.Key#1, is_correlated=TRUE) | | +-analytic_function_list= - | | +-$analytic1#5 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) + | | +-$analytic1#6 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) | +-filter_expr= | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=$analytic.$analytic1#5) + | +-ColumnRef(type=INT64, column=$analytic.$analytic1#6) | +-Literal(type=INT64, value=1) +-input_scan= +-ProjectScan @@ -827,19 +827,19 @@ QUALIFY sum(key) > 1; -- QueryStmt +-output_column_list= -| +-$analytic.$analytic1#4 AS `$col1` [INT64] +| +-$analytic.$analytic1#5 AS `$col1` [INT64] +-query= +-ProjectScan - +-column_list=[$analytic.$analytic1#4] + +-column_list=[$analytic.$analytic1#5] +-input_scan= +-FilterScan - +-column_list=[$aggregate.$agg1#5, $analytic.$analytic1#4] + +-column_list=[$aggregate.$agg1#6, $analytic.$analytic1#5] +-input_scan= | +-AnalyticScan - | +-column_list=[$aggregate.$agg1#5, 
$analytic.$analytic1#4] + | +-column_list=[$aggregate.$agg1#6, $analytic.$analytic1#5] | +-input_scan= | | +-AggregateScan - | | +-column_list=[$aggregate.$agg1#5] + | | +-column_list=[$aggregate.$agg1#6] | | +-input_scan= | | | +-FilterScan | | | +-column_list=[KeyValue.Key#1] @@ -848,16 +848,16 @@ QueryStmt | | | +-filter_expr= | | | +-Literal(type=BOOL, value=true) | | +-aggregate_list= - | | +-$agg1#5 := + | | +-$agg1#6 := | | +-AggregateFunctionCall(ZetaSQL:sum(INT64) -> INT64) | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) | +-function_group_list= | +-AnalyticFunctionGroup | +-analytic_function_list= - | +-$analytic1#4 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) + | +-$analytic1#5 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) +-filter_expr= +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$aggregate.$agg1#5) + +-ColumnRef(type=INT64, column=$aggregate.$agg1#6) +-Literal(type=INT64, value=1) == @@ -910,24 +910,24 @@ QUALIFY SUM(SUM(key)) OVER () > 1; -- QueryStmt +-output_column_list= -| +-$query.$col1#3 AS `$col1` [INT64] +| +-$query.$col1#5 AS `$col1` [INT64] +-query= +-ProjectScan - +-column_list=[$query.$col1#3] + +-column_list=[$query.$col1#5] +-input_scan= +-FilterScan - +-column_list=[$query.$col1#3, $aggregate.$agg1#4, $analytic.$analytic1#5] + +-column_list=[$query.$col1#5, $aggregate.$agg1#6, $analytic.$analytic1#7] +-input_scan= | +-AnalyticScan - | +-column_list=[$query.$col1#3, $aggregate.$agg1#4, $analytic.$analytic1#5] + | +-column_list=[$query.$col1#5, $aggregate.$agg1#6, $analytic.$analytic1#7] | +-input_scan= | | +-ProjectScan - | | +-column_list=[$query.$col1#3, $aggregate.$agg1#4] + | | +-column_list=[$query.$col1#5, $aggregate.$agg1#6] | | +-expr_list= - | | | +-$col1#3 := Literal(type=INT64, value=1) + | | | +-$col1#5 := Literal(type=INT64, value=1) | | +-input_scan= | | +-AggregateScan - | | +-column_list=[$aggregate.$agg1#4] + | | +-column_list=[$aggregate.$agg1#6] | | 
+-input_scan= | | | +-FilterScan | | | +-column_list=[KeyValue.Key#1] @@ -936,15 +936,15 @@ QueryStmt | | | +-filter_expr= | | | +-Literal(type=BOOL, value=true) | | +-aggregate_list= - | | +-$agg1#4 := + | | +-$agg1#6 := | | +-AggregateFunctionCall(ZetaSQL:sum(INT64) -> INT64) | | +-ColumnRef(type=INT64, column=KeyValue.Key#1) | +-function_group_list= | +-AnalyticFunctionGroup | +-analytic_function_list= - | +-$analytic1#5 := + | +-$analytic1#7 := | +-AnalyticFunctionCall(ZetaSQL:sum(INT64) -> INT64) - | +-ColumnRef(type=INT64, column=$aggregate.$agg1#4) + | +-ColumnRef(type=INT64, column=$aggregate.$agg1#6) | +-window_frame= | +-WindowFrame(frame_unit=ROWS) | +-start_expr= @@ -953,7 +953,7 @@ QueryStmt | +-WindowFrameExpr(boundary_type=UNBOUNDED FOLLOWING) +-filter_expr= +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=$analytic.$analytic1#5) + +-ColumnRef(type=INT64, column=$analytic.$analytic1#7) +-Literal(type=INT64, value=1) == @@ -1674,3 +1674,348 @@ select * QUALIFY(3) ERROR: Query without FROM clause cannot have a QUALIFY clause [at 1:10] select * QUALIFY(3) ^ +== + +# Regression test for b/309029333 +WITH T AS (SELECT 1 AS a) +SELECT * FROM T +QUALIFY RANK() OVER (ORDER BY SUM(a) DESC) > 1 +-- + +ERROR: Star expansion expression references column a which is neither grouped nor aggregated [at 2:8] +SELECT * FROM T + ^ +== + +# Regression test for b/309029333 +WITH T AS (SELECT 1 AS a) +SELECT SUM(a), * FROM T +QUALIFY RANK() OVER (ORDER BY SUM(a) DESC) > 0 +-- + +ERROR: Star expansion expression references column a which is neither grouped nor aggregated [at 2:16] +SELECT SUM(a), * FROM T + ^ +== + +# Regression test for b/309029333 +WITH T AS (SELECT 1 AS a) +SELECT * FROM T +QUALIFY RANK() OVER (ORDER BY a) > COUNT(*); +-- + +ERROR: Star expansion expression references column a which is neither grouped nor aggregated [at 2:8] +SELECT * FROM T + ^ +== + +# Regression test for b/309029333 +WITH T AS (SELECT 1 AS a) 
+SELECT a FROM T +QUALIFY RANK() OVER (ORDER BY a) > COUNT(*); +-- +ERROR: SELECT list expression references column a which is neither grouped nor aggregated [at 2:8] +SELECT a FROM T + ^ + +== + +# Regression test for b/309029333 +WITH T AS (SELECT 1 AS a) +SELECT ROW_NUMBER() OVER (), a FROM T +QUALIFY RANK() OVER (ORDER BY SUM(a) DESC) > 0 + +-- +ERROR: SELECT list expression references column a which is neither grouped nor aggregated [at 2:30] +SELECT ROW_NUMBER() OVER (), a FROM T + ^ +== + +# Regression test for b/309029333 +WITH T AS (SELECT 1 AS a) +SELECT ROW_NUMBER() OVER () AS b, a FROM T +QUALIFY b = 0 AND RANK() OVER (ORDER BY SUM(a) DESC) > 0 +-- +ERROR: SELECT list expression references column a which is neither grouped nor aggregated [at 2:35] +SELECT ROW_NUMBER() OVER () AS b, a FROM T + ^ + +== + +# Tests shape that should continue to work after fixing b/309029333 +WITH T AS (SELECT 1 AS a) +SELECT ROW_NUMBER() OVER () AS b FROM T +QUALIFY b = 0 AND RANK() OVER (ORDER BY SUM(a) DESC) > 0 +-- +QueryStmt ++-output_column_list= +| +-$analytic.b#6 AS b [INT64] ++-query= + +-WithScan + +-column_list=[$analytic.b#6] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="T" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[T.a#1] + | +-expr_list= + | | +-a#1 := Literal(type=INT64, value=1) + | +-input_scan= + | +-SingleRowScan + +-query= + +-ProjectScan + +-column_list=[$analytic.b#6] + +-input_scan= + +-FilterScan + +-column_list=[$aggregate.$agg1#7, $analytic.b#6, $analytic.$analytic2#8] + +-input_scan= + | +-AnalyticScan + | +-column_list=[$aggregate.$agg1#7, $analytic.b#6, $analytic.$analytic2#8] + | +-input_scan= + | | +-AggregateScan + | | +-column_list=[$aggregate.$agg1#7] + | | +-input_scan= + | | | +-WithRefScan(column_list=[T.a#2], with_query_name="T") + | | +-aggregate_list= + | | +-$agg1#7 := + | | +-AggregateFunctionCall(ZetaSQL:sum(INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=T.a#2) + | +-function_group_list= + | 
+-AnalyticFunctionGroup + | | +-analytic_function_list= + | | +-b#6 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) + | +-AnalyticFunctionGroup + | +-order_by= + | | +-WindowOrdering + | | +-order_by_item_list= + | | +-OrderByItem + | | +-column_ref= + | | | +-ColumnRef(type=INT64, column=$aggregate.$agg1#7) + | | +-is_descending=TRUE + | +-analytic_function_list= + | +-$analytic2#8 := AnalyticFunctionCall(ZetaSQL:rank() -> INT64) + +-filter_expr= + +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=$analytic.b#6) + | +-Literal(type=INT64, value=0) + +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) + +-ColumnRef(type=INT64, column=$analytic.$analytic2#8) + +-Literal(type=INT64, value=0) + +== + +# Tests shape that should continue to work after fixing b/309029333 +WITH T AS (SELECT 1 AS a) +SELECT 1 + 1 AS c FROM T +QUALIFY c = 0 AND RANK() OVER (ORDER BY SUM(a) DESC) > 0 +-- +QueryStmt ++-output_column_list= +| +-$query.c#7 AS c [INT64] ++-query= + +-WithScan + +-column_list=[$query.c#7] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="T" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[T.a#1] + | +-expr_list= + | | +-a#1 := Literal(type=INT64, value=1) + | +-input_scan= + | +-SingleRowScan + +-query= + +-ProjectScan + +-column_list=[$query.c#7] + +-input_scan= + +-FilterScan + +-column_list=[$query.c#7, $aggregate.$agg1#8, $analytic.$analytic1#9] + +-input_scan= + | +-AnalyticScan + | +-column_list=[$query.c#7, $aggregate.$agg1#8, $analytic.$analytic1#9] + | +-input_scan= + | | +-ProjectScan + | | +-column_list=[$query.c#7, $aggregate.$agg1#8] + | | +-expr_list= + | | | +-c#7 := + | | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | | | +-Literal(type=INT64, value=1) + | | | +-Literal(type=INT64, value=1) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=[$aggregate.$agg1#8] + | | +-input_scan= + | | | 
+-ProjectScan + | | | +-column_list=[T.a#2, $pre_groupby.c#6] + | | | +-expr_list= + | | | | +-c#6 := + | | | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | | | | +-Literal(type=INT64, value=1) + | | | | +-Literal(type=INT64, value=1) + | | | +-input_scan= + | | | +-WithRefScan(column_list=[T.a#2], with_query_name="T") + | | +-aggregate_list= + | | +-$agg1#8 := + | | +-AggregateFunctionCall(ZetaSQL:sum(INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=T.a#2) + | +-function_group_list= + | +-AnalyticFunctionGroup + | +-order_by= + | | +-WindowOrdering + | | +-order_by_item_list= + | | +-OrderByItem + | | +-column_ref= + | | | +-ColumnRef(type=INT64, column=$aggregate.$agg1#8) + | | +-is_descending=TRUE + | +-analytic_function_list= + | +-$analytic1#9 := AnalyticFunctionCall(ZetaSQL:rank() -> INT64) + +-filter_expr= + +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=$query.c#7) + | +-Literal(type=INT64, value=0) + +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) + +-ColumnRef(type=INT64, column=$analytic.$analytic1#9) + +-Literal(type=INT64, value=0) + +== + +WITH T AS (SELECT 1 AS a, 2 AS b) +SELECT 1 +FROM T +QUALIFY SUM(a) = 10 AND ROW_NUMBER() OVER (ORDER BY SUM(b)) > 0 +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#8 AS `$col1` [INT64] ++-query= + +-WithScan + +-column_list=[$query.$col1#8] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="T" + | +-with_subquery= + | +-ProjectScan + | +-column_list=T.[a#1, b#2] + | +-expr_list= + | | +-a#1 := Literal(type=INT64, value=1) + | | +-b#2 := Literal(type=INT64, value=2) + | +-input_scan= + | +-SingleRowScan + +-query= + +-ProjectScan + +-column_list=[$query.$col1#8] + +-input_scan= + +-FilterScan + +-column_list=[$query.$col1#8, $aggregate.$agg1#9, $aggregate.$agg2#10, $analytic.$analytic1#11] + +-input_scan= + | +-AnalyticScan + | +-column_list=[$query.$col1#8, 
$aggregate.$agg1#9, $aggregate.$agg2#10, $analytic.$analytic1#11] + | +-input_scan= + | | +-ProjectScan + | | +-column_list=[$query.$col1#8, $aggregate.$agg1#9, $aggregate.$agg2#10] + | | +-expr_list= + | | | +-$col1#8 := Literal(type=INT64, value=1) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#9, $agg2#10] + | | +-input_scan= + | | | +-WithRefScan(column_list=T.[a#3, b#4], with_query_name="T") + | | +-aggregate_list= + | | +-$agg1#9 := + | | | +-AggregateFunctionCall(ZetaSQL:sum(INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=T.a#3) + | | +-$agg2#10 := + | | +-AggregateFunctionCall(ZetaSQL:sum(INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=T.b#4) + | +-function_group_list= + | +-AnalyticFunctionGroup + | +-order_by= + | | +-WindowOrdering + | | +-order_by_item_list= + | | +-OrderByItem + | | +-column_ref= + | | +-ColumnRef(type=INT64, column=$aggregate.$agg2#10) + | +-analytic_function_list= + | +-$analytic1#11 := AnalyticFunctionCall(ZetaSQL:row_number() -> INT64) + +-filter_expr= + +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=$aggregate.$agg1#9) + | +-Literal(type=INT64, value=10) + +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) + +-ColumnRef(type=INT64, column=$analytic.$analytic1#11) + +-Literal(type=INT64, value=0) +== + +WITH T AS (SELECT 1 AS a, 2 AS b) +SELECT 1 +FROM T +QUALIFY SUM(a) = 10 AND SUM(SUM(b)) OVER () > 0 +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#8 AS `$col1` [INT64] ++-query= + +-WithScan + +-column_list=[$query.$col1#8] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="T" + | +-with_subquery= + | +-ProjectScan + | +-column_list=T.[a#1, b#2] + | +-expr_list= + | | +-a#1 := Literal(type=INT64, value=1) + | | +-b#2 := Literal(type=INT64, value=2) + | +-input_scan= + | +-SingleRowScan + +-query= + +-ProjectScan + +-column_list=[$query.$col1#8] + +-input_scan= + 
+-FilterScan + +-column_list=[$query.$col1#8, $aggregate.$agg1#9, $aggregate.$agg2#10, $analytic.$analytic1#11] + +-input_scan= + | +-AnalyticScan + | +-column_list=[$query.$col1#8, $aggregate.$agg1#9, $aggregate.$agg2#10, $analytic.$analytic1#11] + | +-input_scan= + | | +-ProjectScan + | | +-column_list=[$query.$col1#8, $aggregate.$agg1#9, $aggregate.$agg2#10] + | | +-expr_list= + | | | +-$col1#8 := Literal(type=INT64, value=1) + | | +-input_scan= + | | +-AggregateScan + | | +-column_list=$aggregate.[$agg1#9, $agg2#10] + | | +-input_scan= + | | | +-WithRefScan(column_list=T.[a#3, b#4], with_query_name="T") + | | +-aggregate_list= + | | +-$agg1#9 := + | | | +-AggregateFunctionCall(ZetaSQL:sum(INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=T.a#3) + | | +-$agg2#10 := + | | +-AggregateFunctionCall(ZetaSQL:sum(INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=T.b#4) + | +-function_group_list= + | +-AnalyticFunctionGroup + | +-analytic_function_list= + | +-$analytic1#11 := + | +-AnalyticFunctionCall(ZetaSQL:sum(INT64) -> INT64) + | +-ColumnRef(type=INT64, column=$aggregate.$agg2#10) + | +-window_frame= + | +-WindowFrame(frame_unit=ROWS) + | +-start_expr= + | | +-WindowFrameExpr(boundary_type=UNBOUNDED PRECEDING) + | +-end_expr= + | +-WindowFrameExpr(boundary_type=UNBOUNDED FOLLOWING) + +-filter_expr= + +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=$aggregate.$agg1#9) + | +-Literal(type=INT64, value=10) + +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) + +-ColumnRef(type=INT64, column=$analytic.$analytic1#11) + +-Literal(type=INT64, value=0) diff --git a/zetasql/analyzer/testdata/recursive_views.test b/zetasql/analyzer/testdata/recursive_views.test index db2e5bb03..8360ab33b 100644 --- a/zetasql/analyzer/testdata/recursive_views.test +++ b/zetasql/analyzer/testdata/recursive_views.test @@ -475,9 +475,10 @@ CreateMaterializedViewStmt | | | 
+-right_scan= | | | | +-RecursiveRefScan(column_list=[$view.n#8]) | | | +-join_expr= -| | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) -| | | +-ColumnRef(type=INT64, column=x-y.z.n#7) -| | | +-ColumnRef(type=INT64, column=$view.n#8) +| | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) +| | | | +-ColumnRef(type=INT64, column=x-y.z.n#7) +| | | | +-ColumnRef(type=INT64, column=$view.n#8) +| | | +-has_using=TRUE | | +-recursive=TRUE | +-output_column_list=[x-y.z.n#7] +-sql="SELECT 1 AS n\n UNION ALL (\n WITH RECURSIVE `x-y.z` AS (\n SELECT 1 AS n\n UNION ALL\n SELECT n + 1 FROM `x-y.z`\n ) SELECT * FROM `x-y.z` INNER JOIN `a-b.c`.d USING (n)\n )" diff --git a/zetasql/analyzer/testdata/select_dotstar.test b/zetasql/analyzer/testdata/select_dotstar.test index 6abb1d230..fbb13bbec 100644 --- a/zetasql/analyzer/testdata/select_dotstar.test +++ b/zetasql/analyzer/testdata/select_dotstar.test @@ -3086,9 +3086,9 @@ QueryStmt SELECT STRUCT(ANY_VALUE(t).*) FROM (SELECT 1 a, 2 b) as t; -- -ERROR: Syntax error: Expected ")" or "," but got "." [at 1:27] +ERROR: Syntax error: Unexpected "*" [at 1:28] SELECT STRUCT(ANY_VALUE(t).*) - ^ + ^ == SELECT diff --git a/zetasql/analyzer/testdata/sql_builder_group_by_all.test b/zetasql/analyzer/testdata/sql_builder_group_by_all.test index 92481a47b..9d6273ea9 100644 --- a/zetasql/analyzer/testdata/sql_builder_group_by_all.test +++ b/zetasql/analyzer/testdata/sql_builder_group_by_all.test @@ -468,3 +468,23 @@ FROM ) AS aggregatescan_8; == +# Repro for b/323439034: When SELECT clause does not contain aggregate column +# and no grouping keys are chosen, it should produce AggregateScan with no +# aggregate_list and no group_by_list. 
+SELECT 'a' AS x +FROM KeyValue +GROUP BY ALL +-- +[UNPARSED_SQL] +SELECT + "a" AS x +FROM + ( + SELECT + NULL + FROM + KeyValue + GROUP BY ALL + ) AS aggregatescan_1; +== + diff --git a/zetasql/analyzer/testdata/sql_builder_set_operation_left.test b/zetasql/analyzer/testdata/sql_builder_set_operation_left.test index 786b6b366..5d3a24cb5 100644 --- a/zetasql/analyzer/testdata/sql_builder_set_operation_left.test +++ b/zetasql/analyzer/testdata/sql_builder_set_operation_left.test @@ -1,4 +1,4 @@ -[default language_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] +[default language_features=V_1_4_CORRESPONDING_FULL] [default no_show_resolved_ast] [default show_unparsed] @@ -101,13 +101,17 @@ QueryStmt +-input_item_list= | +-SetOperationItem | | +-scan= - | | | +-AggregateScan - | | | +-column_list=$distinct.[int32#19, int64#20] + | | | +-ProjectScan + | | | +-column_list=$distinct.[int32#19, int64#20, int32#19] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | | | +-group_by_list= - | | | +-int32#19 := ColumnRef(type=INT32, column=SimpleTypes.int32#1) - | | | +-int64#20 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) + | | | +-AggregateScan + | | | +-column_list=$distinct.[int32#19, int64#20] + | | | +-input_scan= + | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) + | | | +-group_by_list= + | | | +-int32#19 := ColumnRef(type=INT32, column=SimpleTypes.int32#1) + | | | +-int64#20 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) | | +-output_column_list=$distinct.[int32#19, int64#20, int32#19] | +-SetOperationItem | +-scan= @@ -181,13 +185,17 @@ QueryStmt +-input_item_list= | +-SetOperationItem | | +-scan= - | | | +-AggregateScan - | | | +-column_list=$distinct.[int32#19, int64#20] + | | | +-ProjectScan + | | | 
+-column_list=$distinct.[int32#19, int64#20, int32#19] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | | | +-group_by_list= - | | | +-int32#19 := ColumnRef(type=INT32, column=SimpleTypes.int32#1) - | | | +-int64#20 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) + | | | +-AggregateScan + | | | +-column_list=$distinct.[int32#19, int64#20] + | | | +-input_scan= + | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) + | | | +-group_by_list= + | | | +-int32#19 := ColumnRef(type=INT32, column=SimpleTypes.int32#1) + | | | +-int64#20 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) | | +-output_column_list=$distinct.[int32#19, int64#20, int32#19] | +-SetOperationItem | +-scan= @@ -264,13 +272,17 @@ QueryStmt +-input_item_list= | +-SetOperationItem | | +-scan= - | | | +-AggregateScan - | | | +-column_list=$distinct.[int32#19, int64#20] + | | | +-ProjectScan + | | | +-column_list=$distinct.[int32#19, int64#20, int32#19] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | | | +-group_by_list= - | | | +-int32#19 := ColumnRef(type=INT32, column=SimpleTypes.int32#1) - | | | +-int64#20 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) + | | | +-AggregateScan + | | | +-column_list=$distinct.[int32#19, int64#20] + | | | +-input_scan= + | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) + | | | +-group_by_list= + | | | +-int32#19 := ColumnRef(type=INT32, column=SimpleTypes.int32#1) + | | | +-int64#20 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) | | +-output_column_list=$distinct.[int32#19, int64#20, int32#19] | +-SetOperationItem | +-scan= @@ -880,16 +892,20 @@ 
QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=[SimpleTypes.double#27, SimpleTypes.int64#20, $union_all2_cast.int32#40] - | | +-expr_list= - | | | +-int32#40 := - | | | +-Cast(INT32 -> INT64) - | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) + | | +-column_list=[$union_all2_cast.int32#40, SimpleTypes.double#27] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[double#27, int64#20, int32#19] + | | +-column_list=[SimpleTypes.double#27, SimpleTypes.int64#20, $union_all2_cast.int32#40] + | | +-expr_list= + | | | +-int32#40 := + | | | +-Cast(INT32 -> INT64) + | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, double#27], table=SimpleTypes, column_index_list=[0, 1, 8]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[double#27, int64#20, int32#19] + | | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20, double#27], table=SimpleTypes, column_index_list=[0, 1, 8]) | +-output_column_list=[$union_all2_cast.int32#40, SimpleTypes.double#27] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT @@ -964,13 +980,17 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$union_all2.[b#3, c#4, a#5] - | | +-expr_list= - | | | +-b#3 := Literal(type=INT64, value=NULL) - | | | +-c#4 := Literal(type=INT64, value=NULL) - | | | +-a#5 := Literal(type=INT64, value=NULL) + | | +-column_list=$union_all2.[a#5, b#3] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$union_all2.[b#3, c#4, a#5] + | | +-expr_list= + | | | +-b#3 := Literal(type=INT64, value=NULL) + | | | +-c#4 := Literal(type=INT64, value=NULL) + | | | +-a#5 := Literal(type=INT64, value=NULL) + | | +-input_scan= + | | +-SingleRowScan | 
+-output_column_list=$union_all2.[a#5, b#3] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT @@ -1112,13 +1132,17 @@ QueryStmt | | +-output_column_list=SimpleTypes.[int32#1, int32#1] | +-SetOperationItem | +-scan= - | | +-AggregateScan - | | +-column_list=$distinct.[int32#37, int64#38] + | | +-ProjectScan + | | +-column_list=$distinct.[int32#37, int32#37] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) - | | +-group_by_list= - | | +-int32#37 := ColumnRef(type=INT32, column=SimpleTypes.int32#19) - | | +-int64#38 := ColumnRef(type=INT64, column=SimpleTypes.int64#20) + | | +-AggregateScan + | | +-column_list=$distinct.[int32#37, int64#38] + | | +-input_scan= + | | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) + | | +-group_by_list= + | | +-int32#37 := ColumnRef(type=INT32, column=SimpleTypes.int32#19) + | | +-int64#38 := ColumnRef(type=INT64, column=SimpleTypes.int64#20) | +-output_column_list=$distinct.[int32#37, int32#37] +-column_match_mode=CORRESPONDING +-column_propagation_mode=LEFT diff --git a/zetasql/analyzer/testdata/sql_builder_set_operation_strict.test b/zetasql/analyzer/testdata/sql_builder_set_operation_strict.test index b11697c27..70ed830d1 100644 --- a/zetasql/analyzer/testdata/sql_builder_set_operation_strict.test +++ b/zetasql/analyzer/testdata/sql_builder_set_operation_strict.test @@ -1,4 +1,4 @@ -[default language_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] +[default language_features=V_1_4_CORRESPONDING_FULL] [default no_show_resolved_ast] [default show_unparsed] @@ -109,9 +109,13 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[int32#21, int64#22, int64#22] + | | +-column_list=SimpleTypes.[int64#22, int64#22, int32#21] + | | 
+-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#21, int64#22], table=SimpleTypes, column_index_list=[0, 1]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[int32#21, int64#22, int64#22] + | | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[int32#21, int64#22], table=SimpleTypes, column_index_list=[0, 1]) | +-output_column_list=SimpleTypes.[int64#22, int64#22, int32#21] +-column_match_mode=CORRESPONDING @@ -234,13 +238,17 @@ QueryStmt | | +-output_column_list=SimpleTypes.[int64#2, int32#1, int32#1] | +-SetOperationItem | +-scan= - | | +-AggregateScan - | | +-column_list=$distinct.[int64#37, int32#38] + | | +-ProjectScan + | | +-column_list=$distinct.[int64#37, int32#38, int32#38] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) - | | +-group_by_list= - | | +-int64#37 := ColumnRef(type=INT64, column=SimpleTypes.int64#20) - | | +-int32#38 := ColumnRef(type=INT32, column=SimpleTypes.int32#19) + | | +-AggregateScan + | | +-column_list=$distinct.[int64#37, int32#38] + | | +-input_scan= + | | | +-TableScan(column_list=SimpleTypes.[int32#19, int64#20], table=SimpleTypes, column_index_list=[0, 1]) + | | +-group_by_list= + | | +-int64#37 := ColumnRef(type=INT64, column=SimpleTypes.int64#20) + | | +-int32#38 := ColumnRef(type=INT32, column=SimpleTypes.int32#19) | +-output_column_list=$distinct.[int64#37, int32#38, int32#38] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER @@ -655,16 +663,20 @@ QueryStmt | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=[SimpleTypes.double#27, $union_all2_cast.int32#40] - | | +-expr_list= - | | | +-int32#40 := - | | | +-Cast(INT32 -> INT64) - | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) + | | +-column_list=[$union_all2_cast.int32#40, SimpleTypes.double#27] + | | 
+-node_source="resolver_set_operation_corresponding" | | +-input_scan= | | +-ProjectScan - | | +-column_list=SimpleTypes.[double#27, int32#19] + | | +-column_list=[SimpleTypes.double#27, $union_all2_cast.int32#40] + | | +-expr_list= + | | | +-int32#40 := + | | | +-Cast(INT32 -> INT64) + | | | +-ColumnRef(type=INT32, column=SimpleTypes.int32#19) | | +-input_scan= - | | +-TableScan(column_list=SimpleTypes.[int32#19, double#27], table=SimpleTypes, column_index_list=[0, 8]) + | | +-ProjectScan + | | +-column_list=SimpleTypes.[double#27, int32#19] + | | +-input_scan= + | | +-TableScan(column_list=SimpleTypes.[int32#19, double#27], table=SimpleTypes, column_index_list=[0, 8]) | +-output_column_list=[$union_all2_cast.int32#40, SimpleTypes.double#27] +-column_match_mode=CORRESPONDING diff --git a/zetasql/analyzer/testdata/sql_table_function_inlining.test b/zetasql/analyzer/testdata/sql_table_function_inlining.test index 01e1f9bd1..3f4cc13e0 100644 --- a/zetasql/analyzer/testdata/sql_table_function_inlining.test +++ b/zetasql/analyzer/testdata/sql_table_function_inlining.test @@ -3828,11 +3828,12 @@ QueryStmt +-right_scan= | +-WithRefScan(column_list=arg_table.[key#6, value#7], with_query_name="arg_table") +-join_expr= - +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=JoinsTableArgToScannedTable.key#4) - | +-ColumnRef(type=INT64, column=arg_table.key#6) - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-ColumnRef(type=INT64, column=JoinsTableArgToScannedTable.value#5) - +-ColumnRef(type=INT64, column=arg_table.value#7) + | +-FunctionCall(ZetaSQL:$and(BOOL, repeated(1) BOOL) -> BOOL) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=JoinsTableArgToScannedTable.key#4) + | | +-ColumnRef(type=INT64, column=arg_table.key#6) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, 
column=JoinsTableArgToScannedTable.value#5) + | +-ColumnRef(type=INT64, column=arg_table.value#7) + +-has_using=TRUE diff --git a/zetasql/analyzer/testdata/sqlbuilder_corresponding.test b/zetasql/analyzer/testdata/sqlbuilder_corresponding.test index 14129d797..58feeea12 100644 --- a/zetasql/analyzer/testdata/sqlbuilder_corresponding.test +++ b/zetasql/analyzer/testdata/sqlbuilder_corresponding.test @@ -12935,30 +12935,38 @@ QueryStmt +-input_item_list= | +-SetOperationItem | | +-scan= - | | | +-AggregateScan - | | | +-column_list=$distinct.[a#3, b#4] + | | | +-ProjectScan + | | | +-column_list=$distinct.[a#3, b#4, a#3] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=$subquery1.[a#1, b#2] - | | | | +-expr_list= - | | | | | +-a#1 := Literal(type=INT64, value=1) - | | | | | +-b#2 := Literal(type=INT64, value=2) - | | | | +-input_scan= - | | | | +-SingleRowScan - | | | +-group_by_list= - | | | +-a#3 := ColumnRef(type=INT64, column=$subquery1.a#1) - | | | +-b#4 := ColumnRef(type=INT64, column=$subquery1.b#2) + | | | +-AggregateScan + | | | +-column_list=$distinct.[a#3, b#4] + | | | +-input_scan= + | | | | +-ProjectScan + | | | | +-column_list=$subquery1.[a#1, b#2] + | | | | +-expr_list= + | | | | | +-a#1 := Literal(type=INT64, value=1) + | | | | | +-b#2 := Literal(type=INT64, value=2) + | | | | +-input_scan= + | | | | +-SingleRowScan + | | | +-group_by_list= + | | | +-a#3 := ColumnRef(type=INT64, column=$subquery1.a#1) + | | | +-b#4 := ColumnRef(type=INT64, column=$subquery1.b#2) | | +-output_column_list=$distinct.[a#3, b#4, a#3] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$union_all2.[col3#5, col1#6, col2#7] - | | +-expr_list= - | | | +-col3#5 := Literal(type=INT64, value=1) - | | | +-col1#6 := Literal(type=INT64, value=1) - | | | +-col2#7 := Literal(type=INT64, value=2) + | | +-column_list=$union_all2.[col1#6, col2#7, col3#5] + | | 
+-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$union_all2.[col3#5, col1#6, col2#7] + | | +-expr_list= + | | | +-col3#5 := Literal(type=INT64, value=1) + | | | +-col1#6 := Literal(type=INT64, value=1) + | | | +-col2#7 := Literal(type=INT64, value=2) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=$union_all2.[col1#6, col2#7, col3#5] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER @@ -13003,30 +13011,38 @@ QueryStmt +-input_item_list= | +-SetOperationItem | | +-scan= - | | | +-AggregateScan - | | | +-column_list=$distinct.[a#3, b#4] + | | | +-ProjectScan + | | | +-column_list=$distinct.[a#3, b#4, a#3] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=$subquery1.[a#1, b#2] - | | | | +-expr_list= - | | | | | +-a#1 := Literal(type=INT64, value=1) - | | | | | +-b#2 := Literal(type=INT64, value=2) - | | | | +-input_scan= - | | | | +-SingleRowScan - | | | +-group_by_list= - | | | +-a#3 := ColumnRef(type=INT64, column=$subquery1.a#1) - | | | +-b#4 := ColumnRef(type=INT64, column=$subquery1.b#2) + | | | +-AggregateScan + | | | +-column_list=$distinct.[a#3, b#4] + | | | +-input_scan= + | | | | +-ProjectScan + | | | | +-column_list=$subquery1.[a#1, b#2] + | | | | +-expr_list= + | | | | | +-a#1 := Literal(type=INT64, value=1) + | | | | | +-b#2 := Literal(type=INT64, value=2) + | | | | +-input_scan= + | | | | +-SingleRowScan + | | | +-group_by_list= + | | | +-a#3 := ColumnRef(type=INT64, column=$subquery1.a#1) + | | | +-b#4 := ColumnRef(type=INT64, column=$subquery1.b#2) | | +-output_column_list=$distinct.[a#3, b#4, a#3] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$union_distinct2.[col3#5, col1#6, col2#7] - | | +-expr_list= - | | | +-col3#5 := Literal(type=INT64, value=1) - | | | +-col1#6 := Literal(type=INT64, value=1) - | | | +-col2#7 := 
Literal(type=INT64, value=2) + | | +-column_list=$union_distinct2.[col1#6, col2#7, col3#5] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$union_distinct2.[col3#5, col1#6, col2#7] + | | +-expr_list= + | | | +-col3#5 := Literal(type=INT64, value=1) + | | | +-col1#6 := Literal(type=INT64, value=1) + | | | +-col2#7 := Literal(type=INT64, value=2) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=$union_distinct2.[col1#6, col2#7, col3#5] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER @@ -13071,30 +13087,38 @@ QueryStmt +-input_item_list= | +-SetOperationItem | | +-scan= - | | | +-AggregateScan - | | | +-column_list=$distinct.[a#3, b#4] + | | | +-ProjectScan + | | | +-column_list=$distinct.[a#3, b#4, a#3] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=$subquery1.[a#1, b#2] - | | | | +-expr_list= - | | | | | +-a#1 := Literal(type=INT64, value=1) - | | | | | +-b#2 := Literal(type=INT64, value=2) - | | | | +-input_scan= - | | | | +-SingleRowScan - | | | +-group_by_list= - | | | +-a#3 := ColumnRef(type=INT64, column=$subquery1.a#1) - | | | +-b#4 := ColumnRef(type=INT64, column=$subquery1.b#2) + | | | +-AggregateScan + | | | +-column_list=$distinct.[a#3, b#4] + | | | +-input_scan= + | | | | +-ProjectScan + | | | | +-column_list=$subquery1.[a#1, b#2] + | | | | +-expr_list= + | | | | | +-a#1 := Literal(type=INT64, value=1) + | | | | | +-b#2 := Literal(type=INT64, value=2) + | | | | +-input_scan= + | | | | +-SingleRowScan + | | | +-group_by_list= + | | | +-a#3 := ColumnRef(type=INT64, column=$subquery1.a#1) + | | | +-b#4 := ColumnRef(type=INT64, column=$subquery1.b#2) | | +-output_column_list=$distinct.[a#3, b#4, a#3] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$except_all2.[col3#5, col1#6, col2#7] - | | +-expr_list= - | | | +-col3#5 := 
Literal(type=INT64, value=1) - | | | +-col1#6 := Literal(type=INT64, value=1) - | | | +-col2#7 := Literal(type=INT64, value=2) + | | +-column_list=$except_all2.[col1#6, col2#7, col3#5] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$except_all2.[col3#5, col1#6, col2#7] + | | +-expr_list= + | | | +-col3#5 := Literal(type=INT64, value=1) + | | | +-col1#6 := Literal(type=INT64, value=1) + | | | +-col2#7 := Literal(type=INT64, value=2) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=$except_all2.[col1#6, col2#7, col3#5] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER @@ -13139,30 +13163,38 @@ QueryStmt +-input_item_list= | +-SetOperationItem | | +-scan= - | | | +-AggregateScan - | | | +-column_list=$distinct.[a#3, b#4] + | | | +-ProjectScan + | | | +-column_list=$distinct.[a#3, b#4, a#3] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=$subquery1.[a#1, b#2] - | | | | +-expr_list= - | | | | | +-a#1 := Literal(type=INT64, value=1) - | | | | | +-b#2 := Literal(type=INT64, value=2) - | | | | +-input_scan= - | | | | +-SingleRowScan - | | | +-group_by_list= - | | | +-a#3 := ColumnRef(type=INT64, column=$subquery1.a#1) - | | | +-b#4 := ColumnRef(type=INT64, column=$subquery1.b#2) + | | | +-AggregateScan + | | | +-column_list=$distinct.[a#3, b#4] + | | | +-input_scan= + | | | | +-ProjectScan + | | | | +-column_list=$subquery1.[a#1, b#2] + | | | | +-expr_list= + | | | | | +-a#1 := Literal(type=INT64, value=1) + | | | | | +-b#2 := Literal(type=INT64, value=2) + | | | | +-input_scan= + | | | | +-SingleRowScan + | | | +-group_by_list= + | | | +-a#3 := ColumnRef(type=INT64, column=$subquery1.a#1) + | | | +-b#4 := ColumnRef(type=INT64, column=$subquery1.b#2) | | +-output_column_list=$distinct.[a#3, b#4, a#3] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | 
+-column_list=$except_distinct2.[col3#5, col1#6, col2#7] - | | +-expr_list= - | | | +-col3#5 := Literal(type=INT64, value=1) - | | | +-col1#6 := Literal(type=INT64, value=1) - | | | +-col2#7 := Literal(type=INT64, value=2) + | | +-column_list=$except_distinct2.[col1#6, col2#7, col3#5] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$except_distinct2.[col3#5, col1#6, col2#7] + | | +-expr_list= + | | | +-col3#5 := Literal(type=INT64, value=1) + | | | +-col1#6 := Literal(type=INT64, value=1) + | | | +-col2#7 := Literal(type=INT64, value=2) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=$except_distinct2.[col1#6, col2#7, col3#5] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER @@ -13207,30 +13239,38 @@ QueryStmt +-input_item_list= | +-SetOperationItem | | +-scan= - | | | +-AggregateScan - | | | +-column_list=$distinct.[a#3, b#4] + | | | +-ProjectScan + | | | +-column_list=$distinct.[a#3, b#4, a#3] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=$subquery1.[a#1, b#2] - | | | | +-expr_list= - | | | | | +-a#1 := Literal(type=INT64, value=1) - | | | | | +-b#2 := Literal(type=INT64, value=2) - | | | | +-input_scan= - | | | | +-SingleRowScan - | | | +-group_by_list= - | | | +-a#3 := ColumnRef(type=INT64, column=$subquery1.a#1) - | | | +-b#4 := ColumnRef(type=INT64, column=$subquery1.b#2) + | | | +-AggregateScan + | | | +-column_list=$distinct.[a#3, b#4] + | | | +-input_scan= + | | | | +-ProjectScan + | | | | +-column_list=$subquery1.[a#1, b#2] + | | | | +-expr_list= + | | | | | +-a#1 := Literal(type=INT64, value=1) + | | | | | +-b#2 := Literal(type=INT64, value=2) + | | | | +-input_scan= + | | | | +-SingleRowScan + | | | +-group_by_list= + | | | +-a#3 := ColumnRef(type=INT64, column=$subquery1.a#1) + | | | +-b#4 := ColumnRef(type=INT64, column=$subquery1.b#2) | | 
+-output_column_list=$distinct.[a#3, b#4, a#3] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$intersect_all2.[col3#5, col1#6, col2#7] - | | +-expr_list= - | | | +-col3#5 := Literal(type=INT64, value=1) - | | | +-col1#6 := Literal(type=INT64, value=1) - | | | +-col2#7 := Literal(type=INT64, value=2) + | | +-column_list=$intersect_all2.[col1#6, col2#7, col3#5] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$intersect_all2.[col3#5, col1#6, col2#7] + | | +-expr_list= + | | | +-col3#5 := Literal(type=INT64, value=1) + | | | +-col1#6 := Literal(type=INT64, value=1) + | | | +-col2#7 := Literal(type=INT64, value=2) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=$intersect_all2.[col1#6, col2#7, col3#5] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER @@ -13275,30 +13315,38 @@ QueryStmt +-input_item_list= | +-SetOperationItem | | +-scan= - | | | +-AggregateScan - | | | +-column_list=$distinct.[a#3, b#4] + | | | +-ProjectScan + | | | +-column_list=$distinct.[a#3, b#4, a#3] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | | +-ProjectScan - | | | | +-column_list=$subquery1.[a#1, b#2] - | | | | +-expr_list= - | | | | | +-a#1 := Literal(type=INT64, value=1) - | | | | | +-b#2 := Literal(type=INT64, value=2) - | | | | +-input_scan= - | | | | +-SingleRowScan - | | | +-group_by_list= - | | | +-a#3 := ColumnRef(type=INT64, column=$subquery1.a#1) - | | | +-b#4 := ColumnRef(type=INT64, column=$subquery1.b#2) + | | | +-AggregateScan + | | | +-column_list=$distinct.[a#3, b#4] + | | | +-input_scan= + | | | | +-ProjectScan + | | | | +-column_list=$subquery1.[a#1, b#2] + | | | | +-expr_list= + | | | | | +-a#1 := Literal(type=INT64, value=1) + | | | | | +-b#2 := Literal(type=INT64, value=2) + | | | | +-input_scan= + | | | | +-SingleRowScan + | | | +-group_by_list= + | | | +-a#3 := ColumnRef(type=INT64, 
column=$subquery1.a#1) + | | | +-b#4 := ColumnRef(type=INT64, column=$subquery1.b#2) | | +-output_column_list=$distinct.[a#3, b#4, a#3] | +-SetOperationItem | +-scan= | | +-ProjectScan - | | +-column_list=$intersect_distinct2.[col3#5, col1#6, col2#7] - | | +-expr_list= - | | | +-col3#5 := Literal(type=INT64, value=1) - | | | +-col1#6 := Literal(type=INT64, value=1) - | | | +-col2#7 := Literal(type=INT64, value=2) + | | +-column_list=$intersect_distinct2.[col1#6, col2#7, col3#5] + | | +-node_source="resolver_set_operation_corresponding" | | +-input_scan= - | | +-SingleRowScan + | | +-ProjectScan + | | +-column_list=$intersect_distinct2.[col3#5, col1#6, col2#7] + | | +-expr_list= + | | | +-col3#5 := Literal(type=INT64, value=1) + | | | +-col1#6 := Literal(type=INT64, value=1) + | | | +-col2#7 := Literal(type=INT64, value=2) + | | +-input_scan= + | | +-SingleRowScan | +-output_column_list=$intersect_distinct2.[col1#6, col2#7, col3#5] +-column_match_mode=CORRESPONDING +-column_propagation_mode=INNER @@ -13745,13 +13793,17 @@ QueryStmt +-input_item_list= | +-SetOperationItem | | +-scan= - | | | +-AggregateScan - | | | +-column_list=$distinct.[int64#19, int32#20] + | | | +-ProjectScan + | | | +-column_list=$distinct.[int32#20, int64#19] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | | | +-group_by_list= - | | | +-int64#19 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) - | | | +-int32#20 := ColumnRef(type=INT32, column=SimpleTypes.int32#1) + | | | +-AggregateScan + | | | +-column_list=$distinct.[int64#19, int32#20] + | | | +-input_scan= + | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) + | | | +-group_by_list= + | | | +-int64#19 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) + | | | +-int32#20 := ColumnRef(type=INT32, column=SimpleTypes.int32#1) | | 
+-output_column_list=$distinct.[int32#20, int64#19] | +-SetOperationItem | +-scan= @@ -13815,13 +13867,17 @@ QueryStmt +-input_item_list= | +-SetOperationItem | | +-scan= - | | | +-AggregateScan - | | | +-column_list=$distinct.[int64#19, int32#20] + | | | +-ProjectScan + | | | +-column_list=$distinct.[int32#20, int64#19] + | | | +-node_source="resolver_set_operation_corresponding" | | | +-input_scan= - | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) - | | | +-group_by_list= - | | | +-int64#19 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) - | | | +-int32#20 := ColumnRef(type=INT32, column=SimpleTypes.int32#1) + | | | +-AggregateScan + | | | +-column_list=$distinct.[int64#19, int32#20] + | | | +-input_scan= + | | | | +-TableScan(column_list=SimpleTypes.[int32#1, int64#2], table=SimpleTypes, column_index_list=[0, 1]) + | | | +-group_by_list= + | | | +-int64#19 := ColumnRef(type=INT64, column=SimpleTypes.int64#2) + | | | +-int32#20 := ColumnRef(type=INT32, column=SimpleTypes.int32#1) | | +-output_column_list=$distinct.[int32#20, int64#19] | +-SetOperationItem | +-scan= diff --git a/zetasql/analyzer/testdata/standalone_expr.test b/zetasql/analyzer/testdata/standalone_expr.test index 2e1a85a00..c483c07da 100644 --- a/zetasql/analyzer/testdata/standalone_expr.test +++ b/zetasql/analyzer/testdata/standalone_expr.test @@ -430,16 +430,16 @@ KeyValue.Key # of the SELECT list itself) and because it outputs multiple columns. (select 1 x).* -- -ERROR: Syntax error: Expected end of input but got "." [at 1:13] +ERROR: Syntax error: Unexpected "*" [at 1:14] (select 1 x).* - ^ + ^ == @test_param_proto.* -- -ERROR: Syntax error: Expected end of input but got "." [at 1:18] +ERROR: Syntax error: Unexpected "*" [at 1:19] @test_param_proto.* - ^ + ^ == [parameter_mode=positional] @@ -447,9 +447,9 @@ ERROR: Syntax error: Expected end of input but got "." 
[at 1:18] ?.* -- -ERROR: Syntax error: Expected end of input but got "." [at 1:2] +ERROR: Syntax error: Unexpected "*" [at 1:3] ?.* - ^ + ^ == * diff --git a/zetasql/analyzer/testdata/struct_braced_constructors.test b/zetasql/analyzer/testdata/struct_braced_constructors.test new file mode 100644 index 000000000..c46d7dfe4 --- /dev/null +++ b/zetasql/analyzer/testdata/struct_braced_constructors.test @@ -0,0 +1,1667 @@ +[default language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,V_1_1_CAST_DIFFERENT_ARRAY_TYPES,V_1_2_PROTO_EXTENSIONS_WITH_NEW,V_1_1_WITH_ON_SUBQUERY,V_1_3_WITH_RECURSIVE,V_1_4_STRUCT_BRACED_CONSTRUCTORS] + +[language_features=] +SELECT STRUCT {} +-- +ERROR: Braced constructors are not supported [at 1:15] +SELECT STRUCT {} + ^ +== + +# Error +SELECT {} +-- +ERROR: Unable to infer a type for braced constructor [at 1:8] +SELECT {} + ^ +== + + +[show_unparsed] +SELECT STRUCT {} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT<>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT<>, value={}) + +-input_scan= + +-SingleRowScan + +[UNPARSED_SQL] +SELECT + STRUCT< > () AS a_1; +== + +# Error using braced constructors without an inferred type. +SELECT { abc: 1 } +-- +ERROR: Unable to infer a type for braced constructor [at 1:8] +SELECT { abc: 1 } + ^ +== + +# Match examples from the designdoc: http://shortn/_OYQOTcX2AM +# Pattern 1: explicit type +# Positions and names matched +SELECT STRUCT{a:1, b:"foo"} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT, value={a:1, b:"foo"}, has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +# Positions and names matched, INT64 coerces to DOUBLE. 
+SELECT STRUCT{a:1, b:"foo"}
+--
+QueryStmt
++-output_column_list=
+| +-$query.$col1#1 AS `$col1` [STRUCT]
++-query=
+ +-ProjectScan
+ +-column_list=[$query.$col1#1]
+ +-expr_list=
+ | +-$col1#1 := Literal(type=STRUCT, value={a:1, b:"foo"}, has_explicit_type=TRUE)
+ +-input_scan=
+ +-SingleRowScan
+==
+
+# Error: Positions mismatched, names are matched.
+SELECT STRUCT{b:"foo", a:1}
+--
+ERROR: Require naming match but field name does not match at position 0: 'a' vs 'b' [at 1:34]
+SELECT STRUCT{b:"foo", a:1}
+ ^
+==
+
+# Error: Positions mismatched, skipping "a".
+SELECT STRUCT{b:"foo"}
+--
+ERROR: STRUCT type has 2 fields but constructor call has 1 fields [at 1:8]
+SELECT STRUCT{b:"foo"}
+ ^
+==
+
+# Error: Positions are matched, names mismatched.
+SELECT STRUCT{b:1}
+--
+ERROR: Require naming match but field name does not match at position 0: 'a' vs 'b' [at 1:24]
+SELECT STRUCT{b:1}
+ ^
+==
+
+# Error: Positions are matched, names missing.
+SELECT STRUCT{a:1}
+--
+ERROR: Require naming match but field name does not match at position 0: '' vs 'a' [at 1:22]
+SELECT STRUCT{a:1}
+ ^
+==
+
+# Pattern 2: Bare STRUCT
+# struct_system_variable type: STRUCT>
+# OK
+SET @@struct_system_variable = STRUCT{c:1, d: {a:2, b:'foo'}}
+--
+AssignmentStmt
++-target=
+| +-SystemVariable(struct_system_variable, type=STRUCT>)
++-expr=
+ +-Literal(type=STRUCT>, value={c:1, d:{a:2, b:"foo"}})
+==
+
+# struct_system_variable type: STRUCT>
+# Error: Setting struct system variables with incompatible type.
+SET @@struct_system_variable = STRUCT{c:1, d: 2} +-- +ERROR: Expected type STRUCT>; found STRUCT [at 1:32] +SET @@struct_system_variable = STRUCT{c:1, d: 2} + ^ +== + +# Pattern 3: Bare braces + +# Error: no expected type +SELECT {b: "foo"} +-- +ERROR: Unable to infer a type for braced constructor [at 1:8] +SELECT {b: "foo"} + ^ +== + +# struct_system_variable type: STRUCT> +# OK +SET @@struct_system_variable = {c:1, d: {a:2, b:'foo'}} +-- +AssignmentStmt ++-target= +| +-SystemVariable(struct_system_variable, type=STRUCT>) ++-expr= + +-Literal(type=STRUCT>, value={c:1, d:{a:2, b:"foo"}}) +== + +# struct_system_variable type: STRUCT> +# Error: Name mismatch +SET @@struct_system_variable = {c:1, d: {a:2, e:'foo'}} +-- +ERROR: Require naming match but field name does not match at position 1: 'b' vs 'e' [at 1:47] +SET @@struct_system_variable = {c:1, d: {a:2, e:'foo'}} + ^ +== + +# struct_system_variable type: STRUCT> +# Error: Name missing +SET @@struct_system_variable = {c:1} +-- +ERROR: Require naming match but field num does not match, expected: 2, actual: 1 [at 1:32] +SET @@struct_system_variable = {c:1} + ^ +== + +# Error: STRUCT should not use proto extension. +SELECT STRUCT {(a.b): 1} +-- +ERROR: STRUCT Braced constructor is not allowed to use proto extension. [at 1:16] +SELECT STRUCT {(a.b): 1} + ^ +== + +# Compatible test. +# New syntax wraps old syntax. 
+SELECT STRUCT { + value: STRUCT ("foo", "bar") +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT>, value={value:{"foo", "bar"}}) + +-input_scan= + +-SingleRowScan +== + +# old syntax wraps new syntax +SELECT STRUCT ( + STRUCT {foo: "foo", bar: "bar"} AS value +) +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT>, value={value:{foo:"foo", bar:"bar"}}) + +-input_scan= + +-SingleRowScan +== + +# Edge case: Duplicated field names are OK. +SELECT STRUCT{a:1, a:"foo"} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT, value={a:1, a:"foo"}, has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +# A simple field + string array example. +SELECT STRUCT { + int32_val2: 5, + str_value: ["abc", "def"] +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT>, value={int32_val2:5, str_value:["abc", "def"]}) + +-input_scan= + +-SingleRowScan +== + +# An integer array example. 
+SELECT STRUCT { + int64_key_1: 1, + int64_key_2: 2, + repeated_int32_val: [1, 2] +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT>, value={int64_key_1:1, int64_key_2:2, repeated_int32_val:[1, 2]}) + +-input_scan= + +-SingleRowScan +== + +SELECT STRUCT { + int64_key_1: 1, + int64_key_2: 2, + nested_value: STRUCT { + nested_int64: 10 + } +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT>, value={int64_key_1:1, int64_key_2:2, nested_value:{nested_int64:10}}) + +-input_scan= + +-SingleRowScan +== + +# Example using expression for the leaf values. +SELECT STRUCT { + int32_val1: coalesce(4), + int32_val2: cast(4 as uint64) +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-MakeStruct + | +-type=STRUCT + | +-field_list= + | +-FunctionCall(ZetaSQL:coalesce(repeated(1) INT64) -> INT64) + | | +-Literal(type=INT64, value=4) + | +-Literal(type=UINT64, value=4, has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +# Example using a sub-query. +SELECT STRUCT { + int32_val1: (SELECT key FROM TestTable) +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#4 AS `$col1` [STRUCT] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#4] + +-expr_list= + | +-$col1#4 := + | +-MakeStruct + | +-type=STRUCT + | +-field_list= + | +-SubqueryExpr + | +-type=INT32 + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[TestTable.key#1] + | +-input_scan= + | +-TableScan(column_list=[TestTable.key#1], table=TestTable, column_index_list=[0]) + +-input_scan= + +-SingleRowScan +== + +# Error:Non-scalar subquery. 
+SELECT STRUCT {
+ int32_val1: (SELECT TestTable.* FROM TestTable)
+}
+--
+ERROR: Scalar subquery cannot have more than one column unless using SELECT AS STRUCT to build STRUCT values [at 2:15]
+ int32_val1: (SELECT TestTable.* FROM TestTable)
+ ^
+==
+
+# Filling STRUCT fields from an external query.
+SELECT STRUCT {int32_val1: t.int32_val1,
+ int32_val2: t.int32_val2}
+from (SELECT 1 int32_val1, 2 int32_val2) t
+--
+QueryStmt
++-output_column_list=
+| +-$query.$col1#3 AS `$col1` [STRUCT]
++-query=
+ +-ProjectScan
+ +-column_list=[$query.$col1#3]
+ +-expr_list=
+ | +-$col1#3 :=
+ | +-MakeStruct
+ | +-type=STRUCT
+ | +-field_list=
+ | +-ColumnRef(type=INT64, column=t.int32_val1#1)
+ | +-ColumnRef(type=INT64, column=t.int32_val2#2)
+ +-input_scan=
+ +-ProjectScan
+ +-column_list=t.[int32_val1#1, int32_val2#2]
+ +-expr_list=
+ | +-int32_val1#1 := Literal(type=INT64, value=1)
+ | +-int32_val2#2 := Literal(type=INT64, value=2)
+ +-input_scan=
+ +-SingleRowScan
+==
+
+# Mixing with aggregation.
+# ANY_VALUE is necessary here because we don't detect that it is the same
+# expression as the one that shows up in GROUP BY.
+SELECT STRUCT { + int64_key_1: ANY_VALUE(KitchenSink.int64_key_1), + int64_key_2: ANY_VALUE(KitchenSink.int64_key_2), + int64_val: count(*), + uint64_val: sum(length(KitchenSink.string_val)), + repeated_string_val: array_agg(KitchenSink.string_val)} +from TestTable +group by KitchenSink.int64_key_1, KitchenSink.int64_key_2 +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#11 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#11] + +-expr_list= + | +-$col1#11 := + | +-MakeStruct + | +-type=STRUCT> + | +-field_list= + | +-ColumnRef(type=INT64, column=$aggregate.$agg1#4) + | +-ColumnRef(type=INT64, column=$aggregate.$agg2#5) + | +-ColumnRef(type=INT64, column=$aggregate.$agg3#6) + | +-ColumnRef(type=INT64, column=$aggregate.$agg4#7) + | +-ColumnRef(type=ARRAY, column=$aggregate.$agg5#8) + +-input_scan= + +-AggregateScan + +-column_list=$aggregate.[$agg1#4, $agg2#5, $agg3#6, $agg4#7, $agg5#8] + +-input_scan= + | +-TableScan(column_list=[TestTable.KitchenSink#3], table=TestTable, column_index_list=[2]) + +-group_by_list= + | +-int64_key_1#9 := + | | +-GetProtoField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-field_descriptor=int64_key_1 + | +-int64_key_2#10 := + | +-GetProtoField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=int64_key_2 + +-aggregate_list= + +-$agg1#4 := + | +-AggregateFunctionCall(ZetaSQL:any_value(INT64) -> INT64) + | +-GetProtoField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=int64_key_1 + +-$agg2#5 := + | +-AggregateFunctionCall(ZetaSQL:any_value(INT64) -> INT64) + | +-GetProtoField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=int64_key_2 + +-$agg3#6 := AggregateFunctionCall(ZetaSQL:$count_star() -> INT64) + +-$agg4#7 := + | 
+-AggregateFunctionCall(ZetaSQL:sum(INT64) -> INT64) + | +-FunctionCall(ZetaSQL:length(STRING) -> INT64) + | +-GetProtoField + | +-type=STRING + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=string_val + | +-default_value="default_name" + +-$agg5#8 := + +-AggregateFunctionCall(ZetaSQL:array_agg(STRING) -> ARRAY) + +-GetProtoField + +-type=STRING + +-expr= + | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + +-field_descriptor=string_val + +-default_value="default_name" +== + +# Mixing with other expressions. +SELECT 1 + STRUCT {int32_val1: 5}.int32_val1 +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [INT64] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | +-Literal(type=INT64, value=1) + | +-GetStructField + | +-type=INT64 + | +-expr= + | | +-Literal(type=STRUCT, value={int32_val1:5}) + | +-field_idx=0 + +-input_scan= + +-SingleRowScan +== + +# Untyped constructor. +# type of TestStruct: STRUCT> +UPDATE SimpleTypesWithStruct SET TestStruct = { + c: 1, + d: { + a: 2, + b: "bar", + } +} +WHERE TRUE; +-- +UpdateStmt ++-table_scan= +| +-TableScan(column_list=[SimpleTypesWithStruct.TestStruct#3], table=SimpleTypesWithStruct, column_index_list=[2]) ++-column_access_list=WRITE ++-where_expr= +| +-Literal(type=BOOL, value=true) ++-update_item_list= + +-UpdateItem + +-target= + | +-ColumnRef(type=STRUCT>, column=SimpleTypesWithStruct.TestStruct#3) + +-set_value= + +-DMLValue + +-value= + +-Literal(type=STRUCT>, value={c:1, d:{a:2, b:"bar"}}) +== + +# Array constructor. 
+SELECT ARRAY>>[{ + str_value: ["foo", "bar"] + }, { + str_value: ["baz"] + }] +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [ARRAY>>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=ARRAY>>, value=[{str_value:["foo", "bar"]}, {str_value:["baz"]}], has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +# Mixed use of struct and proto +# Complex array constructor. +# STRUCT>> +UPDATE StructWithKitchenSinkTable SET t = { + a: 1, + b: [ + { + kitchen_sink: {int64_key_1: 1, int64_key_2: 2} + }, + { + kitchen_sink: {int64_key_1: 3, int64_key_2: 4} + } + ] +} +WHERE TRUE; +-- +UpdateStmt ++-table_scan= +| +-TableScan(column_list=[StructWithKitchenSinkTable.t#3], table=StructWithKitchenSinkTable, column_index_list=[2]) ++-column_access_list=WRITE ++-where_expr= +| +-Literal(type=BOOL, value=true) ++-update_item_list= + +-UpdateItem + +-target= + | +-ColumnRef(type=STRUCT>>>, column=StructWithKitchenSinkTable.t#3) + +-set_value= + +-DMLValue + +-value= + +-MakeStruct + +-type=STRUCT>>> + +-field_list= + +-Literal(type=INT64, value=1) + +-FunctionCall(ZetaSQL:$make_array(repeated(2) STRUCT>) -> ARRAY>>) + +-MakeStruct + | +-type=STRUCT> + | +-field_list= + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + +-MakeStruct + +-type=STRUCT> + +-field_list= + +-MakeProto + +-type=PROTO + +-field_list= + +-int64_key_1 := Literal(type=INT64, value=3) + +-int64_key_2 := Literal(type=INT64, value=4) +== + +# Non-array inferred type for array constructor, inferred type is ignored. +UPDATE TestTable SET KitchenSink = [{}] +WHERE TRUE; +-- +ERROR: Unable to infer a type for braced constructor [at 1:37] +UPDATE TestTable SET KitchenSink = [{}] + ^ +== + +# Nested explicit STRUCT constructor. 
+SELECT STRUCT>{c: 1, d: { a: 1, b:2 }} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT>, value={c:1, d:{a:1, b:2}}, has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +# Nested explicit STRUCT constructor. +# Mixed use of old and new syntax. +SELECT STRUCT>(1, { a: 1, b:2 }) +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT>, value={1, {a:1, b:2}}, has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +# Error: Nested field without explicit STRUCT constructor or STRUCT keyword. +SELECT STRUCT{c: 1, d: { a: 1, b: 2 }} +-- +ERROR: Unable to infer a type for braced constructor [at 1:24] +SELECT STRUCT{c: 1, d: { a: 1, b: 2 }} + ^ +== + +# Error: Nested field should still have colon. +SELECT STRUCT>{c: 1, d { a: 1, b:2 }} +-- +ERROR: Struct field 2 should use colon(:) to separate field and value [at 1:60] +SELECT STRUCT>{c: 1, d { a: 1, b:2 }} + ^ +== + +# Having a trailing comma is fine. +SELECT STRUCT{c: 1, d: STRUCT { a: 1, b: 2 },} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT>, value={c:1, d:{a:1, b:2}}) + +-input_scan= + +-SingleRowScan +== + +# Leading comma is not allowed. +SELECT STRUCT{,} +-- +ERROR: Syntax error: Unexpected "," [at 1:15] +SELECT STRUCT{,} + ^ +== + +# Error: should use comma instead of space for field separation. +SELECT STRUCT{c:1 d:2} +-- +ERROR: STRUCT Braced constructor is not allowed to use pure whitespace separation, please use comma instead [at 1:19] +SELECT STRUCT{c:1 d:2} + ^ +== + +# Struct constructors with type specified in nested constructor. 
+# type of s: STRUCT> +UPDATE StructWithKitchenSinkTable SET s = STRUCT> { + kitchen_sink: new zetasql_test__.KitchenSinkPB { + int64_key_1: 1, + int64_key_2: 1 + }, + s: STRUCT { + kitchen_sink: NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1, + int64_key_2: 1 + } + } +} +WHERE TRUE; +-- +UpdateStmt ++-table_scan= +| +-TableScan(column_list=[StructWithKitchenSinkTable.s#2], table=StructWithKitchenSinkTable, column_index_list=[1]) ++-column_access_list=WRITE ++-where_expr= +| +-Literal(type=BOOL, value=true) ++-update_item_list= + +-UpdateItem + +-target= + | +-ColumnRef(type=STRUCT, s STRUCT>>, column=StructWithKitchenSinkTable.s#2) + +-set_value= + +-DMLValue + +-value= + +-MakeStruct + +-type=STRUCT, s STRUCT>> + +-field_list= + +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=1) + +-MakeStruct + +-type=STRUCT> + +-field_list= + +-MakeProto + +-type=PROTO + +-field_list= + +-int64_key_1 := Literal(type=INT64, value=1) + +-int64_key_2 := Literal(type=INT64, value=1) +== + +# Struct constructors with type not specified in nested constructor, it is +# inferred from the STRUCT field definition. +# type of s: STRUCT> +UPDATE StructWithKitchenSinkTable SET s = { + kitchen_sink: { + int64_key_1: 1, + int64_key_2: 1 + }, + s: { + kitchen_sink: { + int64_key_1: 1, + int64_key_2: 1 + } + } +} +WHERE TRUE; +-- +[SAME AS PREVIOUS] +== + +# Struct of array of struct. 
+# type of t: STRUCT>> +UPDATE StructWithKitchenSinkTable SET t = { + a: 1, + b: [{kitchen_sink: { int64_key_1: 1 int64_key_2: 2 }}, + {kitchen_sink: { int64_key_1: 10 int64_key_2: 20 }} + ] +} +WHERE TRUE; +-- +UpdateStmt ++-table_scan= +| +-TableScan(column_list=[StructWithKitchenSinkTable.t#3], table=StructWithKitchenSinkTable, column_index_list=[2]) ++-column_access_list=WRITE ++-where_expr= +| +-Literal(type=BOOL, value=true) ++-update_item_list= + +-UpdateItem + +-target= + | +-ColumnRef(type=STRUCT>>>, column=StructWithKitchenSinkTable.t#3) + +-set_value= + +-DMLValue + +-value= + +-MakeStruct + +-type=STRUCT>>> + +-field_list= + +-Literal(type=INT64, value=1) + +-FunctionCall(ZetaSQL:$make_array(repeated(2) STRUCT>) -> ARRAY>>) + +-MakeStruct + | +-type=STRUCT> + | +-field_list= + | +-MakeProto + | +-type=PROTO + | +-field_list= + | +-int64_key_1 := Literal(type=INT64, value=1) + | +-int64_key_2 := Literal(type=INT64, value=2) + +-MakeStruct + +-type=STRUCT> + +-field_list= + +-MakeProto + +-type=PROTO + +-field_list= + +-int64_key_1 := Literal(type=INT64, value=10) + +-int64_key_2 := Literal(type=INT64, value=20) +== + +## Inferred type is different from actual. +UPDATE StructWithKitchenSinkTable SET t = { + a: 1, + b: 'foo' +} +WHERE TRUE; +-- +ERROR: Value of type STRUCT cannot be assigned to t, which has type STRUCT>> [at 1:43] +UPDATE StructWithKitchenSinkTable SET t = { + ^ +== + +# Infer the type of submessages in REPLACE_FIELDS. 
+[language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,V_1_3_REPLACE_FIELDS,V_1_4_STRUCT_BRACED_CONSTRUCTORS] +SELECT + REPLACE_FIELDS(STRUCT {a: 1, b: STRUCT {c: 1}}, + { c:2 } AS b) +FROM TestTable +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#4 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#4] + +-expr_list= + | +-$col1#4 := + | +-ReplaceField + | +-type=STRUCT> + | +-expr= + | | +-Literal(type=STRUCT>, value={a:1, b:{c:1}}) + | +-replace_field_item_list= + | +-ReplaceFieldItem + | +-expr= + | | +-Literal(type=STRUCT, value={c:2}) + | +-struct_index_path=[1] + +-input_scan= + +-TableScan(table=TestTable) +== + +# Infer the type of the lhs when trying to CAST braced constructors. +SELECT CAST( { a: 1 } AS STRUCT); +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT, value={a:1}, has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +# CAST works with arrays of protos as well. +SELECT CAST( [{ nested_int64: 10 }, { nested_int64: 20 }] AS ARRAY>); +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [ARRAY>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=ARRAY>, value=[{nested_int64:10}, {nested_int64:20}], has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +# Test CAST behaviors. + +# Mismatched by name. +SELECT CAST( { a: 1 } AS STRUCT); +-- +ERROR: Require naming match but field name does not match at position 0: '' vs 'a' [at 1:16] +SELECT CAST( { a: 1 } AS STRUCT); + ^ +== + +# Mismatched, fewer fields. +SELECT CAST( { a: 1 } AS STRUCT); +-- +ERROR: Require naming match but field num does not match, expected: 2, actual: 1 [at 1:14] +SELECT CAST( { a: 1 } AS STRUCT); + ^ +== + +# Matched by ignorecase. 
+SELECT CAST( { a: 1 } AS STRUCT); +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT, value={A:1}, has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +# Matched for nested. +SELECT CAST( { a: 1, b: {c: 'foo'}} AS STRUCT>); +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT>, value={a:1, b:{c:"foo"}}, has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +# Mismatched for nested field. +SELECT CAST( { a: 1, b: {c: 'foo'}} AS STRUCT>); +-- +ERROR: Require naming match but field name does not match at position 0: 'd' vs 'c' [at 1:26] +SELECT CAST( { a: 1, b: {c: 'foo'}} AS STRUCT>); + ^ +== + +# Mismatched for missing nested field. +SELECT CAST( { a: 1, b: {c: 'foo'}} AS STRUCT>); +-- +ERROR: Require naming match but field name does not match at position 0: '' vs 'c' [at 1:26] +SELECT CAST( { a: 1, b: {c: 'foo'}} AS STRUCT>); + ^ +== + +[language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,V_1_4_STRUCT_BRACED_CONSTRUCTORS,V_1_2_GENERATED_COLUMNS] +[no_enable_literal_replacement] +CREATE TABLE T ( + IntColumn INT32, + StructColumn STRUCT AS ({a:1, b:'foo'}) +) +-- +CreateTableStmt ++-name_path=T ++-column_definition_list= + +-ColumnDefinition(name="IntColumn", type=INT32, column=T.IntColumn#1) + +-ColumnDefinition + +-name="StructColumn" + +-type=STRUCT + +-column=T.StructColumn#2 + +-generated_column_info= + +-GeneratedColumnInfo + +-expression= + +-Literal(type=STRUCT, value={a:1, b:"foo"}) +== + +[language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,V_1_4_STRUCT_BRACED_CONSTRUCTORS,V_1_2_GENERATED_COLUMNS] +[no_enable_literal_replacement] +CREATE TABLE T ( + IntColumn INT32, + StructColumn STRUCT AS ({a:IntColumn, b:'foo'}) +) +-- +CreateTableStmt ++-name_path=T 
++-column_definition_list= + +-ColumnDefinition(name="IntColumn", type=INT32, column=T.IntColumn#1) + +-ColumnDefinition + +-name="StructColumn" + +-type=STRUCT + +-column=T.StructColumn#2 + +-generated_column_info= + +-GeneratedColumnInfo + +-expression= + +-MakeStruct + +-type=STRUCT + +-field_list= + +-ColumnRef(type=INT32, column=T.IntColumn#1) + +-Literal(type=STRING, value="foo") +== + +# Braced constructor types inferred in default column value. +[language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,V_1_4_STRUCT_BRACED_CONSTRUCTORS,V_1_3_COLUMN_DEFAULT_VALUE] +[no_enable_literal_replacement] +CREATE TABLE T ( + IntColumn INT32, + StructColumn STRUCT DEFAULT ({a:1, b:'foo'}) +) +-- +CreateTableStmt ++-name_path=T ++-column_definition_list= + +-ColumnDefinition(name="IntColumn", type=INT32, column=T.IntColumn#1) + +-ColumnDefinition + +-name="StructColumn" + +-type=STRUCT + +-column=T.StructColumn#2 + +-default_value= + +-ColumnDefaultValue + +-expression= + | +-Literal(type=STRUCT, value={a:1, b:"foo"}) + +-sql="{a:1, b:'foo'}" +== + +# Need to do coercion with generated column. +[language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,V_1_4_STRUCT_BRACED_CONSTRUCTORS,V_1_2_GENERATED_COLUMNS] +[no_enable_literal_replacement] +CREATE TABLE T ( + IntColumn INT32, + StructColumn STRUCT AS ({a:IntColumn, b:'foo'}) +) +-- +CreateTableStmt ++-name_path=T ++-column_definition_list= + +-ColumnDefinition(name="IntColumn", type=INT32, column=T.IntColumn#1) + +-ColumnDefinition + +-name="StructColumn" + +-type=STRUCT + +-column=T.StructColumn#2 + +-generated_column_info= + +-GeneratedColumnInfo + +-expression= + +-MakeStruct + +-type=STRUCT + +-field_list= + +-Cast(INT32 -> INT64) + | +-ColumnRef(type=INT32, column=T.IntColumn#1) + +-Literal(type=STRING, value="foo") +== + +# Braced constructor type inferred in SQL function body. 
+CREATE FUNCTION myfunc ( ) RETURNS STRUCT AS ({a: 3, b: 'foo'}); +-- +CreateFunctionStmt ++-name_path=myfunc ++-has_explicit_return_type=TRUE ++-return_type=STRUCT ++-signature=() -> STRUCT ++-language="SQL" ++-code="{a: 3, b: 'foo'}" ++-function_expression= + +-Literal(type=STRUCT, value={a:3, b:"foo"}) +== + +# Braced constructor type without a return type in a SQL function body is an error. +CREATE FUNCTION myfunc ( ) AS ({a: 3, b: 'foo'}); +-- +ERROR: Unable to infer a type for braced constructor [at 1:33] +CREATE FUNCTION myfunc ( ) AS ({a: 3, b: 'foo'}); + ^ +== + +## Braced constructor type inferred in aggregate SQL function body. +[language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,V_1_4_STRUCT_BRACED_CONSTRUCTORS,CREATE_AGGREGATE_FUNCTION,TEMPLATE_FUNCTIONS] +CREATE AGGREGATE FUNCTION myfunc ( ) RETURNS STRUCT AS ({int32_val1: 3, int32_val2: 5}); +-- +CreateFunctionStmt ++-name_path=myfunc ++-has_explicit_return_type=TRUE ++-return_type=STRUCT ++-signature=() -> STRUCT ++-is_aggregate=TRUE ++-language="SQL" ++-code="{int32_val1: 3, int32_val2: 5}" ++-function_expression= + +-Literal(type=STRUCT, value={int32_val1:3, int32_val2:5}) +== + +# Braced constructor type without a return type in an aggregate SQL function body is an error. +[language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,V_1_4_STRUCT_BRACED_CONSTRUCTORS,CREATE_AGGREGATE_FUNCTION,TEMPLATE_FUNCTIONS] +CREATE AGGREGATE FUNCTION myfunc ( ) AS ({int32_val1: 3 int32_val2: 5}); +-- +ERROR: Unable to infer a type for braced constructor [at 1:43] +CREATE AGGREGATE FUNCTION myfunc ( ) AS ({int32_val1: 3 int32_val2: 5}); + ^ +== + +# Inferring through a scalar subquery works for array and non-array types. 
+SELECT STRUCT, nested_repeated_value ARRAY>> { + int64_key_1: 1, + int64_key_2: 2, + nested_value: (SELECT { nested_int64: 5 }), + nested_repeated_value: (SELECT ARRAY[{ nested_int64: 6 },{ nested_int64: 7 }]) +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#3 AS `$col1` [STRUCT, nested_repeated_value ARRAY>>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#3] + +-expr_list= + | +-$col1#3 := + | +-MakeStruct + | +-type=STRUCT, nested_repeated_value ARRAY>> + | +-field_list= + | +-Literal(type=INT64, value=1, has_explicit_type=TRUE) + | +-Literal(type=INT64, value=2, has_explicit_type=TRUE) + | +-SubqueryExpr + | | +-type=STRUCT + | | +-subquery_type=SCALAR + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#1] + | | +-expr_list= + | | | +-$col1#1 := Literal(type=STRUCT, value={nested_int64:5}) + | | +-input_scan= + | | +-SingleRowScan + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#2] + | +-expr_list= + | | +-$col1#2 := Literal(type=ARRAY>, value=[{nested_int64:6}, {nested_int64:7}]) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +== + +# Inferring through a scalar subquery with the wrong type. +SELECT STRUCT> { + int64_key_1: 1, + int64_key_2: 2, + nested_value: (SELECT 1) +} +-- +ERROR: Struct field 3 has type INT64 which does not coerce to STRUCT [at 4:17] + nested_value: (SELECT 1) + ^ +== + +# Inferring through a scalar subquery with the wrong protocol buffer type. +SELECT STRUCT> { + int64_key_1: 1, + int64_key_2: 2, + nested_value: (SELECT { nested: 5 }) +} +-- +ERROR: Require naming match but field name does not match at position 0: 'nested_int64' vs 'nested' [at 4:27] + nested_value: (SELECT { nested: 5 }) + ^ +== + +# Inferring through a scalar subquery with GROUP BY. Note this does not fail on +# type inference and if grouping by proto is supported will work. 
+SELECT STRUCT> { + int64_key_1: 1, + int64_key_2: 2, + nested_value: (SELECT { nested_int64: 5 } FROM TestTable GROUP BY 1) +} +-- +ERROR: Grouping by expressions of type STRUCT is not allowed [at 4:69] + nested_value: (SELECT { nested_int64: 5 } FROM TestTable GROUP BY 1) + ^ +== + +# Recursively inferring through a scalar subquery works. +SELECT STRUCT> { + int64_key_1: 1, + int64_key_2: 2, + nested_value: (SELECT (SELECT { nested_int64: 5 })) +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#3 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#3] + +-expr_list= + | +-$col1#3 := + | +-MakeStruct + | +-type=STRUCT> + | +-field_list= + | +-Literal(type=INT64, value=1, has_explicit_type=TRUE) + | +-Literal(type=INT64, value=2, has_explicit_type=TRUE) + | +-SubqueryExpr + | +-type=STRUCT + | +-subquery_type=SCALAR + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#2] + | +-expr_list= + | | +-$col1#2 := + | | +-SubqueryExpr + | | +-type=STRUCT + | | +-subquery_type=SCALAR + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$expr_subquery.$col1#1] + | | +-expr_list= + | | | +-$col1#1 := Literal(type=STRUCT, value={nested_int64:5}) + | | +-input_scan= + | | +-SingleRowScan + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +== + +# Inferring through an array subquery works. 
+SELECT STRUCT>> { + int64_key_1: 1, + int64_key_2: 2, + nested_repeated_value: ARRAY( + SELECT {nested_int64: x} FROM UNNEST(GENERATE_ARRAY(1, 2)) x + ) +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#3 AS `$col1` [STRUCT>>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#3] + +-expr_list= + | +-$col1#3 := + | +-MakeStruct + | +-type=STRUCT>> + | +-field_list= + | +-Literal(type=INT64, value=1, has_explicit_type=TRUE) + | +-Literal(type=INT64, value=2, has_explicit_type=TRUE) + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#2] + | +-expr_list= + | | +-$col1#2 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.x#1) + | +-input_scan= + | +-ArrayScan + | +-column_list=[$array.x#1] + | +-array_expr_list= + | | +-FunctionCall(ZetaSQL:generate_array(INT64, INT64, optional(0) INT64) -> ARRAY) + | | +-Literal(type=INT64, value=1) + | | +-Literal(type=INT64, value=2) + | +-element_column_list=[$array.x#1] + +-input_scan= + +-SingleRowScan +== + +# Inferring from the LHS to the RHS of an IN subquery works. +SELECT STRUCT { + int64_key_1: 1, + int64_key_2: 2, +} IN (SELECT {int64_key_1: 1, int64_key_2: 2} ) +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#2 AS `$col1` [BOOL] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#2] + +-expr_list= + | +-$col1#2 := + | +-SubqueryExpr + | +-type=BOOL + | +-subquery_type=IN + | +-in_expr= + | | +-Literal(type=STRUCT, value={int64_key_1:1, int64_key_2:2}, has_explicit_type=TRUE) + | +-subquery= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#1] + | +-expr_list= + | | +-$col1#1 := Literal(type=STRUCT, value={int64_key_1:1, int64_key_2:2}) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +== + +# Subquery type does not match inferred type. 
+SELECT STRUCT> { + int64_key_1: 1, + int64_key_2: 2, + nested_value: (SELECT 'foo') +} +-- +ERROR: Struct field 3 has type STRING which does not coerce to STRUCT [at 4:17] + nested_value: (SELECT 'foo') + ^ +== + +# Inferred type is not an array for an array subquery. +SELECT STRUCT> { + int64_key_1: 1, + int64_key_2: 2, + nested_value: ARRAY(SELECT { nested_int64: 5 }) +} +-- +ERROR: Unable to infer a type for braced constructor [at 4:30] + nested_value: ARRAY(SELECT { nested_int64: 5 }) + ^ +== + +# Not inferring through an EXISTS query. +SELECT STRUCT> { + int64_key_1: 1, + int64_key_2: 2, + nested_value: EXISTS(SELECT { nested_int64: 5 }) +} +-- +ERROR: Unable to infer a type for braced constructor [at 4:31] + nested_value: EXISTS(SELECT { nested_int64: 5 }) + ^ +== + +# Not inferring through a SELECT AS STRUCT query. +SELECT STRUCT> { + int64_key_1: 1, + int64_key_2: 2, + nested_value: (SELECT AS STRUCT { nested_int64: 5 }) +} +-- +ERROR: Unable to infer a type for braced constructor [at 4:35] + nested_value: (SELECT AS STRUCT { nested_int64: 5 }) + ^ +== + +# Not inferring through a SELECT AS PROTO query because of differing syntax. +SELECT NEW zetasql_test__.KitchenSinkPB { + int64_key_1: 1, + int64_key_2: 2, + nested_value: (SELECT AS zetasql_test__.KitchenSinkPB { nested_int64: 5 }) +} +-- +ERROR: Unable to infer a type for braced constructor [at 4:57] + nested_value: (SELECT AS zetasql_test__.KitchenSinkPB { nested_int64: 5 }) + ^ +== + +# Inference works inside bracket constructor. 
+SELECT STRUCT> ( + 1, + 2, + { nested_int64: 5 } +) +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT>, value={int64_key_1:1, int64_key_2:2, nested_value:{nested_int64:5}}, has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +# Inferring through a subquery using WITH does inference only on the returned +# SELECT column and not any select in the WITH. +SELECT STRUCT> { + int64_key_1: 1, + int64_key_2: 2, + nested_value: (WITH Foo AS (SELECT { nested_int64: 5 } AS x) SELECT Foo.x) +} +-- +ERROR: Unable to infer a type for braced constructor [at 4:38] + nested_value: (WITH Foo AS (SELECT { nested_int64: 5 } AS x) SELECT Foo.x) + ^ +== + +# Inferring through a subquery using WITH working on first non-WITH SELECT +# column. +SELECT STRUCT> { + int64_key_1: 1, + int64_key_2: 2, + nested_value: (WITH Foo AS (SELECT "foo" AS x) SELECT { nested_int64: 5 }) +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#3 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#3] + +-expr_list= + | +-$col1#3 := + | +-MakeStruct + | +-type=STRUCT> + | +-field_list= + | +-Literal(type=INT64, value=1, has_explicit_type=TRUE) + | +-Literal(type=INT64, value=2, has_explicit_type=TRUE) + | +-SubqueryExpr + | +-type=STRUCT + | +-subquery_type=SCALAR + | +-subquery= + | +-WithScan + | +-column_list=[$expr_subquery.$col1#2] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="Foo" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[Foo.x#1] + | | +-expr_list= + | | | +-x#1 := Literal(type=STRING, value="foo") + | | +-input_scan= + | | +-SingleRowScan + | +-query= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#2] + | +-expr_list= + | | +-$col1#2 := Literal(type=STRUCT, value={nested_int64:5}) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +== + +# Inferring 
through multiple WITH clauses. +SELECT STRUCT> { + int64_key_1: 1, + int64_key_2: 2, + nested_value: (WITH Foo AS (SELECT "foo" AS x) (WITH Bar AS (SELECT "bar" AS y) SELECT { nested_int64: 5 })) +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#4 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#4] + +-expr_list= + | +-$col1#4 := + | +-MakeStruct + | +-type=STRUCT> + | +-field_list= + | +-Literal(type=INT64, value=1, has_explicit_type=TRUE) + | +-Literal(type=INT64, value=2, has_explicit_type=TRUE) + | +-SubqueryExpr + | +-type=STRUCT + | +-subquery_type=SCALAR + | +-subquery= + | +-WithScan + | +-column_list=[$expr_subquery.$col1#3] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="Foo" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[Foo.x#1] + | | +-expr_list= + | | | +-x#1 := Literal(type=STRING, value="foo") + | | +-input_scan= + | | +-SingleRowScan + | +-query= + | +-WithScan + | +-column_list=[$expr_subquery.$col1#3] + | +-with_entry_list= + | | +-WithEntry + | | +-with_query_name="Bar" + | | +-with_subquery= + | | +-ProjectScan + | | +-column_list=[Bar.y#2] + | | +-expr_list= + | | | +-y#2 := Literal(type=STRING, value="bar") + | | +-input_scan= + | | +-SingleRowScan + | +-query= + | +-ProjectScan + | +-column_list=[$expr_subquery.$col1#3] + | +-expr_list= + | | +-$col1#3 := Literal(type=STRUCT, value={nested_int64:5}) + | +-input_scan= + | +-SingleRowScan + +-input_scan= + +-SingleRowScan +== + +# Inferring through a subquery which has UNION ALL. 
+SELECT STRUCT>> { + int64_key_1: 1, + int64_key_2: 2, + nested_repeated_value: ARRAY(SELECT { nested_int64: 5 } UNION ALL + SELECT { nested_int64: 6 }) +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#4 AS `$col1` [STRUCT>>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#4] + +-expr_list= + | +-$col1#4 := + | +-MakeStruct + | +-type=STRUCT>> + | +-field_list= + | +-Literal(type=INT64, value=1, has_explicit_type=TRUE) + | +-Literal(type=INT64, value=2, has_explicit_type=TRUE) + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-subquery= + | +-SetOperationScan + | +-column_list=[$union_all.$col1#3] + | +-op_type=UNION_ALL + | +-input_item_list= + | +-SetOperationItem + | | +-scan= + | | | +-ProjectScan + | | | +-column_list=[$union_all1.$col1#1] + | | | +-expr_list= + | | | | +-$col1#1 := Literal(type=STRUCT, value={nested_int64:5}) + | | | +-input_scan= + | | | +-SingleRowScan + | | +-output_column_list=[$union_all1.$col1#1] + | +-SetOperationItem + | +-scan= + | | +-ProjectScan + | | +-column_list=[$union_all2.$col1#2] + | | +-expr_list= + | | | +-$col1#2 := Literal(type=STRUCT, value={nested_int64:6}) + | | +-input_scan= + | | +-SingleRowScan + | +-output_column_list=[$union_all2.$col1#2] + +-input_scan= + +-SingleRowScan +== + +# The analyzer allows setting required fields to NULL, but the engine will +# give an error. +SELECT STRUCT { + int64_key_1: null, + int64_key_2: null, +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT, value={int64_key_1:NULL, int64_key_2:NULL}, has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +# The analyzer allows setting values of repeated fields to NULL, but the engine +# will give an error. 
+SELECT STRUCT> { + int64_key_1: 10, + int64_key_2: 20, + repeated_int64_val: [1, 2, NULL, 4] +} +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRUCT>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=STRUCT>, value={int64_key_1:10, int64_key_2:20, repeated_int64_val:[1, 2, NULL, 4]}, has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +# Error using braced constructors without an inferred type. +SELECT ARRAY[{ str_value: ["foo", "bar"] }, { str_value: ["baz"] }] +-- +ERROR: Unable to infer a type for braced constructor [at 1:14] +SELECT ARRAY[{ str_value: ["foo", "bar"] }, { str_value: ["baz"] }] + ^ +== + +# Using STRUCT can work. +SELECT ARRAY[STRUCT{ str_value: ["foo", "bar"] }, STRUCT{ str_value: ["baz"] }] +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [ARRAY>>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=ARRAY>>, value=[{str_value:["foo", "bar"]}, {str_value:["baz"]}]) + +-input_scan= + +-SingleRowScan +== + +# Or with explicit type. +SELECT ARRAY>>[{ str_value: ["foo", "bar"] }, { str_value: ["baz"] }] +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [ARRAY>>] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := Literal(type=ARRAY>>, value=[{str_value:["foo", "bar"]}, {str_value:["baz"]}], has_explicit_type=TRUE) + +-input_scan= + +-SingleRowScan +== + +# Similar to b/259000660. +# TVF using braced constructor should also work. 
+[language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,V_1_4_STRUCT_BRACED_CONSTRUCTORS,TABLE_VALUED_FUNCTIONS,CREATE_TABLE_FUNCTION,TEMPLATE_FUNCTIONS] +WITH + T AS ( + SELECT CAST(v as INT32) v + FROM UNNEST(GENERATE_ARRAY(2, 12)) AS v + ) +SELECT * +FROM templated_struct_braced_ctor_tvf(TABLE T) +ORDER BY dice_roll.int32_val1; +-- +QueryStmt ++-output_column_list= +| +-templated_struct_braced_ctor_tvf.dice_roll#4 AS dice_roll [STRUCT] ++-query= + +-WithScan + +-column_list=[templated_struct_braced_ctor_tvf.dice_roll#4] + +-is_ordered=TRUE + +-with_entry_list= + | +-WithEntry + | +-with_query_name="T" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[T.v#2] + | +-expr_list= + | | +-v#2 := + | | +-Cast(INT64 -> INT32) + | | +-ColumnRef(type=INT64, column=$array.v#1) + | +-input_scan= + | +-ArrayScan + | +-column_list=[$array.v#1] + | +-array_expr_list= + | | +-FunctionCall(ZetaSQL:generate_array(INT64, INT64, optional(0) INT64) -> ARRAY) + | | +-Literal(type=INT64, value=2) + | | +-Literal(type=INT64, value=12) + | +-element_column_list=[$array.v#1] + +-query= + +-OrderByScan + +-column_list=[templated_struct_braced_ctor_tvf.dice_roll#4] + +-is_ordered=TRUE + +-input_scan= + | +-ProjectScan + | +-column_list=[templated_struct_braced_ctor_tvf.dice_roll#4, $orderby.$orderbycol1#5] + | +-expr_list= + | | +-$orderbycol1#5 := + | | +-GetStructField + | | +-type=INT32 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=templated_struct_braced_ctor_tvf.dice_roll#4) + | | +-field_idx=0 + | +-input_scan= + | +-TVFScan + | +-column_list=[templated_struct_braced_ctor_tvf.dice_roll#4] + | +-tvf=templated_struct_braced_ctor_tvf((ANY TABLE) -> ANY TABLE) + | +-signature=(TABLE) -> TABLE> + | +-argument_list= + | | +-FunctionArgument + | | +-scan= + | | | +-WithRefScan(column_list=[T.v#3], with_query_name="T") + | | +-argument_column_list=[T.v#3] + | +-column_index_list=[0] + +-order_by_item_list= + +-OrderByItem + +-column_ref= + +-ColumnRef(type=INT32, 
column=$orderby.$orderbycol1#5) + +With Templated SQL TVF signature: + templated_struct_braced_ctor_tvf(TABLE) -> TABLE> +containing resolved templated query: +QueryStmt ++-output_column_list= +| +-$query.dice_roll#2 AS dice_roll [STRUCT] ++-query= + +-ProjectScan + +-column_list=[$query.dice_roll#2] + +-expr_list= + | +-dice_roll#2 := + | +-MakeStruct + | +-type=STRUCT + | +-field_list= + | +-ColumnRef(type=INT32, column=T.v#1) + +-input_scan= + +-RelationArgumentScan(column_list=[T.v#1], name="T") diff --git a/zetasql/analyzer/testdata/struct_construction.test b/zetasql/analyzer/testdata/struct_construction.test index f285f6366..e306cf540 100644 --- a/zetasql/analyzer/testdata/struct_construction.test +++ b/zetasql/analyzer/testdata/struct_construction.test @@ -665,9 +665,9 @@ QueryStmt select struct(abc AS def) FROM (select 1 abc) -- -ERROR: STRUCT constructors cannot specify both an explicit type and field names with AS [at 1:30] +ERROR: STRUCT constructors cannot specify both an explicit type and field names with AS [at 1:33] select struct(abc AS def) - ^ + ^ == # A field name is inferred for the third field only. 
diff --git a/zetasql/analyzer/testdata/tablesample.test b/zetasql/analyzer/testdata/tablesample.test index dee6aa0e5..496542db6 100644 --- a/zetasql/analyzer/testdata/tablesample.test +++ b/zetasql/analyzer/testdata/tablesample.test @@ -538,9 +538,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=KeyValue.[Key#7, Value#8], table=KeyValue, column_index_list=[0, 1], alias="kv2") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=KeyValue.Key#5) - | +-ColumnRef(type=INT64, column=KeyValue.Key#7) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#5) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#7) + | +-has_using=TRUE +-method="random3" +-size= | +-Literal(type=INT64, value=10) @@ -973,9 +974,10 @@ QueryStmt | +-right_scan= | | +-TableScan(column_list=KeyValue.[Key#7, Value#8], table=KeyValue, column_index_list=[0, 1], alias="kv2") | +-join_expr= - | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | +-ColumnRef(type=INT64, column=KeyValue.Key#5) - | +-ColumnRef(type=INT64, column=KeyValue.Key#7) + | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#5) + | | +-ColumnRef(type=INT64, column=KeyValue.Key#7) + | +-has_using=TRUE +-method="reservoir" +-size= | +-Literal(type=INT64, value=10) @@ -1458,9 +1460,10 @@ QueryStmt +-right_scan= | +-WithRefScan(column_list=t2.[Key#10, Value#11, weight#12], with_query_name="t2") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(DOUBLE, DOUBLE) -> BOOL) - +-ColumnRef(type=DOUBLE, column=t1.weight#9) - +-ColumnRef(type=DOUBLE, column=t2.weight#12) + | +-FunctionCall(ZetaSQL:$equal(DOUBLE, DOUBLE) -> BOOL) + | +-ColumnRef(type=DOUBLE, column=t1.weight#9) + | +-ColumnRef(type=DOUBLE, column=t2.weight#12) + +-has_using=TRUE == with diff --git a/zetasql/analyzer/testdata/templated_sql_function.test b/zetasql/analyzer/testdata/templated_sql_function.test 
index 92a09b86b..952482b4f 100644 --- a/zetasql/analyzer/testdata/templated_sql_function.test +++ b/zetasql/analyzer/testdata/templated_sql_function.test @@ -680,6 +680,476 @@ FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) +-ArgumentRef(parse_location=4-5, type=INT64, name="x", argument_kind=AGGREGATE) == +# Regression test for b/316178328. +# +# The UDF is defined like +# CREATE FUNCTION udf_any_and_double_args_return_any( +# a ANY TYPE, x DOUBLE) AS (IF(x < 0, 'a', 'b')); +# +# The NULL value for argument 'x' in the second call should be converted to +# DOUBLE explicitly, or otherwise the '<' in the templated SQL body would become +# a comparison between two INT64s. +[language_features={{|TEMPLATED_SQL_FUNCTION_RESOLVE_WITH_TYPED_ARGS}}] +SELECT + udf_any_and_double_args_return_any(1, 314), + udf_any_and_double_args_return_any(1, null); +-- +ALTERNATION GROUP: +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRING] +| +-$query.$col2#2 AS `$col2` [STRING] ++-query= + +-ProjectScan + +-column_list=$query.[$col1#1, $col2#2] + +-expr_list= + | +-$col1#1 := + | | +-FunctionCall(Templated_SQL_Function:udf_any_and_double_args_return_any(INT64, DOUBLE) -> STRING) + | | +-Literal(type=INT64, value=1) + | | +-Literal(type=DOUBLE, value=314) + | +-$col2#2 := + | +-FunctionCall(Templated_SQL_Function:udf_any_and_double_args_return_any(INT64, DOUBLE) -> STRING) + | +-Literal(type=INT64, value=1) + | +-Literal(type=DOUBLE, value=NULL) + +-input_scan= + +-SingleRowScan + +With Templated SQL function call: + Templated_SQL_Function:udf_any_and_double_args_return_any(INT64, DOUBLE) -> STRING +containing resolved templated expression: +FunctionCall(ZetaSQL:if(BOOL, STRING, STRING) -> STRING) ++-FunctionCall(ZetaSQL:$less(DOUBLE, DOUBLE) -> BOOL) +| +-ArgumentRef(type=DOUBLE, name="x") +| +-Literal(type=DOUBLE, value=0) ++-Literal(type=STRING, value="a") ++-Literal(type=STRING, value="b") + +With Templated SQL function call: + 
Templated_SQL_Function:udf_any_and_double_args_return_any(INT64, DOUBLE) -> STRING +containing resolved templated expression: +FunctionCall(ZetaSQL:if(BOOL, STRING, STRING) -> STRING) ++-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) +| +-ArgumentRef(type=INT64, name="x") +| +-Literal(type=INT64, value=0) ++-Literal(type=STRING, value="a") ++-Literal(type=STRING, value="b") +-- +ALTERNATION GROUP: TEMPLATED_SQL_FUNCTION_RESOLVE_WITH_TYPED_ARGS +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRING] +| +-$query.$col2#2 AS `$col2` [STRING] ++-query= + +-ProjectScan + +-column_list=$query.[$col1#1, $col2#2] + +-expr_list= + | +-$col1#1 := + | | +-FunctionCall(Templated_SQL_Function:udf_any_and_double_args_return_any(INT64, DOUBLE) -> STRING) + | | +-Literal(type=INT64, value=1) + | | +-Literal(type=DOUBLE, value=314) + | +-$col2#2 := + | +-FunctionCall(Templated_SQL_Function:udf_any_and_double_args_return_any(INT64, DOUBLE) -> STRING) + | +-Literal(type=INT64, value=1) + | +-Literal(type=DOUBLE, value=NULL) + +-input_scan= + +-SingleRowScan + +With Templated SQL function call: + Templated_SQL_Function:udf_any_and_double_args_return_any(INT64, DOUBLE) -> STRING +containing resolved templated expression: +FunctionCall(ZetaSQL:if(BOOL, STRING, STRING) -> STRING) ++-FunctionCall(ZetaSQL:$less(DOUBLE, DOUBLE) -> BOOL) +| +-ArgumentRef(type=DOUBLE, name="x") +| +-Literal(type=DOUBLE, value=0) ++-Literal(type=STRING, value="a") ++-Literal(type=STRING, value="b") +== + +# Regression test for b/316178328. +# +# The UDF is defined like +# CREATE FUNCTION udf_any_and_double_array_args_return_any( +# a ANY TYPE, x ARRAY) AS (IF(x[SAFE_OFFSET(0)] < 0, 'a', 'b')); +# +# The NULL and [] values for argument 'x' in the 2nd and 3rd calls should be +# converted to DOUBLE array explicitly. +# Before the fix, the type of 'x' in the 2nd and 3rd templated SQL function call +# bodies would be ARRAY which is the default type of an untyped NULL or +# an empty array. 
+[language_features={{|TEMPLATED_SQL_FUNCTION_RESOLVE_WITH_TYPED_ARGS}}] +SELECT + udf_any_and_double_array_args_return_any(1, [314.0]), + udf_any_and_double_array_args_return_any(1, null), + udf_any_and_double_array_args_return_any(1, []); +-- +ALTERNATION GROUP: +-- +ERROR: Invalid function udf_any_and_double_array_args_return_any [at 3:3] + udf_any_and_double_array_args_return_any(1, null), + ^ +Analysis of function Templated_SQL_Function:udf_any_and_double_array_args_return_any failed [at 1:4] +IF(x[SAFE_OFFSET(0)] < 0, 'a', 'b') + ^ +Function calls with SAFE are not supported [at 1:4] +IF(x[SAFE_OFFSET(0)] < 0, 'a', 'b') + ^ +-- +ALTERNATION GROUP: TEMPLATED_SQL_FUNCTION_RESOLVE_WITH_TYPED_ARGS +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRING] +| +-$query.$col2#2 AS `$col2` [STRING] +| +-$query.$col3#3 AS `$col3` [STRING] ++-query= + +-ProjectScan + +-column_list=$query.[$col1#1, $col2#2, $col3#3] + +-expr_list= + | +-$col1#1 := + | | +-FunctionCall(Templated_SQL_Function:udf_any_and_double_array_args_return_any(INT64, ARRAY) -> STRING) + | | +-Literal(type=INT64, value=1) + | | +-Literal(type=ARRAY, value=[314]) + | +-$col2#2 := + | | +-FunctionCall(Templated_SQL_Function:udf_any_and_double_array_args_return_any(INT64, ARRAY) -> STRING) + | | +-Literal(type=INT64, value=1) + | | +-Literal(type=ARRAY, value=NULL) + | +-$col3#3 := + | +-FunctionCall(Templated_SQL_Function:udf_any_and_double_array_args_return_any(INT64, ARRAY) -> STRING) + | +-Literal(type=INT64, value=1) + | +-Literal(type=ARRAY, value=[]) + +-input_scan= + +-SingleRowScan + +With Templated SQL function call: + Templated_SQL_Function:udf_any_and_double_array_args_return_any(INT64, ARRAY) -> STRING +containing resolved templated expression: +FunctionCall(ZetaSQL:if(BOOL, STRING, STRING) -> STRING) ++-FunctionCall(ZetaSQL:$less(DOUBLE, DOUBLE) -> BOOL) +| +-FunctionCall(ZetaSQL:$safe_array_at_offset(ARRAY, INT64) -> DOUBLE) +| | +-ArgumentRef(type=ARRAY, name="x") +| | 
+-Literal(type=INT64, value=0) +| +-Literal(type=DOUBLE, value=0) ++-Literal(type=STRING, value="a") ++-Literal(type=STRING, value="b") +== + +# Regression test for b/259962379. +# +# The UDF is defined like +# CREATE FUNCTION udf_any_and_string_args_return_string_arg( +# a ANY TYPE, x DOUBLE) AS (x); +# +# Before the fix, the type of 'x' in the templated SQL function call body would +# be INT64 which is the default type of an untyped NULL. +# Cannot run with the feature disabled, or it would run into the error like +# 'Unparsed tree does not produce same result shape as the original resolved +# tree'. +[language_features=TEMPLATED_SQL_FUNCTION_RESOLVE_WITH_TYPED_ARGS] +SELECT udf_any_and_string_args_return_string_arg(TRUE, NULL); +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRING] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(Templated_SQL_Function:udf_any_and_string_args_return_string_arg(BOOL, STRING) -> STRING) + | +-Literal(type=BOOL, value=true) + | +-Literal(type=STRING, value=NULL) + +-input_scan= + +-SingleRowScan + +With Templated SQL function call: + Templated_SQL_Function:udf_any_and_string_args_return_string_arg(BOOL, STRING) -> STRING +containing resolved templated expression: +ArgumentRef(type=STRING, name="x") +== + +# Regression test for b/259962379. +# +# The UDF is defined like +# CREATE FUNCTION udf_any_and_string_args_return_string_arg( +# a ANY TYPE, x DOUBLE) AS (x); +# +# Unlike the case above, here the function argument is an untyped parameter. +# This test case verifies that the untyped parameter can be handled correctly +# with and without the feature. 
+[language_features=TEMPLATED_SQL_FUNCTION_RESOLVE_WITH_TYPED_ARGS] +[allow_undeclared_parameters] +SELECT udf_any_and_string_args_return_string_arg(TRUE, @untyped_param); +-- +QueryStmt ++-output_column_list= +| +-$query.$col1#1 AS `$col1` [STRING] ++-query= + +-ProjectScan + +-column_list=[$query.$col1#1] + +-expr_list= + | +-$col1#1 := + | +-FunctionCall(Templated_SQL_Function:udf_any_and_string_args_return_string_arg(BOOL, STRING) -> STRING) + | +-Literal(type=BOOL, value=true) + | +-Parameter(parse_location=55-69, type=STRING, name="untyped_param") + +-input_scan= + +-SingleRowScan + +With Templated SQL function call: + Templated_SQL_Function:udf_any_and_string_args_return_string_arg(BOOL, STRING) -> STRING +containing resolved templated expression: +ArgumentRef(type=STRING, name="x") +[UNDECLARED_PARAMETERS] +untyped_param: STRING +== + +# Regression test for b/316178328. +# +# The UDA is defined like +# CREATE AGGREGATE FUNCTION uda_any_and_double_args_return_any( +# a ANY TYPE NOT AGGREGATE, x DOUBLE) AS ( +# STRING_AGG(IF(x < 0, 'a', 'b')) +# ); +# +# The NULL value for argument 'x' in the call should be converted to DOUBLE +# explicitly, or otherwise the '<' in the templated SQL body would become a +# comparison between two INT64s. 
+[language_features={{|TEMPLATED_SQL_FUNCTION_RESOLVE_WITH_TYPED_ARGS}}] +SELECT uda_any_and_double_args_return_any(3.1415, NULL) +FROM KeyValue +-- +ALTERNATION GROUP: +-- +QueryStmt ++-output_column_list= +| +-$aggregate.$agg1#3 AS `$col1` [STRING] ++-query= + +-ProjectScan + +-column_list=[$aggregate.$agg1#3] + +-input_scan= + +-AggregateScan + +-column_list=[$aggregate.$agg1#3] + +-input_scan= + | +-TableScan(table=KeyValue) + +-aggregate_list= + +-$agg1#3 := + +-AggregateFunctionCall(Templated_SQL_Function:uda_any_and_double_args_return_any(DOUBLE, DOUBLE) -> STRING) + +-Literal(type=DOUBLE, value=3.1415) + +-Literal(type=DOUBLE, value=NULL) + +With Templated SQL function call: + Templated_SQL_Function:uda_any_and_double_args_return_any(DOUBLE {is_not_aggregate: true}, DOUBLE) -> STRING +containing resolved templated expression: +ColumnRef(type=STRING, column=$aggregate.$agg1#1) + + $agg1#1 := + +-AggregateFunctionCall(ZetaSQL:string_agg(STRING) -> STRING) + +-FunctionCall(ZetaSQL:if(BOOL, STRING, STRING) -> STRING) + +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | +-ArgumentRef(parse_location=14-15, type=INT64, name="x", argument_kind=AGGREGATE) + | +-Literal(type=INT64, value=0) + +-Literal(type=STRING, value="a") + +-Literal(type=STRING, value="b") +-- +ALTERNATION GROUP: TEMPLATED_SQL_FUNCTION_RESOLVE_WITH_TYPED_ARGS +-- +QueryStmt ++-output_column_list= +| +-$aggregate.$agg1#3 AS `$col1` [STRING] ++-query= + +-ProjectScan + +-column_list=[$aggregate.$agg1#3] + +-input_scan= + +-AggregateScan + +-column_list=[$aggregate.$agg1#3] + +-input_scan= + | +-TableScan(table=KeyValue) + +-aggregate_list= + +-$agg1#3 := + +-AggregateFunctionCall(Templated_SQL_Function:uda_any_and_double_args_return_any(DOUBLE, DOUBLE) -> STRING) + +-Literal(type=DOUBLE, value=3.1415) + +-Literal(type=DOUBLE, value=NULL) + +With Templated SQL function call: + Templated_SQL_Function:uda_any_and_double_args_return_any(DOUBLE {is_not_aggregate: true}, DOUBLE) -> STRING 
+containing resolved templated expression: +ColumnRef(type=STRING, column=$aggregate.$agg1#1) + + $agg1#1 := + +-AggregateFunctionCall(ZetaSQL:string_agg(STRING) -> STRING) + +-FunctionCall(ZetaSQL:if(BOOL, STRING, STRING) -> STRING) + +-FunctionCall(ZetaSQL:$less(DOUBLE, DOUBLE) -> BOOL) + | +-ArgumentRef(parse_location=14-15, type=DOUBLE, name="x", argument_kind=AGGREGATE) + | +-Literal(type=DOUBLE, value=0) + +-Literal(type=STRING, value="a") + +-Literal(type=STRING, value="b") +== + +# Regression test for b/316178328. +# +# The UDA is defined like +# CREATE AGGREGATE FUNCTION uda_any_and_double_array_args_return_any( +# a ANY TYPE, x ARRAY NOT AGGREGATE) AS ( +# IF(x[SAFE_OFFSET(0)] < 0, MAX(a), MIN(a)) +# ); +# +# The NULL and [] values for argument 'x' in the 2nd and 3rd calls should be +# converted to DOUBLE array explicitly. +# Before the fix, the type of 'x' in the 2nd and 3rd templated SQL function call +# bodies would be ARRAY which is the default type of an untyped NULL or +# an empty array. 
+[language_features={{|TEMPLATED_SQL_FUNCTION_RESOLVE_WITH_TYPED_ARGS}}] +SELECT + uda_any_and_double_array_args_return_any(key, [314.0]), + uda_any_and_double_array_args_return_any(key, null), + uda_any_and_double_array_args_return_any(key, []) +FROM + keyValue +-- +ALTERNATION GROUP: +-- +ERROR: Invalid function uda_any_and_double_array_args_return_any [at 3:3] + uda_any_and_double_array_args_return_any(key, null), + ^ +Analysis of function Templated_SQL_Function:uda_any_and_double_array_args_return_any failed [at 1:4] +IF(x[SAFE_OFFSET(0)] < 0, MAX(a), MIN(a)) + ^ +Function calls with SAFE are not supported [at 1:4] +IF(x[SAFE_OFFSET(0)] < 0, MAX(a), MIN(a)) + ^ +-- +ALTERNATION GROUP: TEMPLATED_SQL_FUNCTION_RESOLVE_WITH_TYPED_ARGS +-- +QueryStmt ++-output_column_list= +| +-$aggregate.$agg1#3 AS `$col1` [INT64] +| +-$aggregate.$agg2#4 AS `$col2` [INT64] +| +-$aggregate.$agg3#5 AS `$col3` [INT64] ++-query= + +-ProjectScan + +-column_list=$aggregate.[$agg1#3, $agg2#4, $agg3#5] + +-input_scan= + +-AggregateScan + +-column_list=$aggregate.[$agg1#3, $agg2#4, $agg3#5] + +-input_scan= + | +-TableScan(column_list=[KeyValue.Key#1], table=KeyValue, column_index_list=[0]) + +-aggregate_list= + +-$agg1#3 := + | +-AggregateFunctionCall(Templated_SQL_Function:uda_any_and_double_array_args_return_any(INT64, ARRAY) -> INT64) + | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | +-Literal(type=ARRAY, value=[314]) + +-$agg2#4 := + | +-AggregateFunctionCall(Templated_SQL_Function:uda_any_and_double_array_args_return_any(INT64, ARRAY) -> INT64) + | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | +-Literal(type=ARRAY, value=NULL) + +-$agg3#5 := + +-AggregateFunctionCall(Templated_SQL_Function:uda_any_and_double_array_args_return_any(INT64, ARRAY) -> INT64) + +-ColumnRef(type=INT64, column=KeyValue.Key#1) + +-Literal(type=ARRAY, value=[]) + +With Templated SQL function call: + Templated_SQL_Function:uda_any_and_double_array_args_return_any(INT64, ARRAY {is_not_aggregate: true}) 
-> INT64 +containing resolved templated expression: +FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) ++-FunctionCall(ZetaSQL:$less(DOUBLE, DOUBLE) -> BOOL) +| +-FunctionCall(ZetaSQL:$safe_array_at_offset(ARRAY, INT64) -> DOUBLE) +| | +-ArgumentRef(type=ARRAY, name="x", argument_kind=NOT_AGGREGATE) +| | +-Literal(type=INT64, value=0) +| +-Literal(type=DOUBLE, value=0) ++-ColumnRef(type=INT64, column=$aggregate.$agg1#1) ++-ColumnRef(type=INT64, column=$aggregate.$agg2#2) + + $agg1#1 := + +-AggregateFunctionCall(ZetaSQL:max(INT64) -> INT64) + +-ArgumentRef(parse_location=30-31, type=INT64, name="a", argument_kind=AGGREGATE) + + $agg2#2 := + +-AggregateFunctionCall(ZetaSQL:min(INT64) -> INT64) + +-ArgumentRef(parse_location=38-39, type=INT64, name="a", argument_kind=AGGREGATE) +== + +# Regression test for b/259962379. +# +# The UDA is defined like +# CREATE AGGREGATE FUNCTION uda_any_and_string_args_return_string( +# a ANY TYPE, x STRING NOT AGGREGATE) AS ( +# IF(LOGICAL_OR(a), x, x || '_suffix') +# ); +# +# Before the fix, the type of 'x' in the templated SQL function call body would +# be INT64 which is the default type of an untyped NULL. +[language_features={{|TEMPLATED_SQL_FUNCTION_RESOLVE_WITH_TYPED_ARGS}}] +SELECT uda_any_and_string_args_return_string(key > 43, NULL) +FROM + keyValue +-- +ALTERNATION GROUP: +-- +ERROR: Invalid function uda_any_and_string_args_return_string [at 1:8] +SELECT uda_any_and_string_args_return_string(key > 43, NULL) + ^ +Analysis of function Templated_SQL_Function:uda_any_and_string_args_return_string failed [at 1:22] +IF(LOGICAL_OR(a), x, x || '_suffix') + ^ +No matching signature for operator || for argument types: INT64, STRING. 
Supported signatures: STRING || STRING; BYTES || BYTES; ARRAY || ARRAY [at 1:22] +IF(LOGICAL_OR(a), x, x || '_suffix') + ^ +-- +Signature Mismatch Details: +ERROR: Invalid function uda_any_and_string_args_return_string [at 1:8] +SELECT uda_any_and_string_args_return_string(key > 43, NULL) + ^ +Analysis of function Templated_SQL_Function:uda_any_and_string_args_return_string failed [at 1:22] +IF(LOGICAL_OR(a), x, x || '_suffix') + ^ +No matching signature for operator || + Argument types: INT64, STRING + Signature: STRING || STRING + Argument 1: Unable to coerce type INT64 to expected type STRING + Signature: BYTES || BYTES + Argument 1: Unable to coerce type INT64 to expected type BYTES + Signature: (ARRAY) || (ARRAY) + Argument 1: expected array type but found INT64 [at 1:22] +IF(LOGICAL_OR(a), x, x || '_suffix') + ^ +-- +ALTERNATION GROUP: TEMPLATED_SQL_FUNCTION_RESOLVE_WITH_TYPED_ARGS +-- +QueryStmt ++-output_column_list= +| +-$aggregate.$agg1#3 AS `$col1` [STRING] ++-query= + +-ProjectScan + +-column_list=[$aggregate.$agg1#3] + +-input_scan= + +-AggregateScan + +-column_list=[$aggregate.$agg1#3] + +-input_scan= + | +-TableScan(column_list=[KeyValue.Key#1], table=KeyValue, column_index_list=[0]) + +-aggregate_list= + +-$agg1#3 := + +-AggregateFunctionCall(Templated_SQL_Function:uda_any_and_string_args_return_string(BOOL, STRING) -> STRING) + +-FunctionCall(ZetaSQL:$greater(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=KeyValue.Key#1) + | +-Literal(type=INT64, value=43) + +-Literal(type=STRING, value=NULL) + +With Templated SQL function call: + Templated_SQL_Function:uda_any_and_string_args_return_string(BOOL, STRING {is_not_aggregate: true}) -> STRING +containing resolved templated expression: +FunctionCall(ZetaSQL:if(BOOL, STRING, STRING) -> STRING) ++-ColumnRef(type=BOOL, column=$aggregate.$agg1#1) ++-ArgumentRef(type=STRING, name="x", argument_kind=NOT_AGGREGATE) ++-FunctionCall(ZetaSQL:concat(STRING, repeated(1) STRING) -> STRING) + 
+-ArgumentRef(type=STRING, name="x", argument_kind=NOT_AGGREGATE) + +-Literal(type=STRING, value="_suffix") + + $agg1#1 := + +-AggregateFunctionCall(ZetaSQL:logical_or(BOOL) -> BOOL) + +-ArgumentRef(parse_location=14-15, type=BOOL, name="a", argument_kind=AGGREGATE) +== + ################################################################################ # # Negative test cases diff --git a/zetasql/analyzer/testdata/tvf_relation_args.test b/zetasql/analyzer/testdata/tvf_relation_args.test index d39fb24a3..dd6ccb770 100644 --- a/zetasql/analyzer/testdata/tvf_relation_args.test +++ b/zetasql/analyzer/testdata/tvf_relation_args.test @@ -4052,9 +4052,10 @@ QueryStmt | | +-right_scan= | | | +-WithRefScan(column_list=[w2.key#6], with_query_name="w2") | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=w1.key#5) - | | +-ColumnRef(type=INT64, column=w2.key#6) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=w1.key#5) + | | | +-ColumnRef(type=INT64, column=w2.key#6) + | | +-has_using=TRUE | +-order_by_item_list= | +-OrderByItem | +-column_ref= diff --git a/zetasql/analyzer/testdata/typed_hints_and_options.test b/zetasql/analyzer/testdata/typed_hints_and_options.test index 4b4551a9d..a9d168ec0 100644 --- a/zetasql/analyzer/testdata/typed_hints_and_options.test +++ b/zetasql/analyzer/testdata/typed_hints_and_options.test @@ -884,6 +884,22 @@ CreateTableFunctionStmt +-$query.p#1 AS p [INT64] == +[language_features=ENABLE_ALTER_ARRAY_OPTIONS] +CREATE TABLE foo (a STRING OPTIONS (string_array_allow_alter_option += ['str1', 'str2'])); +-- +ERROR: Operators '+=' and '-=' are not allowed for option string_array_allow_alter_option [at 1:37] +CREATE TABLE foo (a STRING OPTIONS (string_array_allow_alter_option += ['str1... 
+ ^ +== + +[language_features=ENABLE_ALTER_ARRAY_OPTIONS] +ALTER TABLE abTable ADD COLUMN c STRING OPTIONS (string_array_allow_alter_option += ['str1', 'str2']); +-- +ERROR: Operators '+=' and '-=' are not allowed for option string_array_allow_alter_option [at 1:50] +ALTER TABLE abTable ADD COLUMN c STRING OPTIONS (string_array_allow_alter_opt... + ^ +== + ALTER TABLE abTable ALTER COLUMN b SET OPTIONS (string_array_allow_alter_option += ['str1', 'str2']); -- ERROR: Syntax error: Invalid operator "+=" [at 1:81] diff --git a/zetasql/analyzer/testdata/unnest_multiway.test b/zetasql/analyzer/testdata/unnest_multiway.test index 022412de4..c2ef61488 100644 --- a/zetasql/analyzer/testdata/unnest_multiway.test +++ b/zetasql/analyzer/testdata/unnest_multiway.test @@ -1,89 +1,1573 @@ +# ============== New syntax does not work without language feature ========= +SELECT * +FROM UNNEST([1,2], mode => "STRICT"); +-- +ERROR: Argument `mode` is not supported [at 2:20] +FROM UNNEST([1,2], mode => "STRICT"); + ^ +== + +SELECT * +FROM UNNEST([1,2], [3,4], mode => "STRICT"); +-- +ERROR: The UNNEST operator supports exactly one argument [at 2:20] +FROM UNNEST([1,2], [3,4], mode => "STRICT"); + ^ +== + +SELECT * +FROM UNNEST([1,2] AS array_alias) +-- +ERROR: Argument alias is not supported in the UNNEST operator [at 2:19] +FROM UNNEST([1,2] AS array_alias) + ^ +== + +[default language_features=V_1_4_MULTIWAY_UNNEST,V_1_4_WITH_EXPRESSION] +[default enabled_ast_rewrites=DEFAULTS,-WITH_EXPR,+MULTIWAY_UNNEST] +# ============== UNNEST with legacy syntax ========= +SELECT * +FROM UNNEST([1,2]){{| AS e}} +-- +ALTERNATION GROUP: +-- +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] ++-query= + +-ProjectScan + +-column_list=[$array.$unnest1#1] + +-input_scan= + +-ArrayScan + +-column_list=[$array.$unnest1#1] + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2]) + +-element_column_list=[$array.$unnest1#1] +-- +ALTERNATION GROUP: AS e +-- +QueryStmt 
++-output_column_list= +| +-$array.e#1 AS e [INT64] ++-query= + +-ProjectScan + +-column_list=[$array.e#1] + +-input_scan= + +-ArrayScan + +-column_list=[$array.e#1] + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2]) + +-element_column_list=[$array.e#1] +== + +SELECT * +FROM TestTable, UNNEST(TestTable.KitchenSink.repeated_int32_val){{| AS e}} +-- +ALTERNATION GROUP: +-- +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.$unnest1#4 AS `$unnest1` [INT32] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$unnest1#4] + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$unnest1#4] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-GetProtoField + | +-type=ARRAY + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=repeated_int32_val + | +-default_value=[] + +-element_column_list=[$array.$unnest1#4] +-- +ALTERNATION GROUP: AS e +-- +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.e#4 AS e [INT32] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.e#4] + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.e#4] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-GetProtoField + | +-type=ARRAY + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | 
+-field_descriptor=repeated_int32_val + | +-default_value=[] + +-element_column_list=[$array.e#4] +== + +# ============== UNNEST with single path expression argument ========= +[language_features=V_1_4_MULTIWAY_UNNEST{{|,V_1_4_SINGLETON_UNNEST_INFERS_ALIAS}}] +[show_unparsed] +# Allow infer alias will introduce backward compatibility break, because the +# selected column gets replaced. +WITH t AS (SELECT [1, 2, 3] AS xs) +SELECT xs +FROM t, UNNEST(t.xs); +-- +ALTERNATION GROUP: +-- +QueryStmt ++-output_column_list= +| +-t.xs#2 AS xs [ARRAY] ++-query= + +-WithScan + +-column_list=[t.xs#2] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="t" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[t.xs#1] + | +-expr_list= + | | +-xs#1 := Literal(type=ARRAY, value=[1, 2, 3]) + | +-input_scan= + | +-SingleRowScan + +-query= + +-ProjectScan + +-column_list=[t.xs#2] + +-input_scan= + +-ArrayScan + +-column_list=[t.xs#2] + +-input_scan= + | +-WithRefScan(column_list=[t.xs#2], with_query_name="t") + +-array_expr_list= + | +-ColumnRef(type=ARRAY, column=t.xs#2) + +-element_column_list=[$array.$unnest1#3] + +[UNPARSED_SQL] +WITH + t AS ( + SELECT + ARRAY< INT64 >[1, 2, 3] AS a_1 + ) +SELECT + withrefscan_2.a_1 AS xs +FROM + ( + SELECT + withrefscan_2.a_1 AS a_1 + FROM + t AS withrefscan_2 + ) AS withrefscan_2 + JOIN + UNNEST(withrefscan_2.a_1 AS a_3); +-- +ALTERNATION GROUP: ,V_1_4_SINGLETON_UNNEST_INFERS_ALIAS +-- +QueryStmt ++-output_column_list= +| +-$array.xs#3 AS xs [INT64] ++-query= + +-WithScan + +-column_list=[$array.xs#3] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="t" + | +-with_subquery= + | +-ProjectScan + | +-column_list=[t.xs#1] + | +-expr_list= + | | +-xs#1 := Literal(type=ARRAY, value=[1, 2, 3]) + | +-input_scan= + | +-SingleRowScan + +-query= + +-ProjectScan + +-column_list=[$array.xs#3] + +-input_scan= + +-ArrayScan + +-column_list=[t.xs#2, $array.xs#3] + +-input_scan= + | +-WithRefScan(column_list=[t.xs#2], with_query_name="t") + 
+-array_expr_list= + | +-ColumnRef(type=ARRAY, column=t.xs#2) + +-element_column_list=[$array.xs#3] + +[UNPARSED_SQL] +WITH + t AS ( + SELECT + ARRAY< INT64 >[1, 2, 3] AS a_1 + ) +SELECT + a_3 AS xs +FROM + ( + SELECT + withrefscan_2.a_1 AS a_1 + FROM + t AS withrefscan_2 + ) AS withrefscan_2 + JOIN + UNNEST(withrefscan_2.a_1 AS a_3); +== # ============== UNNEST with builtin enum mode argument ========= -# Array zip mode is not implemented. SELECT * FROM UNNEST([1,2], mode => {{"STRICT"|"TRUNCATE"|"PAD"}}); -- ALTERNATION GROUP: "STRICT" -- -ERROR: The named argument `mode` used in UNNEST is not implemented [at 2:20] +ERROR: Argument `mode` is not allowed when UNNEST only has one array argument [at 2:20] FROM UNNEST([1,2], mode => "STRICT"); ^ -- ALTERNATION GROUP: "TRUNCATE" -- -ERROR: The named argument `mode` used in UNNEST is not implemented [at 2:20] +ERROR: Argument `mode` is not allowed when UNNEST only has one array argument [at 2:20] FROM UNNEST([1,2], mode => "TRUNCATE"); ^ -- ALTERNATION GROUP: "PAD" -- -ERROR: The named argument `mode` used in UNNEST is not implemented [at 2:20] +ERROR: Argument `mode` is not allowed when UNNEST only has one array argument [at 2:20] FROM UNNEST([1,2], mode => "PAD"); ^ == -# Named arguments other than `mode` are not supported. +# Named argument with a name other than `mode` is not supported. SELECT * FROM UNNEST([1, 2], unsupported_named_argument => "PAD"); -- -ERROR: Unsupported named argument `unsupported_named_argument` in UNNEST [at 2:21] +ERROR: Unsupported named argument `unsupported_named_argument` in UNNEST; use `mode` instead [at 2:21] FROM UNNEST([1, 2], unsupported_named_argument => "PAD"); ^ == -# ============== UNNEST with array argument alias ========= -# Column alias is not implemented: single column. +# `mode` argument only works if it can be coerced to ARRAY_ZIP_MODE enum. +# Currently only STRING and INT64 literals are allowed. And they will be +# unparsed to be wrapped with CAST. 
+[show_unparsed] SELECT * -FROM UNNEST([1,2] AS array_alias) +FROM UNNEST([1,2], [2], mode => {{1|0.05|"STRICT"}}) -- -ERROR: Argument alias in UNNEST in FROM clause is not implemented [at 2:19] -FROM UNNEST([1,2] AS array_alias) - ^ +ALTERNATION GROUP: 1 +-- +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-input_scan= + +-ArrayScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2]) + | +-Literal(type=ARRAY, value=[2]) + +-element_column_list=$array.[$unnest1#1, $unnest2#2] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[UNPARSED_SQL] +SELECT + a_1 AS a_1, + a_2 AS a_2 +FROM + UNNEST(ARRAY< INT64 >[1, 2] AS a_1, ARRAY< INT64 >[2] AS a_2, mode => CAST("PAD" AS ARRAY_ZIP_MODE)); + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-input_scan= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-expr_list= + | +-$unnest1#1 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | | +-field_idx=0 + | +-$unnest2#2 := + | +-GetStructField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[$array.$with_expr_element#16] + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#3 := Literal(type=ARRAY, value=[1, 2]) + | | +-arr0_len#4 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | | +-Literal(type=INT64, value=0) + | | | 
+-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-arr1#5 := Literal(type=ARRAY, value=[2]) + | | +-arr1_len#6 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-mode#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#8 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#9 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> 
BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#15] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#15 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#10) + | | +-ColumnRef(type=INT64, column=$array.arr1#12) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | | +-expr_list= + | | | | +-offset#14 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, 
$array_offset.offset#13] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#10, $array_offset.offset#11] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#10] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#11) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#12, $array_offset.offset#13] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#12] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#13) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#14) + +-element_column_list=[$array.$with_expr_element#16] +[UNPARSED_SQL] +SELECT + projectscan_22.a_20 AS a_20, + projectscan_22.a_21 AS a_21 +FROM + ( + SELECT + a_19.arr0 AS a_20, + a_19.arr1 AS a_21 + FROM + UNNEST(WITH(a_1 AS ARRAY< INT64 >[1, 2], a_15 AS `IF`((a_1) IS NULL, 0, ARRAY_LENGTH(a_1)), a_5 AS ARRAY< + INT64 >[2], a_16 AS `IF`((a_5) IS NULL, 0, ARRAY_LENGTH(a_5)), a_17 AS `IF`(CAST("PAD" AS ARRAY_ZIP_MODE) IS NULL, + ERROR("UNNEST does not allow NULL mode argument"), CAST("PAD" AS ARRAY_ZIP_MODE)), a_18 AS `IF`((a_17 = + CAST("STRICT" AS ARRAY_ZIP_MODE)) AND ((LEAST(a_15, a_16)) != (GREATEST(a_15, a_16))), ERROR("Unnested arrays under STRICT mode must have 
equal lengths"), + CAST(NULL AS INT64)), a_11 AS `IF`(a_17 = CAST("TRUNCATE" AS ARRAY_ZIP_MODE), LEAST(a_15, a_16), GREATEST(a_15, + a_16)), ARRAY( + SELECT + STRUCT< arr0 INT64, arr1 INT64, offset INT64 > (orderbyscan_13.a_2, orderbyscan_13.a_6, orderbyscan_13.a_9) AS a_14 + FROM + ( + SELECT + filterscan_12.a_2 AS a_2, + filterscan_12.a_3 AS a_3, + filterscan_12.a_6 AS a_6, + filterscan_12.a_7 AS a_7, + filterscan_12.a_9 AS a_9 + FROM + ( + SELECT + projectscan_10.a_2 AS a_2, + projectscan_10.a_3 AS a_3, + projectscan_10.a_6 AS a_6, + projectscan_10.a_7 AS a_7, + projectscan_10.a_9 AS a_9 + FROM + ( + SELECT + arrayscan_4.a_2 AS a_2, + arrayscan_4.a_3 AS a_3, + arrayscan_8.a_6 AS a_6, + arrayscan_8.a_7 AS a_7, + COALESCE(arrayscan_4.a_3, arrayscan_8.a_7) AS a_9 + FROM + ( + SELECT + a_2 AS a_2, + a_3 AS a_3 + FROM + UNNEST(a_1 AS a_2) WITH OFFSET AS a_3 + ) AS arrayscan_4 + FULL JOIN + ( + SELECT + a_6 AS a_6, + a_7 AS a_7 + FROM + UNNEST(a_5 AS a_6) WITH OFFSET AS a_7 + ) AS arrayscan_8 + ON (arrayscan_4.a_3) = (arrayscan_8.a_7) + ) AS projectscan_10 + WHERE + (projectscan_10.a_9) < a_11 + ) AS filterscan_12 + ORDER BY filterscan_12.a_9 + ) AS orderbyscan_13 + )) AS a_19) + ) AS projectscan_22; +-- +ALTERNATION GROUP: 0.05 +-- +ERROR: Named argument `mode` used in UNNEST should have type ARRAY_ZIP_MODE, but got type DOUBLE [at 2:33] +FROM UNNEST([1,2], [2], mode => 0.05) + ^ +-- +ALTERNATION GROUP: "STRICT" +-- +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-input_scan= + +-ArrayScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2]) + | +-Literal(type=ARRAY, value=[2]) + +-element_column_list=$array.[$unnest1#1, $unnest2#2] + +-array_zip_mode= + +-Literal(type=ENUM, value=STRICT) + +[UNPARSED_SQL] +SELECT + a_1 AS a_1, + a_2 AS a_2 +FROM + UNNEST(ARRAY< 
INT64 >[1, 2] AS a_1, ARRAY< INT64 >[2] AS a_2, mode => CAST("STRICT" AS ARRAY_ZIP_MODE)); + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-input_scan= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-expr_list= + | +-$unnest1#1 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | | +-field_idx=0 + | +-$unnest2#2 := + | +-GetStructField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[$array.$with_expr_element#16] + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#3 := Literal(type=ARRAY, value=[1, 2]) + | | +-arr0_len#4 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-arr1#5 := Literal(type=ARRAY, value=[2]) + | | +-arr1_len#6 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-mode#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=STRICT) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL 
mode argument") + | | | +-Literal(type=ENUM, value=STRICT) + | | +-strict_check#8 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#9 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9) + | +-subquery= + | +-ProjectScan + | 
+-column_list=[$make_struct.$struct#15] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#15 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#10) + | | +-ColumnRef(type=INT64, column=$array.arr1#12) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | | +-expr_list= + | | | | +-offset#14 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#10, $array_offset.offset#11] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#10] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#11) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#12, $array_offset.offset#13] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#12] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#13) + | | | +-join_expr= + 
| | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#14) + +-element_column_list=[$array.$with_expr_element#16] +[UNPARSED_SQL] +SELECT + projectscan_22.a_20 AS a_20, + projectscan_22.a_21 AS a_21 +FROM + ( + SELECT + a_19.arr0 AS a_20, + a_19.arr1 AS a_21 + FROM + UNNEST(WITH(a_1 AS ARRAY< INT64 >[1, 2], a_15 AS `IF`((a_1) IS NULL, 0, ARRAY_LENGTH(a_1)), a_5 AS ARRAY< + INT64 >[2], a_16 AS `IF`((a_5) IS NULL, 0, ARRAY_LENGTH(a_5)), a_17 AS `IF`(CAST("STRICT" AS ARRAY_ZIP_MODE) IS NULL, + ERROR("UNNEST does not allow NULL mode argument"), CAST("STRICT" AS ARRAY_ZIP_MODE)), a_18 AS `IF`((a_17 = + CAST("STRICT" AS ARRAY_ZIP_MODE)) AND ((LEAST(a_15, a_16)) != (GREATEST(a_15, a_16))), ERROR("Unnested arrays under STRICT mode must have equal lengths"), + CAST(NULL AS INT64)), a_11 AS `IF`(a_17 = CAST("TRUNCATE" AS ARRAY_ZIP_MODE), LEAST(a_15, a_16), GREATEST(a_15, + a_16)), ARRAY( + SELECT + STRUCT< arr0 INT64, arr1 INT64, offset INT64 > (orderbyscan_13.a_2, orderbyscan_13.a_6, orderbyscan_13.a_9) AS a_14 + FROM + ( + SELECT + filterscan_12.a_2 AS a_2, + filterscan_12.a_3 AS a_3, + filterscan_12.a_6 AS a_6, + filterscan_12.a_7 AS a_7, + filterscan_12.a_9 AS a_9 + FROM + ( + SELECT + projectscan_10.a_2 AS a_2, + projectscan_10.a_3 AS a_3, + projectscan_10.a_6 AS a_6, + projectscan_10.a_7 AS a_7, + projectscan_10.a_9 AS a_9 + FROM + ( + SELECT + arrayscan_4.a_2 AS a_2, + arrayscan_4.a_3 AS a_3, + arrayscan_8.a_6 AS a_6, + arrayscan_8.a_7 AS a_7, + COALESCE(arrayscan_4.a_3, arrayscan_8.a_7) AS a_9 + FROM + ( + 
SELECT + a_2 AS a_2, + a_3 AS a_3 + FROM + UNNEST(a_1 AS a_2) WITH OFFSET AS a_3 + ) AS arrayscan_4 + FULL JOIN + ( + SELECT + a_6 AS a_6, + a_7 AS a_7 + FROM + UNNEST(a_5 AS a_6) WITH OFFSET AS a_7 + ) AS arrayscan_8 + ON (arrayscan_4.a_3) = (arrayscan_8.a_7) + ) AS projectscan_10 + WHERE + (projectscan_10.a_9) < a_11 + ) AS filterscan_12 + ORDER BY filterscan_12.a_9 + ) AS orderbyscan_13 + )) AS a_19) + ) AS projectscan_22; == -# Table alias is not allowed when the expression in UNNEST has alias. +# `mode` argument that is a non-literal only works if it has ARRAY_ZIP_MODE as +# an explicit type. SELECT * -FROM UNNEST([1, 2] AS col_alias) AS table_alias +FROM UNNEST([1,2], [2], mode => {{CAST("STRICT" AS ARRAY_ZIP_MODE)|CAST(1 AS INT32)}}); -- -ERROR: Table alias in UNNEST in FROM clause is not allowed when arguments in UNNEST have alias [at 2:34] -FROM UNNEST([1, 2] AS col_alias) AS table_alias - ^ +ALTERNATION GROUP: CAST("STRICT" AS ARRAY_ZIP_MODE) +-- +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-input_scan= + +-ArrayScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2]) + | +-Literal(type=ARRAY, value=[2]) + +-element_column_list=$array.[$unnest1#1, $unnest2#2] + +-array_zip_mode= + +-Cast(STRING -> ENUM) + +-Literal(type=STRING, value="STRICT") + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-input_scan= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-expr_list= + | +-$unnest1#1 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | | +-field_idx=0 + | +-$unnest2#2 := + | 
+-GetStructField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[$array.$with_expr_element#16] + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#3 := Literal(type=ARRAY, value=[1, 2]) + | | +-arr0_len#4 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-arr1#5 := Literal(type=ARRAY, value=[2]) + | | +-arr1_len#6 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-mode#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Cast(STRING -> ENUM) + | | | | +-Literal(type=STRING, value="STRICT") + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Cast(STRING -> ENUM) + | | | +-Literal(type=STRING, value="STRICT") + | | +-strict_check#8 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | 
| | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#9 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#15] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#15 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#10) + | | +-ColumnRef(type=INT64, column=$array.arr1#12) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | 
| +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | | +-expr_list= + | | | | +-offset#14 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#10, $array_offset.offset#11] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#10] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#11) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#12, $array_offset.offset#13] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#12] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#13) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#14) + 
+-element_column_list=[$array.$with_expr_element#16] +-- +ALTERNATION GROUP: CAST(1 AS INT32) +-- +ERROR: Named argument `mode` used in UNNEST should have type ARRAY_ZIP_MODE, but got type INT32 [at 2:33] +FROM UNNEST([1,2], [2], mode => CAST(1 AS INT32)); + ^ == -# ============== UNNEST with multiple array arguments ========= -# Multiple expressions in UNNEST are not implemented. +# `mode` argument that is a non-literal STRING or non-literal INT doesn't work. SELECT * -FROM UNNEST([1, 2, 3], [2, 3]) +FROM UNNEST([1,2], [2], mode => {{(SELECT "STRICT")|CONCAT("STR", "ICT")|(0 + 1)}}) -- -ERROR: Multiple arguments in UNNEST in FROM clause is not implemented [at 2:24] -FROM UNNEST([1, 2, 3], [2, 3]) - ^ +ALTERNATION GROUP: (SELECT "STRICT") +-- +ERROR: Named argument `mode` used in UNNEST should have type ARRAY_ZIP_MODE, but got type STRING [at 2:33] +FROM UNNEST([1,2], [2], mode => (SELECT "STRICT")) + ^ +-- +ALTERNATION GROUP: CONCAT("STR", "ICT") +-- +ERROR: Named argument `mode` used in UNNEST should have type ARRAY_ZIP_MODE, but got type STRING [at 2:33] +FROM UNNEST([1,2], [2], mode => CONCAT("STR", "ICT")) + ^ +-- +ALTERNATION GROUP: (0 + 1) +-- +ERROR: Named argument `mode` used in UNNEST should have type ARRAY_ZIP_MODE, but got type INT64 [at 2:34] +FROM UNNEST([1,2], [2], mode => (0 + 1)) + ^ == -# Multiple columns with and without alias are not allowed. +# NULL `mode` argument without explicit type will be coerced to ENUM type. SELECT * -FROM UNNEST([1, 2, 3] AS literal_array, TestTable.KitchenSink.repeated_int32_val) +FROM UNNEST([1,2], [2], mode => {{NULL|CAST(NULL AS ARRAY_ZIP_MODE)}}) -- -ERROR: Multiple arguments in UNNEST in FROM clause is not implemented [at 2:41] -FROM UNNEST([1, 2, 3] AS literal_array, TestTable.KitchenSink.repeated_int32_... 
- ^ +ALTERNATION GROUP: NULL +-- +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-input_scan= + +-ArrayScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2]) + | +-Literal(type=ARRAY, value=[2]) + +-element_column_list=$array.[$unnest1#1, $unnest2#2] + +-array_zip_mode= + +-Literal(type=ENUM, value=NULL) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-input_scan= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-expr_list= + | +-$unnest1#1 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | | +-field_idx=0 + | +-$unnest2#2 := + | +-GetStructField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[$array.$with_expr_element#16] + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#3 := Literal(type=ARRAY, value=[1, 2]) + | | +-arr0_len#4 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-arr1#5 := Literal(type=ARRAY, value=[2]) + | | +-arr1_len#6 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | | +-Literal(type=INT64, 
value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-mode#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=NULL) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=NULL) + | | +-strict_check#8 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#9 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, 
column=$with_expr.arr0_len#4) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#15] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#15 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#10) + | | +-ColumnRef(type=INT64, column=$array.arr1#12) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | | +-expr_list= + | | | | +-offset#14 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#10, $array_offset.offset#11] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#10] + | | | | +-array_offset_column= + | | | | 
+-ColumnHolder(column=$array_offset.offset#11) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#12, $array_offset.offset#13] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#12] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#13) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#14) + +-element_column_list=[$array.$with_expr_element#16] +-- +ALTERNATION GROUP: CAST(NULL AS ARRAY_ZIP_MODE) +-- +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-input_scan= + +-ArrayScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2]) + | +-Literal(type=ARRAY, value=[2]) + +-element_column_list=$array.[$unnest1#1, $unnest2#2] + +-array_zip_mode= + +-Literal(type=ENUM, value=NULL, has_explicit_type=TRUE) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-input_scan= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-expr_list= + | +-$unnest1#1 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, 
column=$array.$with_expr_element#16) + | | +-field_idx=0 + | +-$unnest2#2 := + | +-GetStructField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[$array.$with_expr_element#16] + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#3 := Literal(type=ARRAY, value=[1, 2]) + | | +-arr0_len#4 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-arr1#5 := Literal(type=ARRAY, value=[2]) + | | +-arr1_len#6 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-mode#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=NULL, has_explicit_type=TRUE) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=NULL, has_explicit_type=TRUE) + | | +-strict_check#8 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | 
+-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#9 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#15] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#15 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#10) + | | +-ColumnRef(type=INT64, column=$array.arr1#12) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] 
+ | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | | +-expr_list= + | | | | +-offset#14 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#10, $array_offset.offset#11] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#10] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#11) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#12, $array_offset.offset#13] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#12] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#13) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | 
+-ColumnRef(type=INT64, column=$full_join.offset#14) + +-element_column_list=[$array.$with_expr_element#16] == -# Table alias is not allowed when UNNEST contains multiple expressions. +# `mode` argument works with other existing syntax in UNNEST operator. +[show_unparsed] SELECT * -FROM UNNEST([1,2], [3,4]) AS table_alias +FROM UNNEST([1, 2], [1, 2], mode => "STRICT") WITH OFFSET off; -- -ERROR: Table alias in UNNEST in FROM clause is not allowed when UNNEST contains multiple arguments [at 2:27] -FROM UNNEST([1,2], [3,4]) AS table_alias - ^ +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [INT64] +| +-$array_offset.off#3 AS off [INT64] ++-query= + +-ProjectScan + +-column_list=[$array.$unnest1#1, $array.$unnest2#2, $array_offset.off#3] + +-input_scan= + +-ArrayScan + +-column_list=[$array.$unnest1#1, $array.$unnest2#2, $array_offset.off#3] + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2]) + | +-Literal(type=ARRAY, value=[1, 2]) + +-element_column_list=$array.[$unnest1#1, $unnest2#2] + +-array_offset_column= + | +-ColumnHolder(column=$array_offset.off#3) + +-array_zip_mode= + +-Literal(type=ENUM, value=STRICT) + +[UNPARSED_SQL] +SELECT + a_1 AS a_1, + a_2 AS a_2, + a_3 AS off +FROM + UNNEST(ARRAY< INT64 >[1, 2] AS a_1, ARRAY< INT64 >[1, 2] AS a_2, mode => CAST("STRICT" AS ARRAY_ZIP_MODE)) WITH OFFSET AS a_3; + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [INT64] +| +-$array_offset.off#3 AS off [INT64] ++-query= + +-ProjectScan + +-column_list=[$array.$unnest1#1, $array.$unnest2#2, $array_offset.off#3] + +-input_scan= + +-ProjectScan + +-column_list=[$array.$unnest1#1, $array.$unnest2#2, $array_offset.off#3] + +-expr_list= + | +-$unnest1#1 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#17) + | | +-field_idx=0 + | +-$unnest2#2 
:= + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#17) + | | +-field_idx=1 + | +-off#3 := + | +-GetStructField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#17) + | +-field_idx=2 + +-input_scan= + +-ArrayScan + +-column_list=[$array.$with_expr_element#17] + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#4 := Literal(type=ARRAY, value=[1, 2]) + | | +-arr0_len#5 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#4) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#4) + | | +-arr1#6 := Literal(type=ARRAY, value=[1, 2]) + | | +-arr1_len#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#6) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#6) + | | +-mode#8 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=STRICT) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=STRICT) + | | +-strict_check#9 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#8) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | 
+-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#5) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#7) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#5) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#7) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#10 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#8) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#5) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#7) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#5) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#7) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#4) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#6) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#10) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#16] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#16 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#11) + | | +-ColumnRef(type=INT64, column=$array.arr1#13) + | | +-ColumnRef(type=INT64, column=$full_join.offset#15) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#11, 
$array_offset.offset#12, $array.arr1#13, $array_offset.offset#14, $full_join.offset#15] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#11, $array_offset.offset#12, $array.arr1#13, $array_offset.offset#14, $full_join.offset#15] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#11, $array_offset.offset#12, $array.arr1#13, $array_offset.offset#14, $full_join.offset#15] + | | | +-expr_list= + | | | | +-offset#15 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#12) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#11, $array_offset.offset#12, $array.arr1#13, $array_offset.offset#14] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#11, $array_offset.offset#12] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#4, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#11] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#12) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#13, $array_offset.offset#14] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#6, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#13] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#14) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#12) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#15) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#10, 
is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#15) + +-element_column_list=[$array.$with_expr_element#17] +[UNPARSED_SQL] +SELECT + projectscan_23.a_20 AS a_20, + projectscan_23.a_21 AS a_21, + projectscan_23.a_22 AS off +FROM + ( + SELECT + a_19.arr0 AS a_20, + a_19.arr1 AS a_21, + a_19.offset AS a_22 + FROM + UNNEST(WITH(a_1 AS ARRAY< INT64 >[1, 2], a_15 AS `IF`((a_1) IS NULL, 0, ARRAY_LENGTH(a_1)), a_5 AS ARRAY< + INT64 >[1, 2], a_16 AS `IF`((a_5) IS NULL, 0, ARRAY_LENGTH(a_5)), a_17 AS `IF`(CAST("STRICT" AS ARRAY_ZIP_MODE) IS NULL, + ERROR("UNNEST does not allow NULL mode argument"), CAST("STRICT" AS ARRAY_ZIP_MODE)), a_18 AS `IF`((a_17 = + CAST("STRICT" AS ARRAY_ZIP_MODE)) AND ((LEAST(a_15, a_16)) != (GREATEST(a_15, a_16))), ERROR("Unnested arrays under STRICT mode must have equal lengths"), + CAST(NULL AS INT64)), a_11 AS `IF`(a_17 = CAST("TRUNCATE" AS ARRAY_ZIP_MODE), LEAST(a_15, a_16), GREATEST(a_15, + a_16)), ARRAY( + SELECT + STRUCT< arr0 INT64, arr1 INT64, offset INT64 > (orderbyscan_13.a_2, orderbyscan_13.a_6, orderbyscan_13.a_9) AS a_14 + FROM + ( + SELECT + filterscan_12.a_2 AS a_2, + filterscan_12.a_3 AS a_3, + filterscan_12.a_6 AS a_6, + filterscan_12.a_7 AS a_7, + filterscan_12.a_9 AS a_9 + FROM + ( + SELECT + projectscan_10.a_2 AS a_2, + projectscan_10.a_3 AS a_3, + projectscan_10.a_6 AS a_6, + projectscan_10.a_7 AS a_7, + projectscan_10.a_9 AS a_9 + FROM + ( + SELECT + arrayscan_4.a_2 AS a_2, + arrayscan_4.a_3 AS a_3, + arrayscan_8.a_6 AS a_6, + arrayscan_8.a_7 AS a_7, + COALESCE(arrayscan_4.a_3, arrayscan_8.a_7) AS a_9 + FROM + ( + SELECT + a_2 AS a_2, + a_3 AS a_3 + FROM + UNNEST(a_1 AS a_2) WITH OFFSET AS a_3 + ) AS arrayscan_4 + FULL JOIN + ( + SELECT + a_6 AS a_6, + a_7 AS a_7 + FROM + UNNEST(a_5 AS a_6) WITH OFFSET AS a_7 + ) AS arrayscan_8 + ON (arrayscan_4.a_3) = (arrayscan_8.a_7) + ) AS projectscan_10 + WHERE + (projectscan_10.a_9) < a_11 + ) AS 
filterscan_12 + ORDER BY filterscan_12.a_9 + ) AS orderbyscan_13 + )) AS a_19) + ) AS projectscan_23; == +# ============== UNNEST with table alias or argument alias ========= # Table alias is not allowed when UNNEST has multiple expressions with aliases. SELECT * FROM UNNEST([1, 2] AS col_alias, [2, 3] AS another_col_alias) AS table_alias -- -ERROR: Table alias in UNNEST in FROM clause is not allowed when UNNEST contains multiple arguments [at 2:63] +ERROR: When 2 or more array arguments are supplied to UNNEST, aliases for the element columns must be specified following the argument inside the parenthesis [at 2:63] FROM UNNEST([1, 2] AS col_alias, [2, 3] AS another_col_alias) AS table_alias ^ == @@ -93,8 +1577,5609 @@ FROM UNNEST([1, 2] AS col_alias, [2, 3] AS another_col_alias) AS table_alias SELECT * FROM UNNEST([1, 2] AS col_alias, [2, 3]) AS table_alias -- -ERROR: Table alias in UNNEST in FROM clause is not allowed when UNNEST contains multiple arguments [at 2:42] +ERROR: When 2 or more array arguments are supplied to UNNEST, aliases for the element columns must be specified following the argument inside the parenthesis [at 2:42] FROM UNNEST([1, 2] AS col_alias, [2, 3]) AS table_alias ^ == +# Table alias is not allowed when UNNEST has multiple expressions, none of them +# has alias. +SELECT * +FROM UNNEST([1, 2], [2, 3]) AS table_alias +-- +ERROR: When 2 or more array arguments are supplied to UNNEST, aliases for the element columns must be specified following the argument inside the parenthesis [at 2:29] +FROM UNNEST([1, 2], [2, 3]) AS table_alias + ^ +== + +# Table alias is not allowed when UNNEST argument has alias. +SELECT * +FROM UNNEST([1, 2] AS col_alias) AS table_alias +-- +ERROR: Alias outside UNNEST is not allowed when the argument inside the parenthesis has alias [at 2:34] +FROM UNNEST([1, 2] AS col_alias) AS table_alias + ^ +== + +# Column alias in single argument. 
+[show_unparsed] +SELECT * +FROM UNNEST([1,2] AS array_alias) +-- +QueryStmt ++-output_column_list= +| +-$array.array_alias#1 AS array_alias [INT64] ++-query= + +-ProjectScan + +-column_list=[$array.array_alias#1] + +-input_scan= + +-ArrayScan + +-column_list=[$array.array_alias#1] + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2]) + +-element_column_list=[$array.array_alias#1] + +[UNPARSED_SQL] +SELECT + a_1 AS array_alias +FROM + UNNEST(ARRAY< INT64 >[1, 2] AS a_1); +== + +# ============== UNNEST with multiple array arguments ========= +# 2 arguments: NULL literal does not coerce to array, so it is only accepted if +# it has explicit type. +SELECT * +FROM UNNEST({{NULL, NULL|CAST(NULL AS ARRAY), CAST(NULL AS ARRAY)}}) +-- +ALTERNATION GROUP: NULL, NULL +-- +ERROR: Values referenced in UNNEST must be arrays. UNNEST contains expression of type INT64 [at 2:13] +FROM UNNEST(NULL, NULL) + ^ +-- +ALTERNATION GROUP: CAST(NULL AS ARRAY), CAST(NULL AS ARRAY) +-- +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [STRING] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-input_scan= + +-ArrayScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-array_expr_list= + | +-Literal(type=ARRAY, value=NULL, has_explicit_type=TRUE) + | +-Literal(type=ARRAY, value=NULL, has_explicit_type=TRUE) + +-element_column_list=$array.[$unnest1#1, $unnest2#2] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [STRING] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-input_scan= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-expr_list= + | +-$unnest1#1 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | | +-field_idx=0 + | 
+-$unnest2#2 := + | +-GetStructField + | +-type=STRING + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[$array.$with_expr_element#16] + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#3 := Literal(type=ARRAY, value=NULL, has_explicit_type=TRUE) + | | +-arr0_len#4 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-arr1#5 := Literal(type=ARRAY, value=NULL, has_explicit_type=TRUE) + | | +-arr1_len#6 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-mode#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#8 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | 
+-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#9 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#15] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#15 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#10) + | | +-ColumnRef(type=STRING, column=$array.arr1#12) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | 
+-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | | +-expr_list= + | | | | +-offset#14 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#10, $array_offset.offset#11] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#10] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#11) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#12, $array_offset.offset#13] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#12] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#13) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#14) + 
+-element_column_list=[$array.$with_expr_element#16] +== + +# 2 arguments: literal expression without any alias. +SELECT * +FROM UNNEST([1, 2, 3], [2, 3]) +-- +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-input_scan= + +-ArrayScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2, 3]) + | +-Literal(type=ARRAY, value=[2, 3]) + +-element_column_list=$array.[$unnest1#1, $unnest2#2] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.$unnest2#2 AS `$unnest2` [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-input_scan= + +-ProjectScan + +-column_list=$array.[$unnest1#1, $unnest2#2] + +-expr_list= + | +-$unnest1#1 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | | +-field_idx=0 + | +-$unnest2#2 := + | +-GetStructField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[$array.$with_expr_element#16] + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#3 := Literal(type=ARRAY, value=[1, 2, 3]) + | | +-arr0_len#4 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-arr1#5 := Literal(type=ARRAY, value=[2, 3]) + | | +-arr1_len#6 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | 
+-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-mode#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#8 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#9 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | 
+-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#15] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#15 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#10) + | | +-ColumnRef(type=INT64, column=$array.arr1#12) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | | +-expr_list= + | | | | +-offset#14 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#10, $array_offset.offset#11] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, 
column=$with_expr.arr0#3, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#10] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#11) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#12, $array_offset.offset#13] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#12] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#13) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#14) + +-element_column_list=[$array.$with_expr_element#16] +== + +# 2 arguments: literal expression with alias, literal expression without any alias. 
+SELECT * +FROM UNNEST([1, 2, 3] AS literal_array, [2, 3]) +-- +QueryStmt ++-output_column_list= +| +-$array.literal_array#1 AS literal_array [INT64] +| +-$array.$unnest1#2 AS `$unnest1` [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[literal_array#1, $unnest1#2] + +-input_scan= + +-ArrayScan + +-column_list=$array.[literal_array#1, $unnest1#2] + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2, 3]) + | +-Literal(type=ARRAY, value=[2, 3]) + +-element_column_list=$array.[literal_array#1, $unnest1#2] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$array.literal_array#1 AS literal_array [INT64] +| +-$array.$unnest1#2 AS `$unnest1` [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[literal_array#1, $unnest1#2] + +-input_scan= + +-ProjectScan + +-column_list=$array.[literal_array#1, $unnest1#2] + +-expr_list= + | +-literal_array#1 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | | +-field_idx=0 + | +-$unnest1#2 := + | +-GetStructField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[$array.$with_expr_element#16] + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#3 := Literal(type=ARRAY, value=[1, 2, 3]) + | | +-arr0_len#4 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-arr1#5 := Literal(type=ARRAY, value=[2, 3]) + | | +-arr1_len#6 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | 
| | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-mode#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#8 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#9 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | 
+-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#15] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#15 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#10) + | | +-ColumnRef(type=INT64, column=$array.arr1#12) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | | +-expr_list= + | | | | +-offset#14 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#10, $array_offset.offset#11] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3, is_correlated=TRUE) + | | | | 
+-element_column_list=[$array.arr0#10] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#11) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#12, $array_offset.offset#13] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY<INT64>, column=$with_expr.arr1#5, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#12] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#13) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#14) + +-element_column_list=[$array.$with_expr_element#16] +== + +# 2 arguments: literal expression without alias, literal expression with alias.
+SELECT * +FROM UNNEST([1, 2, 3], [2, 3] AS literal_array) +-- +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.literal_array#2 AS literal_array [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, literal_array#2] + +-input_scan= + +-ArrayScan + +-column_list=$array.[$unnest1#1, literal_array#2] + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2, 3]) + | +-Literal(type=ARRAY, value=[2, 3]) + +-element_column_list=$array.[$unnest1#1, literal_array#2] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$array.$unnest1#1 AS `$unnest1` [INT64] +| +-$array.literal_array#2 AS literal_array [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[$unnest1#1, literal_array#2] + +-input_scan= + +-ProjectScan + +-column_list=$array.[$unnest1#1, literal_array#2] + +-expr_list= + | +-$unnest1#1 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | | +-field_idx=0 + | +-literal_array#2 := + | +-GetStructField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[$array.$with_expr_element#16] + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#3 := Literal(type=ARRAY, value=[1, 2, 3]) + | | +-arr0_len#4 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-arr1#5 := Literal(type=ARRAY, value=[2, 3]) + | | +-arr1_len#6 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | 
| | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-mode#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#8 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#9 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | 
+-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#15] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#15 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#10) + | | +-ColumnRef(type=INT64, column=$array.arr1#12) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | | +-expr_list= + | | | | +-offset#14 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#10, $array_offset.offset#11] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3, is_correlated=TRUE) + | | | | 
+-element_column_list=[$array.arr0#10] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#11) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#12, $array_offset.offset#13] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY<INT64>, column=$with_expr.arr1#5, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#12] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#13) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#14) + +-element_column_list=[$array.$with_expr_element#16] +== + +# 2 arguments: literal expressions, both with aliases.
+SELECT * +FROM UNNEST([1, 2, 3] AS array1, [2, 3] AS array2) +-- +QueryStmt ++-output_column_list= +| +-$array.array1#1 AS array1 [INT64] +| +-$array.array2#2 AS array2 [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[array1#1, array2#2] + +-input_scan= + +-ArrayScan + +-column_list=$array.[array1#1, array2#2] + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2, 3]) + | +-Literal(type=ARRAY, value=[2, 3]) + +-element_column_list=$array.[array1#1, array2#2] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$array.array1#1 AS array1 [INT64] +| +-$array.array2#2 AS array2 [INT64] ++-query= + +-ProjectScan + +-column_list=$array.[array1#1, array2#2] + +-input_scan= + +-ProjectScan + +-column_list=$array.[array1#1, array2#2] + +-expr_list= + | +-array1#1 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | | +-field_idx=0 + | +-array2#2 := + | +-GetStructField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#16) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[$array.$with_expr_element#16] + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#3 := Literal(type=ARRAY, value=[1, 2, 3]) + | | +-arr0_len#4 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-arr1#5 := Literal(type=ARRAY, value=[2, 3]) + | | +-arr1_len#6 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | | +-Literal(type=INT64, 
value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-mode#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#8 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#9 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#7) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#4) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, 
column=$with_expr.arr0_len#4) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#6) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#5) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#15] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#15 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#10) + | | +-ColumnRef(type=INT64, column=$array.arr1#12) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13, $full_join.offset#14] + | | | +-expr_list= + | | | | +-offset#14 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#10, $array_offset.offset#11, $array.arr1#12, $array_offset.offset#13] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#10, $array_offset.offset#11] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#3, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#10] + | | | | +-array_offset_column= + | | | | 
+-ColumnHolder(column=$array_offset.offset#11) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#12, $array_offset.offset#13] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY<INT64>, column=$with_expr.arr1#5, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#12] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#13) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#11) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#13) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#14) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#9, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#14) + +-element_column_list=[$array.$with_expr_element#16] +== + +# 2 arguments: literal expression and path expression without any alias.
+SELECT * +FROM TestTable, UNNEST([1, 2, 3], TestTable.KitchenSink.repeated_int32_val) +-- +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.$unnest1#4 AS `$unnest1` [INT64] +| +-$array.repeated_int32_val#5 AS repeated_int32_val [INT32] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$unnest1#4, $array.repeated_int32_val#5] + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$unnest1#4, $array.repeated_int32_val#5] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2, 3]) + | +-GetProtoField + | +-type=ARRAY + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=repeated_int32_val + | +-default_value=[] + +-element_column_list=$array.[$unnest1#4, repeated_int32_val#5] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.$unnest1#4 AS `$unnest1` [INT64] +| +-$array.repeated_int32_val#5 AS repeated_int32_val [INT32] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$unnest1#4, $array.repeated_int32_val#5] + +-input_scan= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$unnest1#4, $array.repeated_int32_val#5] + +-expr_list= + | +-$unnest1#4 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#19) + | | +-field_idx=0 + | +-repeated_int32_val#5 := + 
| +-GetStructField + | +-type=INT32 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#19) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$with_expr_element#19] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#6 := Literal(type=ARRAY, value=[1, 2, 3]) + | | +-arr0_len#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | +-arr1#8 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=repeated_int32_val + | | | +-default_value=[] + | | +-arr1_len#9 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | +-mode#10 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#11 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | 
+-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#10) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#12 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#10) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#12) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#18] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#18 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#13) + | | +-ColumnRef(type=INT32, 
column=$array.arr1#15) + | | +-ColumnRef(type=INT64, column=$full_join.offset#17) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | | | +-expr_list= + | | | | +-offset#17 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#16) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#13, $array_offset.offset#14] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#13] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#14) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#15, $array_offset.offset#16] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#15] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#16) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#16) + | | +-filter_expr= + | | 
+-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#17) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#12, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#17) + +-element_column_list=[$array.$with_expr_element#19] +== + +# 2 arguments: literal expression with alias, path expression without explicit alias. +SELECT * +FROM TestTable, UNNEST([1, 2, 3] AS literal_array, TestTable.KitchenSink.repeated_int32_val) +-- +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.literal_array#4 AS literal_array [INT64] +| +-$array.repeated_int32_val#5 AS repeated_int32_val [INT32] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.literal_array#4, $array.repeated_int32_val#5] + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.literal_array#4, $array.repeated_int32_val#5] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2, 3]) + | +-GetProtoField + | +-type=ARRAY + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=repeated_int32_val + | +-default_value=[] + +-element_column_list=$array.[literal_array#4, repeated_int32_val#5] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.literal_array#4 AS literal_array [INT64] +| +-$array.repeated_int32_val#5 AS repeated_int32_val [INT32] ++-query= + 
+-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.literal_array#4, $array.repeated_int32_val#5] + +-input_scan= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.literal_array#4, $array.repeated_int32_val#5] + +-expr_list= + | +-literal_array#4 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#19) + | | +-field_idx=0 + | +-repeated_int32_val#5 := + | +-GetStructField + | +-type=INT32 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#19) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$with_expr_element#19] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#6 := Literal(type=ARRAY, value=[1, 2, 3]) + | | +-arr0_len#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | +-arr1#8 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=repeated_int32_val + | | | +-default_value=[] + | | +-arr1_len#9 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, 
column=$with_expr.arr1#8) + | | +-mode#10 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#11 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#10) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#12 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#10) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | 
+-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#12) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#18] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#18 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#13) + | | +-ColumnRef(type=INT32, column=$array.arr1#15) + | | +-ColumnRef(type=INT64, column=$full_join.offset#17) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | | | +-expr_list= + | | | | +-offset#17 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#16) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#13, $array_offset.offset#14] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#13] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#14) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | 
+-column_list=[$array.arr1#15, $array_offset.offset#16] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#15] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#16) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#16) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#17) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#12, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#17) + +-element_column_list=[$array.$with_expr_element#19] +== + +# 2 arguments: literal expression without alias, path expression with explicit alias. 
+SELECT * +FROM TestTable, UNNEST([1, 2, 3], TestTable.KitchenSink.repeated_int32_val AS array2) +-- +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.$unnest1#4 AS `$unnest1` [INT64] +| +-$array.array2#5 AS array2 [INT32] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$unnest1#4, $array.array2#5] + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$unnest1#4, $array.array2#5] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2, 3]) + | +-GetProtoField + | +-type=ARRAY + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=repeated_int32_val + | +-default_value=[] + +-element_column_list=$array.[$unnest1#4, array2#5] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.$unnest1#4 AS `$unnest1` [INT64] +| +-$array.array2#5 AS array2 [INT32] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$unnest1#4, $array.array2#5] + +-input_scan= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$unnest1#4, $array.array2#5] + +-expr_list= + | +-$unnest1#4 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#19) + | | +-field_idx=0 + | +-array2#5 := + | +-GetStructField + | +-type=INT32 + | +-expr= + | | +-ColumnRef(type=STRUCT, 
column=$array.$with_expr_element#19) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$with_expr_element#19] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#6 := Literal(type=ARRAY, value=[1, 2, 3]) + | | +-arr0_len#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | +-arr1#8 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=repeated_int32_val + | | | +-default_value=[] + | | +-arr1_len#9 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | +-mode#10 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#11 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | 
+-ColumnRef(type=ENUM, column=$with_expr.mode#10) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#12 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#10) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#12) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#18] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#18 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#13) + | | +-ColumnRef(type=INT32, column=$array.arr1#15) + | | +-ColumnRef(type=INT64, 
column=$full_join.offset#17) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | | | +-expr_list= + | | | | +-offset#17 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#16) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#13, $array_offset.offset#14] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#13] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#14) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#15, $array_offset.offset#16] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#15] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#16) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#16) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, 
column=$full_join.offset#17) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#12, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#17) + +-element_column_list=[$array.$with_expr_element#19] +== + +# 2 arguments: literal expression and path expression with explicit alias. +SELECT * +FROM TestTable, UNNEST([1, 2, 3] AS array1, TestTable.KitchenSink.repeated_int32_val AS array2) +-- +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.array1#4 AS array1 [INT64] +| +-$array.array2#5 AS array2 [INT32] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.array1#4, $array.array2#5] + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.array1#4, $array.array2#5] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2, 3]) + | +-GetProtoField + | +-type=ARRAY + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=repeated_int32_val + | +-default_value=[] + +-element_column_list=$array.[array1#4, array2#5] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.array1#4 AS array1 [INT64] +| +-$array.array2#5 AS array2 [INT32] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.array1#4, $array.array2#5] + +-input_scan= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, 
TestTable.KitchenSink#3, $array.array1#4, $array.array2#5] + +-expr_list= + | +-array1#4 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#19) + | | +-field_idx=0 + | +-array2#5 := + | +-GetStructField + | +-type=INT32 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#19) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$with_expr_element#19] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#6 := Literal(type=ARRAY, value=[1, 2, 3]) + | | +-arr0_len#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | +-arr1#8 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=repeated_int32_val + | | | +-default_value=[] + | | +-arr1_len#9 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | +-mode#10 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | 
+-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#11 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#10) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#12 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#10) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#12) + | 
+-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#18] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#18 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#13) + | | +-ColumnRef(type=INT32, column=$array.arr1#15) + | | +-ColumnRef(type=INT64, column=$full_join.offset#17) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | | | +-expr_list= + | | | | +-offset#17 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#16) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#13, $array_offset.offset#14] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#13] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#14) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#15, $array_offset.offset#16] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#15] + | | | | +-array_offset_column= + | | | | 
+-ColumnHolder(column=$array_offset.offset#16) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#16) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#17) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#12, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#17) + +-element_column_list=[$array.$with_expr_element#19] +== + +[language_features=V_1_4_MULTIWAY_UNNEST,V_1_3_UNNEST_AND_FLATTEN_ARRAYS,V_1_4_WITH_EXPRESSION] +# 2 arguments: path expressions of implicit flatten and array element access +# with explicit alias. +SELECT * +FROM TestTable, UNNEST( + {{TestTable.KitchenSink.nested_repeated_value.nested_int64|FLATTEN(TestTable.KitchenSink.nested_repeated_value.nested_int64)}} AS array1, + TestTable.KitchenSink.nested_repeated_value[OFFSET(0)].nested_repeated_int64 AS array2 +) +-- +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.array1#4 AS array1 [INT64] +| +-$array.array2#5 AS array2 [INT64] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.array1#4, $array.array2#5] + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.array1#4, $array.array2#5] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-Flatten + | | +-type=ARRAY + | | +-expr= + | | | +-GetProtoField + | | | +-type=ARRAY> + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | 
+-field_descriptor=nested_repeated_value + | | | +-default_value=[] + | | +-get_field_list= + | | +-GetProtoField + | | +-type=INT64 + | | +-expr= + | | | +-FlattenedArg(type=PROTO) + | | +-field_descriptor=nested_int64 + | | +-default_value=88 + | +-GetProtoField + | +-type=ARRAY + | +-expr= + | | +-FunctionCall(ZetaSQL:$array_at_offset(ARRAY>, INT64) -> PROTO) + | | +-GetProtoField + | | | +-type=ARRAY> + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=nested_repeated_value + | | | +-default_value=[] + | | +-Literal(type=INT64, value=0) + | +-field_descriptor=nested_repeated_int64 + | +-default_value=[] + +-element_column_list=$array.[array1#4, array2#5] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.array1#4 AS array1 [INT64] +| +-$array.array2#5 AS array2 [INT64] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.array1#4, $array.array2#5] + +-input_scan= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.array1#4, $array.array2#5] + +-expr_list= + | +-array1#4 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#23) + | | +-field_idx=0 + | +-array2#5 := + | +-GetStructField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#23) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$with_expr_element#23] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | 
+-assignment_list= + | | +-arr0#10 := + | | | +-WithExpr + | | | +-type=ARRAY + | | | +-assignment_list= + | | | | +-injected#6 := + | | | | +-GetProtoField + | | | | +-type=ARRAY> + | | | | +-expr= + | | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | | +-field_descriptor=nested_repeated_value + | | | | +-default_value=[] + | | | +-expr= + | | | +-FunctionCall(ZetaSQL:if(BOOL, ARRAY, ARRAY) -> ARRAY) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) + | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#6) + | | | +-Literal(type=ARRAY, value=NULL) + | | | +-SubqueryExpr + | | | +-type=ARRAY + | | | +-subquery_type=ARRAY + | | | +-parameter_list= + | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#6) + | | | +-subquery= + | | | +-OrderByScan + | | | +-column_list=[$flatten.injected#9] + | | | +-is_ordered=TRUE + | | | +-input_scan= + | | | | +-ProjectScan + | | | | +-column_list=[$flatten.injected#7, $offset.injected#8, $flatten.injected#9] + | | | | +-expr_list= + | | | | | +-injected#9 := + | | | | | +-GetProtoField + | | | | | +-type=INT64 + | | | | | +-expr= + | | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#7) + | | | | | +-field_descriptor=nested_int64 + | | | | | +-default_value=88 + | | | | +-input_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$flatten.injected#7, $offset.injected#8] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#6, is_correlated=TRUE) + | | | | +-element_column_list=[$flatten.injected#7] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$offset.injected#8) + | | | +-order_by_item_list= + | | | +-OrderByItem + | | | +-column_ref= + | | | +-ColumnRef(type=INT64, column=$offset.injected#8) + | | +-arr0_len#11 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#10) + | | | 
+-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#10) + | | +-arr1#12 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-FunctionCall(ZetaSQL:$array_at_offset(ARRAY>, INT64) -> PROTO) + | | | | +-GetProtoField + | | | | | +-type=ARRAY> + | | | | | +-expr= + | | | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | | | +-field_descriptor=nested_repeated_value + | | | | | +-default_value=[] + | | | | +-Literal(type=INT64, value=0) + | | | +-field_descriptor=nested_repeated_int64 + | | | +-default_value=[] + | | +-arr1_len#13 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#12) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#12) + | | +-mode#14 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#15 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#14) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#11) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#13) + | | | | 
+-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#11) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#13) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#16 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#14) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#11) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#13) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#11) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#13) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#10) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#12) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#16) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#22] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#22 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#17) + | | +-ColumnRef(type=INT64, column=$array.arr1#19) + | | +-ColumnRef(type=INT64, column=$full_join.offset#21) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#17, $array_offset.offset#18, $array.arr1#19, $array_offset.offset#20, $full_join.offset#21] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#17, $array_offset.offset#18, $array.arr1#19, $array_offset.offset#20, $full_join.offset#21] + | | 
+-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#17, $array_offset.offset#18, $array.arr1#19, $array_offset.offset#20, $full_join.offset#21] + | | | +-expr_list= + | | | | +-offset#21 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#18) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#20) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#17, $array_offset.offset#18, $array.arr1#19, $array_offset.offset#20] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#17, $array_offset.offset#18] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#10, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#17] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#18) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#19, $array_offset.offset#20] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#12, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#19] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#20) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#18) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#20) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#21) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#16, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#21) + +-element_column_list=[$array.$with_expr_element#23] +== + +[show_unparsed] +# 2 arguments: path expressions with inferred alias. 
+SELECT * +FROM TestTable, UNNEST( + TestTable.KitchenSink.repeated_int32_val, + TestTable.KitchenSink.repeated_string_val +) +-- +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.repeated_int32_val#4 AS repeated_int32_val [INT32] +| +-$array.repeated_string_val#5 AS repeated_string_val [STRING] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.repeated_int32_val#4, $array.repeated_string_val#5] + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.repeated_int32_val#4, $array.repeated_string_val#5] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-GetProtoField + | | +-type=ARRAY + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-field_descriptor=repeated_int32_val + | | +-default_value=[] + | +-GetProtoField + | +-type=ARRAY + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=repeated_string_val + | +-default_value=[] + +-element_column_list=$array.[repeated_int32_val#4, repeated_string_val#5] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[UNPARSED_SQL] +SELECT + testtable_4.a_1 AS key, + testtable_4.a_2 AS TestEnum, + testtable_4.a_3 AS KitchenSink, + a_5 AS repeated_int32_val, + a_6 AS repeated_string_val +FROM + ( + SELECT + TestTable.key AS a_1, + TestTable.TestEnum AS a_2, + TestTable.KitchenSink AS a_3 + FROM + TestTable + ) AS testtable_4 + JOIN + UNNEST(testtable_4.a_3.repeated_int32_val AS a_5, testtable_4.a_3.repeated_string_val AS a_6, mode => CAST("PAD" AS ARRAY_ZIP_MODE)); + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum 
[ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.repeated_int32_val#4 AS repeated_int32_val [INT32] +| +-$array.repeated_string_val#5 AS repeated_string_val [STRING] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.repeated_int32_val#4, $array.repeated_string_val#5] + +-input_scan= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.repeated_int32_val#4, $array.repeated_string_val#5] + +-expr_list= + | +-repeated_int32_val#4 := + | | +-GetStructField + | | +-type=INT32 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#19) + | | +-field_idx=0 + | +-repeated_string_val#5 := + | +-GetStructField + | +-type=STRING + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#19) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$with_expr_element#19] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#6 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=repeated_int32_val + | | | +-default_value=[] + | | +-arr0_len#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | +-arr1#8 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | 
+-field_descriptor=repeated_string_val + | | | +-default_value=[] + | | +-arr1_len#9 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | +-mode#10 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#11 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#10) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#12 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#10) + | | | +-Literal(type=ENUM, 
value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#12) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#18] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#18 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT32, column=$array.arr0#13) + | | +-ColumnRef(type=STRING, column=$array.arr1#15) + | | +-ColumnRef(type=INT64, column=$full_join.offset#17) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | | | +-expr_list= + | | | | +-offset#17 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#16) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | 
+-ArrayScan + | | | | +-column_list=[$array.arr0#13, $array_offset.offset#14] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#13] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#14) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#15, $array_offset.offset#16] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#15] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#16) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#16) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#17) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#12, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#17) + +-element_column_list=[$array.$with_expr_element#19] +[UNPARSED_SQL] +SELECT + projectscan_26.a_1 AS key, + projectscan_26.a_2 AS TestEnum, + projectscan_26.a_3 AS KitchenSink, + projectscan_26.a_24 AS repeated_int32_val, + projectscan_26.a_25 AS repeated_string_val +FROM + ( + SELECT + testtable_4.a_1 AS a_1, + testtable_4.a_2 AS a_2, + testtable_4.a_3 AS a_3, + a_23.arr0 AS a_24, + a_23.arr1 AS a_25 + FROM + ( + SELECT + TestTable.key AS a_1, + TestTable.TestEnum AS a_2, + TestTable.KitchenSink AS a_3 + FROM + TestTable + ) AS testtable_4 + JOIN + UNNEST(WITH(a_5 AS testtable_4.a_3.repeated_int32_val, a_19 AS `IF`((a_5) IS NULL, 0, ARRAY_LENGTH(a_5)), a_9 AS testtable_4.a_3.repeated_string_val, a_20 AS `IF`((a_9) IS NULL, + 0, ARRAY_LENGTH(a_9)), 
a_21 AS `IF`(CAST("PAD" AS ARRAY_ZIP_MODE) IS NULL, ERROR("UNNEST does not allow NULL mode argument"), + CAST("PAD" AS ARRAY_ZIP_MODE)), a_22 AS `IF`((a_21 = CAST("STRICT" AS ARRAY_ZIP_MODE)) AND ((LEAST(a_19, + a_20)) != (GREATEST(a_19, a_20))), ERROR("Unnested arrays under STRICT mode must have equal lengths"), + CAST(NULL AS INT64)), a_15 AS `IF`(a_21 = CAST("TRUNCATE" AS ARRAY_ZIP_MODE), LEAST(a_19, a_20), GREATEST(a_19, + a_20)), ARRAY( + SELECT + STRUCT< arr0 INT32, arr1 STRING, offset INT64 > (orderbyscan_17.a_6, orderbyscan_17.a_10, orderbyscan_17.a_13) AS a_18 + FROM + ( + SELECT + filterscan_16.a_6 AS a_6, + filterscan_16.a_7 AS a_7, + filterscan_16.a_10 AS a_10, + filterscan_16.a_11 AS a_11, + filterscan_16.a_13 AS a_13 + FROM + ( + SELECT + projectscan_14.a_6 AS a_6, + projectscan_14.a_7 AS a_7, + projectscan_14.a_10 AS a_10, + projectscan_14.a_11 AS a_11, + projectscan_14.a_13 AS a_13 + FROM + ( + SELECT + arrayscan_8.a_6 AS a_6, + arrayscan_8.a_7 AS a_7, + arrayscan_12.a_10 AS a_10, + arrayscan_12.a_11 AS a_11, + COALESCE(arrayscan_8.a_7, arrayscan_12.a_11) AS a_13 + FROM + ( + SELECT + a_6 AS a_6, + a_7 AS a_7 + FROM + UNNEST(a_5 AS a_6) WITH OFFSET AS a_7 + ) AS arrayscan_8 + FULL JOIN + ( + SELECT + a_10 AS a_10, + a_11 AS a_11 + FROM + UNNEST(a_9 AS a_10) WITH OFFSET AS a_11 + ) AS arrayscan_12 + ON (arrayscan_8.a_7) = (arrayscan_12.a_11) + ) AS projectscan_14 + WHERE + (projectscan_14.a_13) < a_15 + ) AS filterscan_16 + ORDER BY filterscan_16.a_13 + ) AS orderbyscan_17 + )) AS a_23) + ) AS projectscan_26; +== + +[show_unparsed] +# 2 arguments: path expressions with inferred alias. 
+SELECT * +FROM TestTable, UNNEST( + TestTable.KitchenSink.repeated_int32_val, + TestTable.KitchenSink.repeated_int32_val{{| AS array2}} +) +-- +ALTERNATION GROUP: +-- +ERROR: Duplicate value table name `repeated_int32_val` found in UNNEST is not allowed [at 5:25] + TestTable.KitchenSink.repeated_int32_val + ^ +-- +ALTERNATION GROUP: AS array2 +-- +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.repeated_int32_val#4 AS repeated_int32_val [INT32] +| +-$array.array2#5 AS array2 [INT32] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.repeated_int32_val#4, $array.array2#5] + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.repeated_int32_val#4, $array.array2#5] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-GetProtoField + | | +-type=ARRAY + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-field_descriptor=repeated_int32_val + | | +-default_value=[] + | +-GetProtoField + | +-type=ARRAY + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=repeated_int32_val + | +-default_value=[] + +-element_column_list=$array.[repeated_int32_val#4, array2#5] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[UNPARSED_SQL] +SELECT + testtable_4.a_1 AS key, + testtable_4.a_2 AS TestEnum, + testtable_4.a_3 AS KitchenSink, + a_5 AS repeated_int32_val, + a_6 AS array2 +FROM + ( + SELECT + TestTable.key AS a_1, + TestTable.TestEnum AS a_2, + TestTable.KitchenSink AS a_3 + FROM + TestTable + ) AS testtable_4 + JOIN + UNNEST(testtable_4.a_3.repeated_int32_val AS a_5, testtable_4.a_3.repeated_int32_val AS a_6, mode => CAST("PAD" AS 
ARRAY_ZIP_MODE)); + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.repeated_int32_val#4 AS repeated_int32_val [INT32] +| +-$array.array2#5 AS array2 [INT32] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.repeated_int32_val#4, $array.array2#5] + +-input_scan= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.repeated_int32_val#4, $array.array2#5] + +-expr_list= + | +-repeated_int32_val#4 := + | | +-GetStructField + | | +-type=INT32 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#19) + | | +-field_idx=0 + | +-array2#5 := + | +-GetStructField + | +-type=INT32 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#19) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$with_expr_element#19] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#6 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=repeated_int32_val + | | | +-default_value=[] + | | +-arr0_len#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | +-arr1#8 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | 
+-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=repeated_int32_val + | | | +-default_value=[] + | | +-arr1_len#9 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | +-mode#10 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#11 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#10) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#12 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | 
+-ColumnRef(type=ENUM, column=$with_expr.mode#10) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#12) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#18] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#18 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT32, column=$array.arr0#13) + | | +-ColumnRef(type=INT32, column=$array.arr1#15) + | | +-ColumnRef(type=INT64, column=$full_join.offset#17) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | | | +-expr_list= + | | | | +-offset#17 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#16) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, 
$array_offset.offset#16] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#13, $array_offset.offset#14] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#13] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#14) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#15, $array_offset.offset#16] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#15] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#16) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#16) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#17) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#12, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#17) + +-element_column_list=[$array.$with_expr_element#19] +[UNPARSED_SQL] +SELECT + projectscan_26.a_1 AS key, + projectscan_26.a_2 AS TestEnum, + projectscan_26.a_3 AS KitchenSink, + projectscan_26.a_24 AS repeated_int32_val, + projectscan_26.a_25 AS array2 +FROM + ( + SELECT + testtable_4.a_1 AS a_1, + testtable_4.a_2 AS a_2, + testtable_4.a_3 AS a_3, + a_23.arr0 AS a_24, + a_23.arr1 AS a_25 + FROM + ( + SELECT + TestTable.key AS a_1, + TestTable.TestEnum AS a_2, + TestTable.KitchenSink AS a_3 + FROM + TestTable + ) AS testtable_4 + JOIN + UNNEST(WITH(a_5 AS testtable_4.a_3.repeated_int32_val, a_19 AS `IF`((a_5) IS NULL, 0, ARRAY_LENGTH(a_5)), a_9 AS 
testtable_4.a_3.repeated_int32_val, a_20 AS `IF`((a_9) IS NULL, + 0, ARRAY_LENGTH(a_9)), a_21 AS `IF`(CAST("PAD" AS ARRAY_ZIP_MODE) IS NULL, ERROR("UNNEST does not allow NULL mode argument"), + CAST("PAD" AS ARRAY_ZIP_MODE)), a_22 AS `IF`((a_21 = CAST("STRICT" AS ARRAY_ZIP_MODE)) AND ((LEAST(a_19, + a_20)) != (GREATEST(a_19, a_20))), ERROR("Unnested arrays under STRICT mode must have equal lengths"), + CAST(NULL AS INT64)), a_15 AS `IF`(a_21 = CAST("TRUNCATE" AS ARRAY_ZIP_MODE), LEAST(a_19, a_20), GREATEST(a_19, + a_20)), ARRAY( + SELECT + STRUCT< arr0 INT32, arr1 INT32, offset INT64 > (orderbyscan_17.a_6, orderbyscan_17.a_10, orderbyscan_17.a_13) AS a_18 + FROM + ( + SELECT + filterscan_16.a_6 AS a_6, + filterscan_16.a_7 AS a_7, + filterscan_16.a_10 AS a_10, + filterscan_16.a_11 AS a_11, + filterscan_16.a_13 AS a_13 + FROM + ( + SELECT + projectscan_14.a_6 AS a_6, + projectscan_14.a_7 AS a_7, + projectscan_14.a_10 AS a_10, + projectscan_14.a_11 AS a_11, + projectscan_14.a_13 AS a_13 + FROM + ( + SELECT + arrayscan_8.a_6 AS a_6, + arrayscan_8.a_7 AS a_7, + arrayscan_12.a_10 AS a_10, + arrayscan_12.a_11 AS a_11, + COALESCE(arrayscan_8.a_7, arrayscan_12.a_11) AS a_13 + FROM + ( + SELECT + a_6 AS a_6, + a_7 AS a_7 + FROM + UNNEST(a_5 AS a_6) WITH OFFSET AS a_7 + ) AS arrayscan_8 + FULL JOIN + ( + SELECT + a_10 AS a_10, + a_11 AS a_11 + FROM + UNNEST(a_9 AS a_10) WITH OFFSET AS a_11 + ) AS arrayscan_12 + ON (arrayscan_8.a_7) = (arrayscan_12.a_11) + ) AS projectscan_14 + WHERE + (projectscan_14.a_13) < a_15 + ) AS filterscan_16 + ORDER BY filterscan_16.a_13 + ) AS orderbyscan_17 + )) AS a_23) + ) AS projectscan_26; +== + +[show_unparsed] +# 2 arguments: path expressions with explicit alias. 
+SELECT * +FROM TestTable, UNNEST( + TestTable.KitchenSink.repeated_int32_val AS array1, + TestTable.KitchenSink.repeated_string_val AS array2 +) +-- +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.array1#4 AS array1 [INT32] +| +-$array.array2#5 AS array2 [STRING] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.array1#4, $array.array2#5] + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.array1#4, $array.array2#5] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-GetProtoField + | | +-type=ARRAY + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-field_descriptor=repeated_int32_val + | | +-default_value=[] + | +-GetProtoField + | +-type=ARRAY + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=repeated_string_val + | +-default_value=[] + +-element_column_list=$array.[array1#4, array2#5] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[UNPARSED_SQL] +SELECT + testtable_4.a_1 AS key, + testtable_4.a_2 AS TestEnum, + testtable_4.a_3 AS KitchenSink, + a_5 AS array1, + a_6 AS array2 +FROM + ( + SELECT + TestTable.key AS a_1, + TestTable.TestEnum AS a_2, + TestTable.KitchenSink AS a_3 + FROM + TestTable + ) AS testtable_4 + JOIN + UNNEST(testtable_4.a_3.repeated_int32_val AS a_5, testtable_4.a_3.repeated_string_val AS a_6, mode => CAST("PAD" AS ARRAY_ZIP_MODE)); + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.array1#4 AS array1 [INT32] +| +-$array.array2#5 AS array2 
[STRING] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.array1#4, $array.array2#5] + +-input_scan= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.array1#4, $array.array2#5] + +-expr_list= + | +-array1#4 := + | | +-GetStructField + | | +-type=INT32 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#19) + | | +-field_idx=0 + | +-array2#5 := + | +-GetStructField + | +-type=STRING + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#19) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$with_expr_element#19] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#6 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=repeated_int32_val + | | | +-default_value=[] + | | +-arr0_len#7 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | +-arr1#8 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=repeated_string_val + | | | +-default_value=[] + | | +-arr1_len#9 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | | 
+-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | +-mode#10 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#11 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#10) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#12 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#10) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | 
+-ColumnRef(type=INT64, column=$with_expr.arr0_len#7) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#9) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#12) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#18] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#18 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT32, column=$array.arr0#13) + | | +-ColumnRef(type=STRING, column=$array.arr1#15) + | | +-ColumnRef(type=INT64, column=$full_join.offset#17) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16, $full_join.offset#17] + | | | +-expr_list= + | | | | +-offset#17 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#16) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#13, $array_offset.offset#14, $array.arr1#15, $array_offset.offset#16] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#13, $array_offset.offset#14] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#6, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#13] + | | | | +-array_offset_column= + | | | | 
+-ColumnHolder(column=$array_offset.offset#14) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#15, $array_offset.offset#16] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#8, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#15] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#16) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#14) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#16) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#17) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#12, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#17) + +-element_column_list=[$array.$with_expr_element#19] +[UNPARSED_SQL] +SELECT + projectscan_26.a_1 AS key, + projectscan_26.a_2 AS TestEnum, + projectscan_26.a_3 AS KitchenSink, + projectscan_26.a_24 AS array1, + projectscan_26.a_25 AS array2 +FROM + ( + SELECT + testtable_4.a_1 AS a_1, + testtable_4.a_2 AS a_2, + testtable_4.a_3 AS a_3, + a_23.arr0 AS a_24, + a_23.arr1 AS a_25 + FROM + ( + SELECT + TestTable.key AS a_1, + TestTable.TestEnum AS a_2, + TestTable.KitchenSink AS a_3 + FROM + TestTable + ) AS testtable_4 + JOIN + UNNEST(WITH(a_5 AS testtable_4.a_3.repeated_int32_val, a_19 AS `IF`((a_5) IS NULL, 0, ARRAY_LENGTH(a_5)), a_9 AS testtable_4.a_3.repeated_string_val, a_20 AS `IF`((a_9) IS NULL, + 0, ARRAY_LENGTH(a_9)), a_21 AS `IF`(CAST("PAD" AS ARRAY_ZIP_MODE) IS NULL, ERROR("UNNEST does not allow NULL mode argument"), + CAST("PAD" AS ARRAY_ZIP_MODE)), a_22 AS `IF`((a_21 = CAST("STRICT" AS ARRAY_ZIP_MODE)) AND ((LEAST(a_19, + a_20)) != (GREATEST(a_19, a_20))), ERROR("Unnested arrays under STRICT mode must have equal 
lengths"), + CAST(NULL AS INT64)), a_15 AS `IF`(a_21 = CAST("TRUNCATE" AS ARRAY_ZIP_MODE), LEAST(a_19, a_20), GREATEST(a_19, + a_20)), ARRAY( + SELECT + STRUCT< arr0 INT32, arr1 STRING, offset INT64 > (orderbyscan_17.a_6, orderbyscan_17.a_10, orderbyscan_17.a_13) AS a_18 + FROM + ( + SELECT + filterscan_16.a_6 AS a_6, + filterscan_16.a_7 AS a_7, + filterscan_16.a_10 AS a_10, + filterscan_16.a_11 AS a_11, + filterscan_16.a_13 AS a_13 + FROM + ( + SELECT + projectscan_14.a_6 AS a_6, + projectscan_14.a_7 AS a_7, + projectscan_14.a_10 AS a_10, + projectscan_14.a_11 AS a_11, + projectscan_14.a_13 AS a_13 + FROM + ( + SELECT + arrayscan_8.a_6 AS a_6, + arrayscan_8.a_7 AS a_7, + arrayscan_12.a_10 AS a_10, + arrayscan_12.a_11 AS a_11, + COALESCE(arrayscan_8.a_7, arrayscan_12.a_11) AS a_13 + FROM + ( + SELECT + a_6 AS a_6, + a_7 AS a_7 + FROM + UNNEST(a_5 AS a_6) WITH OFFSET AS a_7 + ) AS arrayscan_8 + FULL JOIN + ( + SELECT + a_10 AS a_10, + a_11 AS a_11 + FROM + UNNEST(a_9 AS a_10) WITH OFFSET AS a_11 + ) AS arrayscan_12 + ON (arrayscan_8.a_7) = (arrayscan_12.a_11) + ) AS projectscan_14 + WHERE + (projectscan_14.a_13) < a_15 + ) AS filterscan_16 + ORDER BY filterscan_16.a_13 + ) AS orderbyscan_17 + )) AS a_23) + ) AS projectscan_26; +== + +# 2 arguments: path expressions reference columns from different lhs tables. 
+SELECT * +FROM TestTable, ComplexTypes, UNNEST( + TestTable.KitchenSink.repeated_int32_val AS array1, + ComplexTypes.Int32Array AS array2 +) +-- +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-ComplexTypes.key#4 AS key [INT32] +| +-ComplexTypes.TestEnum#5 AS TestEnum [ENUM] +| +-ComplexTypes.KitchenSink#6 AS KitchenSink [PROTO] +| +-ComplexTypes.Int32Array#7 AS Int32Array [ARRAY] +| +-ComplexTypes.TestStruct#8 AS TestStruct [STRUCT>] +| +-ComplexTypes.TestProto#9 AS TestProto [PROTO] +| +-$array.array1#10 AS array1 [INT32] +| +-$array.array2#11 AS array2 [INT32] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9, $array.array1#10, $array.array2#11] + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9, $array.array1#10, $array.array2#11] + +-input_scan= + | +-JoinScan + | +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9] + | +-left_scan= + | | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + | +-right_scan= + | +-TableScan(column_list=ComplexTypes.[key#4, TestEnum#5, KitchenSink#6, Int32Array#7, TestStruct#8, TestProto#9], table=ComplexTypes, column_index_list=[0, 1, 2, 3, 4, 5]) + +-array_expr_list= + | +-GetProtoField + | | +-type=ARRAY + | | +-expr= + | | | +-ColumnRef(type=PROTO, 
column=TestTable.KitchenSink#3) + | | +-field_descriptor=repeated_int32_val + | | +-default_value=[] + | +-ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7) + +-element_column_list=$array.[array1#10, array2#11] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-ComplexTypes.key#4 AS key [INT32] +| +-ComplexTypes.TestEnum#5 AS TestEnum [ENUM] +| +-ComplexTypes.KitchenSink#6 AS KitchenSink [PROTO] +| +-ComplexTypes.Int32Array#7 AS Int32Array [ARRAY] +| +-ComplexTypes.TestStruct#8 AS TestStruct [STRUCT>] +| +-ComplexTypes.TestProto#9 AS TestProto [PROTO] +| +-$array.array1#10 AS array1 [INT32] +| +-$array.array2#11 AS array2 [INT32] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9, $array.array1#10, $array.array2#11] + +-input_scan= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9, $array.array1#10, $array.array2#11] + +-expr_list= + | +-array1#10 := + | | +-GetStructField + | | +-type=INT32 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#25) + | | +-field_idx=0 + | +-array2#11 := + | +-GetStructField + | +-type=INT32 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#25) + | +-field_idx=1 + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, 
ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9, $array.$with_expr_element#25] + +-input_scan= + | +-JoinScan + | +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9] + | +-left_scan= + | | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + | +-right_scan= + | +-TableScan(column_list=ComplexTypes.[key#4, TestEnum#5, KitchenSink#6, Int32Array#7, TestStruct#8, TestProto#9], table=ComplexTypes, column_index_list=[0, 1, 2, 3, 4, 5]) + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#12 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=repeated_int32_val + | | | +-default_value=[] + | | +-arr0_len#13 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#12) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#12) + | | +-arr1#14 := ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7) + | | +-arr1_len#15 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#14) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#14) + | | +-mode#16 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | 
+-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#17 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#16) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#13) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#15) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#13) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#15) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#18 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#16) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#13) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#15) + | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#13) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#15) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#12) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#14) + 
| | +-ColumnRef(type=INT64, column=$with_expr.result_len#18) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#24] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#24 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT32, column=$array.arr0#19) + | | +-ColumnRef(type=INT32, column=$array.arr1#21) + | | +-ColumnRef(type=INT64, column=$full_join.offset#23) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#19, $array_offset.offset#20, $array.arr1#21, $array_offset.offset#22, $full_join.offset#23] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#19, $array_offset.offset#20, $array.arr1#21, $array_offset.offset#22, $full_join.offset#23] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#19, $array_offset.offset#20, $array.arr1#21, $array_offset.offset#22, $full_join.offset#23] + | | | +-expr_list= + | | | | +-offset#23 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#20) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#22) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#19, $array_offset.offset#20, $array.arr1#21, $array_offset.offset#22] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr0#19, $array_offset.offset#20] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#12, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr0#19] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#20) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr1#21, $array_offset.offset#22] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#14, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr1#21] + | | | | 
+-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#22) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#20) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#22) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#23) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#18, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#23) + +-element_column_list=[$array.$with_expr_element#25] +== + +[show_unparsed] +# 2 arguments: correlated path expressions reference columns from different +# out-of-scope tables. +SELECT ( + SELECT AS STRUCT * + FROM UNNEST( + TestTable.KitchenSink.repeated_int32_val AS array1, + ComplexTypes.Int32Array AS array2, + mode => 'PAD' + ) +) AS col1 +FROM TestTable, ComplexTypes +-- +QueryStmt ++-output_column_list= +| +-$query.col1#13 AS col1 [STRUCT] ++-query= + +-ProjectScan + +-column_list=[$query.col1#13] + +-expr_list= + | +-col1#13 := + | +-SubqueryExpr + | +-type=STRUCT + | +-subquery_type=SCALAR + | +-parameter_list= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#12] + | +-expr_list= + | | +-$struct#12 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT32, column=$array.array1#10) + | | +-ColumnRef(type=INT32, column=$array.array2#11) + | +-input_scan= + | +-ProjectScan + | +-column_list=$array.[array1#10, array2#11] + | +-input_scan= + | +-ArrayScan + | +-column_list=$array.[array1#10, array2#11] + | +-array_expr_list= + | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3, 
is_correlated=TRUE) + | | | +-field_descriptor=repeated_int32_val + | | | +-default_value=[] + | | +-ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7, is_correlated=TRUE) + | +-element_column_list=$array.[array1#10, array2#11] + | +-array_zip_mode= + | +-Literal(type=ENUM, value=PAD) + +-input_scan= + +-JoinScan + +-column_list=[TestTable.KitchenSink#3, ComplexTypes.Int32Array#7] + +-left_scan= + | +-TableScan(column_list=[TestTable.KitchenSink#3], table=TestTable, column_index_list=[2]) + +-right_scan= + +-TableScan(column_list=[ComplexTypes.Int32Array#7], table=ComplexTypes, column_index_list=[3]) + +[UNPARSED_SQL] +SELECT + ( + SELECT + STRUCT< array1 INT32, array2 INT32 > (projectscan_8.a_6, projectscan_8.a_7) AS a_9 + FROM + ( + SELECT + a_6 AS a_6, + a_7 AS a_7 + FROM + UNNEST(testtable_2.a_1.repeated_int32_val AS a_6, complextypes_4.a_3 AS a_7, mode => CAST("PAD" AS ARRAY_ZIP_MODE)) + ) AS projectscan_8 + ) AS col1 +FROM + ( + SELECT + TestTable.KitchenSink AS a_1 + FROM + TestTable + ) AS testtable_2 + CROSS JOIN + ( + SELECT + ComplexTypes.Int32Array AS a_3 + FROM + ComplexTypes + ) AS complextypes_4; + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.col1#13 AS col1 [STRUCT] ++-query= + +-ProjectScan + +-column_list=[$query.col1#13] + +-expr_list= + | +-col1#13 := + | +-SubqueryExpr + | +-type=STRUCT + | +-subquery_type=SCALAR + | +-parameter_list= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#12] + | +-expr_list= + | | +-$struct#12 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT32, column=$array.array1#10) + | | +-ColumnRef(type=INT32, column=$array.array2#11) + | +-input_scan= + | +-ProjectScan + | +-column_list=$array.[array1#10, array2#11] + | +-input_scan= + | +-ProjectScan + | +-column_list=$array.[array1#10, array2#11] + | +-expr_list= + 
| | +-array1#10 := + | | | +-GetStructField + | | | +-type=INT32 + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#27) + | | | +-field_idx=0 + | | +-array2#11 := + | | +-GetStructField + | | +-type=INT32 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#27) + | | +-field_idx=1 + | +-input_scan= + | +-ArrayScan + | +-column_list=[$array.$with_expr_element#27] + | +-array_expr_list= + | | +-WithExpr + | | +-type=ARRAY> + | | +-assignment_list= + | | | +-arr0#14 := + | | | | +-GetProtoField + | | | | +-type=ARRAY + | | | | +-expr= + | | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3, is_correlated=TRUE) + | | | | +-field_descriptor=repeated_int32_val + | | | | +-default_value=[] + | | | +-arr0_len#15 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#14) + | | | | +-Literal(type=INT64, value=0) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#14) + | | | +-arr1#16 := ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7, is_correlated=TRUE) + | | | +-arr1_len#17 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#16) + | | | | +-Literal(type=INT64, value=0) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#16) + | | | +-mode#18 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | | +-Literal(type=ENUM, value=PAD) + | | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | | +-Literal(type=ENUM, value=PAD) + | | | 
+-strict_check#19 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#18) + | | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#15) + | | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#17) + | | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#15) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#17) + | | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | | +-Literal(type=INT64, value=NULL) + | | | +-result_len#20 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#18) + | | | | +-Literal(type=ENUM, value=TRUNCATE) + | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#15) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#17) + | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#15) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#17) + | | +-expr= + | | +-SubqueryExpr + | | +-type=ARRAY> + | | +-subquery_type=ARRAY + | | +-parameter_list= + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#14) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#16) + | | | +-ColumnRef(type=INT64, column=$with_expr.result_len#20) + | | +-subquery= + | | +-ProjectScan + | | 
+-column_list=[$make_struct.$struct#26] + | | +-is_ordered=TRUE + | | +-expr_list= + | | | +-$struct#26 := + | | | +-MakeStruct + | | | +-type=STRUCT + | | | +-field_list= + | | | +-ColumnRef(type=INT32, column=$array.arr0#21) + | | | +-ColumnRef(type=INT32, column=$array.arr1#23) + | | | +-ColumnRef(type=INT64, column=$full_join.offset#25) + | | +-input_scan= + | | +-OrderByScan + | | +-column_list=[$array.arr0#21, $array_offset.offset#22, $array.arr1#23, $array_offset.offset#24, $full_join.offset#25] + | | +-is_ordered=TRUE + | | +-input_scan= + | | | +-FilterScan + | | | +-column_list=[$array.arr0#21, $array_offset.offset#22, $array.arr1#23, $array_offset.offset#24, $full_join.offset#25] + | | | +-input_scan= + | | | | +-ProjectScan + | | | | +-column_list=[$array.arr0#21, $array_offset.offset#22, $array.arr1#23, $array_offset.offset#24, $full_join.offset#25] + | | | | +-expr_list= + | | | | | +-offset#25 := + | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#22) + | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#24) + | | | | +-input_scan= + | | | | +-JoinScan + | | | | +-column_list=[$array.arr0#21, $array_offset.offset#22, $array.arr1#23, $array_offset.offset#24] + | | | | +-join_type=FULL + | | | | +-left_scan= + | | | | | +-ArrayScan + | | | | | +-column_list=[$array.arr0#21, $array_offset.offset#22] + | | | | | +-array_expr_list= + | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#14, is_correlated=TRUE) + | | | | | +-element_column_list=[$array.arr0#21] + | | | | | +-array_offset_column= + | | | | | +-ColumnHolder(column=$array_offset.offset#22) + | | | | +-right_scan= + | | | | | +-ArrayScan + | | | | | +-column_list=[$array.arr1#23, $array_offset.offset#24] + | | | | | +-array_expr_list= + | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#16, is_correlated=TRUE) + | | | | | +-element_column_list=[$array.arr1#23] + | | | | | 
+-array_offset_column= + | | | | | +-ColumnHolder(column=$array_offset.offset#24) + | | | | +-join_expr= + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#22) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#24) + | | | +-filter_expr= + | | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$full_join.offset#25) + | | | +-ColumnRef(type=INT64, column=$with_expr.result_len#20, is_correlated=TRUE) + | | +-order_by_item_list= + | | +-OrderByItem + | | +-column_ref= + | | +-ColumnRef(type=INT64, column=$full_join.offset#25) + | +-element_column_list=[$array.$with_expr_element#27] + +-input_scan= + +-JoinScan + +-column_list=[TestTable.KitchenSink#3, ComplexTypes.Int32Array#7] + +-left_scan= + | +-TableScan(column_list=[TestTable.KitchenSink#3], table=TestTable, column_index_list=[2]) + +-right_scan= + +-TableScan(column_list=[ComplexTypes.Int32Array#7], table=ComplexTypes, column_index_list=[3]) +[UNPARSED_SQL] +SELECT + ( + SELECT + STRUCT< array1 INT32, array2 INT32 > (projectscan_28.a_25, projectscan_28.a_26) AS a_29 + FROM + ( + SELECT + projectscan_27.a_25 AS a_25, + projectscan_27.a_26 AS a_26 + FROM + ( + SELECT + a_24.arr0 AS a_25, + a_24.arr1 AS a_26 + FROM + UNNEST(WITH(a_6 AS testtable_2.a_1.repeated_int32_val, a_20 AS `IF`((a_6) IS NULL, 0, ARRAY_LENGTH(a_6)), a_10 AS complextypes_4.a_3, a_21 AS `IF`((a_10) IS NULL, + 0, ARRAY_LENGTH(a_10)), a_22 AS `IF`(CAST("PAD" AS ARRAY_ZIP_MODE) IS NULL, ERROR("UNNEST does not allow NULL mode argument"), + CAST("PAD" AS ARRAY_ZIP_MODE)), a_23 AS `IF`((a_22 = CAST("STRICT" AS ARRAY_ZIP_MODE)) AND ((LEAST(a_20, + a_21)) != (GREATEST(a_20, a_21))), ERROR("Unnested arrays under STRICT mode must have equal lengths"), + CAST(NULL AS INT64)), a_16 AS `IF`(a_22 = CAST("TRUNCATE" AS ARRAY_ZIP_MODE), LEAST(a_20, a_21), GREATEST(a_20, + a_21)), ARRAY( + SELECT + STRUCT< arr0 INT32, arr1 INT32, offset 
INT64 > (orderbyscan_18.a_7, orderbyscan_18.a_11, orderbyscan_18.a_14) AS a_19 + FROM + ( + SELECT + filterscan_17.a_7 AS a_7, + filterscan_17.a_8 AS a_8, + filterscan_17.a_11 AS a_11, + filterscan_17.a_12 AS a_12, + filterscan_17.a_14 AS a_14 + FROM + ( + SELECT + projectscan_15.a_7 AS a_7, + projectscan_15.a_8 AS a_8, + projectscan_15.a_11 AS a_11, + projectscan_15.a_12 AS a_12, + projectscan_15.a_14 AS a_14 + FROM + ( + SELECT + arrayscan_9.a_7 AS a_7, + arrayscan_9.a_8 AS a_8, + arrayscan_13.a_11 AS a_11, + arrayscan_13.a_12 AS a_12, + COALESCE(arrayscan_9.a_8, arrayscan_13.a_12) AS a_14 + FROM + ( + SELECT + a_7 AS a_7, + a_8 AS a_8 + FROM + UNNEST(a_6 AS a_7) WITH OFFSET AS a_8 + ) AS arrayscan_9 + FULL JOIN + ( + SELECT + a_11 AS a_11, + a_12 AS a_12 + FROM + UNNEST(a_10 AS a_11) WITH OFFSET AS a_12 + ) AS arrayscan_13 + ON (arrayscan_9.a_8) = (arrayscan_13.a_12) + ) AS projectscan_15 + WHERE + (projectscan_15.a_14) < a_16 + ) AS filterscan_17 + ORDER BY filterscan_17.a_14 + ) AS orderbyscan_18 + )) AS a_24) + ) AS projectscan_27 + ) AS projectscan_28 + ) AS col1 +FROM + ( + SELECT + TestTable.KitchenSink AS a_1 + FROM + TestTable + ) AS testtable_2 + CROSS JOIN + ( + SELECT + ComplexTypes.Int32Array AS a_3 + FROM + ComplexTypes + ) AS complextypes_4; +== + +[language_features=V_1_4_MULTIWAY_UNNEST,V_1_3_UNNEST_AND_FLATTEN_ARRAYS,V_1_4_WITH_EXPRESSION] +# 2 arguments: FLATTEN correlated path expressions reference columns from +# different out-of-scope tables. 
+SELECT ( + SELECT AS STRUCT * + FROM UNNEST( + FLATTEN(TestTable.KitchenSink.nested_repeated_value.nested_int64) AS array1, + ComplexTypes.Int32Array AS array2, + mode => 'PAD' + ) +) AS col1 +FROM TestTable, ComplexTypes +-- +QueryStmt ++-output_column_list= +| +-$query.col1#13 AS col1 [STRUCT] ++-query= + +-ProjectScan + +-column_list=[$query.col1#13] + +-expr_list= + | +-col1#13 := + | +-SubqueryExpr + | +-type=STRUCT + | +-subquery_type=SCALAR + | +-parameter_list= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#12] + | +-expr_list= + | | +-$struct#12 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.array1#10) + | | +-ColumnRef(type=INT32, column=$array.array2#11) + | +-input_scan= + | +-ProjectScan + | +-column_list=$array.[array1#10, array2#11] + | +-input_scan= + | +-ArrayScan + | +-column_list=$array.[array1#10, array2#11] + | +-array_expr_list= + | | +-Flatten + | | | +-type=ARRAY + | | | +-expr= + | | | | +-GetProtoField + | | | | +-type=ARRAY> + | | | | +-expr= + | | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3, is_correlated=TRUE) + | | | | +-field_descriptor=nested_repeated_value + | | | | +-default_value=[] + | | | +-get_field_list= + | | | +-GetProtoField + | | | +-type=INT64 + | | | +-expr= + | | | | +-FlattenedArg(type=PROTO) + | | | +-field_descriptor=nested_int64 + | | | +-default_value=88 + | | +-ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7, is_correlated=TRUE) + | +-element_column_list=$array.[array1#10, array2#11] + | +-array_zip_mode= + | +-Literal(type=ENUM, value=PAD) + +-input_scan= + +-JoinScan + +-column_list=[TestTable.KitchenSink#3, ComplexTypes.Int32Array#7] + +-left_scan= + | +-TableScan(column_list=[TestTable.KitchenSink#3], table=TestTable, column_index_list=[2]) + +-right_scan= + 
+-TableScan(column_list=[ComplexTypes.Int32Array#7], table=ComplexTypes, column_index_list=[3]) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.col1#13 AS col1 [STRUCT] ++-query= + +-ProjectScan + +-column_list=[$query.col1#13] + +-expr_list= + | +-col1#13 := + | +-SubqueryExpr + | +-type=STRUCT + | +-subquery_type=SCALAR + | +-parameter_list= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#12] + | +-expr_list= + | | +-$struct#12 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.array1#10) + | | +-ColumnRef(type=INT32, column=$array.array2#11) + | +-input_scan= + | +-ProjectScan + | +-column_list=$array.[array1#10, array2#11] + | +-input_scan= + | +-ProjectScan + | +-column_list=$array.[array1#10, array2#11] + | +-expr_list= + | | +-array1#10 := + | | | +-GetStructField + | | | +-type=INT64 + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#31) + | | | +-field_idx=0 + | | +-array2#11 := + | | +-GetStructField + | | +-type=INT32 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#31) + | | +-field_idx=1 + | +-input_scan= + | +-ArrayScan + | +-column_list=[$array.$with_expr_element#31] + | +-array_expr_list= + | | +-WithExpr + | | +-type=ARRAY> + | | +-assignment_list= + | | | +-arr0#18 := + | | | | +-WithExpr + | | | | +-type=ARRAY + | | | | +-assignment_list= + | | | | | +-injected#14 := + | | | | | +-GetProtoField + | | | | | +-type=ARRAY> + | | | | | +-expr= + | | | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3, is_correlated=TRUE) + | | | | | +-field_descriptor=nested_repeated_value + | | | | | +-default_value=[] + | | | | +-expr= + | | | | +-FunctionCall(ZetaSQL:if(BOOL, ARRAY, ARRAY) -> ARRAY) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) + | | | | | 
+-ColumnRef(type=ARRAY>, column=$flatten_input.injected#14) + | | | | +-Literal(type=ARRAY, value=NULL) + | | | | +-SubqueryExpr + | | | | +-type=ARRAY + | | | | +-subquery_type=ARRAY + | | | | +-parameter_list= + | | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#14) + | | | | +-subquery= + | | | | +-OrderByScan + | | | | +-column_list=[$flatten.injected#17] + | | | | +-is_ordered=TRUE + | | | | +-input_scan= + | | | | | +-ProjectScan + | | | | | +-column_list=[$flatten.injected#15, $offset.injected#16, $flatten.injected#17] + | | | | | +-expr_list= + | | | | | | +-injected#17 := + | | | | | | +-GetProtoField + | | | | | | +-type=INT64 + | | | | | | +-expr= + | | | | | | | +-ColumnRef(type=PROTO, column=$flatten.injected#15) + | | | | | | +-field_descriptor=nested_int64 + | | | | | | +-default_value=88 + | | | | | +-input_scan= + | | | | | +-ArrayScan + | | | | | +-column_list=[$flatten.injected#15, $offset.injected#16] + | | | | | +-array_expr_list= + | | | | | | +-ColumnRef(type=ARRAY>, column=$flatten_input.injected#14, is_correlated=TRUE) + | | | | | +-element_column_list=[$flatten.injected#15] + | | | | | +-array_offset_column= + | | | | | +-ColumnHolder(column=$offset.injected#16) + | | | | +-order_by_item_list= + | | | | +-OrderByItem + | | | | +-column_ref= + | | | | +-ColumnRef(type=INT64, column=$offset.injected#16) + | | | +-arr0_len#19 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#18) + | | | | +-Literal(type=INT64, value=0) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#18) + | | | +-arr1#20 := ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7, is_correlated=TRUE) + | | | +-arr1_len#21 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | 
+-ColumnRef(type=ARRAY, column=$with_expr.arr1#20) + | | | | +-Literal(type=INT64, value=0) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#20) + | | | +-mode#22 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | | +-Literal(type=ENUM, value=PAD) + | | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-strict_check#23 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#22) + | | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#19) + | | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#21) + | | | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#19) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#21) + | | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | | +-Literal(type=INT64, value=NULL) + | | | +-result_len#24 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#22) + | | | | +-Literal(type=ENUM, value=TRUNCATE) + | | | +-FunctionCall(ZetaSQL:least(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#19) + | 
| | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#21) + | | | +-FunctionCall(ZetaSQL:greatest(repeated(2) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#19) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#21) + | | +-expr= + | | +-SubqueryExpr + | | +-type=ARRAY> + | | +-subquery_type=ARRAY + | | +-parameter_list= + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#18) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#20) + | | | +-ColumnRef(type=INT64, column=$with_expr.result_len#24) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$make_struct.$struct#30] + | | +-is_ordered=TRUE + | | +-expr_list= + | | | +-$struct#30 := + | | | +-MakeStruct + | | | +-type=STRUCT + | | | +-field_list= + | | | +-ColumnRef(type=INT64, column=$array.arr0#25) + | | | +-ColumnRef(type=INT32, column=$array.arr1#27) + | | | +-ColumnRef(type=INT64, column=$full_join.offset#29) + | | +-input_scan= + | | +-OrderByScan + | | +-column_list=[$array.arr0#25, $array_offset.offset#26, $array.arr1#27, $array_offset.offset#28, $full_join.offset#29] + | | +-is_ordered=TRUE + | | +-input_scan= + | | | +-FilterScan + | | | +-column_list=[$array.arr0#25, $array_offset.offset#26, $array.arr1#27, $array_offset.offset#28, $full_join.offset#29] + | | | +-input_scan= + | | | | +-ProjectScan + | | | | +-column_list=[$array.arr0#25, $array_offset.offset#26, $array.arr1#27, $array_offset.offset#28, $full_join.offset#29] + | | | | +-expr_list= + | | | | | +-offset#29 := + | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#26) + | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#28) + | | | | +-input_scan= + | | | | +-JoinScan + | | | | +-column_list=[$array.arr0#25, $array_offset.offset#26, $array.arr1#27, $array_offset.offset#28] + | | | | +-join_type=FULL + | | | | +-left_scan= + | | | | | +-ArrayScan + | | | | | +-column_list=[$array.arr0#25, 
$array_offset.offset#26] + | | | | | +-array_expr_list= + | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#18, is_correlated=TRUE) + | | | | | +-element_column_list=[$array.arr0#25] + | | | | | +-array_offset_column= + | | | | | +-ColumnHolder(column=$array_offset.offset#26) + | | | | +-right_scan= + | | | | | +-ArrayScan + | | | | | +-column_list=[$array.arr1#27, $array_offset.offset#28] + | | | | | +-array_expr_list= + | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#20, is_correlated=TRUE) + | | | | | +-element_column_list=[$array.arr1#27] + | | | | | +-array_offset_column= + | | | | | +-ColumnHolder(column=$array_offset.offset#28) + | | | | +-join_expr= + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#26) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#28) + | | | +-filter_expr= + | | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$full_join.offset#29) + | | | +-ColumnRef(type=INT64, column=$with_expr.result_len#24, is_correlated=TRUE) + | | +-order_by_item_list= + | | +-OrderByItem + | | +-column_ref= + | | +-ColumnRef(type=INT64, column=$full_join.offset#29) + | +-element_column_list=[$array.$with_expr_element#31] + +-input_scan= + +-JoinScan + +-column_list=[TestTable.KitchenSink#3, ComplexTypes.Int32Array#7] + +-left_scan= + | +-TableScan(column_list=[TestTable.KitchenSink#3], table=TestTable, column_index_list=[2]) + +-right_scan= + +-TableScan(column_list=[ComplexTypes.Int32Array#7], table=ComplexTypes, column_index_list=[3]) +== + + +# 3 arguments: literal expression without alias, path expressions with explicit aliases. 
+SELECT * +FROM TestTable, UNNEST([1, 2, 3], TestTable.KitchenSink.repeated_int32_val AS array2, TestTable.KitchenSink.repeated_int64_val AS array3) +-- +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.$unnest1#4 AS `$unnest1` [INT64] +| +-$array.array2#5 AS array2 [INT32] +| +-$array.array3#6 AS array3 [INT64] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$unnest1#4, $array.array2#5, $array.array3#6] + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$unnest1#4, $array.array2#5, $array.array3#6] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2, 3]) + | +-GetProtoField + | | +-type=ARRAY + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-field_descriptor=repeated_int32_val + | | +-default_value=[] + | +-GetProtoField + | +-type=ARRAY + | +-expr= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | +-field_descriptor=repeated_int64_val + | +-default_value=[] + +-element_column_list=$array.[$unnest1#4, array2#5, array3#6] + +-array_zip_mode= + +-Literal(type=ENUM, value=PAD) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-$array.$unnest1#4 AS `$unnest1` [INT64] +| +-$array.array2#5 AS array2 [INT32] +| +-$array.array3#6 AS array3 [INT64] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$unnest1#4, $array.array2#5, $array.array3#6] + +-input_scan= + +-ProjectScan + +-column_list=[TestTable.key#1, 
TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$unnest1#4, $array.array2#5, $array.array3#6] + +-expr_list= + | +-$unnest1#4 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#25) + | | +-field_idx=0 + | +-array2#5 := + | | +-GetStructField + | | +-type=INT32 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#25) + | | +-field_idx=1 + | +-array3#6 := + | +-GetStructField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=STRUCT, column=$array.$with_expr_element#25) + | +-field_idx=2 + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, $array.$with_expr_element#25] + +-input_scan= + | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY> + | +-assignment_list= + | | +-arr0#7 := Literal(type=ARRAY, value=[1, 2, 3]) + | | +-arr0_len#8 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#7) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#7) + | | +-arr1#9 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=repeated_int32_val + | | | +-default_value=[] + | | +-arr1_len#10 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#9) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#9) + | | +-arr2#11 := + | | | +-GetProtoField + | | | 
+-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=repeated_int64_val + | | | +-default_value=[] + | | +-arr2_len#12 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr2#11) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr2#11) + | | +-mode#13 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=PAD) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=PAD) + | | +-strict_check#14 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#13) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(3) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#8) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#10) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr2_len#12) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(3) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#8) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#10) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr2_len#12) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, 
value=NULL) + | | +-result_len#15 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#13) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(3) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#8) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#10) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr2_len#12) + | | +-FunctionCall(ZetaSQL:greatest(repeated(3) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#8) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#10) + | | +-ColumnRef(type=INT64, column=$with_expr.arr2_len#12) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#7) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#9) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr2#11) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#15) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#24] + | +-is_ordered=TRUE + | +-expr_list= + | | +-$struct#24 := + | | +-MakeStruct + | | +-type=STRUCT + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#16) + | | +-ColumnRef(type=INT32, column=$array.arr1#18) + | | +-ColumnRef(type=INT64, column=$array.arr2#21) + | | +-ColumnRef(type=INT64, column=$full_join.offset#23) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#16, $array_offset.offset#17, $array.arr1#18, $array_offset.offset#19, $full_join.offset#20, $array.arr2#21, $array_offset.offset#22, $full_join.offset#23] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#16, $array_offset.offset#17, $array.arr1#18, $array_offset.offset#19, $full_join.offset#20, $array.arr2#21, $array_offset.offset#22, $full_join.offset#23] + | | +-input_scan= + | | 
| +-ProjectScan + | | | +-column_list=[$array.arr0#16, $array_offset.offset#17, $array.arr1#18, $array_offset.offset#19, $full_join.offset#20, $array.arr2#21, $array_offset.offset#22, $full_join.offset#23] + | | | +-expr_list= + | | | | +-offset#23 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$full_join.offset#20) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#22) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#16, $array_offset.offset#17, $array.arr1#18, $array_offset.offset#19, $full_join.offset#20, $array.arr2#21, $array_offset.offset#22] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ProjectScan + | | | | +-column_list=[$array.arr0#16, $array_offset.offset#17, $array.arr1#18, $array_offset.offset#19, $full_join.offset#20] + | | | | +-expr_list= + | | | | | +-offset#20 := + | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#17) + | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#19) + | | | | +-input_scan= + | | | | +-JoinScan + | | | | +-column_list=[$array.arr0#16, $array_offset.offset#17, $array.arr1#18, $array_offset.offset#19] + | | | | +-join_type=FULL + | | | | +-left_scan= + | | | | | +-ArrayScan + | | | | | +-column_list=[$array.arr0#16, $array_offset.offset#17] + | | | | | +-array_expr_list= + | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#7, is_correlated=TRUE) + | | | | | +-element_column_list=[$array.arr0#16] + | | | | | +-array_offset_column= + | | | | | +-ColumnHolder(column=$array_offset.offset#17) + | | | | +-right_scan= + | | | | | +-ArrayScan + | | | | | +-column_list=[$array.arr1#18, $array_offset.offset#19] + | | | | | +-array_expr_list= + | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr1#9, is_correlated=TRUE) + | | | | | +-element_column_list=[$array.arr1#18] + | | | | | +-array_offset_column= + | | | | 
| +-ColumnHolder(column=$array_offset.offset#19) + | | | | +-join_expr= + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#17) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#19) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr2#21, $array_offset.offset#22] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr2#11, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr2#21] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#22) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$full_join.offset#20) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#22) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#23) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#15, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#23) + +-element_column_list=[$array.$with_expr_element#25] +== + +# 10 arguments: literal and path expressions with explicit and inferred aliases, +# and without alias. +# Note that SELECT * expands STRUCT or PROTO array elements.
+SELECT * +FROM TestTable, ComplexTypes, MoreComplexTypes, UNNEST( + [1, 2] AS array1, + [STRUCT(1 AS x), STRUCT(2)] AS array2, + TestTable.KitchenSink.repeated_int32_val AS array3, + ["hello"], + TestTable.KitchenSink.repeated_string_val AS array5, + ComplexTypes.Int32Array AS array6, + MoreComplexTypes.ArrayOfStruct AS array7, + TestTable.KitchenSink.repeated_float_val, + TestTable.KitchenSink.nested_repeated_value, + TestTable.KitchenSink.nested_value.nested_repeated_int64, + mode => 'STRICT' +) +-- +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-ComplexTypes.key#4 AS key [INT32] +| +-ComplexTypes.TestEnum#5 AS TestEnum [ENUM] +| +-ComplexTypes.KitchenSink#6 AS KitchenSink [PROTO] +| +-ComplexTypes.Int32Array#7 AS Int32Array [ARRAY] +| +-ComplexTypes.TestStruct#8 AS TestStruct [STRUCT>] +| +-ComplexTypes.TestProto#9 AS TestProto [PROTO] +| +-MoreComplexTypes.key#10 AS key [INT32] +| +-MoreComplexTypes.ArrayOfStruct#11 AS ArrayOfStruct [ARRAY>] +| +-MoreComplexTypes.StructOfArrayOfStruct#12 AS StructOfArrayOfStruct [STRUCT, z ARRAY>>] +| +-$array.array1#13 AS array1 [INT64] +| +-$query.x#23 AS x [INT64] +| +-$array.array3#15 AS array3 [INT32] +| +-$array.$unnest1#16 AS `$unnest1` [STRING] +| +-$array.array5#17 AS array5 [STRING] +| +-$array.array6#18 AS array6 [INT32] +| +-$query.a#24 AS a [INT32] +| +-$query.b#25 AS b [STRING] +| +-$array.repeated_float_val#20 AS repeated_float_val [FLOAT] +| +-$query.nested_int64#26 AS nested_int64 [INT64] +| +-$query.nested_repeated_int64#27 AS nested_repeated_int64 [ARRAY] +| +-$query.nested_repeated_int32#28 AS nested_repeated_int32 [ARRAY] +| +-$query.value#29 AS value [ARRAY] +| +-$array.nested_repeated_int64#22 AS nested_repeated_int64 [INT64] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, 
ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9, MoreComplexTypes.key#10, MoreComplexTypes.ArrayOfStruct#11, MoreComplexTypes.StructOfArrayOfStruct#12, $array.array1#13, $query.x#23, $array.array3#15, $array.$unnest1#16, $array.array5#17, $array.array6#18, $query.a#24, $query.b#25, $array.repeated_float_val#20, $query.nested_int64#26, $query.nested_repeated_int64#27, $query.nested_repeated_int32#28, $query.value#29, $array.nested_repeated_int64#22] + +-expr_list= + | +-x#23 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.array2#14) + | | +-field_idx=0 + | +-a#24 := + | | +-GetStructField + | | +-type=INT32 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.array7#19) + | | +-field_idx=0 + | +-b#25 := + | | +-GetStructField + | | +-type=STRING + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.array7#19) + | | +-field_idx=1 + | +-nested_int64#26 := + | | +-GetProtoField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=$array.nested_repeated_value#21) + | | +-field_descriptor=nested_int64 + | | +-default_value=88 + | +-nested_repeated_int64#27 := + | | +-GetProtoField + | | +-type=ARRAY + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=$array.nested_repeated_value#21) + | | +-field_descriptor=nested_repeated_int64 + | | +-default_value=[] + | +-nested_repeated_int32#28 := + | | +-GetProtoField + | | +-type=ARRAY + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=$array.nested_repeated_value#21) + | | +-field_descriptor=nested_repeated_int32 + | | +-default_value=[] + | +-value#29 := + | +-GetProtoField + | +-type=ARRAY + | +-expr= + | | +-ColumnRef(type=PROTO, column=$array.nested_repeated_value#21) + | +-field_descriptor=value + | +-default_value=[] + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, 
ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9, MoreComplexTypes.key#10, MoreComplexTypes.ArrayOfStruct#11, MoreComplexTypes.StructOfArrayOfStruct#12, $array.array1#13, $array.array2#14, $array.array3#15, $array.$unnest1#16, $array.array5#17, $array.array6#18, $array.array7#19, $array.repeated_float_val#20, $array.nested_repeated_value#21, $array.nested_repeated_int64#22] + +-input_scan= + | +-JoinScan + | +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9, MoreComplexTypes.key#10, MoreComplexTypes.ArrayOfStruct#11, MoreComplexTypes.StructOfArrayOfStruct#12] + | +-left_scan= + | | +-JoinScan + | | +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9] + | | +-left_scan= + | | | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + | | +-right_scan= + | | +-TableScan(column_list=ComplexTypes.[key#4, TestEnum#5, KitchenSink#6, Int32Array#7, TestStruct#8, TestProto#9], table=ComplexTypes, column_index_list=[0, 1, 2, 3, 4, 5]) + | +-right_scan= + | +-TableScan(column_list=MoreComplexTypes.[key#10, ArrayOfStruct#11, StructOfArrayOfStruct#12], table=MoreComplexTypes, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-Literal(type=ARRAY, value=[1, 2]) + | +-Literal(type=ARRAY>, value=[{x:1}, {x:2}]) + | +-GetProtoField + | | +-type=ARRAY + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-field_descriptor=repeated_int32_val + | | +-default_value=[] + | +-Literal(type=ARRAY, value=["hello"]) + | +-GetProtoField + | | +-type=ARRAY + | | +-expr= + | | | 
+-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-field_descriptor=repeated_string_val + | | +-default_value=[] + | +-ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7) + | +-ColumnRef(type=ARRAY>, column=MoreComplexTypes.ArrayOfStruct#11) + | +-GetProtoField + | | +-type=ARRAY + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-field_descriptor=repeated_float_val + | | +-default_value=[] + | +-GetProtoField + | | +-type=ARRAY> + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-field_descriptor=nested_repeated_value + | | +-default_value=[] + | +-GetProtoField + | +-type=ARRAY + | +-expr= + | | +-GetProtoField + | | +-type=PROTO + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-field_descriptor=nested_value + | | +-default_value=NULL + | +-field_descriptor=nested_repeated_int64 + | +-default_value=[] + +-element_column_list=$array.[array1#13, array2#14, array3#15, $unnest1#16, array5#17, array6#18, array7#19, repeated_float_val#20, nested_repeated_value#21, nested_repeated_int64#22] + +-array_zip_mode= + +-Literal(type=ENUM, value=STRICT) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-TestTable.key#1 AS key [INT32] +| +-TestTable.TestEnum#2 AS TestEnum [ENUM] +| +-TestTable.KitchenSink#3 AS KitchenSink [PROTO] +| +-ComplexTypes.key#4 AS key [INT32] +| +-ComplexTypes.TestEnum#5 AS TestEnum [ENUM] +| +-ComplexTypes.KitchenSink#6 AS KitchenSink [PROTO] +| +-ComplexTypes.Int32Array#7 AS Int32Array [ARRAY] +| +-ComplexTypes.TestStruct#8 AS TestStruct [STRUCT>] +| +-ComplexTypes.TestProto#9 AS TestProto [PROTO] +| +-MoreComplexTypes.key#10 AS key [INT32] +| +-MoreComplexTypes.ArrayOfStruct#11 AS ArrayOfStruct [ARRAY>] +| +-MoreComplexTypes.StructOfArrayOfStruct#12 AS StructOfArrayOfStruct [STRUCT, z ARRAY>>] +| +-$array.array1#13 AS array1 [INT64] +| +-$query.x#23 AS x [INT64] +| +-$array.array3#15 AS array3 [INT32] +| 
+-$array.$unnest1#16 AS `$unnest1` [STRING] +| +-$array.array5#17 AS array5 [STRING] +| +-$array.array6#18 AS array6 [INT32] +| +-$query.a#24 AS a [INT32] +| +-$query.b#25 AS b [STRING] +| +-$array.repeated_float_val#20 AS repeated_float_val [FLOAT] +| +-$query.nested_int64#26 AS nested_int64 [INT64] +| +-$query.nested_repeated_int64#27 AS nested_repeated_int64 [ARRAY] +| +-$query.nested_repeated_int32#28 AS nested_repeated_int32 [ARRAY] +| +-$query.value#29 AS value [ARRAY] +| +-$array.nested_repeated_int64#22 AS nested_repeated_int64 [INT64] ++-query= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9, MoreComplexTypes.key#10, MoreComplexTypes.ArrayOfStruct#11, MoreComplexTypes.StructOfArrayOfStruct#12, $array.array1#13, $query.x#23, $array.array3#15, $array.$unnest1#16, $array.array5#17, $array.array6#18, $query.a#24, $query.b#25, $array.repeated_float_val#20, $query.nested_int64#26, $query.nested_repeated_int64#27, $query.nested_repeated_int32#28, $query.value#29, $array.nested_repeated_int64#22] + +-expr_list= + | +-x#23 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.array2#14) + | | +-field_idx=0 + | +-a#24 := + | | +-GetStructField + | | +-type=INT32 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.array7#19) + | | +-field_idx=0 + | +-b#25 := + | | +-GetStructField + | | +-type=STRING + | | +-expr= + | | | +-ColumnRef(type=STRUCT, column=$array.array7#19) + | | +-field_idx=1 + | +-nested_int64#26 := + | | +-GetProtoField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=$array.nested_repeated_value#21) + | | +-field_descriptor=nested_int64 + | | +-default_value=88 + | +-nested_repeated_int64#27 := + | | +-GetProtoField + | | +-type=ARRAY + | | +-expr= + | | | 
+-ColumnRef(type=PROTO, column=$array.nested_repeated_value#21) + | | +-field_descriptor=nested_repeated_int64 + | | +-default_value=[] + | +-nested_repeated_int32#28 := + | | +-GetProtoField + | | +-type=ARRAY + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=$array.nested_repeated_value#21) + | | +-field_descriptor=nested_repeated_int32 + | | +-default_value=[] + | +-value#29 := + | +-GetProtoField + | +-type=ARRAY + | +-expr= + | | +-ColumnRef(type=PROTO, column=$array.nested_repeated_value#21) + | +-field_descriptor=value + | +-default_value=[] + +-input_scan= + +-ProjectScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9, MoreComplexTypes.key#10, MoreComplexTypes.ArrayOfStruct#11, MoreComplexTypes.StructOfArrayOfStruct#12, $array.array1#13, $array.array2#14, $array.array3#15, $array.$unnest1#16, $array.array5#17, $array.array6#18, $array.array7#19, $array.repeated_float_val#20, $array.nested_repeated_value#21, $array.nested_repeated_int64#22] + +-expr_list= + | +-array1#13 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#83) + | | +-field_idx=0 + | +-array2#14 := + | | +-GetStructField + | | +-type=STRUCT + | | +-expr= + | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#83) + | | +-field_idx=1 + | +-array3#15 := + | | +-GetStructField + | | +-type=INT32 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#83) + | | 
+-field_idx=2 + | +-$unnest1#16 := + | | +-GetStructField + | | +-type=STRING + | | +-expr= + | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#83) + | | +-field_idx=3 + | +-array5#17 := + | | +-GetStructField + | | +-type=STRING + | | +-expr= + | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#83) + | | +-field_idx=4 + | +-array6#18 := + | | +-GetStructField + | | +-type=INT32 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#83) + | | +-field_idx=5 + | +-array7#19 := + | | +-GetStructField + | | +-type=STRUCT + | | +-expr= + | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#83) + | | +-field_idx=6 + | +-repeated_float_val#20 := + | | +-GetStructField + | | +-type=FLOAT + | | +-expr= + | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#83) + | | +-field_idx=7 + | +-nested_repeated_value#21 := + | | +-GetStructField + | | +-type=PROTO + | | +-expr= + | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#83) + | | +-field_idx=8 + | +-nested_repeated_int64#22 := + | +-GetStructField + | +-type=INT64 + | +-expr= + | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#83) + | 
+-field_idx=9 + +-input_scan= + +-ArrayScan + +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9, MoreComplexTypes.key#10, MoreComplexTypes.ArrayOfStruct#11, MoreComplexTypes.StructOfArrayOfStruct#12, $array.$with_expr_element#83] + +-input_scan= + | +-JoinScan + | +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9, MoreComplexTypes.key#10, MoreComplexTypes.ArrayOfStruct#11, MoreComplexTypes.StructOfArrayOfStruct#12] + | +-left_scan= + | | +-JoinScan + | | +-column_list=[TestTable.key#1, TestTable.TestEnum#2, TestTable.KitchenSink#3, ComplexTypes.key#4, ComplexTypes.TestEnum#5, ComplexTypes.KitchenSink#6, ComplexTypes.Int32Array#7, ComplexTypes.TestStruct#8, ComplexTypes.TestProto#9] + | | +-left_scan= + | | | +-TableScan(column_list=TestTable.[key#1, TestEnum#2, KitchenSink#3], table=TestTable, column_index_list=[0, 1, 2]) + | | +-right_scan= + | | +-TableScan(column_list=ComplexTypes.[key#4, TestEnum#5, KitchenSink#6, Int32Array#7, TestStruct#8, TestProto#9], table=ComplexTypes, column_index_list=[0, 1, 2, 3, 4, 5]) + | +-right_scan= + | +-TableScan(column_list=MoreComplexTypes.[key#10, ArrayOfStruct#11, StructOfArrayOfStruct#12], table=MoreComplexTypes, column_index_list=[0, 1, 2]) + +-array_expr_list= + | +-WithExpr + | +-type=ARRAY, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>> + | +-assignment_list= + | | +-arr0#30 := Literal(type=ARRAY, value=[1, 2]) + | | +-arr0_len#31 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, 
column=$with_expr.arr0#30) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#30) + | | +-arr1#32 := Literal(type=ARRAY>, value=[{x:1}, {x:2}]) + | | +-arr1_len#33 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) + | | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr1#32) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY>) -> INT64) + | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr1#32) + | | +-arr2#34 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=repeated_int32_val + | | | +-default_value=[] + | | +-arr2_len#35 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr2#34) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr2#34) + | | +-arr3#36 := Literal(type=ARRAY, value=["hello"]) + | | +-arr3_len#37 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr3#36) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr3#36) + | | +-arr4#38 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=repeated_string_val + | | | +-default_value=[] + | | +-arr4_len#39 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | 
+-ColumnRef(type=ARRAY, column=$with_expr.arr4#38) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr4#38) + | | +-arr5#40 := ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7) + | | +-arr5_len#41 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr5#40) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr5#40) + | | +-arr6#42 := ColumnRef(type=ARRAY>, column=MoreComplexTypes.ArrayOfStruct#11) + | | +-arr6_len#43 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) + | | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr6#42) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY>) -> INT64) + | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr6#42) + | | +-arr7#44 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=repeated_float_val + | | | +-default_value=[] + | | +-arr7_len#45 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr7#44) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr7#44) + | | +-arr8#46 := + | | | +-GetProtoField + | | | +-type=ARRAY> + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | +-field_descriptor=nested_repeated_value + | | | +-default_value=[] + | | +-arr8_len#47 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | 
+-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) + | | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr8#46) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY>) -> INT64) + | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr8#46) + | | +-arr9#48 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-GetProtoField + | | | | +-type=PROTO + | | | | +-expr= + | | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | | | +-field_descriptor=nested_value + | | | | +-default_value=NULL + | | | +-field_descriptor=nested_repeated_int64 + | | | +-default_value=[] + | | +-arr9_len#49 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr9#48) + | | | +-Literal(type=INT64, value=0) + | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr9#48) + | | +-mode#50 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | +-Literal(type=ENUM, value=STRICT) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | +-Literal(type=ENUM, value=STRICT) + | | +-strict_check#51 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#50) + | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | +-FunctionCall(ZetaSQL:least(repeated(10) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#31) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#33) + | | | | | +-ColumnRef(type=INT64, 
column=$with_expr.arr2_len#35) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr3_len#37) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr4_len#39) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr5_len#41) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr6_len#43) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr7_len#45) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr8_len#47) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr9_len#49) + | | | | +-FunctionCall(ZetaSQL:greatest(repeated(10) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#31) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#33) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr2_len#35) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr3_len#37) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr4_len#39) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr5_len#41) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr6_len#43) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr7_len#45) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr8_len#47) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr9_len#49) + | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | +-Literal(type=INT64, value=NULL) + | | +-result_len#52 := + | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#50) + | | | +-Literal(type=ENUM, value=TRUNCATE) + | | +-FunctionCall(ZetaSQL:least(repeated(10) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#31) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#33) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr2_len#35) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr3_len#37) + | | | 
+-ColumnRef(type=INT64, column=$with_expr.arr4_len#39) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr5_len#41) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr6_len#43) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr7_len#45) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr8_len#47) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr9_len#49) + | | +-FunctionCall(ZetaSQL:greatest(repeated(10) INT64) -> INT64) + | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#31) + | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#33) + | | +-ColumnRef(type=INT64, column=$with_expr.arr2_len#35) + | | +-ColumnRef(type=INT64, column=$with_expr.arr3_len#37) + | | +-ColumnRef(type=INT64, column=$with_expr.arr4_len#39) + | | +-ColumnRef(type=INT64, column=$with_expr.arr5_len#41) + | | +-ColumnRef(type=INT64, column=$with_expr.arr6_len#43) + | | +-ColumnRef(type=INT64, column=$with_expr.arr7_len#45) + | | +-ColumnRef(type=INT64, column=$with_expr.arr8_len#47) + | | +-ColumnRef(type=INT64, column=$with_expr.arr9_len#49) + | +-expr= + | +-SubqueryExpr + | +-type=ARRAY, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>> + | +-subquery_type=ARRAY + | +-parameter_list= + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#30) + | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr1#32) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr2#34) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr3#36) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr4#38) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr5#40) + | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr6#42) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr7#44) + | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr8#46) + | | +-ColumnRef(type=ARRAY, column=$with_expr.arr9#48) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#52) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#82] + | +-is_ordered=TRUE + | 
+-expr_list= + | | +-$struct#82 := + | | +-MakeStruct + | | +-type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64> + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.arr0#53) + | | +-ColumnRef(type=STRUCT, column=$array.arr1#55) + | | +-ColumnRef(type=INT32, column=$array.arr2#58) + | | +-ColumnRef(type=STRING, column=$array.arr3#61) + | | +-ColumnRef(type=STRING, column=$array.arr4#64) + | | +-ColumnRef(type=INT32, column=$array.arr5#67) + | | +-ColumnRef(type=STRUCT, column=$array.arr6#70) + | | +-ColumnRef(type=FLOAT, column=$array.arr7#73) + | | +-ColumnRef(type=PROTO, column=$array.arr8#76) + | | +-ColumnRef(type=INT64, column=$array.arr9#79) + | | +-ColumnRef(type=INT64, column=$full_join.offset#81) + | +-input_scan= + | +-OrderByScan + | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60, $array.arr3#61, $array_offset.offset#62, $full_join.offset#63, $array.arr4#64, $array_offset.offset#65, $full_join.offset#66, $array.arr5#67, $array_offset.offset#68, $full_join.offset#69, $array.arr6#70, $array_offset.offset#71, $full_join.offset#72, $array.arr7#73, $array_offset.offset#74, $full_join.offset#75, $array.arr8#76, $array_offset.offset#77, $full_join.offset#78, $array.arr9#79, $array_offset.offset#80, $full_join.offset#81] + | +-is_ordered=TRUE + | +-input_scan= + | | +-FilterScan + | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60, $array.arr3#61, $array_offset.offset#62, $full_join.offset#63, $array.arr4#64, $array_offset.offset#65, $full_join.offset#66, $array.arr5#67, $array_offset.offset#68, $full_join.offset#69, $array.arr6#70, $array_offset.offset#71, $full_join.offset#72, $array.arr7#73, 
$array_offset.offset#74, $full_join.offset#75, $array.arr8#76, $array_offset.offset#77, $full_join.offset#78, $array.arr9#79, $array_offset.offset#80, $full_join.offset#81] + | | +-input_scan= + | | | +-ProjectScan + | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60, $array.arr3#61, $array_offset.offset#62, $full_join.offset#63, $array.arr4#64, $array_offset.offset#65, $full_join.offset#66, $array.arr5#67, $array_offset.offset#68, $full_join.offset#69, $array.arr6#70, $array_offset.offset#71, $full_join.offset#72, $array.arr7#73, $array_offset.offset#74, $full_join.offset#75, $array.arr8#76, $array_offset.offset#77, $full_join.offset#78, $array.arr9#79, $array_offset.offset#80, $full_join.offset#81] + | | | +-expr_list= + | | | | +-offset#81 := + | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$full_join.offset#78) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#80) + | | | +-input_scan= + | | | +-JoinScan + | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60, $array.arr3#61, $array_offset.offset#62, $full_join.offset#63, $array.arr4#64, $array_offset.offset#65, $full_join.offset#66, $array.arr5#67, $array_offset.offset#68, $full_join.offset#69, $array.arr6#70, $array_offset.offset#71, $full_join.offset#72, $array.arr7#73, $array_offset.offset#74, $full_join.offset#75, $array.arr8#76, $array_offset.offset#77, $full_join.offset#78, $array.arr9#79, $array_offset.offset#80] + | | | +-join_type=FULL + | | | +-left_scan= + | | | | +-ProjectScan + | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, 
$full_join.offset#60, $array.arr3#61, $array_offset.offset#62, $full_join.offset#63, $array.arr4#64, $array_offset.offset#65, $full_join.offset#66, $array.arr5#67, $array_offset.offset#68, $full_join.offset#69, $array.arr6#70, $array_offset.offset#71, $full_join.offset#72, $array.arr7#73, $array_offset.offset#74, $full_join.offset#75, $array.arr8#76, $array_offset.offset#77, $full_join.offset#78] + | | | | +-expr_list= + | | | | | +-offset#78 := + | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#75) + | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#77) + | | | | +-input_scan= + | | | | +-JoinScan + | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60, $array.arr3#61, $array_offset.offset#62, $full_join.offset#63, $array.arr4#64, $array_offset.offset#65, $full_join.offset#66, $array.arr5#67, $array_offset.offset#68, $full_join.offset#69, $array.arr6#70, $array_offset.offset#71, $full_join.offset#72, $array.arr7#73, $array_offset.offset#74, $full_join.offset#75, $array.arr8#76, $array_offset.offset#77] + | | | | +-join_type=FULL + | | | | +-left_scan= + | | | | | +-ProjectScan + | | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60, $array.arr3#61, $array_offset.offset#62, $full_join.offset#63, $array.arr4#64, $array_offset.offset#65, $full_join.offset#66, $array.arr5#67, $array_offset.offset#68, $full_join.offset#69, $array.arr6#70, $array_offset.offset#71, $full_join.offset#72, $array.arr7#73, $array_offset.offset#74, $full_join.offset#75] + | | | | | +-expr_list= + | | | | | | +-offset#75 := + | | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | | 
+-ColumnRef(type=INT64, column=$full_join.offset#72) + | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#74) + | | | | | +-input_scan= + | | | | | +-JoinScan + | | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60, $array.arr3#61, $array_offset.offset#62, $full_join.offset#63, $array.arr4#64, $array_offset.offset#65, $full_join.offset#66, $array.arr5#67, $array_offset.offset#68, $full_join.offset#69, $array.arr6#70, $array_offset.offset#71, $full_join.offset#72, $array.arr7#73, $array_offset.offset#74] + | | | | | +-join_type=FULL + | | | | | +-left_scan= + | | | | | | +-ProjectScan + | | | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60, $array.arr3#61, $array_offset.offset#62, $full_join.offset#63, $array.arr4#64, $array_offset.offset#65, $full_join.offset#66, $array.arr5#67, $array_offset.offset#68, $full_join.offset#69, $array.arr6#70, $array_offset.offset#71, $full_join.offset#72] + | | | | | | +-expr_list= + | | | | | | | +-offset#72 := + | | | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#69) + | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#71) + | | | | | | +-input_scan= + | | | | | | +-JoinScan + | | | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60, $array.arr3#61, $array_offset.offset#62, $full_join.offset#63, $array.arr4#64, $array_offset.offset#65, $full_join.offset#66, $array.arr5#67, $array_offset.offset#68, $full_join.offset#69, $array.arr6#70, $array_offset.offset#71] + | | | | | | +-join_type=FULL + | | | | | | 
+-left_scan= + | | | | | | | +-ProjectScan + | | | | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60, $array.arr3#61, $array_offset.offset#62, $full_join.offset#63, $array.arr4#64, $array_offset.offset#65, $full_join.offset#66, $array.arr5#67, $array_offset.offset#68, $full_join.offset#69] + | | | | | | | +-expr_list= + | | | | | | | | +-offset#69 := + | | | | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#66) + | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#68) + | | | | | | | +-input_scan= + | | | | | | | +-JoinScan + | | | | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60, $array.arr3#61, $array_offset.offset#62, $full_join.offset#63, $array.arr4#64, $array_offset.offset#65, $full_join.offset#66, $array.arr5#67, $array_offset.offset#68] + | | | | | | | +-join_type=FULL + | | | | | | | +-left_scan= + | | | | | | | | +-ProjectScan + | | | | | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60, $array.arr3#61, $array_offset.offset#62, $full_join.offset#63, $array.arr4#64, $array_offset.offset#65, $full_join.offset#66] + | | | | | | | | +-expr_list= + | | | | | | | | | +-offset#66 := + | | | | | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#63) + | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#65) + | | | | | | | | +-input_scan= + | | | | | | | | +-JoinScan + | | | | | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, 
$array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60, $array.arr3#61, $array_offset.offset#62, $full_join.offset#63, $array.arr4#64, $array_offset.offset#65] + | | | | | | | | +-join_type=FULL + | | | | | | | | +-left_scan= + | | | | | | | | | +-ProjectScan + | | | | | | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60, $array.arr3#61, $array_offset.offset#62, $full_join.offset#63] + | | | | | | | | | +-expr_list= + | | | | | | | | | | +-offset#63 := + | | | | | | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#60) + | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#62) + | | | | | | | | | +-input_scan= + | | | | | | | | | +-JoinScan + | | | | | | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60, $array.arr3#61, $array_offset.offset#62] + | | | | | | | | | +-join_type=FULL + | | | | | | | | | +-left_scan= + | | | | | | | | | | +-ProjectScan + | | | | | | | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59, $full_join.offset#60] + | | | | | | | | | | +-expr_list= + | | | | | | | | | | | +-offset#60 := + | | | | | | | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#57) + | | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#59) + | | | | | | | | | | +-input_scan= + | | | | | | | | | | +-JoinScan + | | | | | | | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, 
$array_offset.offset#56, $full_join.offset#57, $array.arr2#58, $array_offset.offset#59] + | | | | | | | | | | +-join_type=FULL + | | | | | | | | | | +-left_scan= + | | | | | | | | | | | +-ProjectScan + | | | | | | | | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56, $full_join.offset#57] + | | | | | | | | | | | +-expr_list= + | | | | | | | | | | | | +-offset#57 := + | | | | | | | | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#54) + | | | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#56) + | | | | | | | | | | | +-input_scan= + | | | | | | | | | | | +-JoinScan + | | | | | | | | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54, $array.arr1#55, $array_offset.offset#56] + | | | | | | | | | | | +-join_type=FULL + | | | | | | | | | | | +-left_scan= + | | | | | | | | | | | | +-ArrayScan + | | | | | | | | | | | | +-column_list=[$array.arr0#53, $array_offset.offset#54] + | | | | | | | | | | | | +-array_expr_list= + | | | | | | | | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#30, is_correlated=TRUE) + | | | | | | | | | | | | +-element_column_list=[$array.arr0#53] + | | | | | | | | | | | | +-array_offset_column= + | | | | | | | | | | | | +-ColumnHolder(column=$array_offset.offset#54) + | | | | | | | | | | | +-right_scan= + | | | | | | | | | | | | +-ArrayScan + | | | | | | | | | | | | +-column_list=[$array.arr1#55, $array_offset.offset#56] + | | | | | | | | | | | | +-array_expr_list= + | | | | | | | | | | | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr1#32, is_correlated=TRUE) + | | | | | | | | | | | | +-element_column_list=[$array.arr1#55] + | | | | | | | | | | | | +-array_offset_column= + | | | | | | | | | | | | +-ColumnHolder(column=$array_offset.offset#56) + | | | | | | | | | | | +-join_expr= + | | | | | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) 
+ | | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#54) + | | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#56) + | | | | | | | | | | +-right_scan= + | | | | | | | | | | | +-ArrayScan + | | | | | | | | | | | +-column_list=[$array.arr2#58, $array_offset.offset#59] + | | | | | | | | | | | +-array_expr_list= + | | | | | | | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr2#34, is_correlated=TRUE) + | | | | | | | | | | | +-element_column_list=[$array.arr2#58] + | | | | | | | | | | | +-array_offset_column= + | | | | | | | | | | | +-ColumnHolder(column=$array_offset.offset#59) + | | | | | | | | | | +-join_expr= + | | | | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#57) + | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#59) + | | | | | | | | | +-right_scan= + | | | | | | | | | | +-ArrayScan + | | | | | | | | | | +-column_list=[$array.arr3#61, $array_offset.offset#62] + | | | | | | | | | | +-array_expr_list= + | | | | | | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr3#36, is_correlated=TRUE) + | | | | | | | | | | +-element_column_list=[$array.arr3#61] + | | | | | | | | | | +-array_offset_column= + | | | | | | | | | | +-ColumnHolder(column=$array_offset.offset#62) + | | | | | | | | | +-join_expr= + | | | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#60) + | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#62) + | | | | | | | | +-right_scan= + | | | | | | | | | +-ArrayScan + | | | | | | | | | +-column_list=[$array.arr4#64, $array_offset.offset#65] + | | | | | | | | | +-array_expr_list= + | | | | | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr4#38, is_correlated=TRUE) + | | | | | | | | | +-element_column_list=[$array.arr4#64] + | | | | | | | | | +-array_offset_column= + | | | | | | | | 
| +-ColumnHolder(column=$array_offset.offset#65) + | | | | | | | | +-join_expr= + | | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#63) + | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#65) + | | | | | | | +-right_scan= + | | | | | | | | +-ArrayScan + | | | | | | | | +-column_list=[$array.arr5#67, $array_offset.offset#68] + | | | | | | | | +-array_expr_list= + | | | | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr5#40, is_correlated=TRUE) + | | | | | | | | +-element_column_list=[$array.arr5#67] + | | | | | | | | +-array_offset_column= + | | | | | | | | +-ColumnHolder(column=$array_offset.offset#68) + | | | | | | | +-join_expr= + | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#66) + | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#68) + | | | | | | +-right_scan= + | | | | | | | +-ArrayScan + | | | | | | | +-column_list=[$array.arr6#70, $array_offset.offset#71] + | | | | | | | +-array_expr_list= + | | | | | | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr6#42, is_correlated=TRUE) + | | | | | | | +-element_column_list=[$array.arr6#70] + | | | | | | | +-array_offset_column= + | | | | | | | +-ColumnHolder(column=$array_offset.offset#71) + | | | | | | +-join_expr= + | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#69) + | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#71) + | | | | | +-right_scan= + | | | | | | +-ArrayScan + | | | | | | +-column_list=[$array.arr7#73, $array_offset.offset#74] + | | | | | | +-array_expr_list= + | | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr7#44, is_correlated=TRUE) + | | | | | | +-element_column_list=[$array.arr7#73] + | | | | | | +-array_offset_column= + | | | | | | +-ColumnHolder(column=$array_offset.offset#74) + | | 
| | | +-join_expr= + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#72) + | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#74) + | | | | +-right_scan= + | | | | | +-ArrayScan + | | | | | +-column_list=[$array.arr8#76, $array_offset.offset#77] + | | | | | +-array_expr_list= + | | | | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr8#46, is_correlated=TRUE) + | | | | | +-element_column_list=[$array.arr8#76] + | | | | | +-array_offset_column= + | | | | | +-ColumnHolder(column=$array_offset.offset#77) + | | | | +-join_expr= + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=$full_join.offset#75) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#77) + | | | +-right_scan= + | | | | +-ArrayScan + | | | | +-column_list=[$array.arr9#79, $array_offset.offset#80] + | | | | +-array_expr_list= + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr9#48, is_correlated=TRUE) + | | | | +-element_column_list=[$array.arr9#79] + | | | | +-array_offset_column= + | | | | +-ColumnHolder(column=$array_offset.offset#80) + | | | +-join_expr= + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$full_join.offset#78) + | | | +-ColumnRef(type=INT64, column=$array_offset.offset#80) + | | +-filter_expr= + | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | +-ColumnRef(type=INT64, column=$full_join.offset#81) + | | +-ColumnRef(type=INT64, column=$with_expr.result_len#52, is_correlated=TRUE) + | +-order_by_item_list= + | +-OrderByItem + | +-column_ref= + | +-ColumnRef(type=INT64, column=$full_join.offset#81) + +-element_column_list=[$array.$with_expr_element#83] +== + +# 10 arguments: literal and correlated path expressions with explicit and +# inferred aliases and without alias. +# Note that, SELECT * will expand STRUCT or PROTO array element. 
+SELECT ( + SELECT AS STRUCT * + FROM UNNEST( + [1, 2] AS array1, + [STRUCT(1 AS x), STRUCT(2)] AS array2, + TestTable.KitchenSink.repeated_int32_val AS array3, + ["hello"], + TestTable.KitchenSink.repeated_string_val AS array5, + ComplexTypes.Int32Array AS array6, + MoreComplexTypes.ArrayOfStruct AS array7, + TestTable.KitchenSink.repeated_float_val, + TestTable.KitchenSink.nested_repeated_value, + TestTable.KitchenSink.nested_value.nested_repeated_int64, + mode => 'TRUNCATE' + ) +) AS col1 +FROM TestTable, ComplexTypes, MoreComplexTypes +-- +QueryStmt ++-output_column_list= +| +-$query.col1#31 AS col1 [STRUCT, nested_repeated_int32 ARRAY, value ARRAY, nested_repeated_int64 INT64>] ++-query= + +-ProjectScan + +-column_list=[$query.col1#31] + +-expr_list= + | +-col1#31 := + | +-SubqueryExpr + | +-type=STRUCT, nested_repeated_int32 ARRAY, value ARRAY, nested_repeated_int64 INT64> + | +-subquery_type=SCALAR + | +-parameter_list= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7) + | | +-ColumnRef(type=ARRAY>, column=MoreComplexTypes.ArrayOfStruct#11) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#30] + | +-expr_list= + | | +-$struct#30 := + | | +-MakeStruct + | | +-type=STRUCT, nested_repeated_int32 ARRAY, value ARRAY, nested_repeated_int64 INT64> + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.array1#13) + | | +-ColumnRef(type=INT64, column=$expr_subquery.x#23) + | | +-ColumnRef(type=INT32, column=$array.array3#15) + | | +-ColumnRef(type=STRING, column=$array.$unnest1#16) + | | +-ColumnRef(type=STRING, column=$array.array5#17) + | | +-ColumnRef(type=INT32, column=$array.array6#18) + | | +-ColumnRef(type=INT32, column=$expr_subquery.a#24) + | | +-ColumnRef(type=STRING, column=$expr_subquery.b#25) + | | +-ColumnRef(type=FLOAT, column=$array.repeated_float_val#20) + | | +-ColumnRef(type=INT64, column=$expr_subquery.nested_int64#26) + | | 
+-ColumnRef(type=ARRAY, column=$expr_subquery.nested_repeated_int64#27) + | | +-ColumnRef(type=ARRAY, column=$expr_subquery.nested_repeated_int32#28) + | | +-ColumnRef(type=ARRAY, column=$expr_subquery.value#29) + | | +-ColumnRef(type=INT64, column=$array.nested_repeated_int64#22) + | +-input_scan= + | +-ProjectScan + | +-column_list=[$array.array1#13, $expr_subquery.x#23, $array.array3#15, $array.$unnest1#16, $array.array5#17, $array.array6#18, $expr_subquery.a#24, $expr_subquery.b#25, $array.repeated_float_val#20, $expr_subquery.nested_int64#26, $expr_subquery.nested_repeated_int64#27, $expr_subquery.nested_repeated_int32#28, $expr_subquery.value#29, $array.nested_repeated_int64#22] + | +-expr_list= + | | +-x#23 := + | | | +-GetStructField + | | | +-type=INT64 + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, column=$array.array2#14) + | | | +-field_idx=0 + | | +-a#24 := + | | | +-GetStructField + | | | +-type=INT32 + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, column=$array.array7#19) + | | | +-field_idx=0 + | | +-b#25 := + | | | +-GetStructField + | | | +-type=STRING + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, column=$array.array7#19) + | | | +-field_idx=1 + | | +-nested_int64#26 := + | | | +-GetProtoField + | | | +-type=INT64 + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=$array.nested_repeated_value#21) + | | | +-field_descriptor=nested_int64 + | | | +-default_value=88 + | | +-nested_repeated_int64#27 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=$array.nested_repeated_value#21) + | | | +-field_descriptor=nested_repeated_int64 + | | | +-default_value=[] + | | +-nested_repeated_int32#28 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=$array.nested_repeated_value#21) + | | | +-field_descriptor=nested_repeated_int32 + | | | +-default_value=[] + | | +-value#29 := + | | +-GetProtoField + | | +-type=ARRAY + | | +-expr= + | | | 
+-ColumnRef(type=PROTO, column=$array.nested_repeated_value#21) + | | +-field_descriptor=value + | | +-default_value=[] + | +-input_scan= + | +-ArrayScan + | +-column_list=$array.[array1#13, array2#14, array3#15, $unnest1#16, array5#17, array6#18, array7#19, repeated_float_val#20, nested_repeated_value#21, nested_repeated_int64#22] + | +-array_expr_list= + | | +-Literal(type=ARRAY, value=[1, 2]) + | | +-Literal(type=ARRAY>, value=[{x:1}, {x:2}]) + | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3, is_correlated=TRUE) + | | | +-field_descriptor=repeated_int32_val + | | | +-default_value=[] + | | +-Literal(type=ARRAY, value=["hello"]) + | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3, is_correlated=TRUE) + | | | +-field_descriptor=repeated_string_val + | | | +-default_value=[] + | | +-ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7, is_correlated=TRUE) + | | +-ColumnRef(type=ARRAY>, column=MoreComplexTypes.ArrayOfStruct#11, is_correlated=TRUE) + | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3, is_correlated=TRUE) + | | | +-field_descriptor=repeated_float_val + | | | +-default_value=[] + | | +-GetProtoField + | | | +-type=ARRAY> + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3, is_correlated=TRUE) + | | | +-field_descriptor=nested_repeated_value + | | | +-default_value=[] + | | +-GetProtoField + | | +-type=ARRAY + | | +-expr= + | | | +-GetProtoField + | | | +-type=PROTO + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3, is_correlated=TRUE) + | | | +-field_descriptor=nested_value + | | | +-default_value=NULL + | | +-field_descriptor=nested_repeated_int64 + | | +-default_value=[] + | +-element_column_list=$array.[array1#13, array2#14, array3#15, $unnest1#16, array5#17, array6#18, 
array7#19, repeated_float_val#20, nested_repeated_value#21, nested_repeated_int64#22] + | +-array_zip_mode= + | +-Literal(type=ENUM, value=TRUNCATE) + +-input_scan= + +-JoinScan + +-column_list=[TestTable.KitchenSink#3, ComplexTypes.Int32Array#7, MoreComplexTypes.ArrayOfStruct#11] + +-left_scan= + | +-JoinScan + | +-column_list=[TestTable.KitchenSink#3, ComplexTypes.Int32Array#7] + | +-left_scan= + | | +-TableScan(column_list=[TestTable.KitchenSink#3], table=TestTable, column_index_list=[2]) + | +-right_scan= + | +-TableScan(column_list=[ComplexTypes.Int32Array#7], table=ComplexTypes, column_index_list=[3]) + +-right_scan= + +-TableScan(column_list=[MoreComplexTypes.ArrayOfStruct#11], table=MoreComplexTypes, column_index_list=[1]) + +[REWRITTEN AST] +QueryStmt ++-output_column_list= +| +-$query.col1#31 AS col1 [STRUCT, nested_repeated_int32 ARRAY, value ARRAY, nested_repeated_int64 INT64>] ++-query= + +-ProjectScan + +-column_list=[$query.col1#31] + +-expr_list= + | +-col1#31 := + | +-SubqueryExpr + | +-type=STRUCT, nested_repeated_int32 ARRAY, value ARRAY, nested_repeated_int64 INT64> + | +-subquery_type=SCALAR + | +-parameter_list= + | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3) + | | +-ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7) + | | +-ColumnRef(type=ARRAY>, column=MoreComplexTypes.ArrayOfStruct#11) + | +-subquery= + | +-ProjectScan + | +-column_list=[$make_struct.$struct#30] + | +-expr_list= + | | +-$struct#30 := + | | +-MakeStruct + | | +-type=STRUCT, nested_repeated_int32 ARRAY, value ARRAY, nested_repeated_int64 INT64> + | | +-field_list= + | | +-ColumnRef(type=INT64, column=$array.array1#13) + | | +-ColumnRef(type=INT64, column=$expr_subquery.x#23) + | | +-ColumnRef(type=INT32, column=$array.array3#15) + | | +-ColumnRef(type=STRING, column=$array.$unnest1#16) + | | +-ColumnRef(type=STRING, column=$array.array5#17) + | | +-ColumnRef(type=INT32, column=$array.array6#18) + | | +-ColumnRef(type=INT32, column=$expr_subquery.a#24) + | | 
+-ColumnRef(type=STRING, column=$expr_subquery.b#25) + | | +-ColumnRef(type=FLOAT, column=$array.repeated_float_val#20) + | | +-ColumnRef(type=INT64, column=$expr_subquery.nested_int64#26) + | | +-ColumnRef(type=ARRAY, column=$expr_subquery.nested_repeated_int64#27) + | | +-ColumnRef(type=ARRAY, column=$expr_subquery.nested_repeated_int32#28) + | | +-ColumnRef(type=ARRAY, column=$expr_subquery.value#29) + | | +-ColumnRef(type=INT64, column=$array.nested_repeated_int64#22) + | +-input_scan= + | +-ProjectScan + | +-column_list=[$array.array1#13, $expr_subquery.x#23, $array.array3#15, $array.$unnest1#16, $array.array5#17, $array.array6#18, $expr_subquery.a#24, $expr_subquery.b#25, $array.repeated_float_val#20, $expr_subquery.nested_int64#26, $expr_subquery.nested_repeated_int64#27, $expr_subquery.nested_repeated_int32#28, $expr_subquery.value#29, $array.nested_repeated_int64#22] + | +-expr_list= + | | +-x#23 := + | | | +-GetStructField + | | | +-type=INT64 + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, column=$array.array2#14) + | | | +-field_idx=0 + | | +-a#24 := + | | | +-GetStructField + | | | +-type=INT32 + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, column=$array.array7#19) + | | | +-field_idx=0 + | | +-b#25 := + | | | +-GetStructField + | | | +-type=STRING + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, column=$array.array7#19) + | | | +-field_idx=1 + | | +-nested_int64#26 := + | | | +-GetProtoField + | | | +-type=INT64 + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=$array.nested_repeated_value#21) + | | | +-field_descriptor=nested_int64 + | | | +-default_value=88 + | | +-nested_repeated_int64#27 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, column=$array.nested_repeated_value#21) + | | | +-field_descriptor=nested_repeated_int64 + | | | +-default_value=[] + | | +-nested_repeated_int32#28 := + | | | +-GetProtoField + | | | +-type=ARRAY + | | | +-expr= + | | | | +-ColumnRef(type=PROTO, 
column=$array.nested_repeated_value#21) + | | | +-field_descriptor=nested_repeated_int32 + | | | +-default_value=[] + | | +-value#29 := + | | +-GetProtoField + | | +-type=ARRAY + | | +-expr= + | | | +-ColumnRef(type=PROTO, column=$array.nested_repeated_value#21) + | | +-field_descriptor=value + | | +-default_value=[] + | +-input_scan= + | +-ProjectScan + | +-column_list=$array.[array1#13, array2#14, array3#15, $unnest1#16, array5#17, array6#18, array7#19, repeated_float_val#20, nested_repeated_value#21, nested_repeated_int64#22] + | +-expr_list= + | | +-array1#13 := + | | | +-GetStructField + | | | +-type=INT64 + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#85) + | | | +-field_idx=0 + | | +-array2#14 := + | | | +-GetStructField + | | | +-type=STRUCT + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#85) + | | | +-field_idx=1 + | | +-array3#15 := + | | | +-GetStructField + | | | +-type=INT32 + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#85) + | | | +-field_idx=2 + | | +-$unnest1#16 := + | | | +-GetStructField + | | | +-type=STRING + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#85) + | | | +-field_idx=3 + | | +-array5#17 := + | | | +-GetStructField + | | | +-type=STRING + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#85) + | | | 
+-field_idx=4 + | | +-array6#18 := + | | | +-GetStructField + | | | +-type=INT32 + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#85) + | | | +-field_idx=5 + | | +-array7#19 := + | | | +-GetStructField + | | | +-type=STRUCT + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#85) + | | | +-field_idx=6 + | | +-repeated_float_val#20 := + | | | +-GetStructField + | | | +-type=FLOAT + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#85) + | | | +-field_idx=7 + | | +-nested_repeated_value#21 := + | | | +-GetStructField + | | | +-type=PROTO + | | | +-expr= + | | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#85) + | | | +-field_idx=8 + | | +-nested_repeated_int64#22 := + | | +-GetStructField + | | +-type=INT64 + | | +-expr= + | | | +-ColumnRef(type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>, column=$array.$with_expr_element#85) + | | +-field_idx=9 + | +-input_scan= + | +-ArrayScan + | +-column_list=[$array.$with_expr_element#85] + | +-array_expr_list= + | | +-WithExpr + | | +-type=ARRAY, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>> + | | +-assignment_list= + | | | +-arr0#32 := Literal(type=ARRAY, value=[1, 2]) + | | | +-arr0_len#33 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | 
+-ColumnRef(type=ARRAY, column=$with_expr.arr0#32) + | | | | +-Literal(type=INT64, value=0) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#32) + | | | +-arr1#34 := Literal(type=ARRAY>, value=[{x:1}, {x:2}]) + | | | +-arr1_len#35 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr1#34) + | | | | +-Literal(type=INT64, value=0) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY>) -> INT64) + | | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr1#34) + | | | +-arr2#36 := + | | | | +-GetProtoField + | | | | +-type=ARRAY + | | | | +-expr= + | | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3, is_correlated=TRUE) + | | | | +-field_descriptor=repeated_int32_val + | | | | +-default_value=[] + | | | +-arr2_len#37 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr2#36) + | | | | +-Literal(type=INT64, value=0) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr2#36) + | | | +-arr3#38 := Literal(type=ARRAY, value=["hello"]) + | | | +-arr3_len#39 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr3#38) + | | | | +-Literal(type=INT64, value=0) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr3#38) + | | | +-arr4#40 := + | | | | +-GetProtoField + | | | | +-type=ARRAY + | | | | +-expr= + | | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3, is_correlated=TRUE) + | | | | +-field_descriptor=repeated_string_val + | | | | +-default_value=[] + | | | 
+-arr4_len#41 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr4#40) + | | | | +-Literal(type=INT64, value=0) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr4#40) + | | | +-arr5#42 := ColumnRef(type=ARRAY, column=ComplexTypes.Int32Array#7, is_correlated=TRUE) + | | | +-arr5_len#43 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr5#42) + | | | | +-Literal(type=INT64, value=0) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr5#42) + | | | +-arr6#44 := ColumnRef(type=ARRAY>, column=MoreComplexTypes.ArrayOfStruct#11, is_correlated=TRUE) + | | | +-arr6_len#45 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr6#44) + | | | | +-Literal(type=INT64, value=0) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY>) -> INT64) + | | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr6#44) + | | | +-arr7#46 := + | | | | +-GetProtoField + | | | | +-type=ARRAY + | | | | +-expr= + | | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3, is_correlated=TRUE) + | | | | +-field_descriptor=repeated_float_val + | | | | +-default_value=[] + | | | +-arr7_len#47 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr7#46) + | | | | +-Literal(type=INT64, value=0) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr7#46) + | | | +-arr8#48 := + | | | | 
+-GetProtoField + | | | | +-type=ARRAY> + | | | | +-expr= + | | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3, is_correlated=TRUE) + | | | | +-field_descriptor=nested_repeated_value + | | | | +-default_value=[] + | | | +-arr8_len#49 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY>) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr8#48) + | | | | +-Literal(type=INT64, value=0) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY>) -> INT64) + | | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr8#48) + | | | +-arr9#50 := + | | | | +-GetProtoField + | | | | +-type=ARRAY + | | | | +-expr= + | | | | | +-GetProtoField + | | | | | +-type=PROTO + | | | | | +-expr= + | | | | | | +-ColumnRef(type=PROTO, column=TestTable.KitchenSink#3, is_correlated=TRUE) + | | | | | +-field_descriptor=nested_value + | | | | | +-default_value=NULL + | | | | +-field_descriptor=nested_repeated_int64 + | | | | +-default_value=[] + | | | +-arr9_len#51 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$is_null(ARRAY) -> BOOL) + | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr9#50) + | | | | +-Literal(type=INT64, value=0) + | | | | +-FunctionCall(ZetaSQL:array_length(ARRAY) -> INT64) + | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr9#50) + | | | +-mode#52 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, ENUM, ENUM) -> ENUM) + | | | | +-FunctionCall(ZetaSQL:$is_null(ENUM) -> BOOL) + | | | | | +-Literal(type=ENUM, value=TRUNCATE) + | | | | +-FunctionCall(ZetaSQL:error(STRING) -> ENUM) + | | | | | +-Literal(type=STRING, value="UNNEST does not allow NULL mode argument") + | | | | +-Literal(type=ENUM, value=TRUNCATE) + | | | +-strict_check#53 := + | | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | | +-FunctionCall(ZetaSQL:$and(repeated(2) BOOL) -> BOOL) + | | | | | +-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | 
| | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#52) + | | | | | | +-Literal(type=ENUM, value=STRICT) + | | | | | +-FunctionCall(ZetaSQL:$not_equal(INT64, INT64) -> BOOL) + | | | | | +-FunctionCall(ZetaSQL:least(repeated(10) INT64) -> INT64) + | | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#33) + | | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#35) + | | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr2_len#37) + | | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr3_len#39) + | | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr4_len#41) + | | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr5_len#43) + | | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr6_len#45) + | | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr7_len#47) + | | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr8_len#49) + | | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr9_len#51) + | | | | | +-FunctionCall(ZetaSQL:greatest(repeated(10) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#33) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#35) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr2_len#37) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr3_len#39) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr4_len#41) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr5_len#43) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr6_len#45) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr7_len#47) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr8_len#49) + | | | | | +-ColumnRef(type=INT64, column=$with_expr.arr9_len#51) + | | | | +-FunctionCall(ZetaSQL:error(STRING) -> INT64) + | | | | | +-Literal(type=STRING, value="Unnested arrays under STRICT mode must have equal lengths") + | | | | +-Literal(type=INT64, value=NULL) + | | | +-result_len#54 := + | | | +-FunctionCall(ZetaSQL:if(BOOL, INT64, INT64) -> INT64) + | | | 
+-FunctionCall(ZetaSQL:$equal(ENUM, ENUM) -> BOOL) + | | | | +-ColumnRef(type=ENUM, column=$with_expr.mode#52) + | | | | +-Literal(type=ENUM, value=TRUNCATE) + | | | +-FunctionCall(ZetaSQL:least(repeated(10) INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#33) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#35) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr2_len#37) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr3_len#39) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr4_len#41) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr5_len#43) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr6_len#45) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr7_len#47) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr8_len#49) + | | | | +-ColumnRef(type=INT64, column=$with_expr.arr9_len#51) + | | | +-FunctionCall(ZetaSQL:greatest(repeated(10) INT64) -> INT64) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr0_len#33) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr1_len#35) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr2_len#37) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr3_len#39) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr4_len#41) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr5_len#43) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr6_len#45) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr7_len#47) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr8_len#49) + | | | +-ColumnRef(type=INT64, column=$with_expr.arr9_len#51) + | | +-expr= + | | +-SubqueryExpr + | | +-type=ARRAY, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64>> + | | +-subquery_type=ARRAY + | | +-parameter_list= + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#32) + | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr1#34) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr2#36) + | | | 
+-ColumnRef(type=ARRAY, column=$with_expr.arr3#38) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr4#40) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr5#42) + | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr6#44) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr7#46) + | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr8#48) + | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr9#50) + | | | +-ColumnRef(type=INT64, column=$with_expr.result_len#54) + | | +-subquery= + | | +-ProjectScan + | | +-column_list=[$make_struct.$struct#84] + | | +-is_ordered=TRUE + | | +-expr_list= + | | | +-$struct#84 := + | | | +-MakeStruct + | | | +-type=STRUCT, arr2 INT32, arr3 STRING, arr4 STRING, arr5 INT32, arr6 STRUCT, arr7 FLOAT, arr8 PROTO, arr9 INT64, offset INT64> + | | | +-field_list= + | | | +-ColumnRef(type=INT64, column=$array.arr0#55) + | | | +-ColumnRef(type=STRUCT, column=$array.arr1#57) + | | | +-ColumnRef(type=INT32, column=$array.arr2#60) + | | | +-ColumnRef(type=STRING, column=$array.arr3#63) + | | | +-ColumnRef(type=STRING, column=$array.arr4#66) + | | | +-ColumnRef(type=INT32, column=$array.arr5#69) + | | | +-ColumnRef(type=STRUCT, column=$array.arr6#72) + | | | +-ColumnRef(type=FLOAT, column=$array.arr7#75) + | | | +-ColumnRef(type=PROTO, column=$array.arr8#78) + | | | +-ColumnRef(type=INT64, column=$array.arr9#81) + | | | +-ColumnRef(type=INT64, column=$full_join.offset#83) + | | +-input_scan= + | | +-OrderByScan + | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, $array_offset.offset#64, $full_join.offset#65, $array.arr4#66, $array_offset.offset#67, $full_join.offset#68, $array.arr5#69, $array_offset.offset#70, $full_join.offset#71, $array.arr6#72, $array_offset.offset#73, $full_join.offset#74, $array.arr7#75, $array_offset.offset#76, $full_join.offset#77, $array.arr8#78, 
$array_offset.offset#79, $full_join.offset#80, $array.arr9#81, $array_offset.offset#82, $full_join.offset#83] + | | +-is_ordered=TRUE + | | +-input_scan= + | | | +-FilterScan + | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, $array_offset.offset#64, $full_join.offset#65, $array.arr4#66, $array_offset.offset#67, $full_join.offset#68, $array.arr5#69, $array_offset.offset#70, $full_join.offset#71, $array.arr6#72, $array_offset.offset#73, $full_join.offset#74, $array.arr7#75, $array_offset.offset#76, $full_join.offset#77, $array.arr8#78, $array_offset.offset#79, $full_join.offset#80, $array.arr9#81, $array_offset.offset#82, $full_join.offset#83] + | | | +-input_scan= + | | | | +-ProjectScan + | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, $array_offset.offset#64, $full_join.offset#65, $array.arr4#66, $array_offset.offset#67, $full_join.offset#68, $array.arr5#69, $array_offset.offset#70, $full_join.offset#71, $array.arr6#72, $array_offset.offset#73, $full_join.offset#74, $array.arr7#75, $array_offset.offset#76, $full_join.offset#77, $array.arr8#78, $array_offset.offset#79, $full_join.offset#80, $array.arr9#81, $array_offset.offset#82, $full_join.offset#83] + | | | | +-expr_list= + | | | | | +-offset#83 := + | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#80) + | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#82) + | | | | +-input_scan= + | | | | +-JoinScan + | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, 
$array_offset.offset#64, $full_join.offset#65, $array.arr4#66, $array_offset.offset#67, $full_join.offset#68, $array.arr5#69, $array_offset.offset#70, $full_join.offset#71, $array.arr6#72, $array_offset.offset#73, $full_join.offset#74, $array.arr7#75, $array_offset.offset#76, $full_join.offset#77, $array.arr8#78, $array_offset.offset#79, $full_join.offset#80, $array.arr9#81, $array_offset.offset#82] + | | | | +-join_type=FULL + | | | | +-left_scan= + | | | | | +-ProjectScan + | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, $array_offset.offset#64, $full_join.offset#65, $array.arr4#66, $array_offset.offset#67, $full_join.offset#68, $array.arr5#69, $array_offset.offset#70, $full_join.offset#71, $array.arr6#72, $array_offset.offset#73, $full_join.offset#74, $array.arr7#75, $array_offset.offset#76, $full_join.offset#77, $array.arr8#78, $array_offset.offset#79, $full_join.offset#80] + | | | | | +-expr_list= + | | | | | | +-offset#80 := + | | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#77) + | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#79) + | | | | | +-input_scan= + | | | | | +-JoinScan + | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, $array_offset.offset#64, $full_join.offset#65, $array.arr4#66, $array_offset.offset#67, $full_join.offset#68, $array.arr5#69, $array_offset.offset#70, $full_join.offset#71, $array.arr6#72, $array_offset.offset#73, $full_join.offset#74, $array.arr7#75, $array_offset.offset#76, $full_join.offset#77, $array.arr8#78, $array_offset.offset#79] + | | | | | +-join_type=FULL + | | | | | +-left_scan= + | | | | | | +-ProjectScan + 
| | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, $array_offset.offset#64, $full_join.offset#65, $array.arr4#66, $array_offset.offset#67, $full_join.offset#68, $array.arr5#69, $array_offset.offset#70, $full_join.offset#71, $array.arr6#72, $array_offset.offset#73, $full_join.offset#74, $array.arr7#75, $array_offset.offset#76, $full_join.offset#77] + | | | | | | +-expr_list= + | | | | | | | +-offset#77 := + | | | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#74) + | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#76) + | | | | | | +-input_scan= + | | | | | | +-JoinScan + | | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, $array_offset.offset#64, $full_join.offset#65, $array.arr4#66, $array_offset.offset#67, $full_join.offset#68, $array.arr5#69, $array_offset.offset#70, $full_join.offset#71, $array.arr6#72, $array_offset.offset#73, $full_join.offset#74, $array.arr7#75, $array_offset.offset#76] + | | | | | | +-join_type=FULL + | | | | | | +-left_scan= + | | | | | | | +-ProjectScan + | | | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, $array_offset.offset#64, $full_join.offset#65, $array.arr4#66, $array_offset.offset#67, $full_join.offset#68, $array.arr5#69, $array_offset.offset#70, $full_join.offset#71, $array.arr6#72, $array_offset.offset#73, $full_join.offset#74] + | | | | | | | +-expr_list= + | | | | | | | | +-offset#74 := + | | | | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> 
INT64) + | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#71) + | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#73) + | | | | | | | +-input_scan= + | | | | | | | +-JoinScan + | | | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, $array_offset.offset#64, $full_join.offset#65, $array.arr4#66, $array_offset.offset#67, $full_join.offset#68, $array.arr5#69, $array_offset.offset#70, $full_join.offset#71, $array.arr6#72, $array_offset.offset#73] + | | | | | | | +-join_type=FULL + | | | | | | | +-left_scan= + | | | | | | | | +-ProjectScan + | | | | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, $array_offset.offset#64, $full_join.offset#65, $array.arr4#66, $array_offset.offset#67, $full_join.offset#68, $array.arr5#69, $array_offset.offset#70, $full_join.offset#71] + | | | | | | | | +-expr_list= + | | | | | | | | | +-offset#71 := + | | | | | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#68) + | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#70) + | | | | | | | | +-input_scan= + | | | | | | | | +-JoinScan + | | | | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, $array_offset.offset#64, $full_join.offset#65, $array.arr4#66, $array_offset.offset#67, $full_join.offset#68, $array.arr5#69, $array_offset.offset#70] + | | | | | | | | +-join_type=FULL + | | | | | | | | +-left_scan= + | | | | | | | | | +-ProjectScan + | | | | | | | | | +-column_list=[$array.arr0#55, 
$array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, $array_offset.offset#64, $full_join.offset#65, $array.arr4#66, $array_offset.offset#67, $full_join.offset#68] + | | | | | | | | | +-expr_list= + | | | | | | | | | | +-offset#68 := + | | | | | | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#65) + | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#67) + | | | | | | | | | +-input_scan= + | | | | | | | | | +-JoinScan + | | | | | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, $array_offset.offset#64, $full_join.offset#65, $array.arr4#66, $array_offset.offset#67] + | | | | | | | | | +-join_type=FULL + | | | | | | | | | +-left_scan= + | | | | | | | | | | +-ProjectScan + | | | | | | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, $array_offset.offset#64, $full_join.offset#65] + | | | | | | | | | | +-expr_list= + | | | | | | | | | | | +-offset#65 := + | | | | | | | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#62) + | | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#64) + | | | | | | | | | | +-input_scan= + | | | | | | | | | | +-JoinScan + | | | | | | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62, $array.arr3#63, $array_offset.offset#64] + | | | | | | | | | | +-join_type=FULL + | 
| | | | | | | | | +-left_scan= + | | | | | | | | | | | +-ProjectScan + | | | | | | | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61, $full_join.offset#62] + | | | | | | | | | | | +-expr_list= + | | | | | | | | | | | | +-offset#62 := + | | | | | | | | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#59) + | | | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#61) + | | | | | | | | | | | +-input_scan= + | | | | | | | | | | | +-JoinScan + | | | | | | | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59, $array.arr2#60, $array_offset.offset#61] + | | | | | | | | | | | +-join_type=FULL + | | | | | | | | | | | +-left_scan= + | | | | | | | | | | | | +-ProjectScan + | | | | | | | | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58, $full_join.offset#59] + | | | | | | | | | | | | +-expr_list= + | | | | | | | | | | | | | +-offset#59 := + | | | | | | | | | | | | | +-FunctionCall(ZetaSQL:coalesce(repeated(2) INT64) -> INT64) + | | | | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#56) + | | | | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#58) + | | | | | | | | | | | | +-input_scan= + | | | | | | | | | | | | +-JoinScan + | | | | | | | | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56, $array.arr1#57, $array_offset.offset#58] + | | | | | | | | | | | | +-join_type=FULL + | | | | | | | | | | | | +-left_scan= + | | | | | | | | | | | | | +-ArrayScan + | | | | | | | | | | | | | +-column_list=[$array.arr0#55, $array_offset.offset#56] + | | | | | | | | | | | | | +-array_expr_list= + | | | | | | | | | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr0#32, 
is_correlated=TRUE) + | | | | | | | | | | | | | +-element_column_list=[$array.arr0#55] + | | | | | | | | | | | | | +-array_offset_column= + | | | | | | | | | | | | | +-ColumnHolder(column=$array_offset.offset#56) + | | | | | | | | | | | | +-right_scan= + | | | | | | | | | | | | | +-ArrayScan + | | | | | | | | | | | | | +-column_list=[$array.arr1#57, $array_offset.offset#58] + | | | | | | | | | | | | | +-array_expr_list= + | | | | | | | | | | | | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr1#34, is_correlated=TRUE) + | | | | | | | | | | | | | +-element_column_list=[$array.arr1#57] + | | | | | | | | | | | | | +-array_offset_column= + | | | | | | | | | | | | | +-ColumnHolder(column=$array_offset.offset#58) + | | | | | | | | | | | | +-join_expr= + | | | | | | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#56) + | | | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#58) + | | | | | | | | | | | +-right_scan= + | | | | | | | | | | | | +-ArrayScan + | | | | | | | | | | | | +-column_list=[$array.arr2#60, $array_offset.offset#61] + | | | | | | | | | | | | +-array_expr_list= + | | | | | | | | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr2#36, is_correlated=TRUE) + | | | | | | | | | | | | +-element_column_list=[$array.arr2#60] + | | | | | | | | | | | | +-array_offset_column= + | | | | | | | | | | | | +-ColumnHolder(column=$array_offset.offset#61) + | | | | | | | | | | | +-join_expr= + | | | | | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#59) + | | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#61) + | | | | | | | | | | +-right_scan= + | | | | | | | | | | | +-ArrayScan + | | | | | | | | | | | +-column_list=[$array.arr3#63, $array_offset.offset#64] + | | | | | | | | | | | +-array_expr_list= + | | | | | | | | | | | | 
+-ColumnRef(type=ARRAY, column=$with_expr.arr3#38, is_correlated=TRUE) + | | | | | | | | | | | +-element_column_list=[$array.arr3#63] + | | | | | | | | | | | +-array_offset_column= + | | | | | | | | | | | +-ColumnHolder(column=$array_offset.offset#64) + | | | | | | | | | | +-join_expr= + | | | | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#62) + | | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#64) + | | | | | | | | | +-right_scan= + | | | | | | | | | | +-ArrayScan + | | | | | | | | | | +-column_list=[$array.arr4#66, $array_offset.offset#67] + | | | | | | | | | | +-array_expr_list= + | | | | | | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr4#40, is_correlated=TRUE) + | | | | | | | | | | +-element_column_list=[$array.arr4#66] + | | | | | | | | | | +-array_offset_column= + | | | | | | | | | | +-ColumnHolder(column=$array_offset.offset#67) + | | | | | | | | | +-join_expr= + | | | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#65) + | | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#67) + | | | | | | | | +-right_scan= + | | | | | | | | | +-ArrayScan + | | | | | | | | | +-column_list=[$array.arr5#69, $array_offset.offset#70] + | | | | | | | | | +-array_expr_list= + | | | | | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr5#42, is_correlated=TRUE) + | | | | | | | | | +-element_column_list=[$array.arr5#69] + | | | | | | | | | +-array_offset_column= + | | | | | | | | | +-ColumnHolder(column=$array_offset.offset#70) + | | | | | | | | +-join_expr= + | | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#68) + | | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#70) + | | | | | | | +-right_scan= + | | | | | | | | +-ArrayScan + | | | | | | | | 
+-column_list=[$array.arr6#72, $array_offset.offset#73] + | | | | | | | | +-array_expr_list= + | | | | | | | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr6#44, is_correlated=TRUE) + | | | | | | | | +-element_column_list=[$array.arr6#72] + | | | | | | | | +-array_offset_column= + | | | | | | | | +-ColumnHolder(column=$array_offset.offset#73) + | | | | | | | +-join_expr= + | | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#71) + | | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#73) + | | | | | | +-right_scan= + | | | | | | | +-ArrayScan + | | | | | | | +-column_list=[$array.arr7#75, $array_offset.offset#76] + | | | | | | | +-array_expr_list= + | | | | | | | | +-ColumnRef(type=ARRAY, column=$with_expr.arr7#46, is_correlated=TRUE) + | | | | | | | +-element_column_list=[$array.arr7#75] + | | | | | | | +-array_offset_column= + | | | | | | | +-ColumnHolder(column=$array_offset.offset#76) + | | | | | | +-join_expr= + | | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#74) + | | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#76) + | | | | | +-right_scan= + | | | | | | +-ArrayScan + | | | | | | +-column_list=[$array.arr8#78, $array_offset.offset#79] + | | | | | | +-array_expr_list= + | | | | | | | +-ColumnRef(type=ARRAY>, column=$with_expr.arr8#48, is_correlated=TRUE) + | | | | | | +-element_column_list=[$array.arr8#78] + | | | | | | +-array_offset_column= + | | | | | | +-ColumnHolder(column=$array_offset.offset#79) + | | | | | +-join_expr= + | | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | | +-ColumnRef(type=INT64, column=$full_join.offset#77) + | | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#79) + | | | | +-right_scan= + | | | | | +-ArrayScan + | | | | | +-column_list=[$array.arr9#81, $array_offset.offset#82] + | | | | | +-array_expr_list= + | | | 
| | | +-ColumnRef(type=ARRAY, column=$with_expr.arr9#50, is_correlated=TRUE) + | | | | | +-element_column_list=[$array.arr9#81] + | | | | | +-array_offset_column= + | | | | | +-ColumnHolder(column=$array_offset.offset#82) + | | | | +-join_expr= + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=$full_join.offset#80) + | | | | +-ColumnRef(type=INT64, column=$array_offset.offset#82) + | | | +-filter_expr= + | | | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=$full_join.offset#83) + | | | +-ColumnRef(type=INT64, column=$with_expr.result_len#54, is_correlated=TRUE) + | | +-order_by_item_list= + | | +-OrderByItem + | | +-column_ref= + | | +-ColumnRef(type=INT64, column=$full_join.offset#83) + | +-element_column_list=[$array.$with_expr_element#85] + +-input_scan= + +-JoinScan + +-column_list=[TestTable.KitchenSink#3, ComplexTypes.Int32Array#7, MoreComplexTypes.ArrayOfStruct#11] + +-left_scan= + | +-JoinScan + | +-column_list=[TestTable.KitchenSink#3, ComplexTypes.Int32Array#7] + | +-left_scan= + | | +-TableScan(column_list=[TestTable.KitchenSink#3], table=TestTable, column_index_list=[2]) + | +-right_scan= + | +-TableScan(column_list=[ComplexTypes.Int32Array#7], table=ComplexTypes, column_index_list=[3]) + +-right_scan= + +-TableScan(column_list=[MoreComplexTypes.ArrayOfStruct#11], table=MoreComplexTypes, column_index_list=[1]) +== + diff --git a/zetasql/analyzer/testdata/unnest_single_path.test b/zetasql/analyzer/testdata/unnest_single_path.test index 0499cbfe0..e883314e5 100644 --- a/zetasql/analyzer/testdata/unnest_single_path.test +++ b/zetasql/analyzer/testdata/unnest_single_path.test @@ -304,43 +304,43 @@ from ArrayTypes.ProtoArray -- QueryStmt +-output_column_list= -| +-$query.int32_val1#19 AS int32_val1 [INT32] -| +-$query.int32_val2#20 AS int32_val2 [INT32] -| +-$query.str_value#21 AS str_value [ARRAY] +| +-$query.int32_val1#22 AS int32_val1 [INT32] +| 
+-$query.int32_val2#23 AS int32_val2 [INT32] +| +-$query.str_value#24 AS str_value [ARRAY] +-query= +-ProjectScan - +-column_list=$query.[int32_val1#19, int32_val2#20, str_value#21] + +-column_list=$query.[int32_val1#22, int32_val2#23, str_value#24] +-expr_list= - | +-int32_val1#19 := + | +-int32_val1#22 := | | +-GetProtoField | | +-type=INT32 | | +-expr= - | | | +-ColumnRef(type=PROTO, column=$array.ProtoArray#18) + | | | +-ColumnRef(type=PROTO, column=$array.ProtoArray#21) | | +-field_descriptor=int32_val1 | | +-default_value=0 - | +-int32_val2#20 := + | +-int32_val2#23 := | | +-GetProtoField | | +-type=INT32 | | +-expr= - | | | +-ColumnRef(type=PROTO, column=$array.ProtoArray#18) + | | | +-ColumnRef(type=PROTO, column=$array.ProtoArray#21) | | +-field_descriptor=int32_val2 | | +-default_value=0 - | +-str_value#21 := + | +-str_value#24 := | +-GetProtoField | +-type=ARRAY | +-expr= - | | +-ColumnRef(type=PROTO, column=$array.ProtoArray#18) + | | +-ColumnRef(type=PROTO, column=$array.ProtoArray#21) | +-field_descriptor=str_value | +-default_value=[] +-input_scan= +-ArrayScan - +-column_list=[$array.ProtoArray#18] + +-column_list=[$array.ProtoArray#21] +-node_source="single_table_array_name_path" +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14]) +-array_expr_list= | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) - +-element_column_list=[$array.ProtoArray#18] + +-element_column_list=[$array.ProtoArray#21] == # Path expression pattern: table.array.json_field @@ -349,13 +349,13 @@ from ArrayTypes.JsonArray.json_field -- QueryStmt +-output_column_list= -| +-$array.json_field#18 AS json_field [JSON] +| +-$array.json_field#21 AS json_field [JSON] +-query= +-ProjectScan - +-column_list=[$array.json_field#18] + +-column_list=[$array.json_field#21] +-input_scan= +-ArrayScan - +-column_list=[$array.json_field#18] + +-column_list=[$array.json_field#21] +-node_source="single_table_array_name_path" 
+-input_scan= | +-TableScan(column_list=[ArrayTypes.JsonArray#17], table=ArrayTypes, column_index_list=[16]) @@ -370,7 +370,7 @@ QueryStmt | +-expr= | | +-FlattenedArg(type=JSON) | +-field_name="json_field" - +-element_column_list=[$array.json_field#18] + +-element_column_list=[$array.json_field#21] == # Path expression pattern: @@ -383,13 +383,13 @@ ALTERNATION GROUP: has_int32_val1 -- QueryStmt +-output_column_list= -| +-$array.has_int32_val1#18 AS has_int32_val1 [BOOL] +| +-$array.has_int32_val1#21 AS has_int32_val1 [BOOL] +-query= +-ProjectScan - +-column_list=[$array.has_int32_val1#18] + +-column_list=[$array.has_int32_val1#21] +-input_scan= +-ArrayScan - +-column_list=[$array.has_int32_val1#18] + +-column_list=[$array.has_int32_val1#21] +-node_source="single_table_array_name_path" +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14]) @@ -405,19 +405,19 @@ QueryStmt | | +-FlattenedArg(type=PROTO) | +-field_descriptor=int32_val1 | +-get_has_bit=TRUE - +-element_column_list=[$array.has_int32_val1#18] + +-element_column_list=[$array.has_int32_val1#21] -- ALTERNATION GROUP: int32_val1 -- QueryStmt +-output_column_list= -| +-$array.int32_val1#18 AS int32_val1 [INT32] +| +-$array.int32_val1#21 AS int32_val1 [INT32] +-query= +-ProjectScan - +-column_list=[$array.int32_val1#18] + +-column_list=[$array.int32_val1#21] +-input_scan= +-ArrayScan - +-column_list=[$array.int32_val1#18] + +-column_list=[$array.int32_val1#21] +-node_source="single_table_array_name_path" +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14]) @@ -433,7 +433,7 @@ QueryStmt | | +-FlattenedArg(type=PROTO) | +-field_descriptor=int32_val1 | +-default_value=0 - +-element_column_list=[$array.int32_val1#18] + +-element_column_list=[$array.int32_val1#21] == # Name scoping against array scan's output value table column that has an @@ -681,13 +681,13 @@ SELECT * FROM 
ArrayTypes.ProtoArray.int32_val1 AS Value PIVOT(COUNT(Value) FOR i -- QueryStmt +-output_column_list= -| +-$array.Value#18 AS Value [INT32] +| +-$array.Value#21 AS Value [INT32] +-query= +-ProjectScan - +-column_list=[$array.Value#18] + +-column_list=[$array.Value#21] +-input_scan= +-ArrayScan - +-column_list=[$array.Value#18] + +-column_list=[$array.Value#21] +-node_source="single_table_array_name_path" +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="ArrayTypes") @@ -703,39 +703,39 @@ QueryStmt | | +-FlattenedArg(type=PROTO) | +-field_descriptor=int32_val1 | +-default_value=0 - +-element_column_list=[$array.Value#18] + +-element_column_list=[$array.Value#21] [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$array.Value#18 AS Value [INT32] +| +-$array.Value#21 AS Value [INT32] +-query= +-ProjectScan - +-column_list=[$array.Value#18] + +-column_list=[$array.Value#21] +-input_scan= +-ProjectScan - +-column_list=[$array.Value#18] + +-column_list=[$array.Value#21] +-expr_list= - | +-Value#18 := ColumnRef(type=INT32, column=$flatten.injected#20) + | +-Value#21 := ColumnRef(type=INT32, column=$flatten.injected#23) +-input_scan= +-ProjectScan - +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19, $flatten.injected#20] + +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22, $flatten.injected#23] +-expr_list= - | +-injected#20 := + | +-injected#23 := | +-GetProtoField | +-type=INT32 | +-expr= - | | +-ColumnRef(type=PROTO, column=$flatten.injected#19) + | | +-ColumnRef(type=PROTO, column=$flatten.injected#22) | +-field_descriptor=int32_val1 | +-default_value=0 +-input_scan= +-ArrayScan - +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19] + +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14], alias="ArrayTypes") +-array_expr_list= | 
+-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) - +-element_column_list=[$flatten.injected#19] + +-element_column_list=[$flatten.injected#22] == [enabled_ast_rewrites=DEFAULTS] @@ -744,13 +744,13 @@ SELECT * FROM ArrayTypes.ProtoArray.int32_val1 UNPIVOT(a FOR b IN (int32_val1)); -- QueryStmt +-output_column_list= -| +-$array.int32_val1#18 AS int32_val1 [INT32] +| +-$array.int32_val1#21 AS int32_val1 [INT32] +-query= +-ProjectScan - +-column_list=[$array.int32_val1#18] + +-column_list=[$array.int32_val1#21] +-input_scan= +-ArrayScan - +-column_list=[$array.int32_val1#18] + +-column_list=[$array.int32_val1#21] +-node_source="single_table_array_name_path" +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14]) @@ -766,39 +766,39 @@ QueryStmt | | +-FlattenedArg(type=PROTO) | +-field_descriptor=int32_val1 | +-default_value=0 - +-element_column_list=[$array.int32_val1#18] + +-element_column_list=[$array.int32_val1#21] [REWRITTEN AST] QueryStmt +-output_column_list= -| +-$array.int32_val1#18 AS int32_val1 [INT32] +| +-$array.int32_val1#21 AS int32_val1 [INT32] +-query= +-ProjectScan - +-column_list=[$array.int32_val1#18] + +-column_list=[$array.int32_val1#21] +-input_scan= +-ProjectScan - +-column_list=[$array.int32_val1#18] + +-column_list=[$array.int32_val1#21] +-expr_list= - | +-int32_val1#18 := ColumnRef(type=INT32, column=$flatten.injected#20) + | +-int32_val1#21 := ColumnRef(type=INT32, column=$flatten.injected#23) +-input_scan= +-ProjectScan - +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19, $flatten.injected#20] + +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22, $flatten.injected#23] +-expr_list= - | +-injected#20 := + | +-injected#23 := | +-GetProtoField | +-type=INT32 | +-expr= - | | +-ColumnRef(type=PROTO, column=$flatten.injected#19) + | | +-ColumnRef(type=PROTO, column=$flatten.injected#22) | +-field_descriptor=int32_val1 | +-default_value=0 +-input_scan= +-ArrayScan - 
+-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#19] + +-column_list=[ArrayTypes.ProtoArray#15, $flatten.injected#22] +-input_scan= | +-TableScan(column_list=[ArrayTypes.ProtoArray#15], table=ArrayTypes, column_index_list=[14]) +-array_expr_list= | +-ColumnRef(type=ARRAY>, column=ArrayTypes.ProtoArray#15) - +-element_column_list=[$flatten.injected#19] + +-element_column_list=[$flatten.injected#22] == [language_features=V_1_4_SINGLE_TABLE_NAME_ARRAY_PATH,V_1_3_UNNEST_AND_FLATTEN_ARRAYS,TABLESAMPLE] diff --git a/zetasql/analyzer/testdata/value_tables.test b/zetasql/analyzer/testdata/value_tables.test index d392cf389..49591e3af 100644 --- a/zetasql/analyzer/testdata/value_tables.test +++ b/zetasql/analyzer/testdata/value_tables.test @@ -1284,10 +1284,11 @@ QueryStmt | +-input_scan= | +-SingleRowScan +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - +-Cast(INT32 -> INT64) - | +-ColumnRef(type=INT32, column=$join_left.a#4) - +-ColumnRef(type=INT64, column=$subquery1.a#3) + | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | +-Cast(INT32 -> INT64) + | | +-ColumnRef(type=INT32, column=$join_left.a#4) + | +-ColumnRef(type=INT64, column=$subquery1.a#3) + +-has_using=TRUE == # Output table from SELECT AS STRUCT is a value table. 
diff --git a/zetasql/analyzer/testdata/with.test b/zetasql/analyzer/testdata/with.test index f4f9fafc3..790e4ce33 100644 --- a/zetasql/analyzer/testdata/with.test +++ b/zetasql/analyzer/testdata/with.test @@ -213,9 +213,10 @@ QueryStmt +-right_scan= | +-WithRefScan(column_list=subq.[key#7, TestEnum#8, KitchenSink#9], with_query_name="subq") +-join_expr= - +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) - +-ColumnRef(type=INT32, column=subq.key#4) - +-ColumnRef(type=INT32, column=subq.key#7) + | +-FunctionCall(ZetaSQL:$equal(INT32, INT32) -> BOOL) + | +-ColumnRef(type=INT32, column=subq.key#4) + | +-ColumnRef(type=INT32, column=subq.key#7) + +-has_using=TRUE == # Error in a with subquery that is never referenced. diff --git a/zetasql/analyzer/testdata/with_recursive.test b/zetasql/analyzer/testdata/with_recursive.test index cece26020..08b295068 100644 --- a/zetasql/analyzer/testdata/with_recursive.test +++ b/zetasql/analyzer/testdata/with_recursive.test @@ -1815,9 +1815,10 @@ QueryStmt | | | +-right_scan= | | | | +-WithRefScan(column_list=[t2.n#8], with_query_name="t2") | | | +-join_expr= - | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | +-ColumnRef(type=INT64, column=t1.n#7) - | | | +-ColumnRef(type=INT64, column=t2.n#8) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=t1.n#7) + | | | | +-ColumnRef(type=INT64, column=t2.n#8) + | | | +-has_using=TRUE | | +-recursive=TRUE | +-output_column_list=[t1.n#7] +-query= @@ -1943,9 +1944,10 @@ QueryStmt | | | +-right_scan= | | | | +-WithRefScan(column_list=[t2.n#7], with_query_name="t2") | | | +-join_expr= - | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | | +-ColumnRef(type=INT64, column=t1.n#6) - | | | +-ColumnRef(type=INT64, column=t2.n#7) + | | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | | +-ColumnRef(type=INT64, column=t1.n#6) + | | | | +-ColumnRef(type=INT64, column=t2.n#7) + | | | +-has_using=TRUE | | 
+-recursive=TRUE | +-output_column_list=[t1.n#6] +-query= @@ -3471,9 +3473,10 @@ QueryStmt | | | +-column_ref= | | | +-ColumnRef(type=INT64, column=tbl.n#5) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=t.n#4) - | | +-ColumnRef(type=INT64, column=tbl.n#5) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=t.n#4) + | | | +-ColumnRef(type=INT64, column=tbl.n#5) + | | +-has_using=TRUE | +-output_column_list=[$subquery1.n#6] +-query= | +-ProjectScan @@ -3564,9 +3567,10 @@ QueryStmt | | +-right_scan= | | | +-RecursiveRefScan(column_list=[t.n#5]) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=tbl.n#4) - | | +-ColumnRef(type=INT64, column=t.n#5) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=tbl.n#4) + | | | +-ColumnRef(type=INT64, column=t.n#5) + | | +-has_using=TRUE | +-output_column_list=[$subquery1.n#6] +-query= | +-ProjectScan @@ -3633,9 +3637,10 @@ QueryStmt | | +-right_scan= | | | +-WithRefScan(column_list=[t_other.n#5], with_query_name="t_other") | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=t.n#4) - | | +-ColumnRef(type=INT64, column=t_other.n#5) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=t.n#4) + | | | +-ColumnRef(type=INT64, column=t_other.n#5) + | | +-has_using=TRUE | +-output_column_list=[t.n#4] +-query= | +-ProjectScan @@ -3694,9 +3699,10 @@ QueryStmt | | +-right_scan= | | | +-WithRefScan(column_list=[t_other.n#5], with_query_name="t_other") | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=t.n#4) - | | +-ColumnRef(type=INT64, column=t_other.n#5) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, 
column=t.n#4) + | | | +-ColumnRef(type=INT64, column=t_other.n#5) + | | +-has_using=TRUE | +-output_column_list=[t.n#4] +-query= | +-ProjectScan @@ -3769,9 +3775,10 @@ QueryStmt | | +-right_scan= | | | +-WithRefScan(column_list=[t_other.n#5], with_query_name="t_other") | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=t.n#4) - | | +-ColumnRef(type=INT64, column=t_other.n#5) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=t.n#4) + | | | +-ColumnRef(type=INT64, column=t_other.n#5) + | | +-has_using=TRUE | +-output_column_list=[t.n#4] +-query= | +-ProjectScan @@ -3833,9 +3840,10 @@ QueryStmt | | +-right_scan= | | | +-WithRefScan(column_list=[t_other.n#5], with_query_name="t_other") | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=t.n#4) - | | +-ColumnRef(type=INT64, column=t_other.n#5) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=t.n#4) + | | | +-ColumnRef(type=INT64, column=t_other.n#5) + | | +-has_using=TRUE | +-output_column_list=[t.n#4] +-query= | +-ProjectScan @@ -3914,9 +3922,10 @@ QueryStmt | | +-right_scan= | | | +-RecursiveRefScan(column_list=[t.n#5]) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=t_other.n#4) - | | +-ColumnRef(type=INT64, column=t.n#5) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=t_other.n#4) + | | | +-ColumnRef(type=INT64, column=t.n#5) + | | +-has_using=TRUE | +-output_column_list=[t_other.n#4] +-query= | +-ProjectScan @@ -3977,9 +3986,10 @@ QueryStmt | | | +-input_scan= | | | +-RecursiveRefScan(column_list=[t.n#5]) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=t_other.n#4) - | | +-ColumnRef(type=INT64, column=t.n#5) + | | | 
+-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=t_other.n#4) + | | | +-ColumnRef(type=INT64, column=t.n#5) + | | +-has_using=TRUE | +-output_column_list=[t_other.n#4] +-query= | +-ProjectScan @@ -4050,9 +4060,10 @@ QueryStmt | | +-right_scan= | | | +-RecursiveRefScan(column_list=[t.n#5]) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=t_other.n#4) - | | +-ColumnRef(type=INT64, column=t.n#5) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=t_other.n#4) + | | | +-ColumnRef(type=INT64, column=t.n#5) + | | +-has_using=TRUE | +-output_column_list=[t.n#5] +-query= | +-ProjectScan @@ -4114,9 +4125,10 @@ QueryStmt | | | +-input_scan= | | | +-RecursiveRefScan(column_list=[t.n#5]) | | +-join_expr= - | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) - | | +-ColumnRef(type=INT64, column=t_other.n#4) - | | +-ColumnRef(type=INT64, column=t.n#5) + | | | +-FunctionCall(ZetaSQL:$equal(INT64, INT64) -> BOOL) + | | | +-ColumnRef(type=INT64, column=t_other.n#4) + | | | +-ColumnRef(type=INT64, column=t.n#5) + | | +-has_using=TRUE | +-output_column_list=[t.n#5] +-query= | +-ProjectScan @@ -5579,3 +5591,446 @@ ERROR: CORRESPONDING for set operations cannot be used in WITH RECURSIVE [at 3:2 SELECT 1 AS n UNION ALL CORRESPONDING SELECT 2 AS n UNION ALL SELECT n + ... 
^ == + +[language_features=V_1_3_WITH_RECURSIVE{{|,V_1_4_WITH_RECURSIVE_DEPTH_MODIFIER}}] +WITH RECURSIVE Fibonacci AS ( + SELECT 0 AS x, 1 AS y + UNION ALL + SELECT y AS x, x + y AS y FROM Fibonacci +) WITH DEPTH +SELECT y FROM Fibonacci + +-- +ALTERNATION GROUP: +-- +ERROR: Recursion depth modifier is not supported [at 5:3] +) WITH DEPTH + ^ +-- +ALTERNATION GROUP: ,V_1_4_WITH_RECURSIVE_DEPTH_MODIFIER +-- +QueryStmt ++-output_column_list= +| +-Fibonacci.y#10 AS y [INT64] ++-query= + +-WithScan + +-column_list=[Fibonacci.y#10] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="Fibonacci" + | +-with_subquery= + | +-RecursiveScan + | +-column_list=[$union_all.x#3, $union_all.y#4, $recursion_depth.depth#8] + | +-op_type=UNION_ALL + | +-non_recursive_term= + | | +-SetOperationItem + | | +-scan= + | | | +-ProjectScan + | | | +-column_list=$union_all1.[x#1, y#2] + | | | +-expr_list= + | | | | +-x#1 := Literal(type=INT64, value=0) + | | | | +-y#2 := Literal(type=INT64, value=1) + | | | +-input_scan= + | | | +-SingleRowScan + | | +-output_column_list=$union_all1.[x#1, y#2] + | +-recursive_term= + | | +-SetOperationItem + | | +-scan= + | | | +-ProjectScan + | | | +-column_list=[Fibonacci.y#6, $union_all2.y#7] + | | | +-expr_list= + | | | | +-y#7 := + | | | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=Fibonacci.x#5) + | | | | +-ColumnRef(type=INT64, column=Fibonacci.y#6) + | | | +-input_scan= + | | | +-RecursiveRefScan(column_list=Fibonacci.[x#5, y#6]) + | | +-output_column_list=[Fibonacci.y#6, $union_all2.y#7] + | +-recursion_depth_modifier= + | +-RecursionDepthModifier + | +-recursion_depth_column= + | +-ColumnHolder(column=$recursion_depth.depth#8) + +-query= + | +-ProjectScan + | +-column_list=[Fibonacci.y#10] + | +-input_scan= + | +-WithRefScan(column_list=Fibonacci.[x#9, y#10, depth#11], with_query_name="Fibonacci") + +-recursive=TRUE +== + 
+[language_features=V_1_3_WITH_RECURSIVE,V_1_4_WITH_RECURSIVE_DEPTH_MODIFIER] +WITH RECURSIVE Fibonacci AS ( + SELECT 0 AS x, 1 AS y + UNION ALL + SELECT y AS x, x + y AS y FROM Fibonacci +) WITH DEPTH {{|AS depth}} +SELECT y FROM Fibonacci WHERE depth < 3 + +-- +QueryStmt ++-output_column_list= +| +-Fibonacci.y#10 AS y [INT64] ++-query= + +-WithScan + +-column_list=[Fibonacci.y#10] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="Fibonacci" + | +-with_subquery= + | +-RecursiveScan + | +-column_list=[$union_all.x#3, $union_all.y#4, $recursion_depth.depth#8] + | +-op_type=UNION_ALL + | +-non_recursive_term= + | | +-SetOperationItem + | | +-scan= + | | | +-ProjectScan + | | | +-column_list=$union_all1.[x#1, y#2] + | | | +-expr_list= + | | | | +-x#1 := Literal(type=INT64, value=0) + | | | | +-y#2 := Literal(type=INT64, value=1) + | | | +-input_scan= + | | | +-SingleRowScan + | | +-output_column_list=$union_all1.[x#1, y#2] + | +-recursive_term= + | | +-SetOperationItem + | | +-scan= + | | | +-ProjectScan + | | | +-column_list=[Fibonacci.y#6, $union_all2.y#7] + | | | +-expr_list= + | | | | +-y#7 := + | | | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=Fibonacci.x#5) + | | | | +-ColumnRef(type=INT64, column=Fibonacci.y#6) + | | | +-input_scan= + | | | +-RecursiveRefScan(column_list=Fibonacci.[x#5, y#6]) + | | +-output_column_list=[Fibonacci.y#6, $union_all2.y#7] + | +-recursion_depth_modifier= + | +-RecursionDepthModifier + | +-recursion_depth_column= + | +-ColumnHolder(column=$recursion_depth.depth#8) + +-query= + | +-ProjectScan + | +-column_list=[Fibonacci.y#10] + | +-input_scan= + | +-FilterScan + | +-column_list=Fibonacci.[x#9, y#10, depth#11] + | +-input_scan= + | | +-WithRefScan(column_list=Fibonacci.[x#9, y#10, depth#11], with_query_name="Fibonacci") + | +-filter_expr= + | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=Fibonacci.depth#11) + | +-Literal(type=INT64, 
value=3) + +-recursive=TRUE +== + +[language_features=V_1_3_WITH_RECURSIVE,V_1_4_WITH_RECURSIVE_DEPTH_MODIFIER] +WITH RECURSIVE Fibonacci AS ( + SELECT 0 AS x, 1 AS y + UNION ALL + SELECT y AS x, x + y AS y FROM Fibonacci WHERE depth < 3 +) WITH DEPTH AS depth +SELECT y FROM Fibonacci WHERE depth < 3 + +-- +ERROR: Unrecognized name: depth [at 4:52] + SELECT y AS x, x + y AS y FROM Fibonacci WHERE depth < 3 + ^ +== + +[language_features=V_1_3_WITH_RECURSIVE,V_1_4_WITH_RECURSIVE_DEPTH_MODIFIER] +WITH RECURSIVE Fibonacci AS ( + SELECT 0 AS x, 1 AS y + UNION ALL + SELECT y AS x, x + y AS y FROM Fibonacci +) WITH DEPTH AS iteration +SELECT y FROM Fibonacci WHERE iteration < 3 + +-- +QueryStmt ++-output_column_list= +| +-Fibonacci.y#10 AS y [INT64] ++-query= + +-WithScan + +-column_list=[Fibonacci.y#10] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="Fibonacci" + | +-with_subquery= + | +-RecursiveScan + | +-column_list=[$union_all.x#3, $union_all.y#4, $recursion_depth.iteration#8] + | +-op_type=UNION_ALL + | +-non_recursive_term= + | | +-SetOperationItem + | | +-scan= + | | | +-ProjectScan + | | | +-column_list=$union_all1.[x#1, y#2] + | | | +-expr_list= + | | | | +-x#1 := Literal(type=INT64, value=0) + | | | | +-y#2 := Literal(type=INT64, value=1) + | | | +-input_scan= + | | | +-SingleRowScan + | | +-output_column_list=$union_all1.[x#1, y#2] + | +-recursive_term= + | | +-SetOperationItem + | | +-scan= + | | | +-ProjectScan + | | | +-column_list=[Fibonacci.y#6, $union_all2.y#7] + | | | +-expr_list= + | | | | +-y#7 := + | | | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=Fibonacci.x#5) + | | | | +-ColumnRef(type=INT64, column=Fibonacci.y#6) + | | | +-input_scan= + | | | +-RecursiveRefScan(column_list=Fibonacci.[x#5, y#6]) + | | +-output_column_list=[Fibonacci.y#6, $union_all2.y#7] + | +-recursion_depth_modifier= + | +-RecursionDepthModifier + | +-recursion_depth_column= + | 
+-ColumnHolder(column=$recursion_depth.iteration#8) + +-query= + | +-ProjectScan + | +-column_list=[Fibonacci.y#10] + | +-input_scan= + | +-FilterScan + | +-column_list=Fibonacci.[x#9, y#10, iteration#11] + | +-input_scan= + | | +-WithRefScan(column_list=Fibonacci.[x#9, y#10, iteration#11], with_query_name="Fibonacci") + | +-filter_expr= + | +-FunctionCall(ZetaSQL:$less(INT64, INT64) -> BOOL) + | +-ColumnRef(type=INT64, column=Fibonacci.iteration#11) + | +-Literal(type=INT64, value=3) + +-recursive=TRUE +== + +[language_features=V_1_3_WITH_RECURSIVE,V_1_4_WITH_RECURSIVE_DEPTH_MODIFIER] +WITH RECURSIVE Fibonacci AS ( + SELECT 0 AS x, 1 AS y, 0 AS depth + UNION ALL + SELECT y AS x, x + y AS y, depth + 1 FROM Fibonacci +) WITH DEPTH +SELECT y FROM Fibonacci WHERE depth < 3 + +-- +ERROR: WITH DEPTH modifier depth column is named 'depth' which collides with one of the existing names. [at 5:3] +) WITH DEPTH + ^ +== + +[language_features=V_1_3_WITH_RECURSIVE,V_1_4_WITH_RECURSIVE_DEPTH_MODIFIER] +WITH RECURSIVE Fibonacci AS ( + SELECT 0 AS x, 1 AS y, 0 AS iteration + UNION ALL + SELECT y AS x, x + y AS y, iteration + 1 FROM Fibonacci +) WITH DEPTH AS iteration +SELECT y FROM Fibonacci WHERE iteration < 3 + +-- +ERROR: WITH DEPTH modifier depth column is named 'iteration' which collides with one of the existing names. [at 5:3] +) WITH DEPTH AS iteration + ^ +== + +[language_features=V_1_3_WITH_RECURSIVE,V_1_4_WITH_RECURSIVE_DEPTH_MODIFIER] +WITH RECURSIVE Fibonacci AS ( + SELECT AS STRUCT 0 AS x, 1 AS y + UNION ALL + SELECT AS STRUCT y AS x, x + y AS y FROM Fibonacci +) WITH DEPTH +SELECT y FROM Fibonacci + +-- +ERROR: WITH DEPTH modifier is not allowed when the recursive query produces a value table. 
[at 5:3] +) WITH DEPTH + ^ +== + +[language_features=V_1_3_WITH_RECURSIVE,V_1_4_WITH_RECURSIVE_DEPTH_MODIFIER] +WITH RECURSIVE Fibonacci AS ( + SELECT 0 AS x, 1 AS y + UNION ALL + SELECT y AS x, x + y AS y FROM Fibonacci +) WITH DEPTH {{|MAX 10|BETWEEN 1 AND @test_param_int64}} +SELECT y FROM Fibonacci +-- +ALTERNATION GROUP: +-- +QueryStmt ++-output_column_list= +| +-Fibonacci.y#10 AS y [INT64] ++-query= + +-WithScan + +-column_list=[Fibonacci.y#10] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="Fibonacci" + | +-with_subquery= + | +-RecursiveScan + | +-column_list=[$union_all.x#3, $union_all.y#4, $recursion_depth.depth#8] + | +-op_type=UNION_ALL + | +-non_recursive_term= + | | +-SetOperationItem + | | +-scan= + | | | +-ProjectScan + | | | +-column_list=$union_all1.[x#1, y#2] + | | | +-expr_list= + | | | | +-x#1 := Literal(type=INT64, value=0) + | | | | +-y#2 := Literal(type=INT64, value=1) + | | | +-input_scan= + | | | +-SingleRowScan + | | +-output_column_list=$union_all1.[x#1, y#2] + | +-recursive_term= + | | +-SetOperationItem + | | +-scan= + | | | +-ProjectScan + | | | +-column_list=[Fibonacci.y#6, $union_all2.y#7] + | | | +-expr_list= + | | | | +-y#7 := + | | | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=Fibonacci.x#5) + | | | | +-ColumnRef(type=INT64, column=Fibonacci.y#6) + | | | +-input_scan= + | | | +-RecursiveRefScan(column_list=Fibonacci.[x#5, y#6]) + | | +-output_column_list=[Fibonacci.y#6, $union_all2.y#7] + | +-recursion_depth_modifier= + | +-RecursionDepthModifier + | +-recursion_depth_column= + | +-ColumnHolder(column=$recursion_depth.depth#8) + +-query= + | +-ProjectScan + | +-column_list=[Fibonacci.y#10] + | +-input_scan= + | +-WithRefScan(column_list=Fibonacci.[x#9, y#10, depth#11], with_query_name="Fibonacci") + +-recursive=TRUE +-- +ALTERNATION GROUP: MAX 10 +-- +QueryStmt ++-output_column_list= +| +-Fibonacci.y#10 AS y [INT64] ++-query= + +-WithScan + 
+-column_list=[Fibonacci.y#10] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="Fibonacci" + | +-with_subquery= + | +-RecursiveScan + | +-column_list=[$union_all.x#3, $union_all.y#4, $recursion_depth.depth#8] + | +-op_type=UNION_ALL + | +-non_recursive_term= + | | +-SetOperationItem + | | +-scan= + | | | +-ProjectScan + | | | +-column_list=$union_all1.[x#1, y#2] + | | | +-expr_list= + | | | | +-x#1 := Literal(type=INT64, value=0) + | | | | +-y#2 := Literal(type=INT64, value=1) + | | | +-input_scan= + | | | +-SingleRowScan + | | +-output_column_list=$union_all1.[x#1, y#2] + | +-recursive_term= + | | +-SetOperationItem + | | +-scan= + | | | +-ProjectScan + | | | +-column_list=[Fibonacci.y#6, $union_all2.y#7] + | | | +-expr_list= + | | | | +-y#7 := + | | | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=Fibonacci.x#5) + | | | | +-ColumnRef(type=INT64, column=Fibonacci.y#6) + | | | +-input_scan= + | | | +-RecursiveRefScan(column_list=Fibonacci.[x#5, y#6]) + | | +-output_column_list=[Fibonacci.y#6, $union_all2.y#7] + | +-recursion_depth_modifier= + | +-RecursionDepthModifier + | +-upper_bound= + | | +-Literal(type=INT64, value=10) + | +-recursion_depth_column= + | +-ColumnHolder(column=$recursion_depth.depth#8) + +-query= + | +-ProjectScan + | +-column_list=[Fibonacci.y#10] + | +-input_scan= + | +-WithRefScan(column_list=Fibonacci.[x#9, y#10, depth#11], with_query_name="Fibonacci") + +-recursive=TRUE +-- +ALTERNATION GROUP: BETWEEN 1 AND @test_param_int64 +-- +QueryStmt ++-output_column_list= +| +-Fibonacci.y#10 AS y [INT64] ++-query= + +-WithScan + +-column_list=[Fibonacci.y#10] + +-with_entry_list= + | +-WithEntry + | +-with_query_name="Fibonacci" + | +-with_subquery= + | +-RecursiveScan + | +-column_list=[$union_all.x#3, $union_all.y#4, $recursion_depth.depth#8] + | +-op_type=UNION_ALL + | +-non_recursive_term= + | | +-SetOperationItem + | | +-scan= + | | | +-ProjectScan + | | | +-column_list=$union_all1.[x#1, 
y#2] + | | | +-expr_list= + | | | | +-x#1 := Literal(type=INT64, value=0) + | | | | +-y#2 := Literal(type=INT64, value=1) + | | | +-input_scan= + | | | +-SingleRowScan + | | +-output_column_list=$union_all1.[x#1, y#2] + | +-recursive_term= + | | +-SetOperationItem + | | +-scan= + | | | +-ProjectScan + | | | +-column_list=[Fibonacci.y#6, $union_all2.y#7] + | | | +-expr_list= + | | | | +-y#7 := + | | | | +-FunctionCall(ZetaSQL:$add(INT64, INT64) -> INT64) + | | | | +-ColumnRef(type=INT64, column=Fibonacci.x#5) + | | | | +-ColumnRef(type=INT64, column=Fibonacci.y#6) + | | | +-input_scan= + | | | +-RecursiveRefScan(column_list=Fibonacci.[x#5, y#6]) + | | +-output_column_list=[Fibonacci.y#6, $union_all2.y#7] + | +-recursion_depth_modifier= + | +-RecursionDepthModifier + | +-lower_bound= + | | +-Literal(type=INT64, value=1) + | +-upper_bound= + | | +-Parameter(type=INT64, name="test_param_int64") + | +-recursion_depth_column= + | +-ColumnHolder(column=$recursion_depth.depth#8) + +-query= + | +-ProjectScan + | +-column_list=[Fibonacci.y#10] + | +-input_scan= + | +-WithRefScan(column_list=Fibonacci.[x#9, y#10, depth#11], with_query_name="Fibonacci") + +-recursive=TRUE +== + +[language_features=V_1_3_WITH_RECURSIVE,V_1_4_WITH_RECURSIVE_DEPTH_MODIFIER] +WITH RECURSIVE + NonRecusiveQuery AS ( + SELECT 1 AS value + ) WITH DEPTH BETWEEN 1 AND 3 +SELECT * FROM NonRecusiveQuery +-- +ERROR: Recursion depth modifier is not allowed for non-recursive CTE named NonRecusiveQuery [at 4:5] + ) WITH DEPTH BETWEEN 1 AND 3 + ^ +== + +[language_features=V_1_3_WITH_RECURSIVE,V_1_4_WITH_RECURSIVE_DEPTH_MODIFIER] +WITH RECURSIVE Fibonacci AS ( + SELECT 0 AS x, 1 AS y + UNION ALL + SELECT y AS x, x + y AS y FROM Fibonacci +) WITH DEPTH BETWEEN 2 AND 1 +SELECT y FROM Fibonacci + + +-- +ERROR: WITH DEPTH expects lower bound (Int64(2)) no larger than upper bound (Int64(1)) [at 5:3] +) WITH DEPTH BETWEEN 2 AND 1 + ^ diff --git a/zetasql/base/BUILD b/zetasql/base/BUILD index 17d00b306..b510fece8 
100644 --- a/zetasql/base/BUILD +++ b/zetasql/base/BUILD @@ -12,7 +12,6 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. -# licenses(["notice"]) @@ -298,6 +297,70 @@ cc_test( ], ) +cc_library( + name = "map_view", + hdrs = [ + "map_view.h", + ], + deps = [ + ":associative_view_internal", + "@com_google_absl//absl/base:core_headers", + ], +) + +cc_test( + name = "map_view_test", + srcs = ["map_view_test.cc"], + deps = [ + ":map_view", + "//zetasql/base/testing:zetasql_gtest_main", + "@com_google_absl//absl/algorithm:container", + "@com_google_absl//absl/base:core_headers", + "@com_google_absl//absl/container:flat_hash_map", + "@com_google_absl//absl/random", + "@com_google_absl//absl/strings", + "@com_google_absl//absl/strings:cord", + ], +) + +cc_library( + name = "associative_view_internal", + hdrs = [ + "associative_view_internal.h", + ], + deps = [ + ":requires", + "@com_google_absl//absl/base:core_headers", + "@com_google_absl//absl/strings", + "@com_google_absl//absl/strings:cord", + ], +) + +cc_test( + name = "associative_view_internal_test", + srcs = ["associative_view_internal_test.cc"], + deps = [ + ":associative_view_internal", + "//zetasql/base/testing:zetasql_gtest_main", + ], +) + +cc_library( + name = "requires", + hdrs = [ + "requires.h", + ], +) + +cc_test( + name = "requires_test", + srcs = ["requires_test.cc"], + deps = [ + ":requires", + "//zetasql/base/testing:zetasql_gtest_main", + ], +) + cc_library( name = "stl_util", hdrs = [ @@ -386,7 +449,6 @@ cc_test( ":mathlimits", ":mathutil", "//zetasql/base/testing:zetasql_gtest_main", - "@com_google_absl//absl/base:core_headers", ], ) @@ -495,7 +557,6 @@ cc_library( srcs = ["time_proto_util.cc"], hdrs = ["time_proto_util.h"], deps = [ - "@com_google_absl//absl/base:core_headers", "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", 
"@com_google_absl//absl/strings", @@ -511,7 +572,6 @@ cc_test( ":time_proto_util", "//zetasql/base/testing:status_matchers", "//zetasql/base/testing:zetasql_gtest_main", - "@com_google_absl//absl/base:core_headers", "@com_google_absl//absl/time", "@com_google_protobuf//:protobuf", ], @@ -522,8 +582,8 @@ cc_library( srcs = ["string_numbers.cc"], hdrs = ["string_numbers.h"], deps = [ + ":check", ":logging", - "@com_google_absl//absl/base:core_headers", "@com_google_absl//absl/strings", ], ) @@ -703,7 +763,6 @@ cc_test( ":exactfloat", "//zetasql/base/testing:zetasql_gtest_main", "@com_google_absl//absl/base", - "@com_google_absl//absl/base:core_headers", ], ) @@ -755,3 +814,38 @@ cc_test( "//zetasql/base/testing:zetasql_gtest_main", ], ) + +cc_library( + name = "castops", + hdrs = ["castops.h"], +) + +cc_test( + name = "castops_test", + srcs = [ + "castops_test.cc", + ], + deps = [ + ":castops", + "//zetasql/base/testing:zetasql_gtest_main", + "@com_google_absl//absl/numeric:int128", + ], +) + +cc_library( + name = "lossless_convert", + hdrs = ["lossless_convert.h"], + deps = [ + ":castops", + ], +) + +cc_test( + name = "lossless_convert_test", + size = "small", + srcs = ["lossless_convert_test.cc"], + deps = [ + ":lossless_convert", + "//zetasql/base/testing:zetasql_gtest_main", + ], +) diff --git a/zetasql/base/arena_allocator.h b/zetasql/base/arena_allocator.h index 4ffe71169..c3b84b1dc 100644 --- a/zetasql/base/arena_allocator.h +++ b/zetasql/base/arena_allocator.h @@ -109,7 +109,7 @@ template class ArenaAllocator { ArenaAllocator(C* arena) : arena_(arena) { } // NOLINT pointer allocate(size_type n, - std::allocator::const_pointer /*hint*/ = nullptr) { + const void* /*hint*/ = nullptr) { assert(arena_ && "No arena to allocate from!"); return reinterpret_cast(arena_->AllocAligned(n * sizeof(T), kAlignment)); diff --git a/zetasql/base/associative_view_internal.h b/zetasql/base/associative_view_internal.h new file mode 100644 index 000000000..d92151b1d --- /dev/null 
+++ b/zetasql/base/associative_view_internal.h @@ -0,0 +1,720 @@ +// +// Copyright 2024 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#ifndef THIRD_PARTY_ZETASQL_ZETASQL_BASE_ASSOCIATIVE_VIEW_INTERNAL_H_ +#define THIRD_PARTY_ZETASQL_ZETASQL_BASE_ASSOCIATIVE_VIEW_INTERNAL_H_ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "absl/base/optimization.h" +#include "absl/strings/cord.h" +#include "absl/strings/string_view.h" +#include "zetasql/base/requires.h" + +namespace zetasql_base { + +// An empty type that is used to instruct `AssociateView` to add `find` and +// `contains` overloads for all `T`s. +template +struct AlsoSupportsLookupWith {}; + +namespace internal_associative_view { + +template +struct DefaultAdditionalLookupForKeyHelper { + using type = AlsoSupportsLookupWith<>; +}; + +template <> +struct DefaultAdditionalLookupForKeyHelper { + using type = AlsoSupportsLookupWith; +}; + +template <> +struct DefaultAdditionalLookupForKeyHelper { + using type = AlsoSupportsLookupWith; +}; + +template +using DefaultExtraLookupForKey = + typename internal_associative_view::DefaultAdditionalLookupForKeyHelper< + Key>::type; + +// +// kHasSubscript checks if operator[] exists for a given map type. +// + +template +inline constexpr bool kHasSubscript = zetasql_base::Requires( + [](auto&& c, auto&& k) -> decltype(c[k]) {}); + +// +// kHasAt checks if at() exists for a given map type. 
+// + +template +inline constexpr bool kHasAt = zetasql_base::Requires( + [](auto&& c, auto&& k) -> decltype(c.at(k)) {}); + +struct NoneSuch {}; + +template +inline constexpr bool kHasHeterogeneousLookup = zetasql_base::Requires( + [](auto&& c) -> decltype(c.find(NoneSuch{})) {}); + +// We assume that the presence of "at()" or "operator[]" is a strong signal that +// a given map type has unique keys. Most modern map types provide at(), but +// there are a few older map implementations that only provide operator[]. This +// is a function instead of a trait class because the static_assert failure is +// easier to read. +template +constexpr bool MapTypeHasUniqueKeys() { + return kHasSubscript || kHasAt; +} + +template +inline constexpr bool kHasContainsMethod = + zetasql_base::Requires( + [](auto&& c, auto&& k) -> decltype(c.contains(k)) {}); + +// Returns true if no elements in the list have the same keys. +template +constexpr bool HasUniqueKeys(const std::initializer_list& values) { + for (auto i = values.begin(); i < values.end(); ++i) { + for (auto j = i + 1; j < values.end(); ++j) { + if (i->first == j->first) { + return false; + } + } + } + return true; +} + +// `FindImpl` is an implementation of CRTP pattern to transform template methods +// `FindInternal` and `ContainsInternal` into a set of non-template overloads of +// `find` and `contains` for fixed number of supported lookup types. +template +class FindImpl { + public: + auto find(const Key& key) const { + return static_cast(this)->FindInternal(key); + } + bool contains(const Key& key) const { + return static_cast(this)->ContainsInternal(key); + } +}; + +template +class FindImpl { + // Note: without `const char*` overloads, lookup calls with raw/literal + // strings would be ambiguous. 
+ public: + auto find(absl::string_view key) const { + return static_cast(this)->FindInternal(key); + } + bool contains(absl::string_view key) const { + return static_cast(this)->ContainsInternal(key); + } + auto find(const char* key) const { return find(absl::string_view(key)); } + bool contains(const char* key) const { + return contains(absl::string_view(key)); + } +}; + +template +class LookupOverloads : public FindImpl>... { + public: + using FindImpl>::find...; + using FindImpl>::contains...; +}; + +template +const mapped_type& AtFunction(const View& v, const Key& key) { + auto it = v.find(key); + if (ABSL_PREDICT_FALSE(it == v.end())) { + absl::base_internal::ThrowStdOutOfRange("MapView::at failed bounds check"); + } + return it->second; +} + +template +class AtImpl { + public: + const mapped_type& at(const Key& key) const { + return AtFunction(*static_cast(this), key); + } +}; +template +class AtImpl { + // Note: without `const char*` overloads, lookup calls with raw/literal + // strings would be ambiguous. + public: + const mapped_type& at(absl::string_view key) const { + return AtFunction(*static_cast(this), key); + } + const mapped_type& at(const char* key) const { + return at(absl::string_view(key)); + } +}; + +template +class AtOverloads + : public AtImpl, mapped_type>... { + public: + using AtImpl, mapped_type>::at...; +}; + +// Provides a minimal stl::set like interface over a value using a key for +// lookup. Shared implementation for SetView and MapView. +// `ExtraLookupTypes` parameter is used to provide additional `find`/`contains` +// overloads. It must be one of the `AlsoSupportsLookupWith` types. 
Consider an +// example: +// +// using WithStringViewLookup = +// zetasql_base::AlsoSupportsLookupWith; +// using StringSetView = zetasql_base::SetView; +// +// The type `StringSetView`: +// * supports lookup with std::string, as well as with absl::string_view; +// * can only be constructed from a set that has `find` overloads for both +// std::string and absl::string_view. +// +// Note that the latter requirement is automatically met for containers with +// heterogeneous lookup. +template +class AssociateView { + // Preventing accidental typos when `ExtraLookupTypes` is not one of + // `AlsoSupportsLookupWith`, for example, `SetView`. + static_assert(!std::is_same_v, + "Invalid template argument ExtraLookupTypes. It must be an " + "instantiation of zetasql_base::AlsoSupportsLookupWith"); +}; + +template +class MapViewBase { + // Preventing accidental typos when `ExtraLookupTypes` is not one of + // `AlsoSupportsLookupWith`, for example, `SetView`. + static_assert(!std::is_same_v, + "Invalid template argument ExtraLookupTypes. It must be an " + "instantiation of zetasql_base::AlsoSupportsLookupWith"); +}; + +template +class MapViewBase> + : public AssociateView, + AlsoSupportsLookupWith>, + public AtOverloads>, V, + K, Keys...> { + using Base = typename MapViewBase::AssociateView; + + public: + using mapped_type = V; + + protected: + using Base::Base; +}; + +struct DefaultIteratorAdapter { + template + static Iterator&& Wrap(Iterator&& it) { + return std::forward(it); + } +}; + +// Note: key types may have type modifiers (i.e. zetasql_base::SetView), and we remove them when interacting with the internal virtual table. +// Consider: +// zetasql_base::SetView my_view = ...; +// bool x = my_view.contains(123); +// The line above calls `ContainsInternal` rather than +// `ContainsInternal`, hence it is easier to keep entries in the +// virtual table without type modifiers. 
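As the comments above describe, `AssociateView` dispatches through a hand-rolled table of function pointers instantiated per wrapped container type, rather than through virtual functions: the view holds only a `const void*` to the container and a pointer to that static table. A stripped-down sketch of the same pattern, reduced to a single `size()` operation (a hypothetical `SizeView`, not part of the library):

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Minimal type-erased "size view": one const void* to the container plus a
// pointer to a per-container-type static table of function pointers.
class SizeView {
 public:
  template <typename C>
  SizeView(const C& c) : c_(&c), table_(&kTable<C>) {}

  std::size_t size() const { return table_->size_fn(c_); }
  bool empty() const { return size() == 0; }

 private:
  struct DispatchTable {
    std::size_t (*size_fn)(const void*);
  };

  // Recovers the static container type before calling size().
  template <typename C>
  static std::size_t SizeImpl(const void* p) {
    return static_cast<const C*>(p)->size();
  }

  // One table per wrapped container type; the view itself stays non-template.
  template <typename C>
  static constexpr DispatchTable kTable = {&SizeImpl<C>};

  const void* c_;
  const DispatchTable* table_;
};
```

The real class extends this table with `find`/`contains` entries per supported lookup type, but the mechanism is the same: no vtable pointer in the wrapped container, and no heap allocation to construct a view.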
+template +class AssociateView> + : public LookupOverloads< + AssociateView>, + KeyType, Keys...> { + public: + using size_type = std::size_t; + using key_type = KeyType; + using value_type = ValueType; + using reference = const value_type&; + using const_reference = reference; + using pointer = const value_type*; + using const_pointer = pointer; + class iterator; + using const_iterator = iterator; + + iterator begin() const { return dispatch_table_->begin_fn(c_); } + iterator end() const { return {}; } + size_type size() const { return dispatch_table_->size_fn(c_); } + bool empty() const { return size() == 0; } + + protected: + template + using ViewEnabler = + std::enable_if_t && + std::is_same_v>; + + // A default constructed AssociateView behaves as if it is wrapping an empty + // container. + constexpr AssociateView() {} + + // Constructs an AssociateView that wraps the given container. The resulting + // view must not outlive the container. + // + // We distinguish two types of containers: with and without heterogeneous + // support. + // First type: + // These containers are required to support lookup with `key_type` + // and all types from `AlsoSupportsLookupWith`. Failure to meet this + // criteria will result in a compilation error. + // Lookup will never make a copy of the key. + // Second type: + // These containers are required to support lookup with only `key_type` + // and `key_type` must be constructible from every typ in + // `AlsoSupportsLookupWith`. + // Lookup with any type except for `key_type` makes a copy of the key. + template + constexpr explicit AssociateView(const C& c ABSL_ATTRIBUTE_LIFETIME_BOUND, + IteratorAdapter = {}) + : c_(&c), + dispatch_table_( + &kTable, + HeterogeneousContainerDispatcher, + NonHeterogeneousContainerDispatcher>>) { + static_assert(std::is_empty_v, + "Only stateless iterator adapters are supported."); + } + // Constructs an AssociateView that wraps an initializer_list. 
`Compare` must + // be a stateless function that compares `Compare{}(key_type, value_type)`. + template + constexpr AssociateView(const std::initializer_list& init, + const Compare&) + : c_(&init), + dispatch_table_( + &kTable, + Compare>>) { + static_assert(std::is_empty_v, + "Only stateless comparators are supported."); + } + + private: + // A function that implements 'find'. First argument is a pointer to a + // container, second argument is the key being searched for. + template + using FindFn = iterator (*)(const void*, const K&); + + // A function that implements 'contains'. First argument is a pointer to a + // container, second argument is the key being searched for. + template + using ContainsFn = bool (*)(const void*, const K&); + + // A function that implements 'begin'. The argument is a pointer to a + // container. + using BeginFn = iterator (*)(const void*); + + // A function that implements 'size'. The argument is a pointer to a + // container. + using SizeFn = size_type (*)(const void*); + + template + struct VirtualTableForKey { + FindFn find_fn; + ContainsFn contains_fn; + }; + + struct BaseVirtualTable { + BeginFn begin_fn; + SizeFn size_fn; + }; + + struct DispatchTable : BaseVirtualTable, + VirtualTableForKey>, + VirtualTableForKey>... 
{}; + + template + inline constexpr static DispatchTable kTable = { + BaseVirtualTable{ + .begin_fn = Impl::begin, + .size_fn = Impl::size, + }, + VirtualTableForKey>{ + .find_fn = Impl::template find>, + .contains_fn = Impl::template contains>, + }, + VirtualTableForKey>{ + .find_fn = Impl::template find>, + .contains_fn = Impl::template contains>, + }..., + }; + + struct EmptyDispatcher { + template + static iterator find(const void*, const K&) { + return {}; + } + template + static bool contains(const void*, const K&) { + return false; + } + static iterator begin(const void*) { return {}; } + static size_type size(const void*) { return 0; } + }; + + template + struct BaseDispatcher { + static iterator begin(const void* c_ptr) { + const auto* c = static_cast(c_ptr); + auto it = c->begin(); + if (it == c->end()) return {}; + return iterator(*c, std::move(it), IteratorAdapter{}); + } + + static size_type size(const void* c_ptr) { + return static_cast(c_ptr)->size(); + } + }; + + template + struct HeterogeneousContainerDispatcher : BaseDispatcher { + template + static iterator find(const void* c_ptr, const K& k) { + const auto* c = static_cast(c_ptr); + auto it = c->find(k); + if (it == c->end()) return {}; + return iterator(*c, std::move(it), IteratorAdapter{}); + } + + template + static bool contains(const void* c_ptr, const K& k) { + const auto* c = static_cast(c_ptr); + if constexpr (kHasContainsMethod) { + return c->contains(k); + } else { + return c->find(k) != c->end(); + } + } + }; + + template + struct NonHeterogeneousContainerDispatcher + : BaseDispatcher { + static iterator find(const void* c_ptr, const key_type& k) { + const auto* c = static_cast(c_ptr); + auto it = c->find(k); + if (it == c->end()) return {}; + return iterator(*c, std::move(it), IteratorAdapter{}); + } + + static bool contains(const void* c_ptr, const key_type& k) { + const auto* c = static_cast(c_ptr); + if constexpr (kHasContainsMethod) { + return c->contains(k); + } else { + return 
c->find(k) != c->end(); + } + } + + template + static iterator find(const void* c_ptr, const K& k) { + return find(c_ptr, static_cast(k)); + } + + template + static bool contains(const void* c_ptr, const K& k) { + return contains(c_ptr, static_cast(k)); + } + }; + + template + struct InitListDispatcher : BaseDispatcher { + template + static iterator find(const void* c_ptr, const K& k) { + const auto* init = static_cast(c_ptr); + auto it = + std::find_if(init->begin(), init->end(), + [&](const value_type& v) { return Compare{}(v, k); }); + if (it == init->end()) return {}; + return iterator(*init, std::move(it), DefaultIteratorAdapter{}); + } + + template + static bool contains(const void* c_ptr, const K& k) { + const auto* init = static_cast(c_ptr); + return std::find_if(init->begin(), init->end(), [&](const value_type& v) { + return Compare{}(v, k); + }) != init->end(); + } + }; + + template + friend class FindImpl; + + template + iterator FindInternal(const K& key) const { + return dispatch_table_->VirtualTableForKey::find_fn(c_, key); + } + + template + bool ContainsInternal(const K& key) const { + // Not using `FindInternal` to avoid constructing type-erased iterators. + return dispatch_table_->VirtualTableForKey::contains_fn(c_, key); + } + + // The underlying container. + const void* c_ = nullptr; + + // Implements find(), begin() and size() for the correct type of c_. This is + // always non-null and callable, even if c_ is null. + const DispatchTable* dispatch_table_ = &kTable; +}; + +template +class AssociateView>::iterator { + public: + using iterator_category = std::forward_iterator_tag; + using value_type = AssociateView::value_type; + using difference_type = std::ptrdiff_t; + using reference = const value_type&; + using pointer = const value_type*; + + // Default constructed instance is considered equal to end(). 
+ iterator() {} + + iterator(const iterator& other) { CopyFrom(other); } + + ~iterator() { Destroy(); } + + iterator& operator=(const iterator& other) { + Destroy(); + CopyFrom(other); + return *this; + } + + reference operator*() const { + assert(value_ != nullptr); + return *value_; + } + + pointer operator->() const { + assert(value_ != nullptr); + return value_; + } + + iterator& operator++() { + assert(value_ != nullptr); + value_ = inc_(c_, &iter_); + return *this; + } + + iterator operator++(int) { + iterator tmp = *this; + operator++(); + return tmp; + } + + friend bool operator==(const iterator& a, const iterator& b) { + return a.value_ == b.value_; + } + + friend bool operator!=(const iterator& a, const iterator& b) { + return a.value_ != b.value_; + } + + private: + friend class AssociateView; + + enum { kInlineIterSize = sizeof(void*) * 4 }; + + // Used to store the type-erased iterator. + union IterStorage { + // Used for iterators that are smaller than kInlineIterSize. + typename std::aligned_storage::type small_value; + + // Used for allocated storage for iterators that are larger than + // kInlineIterSize. + void* ptr; + }; + + // True if `Iterator` can be stored in `IterStorage::small_value`. + template + static constexpr bool IsInline() { + return (sizeof(Iterator) <= kInlineIterSize && + alignof(Iterator) <= alignof(IterStorage)); + } + + // Casts the given storage to the actual type. + template + static const Iterator* IterCast(const IterStorage* storage) { + if constexpr (IsInline()) { + return reinterpret_cast(&storage->small_value); + } else { + return static_cast(storage->ptr); + } + } + + // Non-const overload of the above function. + template + static Iterator* IterCast(IterStorage* storage) { + const IterStorage* cstorage = storage; + return const_cast(IterCast(cstorage)); + } + + // Copies an iterator from the first argument to the uninitialized second + // argument. 
+ using CopyFn = void (*)(const IterStorage*, IterStorage*); + + // Destroys the iterator. + using DestroyFn = void (*)(IterStorage*); + + // An implementation of CopyFn. + template + static void CopyImpl(const IterStorage* src_storage, + IterStorage* dst_storage) { + auto* src = IterCast(src_storage); + if constexpr (IsInline()) { + ::new (static_cast(&dst_storage->small_value)) Iterator(*src); + } else { + dst_storage->ptr = new Iterator(*src); + } + } + + // An implementation of DestroyFn. + template + static void DestroyImpl(IterStorage* storage) { + auto* it = IterCast(storage); + if constexpr (IsInline()) { + it->~Iterator(); + } else { + delete it; + } + } + + // Contains the functions for copying and destroying the underlying iterator. + // These are grouped together because many iterators do not require them, in + // which case copy_and_destroy_ will be null. + struct CopyAndDestroy { + CopyFn copy; + DestroyFn destroy; + }; + + // Returns a statically allocated CopyAndDestroy if `Iterator` requires + // it. Returns null if the underlying iterator is small and trivial. + template + static const CopyAndDestroy* MakeCopyAndDestroy() { + if constexpr (IsInline() && + std::is_trivially_destructible() && + std::is_trivially_copy_constructible()) { + return nullptr; + } else { + static constexpr CopyAndDestroy kImplementation = { + .copy = CopyImpl, .destroy = DestroyImpl}; + return &kImplementation; + } + } + + // Increments the underlying iterator. The first argument should be the + // container that the iterator points to. + using IncFn = pointer (*)(const void*, IterStorage*); + + template + static pointer IncImpl(const void* c_ptr, IterStorage* it_storage) { + auto& it = *IterCast(it_storage); + const auto& c = *static_cast(c_ptr); + assert(it != c.end()); + if (++it == c.end()) { + DestroyImpl(it_storage); + return nullptr; + } + return std::addressof(*IteratorAdapter::Wrap(it)); + } + + // Constructs an iterator that points to 'it' in the given container. 
+ // 'it' must be != c.end(). + template > + iterator(const C& c, It&& it, IteratorAdapter) + : value_(std::addressof(*IteratorAdapter::Wrap(it))), + c_(&c), + inc_(&IncImpl), + copy_and_destroy_(MakeCopyAndDestroy()) { + assert(it != c.end()); + if constexpr (IsInline()) { + ::new (static_cast(&iter_.small_value)) + Iterator(std::forward(it)); + } else { + iter_.ptr = new Iterator(std::forward(it)); + } + } + + // Destroys the underlying iterator if it is large or non-trivial. Does not + // guard against being called twice. + void Destroy() { + if (ABSL_PREDICT_FALSE(copy_and_destroy_ != nullptr && value_ != nullptr)) { + copy_and_destroy_->destroy(&iter_); + } + } + + // Initializes this iterator as a copy of 'other'. Does not perform + // destruction, so Destroy() must be called first if this iterator has already + // been initialized. + void CopyFrom(const iterator& other) { + value_ = other.value_; + c_ = other.c_; + inc_ = other.inc_; + copy_and_destroy_ = other.copy_and_destroy_; + if (value_ != nullptr) { + if (ABSL_PREDICT_FALSE(copy_and_destroy_ != nullptr)) { + copy_and_destroy_->copy(&other.iter_, &iter_); + } else { + iter_ = other.iter_; + } + } + } + + // The value that the iterator points to, or nullptr if the iterator == + // c_->end(). + pointer value_ = nullptr; + + // The underlying container. Used to detect when the iterator has reached the + // end. + const void* c_ = nullptr; + + // Increments the iterator. Always non-null for valid iterator. + IncFn inc_ = nullptr; + + // Functions for copying and destroying the iterator. Null unless the iterator + // is large or non-trivial. + const CopyAndDestroy* copy_and_destroy_ = nullptr; + + // Storage for the underlying iterator. For end() this stores nothing. 
+ IterStorage iter_; +}; + +} // namespace internal_associative_view +} // namespace zetasql_base + +#endif // THIRD_PARTY_ZETASQL_ZETASQL_BASE_ASSOCIATIVE_VIEW_INTERNAL_H_ diff --git a/zetasql/base/associative_view_internal_test.cc b/zetasql/base/associative_view_internal_test.cc new file mode 100644 index 000000000..b951d9d1f --- /dev/null +++ b/zetasql/base/associative_view_internal_test.cc @@ -0,0 +1,37 @@ +// +// Copyright 2024 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#include "zetasql/base/associative_view_internal.h" + +#include +#include +#include + +#include "gtest/gtest.h" + +namespace zetasql_base { +namespace internal_associative_view { +namespace { + +TEST(AssociateViewTest, MapTypeHasUniqueKeys) { + EXPECT_TRUE((MapTypeHasUniqueKeys>())); + EXPECT_FALSE((MapTypeHasUniqueKeys>())); + EXPECT_FALSE((MapTypeHasUniqueKeys>())); +} + +} // namespace +} // namespace internal_associative_view +} // namespace zetasql_base diff --git a/zetasql/base/castops.h b/zetasql/base/castops.h new file mode 100644 index 000000000..48c09a3bf --- /dev/null +++ b/zetasql/base/castops.h @@ -0,0 +1,467 @@ +// +// Copyright 2024 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
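The type-erased `iterator` defined above in `associative_view_internal.h` keeps the wrapped container iterator in a small inline buffer (`kInlineIterSize`) when it fits and is trivially copyable, falling back to heap allocation plus copy/destroy hooks otherwise. The same small-buffer idea in isolation (a toy `SmallBox`, not the real `IterStorage`):

```cpp
#include <cstddef>
#include <new>
#include <type_traits>

// Toy small-buffer box: payloads that fit in the inline buffer are
// placement-new'ed there; larger payloads go to the heap, and a type-erased
// deleter is kept only for the heap case. Restricted to trivially copyable
// payloads so the sketch needs no copy bookkeeping.
class SmallBox {
 public:
  static constexpr std::size_t kInlineSize = sizeof(void*) * 4;

  template <typename T>
  explicit SmallBox(const T& value) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "sketch only supports trivially copyable payloads");
    if constexpr (sizeof(T) <= kInlineSize) {
      ::new (static_cast<void*>(buf_)) T(value);
    } else {
      ptr_ = new T(value);
      deleter_ = [](void* p) { delete static_cast<T*>(p); };
    }
  }

  SmallBox(const SmallBox&) = delete;
  SmallBox& operator=(const SmallBox&) = delete;
  ~SmallBox() {
    if (deleter_ != nullptr) deleter_(ptr_);
  }

  // True when the payload lives in the inline buffer.
  bool is_inline() const { return deleter_ == nullptr; }

  template <typename T>
  const T& get() const {
    return is_inline() ? *reinterpret_cast<const T*>(buf_)
                       : *static_cast<const T*>(ptr_);
  }

 private:
  alignas(std::max_align_t) unsigned char buf_[kInlineSize];
  void* ptr_ = nullptr;
  void (*deleter_)(void*) = nullptr;
};

// Example payload larger than the inline buffer.
struct Big {
  char bytes[64];
};
```

The header's version refines this further: for small *and* trivially copyable iterators it stores no hooks at all (`copy_and_destroy_` stays null), so copying an end-or-inline iterator is a plain memcpy of the storage union.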
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+// X86 compatible cast library is provided to help programmers to clean up
+// float-cast-overflow failures. Before you use one of these calls, please
+// make sure the overflowed float value is expected, and not due to a bug
+// somewhere else.
+
+#ifndef THIRD_PARTY_ZETASQL_ZETASQL_BASE_CASTOPS_H_
+#define THIRD_PARTY_ZETASQL_ZETASQL_BASE_CASTOPS_H_
+
+#include <cmath>
+#include <cstdint>
+#include <limits>
+
+namespace zetasql_base {
+namespace castops {
+
+// Generic saturating cast function, for casting floating point type to integral
+// type. If the truncated form of value is larger than the max value of the
+// result type (including positive infinity), return the max value. If the
+// truncated form of value is smaller than the min value of the result type
+// (including negative infinity), return the min value. If the value is NaN,
+// return 0. If the truncated form of value is in the
+// representable range of the result type, return the rounded result.
+// The purpose of this function is to provide a defined and dependable cast
+// method whether the input value is in or out of the representable
+// range of the result type.
+template <typename ResultType, typename FloatType>
+ResultType SaturatingFloatToInt(FloatType value);
+
+// Return true if the truncated form of value is in the representable range of
+// the result type: MIN-1 < value < MAX+1.
+template <typename ResultType, typename FloatType>
+bool InRange(FloatType value);
+
+// Return true if the value is in the representable range of the result type:
+// MIN <= value <= MAX.
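The contract documented for `SaturatingFloatToInt` (clip to the integral type's min/max, map NaN to 0, otherwise truncate) can be sketched directly. This is an illustrative re-implementation of the documented behavior under a hypothetical name, not the library's actual definition; note it truncates the in-range case, per the "truncated form" wording:

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Sketch of the documented saturating-cast contract: NaN -> 0, values beyond
// the target range clip to the range bounds, everything else truncates.
template <typename ResultType, typename FloatType>
ResultType SaturatingCastSketch(FloatType value) {
  if (std::isnan(value)) return 0;
  // Use max()+1 == 2^digits as an exclusive upper bound: a power of two is
  // exactly representable even when max() itself is not (e.g. int64 vs double).
  const FloatType upper = std::ldexp(
      static_cast<FloatType>(1), std::numeric_limits<ResultType>::digits);
  if (value >= upper) return std::numeric_limits<ResultType>::max();
  // min() is 0 or -2^digits, both exactly representable in FloatType.
  if (value < static_cast<FloatType>(std::numeric_limits<ResultType>::min()))
    return std::numeric_limits<ResultType>::min();
  return static_cast<ResultType>(std::trunc(value));
}
```

The exclusive `MAX+1` bound is the same trick the header's comments describe for `InRange` ("MIN-1 < value < MAX+1"): comparing against a promoted `max()` would silently round up for wide integer types.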
+template <typename ResultType, typename FloatType>
+bool InRangeNoTruncate(FloatType value);
+
+// Casts a double to a float, clipping values that are outside the legal range
+// of float to +/- infinity. NaN is passed through.
+float DoubleToFloat(double value);
+
+// Same as above, but clips to a finite value, e.g. FLT_MAX rather than
+// +infinity. Note that even infinite input values are clipped to finite.
+float DoubleToFiniteFloat(double value);
+
+// Casts a long double to a double, clipping values that are outside the legal
+// range of double to +/- infinity.
+double LongDoubleToDouble(long double value);
+
+// Casts a long double to a double, but clips to a finite value for both +/-
+// infinity, e.g. DBL_MAX rather than +infinity.
+double LongDoubleToFiniteDouble(long double value);
+
+}  // namespace castops
+
+namespace x86compatible {
+
+// Emulating X86-64's behavior of casting long double, double or float to
+// int32_t. Compared to SaturatingFloatToInt, when the truncated form of value
+// is out of the representable range of int32_t or is NaN, X86-64 always
+// returns INT32_MIN.
+template <typename FloatType>
+int32_t ToInt32(FloatType value);
+
+// Emulating X86-64's behavior of casting long double, double or float to
+// int64_t. Compared to SaturatingFloatToInt, when the truncated form of value
+// is out of the representable range of int64_t or is NaN, X86-64 always
+// returns INT64_MIN.
+template <typename FloatType>
+int64_t ToInt64(FloatType value);
+
+// Emulating X86-64's behavior of casting long double, double or float to
+// uint32_t. Compared to SaturatingFloatToInt, X86-64 does
+// things differently when the truncated form of value is out of the
+// representable range of uint32_t. Basically, X86-64 outputs
+// (uint32_t)(int64_t)value.
+template <typename FloatType>
+uint32_t ToUint32(FloatType value);
+
+// Emulating X86-64's behavior of casting long double, double or float to
+// uint64_t.
Comparing to SaturatingFloatToInt, X86-64 does
+// things differently when the truncated form of value is out of the
+// representable range of uint64_t.
+// Basically, X86-64 does:
+//   if (value > UINT64_MAX) {
+//     return 0;
+//   } else if (value >= 0 && value <= UINT64_MAX) {
+//     return (uint64_t)value;
+//   } else if (value >= INT64_MIN) {
+//     return (uint64_t)(int64_t)value;
+//   } else {  // value is NaN or value < INT64_MIN
+//     return (uint64_t)INT64_MIN;
+//   }
+// Interestingly, when value is NaN, LLVM and GCC output differently. We
+// emulate what GCC does.
+template <typename FloatType>
+uint64_t ToUint64(FloatType value);
+
+// Emulating X86-64's behavior of casting long double, double or float into
+// int16_t. Comparing to SaturatingFloatToInt, X86-64 does
+// things differently when the truncated form of value is out of the
+// representable range of int16_t.
+// Basically, X86-64 does:
+//   (int16_t)(int32_t)value.
+template <typename FloatType>
+int16_t ToInt16(FloatType value);
+
+// Emulating X86-64's behavior of casting long double, double or float into
+// uint16_t. Comparing to SaturatingFloatToInt, X86-64 does
+// things differently when the truncated form of value is out of the
+// representable range of uint16_t.
+// Basically, X86-64 does:
+//   (uint16_t)(int32_t)value.
+template <typename FloatType>
+uint16_t ToUint16(FloatType value);
+
+// Emulating X86-64's behavior of casting long double, double or float into
+// signed char. Comparing to SaturatingFloatToInt,
+// X86-64 does things differently when the truncated form of value is out of
+// the representable range of signed char. Basically, X86-64 does:
+//   (signed char)(int32_t)value.
+template <typename FloatType>
+signed char ToSchar(FloatType value);
+
+// Emulating X86-64's behavior of casting long double, double or float into
+// unsigned char. Comparing to SaturatingFloatToInt,
+// X86-64 does things differently when the truncated form of value is out of
+// the representable range of unsigned char. Basically, X86-64 does:
+//   (unsigned char)(int32_t)value.
+template <typename FloatType>
+unsigned char ToUchar(FloatType value);
+
+}  // namespace x86compatible
+
+//////////////////////////////////////////////////////////////////
+// Implementation details follow; clients should ignore.
+
+namespace castops {
+namespace internal {
+
+template <typename T>
+constexpr bool kTypeIsIntegral = std::numeric_limits<T>::is_specialized &&
+                                 std::numeric_limits<T>::is_integer;
+
+template <typename T>
+constexpr bool kTypeIsFloating = std::numeric_limits<T>::is_specialized &&
+                                 !std::numeric_limits<T>::is_integer;
+
+// Return true if `numeric_limits<IntType>::max() + 1` can be represented
+// precisely in FloatType.
+template <typename FloatType, typename IntType>
+constexpr bool CanRepresentMaxPlusOnePrecisely() {
+  static_assert(kTypeIsFloating<FloatType>);
+  static_assert(kTypeIsIntegral<IntType>);
+  static_assert(std::numeric_limits<FloatType>::radix == 2);
+  // An N-bit integer has a max of 2^N - 1, so max() + 1 is 2^N.
+  constexpr bool sufficient_range_for_max =
+      (std::numeric_limits<FloatType>::max_exponent - 1) >=
+      std::numeric_limits<IntType>::digits;
+  return sufficient_range_for_max;
+}
+
+// Return true if the truncated form of value is smaller than or equal to the
+// MAX value of IntType. When the MAX value of IntType can not be represented
+// precisely in FloatType, the comparison is tricky, because the MAX value of
+// IntType is promoted to a FloatType value that is actually greater than what
+// IntType can handle. Also note that when value is NaN, this function will
+// return false.
+template <typename FloatType, typename IntType, bool Truncate>
+bool SmallerThanOrEqualToIntMax(FloatType value) {
+  static_assert(kTypeIsFloating<FloatType>);
+  static_assert(kTypeIsIntegral<IntType>);
+  static_assert(std::numeric_limits<FloatType>::radix == 2);
+
+  if constexpr (!Truncate) {
+    // We are checking if the untruncated result is <= 2^N-1. This is
+    // equivalent to asking if the rounded up value is < 2^N.
+    value = std::ceil(value);
+  }
+  if constexpr (CanRepresentMaxPlusOnePrecisely<FloatType, IntType>()) {
+    // We construct our exclusive upper bound carefully as we cannot construct
+    // the integer `1 << N` in the obvious way as it would result in undefined
+    // behavior (shifting past the width of the type).
+    // N.B. We don't use std::ldexp because it does not constant fold.
+    auto int_max_plus_one =
+        static_cast<FloatType>(static_cast<IntType>(1)
+                               << (std::numeric_limits<IntType>::digits - 1)) *
+        static_cast<FloatType>(2);
+    return value < int_max_plus_one;
+  } else {
+    if (value <= 0) {
+      return true;
+    }
+    if (!std::isfinite(value)) {
+      return false;
+    }
+    // Set exp such that value == f * 2^exp for some f in [0.5, 1.0).
+    // Note that this implies that the magnitude of value is strictly less than
+    // 2^exp.
+    int exp = 0;
+    std::frexp(value, &exp);
+
+    // Let N be the number of non-sign bits in the representation of IntType.
+    // If the magnitude of value is strictly less than 2^N, the truncated
+    // version of value is representable as IntType.
+    return exp <= std::numeric_limits<IntType>::digits;
+  }
+}
+
+// Return true if the truncated form of value is greater than or equal to the
+// MIN value of IntType. When the MIN value of IntType can not be represented
+// precisely in FloatType, the comparison is tricky, because the MIN value of
+// IntType is promoted to a FloatType value that is actually smaller than what
+// IntType can handle. Also note that when value is NaN, this function will
+// return false.
+template <typename FloatType, typename IntType, bool Truncate>
+bool GreaterThanOrEqualToIntMin(FloatType value) {
+  static_assert(kTypeIsFloating<FloatType>);
+  static_assert(kTypeIsIntegral<IntType>);
+  static_assert(std::numeric_limits<FloatType>::radix == 2);
+  if constexpr (!std::numeric_limits<IntType>::is_signed) {
+    if constexpr (Truncate) {
+      // We are checking if the truncated result is >= 0. This is equivalent to
+      // asking if value is larger than -1, as negative numbers in (-1, 0) will
+      // be truncated to 0.
+      return value > static_cast<FloatType>(-1.0);
+    } else {
+      // We are checking if the untruncated result is >= 0. Our value must be
+      // non-negative.
+      return value >= static_cast<FloatType>(0.0);
+    }
+  }
+  if constexpr (Truncate) {
+    // We are checking if the truncated result is >= -(2^N). Truncate `value`
+    // before performing the comparison.
+    value = std::trunc(value);
+  }
+  if constexpr (CanRepresentMaxPlusOnePrecisely<FloatType, IntType>()) {
+    auto int_min = static_cast<FloatType>(std::numeric_limits<IntType>::min());
+    return value >= int_min;
+  } else {
+    if (!std::isfinite(value)) {
+      return false;
+    }
+    if (value >= static_cast<FloatType>(0.0)) {
+      return true;
+    }
+    // Set exp such that value == f * 2^exp for some f in (-1.0, -0.5].
+    // Note that this implies that the magnitude of value is strictly less than
+    // 2^exp.
+    int exp = 0;
+    FloatType f = std::frexp(value, &exp);
+
+    // Let N be the number of non-sign bits in the representation of IntType.
+    // If the magnitude of value is less than or equal to 2^N, the value is
+    // representable as IntType.
+    return exp < std::numeric_limits<IntType>::digits + 1 ||
+           (exp == std::numeric_limits<IntType>::digits + 1 && f == -0.5);
+  }
+}
+
+template <typename FloatType, typename ResultType, bool Truncate>
+bool InRange(FloatType value) {
+  static_assert(kTypeIsFloating<FloatType>);
+  static_assert(kTypeIsIntegral<ResultType>);
+  static_assert(sizeof(ResultType) <= 16);
+  static_assert(sizeof(FloatType) >= 4);
+  return SmallerThanOrEqualToIntMax<FloatType, ResultType, Truncate>(value) &&
+         GreaterThanOrEqualToIntMin<FloatType, ResultType, Truncate>(value);
+}
+
+}  // namespace internal
+
+// Return true if the truncated form of value is in the representable range of
+// the result type, i.e. MIN-1 < value < MAX+1.
+template <typename FloatType, typename ResultType>
+bool InRange(FloatType value) {
+  return castops::internal::InRange<FloatType, ResultType,
+                                    /*Truncate=*/true>(value);
+}
+
+// Return true if the value is in the representable range of the result type:
+// MIN <= value <= MAX.
+template <typename FloatType, typename ResultType>
+bool InRangeNoTruncate(FloatType value) {
+  return castops::internal::InRange<FloatType, ResultType,
+                                    /*Truncate=*/false>(value);
+}
+
+template <typename FloatType, typename ResultType>
+ResultType SaturatingFloatToInt(FloatType value) {
+  static_assert(internal::kTypeIsFloating<FloatType>);
+  static_assert(internal::kTypeIsIntegral<ResultType>);
+  static_assert(sizeof(ResultType) <= 16);
+  static_assert(sizeof(FloatType) >= 4);
+  if (std::isnan(value)) {
+    // If value is NaN.
+    return 0;
+  } else if (!castops::internal::SmallerThanOrEqualToIntMax<
+                 FloatType, ResultType, /*Truncate=*/true>(value)) {
+    // If value > MAX
+    return std::numeric_limits<ResultType>::max();
+  } else if (!castops::internal::GreaterThanOrEqualToIntMin<
+                 FloatType, ResultType, /*Truncate=*/true>(value)) {
+    // If value < MIN
+    return std::numeric_limits<ResultType>::min();
+  } else {
+    // Value is in the representable range of the result type.
+    return static_cast<ResultType>(value);
+  }
+}
+
+inline float DoubleToFloat(double value) {
+  // If value is NaN, both clauses will evaluate to false.
+  if (value < std::numeric_limits<float>::lowest())
+    return -std::numeric_limits<float>::infinity();
+  if (value > std::numeric_limits<float>::max())
+    return std::numeric_limits<float>::infinity();
+
+  return static_cast<float>(value);
+}
+
+inline float DoubleToFiniteFloat(double value) {
+  // If value is NaN, both clauses will evaluate to false.
+  if (value < std::numeric_limits<float>::lowest())
+    return std::numeric_limits<float>::lowest();
+  if (value > std::numeric_limits<float>::max())
+    return std::numeric_limits<float>::max();
+
+  return static_cast<float>(value);
+}
+
+inline double LongDoubleToDouble(long double value) {
+  // If value is NaN, both clauses will evaluate to false.
+  if (value < std::numeric_limits<double>::lowest())
+    return -std::numeric_limits<double>::infinity();
+  if (value > std::numeric_limits<double>::max())
+    return std::numeric_limits<double>::infinity();
+
+  return static_cast<double>(value);
+}
+
+inline double LongDoubleToFiniteDouble(long double value) {
+  // If value is NaN, both clauses will evaluate to false.
+  if (value < std::numeric_limits<double>::lowest())
+    return std::numeric_limits<double>::lowest();
+  if (value > std::numeric_limits<double>::max())
+    return std::numeric_limits<double>::max();
+
+  return static_cast<double>(value);
+}
+}  // namespace castops
+
+namespace x86compatible {
+
+template <typename FloatType>
+int32_t ToInt32(FloatType value) {
+  static_assert(castops::internal::kTypeIsFloating<FloatType>);
+  if (castops::internal::GreaterThanOrEqualToIntMin<
+          FloatType, int32_t, /*Truncate=*/true>(value) &&
+      castops::internal::SmallerThanOrEqualToIntMax<
+          FloatType, int32_t, /*Truncate=*/true>(value)) {
+    return static_cast<int32_t>(value);
+  } else {
+    // For out-of-bound value, including NaN, x86_64 returns INT32_MIN.
+    return std::numeric_limits<int32_t>::min();
+  }
+}
+
+template <typename FloatType>
+int64_t ToInt64(FloatType value) {
+  static_assert(castops::internal::kTypeIsFloating<FloatType>);
+  if (castops::internal::GreaterThanOrEqualToIntMin<
+          FloatType, int64_t, /*Truncate=*/true>(value) &&
+      castops::internal::SmallerThanOrEqualToIntMax<
+          FloatType, int64_t, /*Truncate=*/true>(value)) {
+    return static_cast<int64_t>(value);
+  } else {
+    return std::numeric_limits<int64_t>::min();
+  }
+}
+
+template <typename FloatType>
+uint32_t ToUint32(FloatType value) {
+  static_assert(castops::internal::kTypeIsFloating<FloatType>);
+  return static_cast<uint32_t>(ToInt64(value));
+}
+
+template <typename FloatType>
+uint64_t ToUint64(FloatType value) {
+  static_assert(castops::internal::kTypeIsFloating<FloatType>);
+  if (value >= static_cast<FloatType>(0.0) &&
+      castops::internal::SmallerThanOrEqualToIntMax<
+          FloatType, uint64_t, /*Truncate=*/true>(value)) {
+    return static_cast<uint64_t>(value);
+  } else if (value < static_cast<FloatType>(0.0) &&
+             castops::internal::GreaterThanOrEqualToIntMin<
+                 FloatType, int64_t, /*Truncate=*/true>(value)) {
+    return static_cast<uint64_t>(static_cast<int64_t>(value));
+  } else if (value > static_cast<FloatType>(0.0)) {
+    return 0;
+  } else {
+    // If value < INT64_MIN or value is NaN. This is tricky when value is NaN,
+    // because LLVM and GCC output differently. We mimic what GCC does.
+    return static_cast<uint64_t>(std::numeric_limits<int64_t>::min());
+  }
+}
+
+namespace internal {
+
+// Emulating X86-64's behavior of casting long double, double or float into
+// small integral types whose size is smaller than 4 bytes. Comparing to
+// SaturatingFloatToInt, X86-64 does things differently
+// when value is out of the representable range of SmallInt.
+// Basically, X86-64 does:
+//   (SmallInt)(int32_t)value.
+template <typename FloatType, typename SmallInt>
+SmallInt ToSmallIntegral(FloatType value) {
+  static_assert(castops::internal::kTypeIsFloating<FloatType>);
+  static_assert(castops::internal::kTypeIsIntegral<SmallInt>);
+  static_assert(sizeof(SmallInt) < 4);
+  return static_cast<SmallInt>(ToInt32(value));
+}
+
+}  // namespace internal
+
+template <typename FloatType>
+int16_t ToInt16(FloatType value) {
+  return internal::ToSmallIntegral<FloatType, int16_t>(value);
+}
+
+template <typename FloatType>
+uint16_t ToUint16(FloatType value) {
+  return internal::ToSmallIntegral<FloatType, uint16_t>(value);
+}
+
+template <typename FloatType>
+signed char ToSchar(FloatType value) {
+  return internal::ToSmallIntegral<FloatType, signed char>(value);
+}
+
+template <typename FloatType>
+unsigned char ToUchar(FloatType value) {
+  return internal::ToSmallIntegral<FloatType, unsigned char>(value);
+}
+
+}  // namespace x86compatible
+}  // namespace zetasql_base
+
+#endif  // THIRD_PARTY_ZETASQL_ZETASQL_BASE_CASTOPS_H_
diff --git a/zetasql/base/castops_test.cc b/zetasql/base/castops_test.cc
new file mode 100644
index 000000000..b98c14475
--- /dev/null
+++ b/zetasql/base/castops_test.cc
@@ -0,0 +1,764 @@
+//
+// Copyright 2024 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+#include "zetasql/base/castops.h"
+
+#include
+#include
+
+#include "gtest/gtest.h"
+#include "absl/numeric/int128.h"
+
+static const float FLOAT_INT32_MIN = static_cast<float>(INT32_MIN);
+static const float FLOAT_INT32_MAX = static_cast<float>(INT32_MAX);
+static const float FLOAT_INT64_MIN = static_cast<float>(INT64_MIN);
+static const float FLOAT_INT64_MAX = static_cast<float>(INT64_MAX);
+static const float FLOAT_UINT32_MAX = static_cast<float>(UINT32_MAX);
+static const float FLOAT_UINT64_MAX = static_cast<float>(UINT64_MAX);
+static const double DOUBLE_INT32_MIN = static_cast<double>(INT32_MIN);
+static const double DOUBLE_INT32_MAX = static_cast<double>(INT32_MAX);
+static const double DOUBLE_INT64_MIN = static_cast<double>(INT64_MIN);
+static const double DOUBLE_INT64_MAX = static_cast<double>(INT64_MAX);
+static const double DOUBLE_UINT32_MAX = static_cast<double>(UINT32_MAX);
+static const double DOUBLE_UINT64_MAX = static_cast<double>(UINT64_MAX);
+static const float FLOAT_INFI = std::numeric_limits<float>::infinity();
+static const float FLOAT_NINFI = -std::numeric_limits<float>::infinity();
+static const float FLOAT_NAN = std::numeric_limits<float>::quiet_NaN();
+static const double DOUBLE_INFI = std::numeric_limits<double>::infinity();
+static const double DOUBLE_NINFI = -std::numeric_limits<double>::infinity();
+static const double DOUBLE_NAN = std::numeric_limits<double>::quiet_NaN();
+
+typedef unsigned char Uchar;
+typedef signed char Schar;
+
+namespace zetasql_base {
+namespace castops {
+
+TEST(CastOpsTest, SaturatingFloatToInt) {
+  // int32_t in range cases.
+  EXPECT_EQ(5744950, (SaturatingFloatToInt(5744950.5334)));
+  EXPECT_EQ(-41834793,
+            (SaturatingFloatToInt(-41834793.402368)));
+  EXPECT_EQ(1470707200, (SaturatingFloatToInt(1470707200)));
+  EXPECT_EQ(-14707, (SaturatingFloatToInt(-14707.997)));
+
+  // int32_t border or out of range cases.
+ EXPECT_EQ(INT32_MAX, (SaturatingFloatToInt(FLOAT_INT32_MAX))); + EXPECT_EQ(INT32_MAX, + (SaturatingFloatToInt(FLOAT_INT32_MAX + 1))); + EXPECT_EQ(INT32_MIN, (SaturatingFloatToInt(FLOAT_INT32_MIN))); + EXPECT_EQ(INT32_MAX, + (SaturatingFloatToInt(DOUBLE_INT32_MAX))); + EXPECT_EQ(INT32_MAX, + (SaturatingFloatToInt(DOUBLE_INT32_MAX + 1))); + EXPECT_EQ(INT32_MAX, + (SaturatingFloatToInt(DOUBLE_INT32_MAX + 0.5))); + EXPECT_EQ(INT32_MIN, + (SaturatingFloatToInt(DOUBLE_INT32_MIN))); + EXPECT_EQ(INT32_MIN, + (SaturatingFloatToInt(DOUBLE_INT32_MIN - 0.5))); + + // int32_t infinite and nan cases. + EXPECT_EQ(INT32_MAX, (SaturatingFloatToInt(FLOAT_INFI))); + EXPECT_EQ(INT32_MIN, (SaturatingFloatToInt(FLOAT_NINFI))); + EXPECT_EQ(0, (SaturatingFloatToInt(FLOAT_NAN))); + EXPECT_EQ(INT32_MAX, (SaturatingFloatToInt(DOUBLE_INFI))); + EXPECT_EQ(INT32_MIN, (SaturatingFloatToInt(DOUBLE_NINFI))); + EXPECT_EQ(0, (SaturatingFloatToInt(DOUBLE_NAN))); + + // int64_t in range cases. + EXPECT_EQ(37483134, (SaturatingFloatToInt(37483134.653))); + EXPECT_EQ(-37483134, (SaturatingFloatToInt(-37483134.653))); + EXPECT_EQ(374831, (SaturatingFloatToInt(374831.653))); + EXPECT_EQ(-374831, (SaturatingFloatToInt(-374831.653))); + + // int64_t border or out of range cases. + EXPECT_EQ(INT64_MAX, + (SaturatingFloatToInt(DOUBLE_INT64_MAX))); + EXPECT_EQ(INT64_MAX, + (SaturatingFloatToInt(DOUBLE_INT64_MAX + 1))); + EXPECT_EQ(INT64_MAX, + (SaturatingFloatToInt(DOUBLE_INT64_MAX + 0.5))); + EXPECT_EQ(INT64_MIN, + (SaturatingFloatToInt(DOUBLE_INT64_MIN))); + EXPECT_EQ(INT64_MIN, + (SaturatingFloatToInt(DOUBLE_INT64_MIN - 1))); + EXPECT_EQ(INT64_MIN, + (SaturatingFloatToInt(DOUBLE_INT64_MIN - 0.5))); + EXPECT_EQ(INT64_MAX, (SaturatingFloatToInt(FLOAT_INT64_MAX))); + EXPECT_EQ(INT64_MIN, (SaturatingFloatToInt(FLOAT_INT64_MIN))); + + // int64_t infinite and nan cases. 
+ EXPECT_EQ(INT64_MAX, (SaturatingFloatToInt(FLOAT_INFI))); + EXPECT_EQ(INT64_MIN, (SaturatingFloatToInt(FLOAT_NINFI))); + EXPECT_EQ(0, (SaturatingFloatToInt(FLOAT_NAN))); + EXPECT_EQ(INT64_MAX, (SaturatingFloatToInt(DOUBLE_INFI))); + EXPECT_EQ(INT64_MIN, (SaturatingFloatToInt(DOUBLE_NINFI))); + EXPECT_EQ(0, (SaturatingFloatToInt(DOUBLE_NAN))); + + // uint32_t in range cases. + EXPECT_EQ(5744950, (SaturatingFloatToInt(5744950.5334))); + EXPECT_EQ(2634022912, (SaturatingFloatToInt(2634022912.00))); + + // uint32_t corner or out of range cases. + EXPECT_EQ(UINT32_MAX, + (SaturatingFloatToInt(DOUBLE_UINT32_MAX))); + EXPECT_EQ(UINT32_MAX, + (SaturatingFloatToInt(DOUBLE_UINT32_MAX + 0.5))); + EXPECT_EQ(0, (SaturatingFloatToInt(-1.23))); + EXPECT_EQ(UINT32_MAX, + (SaturatingFloatToInt(FLOAT_UINT32_MAX))); + EXPECT_EQ(UINT32_MAX, + (SaturatingFloatToInt(FLOAT_UINT32_MAX + 0.5))); + EXPECT_EQ(0, (SaturatingFloatToInt(-1.023))); + + // uint32_t infinite and nan cases. + EXPECT_EQ(UINT32_MAX, (SaturatingFloatToInt(FLOAT_INFI))); + EXPECT_EQ(0, (SaturatingFloatToInt(FLOAT_NINFI))); + EXPECT_EQ(0, (SaturatingFloatToInt(FLOAT_NAN))); + EXPECT_EQ(UINT32_MAX, (SaturatingFloatToInt(DOUBLE_INFI))); + EXPECT_EQ(0, (SaturatingFloatToInt(DOUBLE_NINFI))); + EXPECT_EQ(0, (SaturatingFloatToInt(DOUBLE_NAN))); + + // uint64_t in range cases. + EXPECT_EQ(5744950, (SaturatingFloatToInt(5744950.5334))); + EXPECT_EQ(2634022912, (SaturatingFloatToInt(2634022912.00))); + + // uint64_t corner or out of range cases. + EXPECT_EQ(UINT64_MAX, + (SaturatingFloatToInt(DOUBLE_UINT64_MAX))); + EXPECT_EQ(UINT64_MAX, + (SaturatingFloatToInt(DOUBLE_UINT64_MAX + 0.5))); + EXPECT_EQ(0, (SaturatingFloatToInt(-1.23))); + EXPECT_EQ(UINT64_MAX, + (SaturatingFloatToInt(FLOAT_UINT64_MAX))); + EXPECT_EQ(UINT64_MAX, + (SaturatingFloatToInt(FLOAT_UINT64_MAX + 0.5))); + EXPECT_EQ(0, (SaturatingFloatToInt(-1.023))); + + // uint64_t infinite and nan cases. 
+ EXPECT_EQ(UINT64_MAX, (SaturatingFloatToInt(FLOAT_INFI))); + EXPECT_EQ(0, (SaturatingFloatToInt(FLOAT_NINFI))); + EXPECT_EQ(0, (SaturatingFloatToInt(FLOAT_NAN))); + EXPECT_EQ(UINT64_MAX, (SaturatingFloatToInt(DOUBLE_INFI))); + EXPECT_EQ(0, (SaturatingFloatToInt(DOUBLE_NINFI))); + EXPECT_EQ(0, (SaturatingFloatToInt(DOUBLE_NAN))); + + // Schar in range cases. + EXPECT_EQ(101, (SaturatingFloatToInt(101.234))); + EXPECT_EQ(-100, (SaturatingFloatToInt(-100.234))); + EXPECT_EQ(101, (SaturatingFloatToInt(101.234))); + EXPECT_EQ(-100, (SaturatingFloatToInt(-100.234))); + + // Schar corner or out of range cases. + EXPECT_EQ(127, (SaturatingFloatToInt(127.23))); + EXPECT_EQ(127, (SaturatingFloatToInt(128.13))); + EXPECT_EQ(-128, (SaturatingFloatToInt(-128))); + EXPECT_EQ(-128, (SaturatingFloatToInt(-129))); + EXPECT_EQ(127, (SaturatingFloatToInt(127.23))); + EXPECT_EQ(127, (SaturatingFloatToInt(128.13))); + EXPECT_EQ(-128, (SaturatingFloatToInt(-128))); + EXPECT_EQ(-128, (SaturatingFloatToInt(-129))); + + // Schar infinite and nan cases. + EXPECT_EQ(127, + (SaturatingFloatToInt(FLOAT_INFI))); + EXPECT_EQ(-128, + (SaturatingFloatToInt(FLOAT_NINFI))); + EXPECT_EQ(0, + (SaturatingFloatToInt(FLOAT_NAN))); + EXPECT_EQ(127, + (SaturatingFloatToInt(DOUBLE_INFI))); + EXPECT_EQ(-128, + (SaturatingFloatToInt(DOUBLE_NINFI))); + EXPECT_EQ(0, + (SaturatingFloatToInt(DOUBLE_NAN))); + + // Uchar in range cases. + EXPECT_EQ(201, (SaturatingFloatToInt(201.234))); + EXPECT_EQ(200, (SaturatingFloatToInt(200.234))); + + // Uchar corner or out of range cases. + EXPECT_EQ(255, (SaturatingFloatToInt(255.23))); + EXPECT_EQ(255, (SaturatingFloatToInt(256.83))); + EXPECT_EQ(0, (SaturatingFloatToInt(-1.13))); + EXPECT_EQ(255, (SaturatingFloatToInt(255.23))); + EXPECT_EQ(0, (SaturatingFloatToInt(-12))); + + // Uchar infinite and nan cases. 
+ EXPECT_EQ(255, + (SaturatingFloatToInt(FLOAT_INFI))); + EXPECT_EQ(0, + (SaturatingFloatToInt(FLOAT_NINFI))); + EXPECT_EQ(0, + (SaturatingFloatToInt(FLOAT_NAN))); + EXPECT_EQ(255, + (SaturatingFloatToInt(DOUBLE_INFI))); + EXPECT_EQ(0, + (SaturatingFloatToInt(DOUBLE_NINFI))); + EXPECT_EQ(0, + (SaturatingFloatToInt(DOUBLE_NAN))); + + // absl::int128 cases. + EXPECT_EQ(absl::MakeInt128(2000, 0), + (SaturatingFloatToInt( + 36893488147419103232000.0L))); + EXPECT_EQ(std::numeric_limits::max(), + (SaturatingFloatToInt( + std::numeric_limits::max()))); + EXPECT_EQ(std::numeric_limits::min(), + (SaturatingFloatToInt( + std::numeric_limits::lowest()))); + + // absl::uint128 cases. + EXPECT_EQ(absl::MakeInt128(2000, 0), + (SaturatingFloatToInt( + 36893488147419103232000.0L))); + EXPECT_EQ(std::numeric_limits::max(), + (SaturatingFloatToInt( + std::numeric_limits::max()))); + EXPECT_EQ(std::numeric_limits::min(), + (SaturatingFloatToInt(0.0))); +} + +TEST(CastOpsTest, InRange) { + // int32_t in range cases. + EXPECT_EQ(true, (InRange(5744950.5334))); + EXPECT_EQ(true, (InRange(-41834793.402368))); + EXPECT_EQ(true, (InRange(1470707200))); + EXPECT_EQ(true, (InRange(-14707.997))); + + // int32_t border or out of range cases. + EXPECT_EQ(false, (InRange(FLOAT_INT32_MAX))); + EXPECT_EQ(true, (InRange(FLOAT_INT32_MIN))); + EXPECT_EQ(true, (InRange(DOUBLE_INT32_MAX))); + EXPECT_EQ(false, (InRange(DOUBLE_INT32_MAX + 1))); + EXPECT_EQ(true, (InRange(DOUBLE_INT32_MAX + 0.5))); + EXPECT_EQ(true, (InRange(DOUBLE_INT32_MIN))); + EXPECT_EQ(true, (InRange(DOUBLE_INT32_MIN - 0.5))); + + // int64_t in range cases. + EXPECT_EQ(true, (InRange(37483134.653))); + EXPECT_EQ(true, (InRange(-37483134.653))); + EXPECT_EQ(true, (InRange(374831.653))); + EXPECT_EQ(true, (InRange(-374831.653))); + + // int64_t border or out of range cases. 
+ EXPECT_EQ(false, (InRange(DOUBLE_INT64_MAX))); + EXPECT_EQ(false, (InRange(DOUBLE_INT64_MAX + 1))); + EXPECT_EQ(true, (InRange(DOUBLE_INT64_MIN))); + EXPECT_EQ(false, (InRange(FLOAT_INT64_MAX))); + EXPECT_EQ(true, (InRange(FLOAT_INT64_MIN))); + + // uint32_t in range cases. + EXPECT_EQ(true, (InRange(5744950.5334))); + EXPECT_EQ(true, (InRange(2634022912.00))); + + // uint32_t corner or out of range cases. + EXPECT_EQ(true, (InRange(DOUBLE_UINT32_MAX))); + EXPECT_EQ(true, (InRange(DOUBLE_UINT32_MAX + 0.5))); + EXPECT_EQ(false, (InRange(-1.23))); + EXPECT_EQ(true, (InRange(-0.23))); + EXPECT_EQ(false, (InRange(FLOAT_UINT32_MAX))); + EXPECT_EQ(false, (InRange(-1.023))); + EXPECT_EQ(true, (InRange(-0.023))); + + // uint64_t in range cases. + EXPECT_EQ(true, (InRange(5744950.5334))); + EXPECT_EQ(true, (InRange(2634022912.00))); + + // uint64_t corner or out of range cases. + EXPECT_EQ(false, (InRange(DOUBLE_UINT64_MAX))); + EXPECT_EQ(false, (InRange(-1.23))); + EXPECT_EQ(false, (InRange(FLOAT_UINT64_MAX))); + EXPECT_EQ(false, (InRange(-1.023))); + + // Schar in range cases. + EXPECT_EQ(true, (InRange(101.234))); + EXPECT_EQ(true, (InRange(-100.234))); + EXPECT_EQ(true, (InRange(101.234))); + EXPECT_EQ(true, (InRange(-100.234))); + + // Schar corner or out of range cases. + EXPECT_EQ(true, (InRange(127.23))); + EXPECT_EQ(false, (InRange(128.13))); + EXPECT_EQ(true, (InRange(-128))); + EXPECT_EQ(false, (InRange(-129))); + EXPECT_EQ(true, (InRange(-128.5))); + EXPECT_EQ(true, (InRange(127.23))); + EXPECT_EQ(false, (InRange(128.13))); + EXPECT_EQ(true, (InRange(-128))); + EXPECT_EQ(true, (InRange(-128.233))); + EXPECT_EQ(false, (InRange(-129))); + + // Uchar in range cases. + EXPECT_EQ(true, (InRange(201.234))); + EXPECT_EQ(true, (InRange(200.234))); + + // Uchar corner or out of range cases. 
+ EXPECT_EQ(true, (InRange(255.23))); + EXPECT_EQ(false, (InRange(256.83))); + EXPECT_EQ(false, (InRange(-1.13))); + EXPECT_EQ(true, (InRange(255.23))); + EXPECT_EQ(false, (InRange(-12))); +} + +TEST(CastOpsTest, InRangeNoTruncate) { + // int32_t in range cases. + EXPECT_EQ(true, (InRangeNoTruncate(5744950.5334))); + EXPECT_EQ(true, (InRangeNoTruncate(-41834793.402368))); + EXPECT_EQ(true, (InRangeNoTruncate(1470707200))); + EXPECT_EQ(true, (InRangeNoTruncate(-14707.997))); + + // int32_t border or out of range cases. + EXPECT_EQ(false, (InRangeNoTruncate(FLOAT_INT32_MAX))); + EXPECT_EQ(true, (InRangeNoTruncate(FLOAT_INT32_MIN))); + EXPECT_EQ(true, (InRangeNoTruncate(DOUBLE_INT32_MAX))); + EXPECT_EQ(false, + (InRangeNoTruncate(DOUBLE_INT32_MAX + 0.5))); + EXPECT_EQ(true, (InRangeNoTruncate(DOUBLE_INT32_MIN))); + EXPECT_EQ(false, + (InRangeNoTruncate(DOUBLE_INT32_MIN - 0.5))); + + // int64_t in range cases. + EXPECT_EQ(true, (InRangeNoTruncate(37483134.653))); + EXPECT_EQ(true, (InRangeNoTruncate(-37483134.653))); + EXPECT_EQ(true, (InRangeNoTruncate(374831.653))); + EXPECT_EQ(true, (InRangeNoTruncate(-374831.653))); + + // int64_t border or out of range cases. + EXPECT_EQ(false, (InRangeNoTruncate(DOUBLE_INT64_MAX))); + EXPECT_EQ(false, (InRangeNoTruncate(DOUBLE_INT64_MAX + 1))); + EXPECT_EQ(true, (InRangeNoTruncate(DOUBLE_INT64_MIN))); + EXPECT_EQ(false, (InRangeNoTruncate(FLOAT_INT64_MAX))); + EXPECT_EQ(true, (InRangeNoTruncate(FLOAT_INT64_MIN))); + + // uint32_t in range cases. + EXPECT_EQ(true, (InRangeNoTruncate(5744950.5334))); + EXPECT_EQ(true, (InRangeNoTruncate(2634022912.00))); + + // uint32_t corner or out of range cases. 
+ EXPECT_EQ(true, (InRangeNoTruncate(DOUBLE_UINT32_MAX))); + EXPECT_EQ(false, + (InRangeNoTruncate(DOUBLE_UINT32_MAX + 0.5))); + EXPECT_EQ(false, (InRangeNoTruncate(-1.23))); + EXPECT_EQ(false, (InRangeNoTruncate(-0.23))); + EXPECT_EQ(false, (InRangeNoTruncate(FLOAT_UINT32_MAX))); + EXPECT_EQ(false, (InRangeNoTruncate(-1.023))); + EXPECT_EQ(false, (InRangeNoTruncate(-0.023))); + + // uint64_t in range cases. + EXPECT_EQ(true, (InRangeNoTruncate(5744950.5334))); + EXPECT_EQ(true, (InRangeNoTruncate(2634022912.00))); + + // uint64_t corner or out of range cases. + EXPECT_EQ(false, (InRangeNoTruncate(DOUBLE_UINT64_MAX))); + EXPECT_EQ(false, (InRangeNoTruncate(-1.23))); + EXPECT_EQ(false, (InRangeNoTruncate(FLOAT_UINT64_MAX))); + EXPECT_EQ(false, (InRangeNoTruncate(-1.023))); + + // Schar in range cases. + EXPECT_EQ(true, (InRangeNoTruncate(101.234))); + EXPECT_EQ(true, (InRangeNoTruncate(-100.234))); + EXPECT_EQ(true, (InRangeNoTruncate(101.234))); + EXPECT_EQ(true, (InRangeNoTruncate(-100.234))); + + // Schar corner or out of range cases. + EXPECT_EQ(true, (InRangeNoTruncate(127))); + EXPECT_EQ(false, (InRangeNoTruncate(127.023))); + EXPECT_EQ(true, (InRangeNoTruncate(-128))); + EXPECT_EQ(false, (InRangeNoTruncate(-128.023))); + EXPECT_EQ(true, (InRangeNoTruncate(127))); + EXPECT_EQ(false, (InRangeNoTruncate(127.023))); + EXPECT_EQ(true, (InRangeNoTruncate(-128))); + EXPECT_EQ(false, (InRangeNoTruncate(-128.0233))); + + // Uchar in range cases. + EXPECT_EQ(true, (InRangeNoTruncate(201.234))); + EXPECT_EQ(true, (InRangeNoTruncate(200.234))); + + // Uchar corner or out of range cases. 
+ EXPECT_EQ(true, (InRangeNoTruncate(255))); + EXPECT_EQ(false, (InRangeNoTruncate(255.023))); + EXPECT_EQ(false, (InRangeNoTruncate(-1.13))); + EXPECT_EQ(true, (InRangeNoTruncate(255))); + EXPECT_EQ(false, (InRangeNoTruncate(255.023))); + EXPECT_EQ(false, (InRangeNoTruncate(-12))); +} + +TEST(CastOpsTest, DoubleToFloat) { + // NaN + EXPECT_TRUE(std::isnan( + DoubleToFloat(std::numeric_limits::quiet_NaN()))); + + // Within float range + EXPECT_EQ(static_cast(1.23), DoubleToFloat(1.23)); + EXPECT_EQ(static_cast(-0.034), DoubleToFloat(-0.034)); + EXPECT_EQ(0.0f, DoubleToFloat(0.0)); + + // Limits of float range + // Relies on the fact that every float is exactly representable as a double. + EXPECT_EQ(std::numeric_limits::max(), + DoubleToFloat(std::numeric_limits::max())); + EXPECT_EQ(std::numeric_limits::lowest(), + DoubleToFloat(std::numeric_limits::lowest())); + + // Clips to +infinity + EXPECT_EQ(std::numeric_limits::infinity(), + DoubleToFloat(std::numeric_limits::infinity())); + EXPECT_EQ(std::numeric_limits::infinity(), + DoubleToFloat( + 2.0 * static_cast(std::numeric_limits::max()))); + + // Clips to -infinity + EXPECT_EQ(-std::numeric_limits::infinity(), + DoubleToFloat(-std::numeric_limits::infinity())); + EXPECT_EQ(-std::numeric_limits::infinity(), + DoubleToFloat( + 2.0 * static_cast( + std::numeric_limits::lowest()))); +} + +template +class CastOpsInfNanTest : public ::testing::Test { + public: + T value_; +}; + +typedef ::testing::Types + MyTypes; +TYPED_TEST_SUITE(CastOpsInfNanTest, MyTypes); + +TYPED_TEST(CastOpsInfNanTest, InRangeInfNanTest) { + EXPECT_EQ(false, (InRange(FLOAT_INFI))); + EXPECT_EQ(false, (InRange(FLOAT_NINFI))); + EXPECT_EQ(false, (InRange(FLOAT_NAN))); + EXPECT_EQ(false, (InRange(DOUBLE_INFI))); + EXPECT_EQ(false, (InRange(DOUBLE_NINFI))); + EXPECT_EQ(false, (InRange(DOUBLE_NAN))); +} + +TYPED_TEST(CastOpsInfNanTest, InRangeNoTruncateInfNanTest) { + EXPECT_EQ(false, (InRangeNoTruncate(FLOAT_INFI))); + EXPECT_EQ(false, 
(InRangeNoTruncate(FLOAT_NINFI))); + EXPECT_EQ(false, (InRangeNoTruncate(FLOAT_NAN))); + EXPECT_EQ(false, (InRangeNoTruncate(DOUBLE_INFI))); + EXPECT_EQ(false, (InRangeNoTruncate(DOUBLE_NINFI))); + EXPECT_EQ(false, (InRangeNoTruncate(DOUBLE_NAN))); +} + + +TEST(CastOpsTest, DoubleToFiniteFloat) { + // NaN + EXPECT_TRUE(std::isnan( + DoubleToFiniteFloat(std::numeric_limits::quiet_NaN()))); + + // Within float range + EXPECT_EQ(static_cast(1.23), DoubleToFiniteFloat(1.23)); + EXPECT_EQ(static_cast(-0.034), DoubleToFiniteFloat(-0.034)); + EXPECT_EQ(0.0f, DoubleToFiniteFloat(0.0)); + + // Limits of float range + // Relies on the fact that every float is exactly representable as a double. + EXPECT_EQ(std::numeric_limits::max(), + DoubleToFiniteFloat(std::numeric_limits::max())); + EXPECT_EQ(std::numeric_limits::lowest(), + DoubleToFiniteFloat(std::numeric_limits::lowest())); + + // Clips to FLT_MAX + EXPECT_EQ(std::numeric_limits::max(), + DoubleToFiniteFloat(std::numeric_limits::infinity())); + EXPECT_EQ(std::numeric_limits::max(), + DoubleToFiniteFloat( + 2.0 * static_cast(std::numeric_limits::max()))); + + // Clips to FLT_LOWEST + EXPECT_EQ(std::numeric_limits::lowest(), + DoubleToFiniteFloat(-std::numeric_limits::infinity())); + EXPECT_EQ(std::numeric_limits::lowest(), + DoubleToFiniteFloat( + 2.0 * static_cast( + std::numeric_limits::lowest()))); +} + +TEST(CastOpsTest, LongDoubleToDouble) { + // NaN + EXPECT_TRUE( + std::isnan(LongDoubleToDouble(std::numeric_limits::quiet_NaN()))); + + // Within double range + EXPECT_EQ(1.23, LongDoubleToDouble(1.23L)); + EXPECT_EQ(-0.034, LongDoubleToDouble(-0.034L)); + EXPECT_EQ(0.0, LongDoubleToDouble(0.0L)); + + // Limits of double range + // Relies on the fact that every double is exactly representable as a long + // double. 
+ EXPECT_EQ(std::numeric_limits::max(), + LongDoubleToDouble(std::numeric_limits::max())); + EXPECT_EQ(std::numeric_limits::lowest(), + LongDoubleToDouble(std::numeric_limits::lowest())); + + // Clips to +infinity + EXPECT_EQ(std::numeric_limits::infinity(), + LongDoubleToDouble(std::numeric_limits::infinity())); + EXPECT_EQ(std::numeric_limits::infinity(), + LongDoubleToDouble(2.0 * static_cast( + std::numeric_limits::max()))); + + // Clips to -infinity + EXPECT_EQ(-std::numeric_limits::infinity(), + LongDoubleToDouble(-std::numeric_limits::infinity())); + EXPECT_EQ( + -std::numeric_limits::infinity(), + LongDoubleToDouble(2.0 * static_cast( + std::numeric_limits::lowest()))); +} + +TEST(CastOpsTest, LongDoubleToFiniteDouble) { + // NaN + EXPECT_TRUE(std::isnan( + LongDoubleToFiniteDouble(std::numeric_limits::quiet_NaN()))); + + // Within double range + EXPECT_EQ(1.23, LongDoubleToFiniteDouble(1.23L)); + EXPECT_EQ(-0.034, LongDoubleToFiniteDouble(-0.034L)); + EXPECT_EQ(0.0, LongDoubleToFiniteDouble(0.0L)); + + // Limits of double range + // Relies on the fact that every double is exactly representable as a long + // double. + EXPECT_EQ(std::numeric_limits::max(), + LongDoubleToFiniteDouble(std::numeric_limits::max())); + EXPECT_EQ(std::numeric_limits::lowest(), + LongDoubleToFiniteDouble(std::numeric_limits::lowest())); + + // Clips to DBL_MAX + EXPECT_EQ( + std::numeric_limits::max(), + LongDoubleToFiniteDouble(std::numeric_limits::infinity())); + EXPECT_EQ( + std::numeric_limits::max(), + LongDoubleToFiniteDouble( + 2.0 * static_cast(std::numeric_limits::max()))); + + // Clips to DBL_LOWEST + EXPECT_EQ( + std::numeric_limits::lowest(), + LongDoubleToFiniteDouble(-std::numeric_limits::infinity())); + EXPECT_EQ(std::numeric_limits::lowest(), + LongDoubleToFiniteDouble( + 2.0 * static_cast( + std::numeric_limits::lowest()))); +} + +} // namespace castops + +namespace x86compatible { + +TEST(CastOpsTest, ToInt32) { + // In range common cases. 
+ EXPECT_EQ(5744950, (ToInt32(5744950.5334))); + EXPECT_EQ(-41834793, (ToInt32(-41834793.402368))); + EXPECT_EQ(14707, (ToInt32(14707.00))); + + EXPECT_EQ(5744950, (ToInt32(5744950.2334))); + EXPECT_EQ(1470707200, (ToInt32(1470707200.128))); + EXPECT_EQ(-14707, (ToInt32(-14707.997))); + + // Double border cases. + EXPECT_EQ(INT32_MAX, (ToInt32(DOUBLE_INT32_MAX))); + EXPECT_EQ(INT32_MAX, (ToInt32(DOUBLE_INT32_MAX+0.5))); + EXPECT_EQ(INT32_MIN, (ToInt32(DOUBLE_INT32_MIN))); + EXPECT_EQ(INT32_MIN, (ToInt32(DOUBLE_INT32_MIN-0.5))); + EXPECT_EQ(INT32_MIN, (ToInt32(DOUBLE_INT32_MIN-1))); + // Double value DOUBLE_INT32_MIN+0.1 is close to INT_MIN and in-range. + EXPECT_EQ(INT32_MIN+1, (ToInt32(DOUBLE_INT32_MIN+0.1))); + + // Float border cases. + // Because INT_MAX cannot be presented precisely in float, when it is casted + // to float type, it is actually treated out of range of int32_t representable + // range. + // Note that 2147483520.000f (2^31-128 = FLOAT_INT32_MAX-128) and + // 2147483648.000f (2^31 = FLOAT_INT32_MAX) are + // adjacent float type values. + EXPECT_EQ(INT32_MIN, (ToInt32(FLOAT_INT32_MAX))); + EXPECT_EQ(INT32_MAX-127, (ToInt32(FLOAT_INT32_MAX-128))); + EXPECT_EQ(INT32_MIN, (ToInt32(FLOAT_INT32_MIN))); + + // Infinite and NaN cases. + EXPECT_EQ(INT32_MIN, ToInt32(FLOAT_INFI)); + EXPECT_EQ(INT32_MIN, ToInt32(FLOAT_NINFI)); + EXPECT_EQ(INT32_MIN, ToInt32(FLOAT_NAN)); + EXPECT_EQ(INT32_MIN, ToInt32(DOUBLE_INFI)); + EXPECT_EQ(INT32_MIN, ToInt32(DOUBLE_NINFI)); + EXPECT_EQ(INT32_MIN, ToInt32(DOUBLE_NAN)); +} + +TEST(CastOpsTest, ToInt64) { + // In range common cases. + EXPECT_EQ(37483134, (ToInt64(37483134.653))); + EXPECT_EQ(-37483134, (ToInt64(-37483134.653))); + EXPECT_EQ(374831, (ToInt64(374831.653))); + EXPECT_EQ(-374831, (ToInt64(-374831.653))); + + // Border cases. + // Note that DOUBLE_INT64_MAX is actually 2^63. Its adjacent double value + // is 2^63-1024 in double. 
+ EXPECT_EQ(INT64_MIN, ToInt64(DOUBLE_INT64_MAX)); + EXPECT_EQ(INT64_MAX-1023, ToInt64(DOUBLE_INT64_MAX-1024)); + // DOUBLE_INT64_MIN-2048, DOUBLE_INT64_MIN and DOUBLE_INT64_MIN+1024 are + // adjacent three double values. + EXPECT_EQ(INT64_MIN, ToInt64(DOUBLE_INT64_MIN)); + EXPECT_EQ(INT64_MIN+1024, ToInt64(DOUBLE_INT64_MIN+1024)); + EXPECT_EQ(INT64_MIN, ToInt64(DOUBLE_INT64_MIN-2048)); + + EXPECT_EQ(INT64_MIN, ToInt64(FLOAT_INT64_MAX)); + EXPECT_EQ(INT64_MIN, ToInt64(FLOAT_INT64_MIN)); + + // Infinite and NaN cases. + EXPECT_EQ(INT64_MIN, ToInt64(FLOAT_INFI)); + EXPECT_EQ(INT64_MIN, ToInt64(FLOAT_NINFI)); + EXPECT_EQ(INT64_MIN, ToInt64(FLOAT_NAN)); + EXPECT_EQ(INT64_MIN, ToInt64(DOUBLE_INFI)); + EXPECT_EQ(INT64_MIN, ToInt64(DOUBLE_NINFI)); + EXPECT_EQ(INT64_MIN, ToInt64(DOUBLE_NAN)); +} + +TEST(CastOptsTest, ToUint32) { + // In range common cases. + EXPECT_EQ(5744950, (ToUint32(5744950.5334))); + EXPECT_EQ(2634022912, (ToUint32(2634022912.00))); + + // Double border cases. + EXPECT_EQ(UINT32_MAX, ToUint32(DOUBLE_UINT32_MAX)); + EXPECT_EQ(0, ToUint32(DOUBLE_UINT32_MAX+1.01)); + + // Float border cases. + // Note that FLOAT_UINT32_MAX is actually 2^32. 2^32-256 is its adjacent float + // value that is in range. + EXPECT_EQ(0, ToUint32(FLOAT_UINT32_MAX)); + EXPECT_EQ(UINT32_MAX-255, ToUint32(FLOAT_UINT32_MAX-256)); + + // Negative cases. These are out-of-range of uint32_t. + EXPECT_EQ(static_cast(-678), ToUint32(-678)); + EXPECT_EQ(static_cast(INT32_MIN + 128), + ToUint32(FLOAT_INT32_MIN + 128)); + EXPECT_EQ(static_cast(INT32_MAX) + 1, + ToUint32(FLOAT_INT32_MIN)); + + // Infinite and NaN cases. + EXPECT_EQ(0, ToUint32(FLOAT_INFI)); + EXPECT_EQ(0, ToUint32(FLOAT_NINFI)); + EXPECT_EQ(0, ToUint32(FLOAT_NAN)); + EXPECT_EQ(0, ToUint32(DOUBLE_INFI)); + EXPECT_EQ(0, ToUint32(DOUBLE_NINFI)); + EXPECT_EQ(0, ToUint32(DOUBLE_NAN)); +} + +TEST(CastOptsTest, ToUint64) { + // In range common cases. 
+  EXPECT_EQ(37483134, (ToUint64(37483134.653)));
+  EXPECT_EQ(374831, (ToUint64(374831.653)));
+
+  // Double border cases.
+  EXPECT_EQ(0, ToUint64(DOUBLE_UINT64_MAX));
+  EXPECT_EQ(UINT64_MAX-2047,
+            ToUint64(DOUBLE_UINT64_MAX-2048));
+
+  // Float border cases.
+  EXPECT_EQ(0, ToUint64(FLOAT_UINT64_MAX));
+
+  // Negative cases.
+  EXPECT_EQ(static_cast<uint64_t>(-678), ToUint64(-678));
+  EXPECT_EQ(static_cast<uint64_t>(INT64_MAX) + 1,
+            ToUint64(FLOAT_INT64_MIN));
+  EXPECT_EQ(static_cast<uint64_t>(INT64_MAX) + 1,
+            ToUint64(DOUBLE_INT64_MIN));
+  EXPECT_EQ(static_cast<uint64_t>(INT64_MIN + 1024),
+            ToUint64(DOUBLE_INT64_MIN + 1024));
+  EXPECT_EQ(static_cast<uint64_t>(INT64_MAX) + 1,
+            ToUint64(DOUBLE_INT64_MIN - 2048));
+
+  // Infinite and NaN cases.
+  EXPECT_EQ(0, ToUint64(FLOAT_INFI));
+  EXPECT_EQ(static_cast<uint64_t>(INT64_MAX) + 1, ToUint64(FLOAT_NINFI));
+  EXPECT_EQ(static_cast<uint64_t>(INT64_MAX) + 1, ToUint64(FLOAT_NAN));
+  EXPECT_EQ(0, ToUint64(DOUBLE_INFI));
+  EXPECT_EQ(static_cast<uint64_t>(INT64_MAX) + 1,
+            ToUint64(DOUBLE_NINFI));
+  EXPECT_EQ(static_cast<uint64_t>(INT64_MAX) + 1, ToUint64(DOUBLE_NAN));
+}
+
+// We group the unit tests for several To*() functions
+// because they are very similar.
+TEST(CastOpsTest, ToSmallIntegrals) {
+  // In range common cases.
+  EXPECT_EQ(8, (ToSchar<float>(8.987096)));
+  EXPECT_EQ(-71, (ToSchar<float>(-71.793536)));
+  EXPECT_EQ(8, (ToSchar<double>(8.987096)));
+  EXPECT_EQ(-71, (ToSchar<double>(-71.793536)));
+  EXPECT_EQ(8, (ToUchar<float>(8.987096)));
+  EXPECT_EQ(8, (ToUchar<double>(8.987096)));
+  EXPECT_EQ(187, (ToInt16<float>(187.58)));
+  EXPECT_EQ(-71, (ToInt16<float>(-71.793536)));
+  EXPECT_EQ(187, (ToInt16<double>(187.58)));
+  EXPECT_EQ(-71, (ToInt16<double>(-71.793536)));
+  EXPECT_EQ(28, (ToUint16<float>(28.987096)));
+  EXPECT_EQ(28, (ToUint16<double>(28.987096)));
+
+  // Out of range cases.
+#define CHECK_SMALLINT_OUTRANGE(ValueType, value, result_int)               \
+  EXPECT_EQ(static_cast<signed char>(result_int),                           \
+            (ToSchar<ValueType>(value)));                                   \
+  EXPECT_EQ(static_cast<unsigned char>(result_int),                         \
+            (ToUchar<ValueType>(value)));                                   \
+  EXPECT_EQ(static_cast<int16_t>(result_int), (ToInt16<ValueType>(value))); \
+  EXPECT_EQ(static_cast<uint16_t>(result_int), (ToUint16<ValueType>(value)));
+
+  CHECK_SMALLINT_OUTRANGE(float, 128.34, 128)
+  CHECK_SMALLINT_OUTRANGE(double, 128.34, 128)
+  CHECK_SMALLINT_OUTRANGE(float, -129.34, -129)
+  CHECK_SMALLINT_OUTRANGE(double, -129.34, -129)
+  CHECK_SMALLINT_OUTRANGE(float, 32768.12, 32768)
+  CHECK_SMALLINT_OUTRANGE(float, -32769.3, -32769)
+  CHECK_SMALLINT_OUTRANGE(double, 32768.12, 32768)
+  CHECK_SMALLINT_OUTRANGE(double, -32769.3, -32769)
+  CHECK_SMALLINT_OUTRANGE(float, FLOAT_INT32_MAX, 0)
+  CHECK_SMALLINT_OUTRANGE(float, FLOAT_INT32_MIN, 0)
+  CHECK_SMALLINT_OUTRANGE(double, DOUBLE_INT32_MAX, INT32_MAX)
+  CHECK_SMALLINT_OUTRANGE(double, DOUBLE_INT32_MIN, 0)
+
+  // Infinite and NaN cases.
+#define CHECK_SMALLINT_INFI(IntType)             \
+  EXPECT_EQ(0,                                   \
+            (To##IntType<float>(FLOAT_INFI)));   \
+  EXPECT_EQ(0,                                   \
+            (To##IntType<float>(FLOAT_NINFI)));  \
+  EXPECT_EQ(0,                                   \
+            (To##IntType<float>(FLOAT_NAN)));    \
+  EXPECT_EQ(0,                                   \
+            (To##IntType<double>(DOUBLE_INFI))); \
+  EXPECT_EQ(0,                                   \
+            (To##IntType<double>(DOUBLE_NINFI)));\
+  EXPECT_EQ(0,                                   \
+            (To##IntType<double>(DOUBLE_NAN)));
+
+  CHECK_SMALLINT_INFI(Schar)
+  CHECK_SMALLINT_INFI(Uchar)
+  CHECK_SMALLINT_INFI(Int16)
+  CHECK_SMALLINT_INFI(Uint16)
+}
+
+}  // namespace x86compatible
+}  // namespace zetasql_base
diff --git a/zetasql/base/exactfloat.cc b/zetasql/base/exactfloat.cc
index b5bc45bf9..8daf31aaa 100644
--- a/zetasql/base/exactfloat.cc
+++ b/zetasql/base/exactfloat.cc
@@ -25,7 +25,6 @@
 #include
 #include "zetasql/base/logging.h"
-#include
 #include "absl/base/macros.h"
 #include "absl/container/fixed_array.h"
 #include "openssl/bn.h"
diff --git a/zetasql/base/exactfloat.h b/zetasql/base/exactfloat.h
index 2209b13db..b3bd8c9d1 100644
--- a/zetasql/base/exactfloat.h
+++ b/zetasql/base/exactfloat.h
@@ -87,7 +87,6 @@
 #include
 #include
 #include "zetasql/base/logging.h"
-#include
 #include "openssl/bn.h"
 
 namespace zetasql_base {
diff --git a/zetasql/base/exactfloat_test.cc b/zetasql/base/exactfloat_test.cc
index b31c63fcc..0b601fa98 100644
--- a/zetasql/base/exactfloat_test.cc
+++ b/zetasql/base/exactfloat_test.cc
@@ -18,13 +18,13 @@
 #include
 
+#include
 #include
 #include
 
 #include "zetasql/base/logging.h"
 #include "gtest/gtest.h"
 #include "absl/base/casts.h"
-#include
 #include "absl/base/macros.h"
 
 namespace zetasql_base {
diff --git a/zetasql/base/exactfloat_underflow_test.cc b/zetasql/base/exactfloat_underflow_test.cc
index 9e53ed700..948208a02 100644
--- a/zetasql/base/exactfloat_underflow_test.cc
+++ b/zetasql/base/exactfloat_underflow_test.cc
@@ -23,7 +23,6 @@
 #include "gmock/gmock.h"
 #include "gtest/gtest.h"
 #include "absl/base/casts.h"
-#include
 #include "zetasql/base/exactfloat.h"
 
 namespace zetasql_base {
diff --git a/zetasql/base/lossless_convert.h b/zetasql/base/lossless_convert.h
new file mode 100644
index 000000000..00503cdab
--- /dev/null
+++ b/zetasql/base/lossless_convert.h
@@ -0,0 +1,172 @@
+//
+// Copyright 2024 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
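The header added below reports whether a numeric conversion loses information. A minimal standalone sketch of the round-trip idea it applies to integral-to-integral casts (helper name hypothetical; this is not the library's API):

```cpp
#include <limits>

// Cast, cast back, and compare; additionally reject sign flips when the
// source and destination differ in signedness. This mirrors the
// integral->integral strategy in lossless_convert.h, as an illustration
// only.
template <typename InType, typename OutType>
bool RoundTripLossless(InType in, OutType* out) {
  *out = static_cast<OutType>(in);
  if (static_cast<InType>(*out) != in) return false;  // value changed
  if (std::numeric_limits<InType>::is_signed !=
          std::numeric_limits<OutType>::is_signed &&
      (in < 0) != (*out < 0)) {
    return false;  // sign flipped (e.g. -1 -> 0xFFFFFFFF)
  }
  return true;
}
```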
+
+#ifndef THIRD_PARTY_ZETASQL_ZETASQL_BASE_LOSSLESS_CONVERT_H_
+#define THIRD_PARTY_ZETASQL_ZETASQL_BASE_LOSSLESS_CONVERT_H_
+
+#include <cmath>
+#include <limits>
+#include <type_traits>
+
+#include "zetasql/base/castops.h"
+
+namespace zetasql_base {
+
+// LosslessConvert casts a value from one numeric type to another, detecting
+// whether any information is lost to overflow, rounding, or a change in
+// signedness. On success it returns true and writes the converted value to
+// *out. Otherwise it returns false, and the contents of *out are undefined.
+//
+// There may exist edge cases where the permissiveness of this function depends
+// on compiler and CPU details.
+//
+// Note that converting a NaN value to another floating-point type will always
+// succeed, even though the resulting values will not compare equal, because
+// no information is lost (technically IEEE NaN values can store extra data in
+// the mantissa bits, but we believe that data is virtually never used in
+// Google code).
+//
+// Example usage:
+//
+//   double x = 5.0;
+//   int32_t y;
+//
+//   if (zetasql_base::LosslessConvert(x, &y)) {
+//     ABSL_LOG(INFO) << "Converted to: " << y;
+//   }
+//
+template <typename InType, typename OutType>
+bool LosslessConvert(InType in_val, OutType *out);
+
+//////////////////////////////////////////////////////////////////
+//// Implementation details follow; clients should ignore.
+
+namespace internal {
+
+template <typename InType, typename OutType>
+typename std::enable_if<std::is_integral<InType>::value &&
+                            std::is_integral<OutType>::value,
+                        bool>::type
+LosslessConvert(InType in_val, OutType *out) {
+  // A static_cast between integral types always produces a valid value, so we
+  // don't need a range check; we can just check after the fact whether the
+  // cast changed the value.
+  *out = static_cast<OutType>(in_val);
+
+  if (static_cast<InType>(*out) != in_val) {
+    return false;
+  }
+
+  // Detect sign flips when converting between signed and unsigned types.
+  // The numeric_limits check isn't strictly necessary, but could help
+  // the compiler optimize the branch away.
+  if (std::numeric_limits<InType>::is_signed !=
+          std::numeric_limits<OutType>::is_signed &&
+      (in_val < 0) != (*out < 0)) {
+    return false;
+  }
+
+  // No loss detected.
+  return true;
+}
+
+template <typename InType, typename OutType>
+typename std::enable_if<std::is_floating_point<InType>::value &&
+                            std::is_integral<OutType>::value,
+                        bool>::type
+LosslessConvert(InType in_val, OutType *out) {
+  // Check for float-cast-overflow.
+  if (!zetasql_base::castops::InRange<InType, OutType>(in_val)) {
+    return false;
+  }
+
+  *out = static_cast<OutType>(in_val);
+
+  return in_val == static_cast<InType>(*out);
+}
+
+template <typename InType, typename OutType>
+typename std::enable_if<std::is_integral<InType>::value &&
+                            std::is_floating_point<OutType>::value,
+                        bool>::type
+LosslessConvert(InType in_val, OutType *out) {
+  // Assert that the range of OutType includes all possible values of InType.
+  // This is true even when OutType is 32-bit IEEE and InType is a 128-bit
+  // signed integer, so it shouldn't fail in practice.
+  static_assert(std::numeric_limits<InType>::digits <
+                    std::numeric_limits<OutType>::max_exponent,
+                "Integral->floating conversion with such a large size"
+                " discrepancy is not yet supported");
+
+  *out = static_cast<OutType>(in_val);
+
+  // The initial conversion may have rounded the value out of the range of
+  // InType, so we need a range check before the reverse conversion.
+  return zetasql_base::castops::InRange<OutType, InType>(*out) &&
+         in_val == static_cast<InType>(*out);
+}
+
+template <typename InType, typename OutType>
+typename std::enable_if<std::is_floating_point<InType>::value &&
+                            std::is_floating_point<OutType>::value,
+                        bool>::type
+LosslessConvert(InType in_val, OutType *out) {
+  // C++ and IEEE both guarantee that each floating-point type is a subset of
+  // the next larger one.
+  if (sizeof(OutType) >= sizeof(InType)) {
+    *out = static_cast<OutType>(in_val);
+    return true;
+  }
+
+  static_assert((std::numeric_limits<OutType>::has_infinity ||
+                 !std::numeric_limits<InType>::has_infinity) &&
+                    (std::numeric_limits<OutType>::has_quiet_NaN ||
+                     !std::numeric_limits<InType>::has_quiet_NaN),
+                "Conversions that can lose infinity or NaN are not supported");
+  if (std::isnan(in_val) || std::isinf(in_val)) {
+    *out = static_cast<OutType>(in_val);
+    return true;
+  }
+
+  // At this point we know InType is larger than OutType, so we can safely
+  // cast OutType's limits to InType.
+  if (in_val > static_cast<InType>(std::numeric_limits<OutType>::max()) ||
+      in_val < static_cast<InType>(std::numeric_limits<OutType>::lowest())) {
+    return false;
+  }
+  *out = static_cast<OutType>(in_val);
+  return in_val == static_cast<InType>(*out);
+}
+
+}  // namespace internal
+
+template <typename InType, typename OutType>
+[[nodiscard]] bool LosslessConvert(InType in_val, OutType *out) {
+  static_assert(std::is_arithmetic<InType>::value, "");
+  static_assert(std::is_arithmetic<OutType>::value, "");
+
+  // 128-bit integer support is possible, but annoying due to varying
+  // implementation support (is_integral<__int128> doesn't even produce a
+  // consistent result). Punt for now.
+  static_assert(!std::is_integral<InType>::value || sizeof(InType) <= 8,
+                "128-bit integer types are unsupported");
+  static_assert(!std::is_integral<OutType>::value || sizeof(OutType) <= 8,
+                "128-bit integer types are unsupported");
+
+  return internal::LosslessConvert(in_val, out);
+}
+
+}  // namespace zetasql_base
+
+#endif  // THIRD_PARTY_ZETASQL_ZETASQL_BASE_LOSSLESS_CONVERT_H_
diff --git a/zetasql/base/lossless_convert_test.cc b/zetasql/base/lossless_convert_test.cc
new file mode 100644
index 000000000..766fdd018
--- /dev/null
+++ b/zetasql/base/lossless_convert_test.cc
@@ -0,0 +1,250 @@
+//
+// Copyright 2024 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "zetasql/base/lossless_convert.h"
+
+#include <cmath>
+#include <cstdint>
+#include <limits>
+
+#include "gtest/gtest.h"
+
+namespace {
+
+template <typename T>
+bool IsNaN(T val) {
+  return false;
+}
+bool IsNaN(float val) { return std::isnan(val); }
+bool IsNaN(double val) { return std::isnan(val); }
+bool IsNaN(long double val) { return std::isnan(val); }
+
+template <typename InType, typename OutType>
+bool Case(InType in_val, bool expect_success) {
+  OutType out_val;
+  return (zetasql_base::LosslessConvert(in_val, &out_val) == expect_success &&
+          (!expect_success || in_val == out_val ||
+           (IsNaN(in_val) && IsNaN(out_val))));
+}
+
+TEST(LosslessConvertTest, Identity) {
+  EXPECT_TRUE((Case<int, int>(1234, true)));
+  EXPECT_TRUE((Case<bool, bool>(true, true)));
+  EXPECT_TRUE((Case<double, double>(1.0, true)));
+  EXPECT_TRUE((Case<float, float>(1.0, true)));
+}
+
+TEST(LosslessConvertTest, IntInt) {
+  // Watch out for sign flips.
+ EXPECT_TRUE((Case(-0x80000000, false))); + EXPECT_TRUE((Case(-0x80000000, false))); + EXPECT_TRUE((Case(-0x80000000, false))); + EXPECT_TRUE((Case(-1, false))); + EXPECT_TRUE((Case(-1, false))); + EXPECT_TRUE((Case(-1, false))); + EXPECT_TRUE((Case(0, true))); + EXPECT_TRUE((Case(0, true))); + EXPECT_TRUE((Case(0, true))); + EXPECT_TRUE((Case(0, true))); + EXPECT_TRUE((Case(0, true))); + EXPECT_TRUE((Case(0, true))); + EXPECT_TRUE((Case(0x80000000U, false))); + EXPECT_TRUE((Case(0x80000000U, false))); + EXPECT_TRUE((Case(0x80000000U, true))); + EXPECT_TRUE((Case(0xFFFFFFFFU, false))); + EXPECT_TRUE((Case(0xFFFFFFFFU, false))); + EXPECT_TRUE((Case(0xFFFFFFFFU, true))); +} + +TEST(LosslessConvertTest, BoolEtc) { + EXPECT_TRUE((Case(-1, false))); + EXPECT_TRUE((Case(0, true))); + EXPECT_TRUE((Case(1, true))); + EXPECT_TRUE((Case(2, false))); + + EXPECT_TRUE((Case(false, true))); + EXPECT_TRUE((Case(true, true))); + + EXPECT_TRUE((Case(0.0, true))); + EXPECT_TRUE((Case(0.5, false))); + EXPECT_TRUE((Case(1.0, true))); + EXPECT_TRUE((Case(2.0, false))); + + EXPECT_TRUE((Case(false, true))); + EXPECT_TRUE((Case(true, true))); +} + +TEST(LosslessConvertTest, IntDouble) { + // Large integers lose precision when cast to double. + EXPECT_TRUE((Case(int64_t{-10000000000000001}, false))); + EXPECT_TRUE((Case(int64_t{-1000000000000001}, true))); + EXPECT_TRUE((Case(int64_t{1000000000000001}, true))); + EXPECT_TRUE((Case(int64_t{10000000000000001}, false))); + EXPECT_TRUE( + (Case(std::numeric_limits::min(), true))); + EXPECT_TRUE( + (Case(std::numeric_limits::max(), false))); + + // Small integers are fine, aside from sign issues. + EXPECT_TRUE((Case(-1.0, true))); + EXPECT_TRUE((Case(0.0, true))); + EXPECT_TRUE((Case(1.0, true))); + EXPECT_TRUE((Case(-1.0, false))); + + // Non-integers lose precision when cast to an integer. 
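The IntDouble cases above (10^15 + 1 round-trips, 10^16 + 1 does not) come down to double's 53-bit significand. A standalone check of that threshold (helper name hypothetical):

```cpp
#include <cstdint>

// A double stores a 53-bit significand, so every integer of magnitude up
// to 2^53 is exactly representable; 2^53 + 1 is the first integer that
// rounds when stored in a double, which a round trip detects.
inline bool SurvivesDoubleRoundTrip(std::int64_t v) {
  return static_cast<std::int64_t>(static_cast<double>(v)) == v;
}
```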
+ EXPECT_TRUE((Case(-0.5, false))); + EXPECT_TRUE((Case(0.5, false))); + EXPECT_TRUE( + (Case(-std::numeric_limits::infinity(), false))); + EXPECT_TRUE( + (Case(std::numeric_limits::infinity(), false))); + EXPECT_TRUE( + (Case(std::numeric_limits::quiet_NaN(), false))); + + // Make sure nothing funny happens near the int64_t minimum. + EXPECT_TRUE((Case(-9223372036854775808.0, true))); + EXPECT_TRUE((Case(-9223372036854775808.0 - 2048.0, false))); +} + +TEST(LosslessConvertTest, IntLongDouble) { + // We don't have any integer types large enough to lose precision when + // converting to long double. + EXPECT_TRUE( + (Case(std::numeric_limits::min(), true))); + EXPECT_TRUE( + (Case(std::numeric_limits::max(), true))); + + // Small integers are fine, aside from sign issues. + EXPECT_TRUE((Case(-1.0, true))); + EXPECT_TRUE((Case(0.0, true))); + EXPECT_TRUE((Case(1.0, true))); + EXPECT_TRUE((Case(-1.0, false))); + + // Non-integers lose precision when cast to an integer. + EXPECT_TRUE((Case(-0.5, false))); + EXPECT_TRUE((Case(0.5, false))); + EXPECT_TRUE((Case( + -std::numeric_limits::infinity(), false))); + EXPECT_TRUE((Case( + std::numeric_limits::infinity(), false))); + EXPECT_TRUE((Case( + std::numeric_limits::quiet_NaN(), false))); + + // Make sure nothing funny happens near the int64_t minimum. + EXPECT_TRUE((Case(-9223372036854775808.0, true))); + EXPECT_TRUE( + (Case(-9223372036854775808.0 - 2048.0, false))); +} + +TEST(LosslessConvertTest, DoubleFloat) { + // Detect precision loss. + EXPECT_TRUE((Case(10000001, true))); + EXPECT_TRUE((Case(100000001, false))); + + // Detect overflow + EXPECT_TRUE((Case(std::numeric_limits::max(), true))); + EXPECT_TRUE( + (Case(std::numeric_limits::lowest(), true))); + EXPECT_TRUE((Case(3.4e+39, false))); + EXPECT_TRUE((Case(-3.4e+39, false))); + + // Binary fractions convert cleanly. 
+ EXPECT_TRUE((Case(1.0, true))); + EXPECT_TRUE((Case(1.25, true))); + EXPECT_TRUE((Case(1.5, true))); + EXPECT_TRUE((Case(1.0, true))); + EXPECT_TRUE((Case(1.25, true))); + EXPECT_TRUE((Case(1.5, true))); + + // Non-binary fractions don't narrow well, but widening is okay. + EXPECT_TRUE((Case(1.1, false))); + EXPECT_TRUE((Case(1.3, false))); + EXPECT_TRUE((Case(1.7, false))); + EXPECT_TRUE((Case(1.1, true))); + EXPECT_TRUE((Case(1.3, true))); + EXPECT_TRUE((Case(1.7, true))); + + // NaN and infinity convert cleanly + EXPECT_TRUE( + (Case(std::numeric_limits::quiet_NaN(), true))); + EXPECT_TRUE( + (Case(std::numeric_limits::infinity(), true))); + EXPECT_TRUE( + (Case(std::numeric_limits::quiet_NaN(), true))); + EXPECT_TRUE( + (Case(std::numeric_limits::infinity(), true))); +} + +TEST(LosslessConvertTest, LongDouble) { + // Detect precision loss. + EXPECT_TRUE((Case(10000001, true))); + EXPECT_TRUE((Case(100000001, false))); + EXPECT_TRUE((Case(1000000000000001, true))); + EXPECT_TRUE((Case(10000000000000001, false))); + + // Detect overflow + EXPECT_TRUE( + (Case(std::numeric_limits::max(), true))); + EXPECT_TRUE( + (Case(std::numeric_limits::lowest(), true))); + EXPECT_TRUE((Case(3.4e+39, false))); + EXPECT_TRUE((Case(-3.4e+39, false))); + EXPECT_TRUE( + (Case(std::numeric_limits::max(), true))); + EXPECT_TRUE( + (Case(std::numeric_limits::lowest(), true))); + + // Apparently `double` and `long double` are both 64 bits on k8 + if (std::numeric_limits::max() == + std::numeric_limits::max()) { + EXPECT_TRUE((Case( + std::numeric_limits::max(), true))); + EXPECT_TRUE((Case( + std::numeric_limits::lowest(), true))); + } else { + EXPECT_TRUE((Case( + std::numeric_limits::max(), false))); + EXPECT_TRUE((Case( + std::numeric_limits::lowest(), false))); + } + + // Binary fractions convert cleanly. 
+ EXPECT_TRUE((Case(1.0, true))); + EXPECT_TRUE((Case(1.25, true))); + EXPECT_TRUE((Case(1.5, true))); + EXPECT_TRUE((Case(1.0, true))); + EXPECT_TRUE((Case(1.25, true))); + EXPECT_TRUE((Case(1.5, true))); + + // Non-binary fractions don't narrow well, but widening is okay. + EXPECT_TRUE((Case(1.1, false))); + EXPECT_TRUE((Case(1.3, false))); + EXPECT_TRUE((Case(1.7, false))); + EXPECT_TRUE((Case(1.1, true))); + EXPECT_TRUE((Case(1.3, true))); + EXPECT_TRUE((Case(1.7, true))); + + // NaN and infinity convert cleanly + EXPECT_TRUE(( + Case(std::numeric_limits::quiet_NaN(), true))); + EXPECT_TRUE( + (Case(std::numeric_limits::infinity(), true))); + EXPECT_TRUE((Case( + std::numeric_limits::quiet_NaN(), true))); + EXPECT_TRUE((Case( + std::numeric_limits::infinity(), true))); +} + +} // namespace diff --git a/zetasql/base/map_view.h b/zetasql/base/map_view.h new file mode 100644 index 000000000..d2940814d --- /dev/null +++ b/zetasql/base/map_view.h @@ -0,0 +1,162 @@ +// +// Copyright 2024 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#ifndef THIRD_PARTY_ZETASQL_ZETASQL_BASE_MAP_VIEW_H_ +#define THIRD_PARTY_ZETASQL_ZETASQL_BASE_MAP_VIEW_H_ + +#include +#include +#include + +#include "absl/base/attributes.h" +#include "zetasql/base/associative_view_internal.h" + +namespace zetasql_base { + +// A type that is provided to `zetasql_base::MapView` as `ExtraLookupTypes` +// parameter in order to add lookup overloads. 
+// +// Example: +// +// using WithThreadIdLookup = +// zetasql_base::AlsoSupportsLookupWith; template using ThreadMapView = zetasql_base::MapView; +// +// std::map threads = ...; +// ThreadMapView thread_map_view = threads; +// +// std::thread::id id = ...; +// auto it = thread_map_view.find(id); // Didn't have to create a thread! +// +// By default, `zetasql_base::MapView` sets ExtraLookupTypes to +// `zetasql_base::AlsoSupportsLookupWith<>` except for `std::string` and +// `absl::Cord` for which it is +// `zetasql_base::AlsoSupportsLookupWith` allowing you to +// take advantage of sets that support heterogeneous lookup. +using ::zetasql_base:: + AlsoSupportsLookupWith; // NOLINT(misc-unused-using-decls) + +// MapView is a type-erased, read-only view for associative containers. This +// class supports a useful subset of operations that are found in most ordered +// and unordered maps, namely find(), begin(), and end(), and it expects +// underlying maps to support the same operations. +// +// MapView requires the underlying map to have unique keys. This is enforced by +// requiring the map to provide operator[] or at(). +// +// MapView does not take ownership of the underlying container. Callers must +// ensure that containers outlive any MapViews that point to them. +// +// The overhead of MapView should be substantially lower than a deep copy, which +// makes it useful for handling different map types when the alternatives (such +// as using a template or a specific type of map) are cumbersome or impossible. 
+// +// To write a function that can accept std::map, std::unordered_map, or +// absl::flat_hash_map as inputs, you can use MapView as a function argument: +// +// void MyFunction(MapView things_by_name); +// +// You can invoke MyFunction with any compatible associative container: +// +// absl::flat_hash_map things_by_name = ...; +// MyFunction(things_by_name); +// +// Or an initializer list: +// +// MyFunction({{"a", thing_a}, {"b", thing_b}}); +// +// Note that MapView does not work with maps that define value_type +// differently from `std::pair`. +template > +class MapView + : public internal_associative_view::MapViewBase { + using Base = typename MapView::MapViewBase; + struct Compare; + template + using ViewEnabler = typename Base::template ViewEnabler; + + public: + using size_type = typename Base::size_type; + using iterator = typename Base::iterator; + using key_type = typename Base::key_type; + using value_type = typename Base::value_type; + using reference = typename Base::reference; + using mapped_type = typename Base::mapped_type; + + // A default constructed MapView behaves as if it is wrapping an empty + // container. + constexpr MapView() : Base() {} + + // Constructs a MapView that wraps the given container. Behavior is + // undefined if the container is resized after the view is constructed. The + // resulting view must not outlive the container. + template > + constexpr MapView( // NOLINT(google-explicit-constructor) + const C& c ABSL_ATTRIBUTE_LIFETIME_BOUND) + : Base(c) { + static_assert(internal_associative_view::MapTypeHasUniqueKeys(), + "MapView requires maps with unique keys."); + } + + // Constructs a MapView that wraps an initializer_list. This constructor is + // O(1) in release builds and performs no allocations or copies, but find() + // may have poor runtime characteristics for large lists. + // + // Beware of dangling references: `init` binds to temporaries (and + // initializer_list is a view itself). 
+ // + // void foo(zetasql_base::MapView view); + // + // // Okay, the temporary `init` outlives the view. + // foo({{1, 1}, {2, 2}}); + // + // // Not okay, the temporary `init` is destroyed on this very line. + // auto view = zetasql_base::MapView({{1, 1}, {2, 2}}); + constexpr MapView( // NOLINT(google-explicit-constructor) + const std::initializer_list& init + ABSL_ATTRIBUTE_LIFETIME_BOUND) + : Base(init, Compare{}) { + assert(internal_associative_view::HasUniqueKeys(init)); + } + + // Lookup methods are overloaded: they support `key_type` and all extra types + // provided via `ExtraLookupTypes`. + using Base::contains; + using Base::find; + + using Base::at; + using Base::begin; + using Base::empty; + using Base::end; + using Base::size; + + private: + struct Compare { + using is_transparent = void; + + template + bool operator()(const value_type& v, const T& k) const { + return v.first == k; + } + }; +}; + +} // namespace zetasql_base + +#endif // THIRD_PARTY_ZETASQL_ZETASQL_BASE_MAP_VIEW_H_ diff --git a/zetasql/base/map_view_test.cc b/zetasql/base/map_view_test.cc new file mode 100644 index 000000000..9ff6e21ed --- /dev/null +++ b/zetasql/base/map_view_test.cc @@ -0,0 +1,474 @@ +// +// Copyright 2024 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
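The MapView documentation above centers on uniform read-only access plus optional heterogeneous lookup. The standard-library mechanism underneath that lookup support — a transparent comparator — can be shown standalone (helper name hypothetical):

```cpp
#include <functional>
#include <map>
#include <string>
#include <string_view>

// With std::less<> as the comparator, std::map::find accepts any type
// comparable with the key -- here std::string_view -- without building a
// temporary std::string. This is the kind of container facility that
// MapView's ExtraLookupTypes parameter surfaces.
inline int LookupOrMinusOne(const std::map<std::string, int, std::less<>>& m,
                            std::string_view key) {
  auto it = m.find(key);  // no std::string temporary constructed
  return it == m.end() ? -1 : it->second;
}
```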
+// + +#include "zetasql/base/map_view.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "gmock/gmock.h" +#include "gtest/gtest.h" +#include "absl/algorithm/container.h" +#include "absl/base/attributes.h" +#include "absl/container/flat_hash_map.h" +#include "absl/random/random.h" +#include "absl/strings/cord.h" +#include "absl/strings/str_cat.h" +#include "absl/strings/string_view.h" + +namespace zetasql_base { +namespace { + +using ::testing::Eq; +using ::testing::Pair; +using ::testing::UnorderedElementsAre; + +// A pointer based iterator that allows you to add a member. This is used to +// change the size of the iterator, or make it non-trivial. +template +class IteratorWithMember { + public: + IteratorWithMember(const V* it) // NOLINT(google-explicit-constructor) + : it_(it) {} + + const V& operator*() const { return *it_; } + + const V* operator->() const { return it_; } + + IteratorWithMember& operator++() { + ++it_; + return *this; + } + + friend bool operator==(const IteratorWithMember& a, + const IteratorWithMember& b) { + return a.it_ == b.it_; + } + + friend bool operator!=(const IteratorWithMember& a, + const IteratorWithMember& b) { + return a.it_ != b.it_; + } + + private: + const V* it_; + Member m_; +}; + +// sizeof(BigIterator) should be large enough to prevent MapView::iterator from +// using inline storage +template +using BigIterator = IteratorWithMember>; + +struct NonTrivialObject { + std::vector s = {"just", "some", "random", "strings"}; +}; + +// NonTrivialIterator requires MapView to correctly copy and destroy a +// non-trivial object. +template +using NonTrivialIterator = IteratorWithMember; + +// A simple, partial implementation of a map that works with the above +// iterators. 
+template *> +class SlowFlatMap { + public: + using key_type = K; + using mapped_type = V; + using value_type = std::pair; + using const_iterator = It; + using size_type = std::size_t; + + const_iterator begin() const { return container_.data(); } + const_iterator end() const { return container_.data() + container_.size(); } + + const_iterator find(const key_type& k) const { + auto it = begin(); + while (it != end() && it->first != k) ++it; + return it; + } + + template + const_iterator emplace(Args&&... args) { + return &container_.emplace_back(std::forward(args)...); + } + + size_type size() const { return container_.size(); } + + const mapped_type& at(const key_type& k) const { return find(k)->second; } + + private: + using Container = std::vector; + Container container_; +}; + +// Returns a deterministic key of type K. +template +constexpr K MakeKey(int i) { + return i; +} + +template <> +std::string MakeKey(int i) { + return absl::StrCat("Key: ", i); +} + +// Returns a deterministic value of type V. +template +constexpr V MakeValue(int i) { + return i + 20; +} + +template <> +std::string MakeValue(int i) { + return absl::StrCat("Value: ", i); +} + +// Returns a deterministic map of type M and the given size. 
+template +Map MakeMap(int size) { + Map result; + for (int i = 0; i < size; ++i) { + result.emplace(MakeKey(i), + MakeValue(i)); + } + return result; +} + +template +struct FilledContainerFactory { + C make() { return MakeMap(10); } +}; + +template +struct FilledContainerFactory>> { + std::initializer_list> make() { + static constexpr std::initializer_list> kInit = { + {MakeKey(0), MakeValue(0)}, {MakeKey(1), MakeValue(1)}, + {MakeKey(2), MakeValue(2)}, {MakeKey(3), MakeValue(3)}, + {MakeKey(4), MakeValue(4)}, {MakeKey(5), MakeValue(5)}, + {MakeKey(6), MakeValue(6)}, {MakeKey(7), MakeValue(7)}, + {MakeKey(8), MakeValue(8)}, {MakeKey(9), MakeValue(9)}, + }; + return kInit; + } +}; + +template +MapView MakeMapView( + const Map& m) { + return m; +} + +template +MapView MakeMapView( + const std::initializer_list>& m) { + return m; +} + +template +using ViewType = decltype(MakeMapView(std::declval())); + +template +class MapViewTest : public ::testing::Test {}; + +using MapTypes = ::testing::Types< + std::initializer_list>, // + std::map, // + absl::flat_hash_map, // + std::unordered_map, // + std::map, // + absl::flat_hash_map, // + std::unordered_map, // + SlowFlatMap, // + SlowFlatMap, // + SlowFlatMap>>, // + SlowFlatMap>> // + >; + +TEST(MapViewTest, InitializerList) { + [](const MapView& mv) { + EXPECT_EQ(2, mv.size()); + EXPECT_EQ(2, mv.find(1)->second); + EXPECT_EQ(3, mv.find(2)->second); + EXPECT_TRUE(mv.contains(2)); + EXPECT_THAT(mv, UnorderedElementsAre(Pair(1, 2), Pair(2, 3))); + }({{1, 2}, {2, 3}}); + + // Repeated keys should fail. 
+ EXPECT_DEBUG_DEATH((MapView({{1, 1}, {1, 1}})), "HasUniqueKeys"); +} + +void Overloaded(const MapView&) {} +void Overloaded(const MapView&) {} +void Overloaded(const MapView&) {} + +TYPED_TEST_SUITE(MapViewTest, MapTypes); + +TYPED_TEST(MapViewTest, HandlesOverloadSets) { + TypeParam m = {}; + using MV = ViewType; + EXPECT_TRUE((std::is_convertible::value)); + EXPECT_FALSE( + (std::is_convertible>::value)); + Overloaded(m); // compiles :) +} + +TYPED_TEST(MapViewTest, DefaultConstructed) { + using MV = ViewType; + MV mv; + EXPECT_TRUE(mv.empty()); + EXPECT_EQ(0, mv.size()); + EXPECT_TRUE(mv.begin() == mv.end()); +} + +TYPED_TEST(MapViewTest, Empty) { + TypeParam m = {}; + auto mv = MakeMapView(m); + EXPECT_TRUE(mv.empty()); + EXPECT_EQ(0, mv.size()); + EXPECT_TRUE(mv.begin() == mv.end()); +} + +TYPED_TEST(MapViewTest, FindSomething) { + auto m = FilledContainerFactory().make(); + auto mv = MakeMapView(m); + EXPECT_EQ(m.begin()->second, mv.find(m.begin()->first)->second); + EXPECT_TRUE(mv.contains(m.begin()->first)); +} + +TYPED_TEST(MapViewTest, FindNothing) { + auto m = FilledContainerFactory().make(); + auto mv = MakeMapView(m); + auto key = MakeKey::key_type>(-1); + EXPECT_TRUE(mv.find(key) == mv.end()); + EXPECT_FALSE(mv.contains(key)); +} + +TYPED_TEST(MapViewTest, IteratorPreIncrement) { + auto m = FilledContainerFactory().make(); + auto mv = MakeMapView(m); + for (auto it = mv.begin(); it != mv.end();) { + auto x = ++it; + EXPECT_EQ(x, it); + } +} + +TYPED_TEST(MapViewTest, IteratorPostIncrement) { + auto m = FilledContainerFactory().make(); + auto mv = MakeMapView(m); + for (auto it = mv.begin(); it != mv.end();) { + auto x = it++; + EXPECT_NE(x, it); + ++x; + EXPECT_EQ(x, it); + } +} + +TYPED_TEST(MapViewTest, IteratorCopy) { + auto m = FilledContainerFactory().make(); + auto mv = MakeMapView(m); + for (auto it = mv.begin(); it != mv.end();) { + auto it_copy = it; + EXPECT_TRUE(it_copy == it); + EXPECT_TRUE(&*it_copy == &*it); + it = it_copy; + 
EXPECT_TRUE(it_copy == it); + EXPECT_TRUE(&*it_copy == &*it); + ++it; + ++it_copy; + EXPECT_TRUE(it_copy == it); + if (it != mv.end()) { + EXPECT_TRUE(&*it_copy == &*it); + } + } +} + +TYPED_TEST(MapViewTest, At) { + auto m = FilledContainerFactory().make(); + auto mv = MakeMapView(m); + for (const auto& p : m) { + EXPECT_EQ(&p.second, &mv.at(p.first)); + } +} + +// Extra "BaseTest" layer is needed to support std::initializer_list (it cannot +// be stored as a member variable). +template +class MapViewStringLookupBaseTest : public ::testing::Test { + using View = MapView; + + protected: + template + void ExpectFind(View view, const LookupKey& k) { + EXPECT_NE(view.find(k), view.end()); + } + template + void ExpectContains(View view, const LookupKey& k) { + EXPECT_TRUE(view.contains(k)); + } + template + void ExpectAt(View view, const LookupKey& k, ValueMatcher&& matcher) { + ASSERT_TRUE(view.contains(k)); + EXPECT_THAT(view.at(k), std::forward(matcher)); + } +}; + +template +class MapViewStringLookupTest : public MapViewStringLookupBaseTest { + public: + MapViewStringLookupTest() { + map_.emplace("a", 1); + map_.emplace("b", 2); + map_.emplace("c", 3); + } + + template + void TestFind(const LookupKey& k) { + this->ExpectFind(map_, k); + } + template + void TestContains(const LookupKey& k) { + this->ExpectContains(map_, k); + } + template + void TestAt(const LookupKey& k, ValueMatcher&& matcher) { + this->ExpectAt(map_, k, std::forward(matcher)); + } + + protected: + Map map_; +}; + +template +class MapViewStringLookupTest>> + : public MapViewStringLookupBaseTest< + std::initializer_list>, K, V> { + public: + template + void TestFind(const LookupKey& k) { + this->ExpectFind({{K("a"), 1}, {K("b"), 2}, {K("c"), 3}}, k); + } + template + void TestContains(const LookupKey& k) { + this->ExpectContains({{K("a"), 1}, {K("b"), 2}, {K("c"), 3}}, k); + } + template + void TestAt(const LookupKey& k, ValueMatcher&& matcher) { + this->ExpectAt({{K("a"), 1}, {K("b"), 2}, {K("c"), 
3}}, k, + std::forward(matcher)); + } +}; + +using StringMaps = testing::Types< + // Supports heterogeneous lookup. + absl::flat_hash_map, // + absl::flat_hash_map, // + absl::flat_hash_map, // + // Does not support heterogeneous lookup. + std::map, // + std::map, // + std::map, // + // initializer lists are special + std::initializer_list>, // + std::initializer_list>, // + std::initializer_list> // + >; + +TYPED_TEST_SUITE(MapViewStringLookupTest, StringMaps); + +TYPED_TEST(MapViewStringLookupTest, LiteralString) { + this->TestFind("b"); + this->TestContains("b"); + this->TestAt("b", Eq(2)); +} + +TYPED_TEST(MapViewStringLookupTest, CString) { + char buff[256] = "b"; + const char* raw = buff; + this->TestFind(raw); + this->TestContains(raw); + this->TestAt(raw, Eq(2)); + buff[0] = 'w'; +} + +TYPED_TEST(MapViewStringLookupTest, String) { + this->TestFind(std::string("b")); + this->TestContains(std::string("b")); + this->TestAt(std::string("b"), Eq(2)); +} + +TYPED_TEST(MapViewStringLookupTest, StringView) { + this->TestFind(absl::string_view("b")); + this->TestContains(absl::string_view("b")); + this->TestAt(absl::string_view("b"), Eq(2)); +} + +TEST(MapView, InitListWithWrongType) { + EXPECT_TRUE( + (std::is_constructible_v< + zetasql_base::MapView, + std::initializer_list>>)); + + // Missing const on key type. + EXPECT_FALSE((std::is_constructible_v< + zetasql_base::MapView, + std::initializer_list>>)); + + // Wrong key type. + EXPECT_FALSE( + (std::is_constructible_v< + zetasql_base::MapView, + std::initializer_list>>)); + + // Wrong mapped type. + EXPECT_FALSE((std::is_constructible_v< + zetasql_base::MapView, + std::initializer_list>>)); +} + +template +ABSL_ATTRIBUTE_NOINLINE bool Contains(const M& m, const K& k) { + return m.find(k) != m.end(); +} + +// Makes keys to use for lookup benchmarks of the given size. The result will be +// larger than the specified size in order to trigger a few misses. 
+template +std::vector MakeKeysForBenchmark(int64_t size) { + int64_t num_misses = std::max(1, size * 0.1); + std::vector keys; + keys.reserve(size + num_misses); + for (int64_t i = 0; i < size + num_misses; ++i) { + keys.push_back(MakeKey(i)); + } + absl::BitGen rng; + std::shuffle(keys.begin(), keys.end(), rng); + return keys; +} + +} // namespace +} // namespace zetasql_base diff --git a/zetasql/base/mathlimits_test.cc b/zetasql/base/mathlimits_test.cc index 8582c790e..a5bd0709c 100644 --- a/zetasql/base/mathlimits_test.cc +++ b/zetasql/base/mathlimits_test.cc @@ -22,7 +22,6 @@ #include #include "gtest/gtest.h" -#include #include "zetasql/base/logging.h" #include "zetasql/base/mathutil.h" diff --git a/zetasql/base/requires.h b/zetasql/base/requires.h new file mode 100644 index 000000000..d40ba3b20 --- /dev/null +++ b/zetasql/base/requires.h @@ -0,0 +1,63 @@ +// +// Copyright 2024 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#ifndef THIRD_PARTY_ZETASQL_ZETASQL_BASE_REQUIRES_H_ +#define THIRD_PARTY_ZETASQL_ZETASQL_BASE_REQUIRES_H_ + +#include + +namespace zetasql_base { + +// C++17 port of the C++20 `requires` expressions. +// It allows easy inline test of properties of types in template code. 
+// https://en.cppreference.com/w/cpp/language/constraints#Requires_expressions +// +// Example usage: +// +// if constexpr (Requires([](auto&& x) -> decltype(x.foo()) {})) { +// // T has foo() +// return t.foo(); +// } else if constexpr (Requires([](auto&& x) -> decltype(Bar(x)) {})) { +// // Can call Bar with T +// return Bar(t); +// } else if constexpr (Requires( +// // Can test expression with multiple inputs +// [](auto&& x, auto&& y) -> decltype(x + y) {})) { +// return t + t2; +// } +// +// The `Requires` function takes a list of types and a generic lambda where all +// arguments are of type `auto&&`. The lambda is never actually invoked and the +// body must be empty. +// When used this way, `Requires` returns whether the expression inside +// `decltype` is well-formed, when the lambda parameters have the types that +// are specified by the corresponding template arguments. +// +// NOTE: C++17 does not allow lambdas in template parameters, which means that +// code like the following is _not_ valid in C++17: +// +// template ( +// [] (auto&& v) -> decltype() {})>> +// +template +constexpr bool Requires(F) { + return std::is_invocable_v; +} + +} // namespace zetasql_base + +#endif // THIRD_PARTY_ZETASQL_ZETASQL_BASE_REQUIRES_H_ diff --git a/zetasql/base/requires_test.cc b/zetasql/base/requires_test.cc new file mode 100644 index 000000000..1fceec2a8 --- /dev/null +++ b/zetasql/base/requires_test.cc @@ -0,0 +1,67 @@ +// +// Copyright 2024 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and +// limitations under the License. +// + +#include "zetasql/base/requires.h" + +#include +#include + +#include "gtest/gtest.h" + +namespace { + +TEST(RequiresTest, SimpleLambdasWork) { + static_assert(zetasql_base::Requires([] {})); + static_assert(zetasql_base::Requires([](auto&&) {})); + static_assert(zetasql_base::Requires([](auto&&, auto&&) {})); +} + +template +inline constexpr bool has_cstr = + zetasql_base::Requires([](auto&& x) -> decltype(x.c_str()) {}); + +template +inline constexpr bool have_plus = + zetasql_base::Requires([](auto&& x, auto&& y) -> decltype(x + y) {}); + +TEST(RequiresTest, CanTestProperties) { + static_assert(has_cstr); + static_assert(!has_cstr>); + + static_assert(have_plus); + static_assert(have_plus); + static_assert(!have_plus); +} + +TEST(RequiresTest, WorksWithUnmovableTypes) { + struct S { + S(const S&) = delete; + int foo() { return 0; } + }; + static_assert( + zetasql_base::Requires([](auto&& x) -> decltype(x.foo()) {})); + static_assert( + !zetasql_base::Requires([](auto&& x) -> decltype(x.bar()) {})); +} + +TEST(RequiresTest, WorksWithArrays) { + static_assert( + zetasql_base::Requires([](auto&& x) -> decltype(x[1]) {})); + static_assert( + !zetasql_base::Requires([](auto&& x) -> decltype(-x) {})); +} + +} // namespace diff --git a/zetasql/base/string_numbers.cc b/zetasql/base/string_numbers.cc index 79804228a..fee1fc2a2 100644 --- a/zetasql/base/string_numbers.cc +++ b/zetasql/base/string_numbers.cc @@ -24,9 +24,12 @@ #include // for DBL_DIG and FLT_DIG #include #include +#include +#include #include +#include -#include +#include "zetasql/base/check.h" #include "absl/strings/ascii.h" #include "absl/strings/numbers.h" #include "absl/strings/string_view.h" diff --git a/zetasql/base/string_numbers.h b/zetasql/base/string_numbers.h index c61b477af..4fd71d396 100644 --- a/zetasql/base/string_numbers.h +++ b/zetasql/base/string_numbers.h @@ -21,8 +21,9 @@ // 
numeric values. #include - #include +#include + #include "absl/strings/ascii.h" #include "absl/strings/string_view.h" diff --git a/zetasql/base/string_numbers_test.cc b/zetasql/base/string_numbers_test.cc index a4c958231..63a0464d7 100644 --- a/zetasql/base/string_numbers_test.cc +++ b/zetasql/base/string_numbers_test.cc @@ -29,6 +29,7 @@ #include "gmock/gmock.h" #include "gtest/gtest.h" #include "absl/strings/str_cat.h" +#include "absl/strings/string_view.h" namespace { diff --git a/zetasql/base/time_proto_util.cc b/zetasql/base/time_proto_util.cc index e3d4dc64c..550aba7e3 100644 --- a/zetasql/base/time_proto_util.cc +++ b/zetasql/base/time_proto_util.cc @@ -16,7 +16,6 @@ #include "zetasql/base/time_proto_util.h" #include "google/protobuf/timestamp.pb.h" -#include #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/str_cat.h" diff --git a/zetasql/base/time_proto_util_test.cc b/zetasql/base/time_proto_util_test.cc index 747bbf457..8d12a67e5 100644 --- a/zetasql/base/time_proto_util_test.cc +++ b/zetasql/base/time_proto_util_test.cc @@ -21,7 +21,6 @@ #include "google/protobuf/util/message_differencer.h" #include "gmock/gmock.h" #include "gtest/gtest.h" -#include #include "absl/time/time.h" #include "zetasql/base/testing/status_matchers.h" diff --git a/zetasql/common/BUILD b/zetasql/common/BUILD index a21b7b91f..e605ec434 100644 --- a/zetasql/common/BUILD +++ b/zetasql/common/BUILD @@ -117,23 +117,6 @@ cc_test( ], ) -cc_library( - name = "int_ops_util", - hdrs = ["int_ops_util.h"], - deps = [ - ], -) - -cc_test( - name = "int_ops_util_test", - size = "small", - srcs = ["int_ops_util_test.cc"], - deps = [ - ":int_ops_util", - "//zetasql/base/testing:zetasql_gtest_main", - ], -) - cc_library( name = "string_util", hdrs = ["string_util.h"], @@ -400,10 +383,14 @@ cc_test( cc_library( name = "builtin_function_internal", srcs = [ + "builtin_enum_type.cc", "builtin_function_array.cc", + "builtin_function_differential_privacy.cc", + 
"builtin_function_distance.cc", "builtin_function_internal_1.cc", "builtin_function_internal_2.cc", "builtin_function_internal_3.cc", + "builtin_function_map.cc", "builtin_function_range.cc", ], hdrs = ["builtin_function_internal.h"], @@ -416,6 +403,7 @@ cc_library( "//zetasql/base:status", "//zetasql/proto:anon_output_with_report_cc_proto", "//zetasql/proto:options_cc_proto", + "//zetasql/public:analyzer_options", "//zetasql/public:anon_function", "//zetasql/public:builtin_function_cc_proto", "//zetasql/public:builtin_function_options", diff --git a/zetasql/common/aggregate_null_handling_test.cc b/zetasql/common/aggregate_null_handling_test.cc index bca6abb5b..7c4b00e35 100644 --- a/zetasql/common/aggregate_null_handling_test.cc +++ b/zetasql/common/aggregate_null_handling_test.cc @@ -60,6 +60,7 @@ class AggregateNullHandlingTest : public ::testing::Test { ->GetAs(); return agg_scan->aggregate_list() .front() + ->GetAs() ->expr() ->GetAs(); } diff --git a/zetasql/common/builtin_enum_type.cc b/zetasql/common/builtin_enum_type.cc new file mode 100644 index 000000000..3b8c3bbc5 --- /dev/null +++ b/zetasql/common/builtin_enum_type.cc @@ -0,0 +1,37 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// + +#include "zetasql/common/builtin_function_internal.h" +#include "zetasql/public/builtin_function_options.h" +#include "zetasql/public/types/type.h" +#include "zetasql/public/types/type_factory.h" +#include "absl/status/status.h" +#include "zetasql/base/status_macros.h" + +namespace zetasql { + +absl::Status GetStandaloneBuiltinEnumTypes( + TypeFactory* type_factory, const ZetaSQLBuiltinFunctionOptions& options, + NameToTypeMap* types) { + if (options.language_options.LanguageFeatureEnabled( + FEATURE_V_1_4_MULTIWAY_UNNEST)) { + const Type* array_zip_mode_type = types::ArrayZipModeEnumType(); + ZETASQL_RETURN_IF_ERROR(InsertType(types, options, array_zip_mode_type)); + } + return absl::OkStatus(); +} + +} // namespace zetasql diff --git a/zetasql/common/builtin_function_array.cc b/zetasql/common/builtin_function_array.cc index db3232e84..e8a3f1950 100644 --- a/zetasql/common/builtin_function_array.cc +++ b/zetasql/common/builtin_function_array.cc @@ -1131,7 +1131,7 @@ static absl::StatusOr ArrayZipSignatureHasLambda( static absl::StatusOr ComputeArrayZipOutputType( Catalog* catalog, TypeFactory* type_factory, CycleDetector* cycle_detector, const FunctionSignature& signature, - const std::vector& arguments, + absl::Span arguments, const AnalyzerOptions& analyzer_options) { ZETASQL_ASSIGN_OR_RETURN(bool has_lambda, ArrayZipSignatureHasLambda(signature)); if (has_lambda) { @@ -1526,7 +1526,9 @@ static void AddArrayZipModeLambdaSignatures( END) )sql"; signatures.push_back(FunctionSignatureOnHeap( - ARG_ARRAY_TYPE_ANY_3, + FunctionArgumentType( + ARG_ARRAY_TYPE_ANY_3, + FunctionArgumentTypeOptions().set_uses_array_element_for_collation()), {input_array_1, input_array_2, two_array_transformation, array_zip_mode_arg}, FN_ARRAY_ZIP_TWO_ARRAY_LAMBDA, @@ -1595,7 +1597,9 @@ static void AddArrayZipModeLambdaSignatures( END) )sql"; signatures.push_back(FunctionSignatureOnHeap( - ARG_ARRAY_TYPE_ANY_4, + FunctionArgumentType( + ARG_ARRAY_TYPE_ANY_4, + 
FunctionArgumentTypeOptions().set_uses_array_element_for_collation()), {input_array_1, input_array_2, input_array_3, three_array_transformation, array_zip_mode_arg}, FN_ARRAY_ZIP_THREE_ARRAY_LAMBDA, @@ -1670,7 +1674,9 @@ static void AddArrayZipModeLambdaSignatures( END) )sql"; signatures.push_back(FunctionSignatureOnHeap( - ARG_ARRAY_TYPE_ANY_5, + FunctionArgumentType( + ARG_ARRAY_TYPE_ANY_5, + FunctionArgumentTypeOptions().set_uses_array_element_for_collation()), {input_array_1, input_array_2, input_array_3, input_array_4, four_array_transformation, array_zip_mode_arg}, FN_ARRAY_ZIP_FOUR_ARRAY_LAMBDA, diff --git a/zetasql/common/builtin_function_differential_privacy.cc b/zetasql/common/builtin_function_differential_privacy.cc new file mode 100644 index 000000000..715d649eb --- /dev/null +++ b/zetasql/common/builtin_function_differential_privacy.cc @@ -0,0 +1,1070 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// + +#include +#include +#include + +#include "zetasql/common/builtin_function_internal.h" +#include "zetasql/proto/anon_output_with_report.pb.h" +#include "zetasql/public/anon_function.h" +#include "zetasql/public/builtin_function_options.h" +#include "zetasql/public/function.h" +#include "zetasql/public/function_signature.h" +#include "zetasql/public/functions/differential_privacy.pb.h" +#include "zetasql/public/types/array_type.h" +#include "zetasql/public/types/struct_type.h" +#include "zetasql/public/types/type.h" +#include "zetasql/public/types/type_factory.h" +#include "absl/functional/bind_front.h" +#include "zetasql/base/check.h" +#include "absl/status/status.h" +#include "absl/strings/str_cat.h" +#include "absl/strings/str_join.h" +#include "absl/strings/string_view.h" +#include "absl/types/span.h" +#include "zetasql/base/status_macros.h" + +namespace zetasql { + +static std::string DPCountStarSQL(const std::vector& inputs) { + if (inputs.empty()) { + return "COUNT(*)"; + } + return absl::StrCat("COUNT(*, ", absl::StrJoin(inputs, ", "), ")"); +} + +static std::string SupportedSignaturesForDPCountStar( + const LanguageOptions& language_options, const Function& function) { + if (!language_options.LanguageFeatureEnabled(FEATURE_DIFFERENTIAL_PRIVACY)) { + return ""; + } + if (!language_options.LanguageFeatureEnabled( + FEATURE_DIFFERENTIAL_PRIVACY_REPORT_FUNCTIONS)) { + return "COUNT(* [, contribution_bounds_per_group => STRUCT])"; + } + return "COUNT(* [, contribution_bounds_per_group => STRUCT] " + "[, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT])"; +} + +void GetAnonFunctions(TypeFactory* type_factory, + const ZetaSQLBuiltinFunctionOptions& options, + NameToFunctionMap* functions) { + const Type* int64_type = type_factory->get_int64(); + const Type* uint64_type = type_factory->get_uint64(); + const Type* double_type = type_factory->get_double(); + const Type* numeric_type = type_factory->get_numeric(); + const Type* double_array_type = 
types::DoubleArrayType(); + const Type* json_type = types::JsonType(); + const Type* anon_output_with_report_proto_type = nullptr; + ZETASQL_CHECK_OK( + type_factory->MakeProtoType(zetasql::AnonOutputWithReport::descriptor(), + &anon_output_with_report_proto_type)); + const FunctionArgumentType::ArgumentCardinality OPTIONAL = + FunctionArgumentType::OPTIONAL; + + FunctionSignatureOptions has_numeric_type_argument; + has_numeric_type_argument.set_constraints(&CheckHasNumericTypeArgument); + + FunctionOptions anon_options = + FunctionOptions() + .set_supports_over_clause(false) + .set_supports_distinct_modifier(false) + .set_supports_having_modifier(false) + .set_supports_clamped_between_modifier(true) + .set_volatility(FunctionEnums::VOLATILE); + + const FunctionArgumentTypeOptions optional_const_arg_options = + FunctionArgumentTypeOptions() + .set_must_be_constant() + .set_must_be_non_null() + .set_cardinality(OPTIONAL); + // TODO: Replace these required CLAMPED BETWEEN arguments with + // optional_const_arg_options once Quantiles can support automatic/implicit + // bounds. + const FunctionArgumentTypeOptions required_const_arg_options = + FunctionArgumentTypeOptions() + .set_must_be_constant() + .set_must_be_non_null(); + const FunctionArgumentTypeOptions percentile_arg_options = + FunctionArgumentTypeOptions() + .set_must_be_constant() + .set_must_be_non_null() + .set_min_value(0) + .set_max_value(1); + const FunctionArgumentTypeOptions quantiles_arg_options = + FunctionArgumentTypeOptions() + .set_must_be_constant() + .set_must_be_non_null() + .set_min_value(1); + + // TODO: Fix this HACK - the CLAMPED BETWEEN lower and upper bounds + // are optional, as are the privacy_budget weight and uid. However, + // the syntax and spec allows privacy_budget_weight (and uid) to be specified + // but upper/lower bound to be unspecified, but that is not possible to + // represent in a ZetaSQL FunctionSignature. 
In the short term, the + // resolver will guarantee that the privacy_budget_weight and uid are not + // specified if the CLAMP are not, but longer term we must remove the + // privacy_budget_weight and uid arguments as per the updated ZetaSQL + // privacy language spec. + InsertCreatedFunction( + functions, options, + new AnonFunction( + "anon_count", Function::kZetaSQLFunctionGroupName, + {{int64_type, + {/*expr=*/ARG_TYPE_ANY_2, + /*lower_bound=*/{int64_type, optional_const_arg_options}, + /*upper_bound=*/{int64_type, optional_const_arg_options}}, + FN_ANON_COUNT}}, + anon_options, "count")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "anon_sum", Function::kZetaSQLFunctionGroupName, + {{int64_type, + {/*expr=*/int64_type, + /*lower_bound=*/{int64_type, optional_const_arg_options}, + /*upper_bound=*/{int64_type, optional_const_arg_options}}, + FN_ANON_SUM_INT64}, + {uint64_type, + {/*expr=*/uint64_type, + /*lower_bound=*/{uint64_type, optional_const_arg_options}, + /*upper_bound=*/{uint64_type, optional_const_arg_options}}, + FN_ANON_SUM_UINT64}, + {double_type, + {/*expr=*/double_type, + /*lower_bound=*/{double_type, optional_const_arg_options}, + /*upper_bound=*/{double_type, optional_const_arg_options}}, + FN_ANON_SUM_DOUBLE}, + {numeric_type, + {/*expr=*/numeric_type, + /*lower_bound=*/{numeric_type, optional_const_arg_options}, + /*upper_bound=*/{numeric_type, optional_const_arg_options}}, + FN_ANON_SUM_NUMERIC, + has_numeric_type_argument}}, + anon_options, "sum")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "anon_avg", Function::kZetaSQLFunctionGroupName, + {{double_type, + {/*expr=*/double_type, + /*lower_bound=*/{double_type, optional_const_arg_options}, + /*upper_bound=*/{double_type, optional_const_arg_options}}, + FN_ANON_AVG_DOUBLE}, + {numeric_type, + {/*expr=*/numeric_type, + /*lower_bound=*/{numeric_type, optional_const_arg_options}, + /*upper_bound=*/{numeric_type, 
optional_const_arg_options}}, + FN_ANON_AVG_NUMERIC, + has_numeric_type_argument}}, + anon_options, "avg")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$anon_count_star", Function::kZetaSQLFunctionGroupName, + {{int64_type, + {/*lower_bound=*/{int64_type, optional_const_arg_options}, + /*upper_bound=*/{int64_type, optional_const_arg_options}}, + FN_ANON_COUNT_STAR}}, + anon_options.Copy() + .set_sql_name("anon_count(*)") + .set_get_sql_callback(&AnonCountStarFunctionSQL) + .set_signature_text_callback( + &SignatureTextForAnonCountStarFunction) + .set_supported_signatures_callback( + &SupportedSignaturesForAnonCountStarFunction) + .set_bad_argument_error_prefix_callback( + &AnonCountStarBadArgumentErrorPrefix), + // TODO: internal function names shouldn't be resolvable, + // an alternative way to look up COUNT(*) will be needed to fix the + // linked bug. + "$count_star")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "anon_var_pop", Function::kZetaSQLFunctionGroupName, + {{double_type, + {/*expr=*/double_type, + /*lower_bound=*/{double_type, optional_const_arg_options}, + /*upper_bound=*/{double_type, optional_const_arg_options}}, + FN_ANON_VAR_POP_DOUBLE}, + {double_type, + {/*expr=*/double_array_type, + /*lower_bound=*/{double_type, optional_const_arg_options}, + /*upper_bound=*/{double_type, optional_const_arg_options}}, + FN_ANON_VAR_POP_DOUBLE_ARRAY, + FunctionSignatureOptions().set_is_internal(true)}}, + anon_options, "array_agg")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "anon_stddev_pop", Function::kZetaSQLFunctionGroupName, + {{double_type, + {/*expr=*/double_type, + /*lower_bound=*/{double_type, optional_const_arg_options}, + /*upper_bound=*/{double_type, optional_const_arg_options}}, + FN_ANON_STDDEV_POP_DOUBLE}, + {double_type, + {/*expr=*/double_array_type, + /*lower_bound=*/{double_type, optional_const_arg_options}, + /*upper_bound=*/{double_type, 
optional_const_arg_options}}, + FN_ANON_STDDEV_POP_DOUBLE_ARRAY, + FunctionSignatureOptions().set_is_internal(true)}}, + anon_options, "array_agg")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "anon_percentile_cont", Function::kZetaSQLFunctionGroupName, + {{double_type, + {/*expr=*/double_type, + /*percentile=*/{double_type, percentile_arg_options}, + /*lower_bound=*/{double_type, optional_const_arg_options}, + /*upper_bound=*/{double_type, optional_const_arg_options}}, + FN_ANON_PERCENTILE_CONT_DOUBLE}, + // This is an internal signature that is only used post-anon-rewrite, + // and is not available in the external SQL language. + {double_type, + {/*expr=*/double_array_type, + /*percentile=*/{double_type, percentile_arg_options}, + /*lower_bound=*/{double_type, optional_const_arg_options}, + /*upper_bound=*/{double_type, optional_const_arg_options}}, + FN_ANON_PERCENTILE_CONT_DOUBLE_ARRAY, + FunctionSignatureOptions().set_is_internal(true)}}, + anon_options, "array_agg")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "anon_quantiles", Function::kZetaSQLFunctionGroupName, + {{double_array_type, + {/*expr=*/double_type, + /*quantiles=*/{int64_type, quantiles_arg_options}, + /*lower_bound=*/{double_type, required_const_arg_options}, + /*upper_bound=*/{double_type, required_const_arg_options}}, + FN_ANON_QUANTILES_DOUBLE}, + // This is an internal signature that is only used post-anon-rewrite, + // and is not available in the external SQL language. 
+ {double_array_type, + {/*expr=*/double_array_type, + /*quantiles=*/{int64_type, quantiles_arg_options}, + /*lower_bound=*/{double_type, required_const_arg_options}, + /*upper_bound=*/{double_type, required_const_arg_options}}, + FN_ANON_QUANTILES_DOUBLE_ARRAY, + FunctionSignatureOptions().set_is_internal(true)}}, + anon_options, "array_agg")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$anon_quantiles_with_report_json", + Function::kZetaSQLFunctionGroupName, + {{json_type, + {/*expr=*/double_type, + /*quantiles=*/{int64_type, quantiles_arg_options}, + /*lower_bound=*/{double_type, required_const_arg_options}, + /*upper_bound=*/{double_type, required_const_arg_options}}, + FN_ANON_QUANTILES_DOUBLE_WITH_REPORT_JSON}, + // This is an internal signature that is only used post-anon-rewrite, + // and is not available in the external SQL language. + {json_type, + {/*expr=*/double_array_type, + /*quantiles=*/{int64_type, quantiles_arg_options}, + /*lower_bound=*/{double_type, required_const_arg_options}, + /*upper_bound=*/{double_type, required_const_arg_options}}, + FN_ANON_QUANTILES_DOUBLE_ARRAY_WITH_REPORT_JSON, + FunctionSignatureOptions().set_is_internal(true)}}, + anon_options.Copy() + .set_sql_name("anon_quantiles") + .set_get_sql_callback(&AnonQuantilesWithReportJsonFunctionSQL) + .set_signature_text_callback(absl::bind_front( + &SignatureTextForAnonQuantilesWithReportFunction, + /*report_format=*/"JSON")) + .set_supported_signatures_callback(absl::bind_front( + &SupportedSignaturesForAnonQuantilesWithReportFunction, + /*report_format=*/"JSON")), + "array_agg")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$anon_quantiles_with_report_proto", + Function::kZetaSQLFunctionGroupName, + {{anon_output_with_report_proto_type, + {/*expr=*/double_type, + /*quantiles=*/{int64_type, quantiles_arg_options}, + /*lower_bound=*/{double_type, required_const_arg_options}, + /*upper_bound=*/{double_type, 
required_const_arg_options}}, + FN_ANON_QUANTILES_DOUBLE_WITH_REPORT_PROTO}, + // This is an internal signature that is only used post-anon-rewrite, + // and is not available in the external SQL language. + {anon_output_with_report_proto_type, + {/*expr=*/double_array_type, + /*quantiles=*/{int64_type, quantiles_arg_options}, + /*lower_bound=*/{double_type, required_const_arg_options}, + /*upper_bound=*/{double_type, required_const_arg_options}}, + FN_ANON_QUANTILES_DOUBLE_ARRAY_WITH_REPORT_PROTO, + FunctionSignatureOptions().set_is_internal(true)}}, + anon_options.Copy() + .set_sql_name("anon_quantiles") + .set_get_sql_callback(&AnonQuantilesWithReportProtoFunctionSQL) + .set_signature_text_callback(absl::bind_front( + &SignatureTextForAnonQuantilesWithReportFunction, + /*report_format=*/"PROTO")) + .set_supported_signatures_callback(absl::bind_front( + &SupportedSignaturesForAnonQuantilesWithReportFunction, + /*report_format=*/"PROTO")), + "array_agg")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$anon_count_with_report_json", Function::kZetaSQLFunctionGroupName, + {{json_type, + {/*expr=*/ARG_TYPE_ANY_2, + /*lower_bound=*/{int64_type, optional_const_arg_options}, + /*upper_bound=*/{int64_type, optional_const_arg_options}}, + FN_ANON_COUNT_WITH_REPORT_JSON}}, + anon_options.Copy() + .set_sql_name("anon_count") + .set_get_sql_callback(&AnonCountWithReportJsonFunctionSQL), + "count")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$anon_count_with_report_proto", + Function::kZetaSQLFunctionGroupName, + {{anon_output_with_report_proto_type, + {/*expr=*/ARG_TYPE_ANY_2, + /*lower_bound=*/{int64_type, optional_const_arg_options}, + /*upper_bound=*/{int64_type, optional_const_arg_options}}, + FN_ANON_COUNT_WITH_REPORT_PROTO}}, + anon_options.Copy() + .set_sql_name("anon_count") + .set_get_sql_callback(&AnonCountWithReportProtoFunctionSQL), + "count")); + + InsertCreatedFunction( + functions, options, + new 
AnonFunction( + "$anon_count_star_with_report_json", + Function::kZetaSQLFunctionGroupName, + {{json_type, + {/*lower_bound=*/{int64_type, optional_const_arg_options}, + /*upper_bound=*/{int64_type, optional_const_arg_options}}, + FN_ANON_COUNT_STAR_WITH_REPORT_JSON}}, + anon_options.Copy() + .set_sql_name("anon_count(*)") + .set_get_sql_callback(&AnonCountStarWithReportJsonFunctionSQL) + .set_signature_text_callback(absl::bind_front( + &SignatureTextForAnonCountStarWithReportFunction, + /*report_format=*/"JSON")) + .set_supported_signatures_callback(absl::bind_front( + &SupportedSignaturesForAnonCountStarWithReportFunction, + /*report_format=*/"JSON")) + .set_bad_argument_error_prefix_callback( + &AnonCountStarBadArgumentErrorPrefix), + "$count_star")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$anon_count_star_with_report_proto", + Function::kZetaSQLFunctionGroupName, + {{anon_output_with_report_proto_type, + {/*lower_bound=*/{int64_type, optional_const_arg_options}, + /*upper_bound=*/{int64_type, optional_const_arg_options}}, + FN_ANON_COUNT_STAR_WITH_REPORT_PROTO}}, + anon_options.Copy() + .set_sql_name("anon_count(*)") + .set_get_sql_callback(&AnonCountStarWithReportProtoFunctionSQL) + .set_signature_text_callback(absl::bind_front( + &SignatureTextForAnonCountStarWithReportFunction, + /*report_format=*/"PROTO")) + .set_supported_signatures_callback(absl::bind_front( + &SupportedSignaturesForAnonCountStarWithReportFunction, + /*report_format=*/"PROTO")) + .set_bad_argument_error_prefix_callback( + &AnonCountStarBadArgumentErrorPrefix), + "$count_star")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$anon_sum_with_report_json", Function::kZetaSQLFunctionGroupName, + {{json_type, + {/*expr=*/int64_type, + /*lower_bound=*/{int64_type, optional_const_arg_options}, + /*upper_bound=*/{int64_type, optional_const_arg_options}}, + FN_ANON_SUM_WITH_REPORT_JSON_INT64}, + {json_type, + {/*expr=*/uint64_type, + 
/*lower_bound=*/{uint64_type, optional_const_arg_options}, + /*upper_bound=*/{uint64_type, optional_const_arg_options}}, + FN_ANON_SUM_WITH_REPORT_JSON_UINT64}, + {json_type, + {/*expr=*/double_type, + /*lower_bound=*/{double_type, optional_const_arg_options}, + /*upper_bound=*/{double_type, optional_const_arg_options}}, + FN_ANON_SUM_WITH_REPORT_JSON_DOUBLE}}, + anon_options.Copy() + .set_sql_name("anon_sum") + .set_get_sql_callback(&AnonSumWithReportJsonFunctionSQL), + "sum")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$anon_sum_with_report_proto", Function::kZetaSQLFunctionGroupName, + {{anon_output_with_report_proto_type, + {/*expr=*/int64_type, + /*lower_bound=*/{int64_type, optional_const_arg_options}, + /*upper_bound=*/{int64_type, optional_const_arg_options}}, + FN_ANON_SUM_WITH_REPORT_PROTO_INT64}, + {anon_output_with_report_proto_type, + {/*expr=*/uint64_type, + /*lower_bound=*/{uint64_type, optional_const_arg_options}, + /*upper_bound=*/{uint64_type, optional_const_arg_options}}, + FN_ANON_SUM_WITH_REPORT_PROTO_UINT64}, + {anon_output_with_report_proto_type, + {/*expr=*/double_type, + /*lower_bound=*/{double_type, optional_const_arg_options}, + /*upper_bound=*/{double_type, optional_const_arg_options}}, + FN_ANON_SUM_WITH_REPORT_PROTO_DOUBLE}}, + anon_options.Copy() + .set_sql_name("anon_sum") + .set_get_sql_callback(&AnonSumWithReportProtoFunctionSQL), + "sum")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$anon_avg_with_report_json", Function::kZetaSQLFunctionGroupName, + {{json_type, + {/*expr=*/double_type, + /*lower_bound=*/{double_type, optional_const_arg_options}, + /*upper_bound=*/{double_type, optional_const_arg_options}}, + FN_ANON_AVG_DOUBLE_WITH_REPORT_JSON}}, + anon_options.Copy() + .set_sql_name("anon_avg") + .set_get_sql_callback(&AnonAvgWithReportJsonFunctionSQL), + "avg")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$anon_avg_with_report_proto", 
Function::kZetaSQLFunctionGroupName, + {{anon_output_with_report_proto_type, + {/*expr=*/double_type, + /*lower_bound=*/{double_type, optional_const_arg_options}, + /*upper_bound=*/{double_type, optional_const_arg_options}}, + FN_ANON_AVG_DOUBLE_WITH_REPORT_PROTO}}, + anon_options.Copy() + .set_sql_name("anon_avg") + .set_get_sql_callback(&AnonAvgWithReportProtoFunctionSQL), + "avg")); +} + +absl::Status GetDifferentialPrivacyFunctions( + TypeFactory* type_factory, const ZetaSQLBuiltinFunctionOptions& options, + NameToFunctionMap* functions, NameToTypeMap* types) { + const Type* int64_type = type_factory->get_int64(); + const Type* uint64_type = type_factory->get_uint64(); + const Type* double_type = type_factory->get_double(); + const Type* numeric_type = type_factory->get_numeric(); + const Type* double_array_type = types::DoubleArrayType(); + const Type* json_type = types::JsonType(); + const Type* report_proto_type = nullptr; + ZETASQL_RETURN_IF_ERROR(type_factory->MakeProtoType( + functions::DifferentialPrivacyOutputWithReport::descriptor(), + &report_proto_type)); + const Type* report_format_type = + types::DifferentialPrivacyReportFormatEnumType(); + + ZETASQL_RETURN_IF_ERROR(InsertType(types, options, report_format_type)); + // Creates a pair of same types for contribution bounds. First field is lower + // bound and second is upper bound. Struct field name is omitted intentionally + // because the user syntax for struct constructor does not allow "AS + // field_name" and the struct is not usable in actual query. 
+  auto make_pair_type =
+      [&type_factory](const Type* t) -> absl::StatusOr<const Type*> {
+    const std::vector<StructType::StructField> pair_fields{{"", t}, {"", t}};
+    const Type* pair_type = nullptr;
+    ZETASQL_RETURN_IF_ERROR(type_factory->MakeStructType(pair_fields, &pair_type));
+    return pair_type;
+  };
+
+  ZETASQL_ASSIGN_OR_RETURN(const Type* int64_pair_type, make_pair_type(int64_type));
+  ZETASQL_ASSIGN_OR_RETURN(const Type* uint64_pair_type, make_pair_type(uint64_type));
+  ZETASQL_ASSIGN_OR_RETURN(const Type* double_pair_type, make_pair_type(double_type));
+  ZETASQL_ASSIGN_OR_RETURN(const Type* numeric_pair_type,
+                   make_pair_type(numeric_type));
+
+  FunctionSignatureOptions has_numeric_type_argument;
+  has_numeric_type_argument.set_constraints(&CheckHasNumericTypeArgument);
+
+  auto no_matching_signature_callback =
+      [](absl::string_view qualified_function_name,
+         absl::Span<const InputArgumentType> arguments,
+         ProductMode product_mode) {
+        return absl::StrCat(
+            "No matching signature for ", qualified_function_name,
+            " in SELECT WITH DIFFERENTIAL_PRIVACY context",
+            (arguments.empty()
+                 ?
" with no arguments" + : absl::StrCat(" for argument types: ", + InputArgumentType::ArgumentsToString( + arguments, product_mode)))); + }; + + const FunctionOptions dp_options = + FunctionOptions() + .set_supports_over_clause(false) + .set_supports_distinct_modifier(false) + .set_supports_having_modifier(false) + .set_volatility(FunctionEnums::VOLATILE) + .set_no_matching_signature_callback(no_matching_signature_callback) + .add_required_language_feature(FEATURE_DIFFERENTIAL_PRIVACY); + + const FunctionArgumentTypeOptions percentile_arg_options = + FunctionArgumentTypeOptions() + .set_must_be_constant() + .set_must_be_non_null() + .set_min_value(0) + .set_max_value(1); + + const FunctionArgumentTypeOptions quantiles_arg_options = + FunctionArgumentTypeOptions() + .set_must_be_constant() + .set_must_be_non_null() + .set_min_value(1) + .set_cardinality(FunctionEnums::REQUIRED); + + const FunctionArgumentTypeOptions + optional_contribution_bounds_per_group_arg_options = + FunctionArgumentTypeOptions() + .set_must_be_constant() + .set_argument_name("contribution_bounds_per_group", + FunctionEnums::NAMED_ONLY) + .set_cardinality(FunctionEnums::OPTIONAL); + + const FunctionArgumentTypeOptions + optional_contribution_bounds_per_row_arg_options = + FunctionArgumentTypeOptions() + .set_must_be_constant() + .set_argument_name("contribution_bounds_per_row", + FunctionEnums::NAMED_ONLY) + .set_cardinality(FunctionEnums::OPTIONAL); + + const FunctionArgumentTypeOptions + required_contribution_bounds_per_row_arg_options = + FunctionArgumentTypeOptions( + optional_contribution_bounds_per_row_arg_options) + .set_cardinality(FunctionEnums::REQUIRED); + + const FunctionArgumentTypeOptions report_arg_options = + FunctionArgumentTypeOptions().set_must_be_constant().set_argument_name( + "report_format", FunctionEnums::NAMED_ONLY); + // Creates a signature for DP function returning a report. 
This signature
+  // will only be matched if the argument at the 0-indexed
+  // `report_arg_position` has a constant value that is equal to
+  // `report_format`.
+  auto get_dp_report_signature =
+      [](functions::DifferentialPrivacyEnums::ReportFormat report_format,
+         int report_arg_position) {
+        auto dp_report_constraint =
+            [report_arg_position, report_format](
+                const FunctionSignature& concrete_signature,
+                absl::Span<const InputArgumentType> arguments) -> std::string {
+          if (arguments.size() <= report_arg_position) {
+            return absl::StrCat("at most ", report_arg_position,
+                                " argument(s) can be provided");
+          }
+          const Value* value =
+              arguments.at(report_arg_position).literal_value();
+          if (value == nullptr || !value->is_valid()) {
+            return absl::StrCat("literal value is required at ",
+                                report_arg_position + 1);
+          }
+          const Value expected_value = Value::Enum(
+              types::DifferentialPrivacyReportFormatEnumType(), report_format);
+          // If we encounter a string, we have to create an enum value from it
+          // to be able to compare against the expected enum value.
+          if (value->type()->IsString()) {
+            auto enum_value =
+                Value::Enum(types::DifferentialPrivacyReportFormatEnumType(),
+                            value->string_value());
+            if (!enum_value.is_valid()) {
+              return absl::StrCat("Invalid enum value: ",
+                                  value->string_value());
+            }
+            if (enum_value.Equals(expected_value)) {
+              return "";
+            }
+            return absl::StrCat(
+                "Found: ", enum_value.EnumDisplayName(),
+                " expecting: ", expected_value.EnumDisplayName());
+          }
+          if (value->Equals(expected_value)) {
+            return "";
+          }
+          return absl::StrCat("Found: ", value->EnumDisplayName(),
+                              " expecting: ", expected_value.EnumDisplayName());
+        };
+        return FunctionSignatureOptions()
+            .set_constraints(dp_report_constraint)
+            .add_required_language_feature(
+                FEATURE_DIFFERENTIAL_PRIVACY_REPORT_FUNCTIONS);
+      };
+
+  // TODO: internal function names shouldn't be resolvable;
+  // an alternative way to look up COUNT(*) will be needed to fix the
+  // linked bug.
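The constraint built by `get_dp_report_signature` is what lets two signatures with identical argument lists return different types (JSON vs. PROTO) depending on the constant value of the `report_format` argument: each signature carries a predicate, and resolution keeps only the signature whose predicate reports no mismatch. The mechanism can be sketched as a minimal standalone model (not the ZetaSQL API; `Overload` and `ResolveResultType` are illustrative names):

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// One overload of a function: a result type plus a constraint that inspects
// the constant value of the report_format argument, mirroring
// `dp_report_constraint` above. Returns "" on match, else a mismatch message.
struct Overload {
  std::string result_type;
  std::string expected_format;  // "JSON" or "PROTO"

  std::string Check(const std::vector<std::string>& args,
                    size_t report_arg_position) const {
    if (args.size() <= report_arg_position) {
      return "at most " + std::to_string(report_arg_position) +
             " argument(s) can be provided";
    }
    const std::string& value = args[report_arg_position];
    if (value == expected_format) return "";
    return "Found: " + value + " expecting: " + expected_format;
  }
};

// Keeps the first overload whose constraint matches; nullopt models the
// "no matching signature" error path.
std::optional<std::string> ResolveResultType(
    const std::vector<Overload>& overloads,
    const std::vector<std::string>& args, size_t report_arg_position) {
  for (const Overload& o : overloads) {
    if (o.Check(args, report_arg_position).empty()) return o.result_type;
  }
  return std::nullopt;
}
```

With overloads `{"JSON", "JSON"}` and `{"PROTO", "PROTO"}` and the format argument at position 1, `ResolveResultType` picks the JSON-returning overload for a `"JSON"` literal and the proto-returning one for `"PROTO"`, which is the same disambiguation the real constraint performs during signature matching.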
+
+  auto get_sql_callback_for_function = [](std::string_view user_facing_name) {
+    return [user_facing_name](const std::vector<std::string>& inputs) {
+      return absl::StrCat(user_facing_name, "(", absl::StrJoin(inputs, ", "),
+                          ")");
+    };
+  };
+
+  auto signature_text_function = [](const LanguageOptions& language_options,
+                                    const Function& function,
+                                    const FunctionSignature& signature) {
+    std::vector<std::string> argument_texts;
+    for (const FunctionArgumentType& argument : signature.arguments()) {
+      if (!argument.has_argument_name() ||
+          argument.argument_name() != "report_format") {
+        argument_texts.push_back(argument.UserFacingNameWithCardinality(
+            language_options.product_mode(),
+            FunctionArgumentType::NamePrintingStyle::kIfNamedOnly,
+            /*print_template_details=*/true));
+      } else {
+        // Includes the return result type, as the two signatures accept the
+        // same arguments and the result type is distinguished based on the
+        // value of the `report_format` argument.
+        const std::string report_suffix =
+            signature.result_type().type()->IsJsonType()
+                ? "/*with value \"JSON\"*/"
+                : (signature.result_type().type()->IsProto()
+                       ?
"/*with value \"PROTO\"*/" + : ""); + argument_texts.push_back(absl::StrCat( + argument.UserFacingNameWithCardinality( + language_options.product_mode(), + FunctionArgumentType::NamePrintingStyle::kIfNamedOnly, + /*print_template_details=*/true), + report_suffix)); + } + } + return absl::StrCat( + function.GetSQL(argument_texts), " -> ", + signature.result_type().UserFacingNameWithCardinality( + language_options.product_mode(), + FunctionArgumentType::NamePrintingStyle::kIfNamedOnly, + /*print_template_details=*/true)); + }; + + auto supported_signatures_function = + [&signature_text_function](const LanguageOptions& language_options, + const Function& function) { + std::string supported_signatures; + for (const FunctionSignature& signature : function.signatures()) { + if (signature.IsDeprecated() || signature.IsInternal() || + signature.HasUnsupportedType(language_options) || + !signature.options().check_all_required_features_are_enabled( + language_options.GetEnabledLanguageFeatures())) { + continue; + } + if (!supported_signatures.empty()) { + absl::StrAppend(&supported_signatures, "; "); + } + absl::StrAppend( + &supported_signatures, + signature_text_function(language_options, function, signature)); + } + return supported_signatures; + }; + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$differential_privacy_count", Function::kZetaSQLFunctionGroupName, + {{int64_type, + {/*expr=*/ARG_TYPE_ANY_2, + /*contribution_bounds_per_group=*/ + {int64_pair_type, + optional_contribution_bounds_per_group_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_COUNT}, + {json_type, + {/*expr=*/ARG_TYPE_ANY_2, + /*report_format=*/{report_format_type, report_arg_options}, + /*contribution_bounds_per_group=*/ + {int64_pair_type, + optional_contribution_bounds_per_group_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_COUNT_REPORT_JSON, + get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON, + 1)}, + {report_proto_type, + {/*expr=*/ARG_TYPE_ANY_2, + 
/*report_format=*/{report_format_type, report_arg_options}, + /*contribution_bounds_per_group=*/ + {int64_pair_type, + optional_contribution_bounds_per_group_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_COUNT_REPORT_PROTO, + get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, + 1)}}, + dp_options.Copy() + .set_get_sql_callback(get_sql_callback_for_function("COUNT")) + .set_signature_text_callback(signature_text_function) + .set_supported_signatures_callback(supported_signatures_function) + .set_sql_name("count"), + "count")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$differential_privacy_count_star", + Function::kZetaSQLFunctionGroupName, + {{int64_type, + {/*contribution_bounds_per_group=*/{ + int64_pair_type, + optional_contribution_bounds_per_group_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_COUNT_STAR}, + {json_type, + {/*report_format=*/{report_format_type, report_arg_options}, + /*contribution_bounds_per_group=*/ + {int64_pair_type, + optional_contribution_bounds_per_group_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_COUNT_STAR_REPORT_JSON, + get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON, + 0)}, + {report_proto_type, + {/*report_format=*/{report_format_type, report_arg_options}, + /*contribution_bounds_per_group=*/ + {int64_pair_type, + optional_contribution_bounds_per_group_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_COUNT_STAR_REPORT_PROTO, + get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, + 0)}}, + dp_options.Copy() + .set_get_sql_callback(&DPCountStarSQL) + // TODO: Fix this callback, which returns only one + // signature for a function with 3 signatures. 
+            .set_supported_signatures_callback(
+                &SupportedSignaturesForDPCountStar)
+            .set_sql_name("count(*)"),
+        "$count_star"));
+
+  std::vector<FunctionSignatureOnHeap> args;
+  InsertCreatedFunction(
+      functions, options,
+      new AnonFunction(
+          "$differential_privacy_sum", Function::kZetaSQLFunctionGroupName,
+          {{int64_type,
+            {/*expr=*/int64_type,
+             /*contribution_bounds_per_group=*/
+             {int64_pair_type,
+              optional_contribution_bounds_per_group_arg_options}},
+            FN_DIFFERENTIAL_PRIVACY_SUM_INT64},
+           {uint64_type,
+            {/*expr=*/uint64_type,
+             /*contribution_bounds_per_group=*/
+             {uint64_pair_type,
+              optional_contribution_bounds_per_group_arg_options}},
+            FN_DIFFERENTIAL_PRIVACY_SUM_UINT64},
+           {double_type,
+            {/*expr=*/double_type,
+             /*contribution_bounds_per_group=*/
+             {double_pair_type,
+              optional_contribution_bounds_per_group_arg_options}},
+            FN_DIFFERENTIAL_PRIVACY_SUM_DOUBLE},
+           {numeric_type,
+            {/*expr=*/numeric_type,
+             /*contribution_bounds_per_group=*/
+             {numeric_pair_type,
+              optional_contribution_bounds_per_group_arg_options}},
+            FN_DIFFERENTIAL_PRIVACY_SUM_NUMERIC,
+            has_numeric_type_argument},
+           {json_type,
+            {/*expr=*/int64_type,
+             /*report_format=*/{report_format_type, report_arg_options},
+             /*contribution_bounds_per_group=*/
+             {int64_pair_type,
+              optional_contribution_bounds_per_group_arg_options}},
+            FN_DIFFERENTIAL_PRIVACY_SUM_REPORT_JSON_INT64,
+            get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON,
+                                    1)},
+           {json_type,
+            {/*expr=*/double_type,
+             /*report_format=*/{report_format_type, report_arg_options},
+             /*contribution_bounds_per_group=*/
+             {double_pair_type,
+              optional_contribution_bounds_per_group_arg_options}},
+            FN_DIFFERENTIAL_PRIVACY_SUM_REPORT_JSON_DOUBLE,
+            get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON,
+                                    1)},
+           {json_type,
+            {/*expr=*/uint64_type,
+             /*report_format=*/{report_format_type, report_arg_options},
+             /*contribution_bounds_per_group=*/
+             {uint64_pair_type,
+              optional_contribution_bounds_per_group_arg_options}},
FN_DIFFERENTIAL_PRIVACY_SUM_REPORT_JSON_UINT64, + get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON, + 1)}, + {report_proto_type, + {/*expr=*/int64_type, + /*report_format=*/{report_format_type, report_arg_options}, + /*contribution_bounds_per_group=*/ + {int64_pair_type, + optional_contribution_bounds_per_group_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_SUM_REPORT_PROTO_INT64, + get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, + 1)}, + {report_proto_type, + {/*expr=*/double_type, + /*report_format=*/{report_format_type, report_arg_options}, + /*contribution_bounds_per_group=*/ + {double_pair_type, + optional_contribution_bounds_per_group_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_SUM_REPORT_PROTO_DOUBLE, + get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, + 1)}, + {report_proto_type, + {/*expr=*/uint64_type, + /*report_format=*/{report_format_type, report_arg_options}, + /*contribution_bounds_per_group=*/ + {uint64_pair_type, + optional_contribution_bounds_per_group_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_SUM_REPORT_PROTO_UINT64, + get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, + 1)}}, + dp_options.Copy() + .set_get_sql_callback(get_sql_callback_for_function("SUM")) + .set_signature_text_callback(signature_text_function) + .set_supported_signatures_callback(supported_signatures_function) + .set_sql_name("sum"), + "sum")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$differential_privacy_avg", Function::kZetaSQLFunctionGroupName, + {{double_type, + {/*expr=*/double_type, + /*contribution_bounds_per_group=*/ + {double_pair_type, + optional_contribution_bounds_per_group_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_AVG_DOUBLE}, + {numeric_type, + {/*expr=*/numeric_type, + /*contribution_bounds_per_group=*/ + {numeric_pair_type, + optional_contribution_bounds_per_group_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_AVG_NUMERIC, + has_numeric_type_argument}, + {json_type, + 
{/*expr=*/double_type, + /*report_format=*/{report_format_type, report_arg_options}, + /*contribution_bounds_per_group=*/ + {double_pair_type, + optional_contribution_bounds_per_group_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_AVG_DOUBLE_REPORT_JSON, + get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON, + 1)}, + {report_proto_type, + {/*expr=*/double_type, + /*report_format=*/{report_format_type, report_arg_options}, + /*contribution_bounds_per_group=*/ + {double_pair_type, + optional_contribution_bounds_per_group_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_AVG_DOUBLE_REPORT_PROTO, + get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, + 1)}}, + dp_options.Copy() + .set_get_sql_callback(get_sql_callback_for_function("AVG")) + .set_signature_text_callback(signature_text_function) + .set_supported_signatures_callback(supported_signatures_function) + .set_sql_name("avg"), + "avg")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$differential_privacy_var_pop", + Function::kZetaSQLFunctionGroupName, + {{double_type, + {/*expr=*/double_type, + /*contribution_bounds_per_row=*/ + {double_pair_type, + optional_contribution_bounds_per_row_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_VAR_POP_DOUBLE}, + {double_type, + {/*expr=*/double_array_type, + /*contribution_bounds_per_row=*/ + {double_pair_type, + optional_contribution_bounds_per_row_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_VAR_POP_DOUBLE_ARRAY, + FunctionSignatureOptions().set_is_internal(true)}}, + dp_options.Copy() + .set_get_sql_callback(get_sql_callback_for_function("VAR_POP")) + .set_signature_text_callback(signature_text_function) + .set_supported_signatures_callback(supported_signatures_function) + .set_sql_name("var_pop"), + "array_agg")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$differential_privacy_stddev_pop", + Function::kZetaSQLFunctionGroupName, + {{double_type, + {/*expr=*/double_type, + /*contribution_bounds_per_row=*/ + 
{double_pair_type, + optional_contribution_bounds_per_row_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_STDDEV_POP_DOUBLE}, + {double_type, + {/*expr=*/double_array_type, + /*contribution_bounds_per_row=*/ + {double_pair_type, + optional_contribution_bounds_per_row_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_STDDEV_POP_DOUBLE_ARRAY, + FunctionSignatureOptions().set_is_internal(true)}}, + dp_options.Copy() + .set_get_sql_callback(get_sql_callback_for_function("STDDEV_POP")) + .set_signature_text_callback(signature_text_function) + .set_supported_signatures_callback(supported_signatures_function) + .set_sql_name("stddev_pop"), + "array_agg")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$differential_privacy_percentile_cont", + Function::kZetaSQLFunctionGroupName, + {{double_type, + {/*expr=*/double_type, + /*percentile=*/{double_type, percentile_arg_options}, + /*contribution_bounds_per_row=*/ + {double_pair_type, + optional_contribution_bounds_per_row_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_PERCENTILE_CONT_DOUBLE}, + // This is an internal signature that is only used post-dp-rewrite, + // and is not available in the external SQL language. 
+ {double_type, + {/*expr=*/double_array_type, + /*percentile=*/{double_type, percentile_arg_options}, + /*contribution_bounds_per_row=*/ + {double_pair_type, + optional_contribution_bounds_per_row_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_PERCENTILE_CONT_DOUBLE_ARRAY, + FunctionSignatureOptions().set_is_internal(true)}}, + dp_options.Copy() + .set_get_sql_callback( + get_sql_callback_for_function("PERCENTILE_CONT")) + .set_signature_text_callback(signature_text_function) + .set_supported_signatures_callback(supported_signatures_function) + .set_sql_name("percentile_cont"), + "array_agg")); + + InsertCreatedFunction( + functions, options, + new AnonFunction( + "$differential_privacy_approx_quantiles", + Function::kZetaSQLFunctionGroupName, + {{double_array_type, + {/*expr=*/double_type, + /*quantiles=*/{int64_type, quantiles_arg_options}, + /*contribution_bounds_per_row=*/ + {double_pair_type, + required_contribution_bounds_per_row_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_QUANTILES_DOUBLE}, + // This is an internal signature that is only used post-dp-rewrite, + // and is not available in the external SQL language. + {double_array_type, + {/*expr=*/double_array_type, + /*quantiles=*/{int64_type, quantiles_arg_options}, + /*contribution_bounds_per_row=*/ + {double_pair_type, + required_contribution_bounds_per_row_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_QUANTILES_DOUBLE_ARRAY, + FunctionSignatureOptions().set_is_internal(true)}, + {json_type, + {/*expr=*/double_type, + /*quantiles=*/{int64_type, quantiles_arg_options}, + /*report_format=*/{report_format_type, report_arg_options}, + /*contribution_bounds_per_row=*/ + {double_pair_type, + required_contribution_bounds_per_row_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_QUANTILES_DOUBLE_REPORT_JSON, + get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON, + 2)}, + // This is an internal signature that is only used post-dp-rewrite, + // and is not available in the external SQL language. 
+ {json_type, + {/*expr=*/double_array_type, + /*quantiles=*/{int64_type, quantiles_arg_options}, + /*report_format=*/{report_format_type, report_arg_options}, + /*contribution_bounds_per_row=*/ + {double_pair_type, + required_contribution_bounds_per_row_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_QUANTILES_DOUBLE_ARRAY_REPORT_JSON, + get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON, + 2) + .set_is_internal(true)}, + {report_proto_type, + {/*expr=*/double_type, + /*quantiles=*/{int64_type, quantiles_arg_options}, + /*report_format=*/{report_format_type, report_arg_options}, + /*contribution_bounds_per_row=*/ + {double_pair_type, + required_contribution_bounds_per_row_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_QUANTILES_DOUBLE_REPORT_PROTO, + get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, + 2)}, + // This is an internal signature that is only used post-dp-rewrite, + // and is not available in the external SQL language. + {report_proto_type, + {/*expr=*/double_array_type, + /*quantiles=*/{int64_type, quantiles_arg_options}, + /*report_format=*/{report_format_type, report_arg_options}, + /*contribution_bounds_per_row=*/ + {double_pair_type, + required_contribution_bounds_per_row_arg_options}}, + FN_DIFFERENTIAL_PRIVACY_QUANTILES_DOUBLE_ARRAY_REPORT_PROTO, + get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, + 2) + .set_is_internal(true)}}, + dp_options.Copy() + .set_get_sql_callback( + get_sql_callback_for_function("APPROX_QUANTILES")) + .set_signature_text_callback(signature_text_function) + .set_supported_signatures_callback(supported_signatures_function) + .set_sql_name("approx_quantiles"), + "array_agg")); + return absl::OkStatus(); +} + +} // namespace zetasql diff --git a/zetasql/common/builtin_function_distance.cc b/zetasql/common/builtin_function_distance.cc new file mode 100644 index 000000000..a4522871f --- /dev/null +++ b/zetasql/common/builtin_function_distance.cc @@ -0,0 +1,365 @@ +// +// Copyright 2019 
Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+
+#include <string>
+#include <vector>
+
+#include "zetasql/common/builtin_function_internal.h"
+#include "zetasql/public/builtin_function_options.h"
+#include "zetasql/public/function.h"
+#include "zetasql/public/function_signature.h"
+#include "zetasql/public/types/array_type.h"
+#include "zetasql/public/types/struct_type.h"
+#include "zetasql/public/types/type.h"
+#include "zetasql/public/types/type_factory.h"
+#include "absl/status/status.h"
+#include "absl/strings/string_view.h"
+#include "absl/strings/substitute.h"
+#include "zetasql/base/status_macros.h"
+
+namespace zetasql {
+
+// Creates a `FunctionSignatureOptions` that configures a SQL definition that
+// will be inlined by `REWRITE_BUILTIN_FUNCTION_INLINER`.
+static FunctionSignatureOptions SetDefinitionForInlining(absl::string_view sql,
+                                                         bool enabled = true) {
+  return FunctionSignatureOptions().set_rewrite_options(
+      FunctionSignatureRewriteOptions()
+          .set_enabled(enabled)
+          .set_rewriter(REWRITE_BUILTIN_FUNCTION_INLINER)
+          .set_sql(sql));
+}
+
+absl::Status GetDistanceFunctions(TypeFactory* type_factory,
+                                  const BuiltinFunctionOptions& options,
+                                  NameToFunctionMap* functions) {
+  std::vector<StructType::StructField> input_struct_fields_int64 = {
+      {"key", types::Int64Type()}, {"value", types::DoubleType()}};
+  const StructType* struct_int64 = nullptr;
+  ZETASQL_RETURN_IF_ERROR(
+      type_factory->MakeStructType({input_struct_fields_int64}, &struct_int64));
+  const ArrayType* array_struct_int64_key_type;
+  ZETASQL_RETURN_IF_ERROR(
+      type_factory->MakeArrayType(struct_int64, &array_struct_int64_key_type));
+
+  std::vector<StructType::StructField> input_struct_fields_string = {
+      {"key", types::StringType()}, {"value", types::DoubleType()}};
+  const StructType* struct_string = nullptr;
+  ZETASQL_RETURN_IF_ERROR(type_factory->MakeStructType({input_struct_fields_string},
+                                               &struct_string));
+  const ArrayType* array_struct_string_key_type;
+  ZETASQL_RETURN_IF_ERROR(type_factory->MakeArrayType(struct_string,
+                                              &array_struct_string_key_type));
+
+  FunctionOptions function_options;
+  std::vector<FunctionSignatureOnHeap> cosine_signatures = {
+      {types::DoubleType(),
+       {types::DoubleArrayType(), types::DoubleArrayType()},
+       FN_COSINE_DISTANCE_DENSE_DOUBLE},
+      {types::DoubleType(),
+       {array_struct_int64_key_type, array_struct_int64_key_type},
+       FN_COSINE_DISTANCE_SPARSE_INT64},
+      {types::DoubleType(),
+       {array_struct_string_key_type, array_struct_string_key_type},
+       FN_COSINE_DISTANCE_SPARSE_STRING}};
+
+  if (options.language_options.LanguageFeatureEnabled(
+          FEATURE_V_1_4_ENABLE_FLOAT_DISTANCE_FUNCTIONS)) {
+    cosine_signatures.push_back(
+        {types::DoubleType(),
+         {types::FloatArrayType(), types::FloatArrayType()},
+         FN_COSINE_DISTANCE_DENSE_FLOAT});
+  }
+
+  InsertFunction(functions, options,
"cosine_distance", Function::SCALAR,
+                 cosine_signatures, function_options);
+
+  FunctionArgumentType options_arg = FunctionArgumentType(
+      types::JsonType(),
+      FunctionArgumentTypeOptions(FunctionArgumentType::REQUIRED)
+          .set_argument_name("options", kNamedOnly));
+
+  std::vector<FunctionSignatureOnHeap> approx_cosine_signatures = {
+      {types::DoubleType(),
+       {types::DoubleArrayType(), types::DoubleArrayType()},
+       FN_APPROX_COSINE_DISTANCE_DOUBLE},
+      {types::DoubleType(),
+       {types::DoubleArrayType(), types::DoubleArrayType(), options_arg},
+       FN_APPROX_COSINE_DISTANCE_DOUBLE_WITH_OPTIONS},
+      {types::DoubleType(),
+       {types::FloatArrayType(), types::FloatArrayType()},
+       FN_APPROX_COSINE_DISTANCE_FLOAT},
+      {types::DoubleType(),
+       {types::FloatArrayType(), types::FloatArrayType(), options_arg},
+       FN_APPROX_COSINE_DISTANCE_FLOAT_WITH_OPTIONS}};
+
+  InsertFunction(functions, options, "approx_cosine_distance", Function::SCALAR,
+                 approx_cosine_signatures, /*function_options=*/{});
+
+  std::vector<FunctionSignatureOnHeap> euclidean_signatures = {
+      {types::DoubleType(),
+       {types::DoubleArrayType(), types::DoubleArrayType()},
+       FN_EUCLIDEAN_DISTANCE_DENSE_DOUBLE},
+      {types::DoubleType(),
+       {array_struct_int64_key_type, array_struct_int64_key_type},
+       FN_EUCLIDEAN_DISTANCE_SPARSE_INT64},
+      {types::DoubleType(),
+       {array_struct_string_key_type, array_struct_string_key_type},
+       FN_EUCLIDEAN_DISTANCE_SPARSE_STRING}};
+
+  if (options.language_options.LanguageFeatureEnabled(
+          FEATURE_V_1_4_ENABLE_FLOAT_DISTANCE_FUNCTIONS)) {
+    euclidean_signatures.push_back(
+        {types::DoubleType(),
+         {types::FloatArrayType(), types::FloatArrayType()},
+         FN_EUCLIDEAN_DISTANCE_DENSE_FLOAT});
+  }
+
+  InsertFunction(functions, options, "euclidean_distance", Function::SCALAR,
+                 euclidean_signatures, function_options);
+
+  std::vector<FunctionSignatureOnHeap> approx_euclidean_signatures = {
+      {types::DoubleType(),
+       {types::DoubleArrayType(), types::DoubleArrayType()},
+       FN_APPROX_EUCLIDEAN_DISTANCE_DOUBLE},
+      {types::DoubleType(),
+       {types::DoubleArrayType(),
types::DoubleArrayType(), options_arg}, + FN_APPROX_EUCLIDEAN_DISTANCE_DOUBLE_WITH_OPTIONS}, + {types::DoubleType(), + {types::FloatArrayType(), types::FloatArrayType()}, + FN_APPROX_EUCLIDEAN_DISTANCE_FLOAT}, + {types::DoubleType(), + {types::FloatArrayType(), types::FloatArrayType(), options_arg}, + FN_APPROX_EUCLIDEAN_DISTANCE_FLOAT_WITH_OPTIONS}}; + + InsertFunction(functions, options, "approx_euclidean_distance", + Function::SCALAR, approx_euclidean_signatures, + /*function_options=*/{}); + + // Lambdas for a common error message amongst function rewriters. + auto null_element_err_msg_base = [](absl::string_view name) { + return absl::Substitute( + "Cannot compute $0 with a NULL element, since it is unclear if NULLs " + "should be ignored, counted as a zero value, or another " + "interpretation.", + name); + }; + + // Lambda with argument-checking SQL common to distance function rewriters. + auto distance_fn_rewrite_sql = [&null_element_err_msg_base]( + absl::string_view name, + absl::string_view sql) { + return absl::Substitute(R"sql( + CASE + WHEN input_array_1 IS NULL OR input_array_2 IS NULL + THEN NULL + WHEN ARRAY_LENGTH(input_array_1) = 0 AND ARRAY_LENGTH(input_array_2) = 0 + THEN CAST(0 AS FLOAT64) + WHEN ARRAY_LENGTH(input_array_1) != ARRAY_LENGTH(input_array_2) + THEN ERROR(FORMAT( + "Array arguments to %s must have equal length. The given arrays have lengths of %d and %d", + "$0", ARRAY_LENGTH(input_array_1), ARRAY_LENGTH(input_array_2))) + ELSE + $1 + WHERE + IF(e1 IS NULL, ERROR(FORMAT( + "%s The NULL element was found in the first array argument at OFFSET %d", + "$2", index)), TRUE) AND + IF(input_array_2[OFFSET(index)] IS NULL, ERROR(FORMAT( + "%s The NULL element was found in the second array argument at OFFSET %d", + "$2", index)), TRUE)) + END + )sql", + name, sql, null_element_err_msg_base(name)); + }; + + // Lambda for defining named arguments for distance function rewriters. 
+
+  auto distance_fn_named_arg = [](const Type* arg_type,
+                                  absl::string_view name) {
+    return FunctionArgumentType(
+        arg_type,
+        FunctionArgumentTypeOptions().set_argument_name(name, kPositionalOnly));
+  };
+
+  // Use a Rewriter for DOT_PRODUCT.
+  std::string dot_product_sql = distance_fn_rewrite_sql("DOT_PRODUCT", R"sql(
+    (SELECT
+      SUM(
+        CAST(e1 AS FLOAT64) *
+        CAST(input_array_2[OFFSET(index)] AS FLOAT64))
+    FROM UNNEST(input_array_1) AS e1 WITH OFFSET index
+  )sql");
+
+  FunctionSignatureOptions dot_product_signature_options =
+      SetDefinitionForInlining(dot_product_sql, true)
+          .add_required_language_feature(FEATURE_V_1_4_DOT_PRODUCT);
+
+  std::vector<FunctionSignatureOnHeap> dot_product_signatures = {
+      {types::DoubleType(),
+       {distance_fn_named_arg(types::Int64ArrayType(), "input_array_1"),
+        distance_fn_named_arg(types::Int64ArrayType(), "input_array_2")},
+       FN_DOT_PRODUCT_INT64,
+       dot_product_signature_options},
+      {types::DoubleType(),
+       {distance_fn_named_arg(types::FloatArrayType(), "input_array_1"),
+        distance_fn_named_arg(types::FloatArrayType(), "input_array_2")},
+       FN_DOT_PRODUCT_FLOAT,
+       dot_product_signature_options},
+      {types::DoubleType(),
+       {distance_fn_named_arg(types::DoubleArrayType(), "input_array_1"),
+        distance_fn_named_arg(types::DoubleArrayType(), "input_array_2")},
+       FN_DOT_PRODUCT_DOUBLE,
+       dot_product_signature_options}};
+
+  InsertFunction(functions, options, "dot_product", Function::SCALAR,
+                 dot_product_signatures, function_options);
+
+  std::vector<FunctionSignatureOnHeap> approx_dot_product_signatures = {
+      {types::DoubleType(),
+       {types::Int64ArrayType(), types::Int64ArrayType()},
+       FN_APPROX_DOT_PRODUCT_INT64},
+      {types::DoubleType(),
+       {types::Int64ArrayType(), types::Int64ArrayType(), options_arg},
+       FN_APPROX_DOT_PRODUCT_INT64_WITH_OPTIONS},
+      {types::DoubleType(),
+       {types::FloatArrayType(), types::FloatArrayType()},
+       FN_APPROX_DOT_PRODUCT_FLOAT},
+      {types::DoubleType(),
+       {types::FloatArrayType(), types::FloatArrayType(), options_arg},
FN_APPROX_DOT_PRODUCT_FLOAT_WITH_OPTIONS},
+      {types::DoubleType(),
+       {types::DoubleArrayType(), types::DoubleArrayType()},
+       FN_APPROX_DOT_PRODUCT_DOUBLE},
+      {types::DoubleType(),
+       {types::DoubleArrayType(), types::DoubleArrayType(), options_arg},
+       FN_APPROX_DOT_PRODUCT_DOUBLE_WITH_OPTIONS}};
+
+  InsertFunction(functions, options, "approx_dot_product", Function::SCALAR,
+                 approx_dot_product_signatures, /*function_options=*/{});
+
+  // Use a Rewriter for MANHATTAN_DISTANCE.
+  std::string manhattan_distance_sql =
+      distance_fn_rewrite_sql("MANHATTAN_DISTANCE", R"sql(
+    (SELECT
+      SUM(ABS(
+        CAST(e1 AS FLOAT64) -
+        CAST(input_array_2[OFFSET(index)] AS FLOAT64)))
+    FROM UNNEST(input_array_1) AS e1 WITH OFFSET index
+  )sql");
+
+  FunctionSignatureOptions manhattan_distance_signature_options =
+      SetDefinitionForInlining(manhattan_distance_sql, true)
+          .add_required_language_feature(FEATURE_V_1_4_MANHATTAN_DISTANCE);
+
+  std::vector<FunctionSignatureOnHeap> manhattan_distance_signatures = {
+      {types::DoubleType(),
+       {distance_fn_named_arg(types::Int64ArrayType(), "input_array_1"),
+        distance_fn_named_arg(types::Int64ArrayType(), "input_array_2")},
+       FN_MANHATTAN_DISTANCE_INT64,
+       manhattan_distance_signature_options},
+      {types::DoubleType(),
+       {distance_fn_named_arg(types::FloatArrayType(), "input_array_1"),
+        distance_fn_named_arg(types::FloatArrayType(), "input_array_2")},
+       FN_MANHATTAN_DISTANCE_FLOAT,
+       manhattan_distance_signature_options},
+      {types::DoubleType(),
+       {distance_fn_named_arg(types::DoubleArrayType(), "input_array_1"),
+        distance_fn_named_arg(types::DoubleArrayType(), "input_array_2")},
+       FN_MANHATTAN_DISTANCE_DOUBLE,
+       manhattan_distance_signature_options}};
+
+  InsertFunction(functions, options, "manhattan_distance", Function::SCALAR,
+                 manhattan_distance_signatures, function_options);
+
+  // Lambda with argument-checking SQL common to norm function rewriters.
+ auto norm_fn_rewrite_sql = [&null_element_err_msg_base]( + absl::string_view name, + absl::string_view sql) { + return absl::Substitute(R"sql( + CASE + WHEN input_array IS NULL + THEN NULL + WHEN ARRAY_LENGTH(input_array) = 0 + THEN CAST(0 AS FLOAT64) + ELSE + $0 + WHERE + IF(e IS NULL, ERROR(FORMAT( + "%s The NULL element was found in the array argument at OFFSET %d", + "$1", index)), TRUE)) + END + )sql", + sql, null_element_err_msg_base(name)); + }; + + // Use a Rewriter for L1_NORM. + std::string l1_norm_sql = norm_fn_rewrite_sql("L1_NORM", R"sql( + (SELECT SUM(ABS(CAST(e AS FLOAT64))) + FROM UNNEST(input_array) AS e WITH OFFSET index + )sql"); + + FunctionSignatureOptions l1_norm_signature_options = + SetDefinitionForInlining(l1_norm_sql, true) + .add_required_language_feature(FEATURE_V_1_4_L1_NORM); + + std::vector l1_norm_signatures = { + {types::DoubleType(), + {distance_fn_named_arg(types::Int64ArrayType(), "input_array")}, + FN_L1_NORM_INT64, + l1_norm_signature_options}, + {types::DoubleType(), + {distance_fn_named_arg(types::FloatArrayType(), "input_array")}, + FN_L1_NORM_FLOAT, + l1_norm_signature_options}, + {types::DoubleType(), + {distance_fn_named_arg(types::DoubleArrayType(), "input_array")}, + FN_L1_NORM_DOUBLE, + l1_norm_signature_options}}; + + InsertFunction(functions, options, "l1_norm", Function::SCALAR, + l1_norm_signatures, function_options); + + // Use a Rewriter for L2_NORM. 
+ std::string l2_norm_sql = norm_fn_rewrite_sql("L2_NORM", R"sql( + (SELECT SQRT(SUM(CAST(e AS FLOAT64) * CAST(e AS FLOAT64))) + FROM UNNEST(input_array) AS e WITH OFFSET index + )sql"); + + FunctionSignatureOptions l2_norm_signature_options = + SetDefinitionForInlining(l2_norm_sql, true) + .add_required_language_feature(FEATURE_V_1_4_L2_NORM); + + std::vector l2_norm_signatures = { + {types::DoubleType(), + {distance_fn_named_arg(types::Int64ArrayType(), "input_array")}, + FN_L2_NORM_INT64, + l2_norm_signature_options}, + {types::DoubleType(), + {distance_fn_named_arg(types::FloatArrayType(), "input_array")}, + FN_L2_NORM_FLOAT, + l2_norm_signature_options}, + {types::DoubleType(), + {distance_fn_named_arg(types::DoubleArrayType(), "input_array")}, + FN_L2_NORM_DOUBLE, + l2_norm_signature_options}}; + + InsertFunction(functions, options, "l2_norm", Function::SCALAR, + l2_norm_signatures, function_options); + + return absl::OkStatus(); +} + +} // namespace zetasql diff --git a/zetasql/common/builtin_function_internal.h b/zetasql/common/builtin_function_internal.h index 1b0c37650..cac1a656c 100644 --- a/zetasql/common/builtin_function_internal.h +++ b/zetasql/common/builtin_function_internal.h @@ -165,14 +165,25 @@ std::string CountStarFunctionSQL(const std::vector& inputs); std::string AnonCountStarFunctionSQL(const std::vector& inputs); +std::string SignatureTextForAnonCountStarFunction( + const LanguageOptions& language_options, const Function& function, + const FunctionSignature& signature); + std::string SupportedSignaturesForAnonCountStarFunction( - const std::string& unused_function_name, const LanguageOptions& language_options, const Function& function); +std::string SignatureTextForAnonCountStarWithReportFunction( + const std::string& report_format, const LanguageOptions& language_options, + const Function& function, const FunctionSignature& signature); + std::string SupportedSignaturesForAnonCountStarWithReportFunction( const std::string& report_format, 
const LanguageOptions& language_options, const Function& function); +std::string SignatureTextForAnonQuantilesWithReportFunction( + const std::string& report_format, const LanguageOptions& language_options, + const Function& function, const FunctionSignature& signature); + std::string SupportedSignaturesForAnonQuantilesWithReportFunction( const std::string& report_format, const LanguageOptions& language_options, const Function& function); @@ -213,8 +224,12 @@ std::string InListFunctionSQL(const std::vector& inputs); std::string LikeAnyFunctionSQL(const std::vector& inputs); +std::string NotLikeAnyFunctionSQL(const std::vector& inputs); + std::string LikeAllFunctionSQL(const std::vector& inputs); +std::string NotLikeAllFunctionSQL(const std::vector& inputs); + std::string CaseWithValueFunctionSQL(const std::vector& inputs); std::string CaseNoValueFunctionSQL(const std::vector& inputs); @@ -223,8 +238,12 @@ std::string InArrayFunctionSQL(const std::vector& inputs); std::string LikeAnyArrayFunctionSQL(const std::vector& inputs); +std::string NotLikeAnyArrayFunctionSQL(const std::vector& inputs); + std::string LikeAllArrayFunctionSQL(const std::vector& inputs); +std::string NotLikeAllArrayFunctionSQL(const std::vector& inputs); + std::string ArrayAtOffsetFunctionSQL(const std::vector& inputs); std::string ArrayAtOrdinalFunctionSQL(const std::vector& inputs); @@ -384,9 +403,6 @@ std::string NoMatchingSignatureForSubscript( absl::string_view offset_or_ordinal, absl::string_view operator_name, const std::vector& arguments, ProductMode product_mode); -std::string EmptySupportedSignatures(const LanguageOptions& language_options, - const Function& function); - absl::Status CheckArgumentsSupportEquality( const std::string& comparison_name, const FunctionSignature& /*signature*/, @@ -722,6 +738,10 @@ absl::Status GetArrayZipFunctions( TypeFactory* type_factory, const ZetaSQLBuiltinFunctionOptions& options, NameToFunctionMap* functions, NameToTypeMap* types); +absl::Status 
GetStandaloneBuiltinEnumTypes( + TypeFactory* type_factory, const ZetaSQLBuiltinFunctionOptions& options, + NameToTypeMap* types); + void GetSubscriptFunctions(TypeFactory* type_factory, const ZetaSQLBuiltinFunctionOptions& options, NameToFunctionMap* functions); @@ -792,6 +812,14 @@ void GetRangeFunctions(TypeFactory* type_factory, const ZetaSQLBuiltinFunctionOptions& options, NameToFunctionMap* functions); +void GetElementWiseAggregationFunctions( + TypeFactory* type_factory, const ZetaSQLBuiltinFunctionOptions& options, + NameToFunctionMap* functions); + +void GetMapCoreFunctions(TypeFactory* type_factory, + const ZetaSQLBuiltinFunctionOptions& options, + NameToFunctionMap* functions); + } // namespace zetasql #endif // ZETASQL_COMMON_BUILTIN_FUNCTION_INTERNAL_H_ diff --git a/zetasql/common/builtin_function_internal_1.cc b/zetasql/common/builtin_function_internal_1.cc index cdd6803fa..648a7a660 100644 --- a/zetasql/common/builtin_function_internal_1.cc +++ b/zetasql/common/builtin_function_internal_1.cc @@ -160,31 +160,61 @@ std::string AnonCountStarFunctionSQL(const std::vector& inputs) { : "", ")"); } +std::string SignatureTextForAnonCountStarFunction() { + return "ANON_COUNT(* [CLAMPED BETWEEN INT64 AND INT64])"; +} + +std::string SignatureTextForAnonCountStarFunction( + const LanguageOptions& language_options, const Function& function, + const FunctionSignature& signature) { + return SignatureTextForAnonCountStarFunction(); +} std::string SupportedSignaturesForAnonCountStarFunction( - const std::string& unused_function_name, const LanguageOptions& language_options, const Function& function) { - return "ANON_COUNT(* [CLAMPED BETWEEN INT64 AND INT64])"; + return SignatureTextForAnonCountStarFunction(); } -std::string SupportedSignaturesForAnonCountStarWithReportFunction( - const std::string& report_format, const LanguageOptions& language_options, - const Function& function) { +std::string SignatureTextForAnonCountStarWithReportFunction( + const std::string& 
report_format) { return absl::StrCat( "ANON_COUNT(* [CLAMPED BETWEEN INT64 AND INT64] WITH " "REPORT(FORMAT=", report_format, "))"); } -std::string SupportedSignaturesForAnonQuantilesWithReportFunction( +std::string SignatureTextForAnonCountStarWithReportFunction( + const std::string& report_format, const LanguageOptions& language_options, + const Function& function, const FunctionSignature& signature) { + return SignatureTextForAnonCountStarWithReportFunction(report_format); +} + +std::string SupportedSignaturesForAnonCountStarWithReportFunction( const std::string& report_format, const LanguageOptions& language_options, const Function& function) { + return SignatureTextForAnonCountStarWithReportFunction(report_format); +} + +std::string SignatureTextForAnonQuantilesWithReportFunction( + const std::string& report_format) { return absl::StrCat( "ANON_QUANTILES(DOUBLE, INT64 CLAMPED BETWEEN DOUBLE AND DOUBLE WITH " "REPORT(FORMAT=", report_format, "))"); } +std::string SupportedSignaturesForAnonQuantilesWithReportFunction( + const std::string& report_format, const LanguageOptions& language_options, + const Function& function) { + return SignatureTextForAnonQuantilesWithReportFunction(report_format); +} + +std::string SignatureTextForAnonQuantilesWithReportFunction( + const std::string& report_format, const LanguageOptions& language_options, + const Function& function, const FunctionSignature& signature) { + return SignatureTextForAnonQuantilesWithReportFunction(report_format); +} + std::string AnonSumWithReportJsonFunctionSQL( const std::vector& inputs) { ABSL_DCHECK(inputs.size() == 1 || inputs.size() == 3); @@ -310,12 +340,24 @@ std::string LikeAnyFunctionSQL(const std::vector& inputs) { return absl::StrCat(inputs[0], " LIKE ANY (", absl::StrJoin(like_list, ", "), ")"); } +std::string NotLikeAnyFunctionSQL(const std::vector& inputs) { + ABSL_DCHECK_GT(inputs.size(), 1); + std::vector like_list(inputs.begin() + 1, inputs.end()); + return absl::StrCat(inputs[0], " NOT 
LIKE ANY (", + absl::StrJoin(like_list, ", "), ")"); +} std::string LikeAllFunctionSQL(const std::vector& inputs) { ABSL_DCHECK_GT(inputs.size(), 1); std::vector like_list(inputs.begin() + 1, inputs.end()); return absl::StrCat(inputs[0], " LIKE ALL (", absl::StrJoin(like_list, ", "), ")"); } +std::string NotLikeAllFunctionSQL(const std::vector& inputs) { + ABSL_DCHECK_GT(inputs.size(), 1); + std::vector like_list(inputs.begin() + 1, inputs.end()); + return absl::StrCat(inputs[0], " NOT LIKE ALL (", + absl::StrJoin(like_list, ", "), ")"); +} std::string CaseWithValueFunctionSQL(const std::vector& inputs) { ABSL_DCHECK_GE(inputs.size(), 2); ABSL_DCHECK_EQ((inputs.size() - 2) % 2, 0); @@ -354,10 +396,18 @@ std::string LikeAnyArrayFunctionSQL(const std::vector& inputs) { ABSL_DCHECK_EQ(inputs.size(), 2); return absl::StrCat(inputs[0], " LIKE ANY UNNEST(", inputs[1], ")"); } +std::string NotLikeAnyArrayFunctionSQL(const std::vector& inputs) { + ABSL_DCHECK_EQ(inputs.size(), 2); + return absl::StrCat(inputs[0], " NOT LIKE ANY UNNEST(", inputs[1], ")"); +} std::string LikeAllArrayFunctionSQL(const std::vector& inputs) { ABSL_DCHECK_EQ(inputs.size(), 2); return absl::StrCat(inputs[0], " LIKE ALL UNNEST(", inputs[1], ")"); } +std::string NotLikeAllArrayFunctionSQL(const std::vector& inputs) { + ABSL_DCHECK_EQ(inputs.size(), 2); + return absl::StrCat(inputs[0], " NOT LIKE ALL UNNEST(", inputs[1], ")"); +} std::string ParenthesizedArrayFunctionSQL(const std::string& input) { if (std::find_if(input.begin(), input.end(), [](char c) { return c == '|'; }) == input.end()) { @@ -1301,11 +1351,6 @@ std::string NoMatchingSignatureForSubscript( return msg; } -std::string EmptySupportedSignatures(const LanguageOptions& language_options, - const Function& function) { - return std::string(); -} - absl::Status CheckArgumentsSupportEquality( const std::string& comparison_name, const FunctionSignature& signature, diff --git a/zetasql/common/builtin_function_internal_2.cc 
b/zetasql/common/builtin_function_internal_2.cc index 8c0957e27..cf1d76394 100644 --- a/zetasql/common/builtin_function_internal_2.cc +++ b/zetasql/common/builtin_function_internal_2.cc @@ -40,6 +40,7 @@ #include "absl/strings/str_cat.h" #include "absl/strings/str_format.h" #include "absl/strings/string_view.h" +#include "absl/types/span.h" #include "zetasql/base/status.h" #include "zetasql/base/status_macros.h" @@ -83,7 +84,7 @@ static bool AllArgumentsHaveType(const std::vector& arguments) { // is non-empty. template static std::string GetExtractFunctionSignatureString( - const std::string& explicit_datepart_name, + absl::string_view explicit_datepart_name, const std::vector& arguments, ProductMode product_mode, bool include_bracket) { if (arguments.empty()) { @@ -103,10 +104,6 @@ static std::string GetExtractFunctionSignatureString( // // ABSL_DCHECK validated - given the non-standard function call syntax for // EXTRACT, the parser enforces 2 or 3 arguments in the language. - if (arguments.size() != 2 && arguments.size() != 3) { - return absl::StrCat("Expected 2 or 3 arguments to EXTRACT, but found ", - arguments.size()); - } // Expected invariant - the 1th argument is the date part argument. 
ABSL_DCHECK(arguments[1].type()->Equivalent(types::DatePartEnumType())); @@ -142,12 +139,17 @@ static std::string GetExtractFunctionSignatureString( } static std::string NoMatchingSignatureForExtractFunction( - const std::string& explicit_datepart_name, + absl::string_view explicit_datepart_name, absl::string_view qualified_function_name, const std::vector& arguments, ProductMode product_mode) { if (arguments.empty()) { - return "No matching signature for function EXTRACT," - " at least 1 argument must be provided"; + return "No matching signature for function EXTRACT with no arguments"; + } + if (explicit_datepart_name.empty() && arguments.size() != 2 && + arguments.size() != 3) { + return absl::StrCat("No matching signature for function EXTRACT with ", + arguments.size(), " argument", + arguments.size() == 1 ? "" : "s"); } std::string msg = "No matching signature for function EXTRACT for argument types: "; @@ -157,6 +159,18 @@ static std::string NoMatchingSignatureForExtractFunction( return msg; } +static std::string ExtractSignatureText( + const std::string& explicit_datepart_name, + const LanguageOptions& language_options, const Function& function, + const FunctionSignature& signature) { + return absl::StrCat( + "EXTRACT(", + GetExtractFunctionSignatureString( + explicit_datepart_name, signature.arguments(), + language_options.product_mode(), true /* include_bracket */), + ")"); +} + static std::string ExtractSupportedSignatures( const std::string& explicit_datepart_name, const LanguageOptions& language_options, const Function& function) { @@ -173,11 +187,9 @@ static std::string ExtractSupportedSignatures( absl::StrAppend(&supported_signatures, "; "); } absl::StrAppend( - &supported_signatures, "EXTRACT(", - GetExtractFunctionSignatureString( - explicit_datepart_name, signature.arguments(), - language_options.product_mode(), true /* include_bracket */), - ")"); + &supported_signatures, + ExtractSignatureText(explicit_datepart_name, language_options, function, + 
signature)); } return supported_signatures; } @@ -219,6 +231,8 @@ void GetDatetimeExtractFunctions(TypeFactory* type_factory, .set_no_matching_signature_callback( absl::bind_front(&NoMatchingSignatureForExtractFunction, /*explicit_datepart_name=*/"")) + .set_signature_text_callback(absl::bind_front( + &ExtractSignatureText, /*explicit_datepart_name=*/"")) .set_supported_signatures_callback( absl::bind_front(&ExtractSupportedSignatures, /*explicit_datepart_name=*/"")) @@ -237,8 +251,10 @@ void GetDatetimeExtractFunctions(TypeFactory* type_factory, .set_sql_name("extract") .set_no_matching_signature_callback( absl::bind_front(&NoMatchingSignatureForExtractFunction, "DATE")) - .set_supported_signatures_callback( - absl::bind_front(&ExtractSupportedSignatures, "DATE")) + .set_signature_text_callback(absl::bind_front( + &ExtractSignatureText, /*explicit_datepart_name=*/"DATE")) + .set_supported_signatures_callback(absl::bind_front( + &ExtractSupportedSignatures, /*explicit_datepart_name=*/"DATE")) .set_get_sql_callback( absl::bind_front(ExtractDateOrTimeFunctionSQL, "DATE"))); @@ -255,6 +271,8 @@ void GetDatetimeExtractFunctions(TypeFactory* type_factory, .set_sql_name("extract") .set_no_matching_signature_callback( absl::bind_front(&NoMatchingSignatureForExtractFunction, "TIME")) + .set_signature_text_callback(absl::bind_front( + &ExtractSignatureText, /*explicit_datepart_name=*/"TIME")) .set_supported_signatures_callback( absl::bind_front(&ExtractSupportedSignatures, "TIME")) .set_get_sql_callback( @@ -272,6 +290,8 @@ void GetDatetimeExtractFunctions(TypeFactory* type_factory, .set_sql_name("extract") .set_no_matching_signature_callback(absl::bind_front( &NoMatchingSignatureForExtractFunction, "DATETIME")) + .set_signature_text_callback(absl::bind_front( + &ExtractSignatureText, /*explicit_datepart_name=*/"DATETIME")) .set_supported_signatures_callback( absl::bind_front(&ExtractSupportedSignatures, "DATETIME")) .set_get_sql_callback( @@ -530,7 +550,7 @@ void 
GetDatetimeCurrentFunctions(TypeFactory* type_factory, template std::string NoLiteralOrParameterString( const FunctionSignature& matched_signature, - const std::vector& arguments) { + absl::Span arguments) { for (int i = 0; i < arguments.size(); i++) { if (i != arg_index1 && i != arg_index2) { continue; @@ -1964,12 +1984,14 @@ absl::Status GetBooleanFunctions(TypeFactory* type_factory, InsertFunction( functions, options, "$equal", SCALAR, - {{bool_type, - {ARG_TYPE_ANY_1, ARG_TYPE_ANY_1}, - FN_EQUAL, - FunctionSignatureOptions().set_uses_operation_collation()}, - {bool_type, {int64_type, uint64_type}, FN_EQUAL_INT64_UINT64}, - {bool_type, {uint64_type, int64_type}, FN_EQUAL_UINT64_INT64}}, + { + {bool_type, + {ARG_TYPE_ANY_1, ARG_TYPE_ANY_1}, + FN_EQUAL, + FunctionSignatureOptions().set_uses_operation_collation()}, + {bool_type, {int64_type, uint64_type}, FN_EQUAL_INT64_UINT64}, + {bool_type, {uint64_type, int64_type}, FN_EQUAL_UINT64_INT64} + }, FunctionOptions() .set_supports_safe_error_mode(false) .set_sql_name("=") @@ -1981,12 +2003,14 @@ absl::Status GetBooleanFunctions(TypeFactory* type_factory, InsertFunction( functions, options, "$not_equal", SCALAR, - {{bool_type, - {ARG_TYPE_ANY_1, ARG_TYPE_ANY_1}, - FN_NOT_EQUAL, - FunctionSignatureOptions().set_uses_operation_collation()}, - {bool_type, {int64_type, uint64_type}, FN_NOT_EQUAL_INT64_UINT64}, - {bool_type, {uint64_type, int64_type}, FN_NOT_EQUAL_UINT64_INT64}}, + { + {bool_type, + {ARG_TYPE_ANY_1, ARG_TYPE_ANY_1}, + FN_NOT_EQUAL, + FunctionSignatureOptions().set_uses_operation_collation()}, + {bool_type, {int64_type, uint64_type}, FN_NOT_EQUAL_INT64_UINT64}, + {bool_type, {uint64_type, int64_type}, FN_NOT_EQUAL_UINT64_INT64} + }, FunctionOptions() .set_supports_safe_error_mode(false) .set_sql_name("!=") @@ -2188,7 +2212,7 @@ absl::Status GetBooleanFunctions(TypeFactory* type_factory, .set_no_matching_signature_callback( &NoMatchingSignatureForLikeExprFunction) .set_sql_name("like any") - 
.set_supported_signatures_callback(&EmptySupportedSignatures) + .set_hide_supported_signatures(true) .set_get_sql_callback(&LikeAnyFunctionSQL)); InsertFunction( @@ -2205,8 +2229,49 @@ absl::Status GetBooleanFunctions(TypeFactory* type_factory, .set_no_matching_signature_callback( &NoMatchingSignatureForLikeExprFunction) .set_sql_name("like all") - .set_supported_signatures_callback(&EmptySupportedSignatures) + .set_hide_supported_signatures(true) .set_get_sql_callback(&LikeAllFunctionSQL)); + + if (options.language_options.LanguageFeatureEnabled( + FEATURE_V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL)) { + InsertFunction( + functions, options, "$not_like_any", SCALAR, + {{bool_type, + {string_type, {string_type, REPEATED}}, + FN_STRING_NOT_LIKE_ANY, + FunctionSignatureOptions().set_uses_operation_collation()}, + {bool_type, + {byte_type, {byte_type, REPEATED}}, + FN_BYTE_NOT_LIKE_ANY}}, + FunctionOptions() + .set_supports_safe_error_mode(false) + .set_post_resolution_argument_constraint(absl::bind_front( + &CheckArgumentsSupportEquality, "NOT LIKE ANY")) + .set_no_matching_signature_callback( + &NoMatchingSignatureForLikeExprFunction) + .set_sql_name("not like any") + .set_hide_supported_signatures(true) + .set_get_sql_callback(&NotLikeAnyFunctionSQL)); + + InsertFunction( + functions, options, "$not_like_all", SCALAR, + {{bool_type, + {string_type, {string_type, REPEATED}}, + FN_STRING_NOT_LIKE_ALL, + FunctionSignatureOptions().set_uses_operation_collation()}, + {bool_type, + {byte_type, {byte_type, REPEATED}}, + FN_BYTE_NOT_LIKE_ALL}}, + FunctionOptions() + .set_supports_safe_error_mode(false) + .set_post_resolution_argument_constraint(absl::bind_front( + &CheckArgumentsSupportEquality, "NOT LIKE ALL")) + .set_no_matching_signature_callback( + &NoMatchingSignatureForLikeExprFunction) + .set_sql_name("not like all") + .set_hide_supported_signatures(true) + .set_get_sql_callback(&NotLikeAllFunctionSQL)); + } } if 
(options.language_options.LanguageFeatureEnabled( @@ -2215,9 +2280,11 @@ absl::Status GetBooleanFunctions(TypeFactory* type_factory, InsertFunction( functions, options, "$like_any_array", SCALAR, {{bool_type, - {string_type, array_string_type}, + {string_type, + {array_string_type, FunctionArgumentTypeOptions() + .set_uses_array_element_for_collation()}}, FN_STRING_ARRAY_LIKE_ANY, - FunctionSignatureOptions().set_rejects_collation()}, + FunctionSignatureOptions().set_uses_operation_collation()}, {bool_type, {byte_type, array_byte_type}, FN_BYTE_ARRAY_LIKE_ANY}}, FunctionOptions() .set_supports_safe_error_mode(false) @@ -2229,15 +2296,17 @@ absl::Status GetBooleanFunctions(TypeFactory* type_factory, .set_no_matching_signature_callback( &NoMatchingSignatureForLikeExprArrayFunction) .set_sql_name("like any unnest") - .set_supported_signatures_callback(&EmptySupportedSignatures) + .set_hide_supported_signatures(true) .set_get_sql_callback(&LikeAnyArrayFunctionSQL)); InsertFunction( functions, options, "$like_all_array", SCALAR, {{bool_type, - {string_type, array_string_type}, + {string_type, + {array_string_type, FunctionArgumentTypeOptions() + .set_uses_array_element_for_collation()}}, FN_STRING_ARRAY_LIKE_ALL, - FunctionSignatureOptions().set_rejects_collation()}, + FunctionSignatureOptions().set_uses_operation_collation()}, {bool_type, {byte_type, array_byte_type}, FN_BYTE_ARRAY_LIKE_ALL}}, FunctionOptions() .set_supports_safe_error_mode(false) @@ -2249,8 +2318,59 @@ absl::Status GetBooleanFunctions(TypeFactory* type_factory, .set_no_matching_signature_callback( &NoMatchingSignatureForLikeExprArrayFunction) .set_sql_name("like all unnest") - .set_supported_signatures_callback(&EmptySupportedSignatures) + .set_hide_supported_signatures(true) .set_get_sql_callback(&LikeAllArrayFunctionSQL)); + + if (options.language_options.LanguageFeatureEnabled( + FEATURE_V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL)) { + InsertFunction( + functions, options, 
"$not_like_any_array", SCALAR, + {{bool_type, + {string_type, + {array_string_type, FunctionArgumentTypeOptions() + .set_uses_array_element_for_collation()}}, + FN_STRING_ARRAY_NOT_LIKE_ANY, + FunctionSignatureOptions().set_uses_operation_collation()}, + {bool_type, + {byte_type, array_byte_type}, + FN_BYTE_ARRAY_NOT_LIKE_ANY}}, + FunctionOptions() + .set_supports_safe_error_mode(false) + .set_pre_resolution_argument_constraint( + // Verifies for NOT LIKE ANY|SOME UNNEST() + // * Argument to UNNEST is an array. + // * and elements of are comparable. + &CheckLikeExprArrayArguments) + .set_no_matching_signature_callback( + &NoMatchingSignatureForLikeExprArrayFunction) + .set_sql_name("not like any unnest") + .set_hide_supported_signatures(true) + .set_get_sql_callback(&NotLikeAnyArrayFunctionSQL)); + + InsertFunction( + functions, options, "$not_like_all_array", SCALAR, + {{bool_type, + {string_type, + {array_string_type, FunctionArgumentTypeOptions() + .set_uses_array_element_for_collation()}}, + FN_STRING_ARRAY_NOT_LIKE_ALL, + FunctionSignatureOptions().set_uses_operation_collation()}, + {bool_type, + {byte_type, array_byte_type}, + FN_BYTE_ARRAY_NOT_LIKE_ALL}}, + FunctionOptions() + .set_supports_safe_error_mode(false) + .set_pre_resolution_argument_constraint( + // Verifies for NOT LIKE ALL UNNEST() + // * Argument to UNNEST is an array. + // * and elements of are comparable. 
+ &CheckLikeExprArrayArguments) + .set_no_matching_signature_callback( + &NoMatchingSignatureForLikeExprArrayFunction) + .set_sql_name("not like all unnest") + .set_hide_supported_signatures(true) + .set_get_sql_callback(&NotLikeAllArrayFunctionSQL)); + } } // TODO: Do we want to support IN for non-compatible integers, i.e., @@ -2266,7 +2386,7 @@ absl::Status GetBooleanFunctions(TypeFactory* type_factory, .set_post_resolution_argument_constraint( absl::bind_front(&CheckArgumentsSupportEquality, "IN")) .set_no_matching_signature_callback(&NoMatchingSignatureForInFunction) - .set_supported_signatures_callback(&EmptySupportedSignatures) + .set_hide_supported_signatures(true) .set_get_sql_callback(&InListFunctionSQL)); // TODO: Do we want to support: @@ -2289,7 +2409,7 @@ absl::Status GetBooleanFunctions(TypeFactory* type_factory, .set_no_matching_signature_callback( &NoMatchingSignatureForInArrayFunction) .set_sql_name("in unnest") - .set_supported_signatures_callback(&EmptySupportedSignatures) + .set_hide_supported_signatures(true) .set_get_sql_callback(&InArrayFunctionSQL)); return absl::OkStatus(); } diff --git a/zetasql/common/builtin_function_internal_3.cc b/zetasql/common/builtin_function_internal_3.cc index cb1befa3d..51cc4ff0f 100644 --- a/zetasql/common/builtin_function_internal_3.cc +++ b/zetasql/common/builtin_function_internal_3.cc @@ -19,7 +19,6 @@ #include #include #include -#include #include #include "google/protobuf/timestamp.pb.h" @@ -28,15 +27,12 @@ #include "google/type/timeofday.pb.h" #include "zetasql/common/builtin_function_internal.h" #include "zetasql/common/errors.h" -#include "zetasql/proto/anon_output_with_report.pb.h" -#include "zetasql/public/anon_function.h" #include "zetasql/public/builtin_function.pb.h" #include "zetasql/public/builtin_function_options.h" #include "zetasql/public/catalog.h" #include "zetasql/public/function.h" #include "zetasql/public/function.pb.h" #include "zetasql/public/function_signature.h" -#include 
"zetasql/public/functions/differential_privacy.pb.h" #include "zetasql/public/input_argument_type.h" #include "zetasql/public/language_options.h" #include "zetasql/public/options.pb.h" @@ -51,34 +47,12 @@ #include "zetasql/base/check.h" #include "absl/status/status.h" #include "absl/strings/str_cat.h" -#include "absl/strings/str_join.h" #include "absl/strings/string_view.h" #include "zetasql/base/status_macros.h" namespace zetasql { class AnalyzerOptions; -static std::string DPCountStarSQL(const std::vector& inputs) { - if (inputs.empty()) { - return "COUNT(*)"; - } - return absl::StrCat("COUNT(*, ", absl::StrJoin(inputs, ", "), ")"); -} - -static std::string SupportedSignaturesForDPCountStar( - const LanguageOptions& language_options, const Function& function) { - if (!language_options.LanguageFeatureEnabled(FEATURE_DIFFERENTIAL_PRIVACY)) { - return ""; - } - if (!language_options.LanguageFeatureEnabled( - FEATURE_DIFFERENTIAL_PRIVACY_REPORT_FUNCTIONS)) { - return "COUNT(* [, contribution_bounds_per_group => STRUCT])"; - } - return "COUNT(* [, contribution_bounds_per_group => STRUCT] " - "[, report_format => DIFFERENTIAL_PRIVACY_REPORT_FORMAT])"; -} - static FunctionSignatureOptions SetRewriter(ResolvedASTRewrite rewriter) { return FunctionSignatureOptions().set_rewrite_options( FunctionSignatureRewriteOptions().set_rewriter(rewriter)); @@ -681,16 +655,19 @@ void GetErrorHandlingFunctions(TypeFactory* type_factory, InsertSimpleFunction( functions, options, "iferror", SCALAR, - {{ARG_TYPE_ANY_1, {ARG_TYPE_ANY_1, ARG_TYPE_ANY_1}, FN_IFERROR}}); + {{ARG_TYPE_ANY_1, {ARG_TYPE_ANY_1, ARG_TYPE_ANY_1}, FN_IFERROR}}, + FunctionOptions().set_may_suppress_side_effects(true)); - InsertSimpleFunction(functions, options, "iserror", SCALAR, - {{bool_type, {ARG_TYPE_ANY_1}, FN_ISERROR}}); + InsertFunction(functions, options, "iserror", SCALAR, + {{bool_type, {ARG_TYPE_ANY_1}, FN_ISERROR}}, + FunctionOptions().set_may_suppress_side_effects(true)); InsertFunction(functions, 
options, "nulliferror", SCALAR, {{ARG_TYPE_ANY_1, {ARG_TYPE_ANY_1}, FN_NULLIFERROR, - SetRewriter(REWRITE_NULLIFERROR_FUNCTION)}}); + SetRewriter(REWRITE_NULLIFERROR_FUNCTION)}}, + FunctionOptions().set_may_suppress_side_effects(true)); } static FunctionSignatureOnHeap NullIfZeroSig(const Type* type, @@ -745,18 +722,21 @@ void GetConditionalFunctions(TypeFactory* type_factory, InsertSimpleFunction( functions, options, "if", SCALAR, - {{ARG_TYPE_ANY_1, {bool_type, ARG_TYPE_ANY_1, ARG_TYPE_ANY_1}, FN_IF}}); + {{ARG_TYPE_ANY_1, {bool_type, ARG_TYPE_ANY_1, ARG_TYPE_ANY_1}, FN_IF}}, + FunctionOptions().set_may_suppress_side_effects(true)); // COALESCE(expr1, ..., exprN): returns the first non-null expression. // In particular, COALESCE is used to express the output of FULL JOIN. InsertSimpleFunction( functions, options, "coalesce", SCALAR, - {{ARG_TYPE_ANY_1, {{ARG_TYPE_ANY_1, REPEATED}}, FN_COALESCE}}); + {{ARG_TYPE_ANY_1, {{ARG_TYPE_ANY_1, REPEATED}}, FN_COALESCE}}, + FunctionOptions().set_may_suppress_side_effects(true)); // IFNULL(expr1, expr2): if expr1 is not null, returns expr1, else expr2 InsertSimpleFunction( functions, options, "ifnull", SCALAR, - {{ARG_TYPE_ANY_1, {ARG_TYPE_ANY_1, ARG_TYPE_ANY_1}, FN_IFNULL}}); + {{ARG_TYPE_ANY_1, {ARG_TYPE_ANY_1, ARG_TYPE_ANY_1}, FN_IFNULL}}, + FunctionOptions().set_may_suppress_side_effects(true)); bool uses_operation_collation_for_nullif = options.language_options.LanguageFeatureEnabled( @@ -832,8 +812,9 @@ void GetConditionalFunctions(TypeFactory* type_factory, FunctionSignatureOptions().set_uses_operation_collation()}}, FunctionOptions() .set_supports_safe_error_mode(false) + .set_may_suppress_side_effects(true) .set_sql_name("case") - .set_supported_signatures_callback(&EmptySupportedSignatures) + .set_hide_supported_signatures(true) .set_get_sql_callback(&CaseWithValueFunctionSQL) .set_pre_resolution_argument_constraint( absl::bind_front(&CheckFirstArgumentSupportsEquality, @@ -846,83 +827,43 @@ void 
GetConditionalFunctions(TypeFactory* type_factory, FN_CASE_NO_VALUE}}, FunctionOptions() .set_supports_safe_error_mode(false) + .set_may_suppress_side_effects(true) .set_sql_name("case") - .set_supported_signatures_callback(&EmptySupportedSignatures) + .set_hide_supported_signatures(true) .set_get_sql_callback(&CaseNoValueFunctionSQL) .set_no_matching_signature_callback( &NoMatchingSignatureForCaseNoValueFunction)); -} - -absl::Status GetDistanceFunctions( - TypeFactory* type_factory, const ZetaSQLBuiltinFunctionOptions& options, - NameToFunctionMap* functions) { - const Type* double_type = type_factory->get_double(); - - const Function::Mode SCALAR = Function::SCALAR; - - std::vector input_struct_fields_int64 = { - {"key", types::Int64Type()}, {"value", types::DoubleType()}}; - - const zetasql::StructType* struct_int64 = nullptr; - ZETASQL_RETURN_IF_ERROR( - type_factory->MakeStructType({input_struct_fields_int64}, &struct_int64)); - const ArrayType* array_struct_int64_key_type; - ZETASQL_RETURN_IF_ERROR( - type_factory->MakeArrayType(struct_int64, &array_struct_int64_key_type)); - - std::vector input_struct_fields_string = { - {"key", types::StringType()}, {"value", types::DoubleType()}}; - const zetasql::StructType* struct_string = nullptr; - ZETASQL_RETURN_IF_ERROR(type_factory->MakeStructType({input_struct_fields_string}, - &struct_string)); - const ArrayType* array_struct_string_key_type; - ZETASQL_RETURN_IF_ERROR(type_factory->MakeArrayType(struct_string, - &array_struct_string_key_type)); - zetasql::FunctionOptions function_options; - std::vector cosine_signatures = { - {double_type, - {types::DoubleArrayType(), types::DoubleArrayType()}, - FN_COSINE_DISTANCE_DENSE_DOUBLE}, - {double_type, - {array_struct_int64_key_type, array_struct_int64_key_type}, - FN_COSINE_DISTANCE_SPARSE_INT64}, - {double_type, - {array_struct_string_key_type, array_struct_string_key_type}, - FN_COSINE_DISTANCE_SPARSE_STRING}}; - - if (options.language_options.LanguageFeatureEnabled( 
- FEATURE_V_1_4_ENABLE_FLOAT_DISTANCE_FUNCTIONS)) { - cosine_signatures.push_back( - {double_type, - {types::FloatArrayType(), types::FloatArrayType()}, - FN_COSINE_DISTANCE_DENSE_FLOAT}); - } - InsertFunction(functions, options, "cosine_distance", SCALAR, - cosine_signatures, function_options); - - std::vector euclidean_signatures = { - {double_type, - {types::DoubleArrayType(), types::DoubleArrayType()}, - FN_EUCLIDEAN_DISTANCE_DENSE_DOUBLE}, - {double_type, - {array_struct_int64_key_type, array_struct_int64_key_type}, - FN_EUCLIDEAN_DISTANCE_SPARSE_INT64}, - {double_type, - {array_struct_string_key_type, array_struct_string_key_type}, - FN_EUCLIDEAN_DISTANCE_SPARSE_STRING}}; - - if (options.language_options.LanguageFeatureEnabled( - FEATURE_V_1_4_ENABLE_FLOAT_DISTANCE_FUNCTIONS)) { - euclidean_signatures.push_back( - {double_type, - {types::FloatArrayType(), types::FloatArrayType()}, - FN_EUCLIDEAN_DISTANCE_DENSE_FLOAT}); - } - - InsertFunction(functions, options, "euclidean_distance", SCALAR, - euclidean_signatures, function_options); - return absl::OkStatus(); + // Internal function $with_side_effects(expression ANY_1, payload BYTES). + // Enabled only when FEATURE_V_1_4_ENFORCE_CONDITIONAL_EVALUATION is on. + // + // If the payload is not NULL, applies the side effect (e.g. raise the error + // described by the payload). Otherwise, returns the first argument. This is + // important for conditional evaluation and correct handling of side effects + // when an expression gets split across scans. For example, in the query + // SELECT IF(a, b, SUM(c/d)) FROM t + // the division `c/d` should not cause the query to fail when `a` is + // true, even if d is zero, because it's in the false branch. This holds even + // as SUM(c/d) is separated from the larger IF() expression to be placed on an + // AggregateScan. 
+ // + // The aforementioned LanguageFeature changes the resulting resolved AST to + // propagate deferred side-effect values, and to specify when and where + // exactly the deferred side effect is handled, using + // ResolvedDeferredComputedColumn and the internal function + // $with_side_effects(). + // + // See (broken link) + InsertFunction( + functions, options, "$with_side_effects", SCALAR, + {{ARG_TYPE_ANY_1, + {{ARG_TYPE_ANY_1}, {types::BytesType()}}, + FN_WITH_SIDE_EFFECTS, + FunctionSignatureOptions().set_is_internal(true)}}, + FunctionOptions() + .set_supports_safe_error_mode(false) + .add_required_language_feature( + LanguageFeature::FEATURE_V_1_4_ENFORCE_CONDITIONAL_EVALUATION)); } void GetMiscellaneousFunctions(TypeFactory* type_factory, @@ -1209,63 +1150,69 @@ void GetSubscriptFunctions(TypeFactory* type_factory, function_signatures.push_back( {json_type, {json_type, string_type}, FN_JSON_SUBSCRIPT_STRING}); } - InsertFunction( - functions, options, "$subscript", Function::SCALAR, function_signatures, - FunctionOptions() - .set_supports_safe_error_mode(false) - .set_get_sql_callback(&SubscriptFunctionSQL) - .set_supported_signatures_callback(&EmptySupportedSignatures) - .set_no_matching_signature_callback( - absl::bind_front(&NoMatchingSignatureForSubscript, - /*offset_or_ordinal=*/""))); + InsertFunction(functions, options, "$subscript", Function::SCALAR, + function_signatures, + FunctionOptions() + .set_supports_safe_error_mode(false) + .set_get_sql_callback(&SubscriptFunctionSQL) + .set_hide_supported_signatures(true) + .set_no_matching_signature_callback( + absl::bind_front(&NoMatchingSignatureForSubscript, + /*offset_or_ordinal=*/""))); // Create functions with no signatures for other subscript functions // that have special handling in the analyzer. 
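The deferred side-effect behavior described in the `$with_side_effects()` comment above can be sketched in plain C++ (hypothetical `Deferred`/`WithSideEffects` names for illustration, not the ZetaSQL evaluator): the failing sub-expression records an error payload instead of raising it, and the payload is applied only when the deferred value is actually consumed.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a deferred computation: the computed value plus,
// if the computation failed, a payload describing the error to raise later.
struct Deferred {
  double value = 0.0;
  std::optional<std::string> error_payload;
};

// Computes c / d, deferring the division-by-zero error instead of throwing.
Deferred DeferredDivide(double c, double d) {
  if (d == 0.0) return {0.0, "division by zero"};
  return {c / d, std::nullopt};
}

// Analogue of $with_side_effects(expression, payload): applies the side
// effect (raises the deferred error) only when the value is consumed.
double WithSideEffects(const Deferred& deferred) {
  if (deferred.error_payload.has_value()) {
    throw std::runtime_error(*deferred.error_payload);
  }
  return deferred.value;
}

// IF(a, b, SUM(c/d)): when `a` is true, the false branch's deferred error
// must not surface, even though c/d was evaluated on a separate scan.
double If(bool a, double b, const Deferred& false_branch) {
  return a ? b : WithSideEffects(false_branch);
}
```

With `a` true, `If(true, 1.0, DeferredDivide(1.0, 0.0))` ignores the deferred division-by-zero payload; with `a` false, consuming the deferred value raises it.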
  const std::vector<FunctionSignatureOnHeap> empty_signatures;
-  InsertFunction(
-      functions, options, "$subscript_with_key", Function::SCALAR,
-      empty_signatures,
-      FunctionOptions()
-          .set_supports_safe_error_mode(true)
-          .set_get_sql_callback(&SubscriptWithKeyFunctionSQL)
-          .set_supported_signatures_callback(&EmptySupportedSignatures)
-          .set_no_matching_signature_callback(
-              absl::bind_front(&NoMatchingSignatureForSubscript,
-                               /*offset_or_ordinal=*/"KEY")));
-  InsertFunction(
-      functions, options, "$subscript_with_offset", Function::SCALAR,
-      empty_signatures,
-      FunctionOptions()
-          .set_supports_safe_error_mode(true)
-          .set_get_sql_callback(&SubscriptWithOffsetFunctionSQL)
-          .set_supported_signatures_callback(&EmptySupportedSignatures)
-          .set_no_matching_signature_callback(
-              absl::bind_front(&NoMatchingSignatureForSubscript,
-                               /*offset_or_ordinal=*/"OFFSET")));
-  InsertFunction(
-      functions, options, "$subscript_with_ordinal", Function::SCALAR,
-      empty_signatures,
-      FunctionOptions()
-          .set_supports_safe_error_mode(true)
-          .set_get_sql_callback(&SubscriptWithOrdinalFunctionSQL)
-          .set_supported_signatures_callback(&EmptySupportedSignatures)
-          .set_no_matching_signature_callback(
-              absl::bind_front(&NoMatchingSignatureForSubscript,
-                               /*offset_or_ordinal=*/"ORDINAL")));
+  InsertFunction(functions, options, "$subscript_with_key", Function::SCALAR,
+                 empty_signatures,
+                 FunctionOptions()
+                     .set_supports_safe_error_mode(true)
+                     .set_get_sql_callback(&SubscriptWithKeyFunctionSQL)
+                     .set_hide_supported_signatures(true)
+                     .set_no_matching_signature_callback(
+                         absl::bind_front(&NoMatchingSignatureForSubscript,
+                                          /*offset_or_ordinal=*/"KEY")));
+  InsertFunction(functions, options, "$subscript_with_offset", Function::SCALAR,
+                 empty_signatures,
+                 FunctionOptions()
+                     .set_supports_safe_error_mode(true)
+                     .set_get_sql_callback(&SubscriptWithOffsetFunctionSQL)
+                     .set_hide_supported_signatures(true)
+                     .set_no_matching_signature_callback(
+                         absl::bind_front(&NoMatchingSignatureForSubscript,
/*offset_or_ordinal=*/"OFFSET"))); + InsertFunction(functions, options, "$subscript_with_ordinal", + Function::SCALAR, empty_signatures, + FunctionOptions() + .set_supports_safe_error_mode(true) + .set_get_sql_callback(&SubscriptWithOrdinalFunctionSQL) + .set_hide_supported_signatures(true) + .set_no_matching_signature_callback( + absl::bind_front(&NoMatchingSignatureForSubscript, + /*offset_or_ordinal=*/"ORDINAL"))); } void GetJSONFunctions(TypeFactory* type_factory, const ZetaSQLBuiltinFunctionOptions& options, NameToFunctionMap* functions) { + const Type* int32_type = types::Int32Type(); const Type* int64_type = types::Int64Type(); + const Type* uint32_type = types::Uint32Type(); + const Type* uint64_type = types::Uint64Type(); const Type* double_type = types::DoubleType(); + const Type* float_type = types::FloatType(); const Type* bool_type = type_factory->get_bool(); const Type* string_type = type_factory->get_string(); const Type* json_type = types::JsonType(); - const ArrayType* array_string_type; - ZETASQL_CHECK_OK(type_factory->MakeArrayType(string_type, &array_string_type)); - const ArrayType* array_json_type; - ZETASQL_CHECK_OK(type_factory->MakeArrayType(json_type, &array_json_type)); + const ArrayType* array_int32_type = types::Int32ArrayType(); + const ArrayType* array_int64_type = types::Int64ArrayType(); + const ArrayType* array_uint32_type = types::Uint32ArrayType(); + const ArrayType* array_uint64_type = types::Uint64ArrayType(); + const ArrayType* array_double_type = types::DoubleArrayType(); + const ArrayType* array_float_type = types::FloatArrayType(); + const ArrayType* array_bool_type = types::BoolArrayType(); + const ArrayType* array_string_type = types::StringArrayType(); + const ArrayType* array_json_type = types::JsonArrayType(); const Function::Mode SCALAR = Function::SCALAR; const FunctionArgumentType::ArgumentCardinality REPEATED = @@ -1425,6 +1372,158 @@ void GetJSONFunctions(TypeFactory* type_factory, InsertFunction(functions, 
options, "lax_string", SCALAR, {{string_type, {json_type}, FN_JSON_LAX_TO_STRING}}); } + + if (options.language_options.LanguageFeatureEnabled( + FEATURE_V_1_4_JSON_ARRAY_VALUE_EXTRACTION_FUNCTIONS)) { + InsertFunction(functions, options, "bool_array", SCALAR, + {{array_bool_type, {json_type}, FN_JSON_TO_BOOL_ARRAY}}); + + InsertFunction( + functions, options, "float64_array", SCALAR, + {{array_double_type, + {json_type, + {string_type, + FunctionArgumentTypeOptions() + .set_cardinality(FunctionEnums::OPTIONAL) + .set_argument_name("wide_number_mode", kNamedOnly) + .set_default(Value::String("round"))}}, + FN_JSON_TO_FLOAT64_ARRAY}}, + zetasql::FunctionOptions().set_alias_name( + options.language_options.product_mode() == PRODUCT_INTERNAL + ? "double_array" + : "")); + + InsertFunction(functions, options, "int64_array", SCALAR, + {{array_int64_type, {json_type}, FN_JSON_TO_INT64_ARRAY}}); + + InsertFunction( + functions, options, "string_array", SCALAR, + {{array_string_type, {json_type}, FN_JSON_TO_STRING_ARRAY}}); + + if (options.language_options.LanguageFeatureEnabled( + FEATURE_JSON_LAX_VALUE_EXTRACTION_FUNCTIONS)) { + InsertFunction( + functions, options, "lax_bool_array", SCALAR, + {{array_bool_type, {json_type}, FN_JSON_LAX_TO_BOOL_ARRAY}}); + InsertFunction( + functions, options, "lax_float64_array", SCALAR, + {{array_double_type, {json_type}, FN_JSON_LAX_TO_FLOAT64_ARRAY}}, + zetasql::FunctionOptions().set_alias_name( + options.language_options.product_mode() == PRODUCT_INTERNAL + ? 
"lax_double_array" + : "")); + InsertFunction( + functions, options, "lax_int64_array", SCALAR, + {{array_int64_type, {json_type}, FN_JSON_LAX_TO_INT64_ARRAY}}); + InsertFunction( + functions, options, "lax_string_array", SCALAR, + {{array_string_type, {json_type}, FN_JSON_LAX_TO_STRING_ARRAY}}); + } + } + + if (options.language_options.LanguageFeatureEnabled( + FEATURE_V_1_4_JSON_MORE_VALUE_EXTRACTION_FUNCTIONS)) { + InsertFunction(functions, options, "int32", SCALAR, + {{int32_type, {json_type}, FN_JSON_TO_INT32}}); + InsertFunction(functions, options, "uint32", SCALAR, + {{uint32_type, {json_type}, FN_JSON_TO_UINT32}}); + InsertFunction(functions, options, "uint64", SCALAR, + {{uint64_type, {json_type}, FN_JSON_TO_UINT64}}); + + if (!options.language_options.LanguageFeatureEnabled( + FEATURE_V_1_4_DISABLE_FLOAT32)) { + InsertFunction( + functions, options, "float32", SCALAR, + {{float_type, + {json_type, + {string_type, + FunctionArgumentTypeOptions() + .set_cardinality(FunctionEnums::OPTIONAL) + .set_argument_name("wide_number_mode", kNamedOnly) + .set_default(Value::String("round"))}}, + FN_JSON_TO_FLOAT32}}, + zetasql::FunctionOptions().set_alias_name( + options.language_options.product_mode() == PRODUCT_INTERNAL + ? 
"float" + : "")); + } + + if (options.language_options.LanguageFeatureEnabled( + FEATURE_JSON_LAX_VALUE_EXTRACTION_FUNCTIONS)) { + InsertFunction(functions, options, "lax_int32", SCALAR, + {{int32_type, {json_type}, FN_JSON_LAX_TO_INT32}}); + InsertFunction(functions, options, "lax_uint32", SCALAR, + {{uint32_type, {json_type}, FN_JSON_LAX_TO_UINT32}}); + InsertFunction(functions, options, "lax_uint64", SCALAR, + {{uint64_type, {json_type}, FN_JSON_LAX_TO_UINT64}}); + + if (!options.language_options.LanguageFeatureEnabled( + FEATURE_V_1_4_DISABLE_FLOAT32)) { + InsertFunction( + functions, options, "lax_float32", SCALAR, + {{float_type, {json_type}, FN_JSON_LAX_TO_FLOAT32}}, + zetasql::FunctionOptions().set_alias_name( + options.language_options.product_mode() == PRODUCT_INTERNAL + ? "lax_float" + : "")); + } + } + + if (options.language_options.LanguageFeatureEnabled( + FEATURE_V_1_4_JSON_ARRAY_VALUE_EXTRACTION_FUNCTIONS)) { + InsertFunction( + functions, options, "int32_array", SCALAR, + {{array_int32_type, {json_type}, FN_JSON_TO_INT32_ARRAY}}); + InsertFunction( + functions, options, "uint32_array", SCALAR, + {{array_uint32_type, {json_type}, FN_JSON_TO_UINT32_ARRAY}}); + InsertFunction( + functions, options, "uint64_array", SCALAR, + {{array_uint64_type, {json_type}, FN_JSON_TO_UINT64_ARRAY}}); + + if (!options.language_options.LanguageFeatureEnabled( + FEATURE_V_1_4_DISABLE_FLOAT32)) { + InsertFunction( + functions, options, "float32_array", SCALAR, + {{array_float_type, + {json_type, + {string_type, + FunctionArgumentTypeOptions() + .set_cardinality(FunctionEnums::OPTIONAL) + .set_argument_name("wide_number_mode", kNamedOnly) + .set_default(Value::String("round"))}}, + FN_JSON_TO_FLOAT32_ARRAY}}, + zetasql::FunctionOptions().set_alias_name( + options.language_options.product_mode() == PRODUCT_INTERNAL + ? 
"float_array" + : "")); + } + + if (options.language_options.LanguageFeatureEnabled( + FEATURE_JSON_LAX_VALUE_EXTRACTION_FUNCTIONS)) { + InsertFunction( + functions, options, "lax_int32_array", SCALAR, + {{array_int32_type, {json_type}, FN_JSON_LAX_TO_INT32_ARRAY}}); + InsertFunction( + functions, options, "lax_uint32_array", SCALAR, + {{array_uint32_type, {json_type}, FN_JSON_LAX_TO_UINT32_ARRAY}}); + InsertFunction( + functions, options, "lax_uint64_array", SCALAR, + {{array_uint64_type, {json_type}, FN_JSON_LAX_TO_UINT64_ARRAY}}); + + if (!options.language_options.LanguageFeatureEnabled( + FEATURE_V_1_4_DISABLE_FLOAT32)) { + InsertFunction( + functions, options, "lax_float32_array", SCALAR, + {{array_float_type, {json_type}, FN_JSON_LAX_TO_FLOAT32_ARRAY}}, + zetasql::FunctionOptions().set_alias_name( + options.language_options.product_mode() == PRODUCT_INTERNAL + ? "lax_float_array" + : "")); + } + } + } + } } InsertFunction( @@ -2865,6 +2964,11 @@ void GetGeographyFunctions(TypeFactory* type_factory, functions, options, "st_interiorrings", SCALAR, {{geography_array_type, {geography_type}, FN_ST_INTERIORRINGS}}, geography_required); + InsertFunction(functions, options, "st_lineinterpolatepoint", SCALAR, + {{geography_type, + {geography_type, double_type}, + FN_ST_LINE_INTERPOLATE_POINT}}, + geography_required); InsertFunction(functions, options, "st_linesubstring", SCALAR, {{geography_type, {geography_type, double_type, double_type}, @@ -3039,18 +3143,22 @@ void GetGeographyFunctions(TypeFactory* type_factory, FunctionSignatureOptions().add_required_language_feature( FEATURE_V_1_3_EXTENDED_GEOGRAPHY_PARSERS); + FunctionArgumentType oriented_argument_type{ + bool_type, const_with_mandatory_name_and_default_value( + "oriented", Value::Bool(false))}; + FunctionArgumentType planar_argument_type{ + bool_type, const_with_mandatory_name_and_default_value( + "planar", Value::Bool(false))}; + FunctionArgumentType make_valid_argument_type{ + bool_type, 
const_with_mandatory_name_and_default_value( + "make_valid", Value::Bool(false))}; InsertFunction(functions, options, "st_geogfromtext", SCALAR, {{geography_type, {string_type, {bool_type, optional_const_arg_options}}, FN_ST_GEOG_FROM_TEXT}, {geography_type, - {string_type, - {bool_type, const_with_mandatory_name_and_default_value( - "oriented", Value::Bool(false))}, - {bool_type, const_with_mandatory_name_and_default_value( - "planar", Value::Bool(false))}, - {bool_type, const_with_mandatory_name_and_default_value( - "make_valid", Value::Bool(false))}}, + {string_type, oriented_argument_type, planar_argument_type, + make_valid_argument_type}, FN_ST_GEOG_FROM_TEXT_EXT, extended_parser_signatures}}, geography_required); @@ -3060,17 +3168,29 @@ void GetGeographyFunctions(TypeFactory* type_factory, InsertFunction(functions, options, "st_geogfromgeojson", SCALAR, {{geography_type, {string_type}, FN_ST_GEOG_FROM_GEO_JSON}, {geography_type, - {string_type, - {bool_type, const_with_mandatory_name_and_default_value( - "make_valid", Value::Bool(false))}}, + {string_type, make_valid_argument_type}, FN_ST_GEOG_FROM_GEO_JSON_EXT, extended_parser_signatures}}, geography_required); InsertFunction(functions, options, "st_geogfromwkb", SCALAR, - {{geography_type, {bytes_type}, FN_ST_GEOG_FROM_WKB}, + {// st_geogfromwkb(bytes) + {geography_type, {bytes_type}, FN_ST_GEOG_FROM_WKB}, + // st_geogfromwkb(string) {geography_type, {string_type}, FN_ST_GEOG_FROM_WKB_HEX, + extended_parser_signatures}, + // st_geogfromwkb(bytes, oriented, planar, make_valid) + {geography_type, + {bytes_type, oriented_argument_type, planar_argument_type, + make_valid_argument_type}, + FN_ST_GEOG_FROM_WKB_EXT, + extended_parser_signatures}, + // st_geogfromwkb(string, oriented, planar, make_valid) + {geography_type, + {string_type, oriented_argument_type, planar_argument_type, + make_valid_argument_type}, + FN_ST_GEOG_FROM_WKB_HEX_EXT, extended_parser_signatures}}, geography_required); InsertSimpleFunction( 
@@ -3164,974 +3284,6 @@ void GetGeographyFunctions(TypeFactory* type_factory, geography_required_analytic); } -void GetAnonFunctions(TypeFactory* type_factory, - const ZetaSQLBuiltinFunctionOptions& options, - NameToFunctionMap* functions) { - const Type* int64_type = type_factory->get_int64(); - const Type* uint64_type = type_factory->get_uint64(); - const Type* double_type = type_factory->get_double(); - const Type* numeric_type = type_factory->get_numeric(); - const Type* double_array_type = types::DoubleArrayType(); - const Type* json_type = types::JsonType(); - const Type* anon_output_with_report_proto_type = nullptr; - ZETASQL_CHECK_OK( - type_factory->MakeProtoType(zetasql::AnonOutputWithReport::descriptor(), - &anon_output_with_report_proto_type)); - const FunctionArgumentType::ArgumentCardinality OPTIONAL = - FunctionArgumentType::OPTIONAL; - - FunctionSignatureOptions has_numeric_type_argument; - has_numeric_type_argument.set_constraints(&CheckHasNumericTypeArgument); - - FunctionOptions anon_options = - FunctionOptions() - .set_supports_over_clause(false) - .set_supports_distinct_modifier(false) - .set_supports_having_modifier(false) - .set_supports_clamped_between_modifier(true) - .set_volatility(FunctionEnums::VOLATILE); - - const FunctionArgumentTypeOptions optional_const_arg_options = - FunctionArgumentTypeOptions() - .set_must_be_constant() - .set_must_be_non_null() - .set_cardinality(OPTIONAL); - // TODO: Replace these required CLAMPED BETWEEN arguments with - // optional_const_arg_options once Quantiles can support automatic/implicit - // bounds. 
- const FunctionArgumentTypeOptions required_const_arg_options = - FunctionArgumentTypeOptions() - .set_must_be_constant() - .set_must_be_non_null(); - const FunctionArgumentTypeOptions percentile_arg_options = - FunctionArgumentTypeOptions() - .set_must_be_constant() - .set_must_be_non_null() - .set_min_value(0) - .set_max_value(1); - const FunctionArgumentTypeOptions quantiles_arg_options = - FunctionArgumentTypeOptions() - .set_must_be_constant() - .set_must_be_non_null() - .set_min_value(1); - - // TODO: Fix this HACK - the CLAMPED BETWEEN lower and upper bounds - // are optional, as are the privacy_budget weight and uid. However, - // the syntax and spec allows privacy_budget_weight (and uid) to be specified - // but upper/lower bound to be unspecified, but that is not possible to - // represent in a ZetaSQL FunctionSignature. In the short term, the - // resolver will guarantee that the privacy_budget_weight and uid are not - // specified if the CLAMP are not, but longer term we must remove the - // privacy_budget_weight and uid arguments as per the updated ZetaSQL - // privacy language spec. 
- InsertCreatedFunction( - functions, options, - new AnonFunction( - "anon_count", Function::kZetaSQLFunctionGroupName, - {{int64_type, - {/*expr=*/ARG_TYPE_ANY_2, - /*lower_bound=*/{int64_type, optional_const_arg_options}, - /*upper_bound=*/{int64_type, optional_const_arg_options}}, - FN_ANON_COUNT}}, - anon_options, "count")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "anon_sum", Function::kZetaSQLFunctionGroupName, - {{int64_type, - {/*expr=*/int64_type, - /*lower_bound=*/{int64_type, optional_const_arg_options}, - /*upper_bound=*/{int64_type, optional_const_arg_options}}, - FN_ANON_SUM_INT64}, - {uint64_type, - {/*expr=*/uint64_type, - /*lower_bound=*/{uint64_type, optional_const_arg_options}, - /*upper_bound=*/{uint64_type, optional_const_arg_options}}, - FN_ANON_SUM_UINT64}, - {double_type, - {/*expr=*/double_type, - /*lower_bound=*/{double_type, optional_const_arg_options}, - /*upper_bound=*/{double_type, optional_const_arg_options}}, - FN_ANON_SUM_DOUBLE}, - {numeric_type, - {/*expr=*/numeric_type, - /*lower_bound=*/{numeric_type, optional_const_arg_options}, - /*upper_bound=*/{numeric_type, optional_const_arg_options}}, - FN_ANON_SUM_NUMERIC, - has_numeric_type_argument}}, - anon_options, "sum")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "anon_avg", Function::kZetaSQLFunctionGroupName, - {{double_type, - {/*expr=*/double_type, - /*lower_bound=*/{double_type, optional_const_arg_options}, - /*upper_bound=*/{double_type, optional_const_arg_options}}, - FN_ANON_AVG_DOUBLE}, - {numeric_type, - {/*expr=*/numeric_type, - /*lower_bound=*/{numeric_type, optional_const_arg_options}, - /*upper_bound=*/{numeric_type, optional_const_arg_options}}, - FN_ANON_AVG_NUMERIC, - has_numeric_type_argument}}, - anon_options, "avg")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$anon_count_star", Function::kZetaSQLFunctionGroupName, - {{int64_type, - {/*lower_bound=*/{int64_type, 
optional_const_arg_options}, - /*upper_bound=*/{int64_type, optional_const_arg_options}}, - FN_ANON_COUNT_STAR}}, - anon_options.Copy() - .set_sql_name("anon_count(*)") - .set_get_sql_callback(&AnonCountStarFunctionSQL) - .set_supported_signatures_callback( - absl::bind_front(&SupportedSignaturesForAnonCountStarFunction, - /*unused_function_name=*/"")) - .set_bad_argument_error_prefix_callback( - &AnonCountStarBadArgumentErrorPrefix), - // TODO: internal function names shouldn't be resolvable, - // an alternative way to look up COUNT(*) will be needed to fix the - // linked bug. - "$count_star")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "anon_var_pop", Function::kZetaSQLFunctionGroupName, - {{double_type, - {/*expr=*/double_type, - /*lower_bound=*/{double_type, optional_const_arg_options}, - /*upper_bound=*/{double_type, optional_const_arg_options}}, - FN_ANON_VAR_POP_DOUBLE}, - {double_type, - {/*expr=*/double_array_type, - /*lower_bound=*/{double_type, optional_const_arg_options}, - /*upper_bound=*/{double_type, optional_const_arg_options}}, - FN_ANON_VAR_POP_DOUBLE_ARRAY, - FunctionSignatureOptions().set_is_internal(true)}}, - anon_options, "array_agg")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "anon_stddev_pop", Function::kZetaSQLFunctionGroupName, - {{double_type, - {/*expr=*/double_type, - /*lower_bound=*/{double_type, optional_const_arg_options}, - /*upper_bound=*/{double_type, optional_const_arg_options}}, - FN_ANON_STDDEV_POP_DOUBLE}, - {double_type, - {/*expr=*/double_array_type, - /*lower_bound=*/{double_type, optional_const_arg_options}, - /*upper_bound=*/{double_type, optional_const_arg_options}}, - FN_ANON_STDDEV_POP_DOUBLE_ARRAY, - FunctionSignatureOptions().set_is_internal(true)}}, - anon_options, "array_agg")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "anon_percentile_cont", Function::kZetaSQLFunctionGroupName, - {{double_type, - {/*expr=*/double_type, - 
/*percentile=*/{double_type, percentile_arg_options}, - /*lower_bound=*/{double_type, optional_const_arg_options}, - /*upper_bound=*/{double_type, optional_const_arg_options}}, - FN_ANON_PERCENTILE_CONT_DOUBLE}, - // This is an internal signature that is only used post-anon-rewrite, - // and is not available in the external SQL language. - {double_type, - {/*expr=*/double_array_type, - /*percentile=*/{double_type, percentile_arg_options}, - /*lower_bound=*/{double_type, optional_const_arg_options}, - /*upper_bound=*/{double_type, optional_const_arg_options}}, - FN_ANON_PERCENTILE_CONT_DOUBLE_ARRAY, - FunctionSignatureOptions().set_is_internal(true)}}, - anon_options, "array_agg")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "anon_quantiles", Function::kZetaSQLFunctionGroupName, - {{double_array_type, - {/*expr=*/double_type, - /*quantiles=*/{int64_type, quantiles_arg_options}, - /*lower_bound=*/{double_type, required_const_arg_options}, - /*upper_bound=*/{double_type, required_const_arg_options}}, - FN_ANON_QUANTILES_DOUBLE}, - // This is an internal signature that is only used post-anon-rewrite, - // and is not available in the external SQL language. 
- {double_array_type, - {/*expr=*/double_array_type, - /*quantiles=*/{int64_type, quantiles_arg_options}, - /*lower_bound=*/{double_type, required_const_arg_options}, - /*upper_bound=*/{double_type, required_const_arg_options}}, - FN_ANON_QUANTILES_DOUBLE_ARRAY, - FunctionSignatureOptions().set_is_internal(true)}}, - anon_options, "array_agg")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$anon_quantiles_with_report_json", - Function::kZetaSQLFunctionGroupName, - {{json_type, - {/*expr=*/double_type, - /*quantiles=*/{int64_type, quantiles_arg_options}, - /*lower_bound=*/{double_type, required_const_arg_options}, - /*upper_bound=*/{double_type, required_const_arg_options}}, - FN_ANON_QUANTILES_DOUBLE_WITH_REPORT_JSON}, - // This is an internal signature that is only used post-anon-rewrite, - // and is not available in the external SQL language. - {json_type, - {/*expr=*/double_array_type, - /*quantiles=*/{int64_type, quantiles_arg_options}, - /*lower_bound=*/{double_type, required_const_arg_options}, - /*upper_bound=*/{double_type, required_const_arg_options}}, - FN_ANON_QUANTILES_DOUBLE_ARRAY_WITH_REPORT_JSON, - FunctionSignatureOptions().set_is_internal(true)}}, - anon_options.Copy() - .set_sql_name("anon_quantiles") - .set_get_sql_callback(&AnonQuantilesWithReportJsonFunctionSQL) - .set_supported_signatures_callback(absl::bind_front( - &SupportedSignaturesForAnonQuantilesWithReportFunction, - /*report_format=*/"JSON")), - "array_agg")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$anon_quantiles_with_report_proto", - Function::kZetaSQLFunctionGroupName, - {{anon_output_with_report_proto_type, - {/*expr=*/double_type, - /*quantiles=*/{int64_type, quantiles_arg_options}, - /*lower_bound=*/{double_type, required_const_arg_options}, - /*upper_bound=*/{double_type, required_const_arg_options}}, - FN_ANON_QUANTILES_DOUBLE_WITH_REPORT_PROTO}, - // This is an internal signature that is only used post-anon-rewrite, - 
// and is not available in the external SQL language. - {anon_output_with_report_proto_type, - {/*expr=*/double_array_type, - /*quantiles=*/{int64_type, quantiles_arg_options}, - /*lower_bound=*/{double_type, required_const_arg_options}, - /*upper_bound=*/{double_type, required_const_arg_options}}, - FN_ANON_QUANTILES_DOUBLE_ARRAY_WITH_REPORT_PROTO, - FunctionSignatureOptions().set_is_internal(true)}}, - anon_options.Copy() - .set_sql_name("anon_quantiles") - .set_get_sql_callback(&AnonQuantilesWithReportProtoFunctionSQL) - .set_supported_signatures_callback(absl::bind_front( - &SupportedSignaturesForAnonQuantilesWithReportFunction, - /*report_format=*/"PROTO")), - "array_agg")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$anon_count_with_report_json", Function::kZetaSQLFunctionGroupName, - {{json_type, - {/*expr=*/ARG_TYPE_ANY_2, - /*lower_bound=*/{int64_type, optional_const_arg_options}, - /*upper_bound=*/{int64_type, optional_const_arg_options}}, - FN_ANON_COUNT_WITH_REPORT_JSON}}, - anon_options.Copy() - .set_sql_name("anon_count") - .set_get_sql_callback(&AnonCountWithReportJsonFunctionSQL), - "count")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$anon_count_with_report_proto", - Function::kZetaSQLFunctionGroupName, - {{anon_output_with_report_proto_type, - {/*expr=*/ARG_TYPE_ANY_2, - /*lower_bound=*/{int64_type, optional_const_arg_options}, - /*upper_bound=*/{int64_type, optional_const_arg_options}}, - FN_ANON_COUNT_WITH_REPORT_PROTO}}, - anon_options.Copy() - .set_sql_name("anon_count") - .set_get_sql_callback(&AnonCountWithReportProtoFunctionSQL), - "count")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$anon_count_star_with_report_json", - Function::kZetaSQLFunctionGroupName, - {{json_type, - {/*lower_bound=*/{int64_type, optional_const_arg_options}, - /*upper_bound=*/{int64_type, optional_const_arg_options}}, - FN_ANON_COUNT_STAR_WITH_REPORT_JSON}}, - anon_options.Copy() 
- .set_sql_name("anon_count(*)") - .set_get_sql_callback(&AnonCountStarWithReportJsonFunctionSQL) - .set_supported_signatures_callback(absl::bind_front( - &SupportedSignaturesForAnonCountStarWithReportFunction, - /*report_format=*/"JSON")) - .set_bad_argument_error_prefix_callback( - &AnonCountStarBadArgumentErrorPrefix), - "$count_star")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$anon_count_star_with_report_proto", - Function::kZetaSQLFunctionGroupName, - {{anon_output_with_report_proto_type, - {/*lower_bound=*/{int64_type, optional_const_arg_options}, - /*upper_bound=*/{int64_type, optional_const_arg_options}}, - FN_ANON_COUNT_STAR_WITH_REPORT_PROTO}}, - anon_options.Copy() - .set_sql_name("anon_count(*)") - .set_get_sql_callback(&AnonCountStarWithReportProtoFunctionSQL) - .set_supported_signatures_callback(absl::bind_front( - &SupportedSignaturesForAnonCountStarWithReportFunction, - /*report_format=*/"PROTO")) - .set_bad_argument_error_prefix_callback( - &AnonCountStarBadArgumentErrorPrefix), - "$count_star")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$anon_sum_with_report_json", Function::kZetaSQLFunctionGroupName, - {{json_type, - {/*expr=*/int64_type, - /*lower_bound=*/{int64_type, optional_const_arg_options}, - /*upper_bound=*/{int64_type, optional_const_arg_options}}, - FN_ANON_SUM_WITH_REPORT_JSON_INT64}, - {json_type, - {/*expr=*/uint64_type, - /*lower_bound=*/{uint64_type, optional_const_arg_options}, - /*upper_bound=*/{uint64_type, optional_const_arg_options}}, - FN_ANON_SUM_WITH_REPORT_JSON_UINT64}, - {json_type, - {/*expr=*/double_type, - /*lower_bound=*/{double_type, optional_const_arg_options}, - /*upper_bound=*/{double_type, optional_const_arg_options}}, - FN_ANON_SUM_WITH_REPORT_JSON_DOUBLE}}, - anon_options.Copy() - .set_sql_name("anon_sum") - .set_get_sql_callback(&AnonSumWithReportJsonFunctionSQL), - "sum")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - 
"$anon_sum_with_report_proto", Function::kZetaSQLFunctionGroupName, - {{anon_output_with_report_proto_type, - {/*expr=*/int64_type, - /*lower_bound=*/{int64_type, optional_const_arg_options}, - /*upper_bound=*/{int64_type, optional_const_arg_options}}, - FN_ANON_SUM_WITH_REPORT_PROTO_INT64}, - {anon_output_with_report_proto_type, - {/*expr=*/uint64_type, - /*lower_bound=*/{uint64_type, optional_const_arg_options}, - /*upper_bound=*/{uint64_type, optional_const_arg_options}}, - FN_ANON_SUM_WITH_REPORT_PROTO_UINT64}, - {anon_output_with_report_proto_type, - {/*expr=*/double_type, - /*lower_bound=*/{double_type, optional_const_arg_options}, - /*upper_bound=*/{double_type, optional_const_arg_options}}, - FN_ANON_SUM_WITH_REPORT_PROTO_DOUBLE}}, - anon_options.Copy() - .set_sql_name("anon_sum") - .set_get_sql_callback(&AnonSumWithReportProtoFunctionSQL), - "sum")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$anon_avg_with_report_json", Function::kZetaSQLFunctionGroupName, - {{json_type, - {/*expr=*/double_type, - /*lower_bound=*/{double_type, optional_const_arg_options}, - /*upper_bound=*/{double_type, optional_const_arg_options}}, - FN_ANON_AVG_DOUBLE_WITH_REPORT_JSON}}, - anon_options.Copy() - .set_sql_name("anon_avg") - .set_get_sql_callback(&AnonAvgWithReportJsonFunctionSQL), - "avg")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$anon_avg_with_report_proto", Function::kZetaSQLFunctionGroupName, - {{anon_output_with_report_proto_type, - {/*expr=*/double_type, - /*lower_bound=*/{double_type, optional_const_arg_options}, - /*upper_bound=*/{double_type, optional_const_arg_options}}, - FN_ANON_AVG_DOUBLE_WITH_REPORT_PROTO}}, - anon_options.Copy() - .set_sql_name("anon_avg") - .set_get_sql_callback(&AnonAvgWithReportProtoFunctionSQL), - "avg")); -} - -absl::Status GetDifferentialPrivacyFunctions( - TypeFactory* type_factory, const ZetaSQLBuiltinFunctionOptions& options, - NameToFunctionMap* functions, NameToTypeMap* 
types) {
-  const Type* int64_type = type_factory->get_int64();
-  const Type* uint64_type = type_factory->get_uint64();
-  const Type* double_type = type_factory->get_double();
-  const Type* numeric_type = type_factory->get_numeric();
-  const Type* double_array_type = types::DoubleArrayType();
-  const Type* json_type = types::JsonType();
-  const Type* report_proto_type = nullptr;
-  ZETASQL_RETURN_IF_ERROR(type_factory->MakeProtoType(
-      functions::DifferentialPrivacyOutputWithReport::descriptor(),
-      &report_proto_type));
-  const Type* report_format_type =
-      types::DifferentialPrivacyReportFormatEnumType();
-
-  ZETASQL_RETURN_IF_ERROR(InsertType(types, options, report_format_type));
-  // Creates a pair of same types for contribution bounds. First field is lower
-  // bound and second is upper bound. Struct field name is omitted intentionally
-  // because the user syntax for struct constructor does not allow "AS
-  // field_name" and the struct is not usable in actual query.
-  auto make_pair_type =
-      [&type_factory](const Type* t) -> absl::StatusOr<const Type*> {
-    const std::vector<StructType::StructField> pair_fields{{"", t}, {"", t}};
-    const Type* pair_type = nullptr;
-    ZETASQL_RETURN_IF_ERROR(type_factory->MakeStructType(pair_fields, &pair_type));
-    return pair_type;
-  };
-
-  ZETASQL_ASSIGN_OR_RETURN(const Type* int64_pair_type, make_pair_type(int64_type));
-  ZETASQL_ASSIGN_OR_RETURN(const Type* uint64_pair_type, make_pair_type(uint64_type));
-  ZETASQL_ASSIGN_OR_RETURN(const Type* double_pair_type, make_pair_type(double_type));
-  ZETASQL_ASSIGN_OR_RETURN(const Type* numeric_pair_type, make_pair_type(numeric_type));
-
-  FunctionSignatureOptions has_numeric_type_argument;
-  has_numeric_type_argument.set_constraints(&CheckHasNumericTypeArgument);
-
-  auto no_matching_signature_callback =
-      [](absl::string_view qualified_function_name,
-         const std::vector<InputArgumentType>& arguments,
-         ProductMode product_mode) {
-        return absl::StrCat(
-            "No matching signature for ", qualified_function_name,
-            " in SELECT WITH
DIFFERENTIAL_PRIVACY context", - (arguments.empty() - ? " with no arguments" - : absl::StrCat(" for argument types: ", - InputArgumentType::ArgumentsToString( - arguments, product_mode)))); - }; - - const FunctionOptions dp_options = - FunctionOptions() - .set_supports_over_clause(false) - .set_supports_distinct_modifier(false) - .set_supports_having_modifier(false) - .set_volatility(FunctionEnums::VOLATILE) - .set_no_matching_signature_callback(no_matching_signature_callback) - .add_required_language_feature(FEATURE_DIFFERENTIAL_PRIVACY); - - const FunctionArgumentTypeOptions percentile_arg_options = - FunctionArgumentTypeOptions() - .set_must_be_constant() - .set_must_be_non_null() - .set_min_value(0) - .set_max_value(1); - - const FunctionArgumentTypeOptions quantiles_arg_options = - FunctionArgumentTypeOptions() - .set_must_be_constant() - .set_must_be_non_null() - .set_min_value(1) - .set_cardinality(FunctionEnums::REQUIRED); - - const FunctionArgumentTypeOptions - optional_contribution_bounds_per_group_arg_options = - FunctionArgumentTypeOptions() - .set_must_be_constant() - .set_argument_name("contribution_bounds_per_group", - FunctionEnums::NAMED_ONLY) - .set_cardinality(FunctionEnums::OPTIONAL); - - const FunctionArgumentTypeOptions - optional_contribution_bounds_per_row_arg_options = - FunctionArgumentTypeOptions() - .set_must_be_constant() - .set_argument_name("contribution_bounds_per_row", - FunctionEnums::NAMED_ONLY) - .set_cardinality(FunctionEnums::OPTIONAL); - - const FunctionArgumentTypeOptions - required_contribution_bounds_per_row_arg_options = - FunctionArgumentTypeOptions( - optional_contribution_bounds_per_row_arg_options) - .set_cardinality(FunctionEnums::REQUIRED); - - const FunctionArgumentTypeOptions report_arg_options = - FunctionArgumentTypeOptions().set_must_be_constant().set_argument_name( - "report_format", FunctionEnums::NAMED_ONLY); - // Creates a signature for DP function returning a report. 
This signature - // will only be matched if the argument at the 0-indexed - // `report_arg_position` has constant value that is equal to - // `report_format`. - auto get_dp_report_signature = - [](functions::DifferentialPrivacyEnums::ReportFormat report_format, - int report_arg_position) { - auto dp_report_constraint = - [report_arg_position, report_format]( - const FunctionSignature& concrete_signature, - const std::vector<InputArgumentType>& arguments) - -> std::string { - if (arguments.size() <= report_arg_position) { - return absl::StrCat("at most ", report_arg_position, - " argument(s) can be provided"); - } - const Value* value = - arguments.at(report_arg_position).literal_value(); - if (value == nullptr || !value->is_valid()) { - return absl::StrCat("literal value is required at ", - report_arg_position + 1); - } - const Value expected_value = Value::Enum( - types::DifferentialPrivacyReportFormatEnumType(), report_format); - // If we encounter string we have to create enum type out of it to - // be able to compare against expected enum value. - if (value->type()->IsString()) { - auto enum_value = - Value::Enum(types::DifferentialPrivacyReportFormatEnumType(), - value->string_value()); - if (!enum_value.is_valid()) { - return absl::StrCat("Invalid enum value: ", - value->string_value()); - } - if (enum_value.Equals(expected_value)) { - return ""; - } - return absl::StrCat( - "Found: ", enum_value.EnumDisplayName(), - " expecting: ", expected_value.EnumDisplayName()); - } - if (value->Equals(expected_value)) { - return std::string(""); - } - return absl::StrCat("Found: ", value->EnumDisplayName(), - " expecting: ", expected_value.EnumDisplayName()); - }; - return FunctionSignatureOptions() - .set_constraints(dp_report_constraint) - .add_required_language_feature( - FEATURE_DIFFERENTIAL_PRIVACY_REPORT_FUNCTIONS); - }; - - // TODO: internal function names shouldn't be resolvable, - // an alternative way to look up COUNT(*) will be needed to fix the - // linked bug. 
- - auto get_sql_callback_for_function = [](std::string_view user_facing_name) { - return [user_facing_name](const std::vector<std::string>& inputs) { - return absl::StrCat(user_facing_name, "(", absl::StrJoin(inputs, ", "), - ")"); - }; - }; - auto supported_signatures_function = - [](const LanguageOptions& language_options, const Function& function) { - std::string supported_signatures; - for (const FunctionSignature& signature : function.signatures()) { - if (signature.IsDeprecated() || signature.IsInternal() || - signature.HasUnsupportedType(language_options) || - !signature.options().check_all_required_features_are_enabled( - language_options.GetEnabledLanguageFeatures())) { - continue; - } - if (!supported_signatures.empty()) { - absl::StrAppend(&supported_signatures, "; "); - } - std::vector<std::string> argument_texts; - for (const FunctionArgumentType& argument : signature.arguments()) { - if (!argument.has_argument_name() || - argument.argument_name() != "report_format") { - argument_texts.push_back(argument.UserFacingNameWithCardinality( - language_options.product_mode(), - FunctionArgumentType::NamePrintingStyle::kIfNamedOnly, - /*print_template_details=*/true)); - } else { - const std::string report_suffix = - signature.result_type().type()->IsJsonType() - ? "/*required_value=\"JSON\"*/" - : (signature.result_type().type()->IsProto() - ? 
"/*required_value=\"PROTO\"*/" - : ""); - argument_texts.push_back(absl::StrCat( - argument.UserFacingNameWithCardinality( - language_options.product_mode(), - FunctionArgumentType::NamePrintingStyle::kIfNamedOnly, - /*print_template_details=*/true), - report_suffix)); - } - } - absl::StrAppend(&supported_signatures, - function.GetSQL(argument_texts)); - } - return supported_signatures; - }; - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$differential_privacy_count", Function::kZetaSQLFunctionGroupName, - {{int64_type, - {/*expr=*/ARG_TYPE_ANY_2, - /*contribution_bounds_per_group=*/ - {int64_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_COUNT}, - {json_type, - {/*expr=*/ARG_TYPE_ANY_2, - /*report_format=*/{report_format_type, report_arg_options}, - /*contribution_bounds_per_group=*/ - {int64_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_COUNT_REPORT_JSON, - get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON, - 1)}, - {report_proto_type, - {/*expr=*/ARG_TYPE_ANY_2, - /*report_format=*/{report_format_type, report_arg_options}, - /*contribution_bounds_per_group=*/ - {int64_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_COUNT_REPORT_PROTO, - get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, - 1)}}, - dp_options.Copy() - .set_get_sql_callback(get_sql_callback_for_function("COUNT")) - .set_supported_signatures_callback(supported_signatures_function) - .set_sql_name("count"), - "count")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$differential_privacy_count_star", - Function::kZetaSQLFunctionGroupName, - {{int64_type, - {/*contribution_bounds_per_group=*/{ - int64_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_COUNT_STAR}, - {json_type, - {/*report_format=*/{report_format_type, report_arg_options}, - 
/*contribution_bounds_per_group=*/ - {int64_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_COUNT_STAR_REPORT_JSON, - get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON, - 0)}, - {report_proto_type, - {/*report_format=*/{report_format_type, report_arg_options}, - /*contribution_bounds_per_group=*/ - {int64_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_COUNT_STAR_REPORT_PROTO, - get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, - 0)}}, - dp_options.Copy() - .set_get_sql_callback(&DPCountStarSQL) - .set_supported_signatures_callback( - &SupportedSignaturesForDPCountStar) - .set_sql_name("count(*)"), - "$count_star")); - - std::vector args; - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$differential_privacy_sum", Function::kZetaSQLFunctionGroupName, - {{int64_type, - {/*expr=*/int64_type, - /*contribution_bounds_per_group=*/ - {int64_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_SUM_INT64}, - {uint64_type, - {/*expr=*/uint64_type, - /*contribution_bounds_per_group=*/ - {uint64_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_SUM_UINT64}, - {double_type, - {/*expr=*/double_type, - /*contribution_bounds_per_group=*/ - {double_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_SUM_DOUBLE}, - {numeric_type, - {/*expr=*/numeric_type, - /*contribution_bounds_per_group=*/ - {numeric_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_SUM_NUMERIC, - has_numeric_type_argument}, - {json_type, - {/*expr=*/int64_type, - /*report_format=*/{report_format_type, report_arg_options}, - /*contribution_bounds_per_group=*/ - {int64_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_SUM_REPORT_JSON_INT64, - 
get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON, - 1)}, - {json_type, - {/*expr=*/double_type, - /*report_format=*/{report_format_type, report_arg_options}, - /*contribution_bounds_per_group=*/ - {double_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_SUM_REPORT_JSON_DOUBLE, - get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON, - 1)}, - {json_type, - {/*expr=*/uint64_type, - /*report_format=*/{report_format_type, report_arg_options}, - /*contribution_bounds_per_group=*/ - {uint64_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_SUM_REPORT_JSON_UINT64, - get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON, - 1)}, - {report_proto_type, - {/*expr=*/int64_type, - /*report_format=*/{report_format_type, report_arg_options}, - /*contribution_bounds_per_group=*/ - {int64_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_SUM_REPORT_PROTO_INT64, - get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, - 1)}, - {report_proto_type, - {/*expr=*/double_type, - /*report_format=*/{report_format_type, report_arg_options}, - /*contribution_bounds_per_group=*/ - {double_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_SUM_REPORT_PROTO_DOUBLE, - get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, - 1)}, - {report_proto_type, - {/*expr=*/uint64_type, - /*report_format=*/{report_format_type, report_arg_options}, - /*contribution_bounds_per_group=*/ - {uint64_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_SUM_REPORT_PROTO_UINT64, - get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, - 1)}}, - dp_options.Copy() - .set_get_sql_callback(get_sql_callback_for_function("SUM")) - .set_supported_signatures_callback(supported_signatures_function) - .set_sql_name("sum"), - "sum")); - - 
InsertCreatedFunction( - functions, options, - new AnonFunction( - "$differential_privacy_avg", Function::kZetaSQLFunctionGroupName, - {{double_type, - {/*expr=*/double_type, - /*contribution_bounds_per_group=*/ - {double_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_AVG_DOUBLE}, - {numeric_type, - {/*expr=*/numeric_type, - /*contribution_bounds_per_group=*/ - {numeric_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_AVG_NUMERIC, - has_numeric_type_argument}, - {json_type, - {/*expr=*/double_type, - /*report_format=*/{report_format_type, report_arg_options}, - /*contribution_bounds_per_group=*/ - {double_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_AVG_DOUBLE_REPORT_JSON, - get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON, - 1)}, - {report_proto_type, - {/*expr=*/double_type, - /*report_format=*/{report_format_type, report_arg_options}, - /*contribution_bounds_per_group=*/ - {double_pair_type, - optional_contribution_bounds_per_group_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_AVG_DOUBLE_REPORT_PROTO, - get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, - 1)}}, - dp_options.Copy() - .set_get_sql_callback(get_sql_callback_for_function("AVG")) - .set_supported_signatures_callback(supported_signatures_function) - .set_sql_name("avg"), - "avg")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$differential_privacy_var_pop", - Function::kZetaSQLFunctionGroupName, - {{double_type, - {/*expr=*/double_type, - /*contribution_bounds_per_row=*/ - {double_pair_type, - optional_contribution_bounds_per_row_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_VAR_POP_DOUBLE}, - {double_type, - {/*expr=*/double_array_type, - /*contribution_bounds_per_row=*/ - {double_pair_type, - optional_contribution_bounds_per_row_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_VAR_POP_DOUBLE_ARRAY, - 
FunctionSignatureOptions().set_is_internal(true)}}, - dp_options.Copy() - .set_get_sql_callback(get_sql_callback_for_function("VAR_POP")) - .set_supported_signatures_callback(supported_signatures_function) - .set_sql_name("var_pop"), - "array_agg")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$differential_privacy_stddev_pop", - Function::kZetaSQLFunctionGroupName, - {{double_type, - {/*expr=*/double_type, - /*contribution_bounds_per_row=*/ - {double_pair_type, - optional_contribution_bounds_per_row_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_STDDEV_POP_DOUBLE}, - {double_type, - {/*expr=*/double_array_type, - /*contribution_bounds_per_row=*/ - {double_pair_type, - optional_contribution_bounds_per_row_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_STDDEV_POP_DOUBLE_ARRAY, - FunctionSignatureOptions().set_is_internal(true)}}, - dp_options.Copy() - .set_get_sql_callback(get_sql_callback_for_function("STDDEV_POP")) - .set_supported_signatures_callback(supported_signatures_function) - .set_sql_name("stddev_pop"), - "array_agg")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$differential_privacy_percentile_cont", - Function::kZetaSQLFunctionGroupName, - {{double_type, - {/*expr=*/double_type, - /*percentile=*/{double_type, percentile_arg_options}, - /*contribution_bounds_per_row=*/ - {double_pair_type, - optional_contribution_bounds_per_row_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_PERCENTILE_CONT_DOUBLE}, - // This is an internal signature that is only used post-dp-rewrite, - // and is not available in the external SQL language. 
- {double_type, - {/*expr=*/double_array_type, - /*percentile=*/{double_type, percentile_arg_options}, - /*contribution_bounds_per_row=*/ - {double_pair_type, - optional_contribution_bounds_per_row_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_PERCENTILE_CONT_DOUBLE_ARRAY, - FunctionSignatureOptions().set_is_internal(true)}}, - dp_options.Copy() - .set_get_sql_callback( - get_sql_callback_for_function("PERCENTILE_CONT")) - .set_supported_signatures_callback(supported_signatures_function) - .set_sql_name("percentile_cont"), - "array_agg")); - - InsertCreatedFunction( - functions, options, - new AnonFunction( - "$differential_privacy_approx_quantiles", - Function::kZetaSQLFunctionGroupName, - {{double_array_type, - {/*expr=*/double_type, - /*quantiles=*/{int64_type, quantiles_arg_options}, - /*contribution_bounds_per_row=*/ - {double_pair_type, - required_contribution_bounds_per_row_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_QUANTILES_DOUBLE}, - // This is an internal signature that is only used post-dp-rewrite, - // and is not available in the external SQL language. - {double_array_type, - {/*expr=*/double_array_type, - /*quantiles=*/{int64_type, quantiles_arg_options}, - /*contribution_bounds_per_row=*/ - {double_pair_type, - required_contribution_bounds_per_row_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_QUANTILES_DOUBLE_ARRAY, - FunctionSignatureOptions().set_is_internal(true)}, - {json_type, - {/*expr=*/double_type, - /*quantiles=*/{int64_type, quantiles_arg_options}, - /*report_format=*/{report_format_type, report_arg_options}, - /*contribution_bounds_per_row=*/ - {double_pair_type, - required_contribution_bounds_per_row_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_QUANTILES_DOUBLE_REPORT_JSON, - get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON, - 2)}, - // This is an internal signature that is only used post-dp-rewrite, - // and is not available in the external SQL language. 
- {json_type, - {/*expr=*/double_array_type, - /*quantiles=*/{int64_type, quantiles_arg_options}, - /*report_format=*/{report_format_type, report_arg_options}, - /*contribution_bounds_per_row=*/ - {double_pair_type, - required_contribution_bounds_per_row_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_QUANTILES_DOUBLE_ARRAY_REPORT_JSON, - get_dp_report_signature(functions::DifferentialPrivacyEnums::JSON, - 2) - .set_is_internal(true)}, - {report_proto_type, - {/*expr=*/double_type, - /*quantiles=*/{int64_type, quantiles_arg_options}, - /*report_format=*/{report_format_type, report_arg_options}, - /*contribution_bounds_per_row=*/ - {double_pair_type, - required_contribution_bounds_per_row_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_QUANTILES_DOUBLE_REPORT_PROTO, - get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, - 2)}, - // This is an internal signature that is only used post-dp-rewrite, - // and is not available in the external SQL language. - {report_proto_type, - {/*expr=*/double_array_type, - /*quantiles=*/{int64_type, quantiles_arg_options}, - /*report_format=*/{report_format_type, report_arg_options}, - /*contribution_bounds_per_row=*/ - {double_pair_type, - required_contribution_bounds_per_row_arg_options}}, - FN_DIFFERENTIAL_PRIVACY_QUANTILES_DOUBLE_ARRAY_REPORT_PROTO, - get_dp_report_signature(functions::DifferentialPrivacyEnums::PROTO, - 2) - .set_is_internal(true)}}, - dp_options.Copy() - .set_get_sql_callback( - get_sql_callback_for_function("APPROX_QUANTILES")) - .set_supported_signatures_callback(supported_signatures_function) - .set_sql_name("approx_quantiles"), - "array_agg")); - return absl::OkStatus(); -} - void GetTypeOfFunction(TypeFactory* type_factory, const ZetaSQLBuiltinFunctionOptions& options, NameToFunctionMap* functions) { @@ -4160,4 +3312,73 @@ void GetFilterFieldsFunction(TypeFactory* type_factory, } } +void GetElementWiseAggregationFunctions( + TypeFactory* type_factory, const ZetaSQLBuiltinFunctionOptions& options, + 
NameToFunctionMap* functions) { + // The signatures here match the coercion rules for the SUM aggregate + // function. In particular, + // INT32 -> INT64, UINT32 -> UINT64, and FLOAT -> DOUBLE. + std::vector<FunctionSignatureOnHeap> elementwise_sum_signatures = { + {types::Int64ArrayType(), + {types::Int32ArrayType()}, + FN_ELEMENTWISE_SUM_INT32}, + {types::Int64ArrayType(), + {types::Int64ArrayType()}, + FN_ELEMENTWISE_SUM_INT64}, + {types::Uint64ArrayType(), + {types::Uint32ArrayType()}, + FN_ELEMENTWISE_SUM_UINT32}, + {types::Uint64ArrayType(), + {types::Uint64ArrayType()}, + FN_ELEMENTWISE_SUM_UINT64}, + {types::DoubleArrayType(), + {types::FloatArrayType()}, + FN_ELEMENTWISE_SUM_FLOAT}, + {types::DoubleArrayType(), + {types::DoubleArrayType()}, + FN_ELEMENTWISE_SUM_DOUBLE}, + {types::NumericArrayType(), + {types::NumericArrayType()}, + FN_ELEMENTWISE_SUM_NUMERIC}, + {types::BigNumericArrayType(), + {types::BigNumericArrayType()}, + FN_ELEMENTWISE_SUM_BIGNUMERIC}, + {types::IntervalArrayType(), + {types::IntervalArrayType()}, + FN_ELEMENTWISE_SUM_INTERVAL}, + }; + InsertFunction(functions, options, "elementwise_sum", Function::AGGREGATE, + elementwise_sum_signatures, DefaultAggregateFunctionOptions()); + std::vector<FunctionSignatureOnHeap> elementwise_avg_signatures = { + {types::DoubleArrayType(), + {types::Int32ArrayType()}, + FN_ELEMENTWISE_AVG_INT32}, + {types::DoubleArrayType(), + {types::Int64ArrayType()}, + FN_ELEMENTWISE_AVG_INT64}, + {types::DoubleArrayType(), + {types::Uint32ArrayType()}, + FN_ELEMENTWISE_AVG_UINT32}, + {types::DoubleArrayType(), + {types::Uint64ArrayType()}, + FN_ELEMENTWISE_AVG_UINT64}, + {types::DoubleArrayType(), + {types::FloatArrayType()}, + FN_ELEMENTWISE_AVG_FLOAT}, + {types::DoubleArrayType(), + {types::DoubleArrayType()}, + FN_ELEMENTWISE_AVG_DOUBLE}, + {types::NumericArrayType(), + {types::NumericArrayType()}, + FN_ELEMENTWISE_AVG_NUMERIC}, + {types::BigNumericArrayType(), + {types::BigNumericArrayType()}, + FN_ELEMENTWISE_AVG_BIGNUMERIC}, + {types::IntervalArrayType(), + 
{types::IntervalArrayType()}, + FN_ELEMENTWISE_AVG_INTERVAL}, + }; + InsertFunction(functions, options, "elementwise_avg", Function::AGGREGATE, + elementwise_avg_signatures, DefaultAggregateFunctionOptions()); +}; } // namespace zetasql diff --git a/zetasql/common/builtin_function_map.cc b/zetasql/common/builtin_function_map.cc new file mode 100644 index 000000000..9aab08a90 --- /dev/null +++ b/zetasql/common/builtin_function_map.cc @@ -0,0 +1,132 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// + +#include + +#include "zetasql/common/builtin_function_internal.h" +#include "zetasql/common/errors.h" +#include "zetasql/public/analyzer_options.h" +#include "zetasql/public/builtin_function.pb.h" +#include "zetasql/public/builtin_function_options.h" +#include "zetasql/public/function.h" +#include "zetasql/public/function.pb.h" +#include "zetasql/public/function_signature.h" +#include "zetasql/public/input_argument_type.h" +#include "zetasql/public/language_options.h" +#include "zetasql/public/options.pb.h" +#include "zetasql/public/types/struct_type.h" +#include "zetasql/public/types/type.h" +#include "zetasql/public/types/type_factory.h" +#include "zetasql/base/check.h" +#include "absl/status/status.h" +#include "absl/status/statusor.h" +#include "absl/strings/string_view.h" +#include "zetasql/base/ret_check.h" + +namespace zetasql { + +namespace { +constexpr absl::string_view kMapFromArray = "MAP_FROM_ARRAY"; +} + +static absl::Status CheckMapFromArrayPreResolutionArguments( + const std::vector<InputArgumentType>& arguments, + const LanguageOptions& language_options) { + if (arguments.size() != 1) { + return MakeSqlError() << "No matching signature for function " + << kMapFromArray + << ". Supported signature: " << kMapFromArray + << "(ARRAY<STRUCT<T1, T2>>)"; + } + + if (arguments[0].is_untyped()) { + return MakeSqlError() + << kMapFromArray + << " result type cannot be determined from " + "argument " + << arguments[0].UserFacingName(language_options.product_mode()) + << ". 
Consider casting the argument to ARRAY<STRUCT<T1, T2>> so " + "that key type T1 and value type T2 can be determined from the " + "argument"; + } + return absl::OkStatus(); +} + +static absl::StatusOr<const Type*> ComputeMapFromArrayResultType( + Catalog* catalog, TypeFactory* type_factory, CycleDetector* cycle_detector, + const FunctionSignature& signature, + const std::vector<InputArgumentType>& arguments, + const AnalyzerOptions& analyzer_options) { + ZETASQL_RET_CHECK_EQ(arguments.size(), 1); + auto& input_argument = arguments[0]; + + auto make_error_struct_arr_expected = + [&]() { + return MakeSqlError() + << kMapFromArray + << " input argument must be an array of structs, but got type " + << input_argument.type()->TypeName( + analyzer_options.language().product_mode()); + }; + if (!input_argument.type()->IsArray()) { + return make_error_struct_arr_expected(); + } + + auto* array_element_type = input_argument.type()->AsArray()->element_type(); + + if (!array_element_type->IsStruct()) { + return make_error_struct_arr_expected(); + } + + auto* struct_type = array_element_type->AsStruct(); + if (struct_type->num_fields() != 2) { + return MakeSqlError() + << kMapFromArray << " input array must be of type " + << "ARRAY<STRUCT<T1, T2>>, but found a struct member with " + << struct_type->num_fields() << " fields"; + } + + if (!struct_type->field(0).type->SupportsGrouping( + analyzer_options.language())) { + return MakeSqlError() << kMapFromArray + << " expected a groupable key, but got a key of type " + << struct_type->field(0).type->TypeName( + analyzer_options.language().product_mode()) + << ", which does not support grouping"; + } + return type_factory->MakeMapType(struct_type->field(0).type, + struct_type->field(1).type); +} + +void GetMapCoreFunctions(TypeFactory* type_factory, + const ZetaSQLBuiltinFunctionOptions& options, + NameToFunctionMap* functions) { + // MAP_FROM_ARRAY(ARRAY<STRUCT<K, V>> entries) -> MAP<K, V> + InsertFunction( + functions, options, "map_from_array", Function::SCALAR, + {{ARG_TYPE_ARBITRARY, + {ARG_ARRAY_TYPE_ANY_1},
+ FN_MAP_FROM_ARRAY, + // TODO: Collation support for MAP<> type. + FunctionSignatureOptions().set_rejects_collation()}}, + FunctionOptions() + .set_compute_result_type_callback(&ComputeMapFromArrayResultType) + .set_pre_resolution_argument_constraint( + &CheckMapFromArrayPreResolutionArguments) + .add_required_language_feature(FEATURE_V_1_4_MAP_TYPE)); +} + +} // namespace zetasql diff --git a/zetasql/common/errors.h b/zetasql/common/errors.h index 9b7a13ba2..c2d6d16ad 100644 --- a/zetasql/common/errors.h +++ b/zetasql/common/errors.h @@ -274,20 +274,6 @@ ConvertInternalErrorLocationsAndAdjustErrorStrings( return new_statuses; } -// DEPRECATED: Please use the overload using ErrorMessageOptions -ABSL_DEPRECATED("Inline me!") -inline std::vector<absl::Status> -ConvertInternalErrorLocationsAndAdjustErrorStrings( - ErrorMessageMode mode, bool attach_error_location_payload, - absl::string_view input_string, const std::vector<absl::Status>& statuses) { - return ConvertInternalErrorLocationsAndAdjustErrorStrings( - ErrorMessageOptions{ - .mode = mode, - .attach_error_location_payload = attach_error_location_payload, - .stability = ERROR_MESSAGE_STABILITY_UNSPECIFIED}, - input_string, statuses); -} - } // namespace zetasql #endif // ZETASQL_COMMON_ERRORS_H_ diff --git a/zetasql/common/int_ops_util.h b/zetasql/common/int_ops_util.h deleted file mode 100644 index a1e30e13d..000000000 --- a/zetasql/common/int_ops_util.h +++ /dev/null @@ -1,61 +0,0 @@ -// -// Copyright 2019 Google LLC -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-// See the License for the specific language governing permissions and -// limitations under the License. -// - -#ifndef ZETASQL_COMMON_INT_OPS_UTIL_H_ -#define ZETASQL_COMMON_INT_OPS_UTIL_H_ - -#include <cmath> -#include <cstdint> -#include <limits> - -namespace zetasql { - -// LosslessConvert casts a value from double to int64_t, detecting if any -// information is lost due to overflow, rounding, or changes in signedness. If -// successful, returns true and writes converted value to `*output`. Otherwise, -// returns false, and the contents of `*output` is undefined. -inline bool LossLessConvertDoubleToInt64(double input, int64_t* output) { - if (std::isnan(input) || std::isinf(input)) { - return false; - } - - double lower_bound = std::numeric_limits<int64_t>::min(); - if (input < lower_bound) { - return false; - } - - if (input > 0) { - // Set exp such that value == f * 2^exp for some f with |f| in [0.5, 1.0). - // Note that this implies that the magnitude of value is strictly less than - // 2^exp. - int exp = 0; - std::frexp(input, &exp); - - // Let N be the number of non-sign bits in the representation of int64_t. - // If the magnitude of value is strictly less than 2^N, the truncated - // version of input is representable as int64_t. - if (exp > std::numeric_limits<int64_t>::digits) { - return false; - } - } - - *output = static_cast<int64_t>(input); - return input == static_cast<double>(*output); -} - -} // namespace zetasql - -#endif // ZETASQL_COMMON_INT_OPS_UTIL_H_ diff --git a/zetasql/common/int_ops_util_test.cc b/zetasql/common/int_ops_util_test.cc deleted file mode 100644 index 72e405802..000000000 --- a/zetasql/common/int_ops_util_test.cc +++ /dev/null @@ -1,52 +0,0 @@ -// -// Copyright 2019 Google LLC -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. 
-// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. -// - -#include "zetasql/common/int_ops_util.h" - -#include - -#include "gtest/gtest.h" - -namespace zetasql { - -TEST(IntOpsUtilTest, LossLessConvertDoubleToInt64) { - int64_t output; - - EXPECT_TRUE(LossLessConvertDoubleToInt64(1.0, &output)); - EXPECT_EQ(output, 1); - - EXPECT_TRUE(LossLessConvertDoubleToInt64(-123456, &output)); - EXPECT_EQ(output, -123456); - - EXPECT_FALSE(LossLessConvertDoubleToInt64(1.1, &output)); - - EXPECT_TRUE(LossLessConvertDoubleToInt64( - static_cast<double>(std::numeric_limits<int64_t>::min()), &output)); - EXPECT_EQ(output, std::numeric_limits<int64_t>::min()); - - EXPECT_FALSE(LossLessConvertDoubleToInt64( - static_cast<double>(std::numeric_limits<int64_t>::max()), &output)); - - EXPECT_FALSE(LossLessConvertDoubleToInt64(1e100, &output)); - EXPECT_FALSE(LossLessConvertDoubleToInt64( - std::numeric_limits<double>::infinity(), &output)); - EXPECT_FALSE(LossLessConvertDoubleToInt64( - -std::numeric_limits<double>::infinity(), &output)); - EXPECT_FALSE(LossLessConvertDoubleToInt64( - std::numeric_limits<double>::quiet_NaN(), &output)); -} - -} // namespace zetasql diff --git a/zetasql/common/testing/BUILD b/zetasql/common/testing/BUILD index 60af6d2c6..feee56500 100644 --- a/zetasql/common/testing/BUILD +++ b/zetasql/common/testing/BUILD @@ -32,7 +32,6 @@ cc_library( deps = [ "//zetasql/base:path", "@com_google_absl//absl/base:core_headers", - "@com_google_absl//absl/flags:flag", "@com_google_absl//absl/strings:cord", "@com_google_googletest//:gtest", "@com_google_protobuf//:protobuf", diff --git a/zetasql/common/testing/status_payload_matchers_oss_test.cc 
b/zetasql/common/testing/status_payload_matchers_oss_test.cc index 4ff6b31bc..efde84446 100644 --- a/zetasql/common/testing/status_payload_matchers_oss_test.cc +++ b/zetasql/common/testing/status_payload_matchers_oss_test.cc @@ -50,7 +50,7 @@ absl::Status StatusWithPayload(absl::string_view text) { return status; } -absl::StatusOr StatusOrWithPayload(const std::string &text) { +absl::StatusOr StatusOrWithPayload(absl::string_view text) { return StatusWithPayload(text); } diff --git a/zetasql/common/testing/testing_proto_util.h b/zetasql/common/testing/testing_proto_util.h index 808479866..2abc71fcc 100644 --- a/zetasql/common/testing/testing_proto_util.h +++ b/zetasql/common/testing/testing_proto_util.h @@ -24,7 +24,6 @@ #include "google/protobuf/message.h" #include "gtest/gtest.h" #include "absl/base/macros.h" -#include "absl/flags/flag.h" #include "absl/strings/cord.h" namespace zetasql { diff --git a/zetasql/compliance/BUILD b/zetasql/compliance/BUILD index 9b974dfa4..a0138fd13 100644 --- a/zetasql/compliance/BUILD +++ b/zetasql/compliance/BUILD @@ -179,6 +179,7 @@ cc_library( testonly = 1, data = glob(["testdata/*.test"]) + [ "//zetasql/public/functions:array_find_mode_proto", + "//zetasql/public/functions:array_zip_mode_proto", "//zetasql/public/functions:rounding_mode_proto", "//zetasql/testdata:test_proto3_proto", "//zetasql/testdata:test_schema_proto", @@ -370,6 +371,7 @@ cc_library( "@com_google_absl//absl/strings", "@com_google_absl//absl/strings:cord", "@com_google_absl//absl/strings:str_format", + "@com_google_absl//absl/types:span", "//zetasql/base:status", ], ) @@ -414,7 +416,6 @@ cc_library( deps = [ ":matchers", "//zetasql/base:status", - "@com_google_absl//absl/memory", "@com_google_absl//absl/status", "@com_google_absl//absl/strings", ], @@ -503,6 +504,7 @@ cc_library( "//zetasql/public:json_value", "//zetasql/public:numeric_value", "//zetasql/public:options_cc_proto", + "//zetasql/public:token_list_util", "//zetasql/public:type", 
"//zetasql/public:type_cc_proto", "//zetasql/public:value", diff --git a/zetasql/compliance/builddefs.bzl b/zetasql/compliance/builddefs.bzl index 869c3a4e9..47f082c66 100644 --- a/zetasql/compliance/builddefs.bzl +++ b/zetasql/compliance/builddefs.bzl @@ -50,6 +50,7 @@ def zetasql_compliance_test( args = [], include_gtest_main = True, driver_exec_properties = None, + tags = [], **extra_args): """Invoke the ZetaSQL compliance test suite against a SQL engine.""" @@ -65,6 +66,7 @@ def zetasql_compliance_test( args = args + [ "--zetasql_reference_impl_validate_timestamp_precision", ], + tags = tags, **extra_args ) diff --git a/zetasql/compliance/compliance_test_cases.cc b/zetasql/compliance/compliance_test_cases.cc index 44a1a1caa..d8ee69329 100644 --- a/zetasql/compliance/compliance_test_cases.cc +++ b/zetasql/compliance/compliance_test_cases.cc @@ -69,6 +69,7 @@ #include "absl/strings/str_replace.h" #include "absl/strings/string_view.h" #include "absl/strings/substitute.h" +#include "absl/types/span.h" ABSL_FLAG(std::string, file_pattern, "*.test", "File pattern for test files."); @@ -173,7 +174,7 @@ static std::string MakeLiteral(const Value& value) { } static std::vector WrapFunctionTestWithFeature( - const std::vector& tests, LanguageFeature feature) { + absl::Span tests, LanguageFeature feature) { std::vector wrapped_tests; wrapped_tests.reserve(tests.size()); for (auto call : tests) { @@ -184,7 +185,7 @@ static std::vector WrapFunctionTestWithFeature( } static std::vector WrapFunctionTestWithFeatures( - const std::vector& tests, + absl::Span tests, std::vector& features) { std::vector wrapped_tests; wrapped_tests.reserve(tests.size()); @@ -200,7 +201,7 @@ static std::vector WrapFunctionTestWithFeatures( } std::vector WrapFeatureAdditionalStringFunctions( - const std::vector& tests) { + absl::Span tests) { return WrapFunctionTestWithFeature(tests, FEATURE_V_1_3_ADDITIONAL_STRING_FUNCTIONS); } @@ -234,7 +235,7 @@ std::vector WrapFeatureLastDay( } static 
std::vector WrapFeatureJSON( - const std::vector& tests) { + absl::Span tests) { std::vector wrapped_tests; wrapped_tests.reserve(tests.size()); for (auto& test_case : tests) { @@ -244,7 +245,7 @@ static std::vector WrapFeatureJSON( } static std::vector WrapFeatureCollation( - const std::vector& tests) { + absl::Span tests) { std::vector wrapped_tests; wrapped_tests.reserve(tests.size()); for (auto& test_case : tests) { @@ -502,7 +503,7 @@ void ComplianceCodebasedTests::RunFunctionTestsPrefix( template void ComplianceCodebasedTests::RunFunctionTestsCustom( - const std::vector& function_tests, FCT get_sql_string) { + absl::Span function_tests, FCT get_sql_string) { for (const auto& params : function_tests) { std::string pattern = get_sql_string(params); std::string sql = absl::StrCat("SELECT ", pattern, " AS ", kColA); @@ -617,7 +618,7 @@ void ComplianceCodebasedTests::RunAggregationFunctionCalls( template void ComplianceCodebasedTests::RunStatementTestsCustom( - const std::vector& statement_tests, + absl::Span statement_tests, FCT get_sql_string) { for (const auto& params : statement_tests) { std::string pattern = get_sql_string(params); @@ -648,7 +649,7 @@ void ComplianceCodebasedTests::RunStatementOnFeatures( std::vector ComplianceCodebasedTests::GetFunctionTestsDateArithmetics( - const std::vector& tests) { + absl::Span tests) { std::vector out; for (const auto& test : tests) { // Only look at tests which use DAY as datepart @@ -674,7 +675,7 @@ TEST_F(ComplianceCodebasedTests, TestQueryParameters) { Returns(Singleton(Int64(5)))); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestDepthLimitDetectorTestCases, 12) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestDepthLimitDetectorTestCases, 40) { for (const DepthLimitDetectorTestCase& depth_case : Shard(AllDepthLimitDetectorTestCases())) { bool driver_enables_right_features = true; @@ -813,12 +814,12 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestStringConcatOperator, 1) { "@p0 || @p1"); } 
-SHARDED_TEST_F(ComplianceCodebasedTests, TestGreatestFunctions, 3) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestGreatestFunctions, 5) { SetNamePrefix("Greatest"); RunFunctionTestsPrefix(Shard(GetFunctionTestsGreatest()), "Greatest"); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestLeastFunctions, 3) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestLeastFunctions, 4) { SetNamePrefix("Least"); RunFunctionTestsPrefix(Shard(GetFunctionTestsLeast()), "Least"); } @@ -1038,7 +1039,7 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestComparisonFunctions_GT, 2) { RunStatementTests(Shard(GetFunctionTestsGreater()), "@p0 > @p1"); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestComparisonFunctions_GE, 2) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestComparisonFunctions_GE, 3) { SetNamePrefix("GE"); RunStatementTests(Shard(GetFunctionTestsGreaterOrEqual()), "@p0 >= @p1"); } @@ -1093,12 +1094,12 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestComparisonFunctions_LT, 2) { RunStatementTests(Shard(GetFunctionTestsLess()), "@p0 < @p1"); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestComparisonFunctions_LE, 2) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestComparisonFunctions_LE, 3) { SetNamePrefix("LE"); RunStatementTests(Shard(GetFunctionTestsLessOrEqual()), "@p0 <= @p1"); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestCastFunction, 4) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestCastFunction, 14) { // TODO: This needs to be sensitive to ProductMode, or // maybe just switched to PRODUCT_EXTERNAL. 
auto format_fct = [](const QueryParamsWithResult& p) { @@ -1403,12 +1404,22 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestNativeJsonExtractStringArray, 1) { GetFunctionTestsNativeJsonExtractStringArray()))); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestToJsonString, 1) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestJsonQueryLax, 1) { + SetNamePrefix("JsonQueryLax"); + std::vector<FunctionTestCall> tests = GetFunctionTestsJsonQueryLax(); + for (auto& test_case : tests) { + test_case.params.AddRequiredFeatures( + {FEATURE_JSON_TYPE, FEATURE_JSON_QUERY_LAX}); + } + RunFunctionCalls(Shard(tests)); +} + +SHARDED_TEST_F(ComplianceCodebasedTests, TestToJsonString, 3) { SetNamePrefix("ToJsonString"); RunFunctionCalls(Shard(GetFunctionTestsToJsonString())); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestToJson, 1) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestToJson, 3) { SetNamePrefix("ToJson"); auto to_json_fct = [](const FunctionTestCall& f) { if (f.params.params().size() == 1) { @@ -1500,7 +1511,7 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestConvertJsonLaxString, 1) { GetFunctionTestsConvertJsonLaxString()))); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestJsonArray, 1) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestJsonArray, 2) { SetNamePrefix("JsonArray"); RunFunctionCalls(Shard( EnableJsonConstructorFunctionsForTest(GetFunctionTestsJsonArray()))); @@ -1631,12 +1642,12 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestRegexp2Functions, 1) { Shard(GetFunctionTestsRegexp2(/*include_feature_set=*/true))); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestRegexpInstrFunctions, 1) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestRegexpInstrFunctions, 5) { SetNamePrefix("RegexpInstr"); RunFunctionCalls(Shard(GetFunctionTestsRegexpInstr())); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestLike, 4) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestLike, 5) { SetNamePrefix("Like"); auto query_params_with_results = GetFunctionTestsLike(); // The LIKE pattern is not a constant, the
regexp is constructed and @@ -1654,7 +1665,7 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestLike, 4) { }); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestLikeWithCollation, 4) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestLikeWithCollation, 12) { auto query_params_with_results_with_collation = WrapFeatureCollation(GetFunctionTestsLikeWithCollation()); SetNamePrefix("LikeWithCollationTextPatternUndCi"); @@ -1694,7 +1705,7 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestNotLike, 2) { "@p0 NOT LIKE @p1"); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestNotLikeWithCollation, 4) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestNotLikeWithCollation, 6) { auto query_params_with_results_with_collation = WrapFeatureCollation(InvertResults(GetFunctionTestsLikeWithCollation())); SetNamePrefix("NotLikeWithCollationTextPatternUndCi"); @@ -1996,7 +2007,7 @@ TEST_F(ComplianceCodebasedTests, TestProto) { // TODO: port all tests from evaluation_test.cc } -SHARDED_TEST_F(ComplianceCodebasedTests, TestMathFunctions_Math, 1) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestMathFunctions_Math, 4) { // No need to set PREFIX, RunFunctionCalls() will do it. RunFunctionCalls(Shard(GetFunctionTestsMath())); } @@ -2053,7 +2064,7 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestDateTimeFunctionsDateDiffFormat, // others. The number of shards for this case is picked so that each shard has // ~300 queries. SHARDED_TEST_F(ComplianceCodebasedTests, TestDateTimeFunctionsExtractFormat, - 100) { + 140) { auto extract_format_fct = [](const FunctionTestCall& f) { if (f.params.num_params() != 2 && f.params.num_params() != 3) { ABSL_LOG(FATAL) << "Unexpected number of parameters: " @@ -2080,7 +2091,7 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestDateTimeFunctionsExtractFormat, // Even with 500 query shards, this test takes a long time relative to the // others. The number of shards for this case is picked so that each shard has // ~300 queries. 
-SHARDED_TEST_F(ComplianceCodebasedTests, TestDateTimeFunctions_Standard, 12) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestDateTimeFunctions_Standard, 60) { RunFunctionCalls(Shard(GetFunctionTestsDateTimeStandardFunctionCalls())); } @@ -2148,7 +2159,7 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestDatetimeAddSubFunctions, 2) { datetime_add_sub); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestDatetimeDiffFunctions, 1) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestDatetimeDiffFunctions, 5) { auto datetime_diff = [](const FunctionTestCall& f) { ABSL_CHECK_EQ(3, f.params.num_params()); return absl::Substitute("$0(@p0, @p1, $1)", f.function_name, @@ -2198,7 +2209,7 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestDatetimeTruncFunctions, 1) { datetime_trunc); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestTimeAddSubFunctions, 1) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestTimeAddSubFunctions, 2) { auto time_add_sub = [](const FunctionTestCall& f) { ABSL_CHECK_EQ(3, f.params.num_params()); return absl::Substitute("$0(@p0, INTERVAL @p1 $1)", f.function_name, @@ -2246,7 +2257,7 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestTimestampAddSubFunctions, 1) { timestamp_diff_format_fct); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestTimestampTruncFunctions, 1) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestTimestampTruncFunctions, 6) { auto timestamp_trunc_format_fct = [](const FunctionTestCall& f) { if (f.params.num_params() != 2 && f.params.num_params() != 3) { ABSL_LOG(FATAL) << "Unexpected number of parameters: " @@ -2338,7 +2349,7 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestChrFunctions, 1) { RunFunctionCalls(Shard(GetFunctionTestsChr())); } -SHARDED_TEST_F(ComplianceCodebasedTests, TestStringFunctions, 2) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestStringFunctions, 6) { // No need to set PREFIX, RunFunctionCalls() will do it. 
RunFunctionCalls(Shard(GetFunctionTestsString())); } @@ -2394,11 +2405,18 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestArrayFunctions, 1) { } // Six way sharding puts each shard at ~100 queries as of Q1'17. -SHARDED_TEST_F(ComplianceCodebasedTests, TestFormatFunction, 6) { +SHARDED_TEST_F(ComplianceCodebasedTests, TestFormatFunction, 11) { SetNamePrefix("Format"); RunFunctionCalls(Shard(GetFunctionTestsFormat())); } +// TODO: Remove these once all engines use FLOAT32 as the type name +// in FORMAT("%T"). +SHARDED_TEST_F(ComplianceCodebasedTests, TestFormatFunctionWithExternalFloat, + 1) { + RunFunctionCalls(Shard(GetFunctionTestsFormatWithExternalModeFloatType())); +} + SHARDED_TEST_F(ComplianceCodebasedTests, TestNormalizeFunctions, 1) { RunNormalizeFunctionCalls(Shard(GetFunctionTestsNormalize())); } @@ -2484,7 +2502,7 @@ SHARDED_TEST_F(ComplianceCodebasedTests, RangeIntersect, 1) { [](const FunctionTestCall& f) { return "range_intersect(@p0, @p1)"; }); } -SHARDED_TEST_F(ComplianceCodebasedTests, GenerateRangeArray, 1) { +SHARDED_TEST_F(ComplianceCodebasedTests, GenerateRangeArray, 2) { SetNamePrefix("GenerateRangeArray"); auto sql_string_fn = [](const FunctionTestCall& f) { ABSL_CHECK_GE(f.params.num_params(), 2); @@ -2533,7 +2551,7 @@ SHARDED_TEST_F(ComplianceCodebasedTests, IntervalDateTimestampSubtractions, 1) { "CAST(-(@p1 - @p0) AS STRING)"); } -SHARDED_TEST_F(ComplianceCodebasedTests, DateTimestampAddSubInterval, 1) { +SHARDED_TEST_F(ComplianceCodebasedTests, DateTimestampAddSubInterval, 6) { SetNamePrefix("TimestampAddInterval"); RunStatementTests(Shard(GetTimestampAddSubInterval()), "@p0 + @p1"); SetNamePrefix("TimestampSubInterval"); @@ -2602,11 +2620,49 @@ SHARDED_TEST_F(ComplianceCodebasedTests, TestsCosineDistance, 1) { RunFunctionCalls(Shard(GetFunctionTestsCosineDistance())); } +SHARDED_TEST_F(ComplianceCodebasedTests, TestsApproxCosineDistance, 1) { + SetNamePrefix("ApproxCosineDistance"); + RunFunctionCalls( + 
Shard(AddSafeFunctionCalls(GetFunctionTestsApproxCosineDistance()))); +} + SHARDED_TEST_F(ComplianceCodebasedTests, TestsEuclideanDistance, 1) { SetNamePrefix("EuclideanDistance"); RunFunctionCalls(Shard(GetFunctionTestsEuclideanDistance())); } +SHARDED_TEST_F(ComplianceCodebasedTests, TestsApproxEuclideanDistance, 1) { + SetNamePrefix("ApproxEuclideanDistance"); + RunFunctionCalls( + Shard(AddSafeFunctionCalls(GetFunctionTestsApproxEuclideanDistance()))); +} + +SHARDED_TEST_F(ComplianceCodebasedTests, TestsDotProduct, 1) { + SetNamePrefix("DotProduct"); + RunFunctionCalls(Shard(GetFunctionTestsDotProduct())); +} + +SHARDED_TEST_F(ComplianceCodebasedTests, TestDotProduct, 1) { + SetNamePrefix("ApproxDotProduct"); + RunFunctionCalls( + Shard(AddSafeFunctionCalls(GetFunctionTestsApproxDotProduct()))); +} + +SHARDED_TEST_F(ComplianceCodebasedTests, TestsManhattanDistance, 1) { + SetNamePrefix("ManhattanDistance"); + RunFunctionCalls(Shard(GetFunctionTestsManhattanDistance())); +} + +SHARDED_TEST_F(ComplianceCodebasedTests, TestsL1Norm, 1) { + SetNamePrefix("L1Norm"); + RunFunctionCalls(Shard(GetFunctionTestsL1Norm())); +} + +SHARDED_TEST_F(ComplianceCodebasedTests, TestsL2Norm, 1) { + SetNamePrefix("L2Norm"); + RunFunctionCalls(Shard(GetFunctionTestsL2Norm())); +} + SHARDED_TEST_F(ComplianceCodebasedTests, TestEditDistance, 1) { SetNamePrefix("EditDistance"); auto edit_distance_fn = [](const FunctionTestCall& f) { diff --git a/zetasql/compliance/compliance_test_cases.h b/zetasql/compliance/compliance_test_cases.h index 0f9ed8b88..bd9e0265e 100644 --- a/zetasql/compliance/compliance_test_cases.h +++ b/zetasql/compliance/compliance_test_cases.h @@ -38,6 +38,7 @@ #include "absl/strings/numbers.h" #include "absl/strings/str_cat.h" #include "absl/strings/string_view.h" +#include "absl/types/span.h" #include "zetasql/base/status.h" namespace zetasql { @@ -127,7 +128,7 @@ class ComplianceCodebasedTests : public SQLTestBase { // For each original FunctionTestCall, build 
QueryParamsWithResult enabled for // FEATURE_V_1_3_DATE_ARITHMETICS language feature. std::vector GetFunctionTestsDateArithmetics( - const std::vector& tests); + absl::Span tests); // For each function test call, runs the original version of the function in // addition to the SAFE version of the function, if the language feature is @@ -162,13 +163,13 @@ class ComplianceCodebasedTests : public SQLTestBase { // specified as a template parameter. template void RunStatementTestsCustom( - const std::vector& statement_tests, + absl::Span statement_tests, FCT get_sql_string); // Same as above but takes vector instead. template - void RunFunctionTestsCustom( - const std::vector& function_tests, FCT get_sql_string); + void RunFunctionTestsCustom(absl::Span function_tests, + FCT get_sql_string); // Runs a statement with the specified feature set and returns the result. absl::StatusOr @@ -291,7 +292,7 @@ class ShardedTest : public BaseT { void SkipAllShards() { sharded_ = true; } // Accessors for test_name and index. - void set_test_name(const std::string& test_name) { test_name_ = test_name; } + void set_test_name(absl::string_view test_name) { test_name_ = test_name; } const std::string& test_name() const { return test_name_; } void set_index(const size_t index) { index_ = index; } diff --git a/zetasql/compliance/functions_testlib.h b/zetasql/compliance/functions_testlib.h index a1522dad3..a7a860452 100644 --- a/zetasql/compliance/functions_testlib.h +++ b/zetasql/compliance/functions_testlib.h @@ -99,6 +99,9 @@ std::vector GetFunctionTestsSafeCast(); std::vector GetFunctionTestsCastBetweenDifferentArrayTypes(bool arrays_with_nulls); +// Casts involving TOKENLIST values. 
+std::vector GetFunctionTestsCastTokenList(); + std::vector GetFunctionTestsBitwiseNot(); std::vector GetFunctionTestsBitwiseOr(); std::vector GetFunctionTestsBitwiseXor(); @@ -248,6 +251,7 @@ std::vector GetFunctionTestsRegexp(); std::vector GetFunctionTestsRegexp2(bool include_feature_set); std::vector GetFunctionTestsRegexpInstr(); std::vector GetFunctionTestsFormat(); +std::vector GetFunctionTestsFormatWithExternalModeFloatType(); std::vector GetFunctionTestsArray(); std::vector GetFunctionTestsNormalize(); std::vector GetFunctionTestsBase2(); @@ -291,6 +295,7 @@ std::vector GetFunctionTestsNativeJsonExtractArray(); std::vector GetFunctionTestsNativeJsonExtractStringArray(); std::vector GetFunctionTestsNativeJsonQueryArray(); std::vector GetFunctionTestsNativeJsonValueArray(); +std::vector GetFunctionTestsJsonQueryLax(); std::vector GetFunctionTestsToJsonString(); std::vector GetFunctionTestsToJson(); std::vector GetFunctionTestsJsonIsNull(); @@ -334,7 +339,14 @@ GetFunctionTestsGenerateDatetimeRangeArrayExtras(); std::vector GetFunctionTestsRangeContains(); std::vector GetFunctionTestsCosineDistance(); +std::vector GetFunctionTestsApproxCosineDistance(); std::vector GetFunctionTestsEuclideanDistance(); +std::vector GetFunctionTestsApproxEuclideanDistance(); +std::vector GetFunctionTestsDotProduct(); +std::vector GetFunctionTestsApproxDotProduct(); +std::vector GetFunctionTestsManhattanDistance(); +std::vector GetFunctionTestsL1Norm(); +std::vector GetFunctionTestsL2Norm(); std::vector GetFunctionTestsEditDistance(); std::vector GetFunctionTestsEditDistanceBytes(); diff --git a/zetasql/compliance/functions_testlib_2.cc b/zetasql/compliance/functions_testlib_2.cc index be1ebd08a..fc7284b36 100644 --- a/zetasql/compliance/functions_testlib_2.cc +++ b/zetasql/compliance/functions_testlib_2.cc @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -32,7 +33,9 @@ #include "zetasql/base/logging.h" #include "google/protobuf/wrappers.pb.h" 
#include "zetasql/compliance/functions_testlib_common.h" +#include "zetasql/public/civil_time.h" #include "zetasql/public/functions/date_time_util.h" +#include "zetasql/public/interval_value.h" #include "zetasql/public/json_value.h" #include "zetasql/public/numeric_value.h" #include "zetasql/public/options.pb.h" @@ -1623,6 +1626,17 @@ struct TypeFeaturePair { example_input_4(example4), required_features({feature}) {} + TypeFeaturePair(const Type* type, const Value& example1, + const Value& example2, const Value& example3, + const Value& example4, + std::initializer_list<LanguageFeature> features) + : type(type), + example_input_1(example1), + example_input_2(example2), + example_input_3(example3), + example_input_4(example4), + required_features(features.begin(), features.end()) {} + TypeFeaturePair(const Type* type, const Value& example1, const Value& example2, const Value& example3, LanguageFeature feature) @@ -2364,6 +2378,38 @@ GetOrderableTypesWithFeaturesAndValues() { Value::Enum(TestEnumType(), 1), Value::Enum(TestEnumType(), 0x000000002), }, + // Ranges + { + types::DateRangeType(), + Range(NullDate(), Date(1)), + Range(Date(1), Date(2)), + Range(Date(1), Date(2)), + Range(Date(5), Date(10)), + FEATURE_RANGE_TYPE, + }, + {types::DatetimeRangeType(), + Range(NullDatetime(), Datetime(DatetimeValue::FromYMDHMSAndMicros( + 2024, 1, 9, 10, 00, 00, 99))), + Range(Datetime(DatetimeValue::FromYMDHMSAndMicros(2024, 1, 9, 12, 40, 55, + 99)), + Datetime(DatetimeValue::FromYMDHMSAndMicros(2024, 1, 9, 17, 40, 56, + 99))), + Range(Datetime(DatetimeValue::FromYMDHMSAndMicros(2024, 1, 9, 12, 40, 55, + 99)), + Datetime(DatetimeValue::FromYMDHMSAndMicros(2024, 1, 9, 17, 40, 56, + 99))), + Range(Datetime( + DatetimeValue::FromYMDHMSAndMicros(2024, 1, 20, 10, 0, 0, 0)), + NullDatetime()), + {FEATURE_RANGE_TYPE, FEATURE_V_1_2_CIVIL_TIME}}, + { + types::TimestampRangeType(), + Range(NullTimestamp(), Timestamp(2)), + Range(Timestamp(5), Timestamp(10)), + Range(Timestamp(5), Timestamp(10)), +
Range(Timestamp(20), Timestamp(50)), + FEATURE_RANGE_TYPE, + }, }; } diff --git a/zetasql/compliance/functions_testlib_cast.cc b/zetasql/compliance/functions_testlib_cast.cc index bc7763826..f8f72d171 100644 --- a/zetasql/compliance/functions_testlib_cast.cc +++ b/zetasql/compliance/functions_testlib_cast.cc @@ -34,6 +34,7 @@ #include "zetasql/public/interval_value_test_util.h" #include "zetasql/public/numeric_value.h" #include "zetasql/public/options.pb.h" +#include "zetasql/public/token_list_util.h" #include "zetasql/public/type.h" #include "zetasql/public/types/type_factory.h" #include "zetasql/public/value.h" @@ -1177,41 +1178,37 @@ std::vector GetFunctionTestsCastRange() { {{Range(Timestamp(1233496649123456), Timestamp(1267628310654321))}, {Range(Timestamp(1233496649123456), Timestamp(1267628310654321))}}, - // TODO: b/285939418 - uncomment these test cases once RANGE cast to - // STRING is implemented. - // // RANGE -> STRING - // {{Null(types::DateRangeType())}, NullString()}, - // {{Range(Date(14276), Date(14288))}, String("[2009-02-01, - // 2009-02-13)")}, - // {{Range(NullDate(), Date(14288))}, String("[UNBOUNDED, 2009-02-13)")}, - // {{Range(Date(14276), NullDate())}, String("[2009-02-01, UNBOUNDED)")}, - // {{Range(NullDate(), NullDate())}, String("[UNBOUNDED, UNBOUNDED)")}, - // // RANGE -> STRING - // {{Null(types::DatetimeRangeType())}, NullString()}, - // {{Range(DatetimeMicros(2009, 02, 01, 13, 57, 29, 123456), - // DatetimeMicros(2010, 03, 03, 14, 58, 30, 654321))}, - // String("[2009-02-01 13:57:29.123456, 2010-03-03 14:58:30.654321)")}, - // {{Range(NullDatetime(), - // DatetimeMicros(2010, 03, 03, 14, 58, 30, 654321))}, - // String("[UNBOUNDED, 2010-03-03 14:58:30.654321)")}, - // {{Range(DatetimeMicros(2009, 02, 01, 13, 57, 29, 123456), - // NullDatetime())}, - // String("[2009-02-01 13:57:29.123456, UNBOUNDED)")}, - // {{Range(NullDatetime(), NullDatetime())}, - // String("[UNBOUNDED, UNBOUNDED)")}, - // // RANGE -> STRING - // // The default 
time zone is America/Los_Angeles. - // {{Null(types::TimestampRangeType())}, NullString()}, - // {{Range(Timestamp(1233496649123456), Timestamp(1267628310654321))}, - // String( - // "[2009-02-01 05:57:29.123456-08, 2010-03-03 - // 06:58:30.654321-08)")}, - // {{Range(NullTimestamp(), Timestamp(1267628310654321))}, - // String("[UNBOUNDED, 2010-03-03 06:58:30.654321-08)")}, - // {{Range(Timestamp(1233496649123456), NullTimestamp())}, - // String("[2009-02-01 05:57:29.123456-08, UNBOUNDED)")}, - // {{Range(NullTimestamp(), NullTimestamp())}, - // String("[UNBOUNDED, UNBOUNDED)")}, + // RANGE -> STRING + {{Null(types::DateRangeType())}, NullString()}, + {{Range(Date(14276), Date(14288))}, String("[2009-02-01, 2009-02-13)")}, + {{Range(NullDate(), Date(14288))}, String("[UNBOUNDED, 2009-02-13)")}, + {{Range(Date(14276), NullDate())}, String("[2009-02-01, UNBOUNDED)")}, + {{Range(NullDate(), NullDate())}, String("[UNBOUNDED, UNBOUNDED)")}, + // RANGE -> STRING + {{Null(types::DatetimeRangeType())}, NullString()}, + {{Range(DatetimeMicros(2009, 02, 01, 13, 57, 29, 123456), + DatetimeMicros(2010, 03, 03, 14, 58, 30, 654321))}, + String("[2009-02-01 13:57:29.123456, 2010-03-03 14:58:30.654321)")}, + {{Range(NullDatetime(), + DatetimeMicros(2010, 03, 03, 14, 58, 30, 654321))}, + String("[UNBOUNDED, 2010-03-03 14:58:30.654321)")}, + {{Range(DatetimeMicros(2009, 02, 01, 13, 57, 29, 123456), + NullDatetime())}, + String("[2009-02-01 13:57:29.123456, UNBOUNDED)")}, + {{Range(NullDatetime(), NullDatetime())}, + String("[UNBOUNDED, UNBOUNDED)")}, + // RANGE -> STRING + // The default time zone is America/Los_Angeles. 
+ {{Null(types::TimestampRangeType())}, NullString()}, + {{Range(Timestamp(1233496649123456), Timestamp(1267628310654321))}, + String( + "[2009-02-01 05:57:29.123456-08, 2010-03-03 06:58:30.654321-08)")}, + {{Range(NullTimestamp(), Timestamp(1267628310654321))}, + String("[UNBOUNDED, 2010-03-03 06:58:30.654321-08)")}, + {{Range(Timestamp(1233496649123456), NullTimestamp())}, + String("[2009-02-01 05:57:29.123456-08, UNBOUNDED)")}, + {{Range(NullTimestamp(), NullTimestamp())}, + String("[UNBOUNDED, UNBOUNDED)")}, // STRING -> RANGE {{NullString()}, Null(types::DateRangeType())}, @@ -2626,6 +2623,7 @@ std::vector GetFunctionTestsCast() { GetFunctionTestsCastBytesStringWithFormat(), GetFunctionTestsCastDateTimestampStringWithFormat(), GetFunctionTestsCastRange(), + GetFunctionTestsCastTokenList(), }); } @@ -2699,4 +2697,24 @@ GetFunctionTestsCastBetweenDifferentArrayTypes(bool arrays_with_nulls) { return tests; } +std::vector<QueryParamsWithResult> GetFunctionTestsCastTokenList() { + const Value tokenlist = TokenListFromStringArray({"test", "tokenlist"}); + const std::string bytes = tokenlist.tokenlist_value().GetBytes(); + + std::vector<QueryParamsWithResult> tests = { + {{Value::NullTokenList()}, Value::NullTokenList()}, + {{NullBytes()}, Value::NullTokenList()}, + {{Value::NullTokenList()}, NullBytes()}, + {{Bytes(bytes)}, tokenlist}, + {{tokenlist}, Bytes(bytes)}, + }; + + std::vector<QueryParamsWithResult> result; + result.reserve(tests.size()); + for (const auto& test : tests) { + result.push_back(test.WrapWithFeature(FEATURE_TOKENIZED_SEARCH)); + } + return result; +} + } // namespace zetasql diff --git a/zetasql/compliance/functions_testlib_distance.cc b/zetasql/compliance/functions_testlib_distance.cc index f5c0eef6f..7f1c1a501 100644 --- a/zetasql/compliance/functions_testlib_distance.cc +++ b/zetasql/compliance/functions_testlib_distance.cc @@ -14,7 +14,9 @@ // limitations under the License. // +#include #include +#include <cmath>  // NOLINT: (broken link) for std::sqrt().
#include #include #include @@ -52,11 +54,14 @@ Value ValueOrDie(absl::StatusOr<Value>& s) { return s.value(); } -template <typename T, typename = std::enable_if_t<std::is_floating_point_v<T>>> +template <typename T, typename = std::enable_if_t<std::is_floating_point_v<T> || + std::is_integral_v<T>>> Value MakeArray(std::vector<T> arr, bool is_null = false, bool ends_with_null = false) { if (is_null) { - if constexpr (std::is_same_v<T, float>) { + if constexpr (std::is_same_v<T, int64_t>) { + return Value::Null(types::Int64ArrayType()); + } else if constexpr (std::is_same_v<T, float>) { return Value::Null(types::FloatArrayType()); } return Value::Null(types::DoubleArrayType()); @@ -70,7 +75,9 @@ Value MakeArray(std::vector<T> arr, bool is_null = false, values.push_back(Value::MakeNull<T>()); } absl::StatusOr<Value> array; - if constexpr (std::is_same_v<T, float>) { + if constexpr (std::is_same_v<T, int64_t>) { + array = Value::MakeArray(types::Int64ArrayType(), values); + } else if constexpr (std::is_same_v<T, float>) { array = Value::MakeArray(types::FloatArrayType(), values); } else { array = Value::MakeArray(types::DoubleArrayType(), values); @@ -78,7 +85,8 @@ Value MakeArray(std::vector<T> arr, bool is_null = false, return ValueOrDie(array); } -template <typename T, typename = std::enable_if_t<std::is_floating_point_v<T>>> +template <typename T, typename = std::enable_if_t<std::is_floating_point_v<T> || + std::is_integral_v<T>>> Value MakeRepeatedArray(T value, int64_t repeated_count) { std::vector<T> values; values.reserve(repeated_count); @@ -88,13 +96,15 @@ Value MakeRepeatedArray(T value, int64_t repeated_count) { return MakeArray(values); } -template <typename T, typename = std::enable_if_t<std::is_floating_point_v<T>>> +template <typename T, typename = std::enable_if_t<std::is_floating_point_v<T> || + std::is_integral_v<T>>> Value MakeArrayEndingWithNull(std::vector<T> arr) { return MakeArray(arr, /*is_null=*/false, /*ends_with_null=*/true); } Value MakeNullDoubleArray() { return MakeArray<double>({}, /*is_null=*/true); } Value MakeNullFloatArray() { return MakeArray<float>({}, /*is_null=*/true); } +Value MakeNullInt64Array() { return MakeArray<int64_t>({}, /*is_null=*/true); } Value MakeArray(std::vector arr, bool is_null = false, bool ends_with_null = false, @@ -224,7 +234,7 @@ std::vector<FunctionTestCall> GetFunctionTestsCosineDistance() { {MakeArray<double>({5.0, 6.0}), MakeArrayEndingWithNull(std::vector<double>{3.0})}, values::NullDouble(), -
absl::OutOfRangeError("NULL array element")}, {"cosine_distance", {MakeArray({{1, 1.0}, {2, 2.0}}), MakeNullInt64KeyArray()}, @@ -240,17 +250,17 @@ std::vector GetFunctionTestsCosineDistance() { MakeArrayEndingWithNull( std::vector{{3, 5.0}})}, values::NullDouble(), - absl::OutOfRangeError("NULL array element.")}, + absl::OutOfRangeError("NULL array element")}, {"cosine_distance", {MakeArray({{1, 5.0}, {2, 6.0}}), MakeArray({{1, 1.0}, {std::nullopt, 6.0}})}, values::NullDouble(), - absl::OutOfRangeError("NULL struct field.")}, + absl::OutOfRangeError("NULL struct field")}, {"cosine_distance", {MakeArray({{1, 5.0}, {2, 6.0}}), MakeArray({{1, 1.0}, {2, std::nullopt}})}, values::NullDouble(), - absl::OutOfRangeError("NULL struct field.")}, + absl::OutOfRangeError("NULL struct field")}, {"cosine_distance", {MakeNullStringKeyArray(), MakeArray({{"a", 5.0}, {"b", 6.0}})}, @@ -265,37 +275,37 @@ std::vector GetFunctionTestsCosineDistance() { {MakeArray({{"a", 5.0}, {"b", 6.0}}), MakeArrayEndingWithNull({{"a", 5.0}})}, values::NullDouble(), - absl::OutOfRangeError("NULL array element.")}, + absl::OutOfRangeError("NULL array element")}, {"cosine_distance", {MakeArray({{"a", 5.0}, {"b", 6.0}}), MakeArray({{"b", 1.0}, {std::nullopt, 6.0}})}, values::NullDouble(), - absl::OutOfRangeError("NULL struct field.")}, + absl::OutOfRangeError("NULL struct field")}, {"cosine_distance", {MakeArray({{"a", 5.0}, {"b", 6.0}}), MakeArray({{"b", 1.0}, {"a", std::nullopt}})}, values::NullDouble(), - absl::OutOfRangeError("NULL struct field.")}, + absl::OutOfRangeError("NULL struct field")}, // Zero length array {"cosine_distance", {MakeArray(std::vector{}), MakeArray(std::vector{})}, values::NullDouble(), absl::OutOfRangeError( - "Cannot compute cosine distance against zero vector.")}, + "Cannot compute cosine distance against zero vector")}, // Zero length vector {"cosine_distance", {MakeArray({0.0, 0.0}), MakeArray({1.0, 2.0})}, values::NullDouble(), absl::OutOfRangeError( - "Cannot compute 
cosine distance against zero vector.")}, + "Cannot compute cosine distance against zero vector")}, // Mismatch length vector {"cosine_distance", {MakeArray({1.0, 2.0}), MakeArray({1.0, 2.0, 3.0})}, values::NullDouble(), - absl::OutOfRangeError("Array length mismatch 2 and 3.")}, + absl::OutOfRangeError("Array length mismatch 2 and 3")}, // Long vector // When 2 vectors are parallel, the angle is 0, so cosine distance is @@ -470,7 +480,7 @@ std::vector GetFunctionTestsCosineDistance() { {MakeArray({5.0, 6.0}), MakeArrayEndingWithNull(std::vector{3.0})}, values::NullDouble(), - absl::OutOfRangeError("NULL array element.")}, + absl::OutOfRangeError("NULL array element")}, // Long vector // When 2 vectors are parallel, the angle is 0, so cosine distance is @@ -485,13 +495,13 @@ std::vector GetFunctionTestsCosineDistance() { {MakeArray({0.0, 0.0}), MakeArray({1.0, 2.0})}, values::NullDouble(), absl::OutOfRangeError( - "Cannot compute cosine distance against zero vector.")}, + "Cannot compute cosine distance against zero vector")}, // Mismatched vector length. 
{"cosine_distance", {MakeArray({1.0, 2.0}), MakeArray({1.0, 2.0, 3.0})}, values::NullDouble(), - absl::OutOfRangeError("Array length mismatch 2 and 3.")}, + absl::OutOfRangeError("Array length mismatch 2 and 3")}, // NaN {"cosine_distance", @@ -555,6 +565,203 @@ std::vector GetFunctionTestsCosineDistance() { return tests; } +std::vector GetFunctionTestsApproxCosineDistance() { + std::vector tests = { + // NULL inputs + {"approx_cosine_distance", + {MakeNullDoubleArray(), MakeArray({3.0, 4.0})}, + values::NullDouble()}, + {"approx_cosine_distance", + {MakeArray({5.0, 6.0}), MakeNullDoubleArray()}, + values::NullDouble()}, + {"approx_cosine_distance", + {MakeNullDoubleArray(), MakeNullDoubleArray()}, + values::NullDouble()}, + {"approx_cosine_distance", + {MakeArray({5.0, 6.0}), + MakeArrayEndingWithNull(std::vector{3.0})}, + values::NullDouble(), + absl::OutOfRangeError("NULL array element.")}, + + // Zero length array + {"approx_cosine_distance", + {MakeArray(std::vector{}), MakeArray(std::vector{})}, + values::NullDouble(), + absl::OutOfRangeError( + "Cannot compute cosine distance against zero vector")}, + + // Zero length vector + {"approx_cosine_distance", + {MakeArray({0.0, 0.0}), MakeArray({1.0, 2.0})}, + values::NullDouble(), + absl::OutOfRangeError( + "Cannot compute cosine distance against zero vector")}, + + // Mismatch length vector + {"approx_cosine_distance", + {MakeArray({1.0, 2.0}), MakeArray({1.0, 2.0, 3.0})}, + values::NullDouble(), + absl::OutOfRangeError("Array length mismatch 2 and 3")}, + + // Long vector + // When 2 vectors are parallel, the angle is 0, so cosine distance is + // 1 - cos(0) = 0. 
+      {"approx_cosine_distance",
+       {MakeRepeatedArray(1.0, 128), MakeRepeatedArray(2.0, 128)},
+       0.0,
+       kDistanceFloatMargin},
+
+      // NaN
+      {"approx_cosine_distance",
+       {MakeArray<double>({1.0, std::numeric_limits<double>::quiet_NaN()}),
+        MakeArray<double>({3.0, 4.0})},
+       std::numeric_limits<double>::quiet_NaN()},
+
+      // Inf
+      {"approx_cosine_distance",
+       {MakeArray<double>({1.0, std::numeric_limits<double>::infinity()}),
+        MakeArray<double>({3.0, 4.0})},
+       std::numeric_limits<double>::quiet_NaN()},
+
+      // Overflow
+      {"approx_cosine_distance",
+       {MakeArray<double>({std::numeric_limits<double>::max(), 0.0}),
+        MakeArray<double>({0.0, std::numeric_limits<double>::max()})},
+       values::NullDouble(),
+       absl::OutOfRangeError("double overflow: 1.79769e+308 * 1.79769e+308")},
+
+      // Dense array
+      {"approx_cosine_distance",
+       {MakeArray<double>({1.0, 2.0}), MakeArray<double>({3.0, 4.0})},
+       0.01613008990009257,
+       kDistanceFloatMargin},
+      {"approx_cosine_distance",
+       {MakeArray<double>({5.0, 6.0}), MakeArray<double>({7.0, 8.0})},
+       0.0002901915325176363,
+       kDistanceFloatMargin},
+      {"approx_cosine_distance",
+       {MakeArray<double>({1.0, 2.0}), MakeArray<double>({-3.0, -4.0})},
+       1.9838699100999073,
+       kDistanceFloatMargin},
+
+      // Dense array with significant floating point values difference.
+      // The 2 vectors are parallel so cosine distance = 0.
+      {"approx_cosine_distance",
+       {MakeArray<double>({1e140, 2.0e140}),
+        MakeArray<double>({1.0e-140, 2.0e-140})},
+       0.0,
+       kDistanceFloatMargin},
+
+      // Dense array with significant floating point values difference.
+      // The 2 vectors are antiparallel so cosine distance = 2.
+      {"approx_cosine_distance",
+       {MakeArray<double>({1e140, 2.0e140}),
+        MakeArray<double>({-1.0e-140, -2.0e-140})},
+       2.0,
+       kDistanceFloatMargin},
+
+      // Dense array with significant floating point values difference.
+      // The 2 vectors are perpendicular so cosine distance = 1.
+      {"approx_cosine_distance",
+       {MakeArray<double>({1e140, 2.0e140}),
+        MakeArray<double>({-2.0e-140, 1.0e-140})},
+       1.0,
+       kDistanceFloatMargin}};
+
+  std::vector<FunctionTestCall> float_array_tests = {
+      // NULL inputs.
+      {"approx_cosine_distance",
+       {MakeNullFloatArray(), MakeArray<float>({3.0, 4.0})},
+       values::NullDouble()},
+      {"approx_cosine_distance",
+       {MakeArray<float>({5.0, 6.0}), MakeNullFloatArray()},
+       values::NullDouble()},
+      {"approx_cosine_distance",
+       {MakeNullFloatArray(), MakeNullFloatArray()},
+       values::NullDouble()},
+      {"approx_cosine_distance",
+       {MakeArray<float>({5.0, 6.0}),
+        MakeArrayEndingWithNull(std::vector<float>{3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError("NULL array element")},
+
+      // Long vector
+      // When 2 vectors are parallel, the angle is 0, so cosine distance is
+      // 1 - cos(0) = 0.
+      {"approx_cosine_distance",
+       {MakeRepeatedArray(1.0f, 128), MakeRepeatedArray(2.0f, 128)},
+       0.0,
+       kDistanceFloatMargin},
+
+      // Zero length vector.
+      {"approx_cosine_distance",
+       {MakeArray<float>({0.0, 0.0}), MakeArray<float>({1.0, 2.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute cosine distance against zero vector")},
+
+      // Mismatched vector length.
+      {"approx_cosine_distance",
+       {MakeArray<float>({1.0, 2.0}), MakeArray<float>({1.0, 2.0, 3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError("Array length mismatch 2 and 3")},
+
+      // NaN
+      {"approx_cosine_distance",
+       {MakeArray<float>({1.0, std::numeric_limits<float>::quiet_NaN()}),
+        MakeArray<float>({3.0, 4.0})},
+       std::numeric_limits<double>::quiet_NaN()},
+
+      // Inf
+      {"approx_cosine_distance",
+       {MakeArray<float>({1.0, std::numeric_limits<float>::infinity()}),
+        MakeArray<float>({3.0, 4.0})},
+       std::numeric_limits<double>::quiet_NaN()},
+      {"approx_cosine_distance",
+       {MakeArray<float>({1.0, -std::numeric_limits<float>::infinity()}),
+        MakeArray<float>({3.0, 4.0})},
+       std::numeric_limits<double>::quiet_NaN()},
+
+      // Dense array with float values as input.
+      {"approx_cosine_distance",
+       {MakeArray<float>({1.0, 2.0}), MakeArray<float>({3.0, 4.0})},
+       0.01613008990009257,
+       kDistanceFloatMargin},
+      {"approx_cosine_distance",
+       {MakeArray<float>({5.0, 6.0}), MakeArray<float>({7.0, 8.0})},
+       0.0002901915325176363,
+       kDistanceFloatMargin},
+      {"approx_cosine_distance",
+       {MakeArray<float>({1.0, 2.0}), MakeArray<float>({-3.0, -4.0})},
+       1.9838699100999073,
+       kDistanceFloatMargin},
+
+      // Dense array with significant floating point values difference.
+      // The 2 vectors are parallel so cosine distance = 0.
+      {"approx_cosine_distance",
+       {MakeArray<float>({1e18, 2.0e18}),
+        MakeArray<float>({1.0e-18, 2.0e-18})},
+       0.0,
+       kDistanceFloatMargin},
+
+      // Dense array with significant floating point values difference.
+      // The 2 vectors are antiparallel so cosine distance = 2.
+      {"approx_cosine_distance",
+       {MakeArray<float>({1e18, 2.0e18}),
+        MakeArray<float>({-1.0e-18, -2.0e-18})},
+       2.0,
+       kDistanceFloatMargin},
+
+      // Dense array with significant floating point values difference.
+      // The 2 vectors are perpendicular so cosine distance = 1.
+      {"approx_cosine_distance",
+       {MakeArray<float>({1e18, 2.0e18}),
+        MakeArray<float>({-2.0e-18, 1.0e-18})},
+       1.0,
+       kDistanceFloatMargin}};
+
+  for (auto& test_case : float_array_tests) {
+    tests.emplace_back(test_case);
+  }
+  return tests;
+}
+
 std::vector<FunctionTestCall> GetFunctionTestsEuclideanDistance() {
   std::vector<FunctionTestCall> tests = {
       // NULL inputs
@@ -571,7 +778,7 @@ std::vector<FunctionTestCall> GetFunctionTestsEuclideanDistance() {
        {MakeArray<double>({5.0, 6.0}),
         MakeArrayEndingWithNull(std::vector<double>{3.0})},
        values::NullDouble(),
-       absl::OutOfRangeError("NULL array element.")},
+       absl::OutOfRangeError("NULL array element")},
       {"euclidean_distance",
        {MakeArray({{1, 1.0}, {2, 2.0}}), MakeNullInt64KeyArray()},
@@ -587,17 +794,17 @@ std::vector<FunctionTestCall> GetFunctionTestsEuclideanDistance() {
         MakeArrayEndingWithNull(
             std::vector{{3, 5.0}})},
        values::NullDouble(),
-       absl::OutOfRangeError("NULL array element.")},
+       absl::OutOfRangeError("NULL array element")},
       {"euclidean_distance",
        {MakeArray({{1, 5.0}, {2, 6.0}}),
        MakeArray({{1, 1.0}, {std::nullopt, 6.0}})},
        values::NullDouble(),
-       absl::OutOfRangeError("NULL struct field.")},
+       absl::OutOfRangeError("NULL struct field")},
       {"euclidean_distance",
        {MakeArray({{1, 5.0}, {2, 6.0}}),
        MakeArray({{1, 1.0}, {2, std::nullopt}})},
        values::NullDouble(),
-       absl::OutOfRangeError("NULL struct field.")},
+       absl::OutOfRangeError("NULL struct field")},
       {"euclidean_distance",
        {MakeNullStringKeyArray(),
        MakeArray({{"a", 5.0}, {"b", 6.0}})},
@@ -612,17 +819,17 @@ std::vector<FunctionTestCall> GetFunctionTestsEuclideanDistance() {
        {MakeArray({{"a", 5.0}, {"b", 6.0}}),
        MakeArrayEndingWithNull({{"a", 5.0}})},
        values::NullDouble(),
-       absl::OutOfRangeError("NULL array element.")},
+       absl::OutOfRangeError("NULL array element")},
       {"euclidean_distance",
        {MakeArray({{"a", 5.0}, {"b", 6.0}}),
        MakeArray({{"b", 1.0}, {std::nullopt, 6.0}})},
        values::NullDouble(),
-       absl::OutOfRangeError("NULL struct field.")},
+       absl::OutOfRangeError("NULL struct field")},
       {"euclidean_distance",
        {MakeArray({{"a", 5.0}, {"b", 6.0}}),
        MakeArray({{"b", 1.0}, {"a", std::nullopt}})},
        values::NullDouble(),
-       absl::OutOfRangeError("NULL struct field.")},
+       absl::OutOfRangeError("NULL struct field")},
 
       // Zero length array.
       {"euclidean_distance",
@@ -640,7 +847,7 @@ std::vector<FunctionTestCall> GetFunctionTestsEuclideanDistance() {
       {"euclidean_distance",
        {MakeArray<double>({0.0, 0.1}), MakeArray<double>({1.0, 2.0, 3.0})},
        values::NullDouble(),
-       absl::OutOfRangeError("Array length mismatch 2 and 3.")},
+       absl::OutOfRangeError("Array length mismatch 2 and 3")},
 
       // Inf
       {"euclidean_distance",
@@ -760,7 +967,7 @@ std::vector<FunctionTestCall> GetFunctionTestsEuclideanDistance() {
        {MakeArray<float>({5.0, 6.0}),
        MakeArrayEndingWithNull(std::vector<float>{3.0})},
        values::NullDouble(),
-       absl::OutOfRangeError("NULL array element.")},
+       absl::OutOfRangeError("NULL array element")},
 
       // Long vector.
       // Distance = sqrt(64 * 64 * (2 - 1)^2) = 64
@@ -779,7 +986,7 @@ std::vector<FunctionTestCall> GetFunctionTestsEuclideanDistance() {
       {"euclidean_distance",
        {MakeArray<float>({0.0, 0.1}), MakeArray<float>({1.0, 2.0, 3.0})},
        values::NullDouble(),
-       absl::OutOfRangeError("Array length mismatch 2 and 3.")},
+       absl::OutOfRangeError("Array length mismatch 2 and 3")},
 
       // Inf
       {"euclidean_distance",
@@ -823,6 +1030,1214 @@ std::vector<FunctionTestCall> GetFunctionTestsEuclideanDistance() {
   return tests;
 }
 
+std::vector<FunctionTestCall> GetFunctionTestsApproxEuclideanDistance() {
+  std::vector<FunctionTestCall> tests = {
+      // NULL inputs
+      {"approx_euclidean_distance",
+       {MakeNullDoubleArray(), MakeArray<double>({3.0, 4.0})},
+       values::NullDouble()},
+      {"approx_euclidean_distance",
+       {MakeArray<double>({5.0, 6.0}), MakeNullDoubleArray()},
+       values::NullDouble()},
+      {"approx_euclidean_distance",
+       {MakeNullDoubleArray(), MakeNullDoubleArray()},
+       values::NullDouble()},
+      {"approx_euclidean_distance",
+       {MakeArray<double>({5.0, 6.0}),
+        MakeArrayEndingWithNull(std::vector<double>{3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError("NULL array element")},
+
+      // Zero length array.
+      {"approx_euclidean_distance",
+       {MakeArray(std::vector<double>{}), MakeArray(std::vector<double>{})},
+       0.0,
+       kDistanceFloatMargin},
+
+      // Zero length vector
+      {"approx_euclidean_distance",
+       {MakeArray<double>({0.0, 0.0}), MakeArray<double>({3.0, 4.0})},
+       5.0,
+       kDistanceFloatMargin},
+
+      // Mismatch length vector.
+      {"approx_euclidean_distance",
+       {MakeArray<double>({0.0, 0.1}), MakeArray<double>({1.0, 2.0, 3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError("Array length mismatch 2 and 3")},
+
+      // Inf
+      {"approx_euclidean_distance",
+       {MakeArray<double>({1.0, std::numeric_limits<double>::infinity()}),
+        MakeArray<double>({3.0, 4.0})},
+       std::numeric_limits<double>::infinity()},
+
+      // NaN
+      {"approx_euclidean_distance",
+       {MakeArray<double>({1.0, std::numeric_limits<double>::quiet_NaN()}),
+        MakeArray<double>({3.0, 4.0})},
+       std::numeric_limits<double>::quiet_NaN()},
+
+      // Long vector
+      // Distance = sqrt(64 * 64 * (2 - 1)^2) = 64
+      {"approx_euclidean_distance",
+       {MakeRepeatedArray(1.0, 64 * 64), MakeRepeatedArray(2.0, 64 * 64)},
+       64.0,
+       kDistanceFloatMargin},
+
+      // Overflow
+      {"approx_euclidean_distance",
+       {MakeArray<double>({std::numeric_limits<double>::max(), 2.0}),
+        MakeArray<double>({3.0, 4.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError("double overflow: 1.79769e+308 * 1.79769e+308")},
+
+      // Dense array
+      {"approx_euclidean_distance",
+       {MakeArray<double>({1.0, 2.0}), MakeArray<double>({3.0, 4.0})},
+       2.8284271247461903,
+       kDistanceFloatMargin},
+      {"approx_euclidean_distance",
+       {MakeArray<double>({5.0, 6.0}), MakeArray<double>({7.0, 9.0})},
+       3.6055512754639891,
+       kDistanceFloatMargin},
+
+      // Dense array with significantly different floating point values.
+      {"approx_euclidean_distance",
+       {MakeArray<double>({3.0e140, 4.0e140}),
+        MakeArray<double>({7.0e-140, 9.0e-140})},
+       5e140,
+       kDistanceFloatMargin}};
+
+  std::vector<FunctionTestCall> float_array_tests = {
+      // NULL inputs.
+      {"approx_euclidean_distance",
+       {MakeNullFloatArray(), MakeArray<float>({3.0, 4.0})},
+       values::NullDouble()},
+      {"approx_euclidean_distance",
+       {MakeArray<float>({5.0, 6.0}), MakeNullFloatArray()},
+       values::NullDouble()},
+      {"approx_euclidean_distance",
+       {MakeNullFloatArray(), MakeNullFloatArray()},
+       values::NullDouble()},
+      {"approx_euclidean_distance",
+       {MakeArray<float>({5.0, 6.0}),
+        MakeArrayEndingWithNull(std::vector<float>{3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError("NULL array element")},
+
+      // Long vector.
+      // Distance = sqrt(64 * 64 * (2 - 1)^2) = 64
+      {"approx_euclidean_distance",
+       {MakeRepeatedArray(1.0f, 64 * 64), MakeRepeatedArray(2.0f, 64 * 64)},
+       64.0,
+       kDistanceFloatMargin},
+
+      // Zero length vector.
+      {"approx_euclidean_distance",
+       {MakeArray<float>({0.0, 0.0}), MakeArray<float>({3.0, 4.0})},
+       5.0,
+       kDistanceFloatMargin},
+
+      // Mismatched vector length.
+      {"approx_euclidean_distance",
+       {MakeArray<float>({0.0, 0.1}), MakeArray<float>({1.0, 2.0, 3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError("Array length mismatch 2 and 3")},
+
+      // Inf
+      {"approx_euclidean_distance",
+       {MakeArray<float>({1.0, std::numeric_limits<float>::infinity()}),
+        MakeArray<float>({3.0, 4.0})},
+       std::numeric_limits<double>::infinity()},
+      {"approx_euclidean_distance",
+       {MakeArray<float>({1.0, -std::numeric_limits<float>::infinity()}),
+        MakeArray<float>({3.0, 4.0})},
+       std::numeric_limits<double>::infinity()},
+
+      // NaN
+      {"approx_euclidean_distance",
+       {MakeArray<float>({1.0, std::numeric_limits<float>::quiet_NaN()}),
+        MakeArray<float>({3.0, 4.0})},
+       std::numeric_limits<double>::quiet_NaN()},
+
+      // Dense array with float values as input.
+      {"approx_euclidean_distance",
+       {MakeArray<float>({1.0, 2.0}), MakeArray<float>({3.0, 4.0})},
+       2.8284271247461903,
+       kDistanceFloatMargin},
+      {"approx_euclidean_distance",
+       {MakeArray<float>({5.0, 6.0}), MakeArray<float>({7.0, 9.0})},
+       3.6055512754639891,
+       kDistanceFloatMargin},
+
+      // Dense array with significantly different floating point values.
+      {"approx_euclidean_distance",
+       {MakeArray<float>({3.0e18, 4.0e18}),
+        MakeArray<float>({7.0e-19, 9.0e-19})},
+       5e18,
+       FloatMargin::UlpMargin(30)}};
+
+  for (auto& test_case : float_array_tests) {
+    tests.emplace_back(test_case);
+  }
+  return tests;
+}
+
+std::vector<FunctionTestCall> GetFunctionTestsDotProduct() {
+  std::vector<FunctionTestCall> double_array_tests = {
+      // NULL inputs
+      {"dot_product",
+       {MakeNullDoubleArray(), MakeArray<double>({3.0, 4.0})},
+       values::NullDouble()},
+      {"dot_product",
+       {MakeArray<double>({5.0, 6.0}), MakeNullDoubleArray()},
+       values::NullDouble()},
+      {"dot_product",
+       {MakeNullDoubleArray(), MakeNullDoubleArray()},
+       values::NullDouble()},
+      {"dot_product",
+       {MakeArrayEndingWithNull(std::vector<double>{3.0}),
+        MakeArray<double>({5.0, 6.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute DOT_PRODUCT with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the first "
+           "array argument at OFFSET 2")},
+      {"dot_product",
+       {MakeArray<double>({5.0, 6.0}),
+        MakeArrayEndingWithNull(std::vector<double>{3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute DOT_PRODUCT with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the second "
+           "array argument at OFFSET 2")},
+
+      // Zero length array.
+      {"dot_product",
+       {MakeArray(std::vector<double>{}), MakeArray(std::vector<double>{})},
+       0.0,
+       kDistanceFloatMargin},
+
+      // Zero vector
+      {"dot_product",
+       {MakeArray<double>({0.0, 0.0}), MakeArray<double>({3.0, 4.0})},
+       0.0,
+       kDistanceFloatMargin},
+
+      // Mismatch length vector.
+      {"dot_product",
+       {MakeArray<double>({0.0, 0.1}), MakeArray<double>({1.0, 2.0, 3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Array arguments to DOT_PRODUCT must have equal length. The given "
+           "arrays have lengths of 2 and 3")},
+
+      // Inf
+      {"dot_product",
+       {MakeArray<double>({1.0, std::numeric_limits<double>::infinity()}),
+        MakeArray<double>({3.0, 4.0})},
+       std::numeric_limits<double>::infinity()},
+      {"dot_product",
+       {MakeArray<double>({1.0, -std::numeric_limits<double>::infinity()}),
+        MakeArray<double>({3.0, 4.0})},
+       -std::numeric_limits<double>::infinity()},
+
+      // NaN
+      {"dot_product",
+       {MakeArray<double>({1.0, std::numeric_limits<double>::quiet_NaN()}),
+        MakeArray<double>({3.0, 4.0})},
+       std::numeric_limits<double>::quiet_NaN()},
+
+      // Long vector
+      {"dot_product",
+       {MakeRepeatedArray(1.0, 64 * 64), MakeRepeatedArray(2.0, 64 * 64)},
+       2.0 * 64 * 64,
+       kDistanceFloatMargin},
+
+      // Overflow
+      {"dot_product",
+       {MakeArray<double>({std::numeric_limits<double>::max(), 2.0}),
+        MakeArray<double>({3.0, 4.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError("double overflow: 1.79769e+308 * 1.79769e+308")},
+
+      // Expected usage with floating point values.
+      {"dot_product",
+       {MakeArray<double>({1.0, 2.0}), MakeArray<double>({3.0, 4.0})},
+       11.0,
+       kDistanceFloatMargin},
+      {"dot_product",
+       {MakeArray<double>({5.0, 6.0}), MakeArray<double>({7.0, 9.0})},
+       89.0,
+       kDistanceFloatMargin},
+
+      // Expected usage with significantly different floating point values.
+      {"dot_product",
+       {MakeArray<double>({3.0e140, 4.0e140}),
+        MakeArray<double>({7.0e-140, 9.0e-140})},
+       57.0,  // = (3.0 * 7.0) + (4.0 * 9.0)
+       kDistanceFloatMargin}};
+
+  std::vector<FunctionTestCall> float_array_tests = {
+      // NULL inputs.
+      {"dot_product",
+       {MakeNullFloatArray(), MakeArray<float>({3.0, 4.0})},
+       values::NullDouble()},
+      {"dot_product",
+       {MakeArray<float>({5.0, 6.0}), MakeNullFloatArray()},
+       values::NullDouble()},
+      {"dot_product",
+       {MakeNullFloatArray(), MakeNullFloatArray()},
+       values::NullDouble()},
+      {"dot_product",
+       {MakeArrayEndingWithNull(std::vector<float>{3.0}),
+        MakeArray<float>({5.0, 6.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute DOT_PRODUCT with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the first "
+           "array argument at OFFSET 2")},
+      {"dot_product",
+       {MakeArray<float>({5.0, 6.0}),
+        MakeArrayEndingWithNull(std::vector<float>{3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute DOT_PRODUCT with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the second "
+           "array argument at OFFSET 2")},
+
+      // Long vector.
+      {"dot_product",
+       {MakeRepeatedArray(1.0f, 64 * 64), MakeRepeatedArray(2.0f, 64 * 64)},
+       2.0 * 64 * 64,
+       kDistanceFloatMargin},
+
+      // Zero vector.
+      {"dot_product",
+       {MakeArray<float>({0.0, 0.0}), MakeArray<float>({3.0, 4.0})},
+       0.0,
+       kDistanceFloatMargin},
+
+      // Mismatched vector length.
+      {"dot_product",
+       {MakeArray<float>({0.0, 0.1}), MakeArray<float>({1.0, 2.0, 3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Array arguments to DOT_PRODUCT must have equal length. The given "
+           "arrays have lengths of 2 and 3")},
+
+      // Inf
+      {"dot_product",
+       {MakeArray<float>({1.0, std::numeric_limits<float>::infinity()}),
+        MakeArray<float>({3.0, 4.0})},
+       std::numeric_limits<double>::infinity()},
+      {"dot_product",
+       {MakeArray<float>({1.0, -std::numeric_limits<float>::infinity()}),
+        MakeArray<float>({3.0, 4.0})},
+       -std::numeric_limits<double>::infinity()},
+
+      // NaN
+      {"dot_product",
+       {MakeArray<float>({1.0, std::numeric_limits<float>::quiet_NaN()}),
+        MakeArray<float>({3.0, 4.0})},
+       std::numeric_limits<double>::quiet_NaN()},
+
+      // Expected usage with floating point values.
+      {"dot_product",
+       {MakeArray<float>({1.0, 2.0}), MakeArray<float>({3.0, 4.0})},
+       11.0,
+       kDistanceFloatMargin},
+      {"dot_product",
+       {MakeArray<float>({5.0, 6.0}), MakeArray<float>({7.0, 9.0})},
+       89.0,
+       kDistanceFloatMargin},
+
+      // Expected usage with significantly different floating point values.
+      {"dot_product",
+       {MakeArray<float>({3.0e18, 4.0e18}),
+        MakeArray<float>({7.0e-18, 9.0e-18})},
+       57.0,  // = (3.0 * 7.0) + (4.0 * 9.0)
+       FloatMargin::UlpMargin(30)}};
+
+  std::vector<FunctionTestCall> int64_array_tests = {
+      // NULL inputs.
+      {"dot_product",
+       {MakeNullInt64Array(), MakeArray<int64_t>({3, 4})},
+       values::NullDouble()},
+      {"dot_product",
+       {MakeArray<int64_t>({5, 6}), MakeNullInt64Array()},
+       values::NullDouble()},
+      {"dot_product",
+       {MakeNullInt64Array(), MakeNullInt64Array()},
+       values::NullDouble()},
+      {"dot_product",
+       {MakeArrayEndingWithNull(std::vector<int64_t>{3}),
+        MakeArray<int64_t>({5, 6})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute DOT_PRODUCT with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the first "
+           "array argument at OFFSET 2")},
+      {"dot_product",
+       {MakeArray<int64_t>({5, 6}),
+        MakeArrayEndingWithNull(std::vector<int64_t>{3})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute DOT_PRODUCT with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the second "
+           "array argument at OFFSET 2")},
+
+      // Long vector.
+      {"dot_product",
+       {MakeRepeatedArray(1l, 64 * 64), MakeRepeatedArray(2l, 64 * 64)},
+       2.0 * 64 * 64},
+
+      // Zero vector.
+      {"dot_product",
+       {MakeArray<int64_t>({0, 0}), MakeArray<int64_t>({3, 4})},
+       0.0},
+
+      // Mismatched vector length.
+      {"dot_product",
+       {MakeArray<int64_t>({0, 1}), MakeArray<int64_t>({1, 2, 3})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Array arguments to DOT_PRODUCT must have equal length. The given "
+           "arrays have lengths of 2 and 3")}};
+
+  std::vector<FunctionTestCall> tests;
+  tests.reserve(double_array_tests.size() + float_array_tests.size() +
+                int64_array_tests.size());
+  for (auto& test_case : double_array_tests) {
+    test_case.params.AddRequiredFeature(FEATURE_V_1_4_DOT_PRODUCT);
+    tests.emplace_back(test_case);
+  }
+  for (auto& test_case : float_array_tests) {
+    test_case.params.AddRequiredFeature(FEATURE_V_1_4_DOT_PRODUCT);
+    tests.emplace_back(test_case);
+  }
+  for (auto& test_case : int64_array_tests) {
+    test_case.params.AddRequiredFeature(FEATURE_V_1_4_DOT_PRODUCT);
+    tests.emplace_back(test_case);
+  }
+
+  return tests;
+}
+
+std::vector<FunctionTestCall> GetFunctionTestsApproxDotProduct() {
+  std::vector<FunctionTestCall> double_array_tests = {
+      // NULL inputs
+      {"approx_dot_product",
+       {MakeNullDoubleArray(), MakeArray<double>({3.0, 4.0})},
+       values::NullDouble()},
+      {"approx_dot_product",
+       {MakeArray<double>({5.0, 6.0}), MakeNullDoubleArray()},
+       values::NullDouble()},
+      {"approx_dot_product",
+       {MakeNullDoubleArray(), MakeNullDoubleArray()},
+       values::NullDouble()},
+      {"approx_dot_product",
+       {MakeArrayEndingWithNull(std::vector<double>{3.0}),
+        MakeArray<double>({5.0, 6.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute DOT_PRODUCT with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the first "
+           "array argument at OFFSET 2")},
+      {"approx_dot_product",
+       {MakeArray<double>({5.0, 6.0}),
+        MakeArrayEndingWithNull(std::vector<double>{3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute DOT_PRODUCT with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the second "
+           "array argument at OFFSET 2")},
+
+      // Zero length array.
+      {"approx_dot_product",
+       {MakeArray(std::vector<double>{}), MakeArray(std::vector<double>{})},
+       0.0,
+       kDistanceFloatMargin},
+
+      // Zero vector
+      {"approx_dot_product",
+       {MakeArray<double>({0.0, 0.0}), MakeArray<double>({3.0, 4.0})},
+       0.0,
+       kDistanceFloatMargin},
+
+      // Mismatch length vector.
+      {"approx_dot_product",
+       {MakeArray<double>({0.0, 0.1}), MakeArray<double>({1.0, 2.0, 3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Array arguments to DOT_PRODUCT must have equal length. The given "
+           "arrays have lengths of 2 and 3")},
+
+      // Inf
+      {"approx_dot_product",
+       {MakeArray<double>({1.0, std::numeric_limits<double>::infinity()}),
+        MakeArray<double>({3.0, 4.0})},
+       std::numeric_limits<double>::infinity()},
+      {"approx_dot_product",
+       {MakeArray<double>({1.0, -std::numeric_limits<double>::infinity()}),
+        MakeArray<double>({3.0, 4.0})},
+       -std::numeric_limits<double>::infinity()},
+
+      // NaN
+      {"approx_dot_product",
+       {MakeArray<double>({1.0, std::numeric_limits<double>::quiet_NaN()}),
+        MakeArray<double>({3.0, 4.0})},
+       std::numeric_limits<double>::quiet_NaN()},
+
+      // Long vector
+      {"approx_dot_product",
+       {MakeRepeatedArray(1.0, 64 * 64), MakeRepeatedArray(2.0, 64 * 64)},
+       2.0 * 64 * 64,
+       kDistanceFloatMargin},
+
+      // Overflow
+      {"approx_dot_product",
+       {MakeArray<double>({std::numeric_limits<double>::max(), 2.0}),
+        MakeArray<double>({3.0, 4.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError("double overflow: 1.79769e+308 * 1.79769e+308")},
+
+      // Expected usage with floating point values.
+      {"approx_dot_product",
+       {MakeArray<double>({1.0, 2.0}), MakeArray<double>({3.0, 4.0})},
+       11.0,
+       kDistanceFloatMargin},
+      {"approx_dot_product",
+       {MakeArray<double>({5.0, 6.0}), MakeArray<double>({7.0, 9.0})},
+       89.0,
+       kDistanceFloatMargin},
+
+      // Expected usage with significantly different floating point values.
+      {"approx_dot_product",
+       {MakeArray<double>({3.0e140, 4.0e140}),
+        MakeArray<double>({7.0e-140, 9.0e-140})},
+       57.0,  // = (3.0 * 7.0) + (4.0 * 9.0)
+       kDistanceFloatMargin}};
+
+  std::vector<FunctionTestCall> float_array_tests = {
+      // NULL inputs.
+      {"approx_dot_product",
+       {MakeNullFloatArray(), MakeArray<float>({3.0, 4.0})},
+       values::NullDouble()},
+      {"approx_dot_product",
+       {MakeArray<float>({5.0, 6.0}), MakeNullFloatArray()},
+       values::NullDouble()},
+      {"approx_dot_product",
+       {MakeNullFloatArray(), MakeNullFloatArray()},
+       values::NullDouble()},
+      {"approx_dot_product",
+       {MakeArrayEndingWithNull(std::vector<float>{3.0}),
+        MakeArray<float>({5.0, 6.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute DOT_PRODUCT with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the first "
+           "array argument at OFFSET 2")},
+      {"approx_dot_product",
+       {MakeArray<float>({5.0, 6.0}),
+        MakeArrayEndingWithNull(std::vector<float>{3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute DOT_PRODUCT with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the second "
+           "array argument at OFFSET 2")},
+
+      // Long vector.
+      {"approx_dot_product",
+       {MakeRepeatedArray(1.0f, 64 * 64), MakeRepeatedArray(2.0f, 64 * 64)},
+       2.0 * 64 * 64,
+       kDistanceFloatMargin},
+
+      // Zero vector.
+      {"approx_dot_product",
+       {MakeArray<float>({0.0, 0.0}), MakeArray<float>({3.0, 4.0})},
+       0.0,
+       kDistanceFloatMargin},
+
+      // Mismatched vector length.
+      {"approx_dot_product",
+       {MakeArray<float>({0.0, 0.1}), MakeArray<float>({1.0, 2.0, 3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Array arguments to DOT_PRODUCT must have equal length. The given "
+           "arrays have lengths of 2 and 3")},
+
+      // Inf
+      {"approx_dot_product",
+       {MakeArray<float>({1.0, std::numeric_limits<float>::infinity()}),
+        MakeArray<float>({3.0, 4.0})},
+       std::numeric_limits<double>::infinity()},
+      {"approx_dot_product",
+       {MakeArray<float>({1.0, -std::numeric_limits<float>::infinity()}),
+        MakeArray<float>({3.0, 4.0})},
+       -std::numeric_limits<double>::infinity()},
+
+      // NaN
+      {"approx_dot_product",
+       {MakeArray<float>({1.0, std::numeric_limits<float>::quiet_NaN()}),
+        MakeArray<float>({3.0, 4.0})},
+       std::numeric_limits<double>::quiet_NaN()},
+
+      // Expected usage with floating point values.
+      {"approx_dot_product",
+       {MakeArray<float>({1.0, 2.0}), MakeArray<float>({3.0, 4.0})},
+       11.0,
+       kDistanceFloatMargin},
+      {"approx_dot_product",
+       {MakeArray<float>({5.0, 6.0}), MakeArray<float>({7.0, 9.0})},
+       89.0,
+       kDistanceFloatMargin},
+
+      // Expected usage with significantly different floating point values.
+      {"approx_dot_product",
+       {MakeArray<float>({3.0e18, 4.0e18}),
+        MakeArray<float>({7.0e-18, 9.0e-18})},
+       57.0,  // = (3.0 * 7.0) + (4.0 * 9.0)
+       FloatMargin::UlpMargin(30)}};
+
+  std::vector<FunctionTestCall> int64_array_tests = {
+      // NULL inputs.
+      {"approx_dot_product",
+       {MakeNullInt64Array(), MakeArray<int64_t>({3, 4})},
+       values::NullDouble()},
+      {"approx_dot_product",
+       {MakeArray<int64_t>({5, 6}), MakeNullInt64Array()},
+       values::NullDouble()},
+      {"approx_dot_product",
+       {MakeNullInt64Array(), MakeNullInt64Array()},
+       values::NullDouble()},
+      {"approx_dot_product",
+       {MakeArrayEndingWithNull(std::vector<int64_t>{3}),
+        MakeArray<int64_t>({5, 6})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute DOT_PRODUCT with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the first "
+           "array argument at OFFSET 2")},
+      {"approx_dot_product",
+       {MakeArray<int64_t>({5, 6}),
+        MakeArrayEndingWithNull(std::vector<int64_t>{3})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute DOT_PRODUCT with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the second "
+           "array argument at OFFSET 2")},
+
+      // Long vector.
+      {"approx_dot_product",
+       {MakeRepeatedArray(1l, 64 * 64), MakeRepeatedArray(2l, 64 * 64)},
+       2.0 * 64 * 64},
+
+      // Zero vector.
+      {"approx_dot_product",
+       {MakeArray<int64_t>({0, 0}), MakeArray<int64_t>({3, 4})},
+       0.0},
+
+      // Mismatched vector length.
+      {"approx_dot_product",
+       {MakeArray<int64_t>({0, 1}), MakeArray<int64_t>({1, 2, 3})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Array arguments to DOT_PRODUCT must have equal length. The given "
+           "arrays have lengths of 2 and 3")}};
+
+  std::vector<FunctionTestCall> tests;
+  tests.reserve(double_array_tests.size() + float_array_tests.size() +
+                int64_array_tests.size());
+  for (auto& test_case : double_array_tests) {
+    tests.emplace_back(test_case);
+  }
+  for (auto& test_case : float_array_tests) {
+    tests.emplace_back(test_case);
+  }
+  for (auto& test_case : int64_array_tests) {
+    tests.emplace_back(test_case);
+  }
+
+  return tests;
+}
+
+std::vector<FunctionTestCall> GetFunctionTestsManhattanDistance() {
+  std::vector<FunctionTestCall> double_array_tests = {
+      // NULL inputs
+      {"manhattan_distance",
+       {MakeNullDoubleArray(), MakeArray<double>({3.0, 4.0})},
+       values::NullDouble()},
+      {"manhattan_distance",
+       {MakeArray<double>({5.0, 6.0}), MakeNullDoubleArray()},
+       values::NullDouble()},
+      {"manhattan_distance",
+       {MakeNullDoubleArray(), MakeNullDoubleArray()},
+       values::NullDouble()},
+      {"manhattan_distance",
+       {MakeArrayEndingWithNull(std::vector<double>{3.0}),
+        MakeArray<double>({5.0, 6.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute MANHATTAN_DISTANCE with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the first "
+           "array argument at OFFSET 2")},
+      {"manhattan_distance",
+       {MakeArray<double>({5.0, 6.0}),
+        MakeArrayEndingWithNull(std::vector<double>{3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute MANHATTAN_DISTANCE with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the second "
+           "array argument at OFFSET 2")},
+
+      // Zero length array.
+      {"manhattan_distance",
+       {MakeArray(std::vector<double>{}), MakeArray(std::vector<double>{})},
+       0.0,
+       kDistanceFloatMargin},
+
+      // Zero vector
+      {"manhattan_distance",
+       {MakeArray<double>({0.0, 0.0}), MakeArray<double>({3.0, 4.0})},
+       7.0,
+       kDistanceFloatMargin},
+
+      // Mismatch length vector.
+      {"manhattan_distance",
+       {MakeArray<double>({0.0, 0.1}), MakeArray<double>({1.0, 2.0, 3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError("Array arguments to MANHATTAN_DISTANCE must have "
+                             "equal length. The given arrays have lengths of "
+                             "2 and 3")},
+
+      // Inf
+      {"manhattan_distance",
+       {MakeArray<double>({1.0, std::numeric_limits<double>::infinity()}),
+        MakeArray<double>({3.0, 4.0})},
+       std::numeric_limits<double>::infinity()},
+      {"manhattan_distance",
+       {MakeArray<double>({1.0, -std::numeric_limits<double>::infinity()}),
+        MakeArray<double>({3.0, 4.0})},
+       std::numeric_limits<double>::infinity()},
+
+      // NaN
+      {"manhattan_distance",
+       {MakeArray<double>({1.0, std::numeric_limits<double>::quiet_NaN()}),
+        MakeArray<double>({3.0, 4.0})},
+       std::numeric_limits<double>::quiet_NaN()},
+
+      // Long vector
+      {"manhattan_distance",
+       {MakeRepeatedArray(1.0, 64 * 64), MakeRepeatedArray(2.0, 64 * 64)},
+       1.0 * 64 * 64,
+       kDistanceFloatMargin},
+
+      // Overflow
+      {"manhattan_distance",
+       {MakeArray<double>({-std::numeric_limits<double>::max(), 2.0}),
+        MakeArray<double>({3.0, -std::numeric_limits<double>::max()})},
+       values::NullDouble(),
+       absl::OutOfRangeError("double overflow: 1.79769e+308 * 1.79769e+308")},
+
+      // Expected usage with floating point values.
+      {"manhattan_distance",
+       {MakeArray<double>({1.0, 2.0}), MakeArray<double>({3.0, 4.0})},
+       4.0,
+       kDistanceFloatMargin},
+      {"manhattan_distance",
+       {MakeArray<double>({5.0, 6.0}), MakeArray<double>({7.0, 9.0})},
+       5.0,
+       kDistanceFloatMargin},
+
+      // Expected usage with significantly different floating point values.
+      {"manhattan_distance",
+       {MakeArray<double>({3.0e140, 4.0e140}),
+        MakeArray<double>({7.0e-140, 9.0e-140})},
+       7.0e140,
+       kDistanceFloatMargin}};
+
+  std::vector<FunctionTestCall> float_array_tests = {
+      // NULL inputs.
+      {"manhattan_distance",
+       {MakeNullFloatArray(), MakeArray<float>({3.0, 4.0})},
+       values::NullDouble()},
+      {"manhattan_distance",
+       {MakeArray<float>({5.0, 6.0}), MakeNullFloatArray()},
+       values::NullDouble()},
+      {"manhattan_distance",
+       {MakeNullFloatArray(), MakeNullFloatArray()},
+       values::NullDouble()},
+      {"manhattan_distance",
+       {MakeArrayEndingWithNull(std::vector<float>{3.0}),
+        MakeArray<float>({5.0, 6.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute MANHATTAN_DISTANCE with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the first "
+           "array argument at OFFSET 2")},
+      {"manhattan_distance",
+       {MakeArray<float>({5.0, 6.0}),
+        MakeArrayEndingWithNull(std::vector<float>{3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute MANHATTAN_DISTANCE with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the second "
+           "array argument at OFFSET 2")},
+
+      // Long vector.
+      {"manhattan_distance",
+       {MakeRepeatedArray(1.0f, 64 * 64), MakeRepeatedArray(2.0f, 64 * 64)},
+       1.0 * 64 * 64,
+       kDistanceFloatMargin},
+
+      // Zero vector.
+      {"manhattan_distance",
+       {MakeArray<float>({0.0, 0.0}), MakeArray<float>({3.0, 4.0})},
+       7.0,
+       kDistanceFloatMargin},
+
+      // Mismatched vector length.
+ {"manhattan_distance", + {MakeArray({0.0, 0.1}), MakeArray({1.0, 2.0, 3.0})}, + values::NullDouble(), + absl::OutOfRangeError("Array arguments to MANHATTAN_DISTANCE must have " + "equal length. The given arrays have lengths of " + "2 and 3")}, + + // Inf + {"manhattan_distance", + {MakeArray({1.0, std::numeric_limits::infinity()}), + MakeArray({3.0, 4.0})}, + std::numeric_limits::infinity()}, + {"manhattan_distance", + {MakeArray({1.0, -std::numeric_limits::infinity()}), + MakeArray({3.0, 4.0})}, + std::numeric_limits::infinity()}, + + // NaN + {"manhattan_distance", + {MakeArray({1.0, std::numeric_limits::quiet_NaN()}), + MakeArray({3.0, 4.0})}, + std::numeric_limits::quiet_NaN()}, + + // Expected usage with floating point values. + {"manhattan_distance", + {MakeArray({1.0, 2.0}), MakeArray({3.0, 4.0})}, + 4.0, + kDistanceFloatMargin}, + {"manhattan_distance", + {MakeArray({5.0, 6.0}), MakeArray({7.0, 9.0})}, + 5.0, + kDistanceFloatMargin}, + + // Expected usage with significantly different floating point values. + {"manhattan_distance", + {MakeArray({3.0e18, 4.0e18}), + MakeArray({7.0e-18, 9.0e-18})}, + 7.0e18, + FloatMargin::UlpMargin(30)}}; + + std::vector int64_array_tests = { + // NULL inputs. + {"manhattan_distance", + {MakeNullInt64Array(), MakeArray({3, 4})}, + values::NullDouble()}, + {"manhattan_distance", + {MakeArray({5, 6}), MakeNullInt64Array()}, + values::NullDouble()}, + {"manhattan_distance", + {MakeNullInt64Array(), MakeNullInt64Array()}, + values::NullDouble()}, + {"manhattan_distance", + {MakeArrayEndingWithNull(std::vector{3}), + MakeArray({5, 6})}, + values::NullDouble(), + absl::OutOfRangeError( + "Cannot compute MANHATTAN_DISTANCE with a NULL element, since it is " + "unclear if NULLs should be ignored, counted as a zero value, or " + "another interpretation. 
The NULL element was found in the first "
+           "array argument at OFFSET 2")},
+      {"manhattan_distance",
+       {MakeArray<int64_t>({5, 6}),
+        MakeArrayEndingWithNull(std::vector<int64_t>{3})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute MANHATTAN_DISTANCE with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. The NULL element was found in the second "
+           "array argument at OFFSET 2")},
+
+      // Long vector.
+      {"manhattan_distance",
+       {MakeRepeatedArray(1l, 64 * 64), MakeRepeatedArray(2l, 64 * 64)},
+       1.0 * 64 * 64},
+
+      // Zero vector.
+      {"manhattan_distance",
+       {MakeArray<int64_t>({0, 0}), MakeArray<int64_t>({3, 4})},
+       7.0},
+
+      // Mismatched vector length.
+      {"manhattan_distance",
+       {MakeArray<int64_t>({0, 1}), MakeArray<int64_t>({1, 2, 3})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Array arguments to MANHATTAN_DISTANCE must have equal length. The "
+           "given arrays have lengths of 2 and 3")}};
+
+  std::vector<FunctionTestCall> tests;
+  tests.reserve(double_array_tests.size() + float_array_tests.size() +
+                int64_array_tests.size());
+  for (auto& test_case : double_array_tests) {
+    test_case.params.AddRequiredFeature(FEATURE_V_1_4_MANHATTAN_DISTANCE);
+    tests.emplace_back(test_case);
+  }
+  for (auto& test_case : float_array_tests) {
+    test_case.params.AddRequiredFeature(FEATURE_V_1_4_MANHATTAN_DISTANCE);
+    tests.emplace_back(test_case);
+  }
+  for (auto& test_case : int64_array_tests) {
+    test_case.params.AddRequiredFeature(FEATURE_V_1_4_MANHATTAN_DISTANCE);
+    tests.emplace_back(test_case);
+  }
+
+  return tests;
+}
+
+std::vector<FunctionTestCall> GetFunctionTestsL1Norm() {
+  std::vector<FunctionTestCall> double_array_tests = {
+      // NULL inputs
+      {"l1_norm", {MakeNullDoubleArray()}, values::NullDouble()},
+      {"l1_norm",
+       {MakeArrayEndingWithNull(std::vector<double>{3.0})},
+       values::NullDouble(),
+       absl::OutOfRangeError(
+           "Cannot compute L1_NORM with a NULL element, since it is "
+           "unclear if NULLs should be ignored, counted as a zero value, or "
+           "another interpretation. 
The NULL element was found in the array " + "argument at OFFSET 2")}, + + // Zero length array. + {"l1_norm", + {MakeArray(std::vector{})}, + 0.0, + kDistanceFloatMargin}, + + // Zero vector + {"l1_norm", {MakeArray({0.0, 0.0})}, 0.0, kDistanceFloatMargin}, + + // Inf + {"l1_norm", + {MakeArray({1.0, std::numeric_limits::infinity()})}, + std::numeric_limits::infinity()}, + {"l1_norm", + {MakeArray({1.0, -std::numeric_limits::infinity()})}, + std::numeric_limits::infinity()}, + + // NaN + {"l1_norm", + {MakeArray({1.0, std::numeric_limits::quiet_NaN()})}, + std::numeric_limits::quiet_NaN()}, + + // Long vector + {"l1_norm", + {MakeRepeatedArray(1.0, 64 * 64)}, + 1.0 * 64 * 64, + kDistanceFloatMargin}, + + // Overflow + {"l1_norm", + {MakeArray({std::numeric_limits::max(), + -std::numeric_limits::max()})}, + values::NullDouble(), + absl::OutOfRangeError("double overflow: 1.79769e+308 * 1.79769e+308")}, + + // Expected usage with floating point values. + {"l1_norm", {MakeArray({1.0, 2.0})}, 3.0, kDistanceFloatMargin}, + {"l1_norm", + {MakeArray({-5.0, 6.0, -7.0, 8.0})}, + 26.0, + kDistanceFloatMargin}, + + // Expected usage with significantly different floating point values. + {"l1_norm", + {MakeArray({3.0e140, 4.0e140, 7.0e-140, 9.0e-140})}, + 7.0e140, + kDistanceFloatMargin}}; + + std::vector float_array_tests = { + // NULL inputs + {"l1_norm", {MakeNullFloatArray()}, values::NullDouble()}, + {"l1_norm", + {MakeArrayEndingWithNull(std::vector{3.0})}, + values::NullDouble(), + absl::OutOfRangeError( + "Cannot compute L1_NORM with a NULL element, since it is " + "unclear if NULLs should be ignored, counted as a zero value, or " + "another interpretation. The NULL element was found in the array " + "argument at OFFSET 2")}, + + // Zero length array. 
+ {"l1_norm", {MakeArray(std::vector{})}, 0.0, kDistanceFloatMargin}, + + // Zero vector + {"l1_norm", {MakeArray({0.0, 0.0})}, 0.0, kDistanceFloatMargin}, + + // Inf + {"l1_norm", + {MakeArray({1.0, std::numeric_limits::infinity()})}, + std::numeric_limits::infinity()}, + {"l1_norm", + {MakeArray({1.0, -std::numeric_limits::infinity()})}, + std::numeric_limits::infinity()}, + + // NaN + {"l1_norm", + {MakeArray({1.0, std::numeric_limits::quiet_NaN()})}, + std::numeric_limits::quiet_NaN()}, + + // Long vector + {"l1_norm", + {MakeRepeatedArray(1.0f, 64 * 64)}, + 1.0 * 64 * 64, + kDistanceFloatMargin}, + + // Expected usage with floating point values. + {"l1_norm", {MakeArray({1.0, 2.0})}, 3.0, kDistanceFloatMargin}, + {"l1_norm", + {MakeArray({-5.0, 6.0, -7.0, 8.0})}, + 26.0, + kDistanceFloatMargin}, + + // Expected usage with significantly different floating point values. + {"l1_norm", + {MakeArray({3.0e18, 4.0e18, 7.0e-18, 9.0e-18})}, + 7.0e18, + FloatMargin::UlpMargin(30)}}; + + std::vector int64_array_tests = { + // NULL inputs. + {"l1_norm", {MakeNullInt64Array()}, values::NullDouble()}, + {"l1_norm", + {MakeArrayEndingWithNull(std::vector{3})}, + values::NullDouble(), + absl::OutOfRangeError( + "Cannot compute L1_NORM with a NULL element, since it is " + "unclear if NULLs should be ignored, counted as a zero value, or " + "another interpretation. The NULL element was found in the array " + "argument at OFFSET 2")}, + + // Long vector. + {"l1_norm", {MakeRepeatedArray(1l, 64 * 64)}, 1.0 * 64 * 64}, + + // Zero vector. 
+ {"l1_norm", {MakeArray({0, 0})}, 0.0}, + + // Expected usage + {"l1_norm", {MakeArray({1, 2})}, 3.0}, + {"l1_norm", {MakeArray({-5, 6, -7, 8})}, 26.0}}; + + std::vector tests; + tests.reserve(double_array_tests.size() + float_array_tests.size() + + int64_array_tests.size()); + for (auto& test_case : double_array_tests) { + test_case.params.AddRequiredFeature(FEATURE_V_1_4_L1_NORM); + tests.emplace_back(test_case); + } + for (auto& test_case : float_array_tests) { + test_case.params.AddRequiredFeature(FEATURE_V_1_4_L1_NORM); + tests.emplace_back(test_case); + } + for (auto& test_case : int64_array_tests) { + test_case.params.AddRequiredFeature(FEATURE_V_1_4_L1_NORM); + tests.emplace_back(test_case); + } + + return tests; +} + +std::vector GetFunctionTestsL2Norm() { + std::vector double_array_tests = { + // NULL inputs + {"l2_norm", {MakeNullDoubleArray()}, values::NullDouble()}, + {"l2_norm", + {MakeArrayEndingWithNull(std::vector{3.0})}, + values::NullDouble(), + absl::OutOfRangeError( + "Cannot compute L2_NORM with a NULL element, since it is " + "unclear if NULLs should be ignored, counted as a zero value, or " + "another interpretation. The NULL element was found in the array " + "argument at OFFSET 2")}, + + // Zero length array. 
+ {"l2_norm", + {MakeArray(std::vector{})}, + 0.0, + kDistanceFloatMargin}, + + // Zero vector + {"l2_norm", {MakeArray({0.0, 0.0})}, 0.0, kDistanceFloatMargin}, + + // Inf + {"l2_norm", + {MakeArray({1.0, std::numeric_limits::infinity()})}, + std::numeric_limits::infinity()}, + {"l2_norm", + {MakeArray({1.0, -std::numeric_limits::infinity()})}, + std::numeric_limits::infinity()}, + + // NaN + {"l2_norm", + {MakeArray({1.0, std::numeric_limits::quiet_NaN()})}, + std::numeric_limits::quiet_NaN()}, + + // Long vector + {"l2_norm", + {MakeRepeatedArray(1.0, 64 * 64)}, + std::sqrt(1.0 * 64 * 64), + kDistanceFloatMargin}, + + // Overflow + {"l2_norm", + {MakeArray({std::numeric_limits::max(), + -std::numeric_limits::max()})}, + values::NullDouble(), + absl::OutOfRangeError("double overflow: 1.79769e+308 * 1.79769e+308")}, + + // Expected usage with floating point values. + {"l2_norm", + {MakeArray({1.0, 2.0})}, + std::sqrt(5.0), + kDistanceFloatMargin}, + {"l2_norm", + {MakeArray({-5.0, 6.0, -7.0, 8.0})}, + std::sqrt(25.0 + 36.0 + 49.0 + 64.0), + kDistanceFloatMargin}, + + // Expected usage with significantly different floating point values. + {"l2_norm", + {MakeArray({3.0e140, 4.0e140, 7.0e-140, 9.0e-140})}, + std::sqrt(25.0e280), + kDistanceFloatMargin}}; + + std::vector float_array_tests = { + // NULL inputs + {"l2_norm", {MakeNullFloatArray()}, values::NullDouble()}, + {"l2_norm", + {MakeArrayEndingWithNull(std::vector{3.0})}, + values::NullDouble(), + absl::OutOfRangeError( + "Cannot compute L2_NORM with a NULL element, since it is " + "unclear if NULLs should be ignored, counted as a zero value, or " + "another interpretation. The NULL element was found in the array " + "argument at OFFSET 2")}, + + // Zero length array. 
+ {"l2_norm", {MakeArray(std::vector{})}, 0.0, kDistanceFloatMargin}, + + // Zero vector + {"l2_norm", {MakeArray({0.0, 0.0})}, 0.0, kDistanceFloatMargin}, + + // Inf + {"l2_norm", + {MakeArray({1.0, std::numeric_limits::infinity()})}, + std::numeric_limits::infinity()}, + {"l2_norm", + {MakeArray({1.0, -std::numeric_limits::infinity()})}, + std::numeric_limits::infinity()}, + + // NaN + {"l2_norm", + {MakeArray({1.0, std::numeric_limits::quiet_NaN()})}, + std::numeric_limits::quiet_NaN()}, + + // Long vector + {"l2_norm", + {MakeRepeatedArray(1.0f, 64 * 64)}, + std::sqrt(1.0 * 64 * 64), + kDistanceFloatMargin}, + + // Expected usage with floating point values. + {"l2_norm", + {MakeArray({1.0, 2.0})}, + std::sqrt(5.0), + kDistanceFloatMargin}, + {"l2_norm", + {MakeArray({-5.0, 6.0, -7.0, 8.0})}, + std::sqrt(25.0 + 36.0 + 49.0 + 64.0), + kDistanceFloatMargin}, + + // Expected usage with significantly different floating point values. + {"l2_norm", + {MakeArray({3.0e18, 4.0e18, 7.0e-18, 9.0e-18})}, + std::sqrt(25.0e36), + FloatMargin::UlpMargin(30)}}; + + std::vector int64_array_tests = { + // NULL inputs. + {"l2_norm", {MakeNullInt64Array()}, values::NullDouble()}, + {"l2_norm", + {MakeArrayEndingWithNull(std::vector{3})}, + values::NullDouble(), + absl::OutOfRangeError( + "Cannot compute L2_NORM with a NULL element, since it is " + "unclear if NULLs should be ignored, counted as a zero value, or " + "another interpretation. The NULL element was found in the array " + "argument at OFFSET 2")}, + + // Long vector. + {"l2_norm", + {MakeRepeatedArray(1l, 64 * 64)}, + std::sqrt(1.0 * 64 * 64), + kDistanceFloatMargin}, + + // Zero vector. 
+ {"l2_norm", {MakeArray({0, 0})}, 0.0}, + + // Expected usage + {"l2_norm", + {MakeArray({1, 2})}, + std::sqrt(5.0), + kDistanceFloatMargin}, + {"l2_norm", + {MakeArray({-5, 6, -7, 8})}, + std::sqrt(25.0 + 36.0 + 49.0 + 64.0), + kDistanceFloatMargin}}; + + std::vector tests; + tests.reserve(double_array_tests.size() + float_array_tests.size() + + int64_array_tests.size()); + for (auto& test_case : double_array_tests) { + test_case.params.AddRequiredFeature(FEATURE_V_1_4_L2_NORM); + tests.emplace_back(test_case); + } + for (auto& test_case : float_array_tests) { + test_case.params.AddRequiredFeature(FEATURE_V_1_4_L2_NORM); + tests.emplace_back(test_case); + } + for (auto& test_case : int64_array_tests) { + test_case.params.AddRequiredFeature(FEATURE_V_1_4_L2_NORM); + tests.emplace_back(test_case); + } + + return tests; +} + std::vector GetFunctionTestsEditDistance() { std::vector tests = { // NULL values @@ -837,7 +2252,7 @@ std::vector GetFunctionTestsEditDistance() { {"edit_distance", {"abc", "", -3}, 3ll, - absl::OutOfRangeError("max_distance must be non-negative")}, + absl::OutOfRangeError("Max distance must be non-negative")}, // Unicode strings {"edit_distance", {"𨉟€", "€€"}, 1ll}, @@ -901,7 +2316,7 @@ std::vector GetFunctionTestsEditDistanceBytes() { {"edit_distance", {values::Bytes("abc"), values::Bytes(""), -3}, 3ll, - absl::OutOfRangeError("max_distance must be non-negative")}, + absl::OutOfRangeError("Max distance must be non-negative")}, // Unicode strings {"edit_distance", {values::Bytes("𨉟€"), values::Bytes("€€")}, 4ll}, diff --git a/zetasql/compliance/functions_testlib_format_floating_point.cc b/zetasql/compliance/functions_testlib_format_floating_point.cc index 903f4beef..4380082ec 100644 --- a/zetasql/compliance/functions_testlib_format_floating_point.cc +++ b/zetasql/compliance/functions_testlib_format_floating_point.cc @@ -260,4 +260,34 @@ std::vector GetFunctionTestsFormatFloatingPoint() { }); } +// Keeping these separate as 
GetFunctionTestsFormat() is being used by format
+// unit tests too. We do not want these cases to run in those unit tests (which
+// run in the context of PRODUCT_INTERNAL), as these cases are only applicable
+// for PRODUCT_EXTERNAL.
+std::vector<FunctionTestCall>
+GetFunctionTestsFormatWithExternalModeFloatType() {
+  const float kFloatNan = std::numeric_limits<float>::quiet_NaN();
+  const float kFloatNegNan = absl::bit_cast<float>(0xffc00000u);
+  const float kFloatPosInf = std::numeric_limits<float>::infinity();
+  const float kFloatNegInf = -std::numeric_limits<float>::infinity();
+
+  return std::vector<FunctionTestCall>({
+      {"format",
+       {"external: %T", kFloatNan},
+       "external: CAST(\"nan\" AS FLOAT32)"},
+      {"format",
+       {"external: %T", kFloatNegNan},
+       "external: CAST(\"nan\" AS FLOAT32)"},
+      {"format",
+       {"external: %T", kFloatPosInf},
+       "external: CAST(\"inf\" AS FLOAT32)"},
+      {"format",
+       {"external: %T", kFloatNegInf},
+       "external: CAST(\"-inf\" AS FLOAT32)"},
+      {"format",
+       {"external: %T", values::FloatArray({4, -2.5, kFloatNan})},
+       "external: [4.0, -2.5, CAST(\"nan\" AS FLOAT32)]"},
+  });
+}
+
 }  // namespace zetasql
diff --git a/zetasql/compliance/functions_testlib_interval.cc b/zetasql/compliance/functions_testlib_interval.cc
index 1c1ee0c65..67a4d7d1c 100644
--- a/zetasql/compliance/functions_testlib_interval.cc
+++ b/zetasql/compliance/functions_testlib_interval.cc
@@ -121,6 +121,24 @@ std::vector<FunctionTestCall> GetFunctionTestsIntervalConstructor() {
   {"", {Int64(5270400000), "MINUTE"}, Minutes(5270400000)},
   {"", {Int64(8), "SECOND"}, Seconds(8)},
   {"", {Int64(-316224000000), "SECOND"}, Seconds(-316224000000)},
+      {"", {Int64(123), "MILLISECOND"}, Micros(123000)},
+      {"", {Int64(-123), "MILLISECOND"}, Micros(-123000)},
+      {"", {Int64(316224000000000), "MILLISECOND"}, Micros(316224000000000000)},
+      {"",
+       {Int64(-316224000000000), "MILLISECOND"},
+       Micros(-316224000000000000)},
+      {"", {Int64(123), "MICROSECOND"}, Micros(123)},
+      {"", {Int64(-123), "MICROSECOND"}, Micros(-123)},
+      {"",
+       {Int64(316224000000000000), "MICROSECOND"},
+
Micros(316224000000000000)}, + {"", + {Int64(-316224000000000000), "MICROSECOND"}, + Micros(-316224000000000000)}, + {"", QueryParamsWithResult({Int64(123), "NANOSECOND"}, Nanos(123)) + .AddRequiredFeature(FEATURE_TIMESTAMP_NANOS)}, + {"", QueryParamsWithResult({Int64(-123), "NANOSECOND"}, Nanos(-123)) + .AddRequiredFeature(FEATURE_TIMESTAMP_NANOS)}, // Exceeds maximum allowed value {"", {Int64(10001), "YEAR"}, NullInterval(), OUT_OF_RANGE}, @@ -131,6 +149,22 @@ std::vector GetFunctionTestsIntervalConstructor() { {"", {Int64(87840001), "HOUR"}, NullInterval(), OUT_OF_RANGE}, {"", {Int64(-5270400001), "MINUTE"}, NullInterval(), OUT_OF_RANGE}, {"", {Int64(316224000001), "SECOND"}, NullInterval(), OUT_OF_RANGE}, + {"", + {Int64(316224000000001), "MILLISECOND"}, + NullInterval(), + OUT_OF_RANGE}, + {"", + {Int64(-316224000000001), "MILLISECOND"}, + NullInterval(), + OUT_OF_RANGE}, + {"", + {Int64(316224000000000001), "MICROSECOND"}, + NullInterval(), + OUT_OF_RANGE}, + {"", + {Int64(-316224000000000001), "MICROSECOND"}, + NullInterval(), + OUT_OF_RANGE}, // Overflow in multiplication {"", @@ -142,9 +176,9 @@ std::vector GetFunctionTestsIntervalConstructor() { // Invalid datetime part fields {"", {Int64(0), "DAYOFWEEK"}, NullInterval(), OUT_OF_RANGE}, {"", {Int64(0), "DAYOFYEAR"}, NullInterval(), OUT_OF_RANGE}, - {"", {Int64(0), "MILLISECOND"}, NullInterval(), OUT_OF_RANGE}, - {"", {Int64(0), "MICROSECOND"}, NullInterval(), OUT_OF_RANGE}, - {"", {Int64(0), "NANOSECOND"}, NullInterval(), OUT_OF_RANGE}, + {"", QueryParamsWithResult({Int64(0), "NANOSECOND"}, NullInterval(), + OUT_OF_RANGE) + .AddProhibitedFeature(FEATURE_TIMESTAMP_NANOS)}, {"", {Int64(0), "DATE"}, NullInterval(), OUT_OF_RANGE}, {"", {Int64(0), "DATETIME"}, NullInterval(), OUT_OF_RANGE}, {"", {Int64(0), "TIME"}, NullInterval(), OUT_OF_RANGE}, diff --git a/zetasql/compliance/functions_testlib_json.cc b/zetasql/compliance/functions_testlib_json.cc index 6e24af8c7..954b14cfc 100644 --- 
a/zetasql/compliance/functions_testlib_json.cc +++ b/zetasql/compliance/functions_testlib_json.cc @@ -44,6 +44,10 @@ namespace zetasql { namespace { constexpr absl::StatusCode OUT_OF_RANGE = absl::StatusCode::kOutOfRange; +Value ParseJson(absl::string_view json) { + return Json(JSONValue::ParseJSONString(json).value()); +} + // Note: not enclosed in {}. constexpr absl::string_view kDeepJsonString = R"( "a" : { @@ -413,6 +417,14 @@ const std::vector GetJsonTestsCommon( {json9, String("$..1")}, json_constructor(std::nullopt), OUT_OF_RANGE}, + {query_fn_name, + {json9, String("lax $.a")}, + json_constructor(std::nullopt), + OUT_OF_RANGE}, + {query_fn_name, + {json9, String("lax recursive $.a")}, + json_constructor(std::nullopt), + OUT_OF_RANGE}, }; if (sql_standard_mode) { all_tests.push_back({query_fn_name, @@ -645,6 +657,134 @@ std::vector GetFunctionTestsJsonIsNull() { return v; } +// Add test cases when "lax" and "lax recursive" have the same result. +void JsonQueryLaxRecursiveSameResultTest(absl::string_view json_input, + absl::string_view json_path, + absl::string_view result, + std::vector& tests) { + const Value input_json = ParseJson(json_input); + const Value result_json = ParseJson(result); + tests.push_back({"json_query", + {input_json, String(absl::StrCat("lax ", json_path))}, + result_json}); + tests.push_back( + {"json_query", + {input_json, String(absl::StrCat("lax recursive ", json_path))}, + result_json}); + tests.push_back( + {"json_query", + {input_json, String(absl::StrCat("Recursive LAX ", json_path))}, + result_json}); +} + +// Add test cases when "lax" and "lax recursive" have different result. 
+void JsonQueryLaxRecursiveDiffResultTest(absl::string_view json_input, + absl::string_view json_path, + absl::string_view lax_result, + absl::string_view lax_recursive_result, + std::vector& tests) { + const Value input_json = ParseJson(json_input); + tests.push_back({"json_query", + {input_json, String(absl::StrCat("lax ", json_path))}, + ParseJson(lax_result)}); + const Value lax_recursive_result_json = ParseJson(lax_recursive_result); + tests.push_back( + {"json_query", + {input_json, String(absl::StrCat("lax recursive ", json_path))}, + lax_recursive_result_json}); + tests.push_back( + {"json_query", + {input_json, String(absl::StrCat("recursive lax ", json_path))}, + lax_recursive_result_json}); + + tests.push_back( + {"json_query", + {input_json, String(absl::StrCat("Recursive LAX ", json_path))}, + lax_recursive_result_json}); +} + +std::vector GetFunctionTestsJsonQueryLax() { + std::vector tests; + // Basic cases. + // + // Key doesn't exist. + JsonQueryLaxRecursiveSameResultTest(R"({"a":{"b":1}})", "$.A", "[]", tests); + JsonQueryLaxRecursiveSameResultTest(R"({"a":{"b":1}})", "$.b", "[]", tests); + JsonQueryLaxRecursiveSameResultTest(R"({"a":{"b":1}})", "$.a.c", "[]", tests); + JsonQueryLaxRecursiveSameResultTest(R"({"a":{"b":1}})", "$[1]", "[]", tests); + JsonQueryLaxRecursiveSameResultTest(R"({"a":{"b":1}})", "$[0].b", "[]", + tests); + // NULL JSON value input. + JsonQueryLaxRecursiveSameResultTest("null", "$", "[null]", tests); + JsonQueryLaxRecursiveSameResultTest("null", "$.a", "[]", tests); + JsonQueryLaxRecursiveSameResultTest("null", "$[0]", "[null]", tests); + JsonQueryLaxRecursiveSameResultTest("null", "$[1]", "[]", tests); + // Simple object match with matching types. + JsonQueryLaxRecursiveSameResultTest(R"({"a":null})", "$", R"([{"a":null}])", + tests); + JsonQueryLaxRecursiveSameResultTest(R"({"a":null})", "$.a", "[null]", tests); + JsonQueryLaxRecursiveSameResultTest(R"({"a":1, "b":2})", "$.b", "[2]", tests); + // Simple array match. 
+ JsonQueryLaxRecursiveSameResultTest("[[null]]", "$", "[[[null]]]", tests); + JsonQueryLaxRecursiveSameResultTest("[[null]]", "$[0]", "[[null]]", tests); + // Index larger than array size. + JsonQueryLaxRecursiveSameResultTest("[[null], 1]", "$[2]", "[]", tests); + // Single level of array. + JsonQueryLaxRecursiveSameResultTest(R"([{"a":1}])", "$.a", "[1]", tests); + // 2-levels of arrays. + JsonQueryLaxRecursiveDiffResultTest(R"([[{"a":1}]])", "$.a", "[]", "[1]", + tests); + // Mix of 1,2,3 levels of arrays. + JsonQueryLaxRecursiveDiffResultTest(R"([{"a":1},[[{"a":2}]],[[[{"a":3}]]]])", + "$.a", "[1]", "[1,2,3]", tests); + JsonQueryLaxRecursiveDiffResultTest( + R"([{"a":1}, {"b":2}, [[{"a":3}]],[[{"b":4}]],[[[{"a":5}]]]])", "$.b", + "[2]", "[2,4]", tests); + // Wrap non-array before match. + JsonQueryLaxRecursiveSameResultTest(R"({"a":1})", "$[0].a", "[1]", tests); + JsonQueryLaxRecursiveSameResultTest(R"({"a":1})", "$[0][0].a", "[1]", tests); + // Key 'b' doesn't exist in matched JSON subtree. + JsonQueryLaxRecursiveSameResultTest(R"({"a":1})", "$[0][0].b", "[]", tests); + // Second index is larger than size of wrapped array. + JsonQueryLaxRecursiveSameResultTest(R"({"a":1})", "$[0][1].a", "[]", tests); + // Complex Cases + // + // 'b' is not included in every nested object. + JsonQueryLaxRecursiveDiffResultTest( + R"({"a":[{"b":1}, {"c":2}, {"b":3}, [{"b":4}]]})", "$.a.b", "[1,3]", + "[1,3,4]", tests); + // 'c' is not included in every nested object. + JsonQueryLaxRecursiveDiffResultTest( + R"({"a":[{"b":1}, {"c":2}, {"b":3}, [{"b":4}], null, + [[{"c":5}]]]})", + "$.a.c", "[2]", "[2,5]", tests); + // 'a.b' has different levels of nestedness + JsonQueryLaxRecursiveDiffResultTest( + R"({"a":[1, {"b":2}, [{"b":3}, null, 4, {"b":[5]}]]})", "$.a.b", "[2]", + "[2, 3, [5]]", tests); + // Both 'a' and 'a.b' have different levels of nestedness. 
+ JsonQueryLaxRecursiveDiffResultTest( + R"([{"a":[1, {"b":2}, [{"b":3}, [{"b": 4}]]]}, + [{"a":{"b":5}}, null]])", + "$.a.b", "[2]", "[2, 3, 4, 5]", tests); + JsonQueryLaxRecursiveDiffResultTest( + R"([{"a":[{"b":2}, [{"b":3}, [{"b": 4}]]]}, + [{"a":{"b":5}}, null]])", + "$.a[0].b", "[2]", "[2, 5]", tests); + // Specific array indices and different levels of nestedness with + // autowrap. + JsonQueryLaxRecursiveSameResultTest( + R"([{"a":[{"b":2}, [{"b":3}, [{"b": 4}]]]}, + [{"a":{"b":5}}, null]])", + "$[1].a[0].b", "[5]", tests); + JsonQueryLaxRecursiveSameResultTest( + R"([{"a":[{"b":2}, [{"b":3}, [{"b": 4}]]]}, + [{"a":{"b":5}}, null]])", + "$[1].a[0][0][0].b", "[5]", tests); + + return tests; +} + std::vector GetFunctionTestsParseJson() { // TODO: Currently these tests only verify if PARSE_JSON produces // the same output as JSONValue::ParseJsonString. A better test would need @@ -1470,14 +1610,6 @@ std::vector GetFunctionTestsJsonObjectArrays( return tests; } -namespace { - -Value ParseJson(absl::string_view json) { - return Json(JSONValue::ParseJSONString(json).value()); -} - -} // namespace - std::vector GetFunctionTestsJsonRemove() { absl::string_view json_string = R"({"a": 10, "b": [true, ["foo", null, "bar"], {"c": [20]}]})"; diff --git a/zetasql/compliance/functions_testlib_range.cc b/zetasql/compliance/functions_testlib_range.cc index 5bb355e5c..a2d39af8e 100644 --- a/zetasql/compliance/functions_testlib_range.cc +++ b/zetasql/compliance/functions_testlib_range.cc @@ -88,7 +88,7 @@ std::vector WrapFeatures( return wrapped_tests; } -std::vector EqualityTests(const std::vector& values, +std::vector EqualityTests(absl::Span values, const Value& unbounded, const Value& null_range) { // Verify t1 < t2 < t3 and provided values are of the same type @@ -129,7 +129,7 @@ std::vector EqualityTests(const std::vector& values, }; } -std::vector ComparisonTests(const std::vector& values, +std::vector ComparisonTests(absl::Span values, const Value& unbounded, const 
Value& null_range) {
   // Verify t1 < t2 < t3 < t4 and provided values are of the same type
@@ -232,9 +232,9 @@ std::vector<FunctionTestCall> ComparisonTests(const std::vector<Value>& values,
   };
 }
 
-std::vector<FunctionTestCall> RangeOverlapsTests(
-    const std::vector<Value>& values, const Value& unbounded,
-    const Value& null_range) {
+std::vector<FunctionTestCall> RangeOverlapsTests(absl::Span<const Value> values,
+                                                 const Value& unbounded,
+                                                 const Value& null_range) {
   // Verify t1 < t2 < t3 < t4 and provided values are of the same type
   CommonInitialCheck(values, unbounded, null_range);
   const Value &t1 = values[0], &t2 = values[1], &t3 = values[2],
@@ -471,7 +471,7 @@ std::vector<FunctionTestCall> RangeOverlapsTests(
 }
 
 std::vector<FunctionTestCall> RangeIntersectTests(
-    const std::vector<Value>& values, const Value& unbounded,
+    absl::Span<const Value> values, const Value& unbounded,
     const Value& null_range) {
   // Verify t1 < t2 < t3 < t4 and provided values are of the same type
   CommonInitialCheck(values, unbounded, null_range);
@@ -656,9 +656,9 @@ std::vector<FunctionTestCall> RangeIntersectTests(
   };
 }
 
-std::vector<FunctionTestCall> RangeContainsTests(
-    const std::vector<Value>& values, const Value& unbounded,
-    const Value& null_range) {
+std::vector<FunctionTestCall> RangeContainsTests(absl::Span<const Value> values,
+                                                 const Value& unbounded,
+                                                 const Value& null_range) {
   // Verify t1 < t2 < t3 < t4 < t5 and provided values are of the same type
   CommonInitialCheck(values, unbounded, null_range);
   const Value &t1 = values[0], &t2 = values[1], &t3 = values[2],
diff --git a/zetasql/compliance/known_errors/BUILD b/zetasql/compliance/known_errors/BUILD
index af132b547..445c6b232 100644
--- a/zetasql/compliance/known_errors/BUILD
+++ b/zetasql/compliance/known_errors/BUILD
@@ -14,8 +14,6 @@
 # limitations under the License.
# -# Placeholder: load py_binary - package( default_visibility = ["//zetasql/base:zetasql_implementation"], ) diff --git a/zetasql/compliance/runtime_expected_errors.cc b/zetasql/compliance/runtime_expected_errors.cc index 5503644ee..25639c096 100644 --- a/zetasql/compliance/runtime_expected_errors.cc +++ b/zetasql/compliance/runtime_expected_errors.cc @@ -22,7 +22,6 @@ #include #include "zetasql/compliance/matchers.h" -#include "absl/memory/memory.h" #include "absl/status/status.h" #include "absl/strings/substitute.h" @@ -41,7 +40,7 @@ std::unique_ptr> RuntimeExpectedErrorMatcher( // which b=0. // // Preventing such errors in the randomized test framework is possible in - // principle but a complex challenge. Ingoring these runtime errors means we + // principle but a complex challenge. Ignoring these runtime errors means we // could miss genuine bugs whereby runtime errors incorrectly occur. // However, that is not a common bug pattern compared to e.g. ZETASQL_RET_CHECKS and // incorrect results, and this is the balance we have currently struck. @@ -80,6 +79,10 @@ std::unique_ptr> RuntimeExpectedErrorMatcher( "Elements in input array to RANGE_BUCKET")); error_matchers.emplace_back(std::make_unique( absl::StatusCode::kOutOfRange, "Invalid return_position_after_match")); + error_matchers.emplace_back(std::make_unique( + absl::StatusCode::kInvalidArgument, + "Elementwise aggregate requires all non-NULL arrays have the same " + "length")); // Out of range errors // @@ -258,9 +261,14 @@ std::unique_ptr> RuntimeExpectedErrorMatcher( error_matchers.emplace_back(std::make_unique( absl::StatusCode::kOutOfRange, "LIKE pattern ends with a backslash")); - // Expected errors for COSINE_DISTANCE, EUCLIDEAN_DISTANCE, EDIT_DISTANCE. + // Expected errors for distance functions: COSINE_DISTANCE, + // EUCLIDEAN_DISTANCE, DOT_PRODUCT, MANHATTAN_DISTANCE, L1_NORM, L2_NORM, + // EDIT_DISTANCE. 
error_matchers.emplace_back(std::make_unique( absl::StatusCode::kOutOfRange, "Array length mismatch:")); + error_matchers.emplace_back(std::make_unique( + absl::StatusCode::kOutOfRange, + "Array arguments to ([A-Z0-9_]+) must have equal length")); error_matchers.emplace_back(std::make_unique( absl::StatusCode::kOutOfRange, "Cannot compute .* distance against zero vector")); @@ -269,10 +277,13 @@ std::unique_ptr> RuntimeExpectedErrorMatcher( "(?m)Duplicate index (.|\\n)* found in the input array")); error_matchers.emplace_back(std::make_unique( absl::StatusCode::kOutOfRange, "NULL array element")); + error_matchers.emplace_back(std::make_unique( + absl::StatusCode::kOutOfRange, + "Cannot compute ([A-Z0-9_]+) with a NULL element")); error_matchers.emplace_back(std::make_unique( absl::StatusCode::kOutOfRange, "NULL struct field")); error_matchers.emplace_back(std::make_unique( - absl::StatusCode::kOutOfRange, "max_distance must be non-negative")); + absl::StatusCode::kOutOfRange, "Max distance must be non-negative")); error_matchers.emplace_back(std::make_unique( absl::StatusCode::kOutOfRange, "EDIT_DISTANCE .* invalid UTF8 string")); @@ -454,6 +465,9 @@ std::unique_ptr> RuntimeExpectedErrorMatcher( absl::StatusCode::kInvalidArgument, "ARRAY_IS_DISTINCT cannot be used on argument of type .* because the " "array's element type does not support grouping")); + error_matchers.emplace_back(std::make_unique( + absl::StatusCode::kInvalidArgument, + "LIMIT ... OFFSET ... expects INT64, got (.+)")); // TODO: Remove after the bug is fixed. // Due to the above expected errors, rqg could generate invalid expressions. 
@@ -462,7 +476,7 @@ std::unique_ptr> RuntimeExpectedErrorMatcher( absl::StatusCode::kInvalidArgument, "No matching signature for function " "(ARRAY_FILTER|ARRAY_TRANSFORM|ARRAY_INCLUDES|ARRAY_FIND|ARRAY_FIND_ALL|" - "ARRAY_OFFSET|ARRAY_OFFSETS) .*")); + "ARRAY_OFFSET|ARRAY_OFFSETS|ARRAY_ZIP) .*")); error_matchers.emplace_back(std::make_unique( absl::StatusCode::kInvalidArgument, @@ -470,6 +484,14 @@ std::unique_ptr> RuntimeExpectedErrorMatcher( "contains a volatile expression which must be explicitly listed as a " "group by key")); + // TODO: Remove after the bug is fixed. + // We shouldn't be generating GROUP BY ordinal syntax related to WITH + // expression containing CAST between the same type, which triggers analysis + // time errors. + error_matchers.emplace_back(std::make_unique( + absl::StatusCode::kInvalidArgument, + "expression references (.+) which is neither grouped nor aggregated")); + // HLL sketch format errors // error_matchers.emplace_back(std::make_unique( @@ -626,11 +648,12 @@ std::unique_ptr> RuntimeExpectedErrorMatcher( "number)")); error_matchers.emplace_back(std::make_unique( absl::StatusCode::kOutOfRange, - "JSON number: (-?\\d+) cannot be converted to DOUBLE without loss of " - "precision")); + "JSON number: (-?\\d+) cannot be converted to " + "(DOUBLE|FLOAT64|FLOAT|FLOAT32) without loss of precision")); error_matchers.emplace_back(std::make_unique( absl::StatusCode::kOutOfRange, - "The provided JSON number: .+ cannot be converted to an integer")); + "The provided JSON number: .+ cannot be converted to an " + "(integer|int64_t|int32|uint64_t|uint32)")); // TODO PARSE_JSON sometimes is generated with invalid string // inputs. 
error_matchers.emplace_back(std::make_unique( diff --git a/zetasql/compliance/sql_test_base.cc b/zetasql/compliance/sql_test_base.cc index 175125978..53aec23b7 100644 --- a/zetasql/compliance/sql_test_base.cc +++ b/zetasql/compliance/sql_test_base.cc @@ -197,6 +197,12 @@ constexpr absl::string_view kTimeResolutionNanosLabel = constexpr absl::string_view kTimeResolutionMicrosLabel = "TimeResolution:MicrosOnly"; +// We include broken [prepare_database] statements with this name in some +// test files to make sure the framework is handling things correctly when +// an engine does not support something inside a function or view definition. +constexpr absl::string_view kSkipFailedReferenceSetup = + "skip_failed_reference_setup"; + // For a long string, a signature is generated, which is the left 8 // characters of the string, followed by the fingerprint of the string, // followed by the right 8 characters of the string. @@ -1478,7 +1484,6 @@ static std::unique_ptr CreateTestSetupDriver() { options.set_product_mode(zetasql::ProductMode::PRODUCT_INTERNAL); // Enable all possible language features. options.EnableMaximumLanguageFeaturesForDevelopment(); - options.EnableLanguageFeature(FEATURE_TEXTMAPPER_PARSER); options.EnableLanguageFeature(FEATURE_SHADOW_PARSING); // Allow CREATE TABLE AS SELECT in [prepare_database] statements. 
 options.AddSupportedStatementKind(RESOLVED_CREATE_TABLE_AS_SELECT_STMT);
@@ -1758,6 +1763,52 @@ void SQLTestBase::StepPrepareTimeZoneProtosEnums() {
   }
 }
 
+absl::Status SQLTestBase::AddViews(
+    absl::Span<const std::string> create_view_stmts, bool cache_stmts) {
+  bool is_testing_test_framework =
+      test_case_options_->name() == kSkipFailedReferenceSetup;
+  absl::Status reference_status =
+      reference_driver()->AddViews(create_view_stmts);
+  ZETASQL_RET_CHECK_NE(reference_status.ok(), is_testing_test_framework)
+      << reference_status;
+  absl::Status driver_status = driver()->AddViews(create_view_stmts);
+  if (!driver_status.ok()) {
+    // We don't want to fail the test because of a database setup failure.
+    // Any test statements that depend on this schema object should cause
+    // the test to fail in a more useful way.
+    ABSL_LOG(ERROR) << "Prepare database failed with error: " << driver_status;
+  }
+  if (cache_stmts && reference_status.ok() && driver_status.ok()) {
+    for (const auto& stmt : create_view_stmts) {
+      view_stmt_cache_.push_back(stmt);
+    }
+  }
+  return absl::OkStatus();
+}
+
+absl::Status SQLTestBase::AddFunctions(
+    absl::Span<const std::string> create_function_stmts, bool cache_stmts) {
+  bool is_testing_test_framework =
+      test_case_options_->name() == kSkipFailedReferenceSetup;
+  absl::Status reference_status =
+      reference_driver()->AddSqlUdfs(create_function_stmts);
+  ZETASQL_RET_CHECK_NE(reference_status.ok(), is_testing_test_framework)
+      << reference_status;
+  absl::Status driver_status = driver()->AddSqlUdfs(create_function_stmts);
+  if (!driver_status.ok()) {
+    // We don't want to fail the test because of a database setup failure.
+    // Any test statements that depend on this schema object should cause
+    // the test to fail in a more useful way.
+ ABSL_LOG(ERROR) << "Prepare database failed with error: " << driver_status; + } + if (cache_stmts && reference_status.ok() && driver_status.ok()) { + for (const auto& stmt : create_function_stmts) { + udf_stmt_cache_.push_back(stmt); + } + } + return absl::OkStatus(); +} + void SQLTestBase::StepPrepareDatabase() { if (test_case_options_ != nullptr && !test_case_options_->prepare_database()) { @@ -1798,48 +1849,20 @@ void SQLTestBase::StepPrepareDatabase() { CheckCancellation(status, "Wrong placement of prepare_database"); } - // We include broken [prepare_database] statements with this name in some - // test files to make sure the framework is handling things correctly when - // an engine does not support something inside a function or view definition. - constexpr absl::string_view kSkipFailedReferenceSetup = - "skip_failed_reference_setup"; - if (GetStatementKind(sql_) == RESOLVED_CREATE_FUNCTION_STMT) { - bool is_testing_test_framework = - test_case_options_->name() == kSkipFailedReferenceSetup; - absl::Status reference_status = reference_driver()->AddSqlUdfs({sql_}); - EXPECT_NE(reference_status.ok(), is_testing_test_framework) - << reference_status; - absl::Status driver_status = driver()->AddSqlUdfs({sql_}); - if (!driver_status.ok()) { - // We don't want to fail the test because of a database setup failure. - // Any test statements that depend on this schema object should cause - // the test to fail in a more useful way. 
- ABSL_LOG(ERROR) << "Prepare database failed with error: " << driver_status; - } + ZETASQL_EXPECT_OK(AddFunctions({sql_}, /*cache_stmts=*/true)); return; } if (GetStatementKind(sql_) == RESOLVED_CREATE_VIEW_STMT) { - bool is_testing_test_framework = - test_case_options_->name() == kSkipFailedReferenceSetup; - absl::Status reference_status = reference_driver()->AddViews({sql_}); - EXPECT_NE(reference_status.ok(), is_testing_test_framework) - << reference_status; - absl::Status driver_status = driver()->AddViews({sql_}); - if (!driver_status.ok()) { - // We don't want to fail the test because of a database setup failure. - // Any test statements that depend on this schema object should cause - // the test to fail in a more useful way. - ABSL_LOG(ERROR) << "Prepare database failed with error: " << driver_status; - } + ZETASQL_EXPECT_OK(AddViews({sql_}, /*cache_stmts=*/true)); return; } if (GetStatementKind(sql_) == RESOLVED_CREATE_TABLE_AS_SELECT_STMT) { ReferenceDriver::ExecuteStatementAuxOutput aux_output; - ABSL_CHECK(test_setup_driver_->language_options().LanguageFeatureEnabled( - FEATURE_TEXTMAPPER_PARSER)); + ABSL_CHECK(!test_setup_driver_->language_options().LanguageFeatureEnabled( + FEATURE_DISABLE_TEXTMAPPER_PARSER)); ABSL_CHECK(test_setup_driver_->language_options().LanguageFeatureEnabled( FEATURE_SHADOW_PARSING)); CheckCancellation( @@ -2164,12 +2187,11 @@ bool SQLTestBase::IsFeatureFalselyRequired( language_options.SetEnabledLanguageFeatures(features_minus_one); reference_driver()->SetLanguageOptions(language_options); auto modified_run_result = RunSQL(sql, param_map); - if (!modified_run_result.ok() && - modified_run_result.status() == initial_run_status) { - // The test case is expecting an error with error message. We see the same - // expected error and message with and without 'feature', we can conclude - // that 'feature' is not actually required. 
- return true; + if (!modified_run_result.ok()) { + // The test case is expecting an error with error message. If we see the + // same expected error and message with and without 'feature', we can + // conclude that 'feature' is not actually required. + return modified_run_result.status() == initial_run_status; } if (absl::IsOutOfRange(initial_run_status) && @@ -2183,8 +2205,8 @@ bool SQLTestBase::IsFeatureFalselyRequired( return false; } - // The test case is expecting a result. We see the same result with - // and without 'feature'. We can conclude that 'feature' is not + // The test case is expecting a result. If we see the same result with + // and without 'feature', we can conclude that 'feature' is not // actually required. return ::testing::Value( modified_run_result, @@ -2345,6 +2367,8 @@ std::string SQLTestBase::ValueToSafeString(const Value& value) const { return absl::StrCat("_", SignatureOfString(value.DebugString())); case TYPE_ENUM: return absl::StrCat("_", value.DebugString()); + case TYPE_TOKENLIST: + return absl::StrCat("_", SignatureOfString(value.DebugString())); case TYPE_DATE: case TYPE_ARRAY: case TYPE_STRUCT: @@ -2403,7 +2427,8 @@ std::string SQLTestBase::GenerateCodeBasedStatementName( param_strs.reserve(parameters.size()); for (const std::pair& entry : parameters) { param_strs.emplace_back(absl::StrCat( - absl::StrReplaceAll(entry.second.type()->TypeName(product_mode()), + absl::StrReplaceAll(entry.second.type()->TypeName( + product_mode(), use_external_float32()), {{".", "_"}}), ValueToSafeString(entry.second))); } @@ -2600,6 +2625,13 @@ ProductMode SQLTestBase::product_mode() const { return driver()->GetSupportedLanguageOptions().product_mode(); } +bool SQLTestBase::use_external_float32() const { + return driver()->GetSupportedLanguageOptions().product_mode() == + ProductMode::PRODUCT_EXTERNAL && + !driver()->GetSupportedLanguageOptions().LanguageFeatureEnabled( + LanguageFeature::FEATURE_V_1_4_DISABLE_FLOAT32); +} + void 
SQLTestBase::SetUp() { ZETASQL_EXPECT_OK(CreateDatabase(TestDatabase{})); } absl::Status SQLTestBase::CreateDatabase() { @@ -2612,6 +2644,13 @@ absl::Status SQLTestBase::CreateDatabase() { ZETASQL_RETURN_IF_ERROR(driver()->CreateDatabase(test_db_)); ZETASQL_RETURN_IF_ERROR(reference_driver()->CreateDatabase(test_db_)); + if (!udf_stmt_cache_.empty()) { + ZETASQL_RETURN_IF_ERROR(AddFunctions(udf_stmt_cache_, /*cache_stmts=*/false)); + } + if (!view_stmt_cache_.empty()) { + ZETASQL_RETURN_IF_ERROR(AddViews(view_stmt_cache_, /*cache_stmts=*/false)); + } + // Only create test database once. test_db_.clear(); diff --git a/zetasql/compliance/sql_test_base.h b/zetasql/compliance/sql_test_base.h index b3b04af52..14ce8513d 100644 --- a/zetasql/compliance/sql_test_base.h +++ b/zetasql/compliance/sql_test_base.h @@ -288,6 +288,12 @@ class SQLTestBase : public ::testing::TestWithParam { // Get the current product mode of the test driver. ProductMode product_mode() const; + // Returns true if the test driver wants FLOAT32 as the name for TYPE_FLOAT + // in the external mode. + // TODO: Remove once all engines are updated to use FLOAT32 in + // the external mode. + bool use_external_float32() const; + // // MakeScopedLabel() // @@ -756,6 +762,13 @@ class SQLTestBase : public ::testing::TestWithParam { // are used to create a test database. TestDatabase test_db_; + // The container for CREATE FUNCTION statements that were executed during + // a [prepare_database] step. + std::vector<std::string> udf_stmt_cache_; + // The container for CREATE VIEW statements that were executed during + // a [prepare_database] step. + std::vector<std::string> view_stmt_cache_; + // Code-based label set. Use a vector since labels might be added multiple // times. std::vector<std::string> code_based_labels_; @@ -823,6 +836,24 @@ // as all the tables. virtual absl::Status CreateDatabase(); + // Helper to add views from CREATE VIEW statements to the reference and + // test drivers.
+ // + // If `cache_stmts` is true, cache the statements in `view_stmt_cache_` + // if both the reference and test drivers are able to successfully execute + // them. + absl::Status AddViews(absl::Span<const std::string> create_view_stmts, + bool cache_stmts = false); + + // Helper to add functions from CREATE FUNCTION statements to the reference + // and test drivers. + // + // If `cache_stmts` is true, cache the statements in `udf_stmt_cache_` + // if both the reference and test drivers are able to successfully execute + // them. + absl::Status AddFunctions(absl::Span<const std::string> create_function_stmts, + bool cache_stmts = false); + // Add and remove labels to the code-based label set. Duplicated labels are // allowed. Labels will be added in the specified order, and be removed in // the reverse order. When removing labels, validates the to-be removed diff --git a/zetasql/compliance/test_driver.h b/zetasql/compliance/test_driver.h index aa9184e0d..a929fab5b 100644 --- a/zetasql/compliance/test_driver.h +++ b/zetasql/compliance/test_driver.h @@ -144,7 +144,7 @@ class TestTableOptions { } const std::string& userid_column() const { return userid_column_; } - void set_userid_column(const std::string& userid_column) { + void set_userid_column(absl::string_view userid_column) { userid_column_ = userid_column; } diff --git a/zetasql/compliance/testdata/anonymization.test b/zetasql/compliance/testdata/anonymization.test index 8f0d75a24..b65929359 100644 --- a/zetasql/compliance/testdata/anonymization.test +++ b/zetasql/compliance/testdata/anonymization.test @@ -2545,7 +2545,7 @@ ARRAY>[{11}] # Public groups without group by [name=anonymization_public_groups_without_group_by] -[required_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS] +[required_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,V_1_1_WITH_ON_SUBQUERY] [labels=anonymization] SELECT WITH ANONYMIZATION OPTIONS(epsilon=1e20, delta=1.0, group_selection_strategy=PUBLIC_GROUPS) ANON_COUNT(int64_val) @@ -2556,7 +2556,7 @@
ARRAY>[{8}] # Public groups with group by [name=anonymization_public_groups_with_group_by] -[required_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS] +[required_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,V_1_1_WITH_ON_SUBQUERY] [labels=anonymization] SELECT WITH ANONYMIZATION OPTIONS(epsilon=1e20, delta=1.0, group_selection_strategy=PUBLIC_GROUPS) int64_val, ANON_COUNT(* CLAMPED BETWEEN 0 AND 1) @@ -2577,7 +2577,7 @@ ARRAY>[unknown order: # Public groups with group by [name=anonymization_public_groups_with_group_by_and_max_groups_contributed] -[required_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS] +[required_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,V_1_1_WITH_ON_SUBQUERY] [labels=anonymization] SELECT WITH ANONYMIZATION OPTIONS(epsilon=1e20, delta=1.0, group_selection_strategy=PUBLIC_GROUPS, max_groups_contributed=3) int64_val, ANON_COUNT(* CLAMPED BETWEEN 0 AND 1) @@ -2695,7 +2695,7 @@ SELECT WITH ANONYMIZATION OPTIONS(epsilon=1e20, k_threshold=5, min_privacy_un... # An error should be reported if min_privacy_units_per_group is set # simultaneously with the PUBLIC_GROUPS groups selection strategy. 
[name=anonymization_public_groups_and_min_privacy_units_per_group_set] -[required_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,DIFFERENTIAL_PRIVACY_MIN_PRIVACY_UNITS_PER_GROUP] +[required_features=ANONYMIZATION,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,V_1_1_WITH_ON_SUBQUERY,DIFFERENTIAL_PRIVACY_MIN_PRIVACY_UNITS_PER_GROUP] [labels=anonymization] SELECT WITH ANONYMIZATION OPTIONS(epsilon=1e20, delta = 1.0, min_privacy_units_per_group=2, group_selection_strategy = PUBLIC_GROUPS) ANON_SUM(double_val CLAMPED BETWEEN 0 AND 2) diff --git a/zetasql/compliance/testdata/array_aggregation.test b/zetasql/compliance/testdata/array_aggregation.test index 8d4cc5fa2..938cce981 100644 --- a/zetasql/compliance/testdata/array_aggregation.test +++ b/zetasql/compliance/testdata/array_aggregation.test @@ -896,6 +896,15 @@ FROM (SELECT [1] x) foo; -- ARRAY>>[{ARRAY[1]}] +NOTE: Reference implementation reports non-determinism. +== +[required_features=V_1_1_LIMIT_IN_AGGREGATE,V_1_4_LIMIT_OFFSET_EXPRESSIONS] +[name=array_concat_agg_simple_with_expression_limit] +SELECT ARRAY_CONCAT_AGG(x LIMIT MOD(1, 2)) +FROM (SELECT [1] x) foo; +-- +ARRAY>>[{ARRAY[1]}] + NOTE: Reference implementation reports non-determinism. 
== [name=array_concat_agg_empty_non_empty] diff --git a/zetasql/compliance/testdata/array_zip.test b/zetasql/compliance/testdata/array_zip.test new file mode 100644 index 000000000..e528fa487 --- /dev/null +++ b/zetasql/compliance/testdata/array_zip.test @@ -0,0 +1,2278 @@ +# The function call has null arrays so the return value is NULL +[name=null_array] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP(NULL AS a, [1, 2]) +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# The function call has null arrays and explicit array_zip_mode +[name=null_array_mode] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP(NULL AS a, [1, 2], mode => 'PAD') +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# The function call has null arrays and the return value is NULL even with explicit STRICT mode +[name=null_array_strict_mode] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP(NULL AS a, [1, 2], mode => 'STRICT') +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# The function call has array_zip_mode = NULL and the result should be NULL +[name=null_zip_mode] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP([1, 2] AS a, ['s', 't'], mode => NULL) +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# All arrays are empty array literals +[name=all_empty_array_literals] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP([] AS a, []) +-- +ARRAY>>[{ARRAY>[]}] +== + +# All arrays are empty arrays of different types +[name=all_empty_array_different_types] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP([] AS a, ARRAY[]) +-- +ARRAY>>[{ARRAY>[]}] +== + +# Unspecified array_zip_mode input arrays have different lengths +[name=unspecified_array_zip_mode_input_arrays_different_lengths] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP([1] AS a, ['a', 'b']) +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# Unspecified array_zip_mode SAFE mode arrays have different lengths 
+[name=unspecified_array_zip_mode_safe_mode_arrays_have_different_lengths] +[required_features=V_1_4_ARRAY_ZIP,V_1_2_SAFE_FUNCTION_CALL,V_1_4_SAFE_FUNCTION_CALL_WITH_LAMBDA_ARGS] +SELECT SAFE.ARRAY_ZIP([1] AS a, ['a', 'b']) +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# Unspecified array_zip_mode input arrays have the same lengths +[name=unspecified_array_zip_mode_input_arrays_same_length] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP([1, 2] AS a, ['a', 'b']) +-- +ARRAY>>[ + {ARRAY>[known order:{1, "a"}, {2, "b"}]} +] +== + +# Unspecified array_zip_mode with empty input arrays +[name=unspecified_array_zip_mode_has_empty_input_arrays] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP(['a'] AS a, []) +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# Unspecified array_zip_mode same-length arrays some arrays have collations +[name=unspecified_array_zip_mode_same_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a', 'a'] AS a, [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b)) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order:{true, true}, {true, true}] +== + +# Unspecified array_zip_mode same-length arrays same collations +[name=unspecified_array_zip_mode_same_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b)) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order:{true, true}, {true, true}] +== + +# Unspecified array_zip_mode same-length arrays different collations +[name=unspecified_array_zip_mode_same_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT 
a = 'B', b = 'C' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b)) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order:{true, false}, {true, false}] +== + +# Unspecified array_zip_mode different-length arrays some arrays have collations +[name=unspecified_array_zip_mode_different_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a'] AS a, [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b)) + WITH OFFSET +ORDER BY OFFSET +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# PAD mode input arrays have different lengths +[name=pad_mode_input_arrays_different_lengths] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP([1] AS a, ['a', 'b'], mode => 'PAD') +-- +ARRAY>>[ + {ARRAY>[known order:{1, "a"}, {NULL, "b"}]} +] +== + +# PAD mode input arrays have the same lengths +[name=pad_mode_input_arrays_same_length] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP([1, 2] AS a, ['a', 'b'], mode => 'PAD') +-- +ARRAY>>[ + {ARRAY>[known order:{1, "a"}, {2, "b"}]} +] +== + +# PAD mode with empty input arrays +[name=pad_mode_has_empty_input_arrays] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP(['a'] AS a, [], mode => 'PAD') +-- +ARRAY>>[{ARRAY>[{"a", NULL}]}] +== + +# PAD mode same-length arrays some arrays have collations +[name=pad_mode_same_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a', 'a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order:{true, true}, {true, true}] +== + +# PAD 
mode same-length arrays same collations +[name=pad_mode_same_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order:{true, true}, {true, true}] +== + +# PAD mode same-length arrays different collations +[name=pad_mode_same_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order:{true, false}, {true, false}] +== + +# PAD mode different-length arrays some arrays have collations +[name=pad_mode_different_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order:{true, true}, {NULL, true}] +== + +# PAD mode different-length arrays same collations +[name=pad_mode_different_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order:{true, true}, {NULL, true}] +== + +# PAD mode different-length arrays different collations 
+[name=pad_mode_different_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order:{true, false}, {NULL, false}] +== + +# TRUNCATE mode input arrays have different lengths +[name=truncate_mode_input_arrays_different_lengths] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP([1] AS a, ['a', 'b'], mode => 'TRUNCATE') +-- +ARRAY>>[{ARRAY>[{1, "a"}]}] +== + +# TRUNCATE mode input arrays have the same lengths +[name=truncate_mode_input_arrays_same_length] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP([1, 2] AS a, ['a', 'b'], mode => 'TRUNCATE') +-- +ARRAY>>[ + {ARRAY>[known order:{1, "a"}, {2, "b"}]} +] +== + +# TRUNCATE mode with empty input arrays +[name=truncate_mode_has_empty_input_arrays] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP(['a'] AS a, [], mode => 'TRUNCATE') +-- +ARRAY>>[{ARRAY>[]}] +== + +# TRUNCATE mode same-length arrays some arrays have collations +[name=truncate_mode_same_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a', 'a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order:{true, true}, {true, true}] +== + +# TRUNCATE mode same-length arrays same collations +[name=truncate_mode_same_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + 
[COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order:{true, true}, {true, true}] +== + +# TRUNCATE mode same-length arrays different collations +[name=truncate_mode_same_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order:{true, false}, {true, false}] +== + +# TRUNCATE mode different-length arrays some arrays have collations +[name=truncate_mode_different_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[{true, true}] +== + +# TRUNCATE mode different-length arrays same collations +[name=truncate_mode_different_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[{true, true}] +== + +# TRUNCATE mode different-length arrays different collations +[name=truncate_mode_different_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + mode => 'TRUNCATE')) + WITH 
OFFSET +ORDER BY OFFSET +-- +ARRAY>[{true, false}] +== + +# STRICT mode input arrays have different lengths +[name=strict_mode_input_arrays_different_lengths] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP([1] AS a, ['a', 'b'], mode => 'STRICT') +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# STRICT mode SAFE mode arrays have different lengths +[name=strict_mode_safe_mode_arrays_have_different_lengths] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_2_SAFE_FUNCTION_CALL,V_1_4_SAFE_FUNCTION_CALL_WITH_LAMBDA_ARGS] +SELECT SAFE.ARRAY_ZIP([1] AS a, ['a', 'b'], mode => 'STRICT') +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# STRICT mode input arrays have the same lengths +[name=strict_mode_input_arrays_same_length] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP([1, 2] AS a, ['a', 'b'], mode => 'STRICT') +-- +ARRAY>>[ + {ARRAY>[known order:{1, "a"}, {2, "b"}]} +] +== + +# STRICT mode with empty input arrays +[name=strict_mode_has_empty_input_arrays] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP(['a'] AS a, [], mode => 'STRICT') +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# STRICT mode same-length arrays some arrays have collations +[name=strict_mode_same_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a', 'a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + mode => 'STRICT')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order:{true, true}, {true, true}] +== + +# STRICT mode same-length arrays same collations +[name=strict_mode_same_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B' +FROM + UNNEST( + ARRAY_ZIP( + 
[COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + mode => 'STRICT')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order:{true, true}, {true, true}] +== + +# STRICT mode same-length arrays different collations +[name=strict_mode_same_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + mode => 'STRICT')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order:{true, false}, {true, false}] +== + +# STRICT mode different-length arrays some arrays have collations +[name=strict_mode_different_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + mode => 'STRICT')) + WITH OFFSET +ORDER BY OFFSET +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +## ==== Start section: three input arrays ==== + +# Three arrays: The function call has null arrays so the return value is NULL +[name=three_arrays_null_array] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP(NULL AS a, NULL, [1, 2] AS b) +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# Three arrays: The function call has null arrays and explicit array_zip_mode +[name=three_arrays_null_array_mode] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP(NULL AS a, NULL, [1, 2] AS b, mode => 'PAD') +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# Three arrays: The function call has null arrays and the return value is NULL even with explicit STRICT mode +[name=three_arrays_null_array_strict_mode] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] 
+SELECT ARRAY_ZIP(NULL AS a, NULL, [1, 2] AS b, mode => 'STRICT') +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# Three arrays: The function call has array_zip_mode = NULL and the result should be NULL +[name=three_arrays_null_zip_mode] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP([1, 2] AS a, ['s', 't'], [TRUE] AS b, mode => NULL) +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# Three arrays: All arrays are empty array literals +[name=three_arrays_all_empty_array_literals] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP([] AS a, [], [] AS b) +-- +ARRAY>>[{ARRAY>[]}] +== + +# Three arrays: All arrays are empty arrays of different types +[name=three_arrays_all_empty_array_different_types] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP([] AS a, ARRAY[], ARRAY[] AS b) +-- +ARRAY>>[{ARRAY>[]}] +== + +# Three arrays: Unspecified array_zip_mode input arrays have different lengths +[name=three_arrays_unspecified_array_zip_mode_input_arrays_different_lengths] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP([1] AS a, ['a', 'b'], [TRUE, FALSE, TRUE] AS b) +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# Three arrays: Unspecified array_zip_mode SAFE mode arrays have different lengths +[name=three_arrays_unspecified_array_zip_mode_safe_mode_arrays_have_different_lengths] +[required_features=V_1_4_ARRAY_ZIP,V_1_2_SAFE_FUNCTION_CALL,V_1_4_SAFE_FUNCTION_CALL_WITH_LAMBDA_ARGS] +SELECT SAFE.ARRAY_ZIP([1] AS a, ['a', 'b'], [TRUE, FALSE, TRUE] AS b) +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# Three arrays: Unspecified array_zip_mode input arrays have the same lengths +[name=three_arrays_unspecified_array_zip_mode_input_arrays_same_length] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP([1, 2] AS a, ['a', 'b'], [TRUE, FALSE] AS b) +-- +ARRAY>>[ + {ARRAY>[known order: + {1, "a", true}, + {2, "b", false} + ]} +] +== + +# Three arrays: Unspecified array_zip_mode with empty input arrays 
+[name=three_arrays_unspecified_array_zip_mode_has_empty_input_arrays] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP([TRUE, FALSE] AS a, ['a'], [] AS b) +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# Three arrays: Unspecified array_zip_mode same-length arrays some arrays have collations +[name=three_arrays_unspecified_array_zip_mode_same_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a' +FROM + UNNEST( + ARRAY_ZIP( + ['a', 'a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + ['a', 'a'] AS c)) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true}, + {true, true, true} +] +== + +# Three arrays: Unspecified array_zip_mode same-length arrays same collations +[name=three_arrays_unspecified_array_zip_mode_same_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B', c = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c)) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true}, + {true, true, true} +] +== + +# Three arrays: Unspecified array_zip_mode same-length arrays different collations +[name=three_arrays_unspecified_array_zip_mode_same_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C', c = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c)) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, false, true}, + {true, false, true} +] +== + 
+# Three arrays: Unspecified array_zip_mode different-length arrays some arrays have collations +[name=three_arrays_unspecified_array_zip_mode_different_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a' +FROM + UNNEST( + ARRAY_ZIP( + ['a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + ['a', 'a', 'a'] AS c)) + WITH OFFSET +ORDER BY OFFSET +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# Three arrays: PAD mode input arrays have different lengths +[name=three_arrays_pad_mode_input_arrays_different_lengths] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP([1] AS a, ['a', 'b'], [TRUE, FALSE, TRUE] AS b, mode => 'PAD') +-- +ARRAY>>[ + {ARRAY>[known order: + {1, "a", true}, + {NULL, "b", false}, + {NULL, NULL, true} + ]} +] +== + +# Three arrays: PAD mode input arrays have the same lengths +[name=three_arrays_pad_mode_input_arrays_same_length] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP([1, 2] AS a, ['a', 'b'], [TRUE, FALSE] AS b, mode => 'PAD') +-- +ARRAY>>[ + {ARRAY>[known order: + {1, "a", true}, + {2, "b", false} + ]} +] +== + +# Three arrays: PAD mode with empty input arrays +[name=three_arrays_pad_mode_has_empty_input_arrays] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP([TRUE, FALSE] AS a, ['a'], [] AS b, mode => 'PAD') +-- +ARRAY>>[ + {ARRAY>[known order: + {true, "a", NULL}, + {false, NULL, NULL} + ]} +] +== + +# Three arrays: PAD mode same-length arrays some arrays have collations +[name=three_arrays_pad_mode_same_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a' +FROM + UNNEST( + ARRAY_ZIP( + ['a', 'a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + 
['a', 'a'] AS c, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true}, + {true, true, true} +] +== + +# Three arrays: PAD mode same-length arrays same collations +[name=three_arrays_pad_mode_same_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B', c = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true}, + {true, true, true} +] +== + +# Three arrays: PAD mode same-length arrays different collations +[name=three_arrays_pad_mode_same_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C', c = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, false, true}, + {true, false, true} +] +== + +# Three arrays: PAD mode different-length arrays some arrays have collations +[name=three_arrays_pad_mode_different_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a' +FROM + UNNEST( + ARRAY_ZIP( + ['a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + ['a', 'a', 'a'] AS c, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true}, + {NULL, true, true}, + {NULL, NULL, true} +] +== + +# Three arrays: PAD mode different-length arrays same collations 
+[name=three_arrays_pad_mode_different_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B', c = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] + AS c, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true}, + {NULL, true, true}, + {NULL, NULL, true} +] +== + +# Three arrays: PAD mode different-length arrays different collations +[name=three_arrays_pad_mode_different_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C', c = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] + AS c, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, false, true}, + {NULL, false, true}, + {NULL, NULL, true} +] +== + +# Three arrays: TRUNCATE mode input arrays have different lengths +[name=three_arrays_truncate_mode_input_arrays_different_lengths] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT + ARRAY_ZIP([1] AS a, ['a', 'b'], [TRUE, FALSE, TRUE] AS b, mode => 'TRUNCATE') +-- +ARRAY>>[ + {ARRAY>[{1, "a", true}]} +] +== + +# Three arrays: TRUNCATE mode input arrays have the same lengths +[name=three_arrays_truncate_mode_input_arrays_same_length] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT + ARRAY_ZIP([1, 2] AS a, ['a', 'b'], [TRUE, FALSE] AS b, mode => 'TRUNCATE') +-- +ARRAY>>[ + {ARRAY>[known order: + {1, "a", true}, + {2, "b", false} + ]} +] +== + +# Three arrays: TRUNCATE mode with empty input arrays +[name=three_arrays_truncate_mode_has_empty_input_arrays] 
+[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP([TRUE, FALSE] AS a, ['a'], [] AS b, mode => 'TRUNCATE') +-- +ARRAY>>[{ARRAY>[]}] +== + +# Three arrays: TRUNCATE mode same-length arrays some arrays have collations +[name=three_arrays_truncate_mode_same_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a' +FROM + UNNEST( + ARRAY_ZIP( + ['a', 'a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + ['a', 'a'] AS c, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true}, + {true, true, true} +] +== + +# Three arrays: TRUNCATE mode same-length arrays same collations +[name=three_arrays_truncate_mode_same_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B', c = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true}, + {true, true, true} +] +== + +# Three arrays: TRUNCATE mode same-length arrays different collations +[name=three_arrays_truncate_mode_same_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C', c = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, false, true}, + {true, false, true} +] +== + +# Three arrays: TRUNCATE mode different-length arrays some 
arrays have collations +[name=three_arrays_truncate_mode_different_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a' +FROM + UNNEST( + ARRAY_ZIP( + ['a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + ['a', 'a', 'a'] AS c, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[{true, true, true}] +== + +# Three arrays: TRUNCATE mode different-length arrays same collations +[name=three_arrays_truncate_mode_different_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B', c = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] + AS c, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[{true, true, true}] +== + +# Three arrays: TRUNCATE mode different-length arrays different collations +[name=three_arrays_truncate_mode_different_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C', c = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] + AS c, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[{true, false, true}] +== + +# Three arrays: STRICT mode input arrays have different lengths +[name=three_arrays_strict_mode_input_arrays_different_lengths] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT + ARRAY_ZIP([1] AS a, ['a', 'b'], [TRUE, FALSE, TRUE] AS b, mode => 'STRICT') +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + 
+# Three arrays: STRICT mode SAFE mode arrays have different lengths +[name=three_arrays_strict_mode_safe_mode_arrays_have_different_lengths] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_2_SAFE_FUNCTION_CALL,V_1_4_SAFE_FUNCTION_CALL_WITH_LAMBDA_ARGS] +SELECT + SAFE.ARRAY_ZIP( + [1] AS a, ['a', 'b'], [TRUE, FALSE, TRUE] AS b, mode => 'STRICT') +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# Three arrays: STRICT mode input arrays have the same lengths +[name=three_arrays_strict_mode_input_arrays_same_length] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP([1, 2] AS a, ['a', 'b'], [TRUE, FALSE] AS b, mode => 'STRICT') +-- +ARRAY>>[ + {ARRAY>[known order: + {1, "a", true}, + {2, "b", false} + ]} +] +== + +# Three arrays: STRICT mode with empty input arrays +[name=three_arrays_strict_mode_has_empty_input_arrays] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP([TRUE, FALSE] AS a, ['a'], [] AS b, mode => 'STRICT') +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# Three arrays: STRICT mode same-length arrays some arrays have collations +[name=three_arrays_strict_mode_same_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a' +FROM + UNNEST( + ARRAY_ZIP( + ['a', 'a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + ['a', 'a'] AS c, + mode => 'STRICT')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true}, + {true, true, true} +] +== + +# Three arrays: STRICT mode same-length arrays same collations +[name=three_arrays_strict_mode_same_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B', c = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), 
COLLATE('b', 'und:ci')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c, + mode => 'STRICT')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true}, + {true, true, true} +] +== + +# Three arrays: STRICT mode same-length arrays different collations +[name=three_arrays_strict_mode_same_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C', c = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c, + mode => 'STRICT')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, false, true}, + {true, false, true} +] +== + +# Three arrays: STRICT mode different-length arrays some arrays have collations +[name=three_arrays_strict_mode_different_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a' +FROM + UNNEST( + ARRAY_ZIP( + ['a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + ['a', 'a', 'a'] AS c, + mode => 'STRICT')) + WITH OFFSET +ORDER BY OFFSET +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +## ==== Start section: Four input arrays ==== + +# Four arrays: The function call has null arrays so the return value is NULL +[name=four_arrays_null_array] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP(NULL AS a, NULL, NULL AS b, [1, 2]) +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# Four arrays: The function call has null arrays and explicit array_zip_mode +[name=four_arrays_null_array_mode] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP(NULL AS a, NULL, NULL AS b, [1, 2], mode => 'PAD') +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# Four arrays: The function call has 
null arrays and the return value is NULL even with explicit STRICT mode +[name=four_arrays_null_array_strict_mode] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP(NULL AS a, NULL, NULL AS b, [1, 2], mode => 'STRICT') +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# Four arrays: The function call has array_zip_mode = NULL and the result should be NULL +[name=four_arrays_null_zip_mode] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP([1, 2] AS a, ['s', 't'], [TRUE] AS b, [b''], mode => NULL) +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# Four arrays: All arrays are empty array literals +[name=four_arrays_all_empty_array_literals] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP([] AS a, [], [] AS b, []) +-- +ARRAY>>[{ARRAY>[]}] +== + +# Four arrays: All arrays are empty arrays of different types +[name=four_arrays_all_empty_array_different_types] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP([] AS a, ARRAY[], ARRAY[] AS b, ARRAY[]) +-- +ARRAY>>[{ARRAY>[]}] +== + +# Four arrays: Unspecified array_zip_mode input arrays have different lengths +[name=four_arrays_unspecified_array_zip_mode_input_arrays_different_lengths] +[required_features=V_1_4_ARRAY_ZIP] +SELECT + ARRAY_ZIP( + [1] AS a, ['a', 'b'], [TRUE, FALSE, TRUE] AS b, [1.0, 2.0, 3.0, 4.0]) +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# Four arrays: Unspecified array_zip_mode SAFE mode arrays have different lengths +[name=four_arrays_unspecified_array_zip_mode_safe_mode_arrays_have_different_lengths] +[required_features=V_1_4_ARRAY_ZIP,V_1_2_SAFE_FUNCTION_CALL,V_1_4_SAFE_FUNCTION_CALL_WITH_LAMBDA_ARGS] +SELECT + SAFE.ARRAY_ZIP( + [1] AS a, ['a', 'b'], [TRUE, FALSE, TRUE] AS b, [1.0, 2.0, 3.0, 4.0]) +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# Four arrays: Unspecified array_zip_mode input arrays have the same lengths +[name=four_arrays_unspecified_array_zip_mode_input_arrays_same_length] +[required_features=V_1_4_ARRAY_ZIP] +SELECT 
ARRAY_ZIP([1, 2] AS a, ['a', 'b'], [TRUE, FALSE] AS b, [1.0, 2.0]) +-- +ARRAY>>[ + {ARRAY>[known order: + {1, "a", true, 1}, + {2, "b", false, 2} + ]} +] +== + +# Four arrays: Unspecified array_zip_mode with empty input arrays +[name=four_arrays_unspecified_array_zip_mode_has_empty_input_arrays] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP([1.0, 2.0, 3.0] AS a, [TRUE, FALSE], ['a'] AS b, []) +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# Four arrays: Unspecified array_zip_mode same-length arrays some arrays have collations +[name=four_arrays_unspecified_array_zip_mode_same_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a', d = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a', 'a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + ['a', 'a'] AS c, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS d)) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true, true}, + {true, true, true, true} +] +== + +# Four arrays: Unspecified array_zip_mode same-length arrays same collations +[name=four_arrays_unspecified_array_zip_mode_same_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B', c = 'B', d = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS d)) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true, true}, + {true, true, true, true} +] +== + +# Four arrays: Unspecified array_zip_mode same-length arrays different collations +[name=four_arrays_unspecified_array_zip_mode_same_length_arrays_different_collations] 
+[required_features=V_1_4_ARRAY_ZIP,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C', c = 'B', d = 'C' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS d)) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, false, true, false}, + {true, false, true, false} +] +== + +# Four arrays: Unspecified array_zip_mode different-length arrays some arrays have collations +[name=four_arrays_unspecified_array_zip_mode_different_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a', d = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + ['a', 'a', 'a'] AS c, + [ + COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), + COLLATE('b', 'und:ci')] + AS d)) + WITH OFFSET +ORDER BY OFFSET +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# Four arrays: PAD mode input arrays have different lengths +[name=four_arrays_pad_mode_input_arrays_different_lengths] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT + ARRAY_ZIP( + [1] AS a, + ['a', 'b'], + [TRUE, FALSE, TRUE] AS b, + [1.0, 2.0, 3.0, 4.0], + mode => 'PAD') +-- +ARRAY>>[ + {ARRAY>[known order: + {1, "a", true, 1}, + {NULL, "b", false, 2}, + {NULL, NULL, true, 3}, + {NULL, NULL, NULL, 4} + ]} +] +== + +# Four arrays: PAD mode input arrays have the same lengths +[name=four_arrays_pad_mode_input_arrays_same_length] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT + ARRAY_ZIP( + [1, 2] AS a, ['a', 'b'], [TRUE, FALSE] AS b, [1.0, 2.0], mode => 'PAD') +-- +ARRAY>>[ + {ARRAY>[known order: + {1, "a", true, 1}, + {2, "b", false, 2} + ]} +] +== + +# 
Four arrays: PAD mode with empty input arrays +[name=four_arrays_pad_mode_has_empty_input_arrays] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT + ARRAY_ZIP([1.0, 2.0, 3.0] AS a, [TRUE, FALSE], ['a'] AS b, [], mode => 'PAD') +-- +ARRAY>>[ + {ARRAY>[known order: + {1, true, "a", NULL}, + {2, false, NULL, NULL}, + {3, NULL, NULL, NULL} + ]} +] +== + +# Four arrays: PAD mode same-length arrays some arrays have collations +[name=four_arrays_pad_mode_same_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a', d = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a', 'a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + ['a', 'a'] AS c, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS d, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true, true}, + {true, true, true, true} +] +== + +# Four arrays: PAD mode same-length arrays same collations +[name=four_arrays_pad_mode_same_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B', c = 'B', d = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS d, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true, true}, + {true, true, true, true} +] +== + +# Four arrays: PAD mode same-length arrays different collations +[name=four_arrays_pad_mode_same_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C', c = 'B', d = 'C' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 
'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS d, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, false, true, false}, + {true, false, true, false} +] +== + +# Four arrays: PAD mode different-length arrays some arrays have collations +[name=four_arrays_pad_mode_different_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a', d = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + ['a', 'a', 'a'] AS c, + [ + COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), + COLLATE('b', 'und:ci')] + AS d, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true, true}, + {NULL, true, true, true}, + {NULL, NULL, true, true}, + {NULL, NULL, NULL, true} +] +== + +# Four arrays: PAD mode different-length arrays same collations +[name=four_arrays_pad_mode_different_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B', c = 'B', d = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] + AS c, + [ + COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), + COLLATE('b', 'und:ci')] + AS d, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true, true}, + {NULL, true, true, true}, + {NULL, NULL, true, true}, + {NULL, NULL, NULL, true} +] +== + +# Four arrays: PAD mode different-length arrays different collations 
+[name=four_arrays_pad_mode_different_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C', c = 'B', d = 'C' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] + AS c, + [ + COLLATE('c', 'binary'), COLLATE('c', 'binary'), COLLATE('c', 'binary'), + COLLATE('c', 'binary')] + AS d, + mode => 'PAD')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, false, true, false}, + {NULL, false, true, false}, + {NULL, NULL, true, false}, + {NULL, NULL, NULL, false} +] +== + +# Four arrays: TRUNCATE mode input arrays have different lengths +[name=four_arrays_truncate_mode_input_arrays_different_lengths] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT + ARRAY_ZIP( + [1] AS a, + ['a', 'b'], + [TRUE, FALSE, TRUE] AS b, + [1.0, 2.0, 3.0, 4.0], + mode => 'TRUNCATE') +-- +ARRAY>>[ + {ARRAY>[{1, "a", true, 1}]} +] +== + +# Four arrays: TRUNCATE mode input arrays have the same lengths +[name=four_arrays_truncate_mode_input_arrays_same_length] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT + ARRAY_ZIP( + [1, 2] AS a, ['a', 'b'], [TRUE, FALSE] AS b, [1.0, 2.0], mode => 'TRUNCATE') +-- +ARRAY>>[ + {ARRAY>[known order: + {1, "a", true, 1}, + {2, "b", false, 2} + ]} +] +== + +# Four arrays: TRUNCATE mode with empty input arrays +[name=four_arrays_truncate_mode_has_empty_input_arrays] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT + ARRAY_ZIP( + [1.0, 2.0, 3.0] AS a, [TRUE, FALSE], ['a'] AS b, [], mode => 'TRUNCATE') +-- +ARRAY>>[{ARRAY>[]}] +== + +# Four arrays: TRUNCATE mode same-length arrays some arrays have collations +[name=four_arrays_truncate_mode_same_length_arrays_some_input_arrays_have_collations] 
+[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a', d = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a', 'a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + ['a', 'a'] AS c, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS d, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true, true}, + {true, true, true, true} +] +== + +# Four arrays: TRUNCATE mode same-length arrays same collations +[name=four_arrays_truncate_mode_same_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B', c = 'B', d = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS d, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true, true}, + {true, true, true, true} +] +== + +# Four arrays: TRUNCATE mode same-length arrays different collations +[name=four_arrays_truncate_mode_same_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C', c = 'B', d = 'C' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS d, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, false, true, false}, + {true, false, true, false} +] +== + +# Four arrays: TRUNCATE mode different-length arrays some arrays have collations 
+[name=four_arrays_truncate_mode_different_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a', d = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + ['a', 'a', 'a'] AS c, + [ + COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), + COLLATE('b', 'und:ci')] + AS d, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[{true, true, true, true}] +== + +# Four arrays: TRUNCATE mode different-length arrays same collations +[name=four_arrays_truncate_mode_different_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B', c = 'B', d = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] + AS c, + [ + COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), + COLLATE('b', 'und:ci')] + AS d, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[{true, true, true, true}] +== + +# Four arrays: TRUNCATE mode different-length arrays different collations +[name=four_arrays_truncate_mode_different_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C', c = 'B', d = 'C' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] + AS c, + [ + COLLATE('c', 'binary'), COLLATE('c', 'binary'), COLLATE('c', 'binary'), + COLLATE('c', 'binary')] + AS d, + mode => 'TRUNCATE')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[{true, false, true, false}] +== + +# Four 
arrays: STRICT mode input arrays have different lengths +[name=four_arrays_strict_mode_input_arrays_different_lengths] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT + ARRAY_ZIP( + [1] AS a, + ['a', 'b'], + [TRUE, FALSE, TRUE] AS b, + [1.0, 2.0, 3.0, 4.0], + mode => 'STRICT') +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# Four arrays: STRICT mode SAFE mode arrays have different lengths +[name=four_arrays_strict_mode_safe_mode_arrays_have_different_lengths] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_2_SAFE_FUNCTION_CALL,V_1_4_SAFE_FUNCTION_CALL_WITH_LAMBDA_ARGS] +SELECT + SAFE.ARRAY_ZIP( + [1] AS a, + ['a', 'b'], + [TRUE, FALSE, TRUE] AS b, + [1.0, 2.0, 3.0, 4.0], + mode => 'STRICT') +-- +ARRAY>>[{ARRAY>(NULL)}] +== + +# Four arrays: STRICT mode input arrays have the same lengths +[name=four_arrays_strict_mode_input_arrays_same_length] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT + ARRAY_ZIP( + [1, 2] AS a, ['a', 'b'], [TRUE, FALSE] AS b, [1.0, 2.0], mode => 'STRICT') +-- +ARRAY>>[ + {ARRAY>[known order: + {1, "a", true, 1}, + {2, "b", false, 2} + ]} +] +== + +# Four arrays: STRICT mode with empty input arrays +[name=four_arrays_strict_mode_has_empty_input_arrays] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT + ARRAY_ZIP( + [1.0, 2.0, 3.0] AS a, [TRUE, FALSE], ['a'] AS b, [], mode => 'STRICT') +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# Four arrays: STRICT mode same-length arrays some arrays have collations +[name=four_arrays_strict_mode_same_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a', d = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a', 'a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + ['a', 'a'] AS c, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS 
d, + mode => 'STRICT')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true, true}, + {true, true, true, true} +] +== + +# Four arrays: STRICT mode same-length arrays same collations +[name=four_arrays_strict_mode_same_length_arrays_same_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'B', c = 'B', d = 'B' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS d, + mode => 'STRICT')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, true, true, true}, + {true, true, true, true} +] +== + +# Four arrays: STRICT mode same-length arrays different collations +[name=four_arrays_strict_mode_same_length_arrays_different_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'B', b = 'C', c = 'B', d = 'C' +FROM + UNNEST( + ARRAY_ZIP( + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS a, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS b, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS c, + [COLLATE('c', 'binary'), COLLATE('c', 'binary')] AS d, + mode => 'STRICT')) + WITH OFFSET +ORDER BY OFFSET +-- +ARRAY>[known order: + {true, false, true, false}, + {true, false, true, false} +] +== + +# Four arrays: STRICT mode different-length arrays some arrays have collations +[name=four_arrays_strict_mode_different_length_arrays_some_input_arrays_have_collations] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT] +SELECT a = 'a', b = 'B', c = 'a', d = 'B' +FROM + UNNEST( + ARRAY_ZIP( + ['a'] AS a, + [COLLATE('b', 'und:ci'), COLLATE('b', 'und:ci')] AS b, + ['a', 'a', 'a'] AS c, + [ + COLLATE('b', 'und:ci'), COLLATE('b', 
'und:ci'), COLLATE('b', 'und:ci'), + COLLATE('b', 'und:ci')] + AS d, + mode => 'STRICT')) + WITH OFFSET +ORDER BY OFFSET +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +## ==== Start section: Two-Array Lambda Test Cases ==== +# The following test cases are not covered due to known issues with lambda: +# - Empty or NULL arrays: the resolver cannot infer the type for the lambda +# arguments corresponding to them. See b/308192630. +# - Lambda with collations: lambda does not propagate collations correctly. See +# b/258733832. + +# Unspecified array_zip_mode input arrays have different lengths lambda STRUCT return type +[name=unspecified_array_zip_mode_input_arrays_different_lengths_lambda_struct_return_type] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT ARRAY_ZIP([1], ['a', 'b'], transformation => (e1, e2) -> STRUCT(e1, e2)) +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# Unspecified array_zip_mode SAFE mode arrays have different lengths lambda constant return value +[name=unspecified_array_zip_mode_safe_mode_arrays_have_different_lengths_lambda_constant_return_value] +[required_features=V_1_4_ARRAY_ZIP,V_1_2_SAFE_FUNCTION_CALL,V_1_4_SAFE_FUNCTION_CALL_WITH_LAMBDA_ARGS,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT SAFE.ARRAY_ZIP([1], ['a', 'b'], transformation => (e1, e2) -> 1) +-- +ARRAY>>[{ARRAY(NULL)}] +== + +# Unspecified array_zip_mode input arrays have the same lengths lambda complex expression +[name=unspecified_array_zip_mode_input_arrays_same_length_lambda_complex_expression] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1, 2], ['a', 'b'], transformation => (e1, e2) -> e1 > 2 OR e2 != 'a') +-- +ARRAY>>[{ARRAY[known order:false, true]}] +== + +# PAD mode input arrays have different lengths lambda STRUCT return type 
+[name=pad_mode_input_arrays_different_lengths_lambda_struct_return_type] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1], + ['a', 'b'], + mode => 'PAD', + transformation => (e1, e2) -> STRUCT(e1, e2)) +-- +ARRAY>>[ + {ARRAY>[known order:{1, "a"}, {NULL, "b"}]} +] +== + +# PAD mode input arrays have the same lengths lambda constant return value +[name=pad_mode_input_arrays_same_length_lambda_constant_return_value] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP([1, 2], ['a', 'b'], mode => 'PAD', transformation => (e1, e2) -> 1) +-- +ARRAY>>[{ARRAY[known order:1, 1]}] +== + +# TRUNCATE mode input arrays have different lengths lambda complex expression +[name=truncate_mode_input_arrays_different_lengths_lambda_complex_expression] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1], + ['a', 'b'], + mode => 'TRUNCATE', + transformation => (e1, e2) -> e1 > 2 OR e2 != 'a') +-- +ARRAY>>[{ARRAY[false]}] +== + +# TRUNCATE mode input arrays have the same lengths lambda STRUCT return type +[name=truncate_mode_input_arrays_same_length_lambda_struct_return_type] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1, 2], + ['a', 'b'], + mode => 'TRUNCATE', + transformation => (e1, e2) -> STRUCT(e1, e2)) +-- +ARRAY>>[ + {ARRAY>[known order:{1, "a"}, {2, "b"}]} +] +== + +# STRICT mode input arrays have different lengths lambda constant return value +[name=strict_mode_input_arrays_different_lengths_lambda_constant_return_value] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP([1], ['a', 'b'], mode => 'STRICT', transformation => (e1, e2) -> 1) +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# STRICT mode SAFE mode arrays have different lengths lambda complex 
expression +[name=strict_mode_safe_mode_arrays_have_different_lengths_lambda_complex_expression] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_2_SAFE_FUNCTION_CALL,V_1_4_SAFE_FUNCTION_CALL_WITH_LAMBDA_ARGS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + SAFE.ARRAY_ZIP( + [1], + ['a', 'b'], + mode => 'STRICT', + transformation => (e1, e2) -> e1 > 2 OR e2 != 'a') +-- +ARRAY>>[{ARRAY(NULL)}] +== + +# STRICT mode input arrays have the same lengths lambda STRUCT return type +[name=strict_mode_input_arrays_same_length_lambda_struct_return_type] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1, 2], + ['a', 'b'], + mode => 'STRICT', + transformation => (e1, e2) -> STRUCT(e1, e2)) +-- +ARRAY>>[ + {ARRAY>[known order:{1, "a"}, {2, "b"}]} +] +== + +## ==== Start section: three-input-array with lambda ==== + +# Three arrays: Unspecified array_zip_mode input arrays have different lengths lambda STRUCT return type +[name=three_arrays_unspecified_array_zip_mode_input_arrays_different_lengths_lambda_struct_return_type] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1], + ['a', 'b'], + [TRUE, FALSE, TRUE], + transformation => (e1, e2, e3) -> STRUCT(e1 AS field1, e2, e3)) +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# Three arrays: Unspecified array_zip_mode SAFE mode arrays have different lengths lambda constant return value +[name=three_arrays_unspecified_array_zip_mode_safe_mode_arrays_have_different_lengths_lambda_constant_return_value] +[required_features=V_1_4_ARRAY_ZIP,V_1_2_SAFE_FUNCTION_CALL,V_1_4_SAFE_FUNCTION_CALL_WITH_LAMBDA_ARGS,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + SAFE.ARRAY_ZIP( + [1], ['a', 'b'], [TRUE, FALSE, TRUE], transformation => (e1, e2, e3) -> 1) +-- +ARRAY>>[{ARRAY(NULL)}] +== + +# Three arrays: Unspecified array_zip_mode input arrays have the same lengths lambda complex 
expression +[name=three_arrays_unspecified_array_zip_mode_input_arrays_same_length_lambda_complex_expression] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1, 2], + ['a', 'b'], + [TRUE, FALSE], + transformation => (e1, e2, e3) -> e1 > 2 OR e2 != 'a' OR e3) +-- +ARRAY>>[{ARRAY[known order:true, true]}] +== + +# Three arrays: PAD mode input arrays have different lengths lambda STRUCT return type +[name=three_arrays_pad_mode_input_arrays_different_lengths_lambda_struct_return_type] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1], + ['a', 'b'], + [TRUE, FALSE, TRUE], + mode => 'PAD', + transformation => (e1, e2, e3) -> STRUCT(e1 AS field1, e2, e3)) +-- +ARRAY>>[ + {ARRAY>[known order: + {1, "a", true}, + {NULL, "b", false}, + {NULL, NULL, true} + ]} +] +== + +# Three arrays: PAD mode input arrays have the same lengths lambda constant return value +[name=three_arrays_pad_mode_input_arrays_same_length_lambda_constant_return_value] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1, 2], + ['a', 'b'], + [TRUE, FALSE], + mode => 'PAD', + transformation => (e1, e2, e3) -> 1) +-- +ARRAY>>[{ARRAY[known order:1, 1]}] +== + +# Three arrays: TRUNCATE mode input arrays have different lengths lambda complex expression +[name=three_arrays_truncate_mode_input_arrays_different_lengths_lambda_complex_expression] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1], + ['a', 'b'], + [TRUE, FALSE, TRUE], + mode => 'TRUNCATE', + transformation => (e1, e2, e3) -> e1 > 2 OR e2 != 'a' OR e3) +-- +ARRAY>>[{ARRAY[true]}] +== + +# Three arrays: TRUNCATE mode input arrays have the same lengths lambda STRUCT return type +[name=three_arrays_truncate_mode_input_arrays_same_length_lambda_struct_return_type] 
+[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1, 2], + ['a', 'b'], + [TRUE, FALSE], + mode => 'TRUNCATE', + transformation => (e1, e2, e3) -> STRUCT(e1 AS field1, e2, e3)) +-- +ARRAY>>[ + {ARRAY>[known order: + {1, "a", true}, + {2, "b", false} + ]} +] +== + +# Three arrays: STRICT mode input arrays have different lengths lambda constant return value +[name=three_arrays_strict_mode_input_arrays_different_lengths_lambda_constant_return_value] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1], + ['a', 'b'], + [TRUE, FALSE, TRUE], + transformation => (e1, e2, e3) -> 1, + mode => 'STRICT') +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# Three arrays: STRICT mode SAFE mode arrays have different lengths lambda complex expression +[name=three_arrays_strict_mode_safe_mode_arrays_have_different_lengths_lambda_complex_expression] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_2_SAFE_FUNCTION_CALL,V_1_4_SAFE_FUNCTION_CALL_WITH_LAMBDA_ARGS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + SAFE.ARRAY_ZIP( + [1], + ['a', 'b'], + [TRUE, FALSE, TRUE], + (e1, e2, e3) -> e1 > 2 OR e2 != 'a' OR e3, + mode => 'STRICT') +-- +ARRAY>>[{ARRAY(NULL)}] +== + +# Three arrays: STRICT mode input arrays have the same lengths lambda STRUCT return type +[name=three_arrays_strict_mode_input_arrays_same_length_lambda_struct_return_type] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1, 2], + ['a', 'b'], + [TRUE, FALSE], + (e1, e2, e3) -> STRUCT(e1 AS field1, e2, e3), + mode => 'STRICT') +-- +ARRAY>>[ + {ARRAY>[known order: + {1, "a", true}, + {2, "b", false} + ]} +] +== + +## ==== Start section: four-input-array with lambda ==== + +# Four arrays: Unspecified array_zip_mode input arrays have different lengths lambda STRUCT return type 
+[name=four_arrays_unspecified_array_zip_mode_input_arrays_different_lengths_lambda_struct_return_type] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1], + ['a', 'b'], + [TRUE, FALSE, TRUE], + [1.0, 2.0, 3.0, 4.0], + transformation => (e1, e2, e3, e4) -> STRUCT(e1 AS field1, e2, e3, e4 AS field4)) +-- +ERROR: generic::out_of_range: Unequal array length in ARRAY_ZIP using STRICT mode +== + +# Four arrays: Unspecified array_zip_mode SAFE mode arrays have different lengths lambda constant return value +[name=four_arrays_unspecified_array_zip_mode_safe_mode_arrays_have_different_lengths_lambda_constant_return_value] +[required_features=V_1_4_ARRAY_ZIP,V_1_2_SAFE_FUNCTION_CALL,V_1_4_SAFE_FUNCTION_CALL_WITH_LAMBDA_ARGS,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + SAFE.ARRAY_ZIP( + [1], + ['a', 'b'], + [TRUE, FALSE, TRUE], + [1.0, 2.0, 3.0, 4.0], + transformation => (e1, e2, e3, e4) -> 1) +-- +ARRAY>>[{ARRAY(NULL)}] +== + +# Four arrays: Unspecified array_zip_mode input arrays have the same lengths lambda complex expression +[name=four_arrays_unspecified_array_zip_mode_input_arrays_same_length_lambda_complex_expression] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1, 2], + ['a', 'b'], + [TRUE, FALSE], + [1.0, 2.0], + transformation => (e1, e2, e3, e4) -> e1 > 2 OR e2 != 'a' OR e3 OR e4 <= 2) +-- +ARRAY>>[{ARRAY[known order:true, true]}] +== + +# Four arrays: PAD mode input arrays have different lengths lambda STRUCT return type +[name=four_arrays_pad_mode_input_arrays_different_lengths_lambda_struct_return_type] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1], + ['a', 'b'], + [TRUE, FALSE, TRUE], + [1.0, 2.0, 3.0, 4.0], + transformation => (e1, e2, e3, e4) -> STRUCT(e1, e2 AS field2, e3 AS field3, e4), + mode => 'PAD') +-- +ARRAY>>[ + { + ARRAY>[known order: + {1, "a", true, 1}, + 
{NULL, "b", false, 2}, + {NULL, NULL, true, 3}, + {NULL, NULL, NULL, 4} + ] + } +] +== + +# Four arrays: PAD mode input arrays have the same lengths lambda constant return value +[name=four_arrays_pad_mode_input_arrays_same_length_lambda_constant_return_value] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1, 2], + ['a', 'b'], + [TRUE, FALSE], + [1.0, 2.0], + (e1, e2, e3, e4) -> 1, + mode => 'PAD') +-- +ARRAY>>[{ARRAY[known order:1, 1]}] +== + +# Four arrays: TRUNCATE mode input arrays have different lengths lambda complex expression +[name=four_arrays_truncate_mode_input_arrays_different_lengths_lambda_complex_expression] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1], + ['a', 'b'], + [TRUE, FALSE, TRUE], + [1.0, 2.0, 3.0, 4.0], + transformation => (e1, e2, e3, e4) -> e1 > 2 OR e2 != 'a' OR e3 OR e4 <= 2, + mode => 'TRUNCATE') +-- +ARRAY>>[{ARRAY[true]}] +== + +# Four arrays: TRUNCATE mode input arrays have the same lengths lambda STRUCT return type +[name=four_arrays_truncate_mode_input_arrays_same_length_lambda_struct_return_type] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1, 2], + ['a', 'b'], + [TRUE, FALSE], + [1.0, 2.0], + mode => 'TRUNCATE', + transformation => (e1, e2, e3, e4) -> STRUCT(e1 AS field1, e2 AS field2, e3, e4)) +-- +ARRAY>>[ + { + ARRAY>[known order: + {1, "a", true, 1}, + {2, "b", false, 2} + ] + } +] +== + +# Four arrays: STRICT mode input arrays have different lengths lambda constant return value +[name=four_arrays_strict_mode_input_arrays_different_lengths_lambda_constant_return_value] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1], + ['a', 'b'], + [TRUE, FALSE, TRUE], + [1.0, 2.0, 3.0, 4.0], + mode => 'STRICT', + transformation => (e1, e2, e3, e4) -> 1) +-- +ERROR: generic::out_of_range: Unequal 
array length in ARRAY_ZIP using STRICT mode +== + +# Four arrays: STRICT mode SAFE mode arrays have different lengths lambda complex expression +[name=four_arrays_strict_mode_safe_mode_arrays_have_different_lengths_lambda_complex_expression] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_2_SAFE_FUNCTION_CALL,V_1_4_SAFE_FUNCTION_CALL_WITH_LAMBDA_ARGS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + SAFE.ARRAY_ZIP( + [1], + ['a', 'b'], + [TRUE, FALSE, TRUE], + [1.0, 2.0, 3.0, 4.0], + mode => 'STRICT', + transformation => (e1, e2, e3, e4) -> e1 > 2 OR e2 != 'a' OR e3 OR e4 <= 2) +-- +ARRAY>>[{ARRAY(NULL)}] +== + +# Four arrays: STRICT mode input arrays have the same lengths lambda STRUCT return type +[name=four_arrays_strict_mode_input_arrays_same_length_lambda_struct_return_type] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT + ARRAY_ZIP( + [1, 2], + ['a', 'b'], + [TRUE, FALSE], + [1.0, 2.0], + mode => 'STRICT', + transformation => (e1, e2, e3, e4) -> STRUCT(e1, e2 AS field2, e3, e4 AS field4)) +-- +ARRAY>>[ + { + ARRAY>[known order: + {1, "a", true, 1}, + {2, "b", false, 2} + ] + } +] +== + +# One input array does not have a defined order, so the result is +# non-deterministic. +[name=array_zip_non_determinism_is_correctly_marked] +[required_features=V_1_4_ARRAY_ZIP] +SELECT ARRAY_ZIP(ARRAY(SELECT * FROM UNNEST([1, 2])), [2, 3]) +-- +ARRAY>>[ + {ARRAY>[known order:{1, 2}, {2, 3}]} +] + +NOTE: Reference implementation reports non-determinism. +== + +# Lambda signature: one input array does not have a defined order, and the +# result is marked non-deterministic even though the result is actually +# deterministic due to the lambda body. 
+[name=array_zip_non_determinism_is_marked_for_lambda] +[required_features=V_1_4_ARRAY_ZIP,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT ARRAY_ZIP(ARRAY(SELECT * FROM UNNEST([1, 2])), [2, 3], (e1, e2) -> 1) +-- +ARRAY>>[{ARRAY[known order:1, 1]}] + +NOTE: Reference implementation reports non-determinism. +== + +# The array with kIgnoresOrder only has one element, so the result is +# deterministic. +[name=array_zip_ignore_order_array_only_has_one_element] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP(ARRAY(SELECT * FROM UNNEST([1])), [2, 3], [4], mode => 'TRUNCATE') +-- +ARRAY>>[{ARRAY>[{1, 2, 4}]}] +== + +# Lambda signature: The array with kIgnoresOrder only has one element, so the +# result is deterministic. +[name=array_zip_ignore_order_array_only_has_one_element_lambda] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT ARRAY_ZIP( + ARRAY(SELECT * FROM UNNEST([1])), + [2, 3], + [4], + (e1, e2, e3) -> e1 + e2 + e3, + mode => 'TRUNCATE') +-- +ARRAY>>[{ARRAY[7]}] +== + +# All elements of the array with kIgnoresOrder are equal, so the result is still +# deterministic. +[name=array_zip_ignore_order_array_all_elements_are_the_same] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS] +SELECT ARRAY_ZIP( + [2, 3], + ARRAY(SELECT * FROM UNNEST(['S', 'S', 'S'])), + [4], + [TRUE, FALSE, TRUE, FALSE], + mode => 'PAD') +-- +ARRAY>>[ + {ARRAY>[known order: + {2, "S", 4, true}, + {3, "S", NULL, false}, + {NULL, "S", NULL, true}, + {NULL, NULL, NULL, false} + ]} +] +== + +# Lambda signature: All elements of the array with kIgnoresOrder are equal, so +# the result is still deterministic.
+[name=array_zip_ignore_order_array_all_elements_are_the_same_lambda] +[required_features=V_1_4_ARRAY_ZIP,NAMED_ARGUMENTS,V_1_3_INLINE_LAMBDA_ARGUMENT] +SELECT ARRAY_ZIP( + [2, 3], + ARRAY(SELECT * FROM UNNEST(['S', 'S', 'S'])), + [4], + [TRUE, FALSE, TRUE, FALSE], + mode => 'PAD', + transformation => (e1, e2, e3, e4) -> STRUCT(e4, e3, e2, e1)) +-- +ARRAY>>[ + {ARRAY>[known order: + {true, 4, "S", 2}, + {false, NULL, "S", 3}, + {true, NULL, "S", NULL}, + {false, NULL, NULL, NULL} + ]} +] diff --git a/zetasql/compliance/testdata/differential_privacy.test b/zetasql/compliance/testdata/differential_privacy.test index d6515af6f..0f03427c0 100644 --- a/zetasql/compliance/testdata/differential_privacy.test +++ b/zetasql/compliance/testdata/differential_privacy.test @@ -1690,7 +1690,7 @@ ARRAY>[{11}] # Public groups without group by [name=differential_privacy_public_groups_without_group_by] -[required_features=DIFFERENTIAL_PRIVACY,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,NAMED_ARGUMENTS] +[required_features=DIFFERENTIAL_PRIVACY,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,NAMED_ARGUMENTS,V_1_1_WITH_ON_SUBQUERY] [labels=differential_privacy] SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS(epsilon=1e20, delta=1.0, group_selection_strategy=PUBLIC_GROUPS) COUNT(int64_val) @@ -1701,7 +1701,7 @@ ARRAY>[{8}] # Public groups with group by [name=differential_privacy_public_groups_with_group_by] -[required_features=DIFFERENTIAL_PRIVACY,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,NAMED_ARGUMENTS] +[required_features=DIFFERENTIAL_PRIVACY,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,NAMED_ARGUMENTS,V_1_1_WITH_ON_SUBQUERY] [labels=differential_privacy] SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS(epsilon=1e20, delta=1.0, group_selection_strategy=PUBLIC_GROUPS) int64_val, COUNT(*, contribution_bounds_per_group => (0, 1)) @@ -1722,7 +1722,7 @@ ARRAY>[unknown order: # Public groups with group by [name=differential_privacy_public_groups_with_group_by_and_max_groups_contributed]
-[required_features=DIFFERENTIAL_PRIVACY,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,NAMED_ARGUMENTS] +[required_features=DIFFERENTIAL_PRIVACY,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,NAMED_ARGUMENTS,V_1_1_WITH_ON_SUBQUERY] [labels=anonymization] SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS(epsilon=1e20, delta=1.0, group_selection_strategy=PUBLIC_GROUPS, max_groups_contributed=3) int64_val, COUNT(*, contribution_bounds_per_group => (0, 1)) @@ -1826,7 +1826,7 @@ ARRAY>[] # Engines should report an error if min_privacy_units_per_group is set # simultaneously with the PUBLIC_GROUPS groups selection strategy. [name=differential_privacy_public_groups_and_min_privacy_units_per_group_set] -[required_features=DIFFERENTIAL_PRIVACY,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,DIFFERENTIAL_PRIVACY_MIN_PRIVACY_UNITS_PER_GROUP,NAMED_ARGUMENTS] +[required_features=DIFFERENTIAL_PRIVACY,DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS,DIFFERENTIAL_PRIVACY_MIN_PRIVACY_UNITS_PER_GROUP,NAMED_ARGUMENTS,V_1_1_WITH_ON_SUBQUERY] [labels=differential_privacy] SELECT WITH DIFFERENTIAL_PRIVACY OPTIONS(epsilon=1e20, delta = 1.0, min_privacy_units_per_group=2, group_selection_strategy = PUBLIC_GROUPS, max_groups_contributed=NULL) SUM(double_val, contribution_bounds_per_group => (0.0, 2.0)) diff --git a/zetasql/compliance/testdata/group_by_all.test b/zetasql/compliance/testdata/group_by_all.test index 3b8df5067..4b46db57b 100644 --- a/zetasql/compliance/testdata/group_by_all.test +++ b/zetasql/compliance/testdata/group_by_all.test @@ -561,3 +561,50 @@ ARRAY, ] == +# Regression test for b/310705631 +[required_features=V_1_2_GROUP_BY_ARRAY,V_1_4_GROUP_BY_ALL] +[name=group_by_all_implicit_unnest_table_array_path_regression_test] +WITH Table5 AS (SELECT [1, 2, 3] AS arr_col) +SELECT + ( SELECT e FROM t5.arr_col AS e ORDER BY e LIMIT 1 ) AS scalar_subq_col, + t5.arr_col +FROM Table5 AS t5 +GROUP BY ALL +-- +ARRAY + >> +[{ + 1, + ARRAY[known order:1, 2, 3] + }] +== + +# Regression test for b/323439034 
+[name=group_by_all_no_agg_no_grouping_keys_regression_test] +SELECT 'a' AS x +FROM ( + SELECT 1 AS id + UNION ALL + SELECT 2 AS id +) +GROUP BY ALL +-- +ARRAY>[{"a"}] +== + +# Regression test for b/323567303 +[name=group_by_all_no_agg_no_grouping_keys_having_regression_test] +SELECT 'a' AS x +FROM ( + SELECT 1 AS id + UNION ALL + SELECT 2 AS id +) +GROUP BY ALL +HAVING TRUE +-- +ARRAY>[{"a"}] +== + diff --git a/zetasql/compliance/testdata/groupby_queries_2.test b/zetasql/compliance/testdata/groupby_queries_2.test index 1c2f4ce92..a044d672e 100644 --- a/zetasql/compliance/testdata/groupby_queries_2.test +++ b/zetasql/compliance/testdata/groupby_queries_2.test @@ -2475,3 +2475,38 @@ ARRAY>[{0, 0}] SELECT COUNT(primary_key), SUM(distinct_1) FROM TableDistincts GROUP BY (); -- ARRAY>[{16, 16}] +== + +[name=group_by_empty_columns_select_literals] +[required_features=V_1_4_GROUPING_SETS] +SELECT 1, "a" FROM TableDistincts GROUP BY (); +-- +ARRAY>[{1, "a"}] +== +[name=group_by_empty_columns_as_scalar_subquery] +[required_features=V_1_4_GROUPING_SETS] +SELECT + primary_key, + (SELECT "a" FROM TableDistincts GROUP BY ()) AS scalar_subquery +FROM TableDistincts +-- +ARRAY> +[unknown order:{2, "a"}, + {4, "a"}, + {6, "a"}, + {8, "a"}, + {10, "a"}, + {12, "a"}, + {14, "a"}, + {16, "a"}, + {1, "a"}, + {3, "a"}, + {5, "a"}, + {7, "a"}, + {9, "a"}, + {11, "a"}, + {13, "a"}, + {15, "a"}] diff --git a/zetasql/compliance/testdata/grouping_sets_queries.test b/zetasql/compliance/testdata/grouping_sets_queries.test new file mode 100644 index 000000000..65180847f --- /dev/null +++ b/zetasql/compliance/testdata/grouping_sets_queries.test @@ -0,0 +1,2713 @@ +[prepare_database] +CREATE TABLE simple_table +( + key INT64, + a INT64, + b STRING, + c BOOL, + d FLOAT64 +) +AS SELECT 1, 10, 'foo', true, 12.0 +UNION ALL SELECT 2, 123, 'foo', false, CAST("NAN" as FLOAT64) +UNION ALL SELECT 3, 123, 'bar', true, 123.456e-67 +UNION ALL SELECT 4, CAST(NULL AS int64), CAST(NULL AS STRING), CAST(NULL 
AS BOOL), CAST(NULL AS FLOAT64) +UNION ALL SELECT 5, 10, 'bar', CAST(NULL AS BOOL), CAST(NULL AS FLOAT64); +-- + +ARRAY>[ + {1, 10, "foo", true, 12}, + {2, 123, "foo", false, nan}, + {3, 123, "bar", true, 1.23456e-65}, + {4, NULL, NULL, NULL, NULL}, + {5, 10, "bar", NULL, NULL} +] +== + +[prepare_database] +CREATE TABLE simple_collation_table +( + col_ci STRING COLLATE 'und:ci', + col_binary STRING COLLATE 'binary', + col_no_collation STRING, + col INT64 +) +AS SELECT 'a', 'a', 'a', 1 +UNION ALL SELECT 'b', 'b', 'b', 2 +UNION ALL SELECT 'B', 'B', 'B', 3 +UNION ALL SELECT 'ana', 'ana', 'ana', 4 +UNION ALL SELECT 'banana', 'banana', 'banana', 5; +-- +ARRAY> +[ + {"a", "a", "a", 1}, + {"b", "b", "b", 2}, + {"B", "B", "B", 3}, + {"ana", "ana", "ana", 4}, + {"banana", "banana", "banana", 5} +] +== + +[default required_features=V_1_4_GROUPING_SETS] +[name=grouping_sets_single_column_grouping_set] +SELECT a, b, COUNT(*) +FROM simple_table +GROUP BY GROUPING SETS(a, b) +ORDER BY a, b +-- +ARRAY>[known order: + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, "bar", 2}, + {NULL, "foo", 2}, + {10, NULL, 2}, + {123, NULL, 2} +] +== + +[name=grouping_sets_multi_column_grouping_set] +SELECT a, b, c, d, COUNT(*) +FROM simple_table +GROUP BY GROUPING SETS((a, b), (c, d)) +ORDER BY a, b, c, d, COUNT(*) +-- +ARRAY>[known order: + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 2}, + {NULL, NULL, false, nan, 1}, + {NULL, NULL, true, 1.23456e-65, 1}, + {NULL, NULL, true, 12, 1}, + {10, "bar", NULL, NULL, 1}, + {10, "foo", NULL, NULL, 1}, + {123, "bar", NULL, NULL, 1}, + {123, "foo", NULL, NULL, 1} +] + +NOTE: Reference implementation reports non-determinism. 
+== + +[name=grouping_sets_empty_grouping_set] +SELECT a, b, COUNT(*) +FROM simple_table +GROUP BY GROUPING SETS((a, b), (), ()) +ORDER BY a, b, COUNT(*) +-- +ARRAY>[known order: + {NULL, NULL, 1}, + {NULL, NULL, 5}, + {NULL, NULL, 5}, + {10, "bar", 1}, + {10, "foo", 1}, + {123, "bar", 1}, + {123, "foo", 1} +] +== + +[name=empty_grouping_sets_select_aggregate_function] +SELECT COUNT(*) +FROM simple_table +GROUP BY GROUPING SETS(()) +-- + +ARRAY>[{5}] +== + +[name=empty_grouping_sets_select_literal] +SELECT 1, "a", 1.23 +FROM simple_table +GROUP BY GROUPING SETS(()) +-- +ARRAY>[{1, "a", 1.23}] +== + +[name=grouping_sets_expression_columns] +SELECT a+1, UPPER(b), a-1, LOWER(b), COUNT(*), MAX(a), MAX(b) +FROM simple_table +GROUP BY GROUPING SETS((a+1, UPPER(b)), a-1, LOWER(b), ()) +ORDER BY MAX(a), MAX(b), COUNT(*) +-- +ARRAY>[unknown order: + {NULL, NULL, NULL, NULL, 1, NULL, NULL}, + {NULL, NULL, NULL, NULL, 1, NULL, NULL}, + {NULL, NULL, NULL, NULL, 1, NULL, NULL}, + {11, "BAR", NULL, NULL, 1, 10, "bar"}, + {11, "FOO", NULL, NULL, 1, 10, "foo"}, + {NULL, NULL, 9, NULL, 2, 10, "foo"}, + {124, "BAR", NULL, NULL, 1, 123, "bar"}, + {NULL, NULL, NULL, "bar", 2, 123, "bar"}, + {124, "FOO", NULL, NULL, 1, 123, "foo"}, + {NULL, NULL, NULL, "foo", 2, 123, "foo"}, + {NULL, NULL, 122, NULL, 2, 123, "foo"}, + {NULL, NULL, NULL, NULL, 5, 123, "foo"} +] +== + +[name=grouping_sets_select_expression_match_to_column_reference] +SELECT a, a+1, COUNT(*) +FROM simple_table +GROUP BY GROUPING SETS(a, a+1) +ORDER BY a, a+1, COUNT(*) +-- +ARRAY>[known order: + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 2}, + {NULL, NULL, 2}, + {10, 11, 2}, + {123, 124, 2} +] +== + +[name=grouping_sets_duplicate_keys_in_grouping_set] +SELECT a, b, a+1, UPPER(b), COUNT(*) +FROM simple_table +GROUP BY GROUPING SETS((a, a, b), (a, b, b), (a+1, a+1, a+1), (UPPER(b), UPPER(b))) +ORDER BY a, b, a+1, UPPER(b), COUNT(*) +-- +ARRAY>[known order: + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, 
+ {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, NULL, 2}, + {10, "bar", 11, "BAR", 1}, + {10, "bar", 11, "BAR", 1}, + {10, "foo", 11, "FOO", 1}, + {10, "foo", 11, "FOO", 1}, + {123, "bar", 124, "BAR", 1}, + {123, "bar", 124, "BAR", 1}, + {123, "foo", 124, "FOO", 1}, + {123, "foo", 124, "FOO", 1} +] +== + +[name=grouping_sets_duplicate_columns_alias_ordinal_index_in_grouping_set] +SELECT a, b, a+1, UPPER(b) AS ub, COUNT(*) +FROM simple_table +GROUP BY GROUPING SETS((a, a, b), (1, b, b), (3, 3, 3), (ub, ub)) +ORDER BY 1, 2, 3, 4, 5 +-- +ARRAY>[known order: + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, "BAR", 2}, + {NULL, NULL, NULL, "FOO", 2}, + {NULL, NULL, 11, NULL, 2}, + {NULL, NULL, 124, NULL, 2}, + {10, "bar", NULL, NULL, 1}, + {10, "bar", NULL, NULL, 1}, + {10, "foo", NULL, NULL, 1}, + {10, "foo", NULL, NULL, 1}, + {123, "bar", NULL, NULL, 1}, + {123, "bar", NULL, NULL, 1}, + {123, "foo", NULL, NULL, 1}, + {123, "foo", NULL, NULL, 1} +] +== + +[name=grouping_sets_duplicate_grouping_sets] +SELECT a, b, COUNT(*) +FROM simple_table +GROUP BY GROUPING SETS((a, b), (a, b), a, a) +ORDER BY a, b, COUNT(*) +-- +ARRAY>[known order: + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {10, NULL, 2}, + {10, NULL, 2}, + {10, "bar", 1}, + {10, "bar", 1}, + {10, "foo", 1}, + {10, "foo", 1}, + {123, NULL, 2}, + {123, NULL, 2}, + {123, "bar", 1}, + {123, "bar", 1}, + {123, "foo", 1}, + {123, "foo", 1} +] +== + +[name=grouping_sets_with_alias] +SELECT a+d AS x, b AS y, c AS z, COUNT(*) +FROM simple_table +GROUP BY GROUPING SETS(x, (y, z), z) +ORDER BY 1, 2, 3, COUNT(*) +-- +ARRAY>[known order: + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, false, 1}, + {NULL, NULL, true, 2}, + {NULL, 
"bar", NULL, 1}, + {NULL, "bar", true, 1}, + {NULL, "foo", false, 1}, + {NULL, "foo", true, 1}, + {nan, NULL, NULL, 1}, + {22, NULL, NULL, 1}, + {123, NULL, NULL, 1} +] + +NOTE: Reference implementation reports non-determinism. +== + +[name=grouping_sets_with_ordinal_columns] +SELECT a, b, COUNT(*) +FROM simple_table +GROUP BY GROUPING SETS(1, 2, (1, 2)) +ORDER BY 1, 2, COUNT(*) +-- +ARRAY>[known order: + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, "bar", 2}, + {NULL, "foo", 2}, + {10, NULL, 2}, + {10, "bar", 1}, + {10, "foo", 1}, + {123, NULL, 2}, + {123, "bar", 1}, + {123, "foo", 1} +] +== + +[name=grouping_sets_with_mix_of_column_alias_ordinal_index] +SELECT a+d AS x, UPPER(b), c, COUNT(*) +FROM simple_table +GROUP BY GROUPING SETS(x, (2, c), 3) +ORDER BY 1, 2, 3, COUNT(*) +-- +ARRAY>[known order: + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, false, 1}, + {NULL, NULL, true, 2}, + {NULL, "BAR", NULL, 1}, + {NULL, "BAR", true, 1}, + {NULL, "FOO", false, 1}, + {NULL, "FOO", true, 1}, + {nan, NULL, NULL, 1}, + {22, NULL, NULL, 1}, + {123, NULL, NULL, 1} +] + +NOTE: Reference implementation reports non-determinism. 
+== + +[name=grouping_sets_with_duplicate_simple_expressions] +SELECT a+1, COUNT(*) +FROM simple_table +GROUP BY GROUPING SETS(a+1, a+1, a+1) +ORDER BY ANY_VALUE(a), COUNT(*) +-- +ARRAY>[known order: + {NULL, 1}, + {NULL, 1}, + {NULL, 1}, + {11, 2}, + {11, 2}, + {11, 2}, + {124, 2}, + {124, 2}, + {124, 2} +] +== + +[name=grouping_sets_with_duplicate_subquery_expressions] +SELECT COUNT(*) +FROM simple_table +GROUP BY GROUPING SETS((SELECT a+1), (SELECT a+1)) +-- +ARRAY>[unknown order:{1}, {1}, {2}, {2}, {2}, {2}] +== + +[name=grouping_sets_with_collation_columns] +SELECT + UPPER(col_ci) AS col1, col_binary AS col_2, col AS col_3, + COUNT(*) AS count +FROM simple_collation_table +GROUP BY GROUPING SETS(col1, (col_binary, col)) +ORDER BY col1, col_2, col_3, count +-- +ARRAY>[known order: + {NULL, "B", 3, 1}, + {NULL, "a", 1, 1}, + {NULL, "ana", 4, 1}, + {NULL, "b", 2, 1}, + {NULL, "banana", 5, 1}, + {"A", NULL, NULL, 1}, + {"ANA", NULL, NULL, 1}, + {"B", NULL, NULL, 2}, + {"BANANA", NULL, NULL, 1} +] + +== + +[name=grouping_sets_with_aggregate_functions] +SELECT a, b, ANY_VALUE(a), ARRAY_AGG(c), +FROM simple_table +GROUP BY GROUPING SETS((a, b), a, ()) +HAVING b IS NOT NULL +ORDER BY a, b +-- +ARRAY>>[known order: + {10, "bar", 10, ARRAY[NULL]}, + {10, "foo", 10, ARRAY[true]}, + {123, "bar", 123, ARRAY[true]}, + {123, "foo", 123, ARRAY[false]} +] + +NOTE: Reference implementation reports non-determinism. 
+== + +[required_features=V_1_4_GROUPING_SETS,ANALYTIC_FUNCTIONS] +[name=grouping_sets_with_analytic_functions] +SELECT a, b, + COUNT(*) OVER ( + PARTITION BY a + ORDER BY a + ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS count, + SUM(a) OVER ( + PARTITION BY b + ORDER BY a + ROWS BETWEEN CURRENT ROW AND CURRENT ROW) AS sum, + MAX(b) OVER ( + PARTITION BY b + ORDER BY a + RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS max +FROM simple_table +GROUP BY GROUPING SETS((a, b), a, ()) +ORDER BY 1, 2, 3, 4, 5 +-- +ARRAY>[known order: + {NULL, NULL, 3, NULL, NULL}, + {NULL, NULL, 3, NULL, NULL}, + {NULL, NULL, 3, NULL, NULL}, + {10, NULL, 3, 10, NULL}, + {10, "bar", 3, 10, "bar"}, + {10, "foo", 3, 10, "foo"}, + {123, NULL, 3, 123, NULL}, + {123, "bar", 3, 123, "bar"}, + {123, "foo", 3, 123, "foo"} +] +== + +[name=grouping_sets_with_subquery_constant] +SELECT SUM(key) +FROM simple_table +GROUP BY GROUPING SETS( + (SELECT 'abc'), + (SELECT 1)) +-- +ARRAY>[unknown order:{15}, {15}] +== + +[name=grouping_sets_with_subquery_from_the_table] +SELECT SUM(key) +FROM simple_table +GROUP BY GROUPING SETS( + (SELECT COUNT(*) FROM simple_table), + ((SELECT SUM(key) FROM simple_table), (SELECT MAX(key) FROM simple_table)) +) +-- +ARRAY>[unknown order:{15}, {15}] +== + +[name=grouping_sets_with_correlated_subquery] +SELECT SUM(key) +FROM simple_table +GROUP BY GROUPING SETS( + (SELECT a+1), + (SELECT b) +) +ORDER BY SUM(key) +-- +ARRAY>[known order:{3}, {4}, {4}, {5}, {6}, {8}] +== + +[name=grouping_sets_with_keyword_column_name] +WITH T AS ( + SELECT 10 AS `GROUPING SETS`, "bar" AS `ROLLUP`, true AS `CUBE` UNION ALL + SELECT 11 AS `GROUPING SETS`, "bar" AS `ROLLUP`, false AS `CUBE` +) +SELECT `GROUPING SETS`, `ROLLUP`, `CUBE`, COUNT(*) +FROM T +GROUP BY GROUPING SETS(`GROUPING SETS`, `ROLLUP`, `CUBE`) +ORDER BY `GROUPING SETS`, `ROLLUP`, `CUBE`, COUNT(*) +-- +ARRAY>[known order: + {NULL, NULL, false, 1}, + {NULL, NULL, true, 1}, + {NULL, "bar", NULL, 2}, + {10, NULL, NULL, 
1}, + {11, NULL, NULL, 1} +] +== + +[name=grouping_sets_with_rollup_cube] +[required_features=GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +SELECT a+1, b, c, d, COUNT(*) AS cnt +FROM simple_table +GROUP BY GROUPING SETS(a+1, ROLLUP(b, c), CUBE(d), ()) +ORDER BY 1, b, c, d, cnt +-- +ARRAY>[known order: + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, NULL, 5}, + {NULL, NULL, NULL, NULL, 5}, + {NULL, NULL, NULL, NULL, 5}, + {NULL, NULL, NULL, nan, 1}, + {NULL, NULL, NULL, 1.23456e-65, 1}, + {NULL, NULL, NULL, 12, 1}, + {NULL, "bar", NULL, NULL, 1}, + {NULL, "bar", NULL, NULL, 2}, + {NULL, "bar", true, NULL, 1}, + {NULL, "foo", NULL, NULL, 2}, + {NULL, "foo", false, NULL, 1}, + {NULL, "foo", true, NULL, 1}, + {11, NULL, NULL, NULL, 2}, + {124, NULL, NULL, NULL, 2} +] + +NOTE: Reference implementation reports non-determinism. +== + +[name=grouping_sets_with_rollup_cube_on_multi_columns] +[required_features=GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +SELECT a+1, b, c, d, COUNT(*) AS cnt +FROM simple_table +GROUP BY GROUPING SETS(a+1, ROLLUP(b, (c, d)), CUBE(a+1, (c, b)), ()) +ORDER BY 1, b, c, d, cnt +-- +ARRAY>[known order: + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 5}, + {NULL, NULL, NULL, NULL, 5}, + {NULL, NULL, NULL, NULL, 5}, + {NULL, "bar", NULL, NULL, 1}, + {NULL, "bar", NULL, NULL, 1}, + {NULL, "bar", NULL, NULL, 2}, + {NULL, "bar", true, NULL, 1}, + {NULL, "bar", true, 1.23456e-65, 1}, + {NULL, "foo", NULL, NULL, 2}, + {NULL, "foo", false, NULL, 1}, + {NULL, "foo", false, nan, 1}, + {NULL, "foo", true, NULL, 1}, + {NULL, "foo", true, 12, 1}, + {11, NULL, NULL, NULL, 2}, + {11, NULL, NULL, NULL, 2}, + {11, "bar", NULL, NULL, 1}, + {11, "foo", true, NULL, 1}, + {124, NULL, NULL, NULL, 2}, + {124, NULL, NULL, NULL, 
2}, + {124, "bar", true, NULL, 1}, + {124, "foo", false, NULL, 1} +] + +NOTE: Reference implementation reports non-determinism. +== + +[name=grouping_sets_with_rollup_cube_mix_column_alias_ordinal_index] +[required_features=GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +SELECT a+1, b AS x, c AS y, d, COUNT(*) AS cnt +FROM simple_table +GROUP BY GROUPING SETS(1, ROLLUP(x, (y, d)), CUBE(1, (3, 2)), ()) +ORDER BY 1, x, y, d, cnt +-- +ARRAY>[known order: + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 5}, + {NULL, NULL, NULL, NULL, 5}, + {NULL, NULL, NULL, NULL, 5}, + {NULL, "bar", NULL, NULL, 1}, + {NULL, "bar", NULL, NULL, 1}, + {NULL, "bar", NULL, NULL, 2}, + {NULL, "bar", true, NULL, 1}, + {NULL, "bar", true, 1.23456e-65, 1}, + {NULL, "foo", NULL, NULL, 2}, + {NULL, "foo", false, NULL, 1}, + {NULL, "foo", false, nan, 1}, + {NULL, "foo", true, NULL, 1}, + {NULL, "foo", true, 12, 1}, + {11, NULL, NULL, NULL, 2}, + {11, NULL, NULL, NULL, 2}, + {11, "bar", NULL, NULL, 1}, + {11, "foo", true, NULL, 1}, + {124, NULL, NULL, NULL, 2}, + {124, NULL, NULL, NULL, 2}, + {124, "bar", true, NULL, 1}, + {124, "foo", false, NULL, 1} +] + +NOTE: Reference implementation reports non-determinism. 
+== + +[name=grouping_sets_with_constant_select_list] +SELECT "a", true, 1.23 +FROM simple_table +GROUP BY GROUPING SETS(a, b) +-- +ARRAY>[unknown order: + {"a", true, 1.23}, + {"a", true, 1.23}, + {"a", true, 1.23}, + {"a", true, 1.23}, + {"a", true, 1.23}, + {"a", true, 1.23} +] +== + +[name=grouping_sets_with_all_post_aggregate_clauses] +[required_features=V_1_3_QUALIFY,V_1_4_GROUPING_SETS,ANALYTIC_FUNCTIONS] +SELECT a, b, c, COUNT(*) OVER() AS cnt +FROM simple_table +GROUP BY GROUPING SETS(a, b, c) +HAVING a IS NULL +QUALIFY c +ORDER BY a, b, c, cnt +-- + +ARRAY>[{NULL, NULL, true, 7}] +== + +[name=grouping_sets_with_having_clauses] +[required_features=V_1_4_GROUPING_SETS,V_1_4_GROUPING_BUILTIN] +SELECT a, b, c, COUNT(*) +FROM simple_table +GROUP BY GROUPING SETS(a, b, c) +HAVING GROUPING(a) = 0 +ORDER BY 1, 2, 3, 4 +-- + +ARRAY>[known order: + {NULL, NULL, NULL, 1}, + {10, NULL, NULL, 2}, + {123, NULL, NULL, 2} +] +== + +[name=grouping_sets_with_qualify_clauses] +[required_features=V_1_3_QUALIFY,V_1_4_GROUPING_SETS,ANALYTIC_FUNCTIONS,V_1_4_GROUPING_BUILTIN] +SELECT a, b, c, COUNT(*) OVER() +FROM simple_table +GROUP BY GROUPING SETS(a, b, c) +QUALIFY GROUPING(a) = 0 +ORDER BY 1, 2, 3, 4 +-- + +ARRAY>[known order: + {NULL, NULL, NULL, 9}, + {10, NULL, NULL, 9}, + {123, NULL, NULL, 9} +] +== + +[name=grouping_sets_with_pivot] +[required_features=V_1_3_PIVOT,V_1_4_GROUPING_SETS] +SELECT a+1, foo, bar, COUNT(*) AS cnt +FROM simple_table PIVOT(SUM(d) FOR b IN ('foo', 'bar')) +GROUP BY GROUPING SETS(a+1, foo, bar) +ORDER BY 1, foo, bar, cnt +-- + +ARRAY>[known order: + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 3}, + {NULL, NULL, NULL, 4}, + {NULL, NULL, 1.23456e-65, 1}, + {NULL, nan, NULL, 1}, + {NULL, 12, NULL, 1}, + {11, NULL, NULL, 2}, + {124, NULL, NULL, 2} +] + +NOTE: Reference implementation reports non-determinism. 
+== + +[name=grouping_sets_with_unpivot] +[required_features=V_1_3_UNPIVOT,V_1_4_GROUPING_SETS] +SELECT a+1, e, f, COUNT(*) AS cnt +FROM simple_table UNPIVOT(e FOR f in (b)) +GROUP BY GROUPING SETS(a+1, e, f) +ORDER BY 1, e, f, cnt +-- + +ARRAY>[known order: + {NULL, NULL, "b", 4}, + {NULL, "bar", NULL, 2}, + {NULL, "foo", NULL, 2}, + {11, NULL, NULL, 2}, + {124, NULL, NULL, 2} +] +== + +[name=grouping_sets_and_with_recursive] +[required_features=V_1_3_WITH_RECURSIVE,V_1_4_GROUPING_SETS] +WITH RECURSIVE + CTE_1 AS ( + ( + SELECT iteration as it, iteration+1 AS it1 + FROM UNNEST([1, 2, 3, 4]) AS iteration + GROUP BY GROUPING SETS(iteration, iteration+1) + ) + UNION ALL ( + SELECT it+1 AS it, it1+1 AS it1 FROM CTE_1 WHERE it < 3) + ) +SELECT it, it1 FROM CTE_1 +ORDER BY 1, 2 ASC +-- + +ARRAY>[known order: + {NULL, NULL}, + {NULL, NULL}, + {NULL, NULL}, + {NULL, NULL}, + {1, 2}, + {2, 3}, + {2, 3}, + {3, 4}, + {3, 4}, + {3, 4}, + {4, 5} +] +== + +[required_features=GROUP_BY_ROLLUP] +[name=rollup_with_single_columns] +SELECT a, b, SUM(key) +FROM simple_table +GROUP BY ROLLUP(a, b) +ORDER BY a, b, SUM(key) +-- +ARRAY>[known order: + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 15}, + {10, NULL, 6}, + {10, "bar", 5}, + {10, "foo", 1}, + {123, NULL, 5}, + {123, "bar", 3}, + {123, "foo", 2} +] +== + +[required_features=GROUP_BY_ROLLUP] +[name=rollup_with_expression_columns] +SELECT a+1, UPPER(b), SUM(key) +FROM simple_table +GROUP BY ROLLUP(a+1, UPPER(b)) +ORDER BY 1, 2, SUM(key) +-- +ARRAY>[known order: + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 15}, + {11, NULL, 6}, + {11, "BAR", 5}, + {11, "FOO", 1}, + {124, NULL, 5}, + {124, "BAR", 3}, + {124, "FOO", 2} +] +== + +[required_features=GROUP_BY_ROLLUP] +[name=rollup_with_duplicate_columns] +SELECT a+1, b, SUM(key) +FROM simple_table +GROUP BY ROLLUP(a+1, a+1, a+1, b, b) +ORDER BY 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 4}, + 
{NULL, NULL, 15}, + {11, NULL, 6}, + {11, NULL, 6}, + {11, NULL, 6}, + {11, "bar", 5}, + {11, "bar", 5}, + {11, "foo", 1}, + {11, "foo", 1}, + {124, NULL, 5}, + {124, NULL, 5}, + {124, NULL, 5}, + {124, "bar", 3}, + {124, "bar", 3}, + {124, "foo", 2}, + {124, "foo", 2} +] +== + +[required_features=GROUP_BY_ROLLUP] +[name=rollup_with_duplicate_mix_of_column_alias_ordinal_index] +SELECT a+1 AS a1, b, SUM(key) +FROM simple_table +GROUP BY ROLLUP(a1, 1, 1, b, 2, 2) +ORDER BY 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 15}, + {11, NULL, 6}, + {11, NULL, 6}, + {11, NULL, 6}, + {11, "bar", 5}, + {11, "bar", 5}, + {11, "bar", 5}, + {11, "foo", 1}, + {11, "foo", 1}, + {11, "foo", 1}, + {124, NULL, 5}, + {124, NULL, 5}, + {124, NULL, 5}, + {124, "bar", 3}, + {124, "bar", 3}, + {124, "bar", 3}, + {124, "foo", 2}, + {124, "foo", 2}, + {124, "foo", 2} +] +== + +[required_features=GROUP_BY_ROLLUP] +[name=rollup_with_single_columns_in_parentheses] +SELECT a, b, SUM(key) +FROM simple_table +GROUP BY ROLLUP((a), (b)) +ORDER BY a, b, SUM(key) +-- +ARRAY>[known order: + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 15}, + {10, NULL, 6}, + {10, "bar", 5}, + {10, "foo", 1}, + {123, NULL, 5}, + {123, "bar", 3}, + {123, "foo", 2} +] +== + +[required_features=GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +[name=rollup_with_multi_columns] +SELECT a, b, c, d, SUM(key) +FROM simple_table +GROUP BY ROLLUP((a, b), (c), d) +ORDER BY a, b, c, d, SUM(key) +-- +ARRAY>[known order: + {NULL, NULL, NULL, NULL, 4}, + {NULL, NULL, NULL, NULL, 4}, + {NULL, NULL, NULL, NULL, 4}, + {NULL, NULL, NULL, NULL, 15}, + {10, "bar", NULL, NULL, 5}, + {10, "bar", NULL, NULL, 5}, + {10, "bar", NULL, NULL, 5}, + {10, "foo", NULL, NULL, 1}, + {10, "foo", true, NULL, 1}, + {10, "foo", true, 12, 1}, + {123, "bar", NULL, NULL, 3}, + {123, "bar", true, NULL, 3}, + {123, "bar", true, 1.23456e-65, 3}, + {123, 
"foo", NULL, NULL, 2}, + {123, "foo", false, NULL, 2}, + {123, "foo", false, nan, 2} +] + +NOTE: Reference implementation reports non-determinism. +== + +[required_features=GROUP_BY_ROLLUP] +[name=rollup_with_parenthesised_single_column] +SELECT a, b, SUM(key) +FROM simple_table +GROUP BY ROLLUP((a), (b)) +ORDER BY 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 15}, + {10, NULL, 6}, + {10, "bar", 5}, + {10, "foo", 1}, + {123, NULL, 5}, + {123, "bar", 3}, + {123, "foo", 2} +] +== + +# The WHERE EXISTS clause ensures that the V_1_4_GROUPING_SETS feature option +# is actually required. Without it, the query could run with the +# V_1_4_GROUPING_SETS feature option missing, which would cause +# required_feature_integrity_test to fail. This test intentionally verifies +# that the query behavior does not change when V_1_4_GROUPING_SETS is enabled, +# since V_1_4_GROUPING_SETS uses a new code path to resolve ROLLUP. +[required_features=GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +[name=rollup_with_parenthesised_single_column_when_grouping_sets_enabled] +SELECT a, b, SUM(key) +FROM simple_table +WHERE EXISTS(SELECT "a" FROM simple_table GROUP BY GROUPING SETS(())) +GROUP BY ROLLUP((a), (b)) +ORDER BY 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 15}, + {10, NULL, 6}, + {10, "bar", 5}, + {10, "foo", 1}, + {123, NULL, 5}, + {123, "bar", 3}, + {123, "foo", 2} +] +== + +# (a, b) is resolved as a struct when only GROUP_BY_ROLLUP is enabled, and is +# resolved to a multi-column when V_1_4_GROUPING_SETS is enabled.
+[required_features=GROUP_BY_ROLLUP,V_1_2_GROUP_BY_STRUCT] +[name=rollup_with_multi_columns_as_struct] +SELECT SUM(key) +FROM simple_table +GROUP BY ROLLUP((a, b), (c, d)) +ORDER BY SUM(key) +-- +ARRAY>[known order: + {1}, + {1}, + {2}, + {2}, + {3}, + {3}, + {4}, + {4}, + {5}, + {5}, + {15} +] +== + +[required_features=GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +[name=rollup_with_multi_expression_columns] +SELECT a+1, UPPER(b), c, d, SUM(key) +FROM simple_table +GROUP BY ROLLUP(a+1, (UPPER(b), c), d) +ORDER BY 1, 2, c, d, SUM(key) +-- +ARRAY>[known order: + {NULL, NULL, NULL, NULL, 4}, + {NULL, NULL, NULL, NULL, 4}, + {NULL, NULL, NULL, NULL, 4}, + {NULL, NULL, NULL, NULL, 15}, + {11, NULL, NULL, NULL, 6}, + {11, "BAR", NULL, NULL, 5}, + {11, "BAR", NULL, NULL, 5}, + {11, "FOO", true, NULL, 1}, + {11, "FOO", true, 12, 1}, + {124, NULL, NULL, NULL, 5}, + {124, "BAR", true, NULL, 3}, + {124, "BAR", true, 1.23456e-65, 3}, + {124, "FOO", false, NULL, 2}, + {124, "FOO", false, nan, 2} +] + +NOTE: Reference implementation reports non-determinism. 
+== + +[required_features=GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +[name=rollup_with_duplicate_multi_columns] +SELECT a, b, UPPER(b), SUM(key) +FROM simple_table +GROUP BY ROLLUP((a, b), (a, b), UPPER(b), UPPER(b)) +ORDER BY 1, 2, 3, 4 +-- +ARRAY>[known order: + {NULL, NULL, NULL, 4}, + {NULL, NULL, NULL, 4}, + {NULL, NULL, NULL, 4}, + {NULL, NULL, NULL, 4}, + {NULL, NULL, NULL, 15}, + {10, "bar", "BAR", 5}, + {10, "bar", "BAR", 5}, + {10, "bar", "BAR", 5}, + {10, "bar", "BAR", 5}, + {10, "foo", "FOO", 1}, + {10, "foo", "FOO", 1}, + {10, "foo", "FOO", 1}, + {10, "foo", "FOO", 1}, + {123, "bar", "BAR", 3}, + {123, "bar", "BAR", 3}, + {123, "bar", "BAR", 3}, + {123, "bar", "BAR", 3}, + {123, "foo", "FOO", 2}, + {123, "foo", "FOO", 2}, + {123, "foo", "FOO", 2}, + {123, "foo", "FOO", 2} +] +== + +[required_features=GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +[name=rollup_with_duplicate_columns_in_multi_columns] +SELECT a+1, a-1, SUM(key) +FROM simple_table +GROUP BY ROLLUP((a+1, a+1, a+1), (a-1, a-1, a-1)) +ORDER BY 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 15}, + {11, NULL, 6}, + {11, 9, 6}, + {124, NULL, 5}, + {124, 122, 5} +] + +== +[required_features=GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +[name=rollup_with_duplicate_mix_of_column_alias_ordinal_index_in_multi_columns] +SELECT a+1, a-1, a*2 AS a2, SUM(key) +FROM simple_table +GROUP BY ROLLUP((a+1, 1, 1), (2, a-1, 2), (a2, 3, a2)) +ORDER BY 1, 2, 3, 4 +-- +ARRAY>[known order: + {NULL, NULL, NULL, 4}, + {NULL, NULL, NULL, 4}, + {NULL, NULL, NULL, 4}, + {NULL, NULL, NULL, 15}, + {11, NULL, NULL, 6}, + {11, 9, NULL, 6}, + {11, 9, 20, 6}, + {124, NULL, NULL, 5}, + {124, 122, NULL, 5}, + {124, 122, 246, 5} +] +== + +[required_features=GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +[name=rollup_with_mix_of_column_alias_ordinal_index] +SELECT a+1, a-1 AS x, b, SUM(key) +FROM simple_table +GROUP BY ROLLUP(1, (a+1, x), b) +ORDER BY 1, 2, 3, 4 +-- +ARRAY>[known order: + {NULL, NULL, NULL, 4}, + {NULL, 
NULL, NULL, 4}, + {NULL, NULL, NULL, 4}, + {NULL, NULL, NULL, 15}, + {11, NULL, NULL, 6}, + {11, 9, NULL, 6}, + {11, 9, "bar", 5}, + {11, 9, "foo", 1}, + {124, NULL, NULL, 5}, + {124, 122, NULL, 5}, + {124, 122, "bar", 3}, + {124, 122, "foo", 2} +] +== + +[required_features=GROUP_BY_ROLLUP] +[name=rollup_expression_match] +SELECT a, a+1, SUM(key) +FROM simple_table +GROUP BY ROLLUP(a, a+1) +ORDER BY 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 15}, + {10, 11, 6}, + {10, 11, 6}, + {123, 124, 5}, + {123, 124, 5} +] +== + +[required_features=GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +[name=rollup_multi_column_expression_match] +SELECT a, a+1, SUM(key) +FROM simple_table +GROUP BY ROLLUP(a, (a, a+1), a+1) +ORDER BY 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 15}, + {10, 11, 6}, + {10, 11, 6}, + {10, 11, 6}, + {123, 124, 5}, + {123, 124, 5}, + {123, 124, 5} +] +== + +[required_features=GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +[name=rollup_with_subquery] +SELECT SUM(key) +FROM simple_table +GROUP BY ROLLUP( + (SELECT COUNT(*) FROM simple_table), + ((SELECT SUM(key) FROM simple_table), (SELECT MAX(key) FROM simple_table)) +) +ORDER BY 1 +-- +ARRAY>[known order:{15}, {15}, {15}] +== + +[name=rollup_with_collation_columns] +[required_features=GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +SELECT + UPPER(col_ci) AS col1, col_binary AS col_2, col AS col_3, + COUNT(*) AS count +FROM simple_collation_table +GROUP BY ROLLUP(col1, (col_binary, col)) +ORDER BY col1, col_2, col_3, count +-- +ARRAY>[known order: + {NULL, NULL, NULL, 5}, + {"A", NULL, NULL, 1}, + {"A", "a", 1, 1}, + {"ANA", NULL, NULL, 1}, + {"ANA", "ana", 4, 1}, + {"B", NULL, NULL, 2}, + {"B", "B", 3, 1}, + {"B", "b", 2, 1}, + {"BANANA", NULL, NULL, 1}, + {"BANANA", "banana", 5, 1} +] +== + +[required_features=V_1_4_GROUPING_SETS,V_1_4_GROUPING_BUILTIN] +[name=cube_simple_query] +SELECT a, b, GROUPING(a), GROUPING(b), COUNT(*) +FROM 
simple_table +GROUP BY CUBE(a, b) +ORDER BY a, b, GROUPING(a), GROUPING(b), COUNT(*) +-- +ARRAY>[known order: + {NULL, NULL, 0, 0, 1}, + {NULL, NULL, 0, 1, 1}, + {NULL, NULL, 1, 0, 1}, + {NULL, NULL, 1, 1, 5}, + {NULL, "bar", 1, 0, 2}, + {NULL, "foo", 1, 0, 2}, + {10, NULL, 0, 1, 2}, + {10, "bar", 0, 0, 1}, + {10, "foo", 0, 0, 1}, + {123, NULL, 0, 1, 2}, + {123, "bar", 0, 0, 1}, + {123, "foo", 0, 0, 1} +] +== + +[required_features=V_1_4_GROUPING_SETS] +[name=cube_simple_query_no_grouping] +SELECT a, b, COUNT(*) +FROM simple_table +GROUP BY CUBE(a, b) +ORDER BY a, b, COUNT(*) +-- +ARRAY>[known order: + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 5}, + {NULL, "bar", 2}, + {NULL, "foo", 2}, + {10, NULL, 2}, + {10, "bar", 1}, + {10, "foo", 1}, + {123, NULL, 2}, + {123, "bar", 1}, + {123, "foo", 1} +] + +== + +[required_features=V_1_4_GROUPING_SETS,V_1_4_GROUPING_BUILTIN] +[name=cube_with_multi_columns_grouping_set] +SELECT a, b, c, GROUPING(c), SUM(d) +FROM simple_table +GROUP BY CUBE(c, (a, b)) +ORDER BY a, b, c, GROUPING(c), SUM(d) +-- +ARRAY>[known order: + {NULL, NULL, NULL, 0, NULL}, + {NULL, NULL, NULL, 0, NULL}, + {NULL, NULL, NULL, 1, NULL}, + {NULL, NULL, NULL, 1, nan}, + {NULL, NULL, false, 0, nan}, + {NULL, NULL, true, 0, 12}, + {10, "bar", NULL, 0, NULL}, + {10, "bar", NULL, 1, NULL}, + {10, "foo", NULL, 1, 12}, + {10, "foo", true, 0, 12}, + {123, "bar", NULL, 1, 1.23456e-65}, + {123, "bar", true, 0, 1.23456e-65}, + {123, "foo", NULL, 1, nan}, + {123, "foo", false, 0, nan} +] +== + +[required_features=V_1_4_GROUPING_SETS] +[name=cube_with_multi_columns_grouping_set_no_grouping] +SELECT a, b, c, SUM(d) +FROM simple_table +GROUP BY CUBE(c, (a, b)) +ORDER BY a, b, c, SUM(d) +-- +ARRAY>[known order: + {NULL, NULL, NULL, NULL}, + {NULL, NULL, NULL, NULL}, + {NULL, NULL, NULL, NULL}, + {NULL, NULL, NULL, nan}, + {NULL, NULL, false, nan}, + {NULL, NULL, true, 12}, + {10, "bar", NULL, NULL}, + {10, "bar", NULL, NULL}, + {10, "foo", NULL, 
12}, + {10, "foo", true, 12}, + {123, "bar", NULL, 1.23456e-65}, + {123, "bar", true, 1.23456e-65}, + {123, "foo", NULL, nan}, + {123, "foo", false, nan} +] + +== + +[required_features=V_1_4_GROUPING_SETS] +[name=cube_with_duplicated_columns_no_grouping] +SELECT a, b, COUNT(*) +FROM simple_table +GROUP BY CUBE(a, a, b, a) +ORDER BY a, b, COUNT(*) +-- +ARRAY>[known order: + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 5}, + {NULL, "bar", 2}, + {NULL, "foo", 2}, + {10, NULL, 2}, + {10, NULL, 2}, + {10, NULL, 2}, + {10, NULL, 2}, + {10, NULL, 2}, + {10, NULL, 2}, + {10, NULL, 2}, + {10, "bar", 1}, + {10, "bar", 1}, + {10, "bar", 1}, + {10, "bar", 1}, + {10, "bar", 1}, + {10, "bar", 1}, + {10, "bar", 1}, + {10, "foo", 1}, + {10, "foo", 1}, + {10, "foo", 1}, + {10, "foo", 1}, + {10, "foo", 1}, + {10, "foo", 1}, + {10, "foo", 1}, + {123, NULL, 2}, + {123, NULL, 2}, + {123, NULL, 2}, + {123, NULL, 2}, + {123, NULL, 2}, + {123, NULL, 2}, + {123, NULL, 2}, + {123, "bar", 1}, + {123, "bar", 1}, + {123, "bar", 1}, + {123, "bar", 1}, + {123, "bar", 1}, + {123, "bar", 1}, + {123, "bar", 1}, + {123, "foo", 1}, + {123, "foo", 1}, + {123, "foo", 1}, + {123, "foo", 1}, + {123, "foo", 1}, + {123, "foo", 1}, + {123, "foo", 1} +] +== + +[required_features=V_1_4_GROUPING_SETS] +[name=cube_with_duplicated_multi_columns] +SELECT a, b, c, COUNT(*) +FROM simple_table +GROUP BY CUBE((a, b), (a, b), c, c, (c, c), (c, c)) +ORDER BY a, b, c, COUNT(*) +-- +ARRAY>[known order: + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, 
+ {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 5}, + {NULL, NULL, false, 1}, + {NULL, NULL, false, 1}, + {NULL, NULL, false, 1}, + {NULL, NULL, false, 1}, + {NULL, NULL, false, 1}, + {NULL, NULL, false, 1}, + {NULL, NULL, false, 1}, + {NULL, NULL, false, 1}, + {NULL, NULL, false, 1}, + {NULL, NULL, false, 1}, + {NULL, NULL, false, 1}, + {NULL, NULL, false, 1}, + {NULL, NULL, false, 1}, + {NULL, NULL, false, 1}, + {NULL, NULL, false, 1}, + {NULL, NULL, true, 2}, + {NULL, NULL, true, 2}, + {NULL, NULL, true, 2}, + {NULL, NULL, true, 2}, + {NULL, NULL, true, 2}, + {NULL, NULL, true, 2}, + {NULL, NULL, true, 2}, + {NULL, NULL, true, 2}, + {NULL, NULL, true, 2}, + {NULL, NULL, true, 2}, + {NULL, NULL, true, 2}, + {NULL, 
NULL, true, 2}, + {NULL, NULL, true, 2}, + {NULL, NULL, true, 2}, + {NULL, NULL, true, 2}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", NULL, 1}, + {10, "foo", NULL, 1}, + {10, "foo", NULL, 1}, + {10, "foo", NULL, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, 
"foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {10, "foo", true, 1}, + {123, "bar", NULL, 1}, + {123, "bar", NULL, 1}, + {123, "bar", NULL, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "bar", true, 1}, + {123, "foo", NULL, 1}, + {123, "foo", NULL, 1}, + {123, "foo", NULL, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, 
"foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1}, + {123, "foo", false, 1} +] + +== + +[required_features=V_1_4_GROUPING_SETS] +[name=cube_with_expressions] +SELECT b, MAX(c) +FROM simple_table +GROUP BY CUBE(key+1, b, c) +ORDER BY b, MAX(c) +-- +ARRAY>[known order: + {NULL, NULL}, + {NULL, NULL}, + {NULL, NULL}, + {NULL, NULL}, + {NULL, NULL}, + {NULL, NULL}, + {NULL, NULL}, + {NULL, NULL}, + {NULL, NULL}, + {NULL, false}, + {NULL, false}, + {NULL, false}, + {NULL, true}, + {NULL, true}, + {NULL, true}, + {NULL, true}, + {NULL, true}, + {NULL, true}, + {"bar", NULL}, + {"bar", NULL}, + {"bar", NULL}, + {"bar", true}, + {"bar", true}, + {"bar", true}, + {"bar", true}, + {"foo", false}, + {"foo", false}, + {"foo", false}, + {"foo", true}, + {"foo", true}, + {"foo", true}, + {"foo", true} +] +== + +[name=cube_with_collated_column] +[required_features=V_1_4_GROUPING_SETS] +SELECT + UPPER(col_ci) AS col1, + col_no_collation AS col2, + COUNT(*) AS count +FROM simple_collation_table +GROUP BY CUBE(col_ci, col_no_collation, col) +ORDER BY col1, col2, count +-- +ARRAY>[known order: + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 1}, + {NULL, NULL, 5}, + {NULL, "B", 1}, + 
{NULL, "B", 1}, + {NULL, "a", 1}, + {NULL, "a", 1}, + {NULL, "ana", 1}, + {NULL, "ana", 1}, + {NULL, "b", 1}, + {NULL, "b", 1}, + {NULL, "banana", 1}, + {NULL, "banana", 1}, + {"A", NULL, 1}, + {"A", NULL, 1}, + {"A", "a", 1}, + {"A", "a", 1}, + {"ANA", NULL, 1}, + {"ANA", NULL, 1}, + {"ANA", "ana", 1}, + {"ANA", "ana", 1}, + {"B", NULL, 1}, + {"B", NULL, 1}, + {"B", NULL, 1}, + {"B", NULL, 1}, + {"B", "B", 1}, + {"B", "B", 1}, + {"B", "b", 1}, + {"B", "b", 1}, + {"BANANA", NULL, 1}, + {"BANANA", NULL, 1}, + {"BANANA", "banana", 1}, + {"BANANA", "banana", 1} +] +== + +[name=cube_with_analytic_functions] +[required_features=V_1_4_GROUPING_SETS,ANALYTIC_FUNCTIONS] +SELECT a, b, + COUNT(*) OVER ( + PARTITION BY a + ORDER BY a + ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS count, + SUM(a) OVER ( + PARTITION BY b + ORDER BY a + ROWS BETWEEN CURRENT ROW AND CURRENT ROW) AS sum, + MAX(b) OVER ( + PARTITION BY b + ORDER BY a + RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS max +FROM simple_table +GROUP BY CUBE(a, b) +ORDER BY 1, 2, 3, 4, 5 +-- +ARRAY>[known order: + {NULL, NULL, 6, NULL, NULL}, + {NULL, NULL, 6, NULL, NULL}, + {NULL, NULL, 6, NULL, NULL}, + {NULL, NULL, 6, NULL, NULL}, + {NULL, "bar", 6, NULL, "bar"}, + {NULL, "foo", 6, NULL, "foo"}, + {10, NULL, 3, 10, NULL}, + {10, "bar", 3, 10, "bar"}, + {10, "foo", 3, 10, "foo"}, + {123, NULL, 3, 123, NULL}, + {123, "bar", 3, 123, "bar"}, + {123, "foo", 3, 123, "foo"} +] + +== + +[name=cube_with_parenthesized_single_column] +[required_features=V_1_4_GROUPING_SETS,ANALYTIC_FUNCTIONS] +SELECT a, b, + COUNT(*) OVER ( + PARTITION BY a + ORDER BY a + ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS count, + SUM(a) OVER ( + PARTITION BY b + ORDER BY a + ROWS BETWEEN CURRENT ROW AND CURRENT ROW) AS sum, + MAX(b) OVER ( + PARTITION BY b + ORDER BY a + RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS max +FROM simple_table +GROUP BY CUBE( (a), (b) ) +ORDER BY 1, 2, 3, 4, 5 +-- +ARRAY>[known order: + 
{NULL, NULL, 6, NULL, NULL}, + {NULL, NULL, 6, NULL, NULL}, + {NULL, NULL, 6, NULL, NULL}, + {NULL, NULL, 6, NULL, NULL}, + {NULL, "bar", 6, NULL, "bar"}, + {NULL, "foo", 6, NULL, "foo"}, + {10, NULL, 3, 10, NULL}, + {10, "bar", 3, 10, "bar"}, + {10, "foo", 3, 10, "foo"}, + {123, NULL, 3, 123, NULL}, + {123, "bar", 3, 123, "bar"}, + {123, "foo", 3, 123, "foo"} +] + +== + +[name=cube_mixed_column_alias_and_ordinal_index] +SELECT a, b, a+1, UPPER(b) AS ub, COUNT(*) +FROM simple_table +GROUP BY CUBE(a, 1, 2, 2, ub) +ORDER BY 1, 2, 3, 4, 5 +-- +ARRAY>[known order: + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, NULL, 5}, + {NULL, NULL, NULL, "BAR", 2}, + {NULL, NULL, NULL, "FOO", 2}, + {NULL, "bar", NULL, NULL, 2}, + {NULL, "bar", NULL, NULL, 2}, + {NULL, "bar", NULL, NULL, 2}, + {NULL, "bar", NULL, "BAR", 2}, + {NULL, "bar", NULL, "BAR", 2}, + {NULL, "bar", NULL, "BAR", 2}, + {NULL, "foo", NULL, NULL, 2}, + {NULL, "foo", NULL, NULL, 2}, + {NULL, "foo", NULL, NULL, 2}, + {NULL, "foo", NULL, "FOO", 2}, + {NULL, "foo", NULL, "FOO", 2}, + {NULL, "foo", 
NULL, "FOO", 2}, + {10, NULL, 11, NULL, 2}, + {10, NULL, 11, NULL, 2}, + {10, NULL, 11, NULL, 2}, + {10, NULL, 11, "BAR", 1}, + {10, NULL, 11, "BAR", 1}, + {10, NULL, 11, "BAR", 1}, + {10, NULL, 11, "FOO", 1}, + {10, NULL, 11, "FOO", 1}, + {10, NULL, 11, "FOO", 1}, + {10, "bar", 11, NULL, 1}, + {10, "bar", 11, NULL, 1}, + {10, "bar", 11, NULL, 1}, + {10, "bar", 11, NULL, 1}, + {10, "bar", 11, NULL, 1}, + {10, "bar", 11, NULL, 1}, + {10, "bar", 11, NULL, 1}, + {10, "bar", 11, NULL, 1}, + {10, "bar", 11, NULL, 1}, + {10, "bar", 11, "BAR", 1}, + {10, "bar", 11, "BAR", 1}, + {10, "bar", 11, "BAR", 1}, + {10, "bar", 11, "BAR", 1}, + {10, "bar", 11, "BAR", 1}, + {10, "bar", 11, "BAR", 1}, + {10, "bar", 11, "BAR", 1}, + {10, "bar", 11, "BAR", 1}, + {10, "bar", 11, "BAR", 1}, + {10, "foo", 11, NULL, 1}, + {10, "foo", 11, NULL, 1}, + {10, "foo", 11, NULL, 1}, + {10, "foo", 11, NULL, 1}, + {10, "foo", 11, NULL, 1}, + {10, "foo", 11, NULL, 1}, + {10, "foo", 11, NULL, 1}, + {10, "foo", 11, NULL, 1}, + {10, "foo", 11, NULL, 1}, + {10, "foo", 11, "FOO", 1}, + {10, "foo", 11, "FOO", 1}, + {10, "foo", 11, "FOO", 1}, + {10, "foo", 11, "FOO", 1}, + {10, "foo", 11, "FOO", 1}, + {10, "foo", 11, "FOO", 1}, + {10, "foo", 11, "FOO", 1}, + {10, "foo", 11, "FOO", 1}, + {10, "foo", 11, "FOO", 1}, + {123, NULL, 124, NULL, 2}, + {123, NULL, 124, NULL, 2}, + {123, NULL, 124, NULL, 2}, + {123, NULL, 124, "BAR", 1}, + {123, NULL, 124, "BAR", 1}, + {123, NULL, 124, "BAR", 1}, + {123, NULL, 124, "FOO", 1}, + {123, NULL, 124, "FOO", 1}, + {123, NULL, 124, "FOO", 1}, + {123, "bar", 124, NULL, 1}, + {123, "bar", 124, NULL, 1}, + {123, "bar", 124, NULL, 1}, + {123, "bar", 124, NULL, 1}, + {123, "bar", 124, NULL, 1}, + {123, "bar", 124, NULL, 1}, + {123, "bar", 124, NULL, 1}, + {123, "bar", 124, NULL, 1}, + {123, "bar", 124, NULL, 1}, + {123, "bar", 124, "BAR", 1}, + {123, "bar", 124, "BAR", 1}, + {123, "bar", 124, "BAR", 1}, + {123, "bar", 124, "BAR", 1}, + {123, "bar", 124, "BAR", 1}, + {123, "bar", 
124, "BAR", 1}, + {123, "bar", 124, "BAR", 1}, + {123, "bar", 124, "BAR", 1}, + {123, "bar", 124, "BAR", 1}, + {123, "foo", 124, NULL, 1}, + {123, "foo", 124, NULL, 1}, + {123, "foo", 124, NULL, 1}, + {123, "foo", 124, NULL, 1}, + {123, "foo", 124, NULL, 1}, + {123, "foo", 124, NULL, 1}, + {123, "foo", 124, NULL, 1}, + {123, "foo", 124, NULL, 1}, + {123, "foo", 124, NULL, 1}, + {123, "foo", 124, "FOO", 1}, + {123, "foo", 124, "FOO", 1}, + {123, "foo", 124, "FOO", 1}, + {123, "foo", 124, "FOO", 1}, + {123, "foo", 124, "FOO", 1}, + {123, "foo", 124, "FOO", 1}, + {123, "foo", 124, "FOO", 1}, + {123, "foo", 124, "FOO", 1}, + {123, "foo", 124, "FOO", 1} +] + +== + +[name=cube_with_constant_select_list] +SELECT "abc", 111, "ddd" +FROM simple_table +GROUP BY GROUPING SETS(a, b) +-- +ARRAY>[unknown order: + {"abc", 111, "ddd"}, + {"abc", 111, "ddd"}, + {"abc", 111, "ddd"}, + {"abc", 111, "ddd"}, + {"abc", 111, "ddd"}, + {"abc", 111, "ddd"} +] + +== + +[name=cube_with_post_aggregate_clauses] +[required_features=V_1_3_QUALIFY,V_1_4_GROUPING_SETS,ANALYTIC_FUNCTIONS] +SELECT a, b, c, COUNT(*) OVER() AS cnt +FROM simple_table +GROUP BY CUBE(a, b, c) +QUALIFY a > 10 +ORDER BY a, b, c, cnt +-- +ARRAY>[known order: + {123, NULL, NULL, 30}, + {123, NULL, false, 30}, + {123, NULL, true, 30}, + {123, "bar", NULL, 30}, + {123, "bar", true, 30}, + {123, "foo", NULL, 30}, + {123, "foo", false, 30} +] + +== + +[name=cube_with_having_clauses] +[required_features=V_1_4_GROUPING_SETS] +SELECT a, b, COUNT(*) +FROM simple_table +GROUP BY CUBE(a, b) +HAVING COUNT(*) > 1 +ORDER BY 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, 5}, + {NULL, "bar", 2}, + {NULL, "foo", 2}, + {10, NULL, 2}, + {123, NULL, 2} +] + +== + +[name=cube_with_qualify_clauses] +[required_features=V_1_3_QUALIFY,V_1_4_GROUPING_SETS,ANALYTIC_FUNCTIONS,V_1_4_GROUPING_BUILTIN] +SELECT a, b, COUNT(*) OVER() +FROM simple_table +GROUP BY CUBE(a, b) +QUALIFY GROUPING(a) = 0 +ORDER BY 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, 
12}, + {NULL, NULL, 12}, + {10, NULL, 12}, + {10, "bar", 12}, + {10, "foo", 12}, + {123, NULL, 12}, + {123, "bar", 12}, + {123, "foo", 12} +] + +== + +[name=cube_with_pivot] +[required_features=V_1_3_PIVOT,V_1_4_GROUPING_SETS] +SELECT a, foo, bar, COUNT(*) AS cnt +FROM simple_table PIVOT(SUM(d) FOR b IN ('foo', 'bar')) +GROUP BY CUBE(a, foo, bar) +ORDER BY a, foo, bar, cnt +-- +ARRAY>[known order: + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 2}, + {NULL, NULL, NULL, 3}, + {NULL, NULL, NULL, 4}, + {NULL, NULL, NULL, 5}, + {NULL, NULL, 1.23456e-65, 1}, + {NULL, NULL, 1.23456e-65, 1}, + {NULL, nan, NULL, 1}, + {NULL, nan, NULL, 1}, + {NULL, 12, NULL, 1}, + {NULL, 12, NULL, 1}, + {10, NULL, NULL, 1}, + {10, NULL, NULL, 1}, + {10, NULL, NULL, 2}, + {10, NULL, NULL, 2}, + {10, 12, NULL, 1}, + {10, 12, NULL, 1}, + {123, NULL, NULL, 1}, + {123, NULL, NULL, 1}, + {123, NULL, NULL, 2}, + {123, NULL, 1.23456e-65, 1}, + {123, NULL, 1.23456e-65, 1}, + {123, nan, NULL, 1}, + {123, nan, NULL, 1} +] + +NOTE: Reference implementation reports non-determinism. 
+ +== +[name=cube_with_unpivot] +[required_features=V_1_3_UNPIVOT,V_1_4_GROUPING_SETS] +SELECT a, e, f, COUNT(*) AS cnt +FROM simple_table UNPIVOT(e FOR f in (b)) +GROUP BY CUBE(a, e, f) +ORDER BY a, e, f, cnt +-- +ARRAY>[known order: + {NULL, NULL, NULL, 4}, + {NULL, NULL, "b", 4}, + {NULL, "bar", NULL, 2}, + {NULL, "bar", "b", 2}, + {NULL, "foo", NULL, 2}, + {NULL, "foo", "b", 2}, + {10, NULL, NULL, 2}, + {10, NULL, "b", 2}, + {10, "bar", NULL, 1}, + {10, "bar", "b", 1}, + {10, "foo", NULL, 1}, + {10, "foo", "b", 1}, + {123, NULL, NULL, 2}, + {123, NULL, "b", 2}, + {123, "bar", NULL, 1}, + {123, "bar", "b", 1}, + {123, "foo", NULL, 1}, + {123, "foo", "b", 1} +] + +== + +[name=cube_and_with_recursive] +[required_features=V_1_3_WITH_RECURSIVE,V_1_4_GROUPING_SETS] +WITH RECURSIVE + CTE_1 AS ( + ( + SELECT iteration as it, iteration+1 AS it1 + FROM UNNEST([1, 2, 3, 4]) AS iteration + GROUP BY CUBE(iteration, iteration+1) + ) + UNION ALL ( + SELECT it+1 AS it, it1+1 AS it1 FROM CTE_1 WHERE it < 3) + ) +SELECT it, it1 FROM CTE_1 +ORDER BY 1, 2 ASC +-- +ARRAY>[known order: + {NULL, NULL}, + {NULL, NULL}, + {NULL, NULL}, + {NULL, NULL}, + {NULL, NULL}, + {1, 2}, + {1, 2}, + {2, 3}, + {2, 3}, + {2, 3}, + {2, 3}, + {3, 4}, + {3, 4}, + {3, 4}, + {3, 4}, + {3, 4}, + {3, 4}, + {4, 5}, + {4, 5} +] +== + +[required_features=V_1_4_GROUPING_SETS] +[name=cube_expression_match] +SELECT a, a+1, SUM(key) +FROM simple_table +GROUP BY CUBE(a, a+1) +ORDER BY 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 5}, + {NULL, NULL, 6}, + {NULL, NULL, 15}, + {10, 11, 6}, + {10, 11, 6}, + {123, 124, 5}, + {123, 124, 5} +] + +== +[required_features=V_1_4_GROUPING_SETS] +[name=cube_multi_column_expression_match] +SELECT a, a+1, SUM(key) +FROM simple_table +GROUP BY CUBE(a, (a, a+1), a+1) +ORDER BY 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 4}, + {NULL, NULL, 4}, 
+ {NULL, NULL, 4}, + {NULL, NULL, 5}, + {NULL, NULL, 6}, + {NULL, NULL, 15}, + {10, 11, 6}, + {10, 11, 6}, + {10, 11, 6}, + {10, 11, 6}, + {10, 11, 6}, + {10, 11, 6}, + {123, 124, 5}, + {123, 124, 5}, + {123, 124, 5}, + {123, 124, 5}, + {123, 124, 5}, + {123, 124, 5} +] + +== + +[required_features=V_1_4_GROUPING_SETS] +[name=cube_with_subquery] +SELECT SUM(key) +FROM simple_table +GROUP BY CUBE( + (SELECT COUNT(*) FROM simple_table), + ((SELECT SUM(key) FROM simple_table), (SELECT MAX(key) FROM simple_table)) +) +ORDER BY 1 +-- +ARRAY>[known order:{15}, {15}, {15}, {15}] + +== + +[name=cube_with_keyword_column_name] +[required_features=V_1_4_GROUPING_SETS] +WITH T AS ( + SELECT 10 AS `GROUPING SETS`, "bar" AS `ROLLUP`, true AS `CUBE` UNION ALL + SELECT 11 AS `GROUPING SETS`, "bar" AS `ROLLUP`, false AS `CUBE` +) +SELECT `GROUPING SETS`, `ROLLUP`, `CUBE`, COUNT(*) +FROM T +GROUP BY CUBE(`GROUPING SETS`, `ROLLUP`, `CUBE`) +ORDER BY `GROUPING SETS`, `ROLLUP`, `CUBE`, COUNT(*) +-- +ARRAY>[known order: + {NULL, NULL, NULL, 2}, + {NULL, NULL, false, 1}, + {NULL, NULL, true, 1}, + {NULL, "bar", NULL, 2}, + {NULL, "bar", false, 1}, + {NULL, "bar", true, 1}, + {10, NULL, NULL, 1}, + {10, NULL, true, 1}, + {10, "bar", NULL, 1}, + {10, "bar", true, 1}, + {11, NULL, NULL, 1}, + {11, NULL, false, 1}, + {11, "bar", NULL, 1}, + {11, "bar", false, 1} +] + +== + +[required_features=GROUP_BY_ROLLUP,ANALYTIC_FUNCTIONS] +[name=rollup_with_analytic_functions] +SELECT a, b, + COUNT(*) OVER ( + PARTITION BY a + ORDER BY a + ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS count, + SUM(a) OVER ( + PARTITION BY b + ORDER BY a + ROWS BETWEEN CURRENT ROW AND CURRENT ROW) AS sum, + MAX(b) OVER ( + PARTITION BY b + ORDER BY a + RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS max +FROM simple_table +GROUP BY ROLLUP(a, b, a) +ORDER BY 1, 2, 3, 4, 5 +-- +ARRAY>[known order: + {NULL, NULL, 4, NULL, NULL}, + {NULL, NULL, 4, NULL, NULL}, + {NULL, NULL, 4, NULL, NULL}, + {NULL, NULL, 4, 
NULL, NULL}, + {10, NULL, 5, 10, NULL}, + {10, "bar", 5, 10, "bar"}, + {10, "bar", 5, 10, "bar"}, + {10, "foo", 5, 10, "foo"}, + {10, "foo", 5, 10, "foo"}, + {123, NULL, 5, 123, NULL}, + {123, "bar", 5, 123, "bar"}, + {123, "bar", 5, 123, "bar"}, + {123, "foo", 5, 123, "foo"}, + {123, "foo", 5, 123, "foo"} +] +== + +[name=rollup_with_qualify_clauses] +[required_features=GROUP_BY_ROLLUP,ANALYTIC_FUNCTIONS,V_1_3_QUALIFY,V_1_4_GROUPING_BUILTIN] +SELECT a, UPPER(b), c, COUNT(*) OVER() AS cnt +FROM simple_table +GROUP BY ROLLUP(a, UPPER(b), c) +QUALIFY GROUPING(a) = 0 +ORDER BY 1, 2, 3, 4 +-- + +ARRAY>[known order: + {NULL, NULL, NULL, 14}, + {NULL, NULL, NULL, 14}, + {NULL, NULL, NULL, 14}, + {10, NULL, NULL, 14}, + {10, "BAR", NULL, 14}, + {10, "BAR", NULL, 14}, + {10, "FOO", NULL, 14}, + {10, "FOO", true, 14}, + {123, NULL, NULL, 14}, + {123, "BAR", NULL, 14}, + {123, "BAR", true, 14}, + {123, "FOO", NULL, 14}, + {123, "FOO", false, 14} +] +== + +[name=rollup_with_having_clause] +[required_features=GROUP_BY_ROLLUP,V_1_4_GROUPING_BUILTIN] +SELECT a, UPPER(b), c, COUNT(*) AS cnt +FROM simple_table +GROUP BY ROLLUP(a, UPPER(b), c) +HAVING GROUPING(a) = 0 +ORDER BY 1, 2, 3, 4 +-- + +ARRAY>[known order: + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {10, NULL, NULL, 2}, + {10, "BAR", NULL, 1}, + {10, "BAR", NULL, 1}, + {10, "FOO", NULL, 1}, + {10, "FOO", true, 1}, + {123, NULL, NULL, 2}, + {123, "BAR", NULL, 1}, + {123, "BAR", true, 1}, + {123, "FOO", NULL, 1}, + {123, "FOO", false, 1} +] +== + +[name=rollup_with_all_post_aggregate_clauses] +[required_features=GROUP_BY_ROLLUP,ANALYTIC_FUNCTIONS,V_1_3_QUALIFY] +SELECT a, UPPER(b), c, COUNT(*) OVER() AS cnt +FROM simple_table +GROUP BY ROLLUP(a, UPPER(b), c) +HAVING a IS NOT NULL +QUALIFY c +ORDER BY 1, 2, 3, 4 +-- + +ARRAY>[known order: + {10, "FOO", true, 10}, + {123, "BAR", true, 10} +] +== + +[name=rollup_with_constant_select_list] +[required_features=GROUP_BY_ROLLUP] +SELECT "a", true, 
1.23 +FROM simple_table +GROUP BY ROLLUP(a, b) +-- +ARRAY>[unknown order: + {"a", true, 1.23}, + {"a", true, 1.23}, + {"a", true, 1.23}, + {"a", true, 1.23}, + {"a", true, 1.23}, + {"a", true, 1.23}, + {"a", true, 1.23}, + {"a", true, 1.23}, + {"a", true, 1.23} +] +== + +[name=rollup_with_keyword_column_name] +[required_features=GROUP_BY_ROLLUP] +WITH T AS ( + SELECT 10 AS `GROUPING SETS`, "bar" AS `ROLLUP`, true AS `CUBE` UNION ALL + SELECT 11 AS `GROUPING SETS`, "bar" AS `ROLLUP`, false AS `CUBE` +) +SELECT `GROUPING SETS`, `ROLLUP`, `CUBE`, COUNT(*) +FROM T +GROUP BY ROLLUP(`GROUPING SETS`, `ROLLUP`, `CUBE`) +ORDER BY `GROUPING SETS`, `ROLLUP`, `CUBE`, COUNT(*) +-- +ARRAY>[known order: + {NULL, NULL, NULL, 2}, + {10, NULL, NULL, 1}, + {10, "bar", NULL, 1}, + {10, "bar", true, 1}, + {11, NULL, NULL, 1}, + {11, "bar", NULL, 1}, + {11, "bar", false, 1} +] +== + +[name=rollup_single_columns_with_pivot] +[required_features=V_1_3_PIVOT,GROUP_BY_ROLLUP] +SELECT a+1, foo, bar, COUNT(*) AS cnt +FROM simple_table PIVOT(SUM(d) FOR b IN ('foo', 'bar')) +GROUP BY ROLLUP(a+1, foo, bar) +ORDER BY 1, foo, bar, cnt +-- + +ARRAY>[known order: + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 5}, + {11, NULL, NULL, 1}, + {11, NULL, NULL, 1}, + {11, NULL, NULL, 2}, + {11, 12, NULL, 1}, + {11, 12, NULL, 1}, + {124, NULL, NULL, 1}, + {124, NULL, NULL, 2}, + {124, NULL, 1.23456e-65, 1}, + {124, nan, NULL, 1}, + {124, nan, NULL, 1} +] + +NOTE: Reference implementation reports non-determinism. +== + +# Note: the GroupingSetRewriter will be triggered only when V_1_4_GROUPING_SETS +# is enabled. For ROLLUP test cases testing rewriter conflicts, we intentionally +# test it w/o V_1_4_GROUPING_SETS being enabled. 
+[name=rollup_multi_columns_with_pivot] +[required_features=V_1_3_PIVOT,GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +SELECT a+1, foo, bar, COUNT(*) AS cnt +FROM simple_table PIVOT(SUM(d) FOR b IN ('foo', 'bar')) +GROUP BY ROLLUP(a+1, (foo, bar)) +ORDER BY 1, foo, bar, cnt +-- + +ARRAY>[known order: + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 1}, + {NULL, NULL, NULL, 5}, + {11, NULL, NULL, 1}, + {11, NULL, NULL, 2}, + {11, 12, NULL, 1}, + {124, NULL, NULL, 2}, + {124, NULL, 1.23456e-65, 1}, + {124, nan, NULL, 1} +] + +NOTE: Reference implementation reports non-determinism. +== + +[name=rollup_single_columns_with_unpivot] +[required_features=V_1_3_UNPIVOT,GROUP_BY_ROLLUP] +SELECT a+1, e, f, COUNT(*) AS cnt +FROM simple_table UNPIVOT(e FOR f in (b)) +GROUP BY ROLLUP(a+1, e, f) +ORDER BY 1, e, f, cnt +-- + +ARRAY>[known order: + {NULL, NULL, NULL, 4}, + {11, NULL, NULL, 2}, + {11, "bar", NULL, 1}, + {11, "bar", "b", 1}, + {11, "foo", NULL, 1}, + {11, "foo", "b", 1}, + {124, NULL, NULL, 2}, + {124, "bar", NULL, 1}, + {124, "bar", "b", 1}, + {124, "foo", NULL, 1}, + {124, "foo", "b", 1} +] +== + +[name=rollup_multi_columns_with_unpivot] +[required_features=V_1_3_UNPIVOT,GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +SELECT a+1, e, f, COUNT(*) AS cnt +FROM simple_table UNPIVOT(e FOR f in (b)) +GROUP BY ROLLUP(a+1, (e, f)) +ORDER BY 1, e, f, cnt +-- + +ARRAY>[known order: + {NULL, NULL, NULL, 4}, + {11, NULL, NULL, 2}, + {11, "bar", "b", 1}, + {11, "foo", "b", 1}, + {124, NULL, NULL, 2}, + {124, "bar", "b", 1}, + {124, "foo", "b", 1} +] +== + +[name=rollup_and_with_recursive] +[required_features=V_1_3_WITH_RECURSIVE,GROUP_BY_ROLLUP] +WITH RECURSIVE + CTE_1 AS ( + ( + SELECT iteration as it, iteration+1 AS it1 + FROM UNNEST([1, 2, 3, 4]) AS iteration + GROUP BY ROLLUP(iteration, iteration+1) + ) + UNION ALL ( + SELECT it+1 AS it, it1+1 AS it1 FROM CTE_1 WHERE it < 3) + ) +SELECT it, it1 FROM CTE_1 +ORDER BY 1, 2 ASC +-- + +ARRAY>[known order: + {NULL, NULL}, + {1, 2}, + {1, 2}, + {2, 3}, 
+ {2, 3}, + {2, 3}, + {2, 3}, + {3, 4}, + {3, 4}, + {3, 4}, + {3, 4}, + {3, 4}, + {3, 4}, + {4, 5}, + {4, 5} +] +== + +[name=grouping_func_with_regular_group_by_query] +[required_features=V_1_4_GROUPING_BUILTIN] +SELECT a, UPPER(b), GROUPING(a), GROUPING(UPPER(b)) +FROM simple_table +GROUP BY a, UPPER(b) +ORDER BY 3, 4, 1, 2 +-- + +ARRAY>[known order: + {NULL, NULL, 0, 0}, + {10, "BAR", 0, 0}, + {10, "FOO", 0, 0}, + {123, "BAR", 0, 0}, + {123, "FOO", 0, 0} +] +== + +[name=grouping_func_with_single_column_rollup] +[required_features=V_1_4_GROUPING_BUILTIN,GROUP_BY_ROLLUP] +SELECT a, UPPER(b), GROUPING(a), GROUPING(UPPER(b)) +FROM simple_table +GROUP BY ROLLUP(a, UPPER(b)) +ORDER BY 3, 4, 1, 2 +-- + +ARRAY>[known order: + {NULL, NULL, 0, 0}, + {10, "BAR", 0, 0}, + {10, "FOO", 0, 0}, + {123, "BAR", 0, 0}, + {123, "FOO", 0, 0}, + {NULL, NULL, 0, 1}, + {10, NULL, 0, 1}, + {123, NULL, 0, 1}, + {NULL, NULL, 1, 1} +] +== + +[name=grouping_func_with_multi_column_rollup] +[required_features=V_1_4_GROUPING_BUILTIN,GROUP_BY_ROLLUP,V_1_4_GROUPING_SETS] +SELECT a, UPPER(b), c, GROUPING(a), GROUPING(UPPER(b)), GROUPING(c) +FROM simple_table +GROUP BY ROLLUP(a, (UPPER(b), c)) +ORDER BY 4, 5, 6, 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, NULL, 0, 0, 0}, + {10, "BAR", NULL, 0, 0, 0}, + {10, "FOO", true, 0, 0, 0}, + {123, "BAR", true, 0, 0, 0}, + {123, "FOO", false, 0, 0, 0}, + {NULL, NULL, NULL, 0, 1, 1}, + {10, NULL, NULL, 0, 1, 1}, + {123, NULL, NULL, 0, 1, 1}, + {NULL, NULL, NULL, 1, 1, 1} +] +== + +[name=grouping_func_with_cube] +[required_features=V_1_4_GROUPING_BUILTIN,V_1_4_GROUPING_SETS] +SELECT a, UPPER(b), c, GROUPING(a), GROUPING(UPPER(b)), GROUPING(c) +FROM simple_table +GROUP BY CUBE(a, (UPPER(b), c)) +ORDER BY 4, 5, 6, 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, NULL, 0, 0, 0}, + {10, "BAR", NULL, 0, 0, 0}, + {10, "FOO", true, 0, 0, 0}, + {123, "BAR", true, 0, 0, 0}, + {123, "FOO", false, 0, 0, 0}, + {NULL, NULL, NULL, 0, 1, 1}, + {10, NULL, NULL, 0, 1, 1}, + 
{123, NULL, NULL, 0, 1, 1}, + {NULL, NULL, NULL, 1, 0, 0}, + {NULL, "BAR", NULL, 1, 0, 0}, + {NULL, "BAR", true, 1, 0, 0}, + {NULL, "FOO", false, 1, 0, 0}, + {NULL, "FOO", true, 1, 0, 0}, + {NULL, NULL, NULL, 1, 1, 1} +] +== + +[name=grouping_func_with_grouping_sets] +[required_features=V_1_4_GROUPING_BUILTIN,V_1_4_GROUPING_SETS] +SELECT a, UPPER(b), c, GROUPING(a), GROUPING(UPPER(b)), GROUPING(c) +FROM simple_table +GROUP BY GROUPING SETS(a, (UPPER(b), c), ()) +ORDER BY 4, 5, 6, 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, NULL, 0, 1, 1}, + {10, NULL, NULL, 0, 1, 1}, + {123, NULL, NULL, 0, 1, 1}, + {NULL, NULL, NULL, 1, 0, 0}, + {NULL, "BAR", NULL, 1, 0, 0}, + {NULL, "BAR", true, 1, 0, 0}, + {NULL, "FOO", false, 1, 0, 0}, + {NULL, "FOO", true, 1, 0, 0}, + {NULL, NULL, NULL, 1, 1, 1} +] +== + +[name=grouping_func_in_having_clause] +[required_features=V_1_4_GROUPING_BUILTIN,V_1_4_GROUPING_SETS] +SELECT a, UPPER(b), c, GROUPING(a), GROUPING(UPPER(b)), GROUPING(c) +FROM simple_table +GROUP BY GROUPING SETS(a, (UPPER(b), c), ()) +\-- Extract the result set only from grouping set (a) +HAVING GROUPING(a) = 0 AND GROUPING(UPPER(b)) = 1 AND GROUPING(c) = 1 +ORDER BY 4, 5, 6, 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, NULL, 0, 1, 1}, + {10, NULL, NULL, 0, 1, 1}, + {123, NULL, NULL, 0, 1, 1} +] +== + +[name=grouping_func_in_order_by_clause] +[required_features=V_1_4_GROUPING_BUILTIN,V_1_4_GROUPING_SETS] +SELECT a, UPPER(b), c, GROUPING(a), GROUPING(UPPER(b)), GROUPING(c) +FROM simple_table +GROUP BY GROUPING SETS(a, (UPPER(b), c), ()) +ORDER BY 1, 2, 3, GROUPING(a), GROUPING(UPPER(b)), GROUPING(c) +-- +ARRAY>[known order: + {NULL, NULL, NULL, 0, 1, 1}, + {NULL, NULL, NULL, 1, 0, 0}, + {NULL, NULL, NULL, 1, 1, 1}, + {NULL, "BAR", NULL, 1, 0, 0}, + {NULL, "BAR", true, 1, 0, 0}, + {NULL, "FOO", false, 1, 0, 0}, + {NULL, "FOO", true, 1, 0, 0}, + {10, NULL, NULL, 0, 1, 1}, + {123, NULL, NULL, 0, 1, 1} +] +== + +[name=grouping_func_in_qualify_clause] 
+[required_features=V_1_4_GROUPING_BUILTIN,V_1_4_GROUPING_SETS,ANALYTIC_FUNCTIONS,V_1_3_QUALIFY] +SELECT a, b, COUNT(*) OVER() +FROM simple_table +GROUP BY GROUPING SETS(a, b) +\-- Extract the result generated by the grouping set (b) +QUALIFY GROUPING(a) = 1 +ORDER BY 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, 6}, + {NULL, "bar", 6}, + {NULL, "foo", 6} +] +== + +[name=grouping_func_with_alias] +[required_features=V_1_4_GROUPING_BUILTIN,V_1_4_GROUPING_SETS] +SELECT a AS x, UPPER(b) AS y, c AS z +FROM simple_table +GROUP BY GROUPING SETS(x, (y, z), ()) +HAVING GROUPING(x) = 0 AND GROUPING(y) = 1 AND GROUPING(z) = 1 +ORDER BY 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, NULL}, + {10, NULL, NULL}, + {123, NULL, NULL} +] +== + +[name=grouping_func_inside_another_func] +[required_features=V_1_4_GROUPING_BUILTIN,V_1_4_GROUPING_SETS] +SELECT a, UPPER(b), c, GROUPING(a) - 1, GROUPING(UPPER(b)) + 1, ABS(GROUPING(c)) +FROM simple_table +GROUP BY GROUPING SETS(a, (UPPER(b), c), ()) +ORDER BY 4, 5, 6, 1, 2, 3 +-- +ARRAY>[known order: + {NULL, NULL, NULL, -1, 2, 1}, + {10, NULL, NULL, -1, 2, 1}, + {123, NULL, NULL, -1, 2, 1}, + {NULL, NULL, NULL, 0, 1, 0}, + {NULL, "BAR", NULL, 0, 1, 0}, + {NULL, "BAR", true, 0, 1, 0}, + {NULL, "FOO", false, 0, 1, 0}, + {NULL, "FOO", true, 0, 1, 0}, + {NULL, NULL, NULL, 0, 2, 1} +] +== + +[name=grouping_func_with_collation_columns] +[required_features=V_1_4_GROUPING_BUILTIN,V_1_4_GROUPING_SETS] +SELECT + UPPER(col_ci) AS col1, col_binary AS col_2, col AS col_3, + GROUPING(col_ci), GROUPING(col_binary), GROUPING(col) +FROM simple_collation_table +GROUP BY GROUPING SETS(col_ci, (col_binary, col)) +ORDER BY 1, 2, 3, 4, 5, 6 + +-- +ARRAY>[known order: + {NULL, "B", 3, 1, 0, 0}, + {NULL, "a", 1, 1, 0, 0}, + {NULL, "ana", 4, 1, 0, 0}, + {NULL, "b", 2, 1, 0, 0}, + {NULL, "banana", 5, 1, 0, 0}, + {"A", NULL, NULL, 0, 1, 1}, + {"ANA", NULL, NULL, 0, 1, 1}, + {"B", NULL, NULL, 0, 1, 1}, + {"B", NULL, NULL, 0, 1, 1}, + {"BANANA", NULL, NULL, 0, 1, 
1} +] + +== + +[name=grouping_sets_with_int_literals_and_ordinal_columns] +[required_features=V_1_4_GROUPING_BUILTIN,V_1_4_GROUPING_SETS] +# 1 matches to the first column key, instead of the constant expression 1. +WITH KeyValue AS ( + SELECT 1 AS key, 'a' AS value +) +SELECT key, 1, GROUPING(key) AS grp_key +FROM KeyValue +GROUP BY GROUPING SETS(key, 2, 1) +ORDER BY key +-- +ARRAY>[known order: + {NULL, 1, 1}, + {1, NULL, 0}, + {1, NULL, 0} +] +== + +[name=grouping_sets_with_cast_int_literals_and_ordinal_columns] +[required_features=V_1_4_GROUPING_BUILTIN,V_1_4_GROUPING_SETS] +# 1 matches to the first column key, instead of the constant expression +# CAST(true AS INT64). +WITH KeyValue AS ( + SELECT 1 AS key, 'a' AS value +) +SELECT key, CAST(true AS INT64), GROUPING(key) AS grp_key +FROM KeyValue +GROUP BY GROUPING SETS(key, 2, 1) +ORDER BY key +-- +ARRAY>[known order: + {NULL, 1, 1}, + {1, NULL, 0}, + {1, NULL, 0} +] diff --git a/zetasql/compliance/testdata/json_queries.test b/zetasql/compliance/testdata/json_queries.test index e914acaff..44ecb4ada 100644 --- a/zetasql/compliance/testdata/json_queries.test +++ b/zetasql/compliance/testdata/json_queries.test @@ -245,7 +245,7 @@ ARRAY>[{1}] [required_features=JSON_TYPE,JSON_VALUE_EXTRACTION_FUNCTIONS] SELECT int64(JSON '10.1'); -- -ERROR: generic::out_of_range: The provided JSON number: 10.1 cannot be converted to an integer +ERROR: generic::out_of_range: The provided JSON number: 10.1 cannot be converted to an int64 == [name=json_int64_double_without_fractional_part] [required_features=JSON_TYPE,JSON_VALUE_EXTRACTION_FUNCTIONS] @@ -819,3 +819,52 @@ ARRAY>[{"1"}] SELECT JSON_REMOVE(JSON '{"a":1, "b":2}', TRIM(" $.a"), TRIM(" $.b"), "$.c") -- ARRAY>[{{}}] +== +[name=json_query_lax_basic] +[required_features=JSON_TYPE,JSON_QUERY_LAX] +SELECT JSON_QUERY(JSON '{"a":1}', "lax $.a") +-- +ARRAY>[{[1]}] +== +[name=json_query_lax_recursive_basic] +[required_features=JSON_TYPE,JSON_QUERY_LAX] +SELECT JSON_QUERY(JSON 
'{"a":1}', "lax recursive $.a") +-- +ARRAY>[{[1]}] +== +[name=json_query_lax_mixed_nestedness] +[required_features=JSON_TYPE,JSON_QUERY_LAX] +SELECT JSON_QUERY(JSON '{"a":[{"b":1}, [{"b":2}], [[{"b":3}]]]}', "LAx $.a.b") +-- +ARRAY>[{[1]}] +== +[name=json_query_lax_recursive_mixed_nestedness] +[required_features=JSON_TYPE,JSON_QUERY_LAX] +SELECT JSON_QUERY(JSON '{"a":[{"b":1}, [{"b":2}], [[{"b":3}]]]}', "Recursive lax $.a.b") +-- +ARRAY>[{[1,2,3]}] +== +[name=json_query_lax_invalid_keywords_lax_duplicate] +[required_features=JSON_TYPE,JSON_QUERY_LAX] +SELECT JSON_QUERY(JSON '{"a":1}', "lax LAX $.a.b") +-- +ERROR: generic::out_of_range: JSONPath must start with zero or more unique modifiers followed by '$' +== +[name=json_query_lax_invalid_keywords_recursive_without_lax] +[required_features=JSON_TYPE,JSON_QUERY_LAX] +SELECT JSON_QUERY(JSON '{"a":1}', "recursive $.a.b") +-- +ERROR: generic::out_of_range: JSONPath has an invalid combination of modifiers. The 'lax' modifier must be included if 'recursive' is specified. 
+== +[name=json_query_lax_invalid_keywords_recursive_duplicate] +[required_features=JSON_TYPE,JSON_QUERY_LAX] +SELECT JSON_QUERY(JSON '{"a":1}', "recursive lax RECURSIVE $.a.b") +-- +ERROR: generic::out_of_range: JSONPath must start with zero or more unique modifiers followed by '$' +== +[name=json_query_lax_no_feature_option] +[required_features=JSON_TYPE] +SELECT JSON_QUERY(JSON '{"a":1}', "lax $.a.b") +-- +ERROR: generic::out_of_range: JSONPath must start with '$' + diff --git a/zetasql/compliance/testdata/like_all.test b/zetasql/compliance/testdata/like_all.test index a6a171515..a19799c83 100644 --- a/zetasql/compliance/testdata/like_all.test +++ b/zetasql/compliance/testdata/like_all.test @@ -338,7 +338,7 @@ ARRAY>[ ] == -[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] [name=not_like_all_with_collation_ci_test_with_null_elements] # Test "NOT LIKE ALL" with collation for NULL elements on either LHS or RHS list. SELECT @@ -347,9 +347,13 @@ SELECT collate(NULL, 'und:ci') NOT LIKE ALL ('abc', 'abc'), collate('abc', 'und:ci') NOT LIKE ALL (NULL, NULL), collate('abc', 'und:ci') NOT LIKE ALL (NULL, 'ABC'), + collate('abc', 'und:ci') NOT LIKE ALL ('xyz', 'ABC'), + collate('abc', 'und:ci') NOT LIKE ALL ('xyz', 'pqr'), -- # Note: Collation will be applied to LHS and all elements in RHS. 
-ARRAY>[{NULL, NULL, NULL, NULL, NULL}] +ARRAY>[ + {NULL, NULL, NULL, NULL, false, false, true} +] == [required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] @@ -366,18 +370,21 @@ SELECT ARRAY>[{true, true, true, false, false}] == -[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] [name=not_like_all_with_collation_ci_on_lhs_element] # Test "NOT LIKE ALL" with LHS wrapped in collation. SELECT collate('GooGle', 'und:ci') NOT LIKE ALL ('goo%', 'google'), collate('GooGle', 'und:ci') NOT LIKE ALL ('%goog%', 'GooGLE'), collate('GooGle', 'und:ci') NOT LIKE ALL ('%goO%', collate('GOOglE', 'und:ci')), - collate('GooGle', 'und:ci') NOT LIKE ALL ('%xxx%', collate('GOOGLE', 'und:ci')), - collate('GooGle', 'und:ci') NOT LIKE ALL ('%go%', collate('x%abc%x', 'und:ci')), + collate('GooGle', 'und:ci') NOT LIKE ALL ('%goO%', collate('xlE', 'und:ci')), + collate('GooGle', 'und:ci') NOT LIKE ALL ('%xxx%', collate('ppp', 'und:ci')), + collate('GooGle', 'und:ci') NOT LIKE ALL ('%xx%', collate('x%abc%x', 'und:ci')), -- # Note: Collation will be applied to LHS and all elements in RHS. -ARRAY>[{false, false, false, true, true}] +ARRAY>[ + {false, false, false, false, true, true} +] == [required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] @@ -394,18 +401,21 @@ SELECT ARRAY>[{true, true, true, false, false}] == -[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] [name=not_like_all_with_collation_ci_on_one_of_the_rhs_element] # Test "NOT LIKE ALL" with one of the elements in RHS wrapped in collation. 
SELECT 'GooGle' NOT LIKE ALL ('go%', collate('google', 'und:ci')), 'GooGle' NOT LIKE ALL (collate('%ooG%', 'und:ci'), 'GOOGLE'), - 'GooGle' NOT LIKE ALL (collate('%ooG%', 'und:ci'), collate('GOOGLE', 'und:ci')), + 'GooGle' NOT LIKE ALL (collate('%ooG%', 'und:ci'), collate('ppp', 'und:ci')), 'GooGle' NOT LIKE ALL ('%xxx%', collate('GOOGLE', 'und:ci')), + 'GooGle' NOT LIKE ALL ('%xxx%', collate('GooyGle', 'und:ci')), collate('defabcdef', '') NOT LIKE ALL ('%ooGs%', collate('x%go%x', 'und:ci')), -- # Note: Collation will be applied to LHS and all elements in RHS. -ARRAY>[{false, false, false, true, true}] +ARRAY>[ + {false, false, false, false, true, true} +] == [required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] @@ -428,7 +438,7 @@ ARRAY>[{true, true, false}] SELECT collate('defA\u070FbCdef', 'und:ci') NOT LIKE ALL ('%abc%', '%def%'), 'defA\u070FbCdef' NOT LIKE ALL (collate('%abc%', 'und:ci'), '%def%'), - 'defA\u070FbCdef' NOT LIKE ALL (collate('%abc%', 'und:ci'), '%xyz%'), + 'defA\u070FbCdef' NOT LIKE ALL (collate('%xxx%', 'und:ci'), '%xyz%'), -- # Note: Collation will be applied to LHS and all elements in RHS. 
ARRAY>[{false, false, true}] @@ -518,3 +528,319 @@ ARRAY>[known order: {"Value1", true, true, false, true}, {"Value2", false, false, false, false} ] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_all_string_constant_patterns_in_subquery_as_array_element] +SELECT + Value, + Value NOT LIKE ALL UNNEST([(SELECT 'Value1'), 'Value1']), + Value NOT LIKE ALL UNNEST([(SELECT 'Value1'), 'Value2']), + Value NOT LIKE ALL UNNEST([(SELECT 'Value1'), (SELECT 'Value1')]), + Value NOT LIKE ALL UNNEST([(SELECT 'Value1'), (SELECT 'Value2')]), + Value NOT LIKE ALL UNNEST([(SELECT 'Valu%1'), (SELECT 'V%lue1')]), + Value NOT LIKE ALL UNNEST([(SELECT 'Valu%3'), (SELECT 'V%lue2')]), +FROM KeyValue ORDER BY Value; +-- +ARRAY>[known order: + {NULL, NULL, NULL, NULL, NULL, NULL, NULL}, + {"Value1", false, false, false, false, false, true}, + {"Value2", true, false, true, false, true, false} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_all_array_with_collation_with_null_values] +# Test "LIKE ALL ARRAY" with collation with NULL values in LHS or RHS list. 
+SELECT + collate(NULL, 'und:ci') LIKE ALL UNNEST(CAST(NULL AS ARRAY)), + collate(NULL, 'und:ci') LIKE ALL UNNEST(ARRAY[NULL]), + collate('goog', 'und:ci') LIKE ALL UNNEST(CAST(NULL AS ARRAY)), + collate('goog', 'und:ci') LIKE ALL UNNEST(ARRAY[NULL]), + collate(NULL, 'und:ci') LIKE ALL UNNEST(['google', 'GOOGLE']), + collate(NULL, 'und:ci') LIKE ALL UNNEST(['goog']), + collate('GOOGLE', 'und:ci') LIKE ALL UNNEST(['google', NULL]), + 'GOOGLE' LIKE ALL UNNEST([collate('google', 'und:ci'), NULL]), +-- +ARRAY>[ + {true, NULL, true, NULL, NULL, NULL, NULL, NULL} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=not_like_all_array_with_collation_with_null_values] +# Test "NOT LIKE ALL ARRAY" with collation with NULL values in LHS or RHS list. +SELECT + collate(NULL, 'und:ci') NOT LIKE ALL UNNEST(CAST(NULL AS ARRAY)), + collate(NULL, 'und:ci') NOT LIKE ALL UNNEST(ARRAY[NULL]), + collate('goog', 'und:ci') NOT LIKE ALL UNNEST(CAST(NULL AS ARRAY)), + collate('goog', 'und:ci') NOT LIKE ALL UNNEST(ARRAY[NULL]), + collate(NULL, 'und:ci') NOT LIKE ALL UNNEST(['google', 'GOOGLE']), + collate(NULL, 'und:ci') NOT LIKE ALL UNNEST(['goog']), + collate('GOOGLE', 'und:ci') NOT LIKE ALL UNNEST(['google', NULL]), + 'GOOGLE' NOT LIKE ALL UNNEST([collate('google', 'und:ci'), NULL]), +-- +ARRAY>[ + {false, NULL, false, NULL, NULL, NULL, NULL, NULL} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_all_array_with_collation] +# Test "LIKE ALL ARRAY" with collation in LHS or RHS list. 
+SELECT + collate('google', 'und:ci') LIKE ALL UNNEST([NULL, 'GOOGLE']), + 'google' LIKE ALL UNNEST([NULL, collate('GOOGLE', 'und:ci')]), + collate('GooGle', 'und:ci') LIKE ALL UNNEST(['goo%', 'xxx']), + 'GooGle' LIKE ALL UNNEST([collate('goo%', 'und:ci'), 'xxx']), + collate('GooGle', 'und:ci') LIKE ALL UNNEST(['%yyy%', 'GOOGLE']), + 'GooGle' LIKE ALL UNNEST([collate('%yyy%', 'und:ci'), 'GOOGLE']), + collate('GooG', 'und:ci') LIKE ALL UNNEST(['%oO%', collate('XXX', 'und:ci')]), + collate('GooG', 'und:ci') LIKE ALL UNNEST(['%x%', collate('GOOG', 'und:ci')]), + collate('GooG', 'und:ci') LIKE ALL UNNEST(['%p%', collate('x%a%', 'und:ci')]), +-- +ARRAY>[ + {NULL, NULL, false, false, false, false, false, false, false} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=not_like_all_array_with_collation] +# Test "NOT LIKE ALL ARRAY" with collation in LHS or RHS list. +SELECT + collate('google', 'und:ci') NOT LIKE ALL UNNEST([NULL, 'GOOGLE']), + 'google' NOT LIKE ALL UNNEST([NULL, collate('GOOGLE', 'und:ci')]), + collate('GooGle', 'und:ci') NOT LIKE ALL UNNEST(['goo%', 'xxx']), + 'GooGle' NOT LIKE ALL UNNEST([collate('goo%', 'und:ci'), 'xxx']), + collate('GooGle', 'und:ci') NOT LIKE ALL UNNEST(['%yyy%', 'GOOGLE']), + 'GooGle' NOT LIKE ALL UNNEST([collate('%yyy%', 'und:ci'), 'GOOGLE']), + collate('GooG', 'und:ci') NOT LIKE ALL UNNEST(['%oO%', collate('XXX', 'und:ci')]), + collate('GooG', 'und:ci') NOT LIKE ALL UNNEST(['%x%', collate('GOOG', 'und:ci')]), + collate('GooG', 'und:ci') NOT LIKE ALL UNNEST(['%p%', collate('x%a%', 'und:ci')]), +-- +ARRAY>[ + {NULL, NULL, true, true, true, true, true, true, true} +] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_all_array_with_arrayconact_function] +# Test "LIKE ALL ARRAY" with ARRAY_CONCAT function. 
+SELECT + NULL LIKE ALL UNNEST(ARRAY_CONCAT(['abc'], ['xyz', NULL])), + 'abc' LIKE ALL UNNEST(ARRAY_CONCAT(['abc', NULL], [])), + 'abc' LIKE ALL UNNEST(ARRAY_CONCAT(['xyz'], ARRAY[NULL])), + 'abc' LIKE ALL UNNEST(ARRAY_CONCAT(['xyz', '%z%'], ['%b%'])), + 'abc' LIKE ALL UNNEST(ARRAY_CONCAT(['ABC'], ['%B%'], ['%a%'])), + 'abc' LIKE ALL UNNEST(ARRAY_CONCAT(['x', '%y%'], ['%z%'], ['%x%'])), + 'abc' LIKE ALL UNNEST(ARRAY_CONCAT(['x', '%y%'], ['%c%'], ['%x%'])), +-- +ARRAY>[ + {NULL, NULL, false, false, false, false, false} +] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_all_array_with_arrayconact_function] +# Test "NOT LIKE ALL ARRAY" with ARRAY_CONCAT function. +SELECT + NULL NOT LIKE ALL UNNEST(ARRAY_CONCAT(['abc'], ['xyz', NULL])), + 'abc' NOT LIKE ALL UNNEST(ARRAY_CONCAT(['abc', NULL], [])), + 'abc' NOT LIKE ALL UNNEST(ARRAY_CONCAT(['xyz'], ARRAY[NULL])), + 'abc' NOT LIKE ALL UNNEST(ARRAY_CONCAT(['xyz', '%z%'], ['%b%'])), + 'abc' NOT LIKE ALL UNNEST(ARRAY_CONCAT(['abc'], ['%b%'], ['%a%'])), + 'abc' NOT LIKE ALL UNNEST(ARRAY_CONCAT(['%a%', '%b%'], ['%b%'], ['%c%'])), + 'abc' NOT LIKE ALL UNNEST(ARRAY_CONCAT(['x', '%y%'], ['%c%'], ['%x%'])), +-- +ARRAY>[ + {NULL, false, NULL, false, false, false, false} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_all_array_with_arrayconact_function_with_collation] +# Test "LIKE ALL ARRAY" with ARRAY_CONCAT function with collation enabled. 
+SELECT + NULL LIKE ALL UNNEST(ARRAY_CONCAT([collate('abc', 'und:ci')], ['xyz', NULL])), + collate('abc', 'und:ci') LIKE ALL UNNEST(ARRAY_CONCAT(['AbC', NULL], [])), + collate('abc', 'und:ci') LIKE ALL UNNEST(ARRAY_CONCAT(['xyz'], ARRAY[NULL])), + 'ABC' LIKE ALL UNNEST(ARRAY_CONCAT([collate('xyz', 'und:ci'), '%z%'], ['%b%'])), + collate('abc', 'und:ci') LIKE ALL UNNEST(ARRAY_CONCAT(['ABC'], ['%B%'], ['%a%'])), + 'ABC' LIKE ALL UNNEST(ARRAY_CONCAT(['x', '%y%'], [collate('%z%', 'und:ci')], ['%x%'])), + 'ABC' LIKE ALL UNNEST(ARRAY_CONCAT(['x', '%y%'], [collate('%c%', 'und:ci')], ['%x%'])), +-- +ARRAY>[ + {NULL, NULL, false, false, true, false, false} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_all_array_with_arrayconact_function_with_collation] +# Test "NOT LIKE ALL ARRAY" with ARRAY_CONCAT function with collation enabled. +SELECT + NULL NOT LIKE ALL UNNEST(ARRAY_CONCAT([collate('abc', 'und:ci')], ['xyz', NULL])), + collate('abc', 'und:ci') NOT LIKE ALL UNNEST(ARRAY_CONCAT(['AbC', NULL], [])), + collate('abc', 'und:ci') NOT LIKE ALL UNNEST(ARRAY_CONCAT(['xyz'], ARRAY[NULL])), + 'ABC' NOT LIKE ALL UNNEST(ARRAY_CONCAT([collate('xyz', 'und:ci'), '%z%'], ['%b%'])), + collate('abc', 'und:ci') NOT LIKE ALL UNNEST(ARRAY_CONCAT(['ABC'], ['%B%'], ['%a%'])), + 'ABC' NOT LIKE ALL UNNEST(ARRAY_CONCAT(['x', '%y%'], [collate('%z%', 'und:ci')], ['%x%'])), + 'ABC' NOT LIKE ALL UNNEST(ARRAY_CONCAT(['x', '%y%'], [collate('%c%', 'und:ci')], ['%x%'])), +-- +ARRAY>[ + {NULL, false, NULL, false, false, true, false} +] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=not_like_all_array_with_scalar_subquery] +SELECT + Value, + Value NOT LIKE ALL UNNEST ([(SELECT 'Value1')]), + Value NOT LIKE ALL UNNEST ([(SELECT 'Value1'), 'Value2']), + Value NOT LIKE ALL UNNEST ([(SELECT 'Value1'), NULL]), + Value 
NOT LIKE ALL UNNEST ([(SELECT 'Value1'), (SELECT 'Value2')]), +  Value NOT LIKE ALL UNNEST ([(SELECT 'Valu%1'), (SELECT 'V%lue1')]), +  Value NOT LIKE ALL UNNEST ([(SELECT 'Valu%2'), (SELECT 'V%lue2')]), +FROM KeyValue ORDER BY Value; +-- +ARRAY>[known order: +  {NULL, NULL, NULL, NULL, NULL, NULL, NULL}, +  {"Value1", false, true, NULL, true, false, true}, +  {"Value2", true, true, true, true, true, false} +] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_SUBQUERY] +[name=not_like_all_with_patterns_in_non_parenthesized_scalar_subquery] +# TODO: Like any/all subquery is currently not using the new +# implementation of the 'not like any/all' operator, which needs to be fixed +# when the subquery feature is fully implemented. As of now, the subquery +# variant of like any/all is not completely ready. +SELECT +  Value, +  Value NOT LIKE ALL (SELECT 'Value1'), +  Value NOT LIKE ALL (SELECT Value FROM KeyValue WHERE Value in ('Value1')), +FROM KeyValue ORDER BY Value; +-- +ARRAY>[known order: +  {NULL, NULL, NULL}, +  {"Value1", false, false}, +  {"Value2", true, true} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_all_array_with_scalar_subquery_with_collation] +SELECT +  Value, +  Value LIKE ALL UNNEST ([(SELECT collate('vALue1', 'und:ci'))]), +  Value LIKE ALL UNNEST ([(SELECT collate('vALue1', 'und:ci')), +                          collate('vALue2', 'und:ci')]), +  Value LIKE ALL UNNEST ([(SELECT collate('vALue1', 'und:ci')), NULL]), +  Value LIKE ALL UNNEST ([(SELECT collate('vALue1', 'und:ci')), +                          (SELECT collate('%LUE2%', 'und:ci'))]), +  collate(Value, 'und:ci') LIKE ALL UNNEST ([(SELECT 'valu%1'), +                                             (SELECT 'v%lue1')]), +  collate(Value, 'und:ci') LIKE ALL UNNEST ([(SELECT 'VaLU%2'), +                                             (SELECT 'V%LUE2')]), +FROM KeyValue ORDER BY Value; +-- +ARRAY>[known order: +  {NULL, NULL, NULL, NULL, NULL, NULL, NULL}, +  {"Value1", true, false, NULL, false, true, false}, +  {"Value2", false, false,
false, false, false, true} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_all_array_with_scalar_subquery_with_collation] +SELECT + Value, + Value NOT LIKE ALL UNNEST ([(SELECT collate('vALue1', 'und:ci'))]), + Value NOT LIKE ALL UNNEST ([(SELECT collate('vALue1', 'und:ci')), + collate('vALue2', 'und:ci')]), + Value NOT LIKE ALL UNNEST ([(SELECT collate('vALue1', 'und:ci')), NULL]), + Value NOT LIKE ALL UNNEST ([(SELECT collate('vALue1', 'und:ci')), + (SELECT collate('%LUE2%', 'und:ci'))]), + collate(Value, 'und:ci') NOT LIKE ALL UNNEST ([(SELECT 'valu%1'), + (SELECT 'v%lue1')]), + collate(Value, 'und:ci') NOT LIKE ALL UNNEST ([(SELECT 'VaLU%2'), + (SELECT 'V%LUE2')]), +FROM KeyValue ORDER BY Value; +-- +ARRAY>[known order: + {NULL, NULL, NULL, NULL, NULL, NULL, NULL}, + {"Value1", false, false, false, false, false, true}, + {"Value2", true, false, NULL, false, true, false} +] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=not_like_all_array_with_array_agg_function] +SELECT + 'a' LIKE ALL UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['b', NULL]) x)), + 'a' LIKE ALL UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['a', NULL]) x)), + 'a' LIKE ALL UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['a', 'b']) x)), + 'a' LIKE ALL UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['b', 'b']) x)), + 'a' LIKE ALL UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['%a%', 'z']) x)), +-- +ARRAY>[ + {false, NULL, false, false, false} +] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=like_all_array_with_array_agg_function] +SELECT + 'a' NOT LIKE ALL UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['b', NULL]) x)), + 'a' NOT LIKE ALL UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['a', NULL]) x)), + 'a' NOT LIKE ALL UNNEST((SELECT ARRAY_AGG(x) FROM 
UNNEST(['a', 'b']) x)), + 'a' NOT LIKE ALL UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['b', 'b']) x)), + 'a' NOT LIKE ALL UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['%a%', 'z']) x)), +-- +ARRAY>[{NULL, false, false, true, false}] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_all_array_with_array_agg_function_with_collation] +SELECT + 'a' LIKE ALL UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('b', 'und:ci'), NULL]) x)), + 'a' LIKE ALL UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('A', 'und:ci'), NULL]) x)), + 'A' LIKE ALL UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['a', collate('b', 'und:ci')]) x)), + 'A' LIKE ALL UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('b', 'und:ci'), '%A%']) x)), + collate('A', 'und:ci') LIKE ALL UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['%a%', 'z']) x)), + collate('A', 'und:ci') LIKE ALL UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['%b%', 'z']) x)), +-- +ARRAY>[ + {false, NULL, false, false, false, false} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_all_array_with_array_agg_function_with_collation] +SELECT + 'a' NOT LIKE ALL UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('b', 'und:ci'), NULL]) x)), + 'a' NOT LIKE ALL UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('A', 'und:ci'), NULL]) x)), + 'A' NOT LIKE ALL UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['a', collate('b', 'und:ci')]) x)), + 'A' NOT LIKE ALL UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('b', 'und:ci'), '%A%']) x)), + 'A' NOT LIKE ALL UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('a', 'und:ci'), '%A%']) x)), + collate('A', 'und:ci') NOT LIKE ALL UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['%a%', 'z']) x)), + collate('A', 'und:ci') NOT LIKE ALL UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['%a%', 
'a']) x)), +-- +ARRAY>[ + {NULL, false, false, false, false, false, false} +] +== diff --git a/zetasql/compliance/testdata/like_any.test b/zetasql/compliance/testdata/like_any.test index f2084c220..bbee9e922 100644 --- a/zetasql/compliance/testdata/like_any.test +++ b/zetasql/compliance/testdata/like_any.test @@ -370,7 +370,20 @@ SELECT ARRAY>[{NULL, NULL, NULL, NULL, true}] == -[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_any_with_null_elements] +# Test "NOT LIKE ANY" for NULL elements on either LHS or RHS list. +SELECT + NULL NOT LIKE ANY (NULL, 'abc'), + 'abc' NOT LIKE ANY (NULL, NULL), + 'abc' NOT LIKE ANY ('abc', '%z%'), + 'abc' NOT LIKE ANY ('abc', '%b%'), + 'abc' NOT LIKE ANY ('x', '%y%'), +-- +ARRAY>[{NULL, NULL, true, false, true}] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] [name=not_like_any_with_collation_ci_test_with_null_elements] # Test "NOT LIKE ANY" with collation for NULL elements on either LHS or RHS list. SELECT @@ -379,9 +392,12 @@ SELECT collate(NULL, 'und:ci') NOT LIKE ANY ('abc', 'abc'), collate('abc', 'und:ci') NOT LIKE ANY (NULL, NULL), collate('abc', 'und:ci') NOT LIKE ANY (NULL, 'ABC'), + collate('abc', 'und:ci') NOT LIKE ANY (NULL, 'xyz') -- # Note: Collation will be applied to LHS and all elements in RHS. 
-ARRAY>[{NULL, NULL, NULL, NULL, false}] +ARRAY>[ + {NULL, NULL, NULL, NULL, NULL, true} +] == [required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] @@ -398,19 +414,20 @@ SELECT ARRAY>[{true, true, true, true, false}] == -[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] [name=not_like_any_with_collation_ci_on_lhs_element] # Test "NOT LIKE ANY" with LHS wrapped in collation. SELECT collate('GooGle', 'und:ci') NOT LIKE ANY ('goo%', 'xxx'), collate('GooGle', 'und:ci') NOT LIKE ANY ('%yyy%', 'GOOGLE'), collate('GooGle', 'und:ci') NOT LIKE ANY ('%goO%', collate('XXX', 'und:ci')), - collate('GooGle', 'und:ci') NOT LIKE ANY ('%xxx%', collate('%OO%', 'und:ci')), + collate('GooGle', 'und:ci') NOT LIKE ANY ('%le%', collate('%OO%', 'und:ci')), collate('GooGle', 'und:ci') NOT LIKE ANY ('%ppp%', collate('%aa%', 'und:ci')), + collate('GooGle', 'und:ci') NOT LIKE ANY ('%G%', '%E%'), -- # Note: Collation will be applied to LHS and all elements in RHS. -ARRAY>[ - {false, false, false, false, true} +ARRAY>[ + {true, true, true, false, true, false} ] == @@ -428,7 +445,7 @@ SELECT ARRAY>[{true, true, true, false, true}] == -[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] [name=not_like_any_with_collation_ci_on_one_of_the_rhs_element] # Test "NOT LIKE ANY" with one of the elements in RHS wrapped in collation. SELECT @@ -439,7 +456,7 @@ SELECT collate('GooGle', '') NOT LIKE ANY ('%ooGs%', collate('x%go%x', 'und:ci')), -- # Note: Collation will be applied to LHS and all elements in RHS. 
-ARRAY>[{false, false, false, true, true}] +ARRAY>[{true, true, false, true, true}] == [required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] @@ -455,7 +472,7 @@ SELECT ARRAY>[{true, true, false}] == -[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] [name=not_like_any_with_collation_ci_with_ignorable_character] # Test "NOT LIKE ANY" with an ignorable character in the pattern. # \u070F is an ignorable character @@ -463,9 +480,10 @@ SELECT collate('defA\u070FbCdef', 'und:ci') NOT LIKE ANY ('%abc%', '%xyz%'), 'defA\u070FbCdef' NOT LIKE ANY (collate('%ABC%', 'und:ci'), '%xyz%'), 'defA\u070FbCdef' NOT LIKE ANY (collate('x%ABC%x', 'und:ci'), '%xyz%'), + collate('defA\u070FbCdef', 'und:ci') NOT LIKE ANY ('%abc%', '%def%'), -- # Note: Collation will be applied to LHS and all elements in RHS. 
-ARRAY>[{false, false, true}] +ARRAY>[{true, true, true, false}] == [required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] @@ -551,3 +569,310 @@ ARRAY>[known order: {"Value1", true, true, true, true}, {"Value2", true, true, NULL, false} ] + +== +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_any_string_constant_patterns_in_subquery_as_array_elements] +SELECT + NULL NOT LIKE ANY UNNEST(['abc', NULL]), + 'abc' NOT LIKE ANY UNNEST(['abc', NULL]), + 'abc' NOT LIKE ANY UNNEST(['abc', '%z%']), + 'abc' NOT LIKE ANY UNNEST(['abc', '%b%']), + 'abc' NOT LIKE ANY UNNEST(['x', '%y%']), +-- +ARRAY>[{NULL, NULL, true, false, true}] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_any_array_with_collation_with_null_values] +# Test "LIKE ANY ARRAY" with collation with NULL values in LHS or RHS list. +SELECT + collate(NULL, 'und:ci') LIKE ANY UNNEST(CAST(NULL AS ARRAY)), + collate(NULL, 'und:ci') LIKE ANY UNNEST(ARRAY[NULL]), + collate('goog', 'und:ci') LIKE ANY UNNEST(CAST(NULL AS ARRAY)), + collate('goog', 'und:ci') LIKE ANY UNNEST(ARRAY[NULL]), + collate(NULL, 'und:ci') LIKE ANY UNNEST(['google', 'GOOGLE']), + collate(NULL, 'und:ci') LIKE ANY UNNEST(['goog']), + collate('GOOGLE', 'und:ci') LIKE ANY UNNEST(['google', NULL]), + 'GOOGLE' LIKE ANY UNNEST([collate('google', 'und:ci'), NULL]), +-- +ARRAY>[ + {false, NULL, false, NULL, NULL, NULL, true, true} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=not_like_any_array_with_collation_with_null_values] +# Test "NOT LIKE ANY ARRAY" with collation with NULL values in LHS or RHS list. 
+SELECT + collate(NULL, 'und:ci') NOT LIKE ANY UNNEST(CAST(NULL AS ARRAY)), + collate(NULL, 'und:ci') NOT LIKE ANY UNNEST(ARRAY[NULL]), + collate('goog', 'und:ci') NOT LIKE ANY UNNEST(CAST(NULL AS ARRAY)), + collate('goog', 'und:ci') NOT LIKE ANY UNNEST(ARRAY[NULL]), + collate(NULL, 'und:ci') NOT LIKE ANY UNNEST(['google', 'GOOGLE']), + collate(NULL, 'und:ci') NOT LIKE ANY UNNEST(['goog']), + collate('GOOGLE', 'und:ci') NOT LIKE ANY UNNEST(['google', NULL]), + 'GOOGLE' NOT LIKE ANY UNNEST([collate('google', 'und:ci'), NULL]), +-- +ARRAY>[ + {true, NULL, true, NULL, NULL, NULL, false, false} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_any_array_with_collation] +# Test "LIKE ANY ARRAY" with collation in LHS or RHS list. +SELECT + collate('google', 'und:ci') LIKE ANY UNNEST([NULL, 'GOOGLE']), + 'google' LIKE ANY UNNEST([NULL, collate('GOOGLE', 'und:ci')]), + collate('GooGle', 'und:ci') LIKE ANY UNNEST(['goo%', 'xxx']), + 'GooGle' LIKE ANY UNNEST([collate('goo%', 'und:ci'), 'xxx']), + collate('GooGle', 'und:ci') LIKE ANY UNNEST(['%yyy%', 'GOOGLE']), + 'GooGle' LIKE ANY UNNEST([collate('%yyy%', 'und:ci'), 'GOOGLE']), + collate('GooG', 'und:ci') LIKE ANY UNNEST(['%oO%', collate('XXX', 'und:ci')]), + collate('GooG', 'und:ci') LIKE ANY UNNEST(['%x%', collate('GOOG', 'und:ci')]), + collate('GooG', 'und:ci') LIKE ANY UNNEST(['%p%', collate('x%a%', 'und:ci')]), +-- +ARRAY>[ + {true, true, true, true, true, true, true, true, false} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=not_like_any_array_with_collation] +# Test "NOT LIKE ANY ARRAY" with collation in LHS or RHS list. 
+SELECT +   collate('google', 'und:ci') NOT LIKE ANY UNNEST([NULL, 'GOOGLE']), +   'google' NOT LIKE ANY UNNEST([NULL, collate('GOOGLE', 'und:ci')]), +   collate('GooGle', 'und:ci') NOT LIKE ANY UNNEST(['goo%', 'xxx']), +   'GooGle' NOT LIKE ANY UNNEST([collate('goo%', 'und:ci'), 'xxx']), +   collate('GooGle', 'und:ci') NOT LIKE ANY UNNEST(['%yyy%', 'GOOGLE']), +   'GooGle' NOT LIKE ANY UNNEST([collate('%yyy%', 'und:ci'), 'GOOGLE']), +   collate('GooG', 'und:ci') NOT LIKE ANY UNNEST(['%oO%', collate('XXX', 'und:ci')]), +   collate('GooG', 'und:ci') NOT LIKE ANY UNNEST(['%x%', collate('GOOG', 'und:ci')]), +   collate('GooG', 'und:ci') NOT LIKE ANY UNNEST(['%p%', collate('x%a%', 'und:ci')]), +-- +ARRAY>[ +  {false, false, false, false, false, false, false, false, true} +] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_any_array_with_arrayconcat_function] +# Test "LIKE ANY ARRAY" with ARRAY_CONCAT function. +SELECT +   NULL LIKE ANY UNNEST(ARRAY_CONCAT(['abc'], ['xyz', NULL])), +   'abc' LIKE ANY UNNEST(ARRAY_CONCAT(['abc', NULL], [])), +   'abc' LIKE ANY UNNEST(ARRAY_CONCAT(['xyz'], ARRAY[NULL])), +   'abc' LIKE ANY UNNEST(ARRAY_CONCAT(['xyz', '%z%'], ['%b%'])), +   'abc' LIKE ANY UNNEST(ARRAY_CONCAT(['ABC'], ['%B%'], ['%a%'])), +   'abc' LIKE ANY UNNEST(ARRAY_CONCAT(['x', '%y%'], ['%z%'], ['%x%'])), +   'abc' LIKE ANY UNNEST(ARRAY_CONCAT(['x', '%y%'], ['%c%'], ['%x%'])), +-- +ARRAY>[ +  {NULL, true, NULL, true, true, false, true} +] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_any_array_with_arrayconcat_function] +# Test "NOT LIKE ANY ARRAY" with ARRAY_CONCAT function.
+SELECT +   NULL NOT LIKE ANY UNNEST(ARRAY_CONCAT(['abc'], ['xyz', NULL])), +   'abc' NOT LIKE ANY UNNEST(ARRAY_CONCAT(['abc', NULL], [])), +   'abc' NOT LIKE ANY UNNEST(ARRAY_CONCAT(['xyz'], ARRAY[NULL])), +   'abc' NOT LIKE ANY UNNEST(ARRAY_CONCAT(['xyz', '%z%'], ['%b%'])), +   'abc' NOT LIKE ANY UNNEST(ARRAY_CONCAT(['abc'], ['%b%'], ['%a%'])), +   'abc' NOT LIKE ANY UNNEST(ARRAY_CONCAT(['%a%', '%b%'], ['%b%'], ['%c%'])), +   'abc' NOT LIKE ANY UNNEST(ARRAY_CONCAT(['x', '%y%'], ['%c%'], ['%x%'])), +-- +ARRAY>[ +  {NULL, NULL, true, true, false, false, true} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_any_array_with_arrayconcat_function_with_collation] +# Test "LIKE ANY ARRAY" with ARRAY_CONCAT function with collation enabled. +SELECT +   NULL LIKE ANY UNNEST(ARRAY_CONCAT([collate('abc', 'und:ci')], ['xyz', NULL])), +   collate('abc', 'und:ci') LIKE ANY UNNEST(ARRAY_CONCAT(['AbC', NULL], [])), +   collate('abc', 'und:ci') LIKE ANY UNNEST(ARRAY_CONCAT(['xyz'], ARRAY[NULL])), +   'ABC' LIKE ANY UNNEST(ARRAY_CONCAT([collate('xyz', 'und:ci'), '%z%'], ['%b%'])), +   collate('abc', 'und:ci') LIKE ANY UNNEST(ARRAY_CONCAT(['ABC'], ['%B%'], ['%a%'])), +   'ABC' LIKE ANY UNNEST(ARRAY_CONCAT(['x', '%y%'], [collate('%z%', 'und:ci')], ['%x%'])), +   'ABC' LIKE ANY UNNEST(ARRAY_CONCAT(['x', '%y%'], [collate('%c%', 'und:ci')], ['%x%'])), +-- +ARRAY>[ +  {NULL, true, NULL, true, true, false, true} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_any_array_with_arrayconcat_function_with_collation] +# Test "NOT LIKE ANY ARRAY" with ARRAY_CONCAT function with collation enabled.
+SELECT +   NULL NOT LIKE ANY UNNEST(ARRAY_CONCAT([collate('abc', 'und:ci')], ['xyz', NULL])), +   collate('abc', 'und:ci') NOT LIKE ANY UNNEST(ARRAY_CONCAT(['AbC', NULL], [])), +   collate('abc', 'und:ci') NOT LIKE ANY UNNEST(ARRAY_CONCAT(['xyz'], ARRAY[NULL])), +   'ABC' NOT LIKE ANY UNNEST(ARRAY_CONCAT([collate('xyz', 'und:ci'), '%z%'], ['%b%'])), +   collate('abc', 'und:ci') NOT LIKE ANY UNNEST(ARRAY_CONCAT(['ABC'], ['%B%'], ['%a%'])), +   'ABC' NOT LIKE ANY UNNEST(ARRAY_CONCAT(['x', '%y%'], [collate('%z%', 'und:ci')], ['%x%'])), +   'ABC' NOT LIKE ANY UNNEST(ARRAY_CONCAT(['x', '%y%'], [collate('%c%', 'und:ci')], ['%x%'])), +-- +ARRAY>[ +  {NULL, NULL, true, true, false, true, true} +] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=not_like_any_array_with_scalar_subquery] +SELECT +  Value, +  Value NOT LIKE ANY UNNEST ([(SELECT 'Value1')]), +  Value NOT LIKE ANY UNNEST ([(SELECT 'Value1'), 'Value2']), +  Value NOT LIKE ANY UNNEST ([(SELECT 'Value1'), NULL]), +  Value NOT LIKE ANY UNNEST ([(SELECT 'Value1'), (SELECT 'Value2')]), +  Value NOT LIKE ANY UNNEST ([(SELECT 'Valu%1'), (SELECT 'V%lue1')]), +  Value NOT LIKE ANY UNNEST ([(SELECT 'Valu%2'), (SELECT 'V%lue2')]), +FROM KeyValue ORDER BY Value; +-- +ARRAY>[known order: +  {NULL, NULL, NULL, NULL, NULL, NULL, NULL}, +  {"Value1", false, false, false, false, false, true}, +  {"Value2", true, false, NULL, false, true, false} +] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_SUBQUERY] +[name=not_like_any_with_patterns_in_non_parenthesized_scalar_subquery] +# TODO: Like any/all subquery is currently not using the new +# implementation of the 'not like any/all' operator, which needs to be fixed +# when the subquery feature is fully implemented. As of now, the subquery +# variant of like any/all is not completely ready.
+SELECT + Value, + Value NOT LIKE ANY (SELECT 'Value1'), + Value NOT LIKE ANY (SELECT Value FROM KeyValue WHERE Value in ('Value1')), +FROM KeyValue ORDER BY Value; +-- +ARRAY>[known order: + {NULL, NULL, NULL}, + {"Value1", false, false}, + {"Value2", true, true} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_any_array_with_scalar_subquery_with_collation] +SELECT + Value, + Value LIKE ANY UNNEST ([(SELECT collate('vALue1', 'und:ci'))]), + Value LIKE ANY UNNEST ([(SELECT collate('vALue1', 'und:ci')), + collate('vALue2', 'und:ci')]), + Value LIKE ANY UNNEST ([(SELECT collate('vALue1', 'und:ci')), NULL]), + Value LIKE ANY UNNEST ([(SELECT collate('vALue1', 'und:ci')), + (SELECT collate('%LUE2%', 'und:ci'))]), + collate(Value, 'und:ci') LIKE ANY UNNEST ([(SELECT 'valu%1'), + (SELECT 'v%lue1')]), + collate(Value, 'und:ci') LIKE ANY UNNEST ([(SELECT 'VaLU%2'), + (SELECT 'V%LUE2')]), +FROM KeyValue ORDER BY Value; +-- +ARRAY>[known order: + {NULL, NULL, NULL, NULL, NULL, NULL, NULL}, + {"Value1", true, true, true, true, true, false}, + {"Value2", false, true, NULL, true, false, true} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_any_array_with_scalar_subquery_with_collation] +SELECT + Value, + Value NOT LIKE ANY UNNEST ([(SELECT collate('vALue1', 'und:ci'))]), + Value NOT LIKE ANY UNNEST ([(SELECT collate('vALue1', 'und:ci')), + collate('vALue2', 'und:ci')]), + Value NOT LIKE ANY UNNEST ([(SELECT collate('vALue1', 'und:ci')), NULL]), + Value NOT LIKE ANY UNNEST ([(SELECT collate('vALue1', 'und:ci')), + (SELECT collate('%LUE2%', 'und:ci'))]), + collate(Value, 'und:ci') NOT LIKE ANY UNNEST ([(SELECT 'valu%1'), + (SELECT 'v%lue1')]), + collate(Value, 'und:ci') NOT LIKE ANY UNNEST ([(SELECT 'VaLU%2'), + (SELECT 'V%LUE2')]), 
+FROM KeyValue ORDER BY Value; +-- +ARRAY>[known order: +  {NULL, NULL, NULL, NULL, NULL, NULL, NULL}, +  {"Value1", false, true, NULL, true, false, true}, +  {"Value2", true, true, true, true, true, false} +] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_any_array_with_array_agg_function] +SELECT +  'a' LIKE ANY UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['b', NULL]) x)), +  'a' LIKE ANY UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['a', NULL]) x)), +  'a' LIKE ANY UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['a', 'b']) x)), +  'a' LIKE ANY UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['b', 'b']) x)), +  'a' LIKE ANY UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['%a%', 'z']) x)), +-- +ARRAY>[{NULL, true, true, false, true}] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_any_array_with_array_agg_function] +SELECT +  'a' NOT LIKE ANY UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['b', NULL]) x)), +  'a' NOT LIKE ANY UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['a', NULL]) x)), +  'a' NOT LIKE ANY UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['a', 'b']) x)), +  'a' NOT LIKE ANY UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['b', 'b']) x)), +  'a' NOT LIKE ANY UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['%a%', 'z']) x)), +-- +ARRAY>[{true, NULL, true, true, true}] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_any_array_with_array_agg_function_with_collation] +SELECT +  'a' LIKE ANY UNNEST( +    (SELECT ARRAY_AGG(x) FROM UNNEST([collate('b', 'und:ci'), NULL]) x)), +  'a' LIKE ANY UNNEST( +    (SELECT ARRAY_AGG(x) FROM UNNEST([collate('A', 'und:ci'), NULL]) x)), +  'A' LIKE ANY UNNEST( +    (SELECT ARRAY_AGG(x) FROM UNNEST(['a', collate('b', 'und:ci')]) x)), +  'A' LIKE ANY UNNEST( +    (SELECT ARRAY_AGG(x) FROM UNNEST([collate('b', 'und:ci'), '%A%']) x)), +  collate('A', 'und:ci') LIKE ANY UNNEST( +
(SELECT ARRAY_AGG(x) FROM UNNEST(['%a%', 'z']) x)), + collate('A', 'und:ci') LIKE ANY UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['%b%', 'z']) x)), +-- +ARRAY>[ + {NULL, true, true, true, true, false} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_any_array_with_array_agg_function_with_collation] +SELECT + 'a' NOT LIKE ANY UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('b', 'und:ci'), NULL]) x)), + 'a' NOT LIKE ANY UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('A', 'und:ci'), NULL]) x)), + 'A' NOT LIKE ANY UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['a', collate('b', 'und:ci')]) x)), + 'A' NOT LIKE ANY UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('b', 'und:ci'), '%A%']) x)), + 'A' NOT LIKE ANY UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('a', 'und:ci'), '%A%']) x)), + collate('A', 'und:ci') NOT LIKE ANY UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['%a%', 'z']) x)), + collate('A', 'und:ci') NOT LIKE ANY UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['%a%', 'a']) x)), +-- +ARRAY>[ + {true, NULL, true, true, false, true, false} +] +== diff --git a/zetasql/compliance/testdata/like_some.test b/zetasql/compliance/testdata/like_some.test index 5a0abaf59..134dfc4af 100644 --- a/zetasql/compliance/testdata/like_some.test +++ b/zetasql/compliance/testdata/like_some.test @@ -370,7 +370,20 @@ SELECT ARRAY>[{NULL, NULL, NULL, NULL, true}] == -[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_some_with_null_elements] +# Test "NOT LIKE SOME" for NULL elements on either LHS or RHS list. 
+SELECT + NULL NOT LIKE SOME (NULL, 'abc'), + 'abc' NOT LIKE SOME (NULL, NULL), + 'abc' NOT LIKE SOME ('abc', '%z%'), + 'abc' NOT LIKE SOME ('abc', '%b%'), + 'abc' NOT LIKE SOME ('x', '%y%'), +-- +ARRAY>[{NULL, NULL, true, false, true}] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] [name=not_like_some_with_collation_ci_test_with_null_elements] # Test "NOT LIKE SOME" with collation for NULL elements on either LHS or RHS list. SELECT @@ -379,9 +392,12 @@ SELECT collate(NULL, 'und:ci') NOT LIKE SOME ('abc', 'abc'), collate('abc', 'und:ci') NOT LIKE SOME (NULL, NULL), collate('abc', 'und:ci') NOT LIKE SOME (NULL, 'ABC'), + collate('abc', 'und:ci') NOT LIKE SOME (NULL, 'xyz') -- # Note: Collation will be applied to LHS and all elements in RHS. -ARRAY>[{NULL, NULL, NULL, NULL, false}] +ARRAY>[ + {NULL, NULL, NULL, NULL, NULL, true} +] == [required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] @@ -398,19 +414,20 @@ SELECT ARRAY>[{true, true, true, true, false}] == -[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] [name=not_like_some_with_collation_ci_on_lhs_element] # Test "NOT LIKE SOME" with LHS wrapped in collation. 
SELECT collate('GooGle', 'und:ci') NOT LIKE SOME ('goo%', 'xxx'), collate('GooGle', 'und:ci') NOT LIKE SOME ('%yyy%', 'GOOGLE'), collate('GooGle', 'und:ci') NOT LIKE SOME ('%goO%', collate('XXX', 'und:ci')), - collate('GooGle', 'und:ci') NOT LIKE SOME ('%xxx%', collate('%OO%', 'und:ci')), + collate('GooGle', 'und:ci') NOT LIKE SOME ('%le%', collate('%OO%', 'und:ci')), collate('GooGle', 'und:ci') NOT LIKE SOME ('%ppp%', collate('%aa%', 'und:ci')), + collate('GooGle', 'und:ci') NOT LIKE SOME ('%G%', '%E%'), -- # Note: Collation will be applied to LHS and all elements in RHS. -ARRAY>[ - {false, false, false, false, true} +ARRAY>[ + {true, true, true, false, true, false} ] == @@ -428,7 +445,7 @@ SELECT ARRAY>[{true, true, true, false, true}] == -[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] [name=not_like_some_with_collation_ci_on_one_of_the_rhs_element] # Test "NOT LIKE SOME" with one of the elements in RHS wrapped in collation. SELECT @@ -439,7 +456,7 @@ SELECT collate('GooGle', '') NOT LIKE SOME ('%ooGs%', collate('x%go%x', 'und:ci')), -- # Note: Collation will be applied to LHS and all elements in RHS. -ARRAY>[{false, false, false, true, true}] +ARRAY>[{true, true, false, true, true}] == [required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] @@ -455,7 +472,7 @@ SELECT ARRAY>[{true, true, false}] == -[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] [name=not_like_some_with_collation_ci_with_ignorable_character] # Test "NOT LIKE SOME" with an ignorable character in the pattern. 
# \u070F is an ignorable character @@ -463,9 +480,10 @@ SELECT collate('defA\u070FbCdef', 'und:ci') NOT LIKE SOME ('%abc%', '%xyz%'), 'defA\u070FbCdef' NOT LIKE SOME (collate('%ABC%', 'und:ci'), '%xyz%'), 'defA\u070FbCdef' NOT LIKE SOME (collate('x%ABC%x', 'und:ci'), '%xyz%'), + collate('defA\u070FbCdef', 'und:ci') NOT LIKE SOME ('%abc%', '%def%'), -- # Note: Collation will be applied to LHS and all elements in RHS. -ARRAY>[{false, false, true}] +ARRAY>[{true, true, true, false}] == [required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL] @@ -551,3 +569,310 @@ ARRAY>[known order: {"Value1", true, true, true, true}, {"Value2", true, true, NULL, false} ] + +== +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_some_string_constant_patterns_in_subquery_as_array_elements] +SELECT + NULL NOT LIKE SOME UNNEST(['abc', NULL]), + 'abc' NOT LIKE SOME UNNEST(['abc', NULL]), + 'abc' NOT LIKE SOME UNNEST(['abc', '%z%']), + 'abc' NOT LIKE SOME UNNEST(['abc', '%b%']), + 'abc' NOT LIKE SOME UNNEST(['x', '%y%']), +-- +ARRAY>[{NULL, NULL, true, false, true}] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_some_array_with_collation_with_null_values] +# Test "LIKE SOME ARRAY" with collation with NULL values in LHS or RHS list. 
+SELECT + collate(NULL, 'und:ci') LIKE SOME UNNEST(CAST(NULL AS ARRAY)), + collate(NULL, 'und:ci') LIKE SOME UNNEST(ARRAY[NULL]), + collate('goog', 'und:ci') LIKE SOME UNNEST(CAST(NULL AS ARRAY)), + collate('goog', 'und:ci') LIKE SOME UNNEST(ARRAY[NULL]), + collate(NULL, 'und:ci') LIKE SOME UNNEST(['google', 'GOOGLE']), + collate(NULL, 'und:ci') LIKE SOME UNNEST(['goog']), + collate('GOOGLE', 'und:ci') LIKE SOME UNNEST(['google', NULL]), + 'GOOGLE' LIKE SOME UNNEST([collate('google', 'und:ci'), NULL]), +-- +ARRAY>[ + {false, NULL, false, NULL, NULL, NULL, true, true} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=not_like_some_array_with_collation_with_null_values] +# Test "NOT LIKE SOME ARRAY" with collation with NULL values in LHS or RHS list. +SELECT + collate(NULL, 'und:ci') NOT LIKE SOME UNNEST(CAST(NULL AS ARRAY)), + collate(NULL, 'und:ci') NOT LIKE SOME UNNEST(ARRAY[NULL]), + collate('goog', 'und:ci') NOT LIKE SOME UNNEST(CAST(NULL AS ARRAY)), + collate('goog', 'und:ci') NOT LIKE SOME UNNEST(ARRAY[NULL]), + collate(NULL, 'und:ci') NOT LIKE SOME UNNEST(['google', 'GOOGLE']), + collate(NULL, 'und:ci') NOT LIKE SOME UNNEST(['goog']), + collate('GOOGLE', 'und:ci') NOT LIKE SOME UNNEST(['google', NULL]), + 'GOOGLE' NOT LIKE SOME UNNEST([collate('google', 'und:ci'), NULL]), +-- +ARRAY>[ + {true, NULL, true, NULL, NULL, NULL, false, false} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_some_array_with_collation] +# Test "LIKE SOME ARRAY" with collation in LHS or RHS list. 
+SELECT +   collate('google', 'und:ci') LIKE SOME UNNEST([NULL, 'GOOGLE']), +   'google' LIKE SOME UNNEST([NULL, collate('GOOGLE', 'und:ci')]), +   collate('GooGle', 'und:ci') LIKE SOME UNNEST(['goo%', 'xxx']), +   'GooGle' LIKE SOME UNNEST([collate('goo%', 'und:ci'), 'xxx']), +   collate('GooGle', 'und:ci') LIKE SOME UNNEST(['%yyy%', 'GOOGLE']), +   'GooGle' LIKE SOME UNNEST([collate('%yyy%', 'und:ci'), 'GOOGLE']), +   collate('GooG', 'und:ci') LIKE SOME UNNEST(['%oO%', collate('XXX', 'und:ci')]), +   collate('GooG', 'und:ci') LIKE SOME UNNEST(['%x%', collate('GOOG', 'und:ci')]), +   collate('GooG', 'und:ci') LIKE SOME UNNEST(['%p%', collate('x%a%', 'und:ci')]), +-- +ARRAY>[ +  {true, true, true, true, true, true, true, true, false} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=not_like_some_array_with_collation] +# Test "NOT LIKE SOME ARRAY" with collation in LHS or RHS list. +SELECT +   collate('google', 'und:ci') NOT LIKE SOME UNNEST([NULL, 'GOOGLE']), +   'google' NOT LIKE SOME UNNEST([NULL, collate('GOOGLE', 'und:ci')]), +   collate('GooGle', 'und:ci') NOT LIKE SOME UNNEST(['goo%', 'xxx']), +   'GooGle' NOT LIKE SOME UNNEST([collate('goo%', 'und:ci'), 'xxx']), +   collate('GooGle', 'und:ci') NOT LIKE SOME UNNEST(['%yyy%', 'GOOGLE']), +   'GooGle' NOT LIKE SOME UNNEST([collate('%yyy%', 'und:ci'), 'GOOGLE']), +   collate('GooG', 'und:ci') NOT LIKE SOME UNNEST(['%oO%', collate('XXX', 'und:ci')]), +   collate('GooG', 'und:ci') NOT LIKE SOME UNNEST(['%x%', collate('GOOG', 'und:ci')]), +   collate('GooG', 'und:ci') NOT LIKE SOME UNNEST(['%p%', collate('x%a%', 'und:ci')]), +-- +ARRAY>[ +  {false, false, false, false, false, false, false, false, true} +] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_some_array_with_arrayconcat_function] +# Test "LIKE SOME ARRAY" with ARRAY_CONCAT function.
+SELECT + NULL LIKE SOME UNNEST(ARRAY_CONCAT(['abc'], ['xyz', NULL])), + 'abc' LIKE SOME UNNEST(ARRAY_CONCAT(['abc', NULL], [])), + 'abc' LIKE SOME UNNEST(ARRAY_CONCAT(['xyz'], ARRAY[NULL])), + 'abc' LIKE SOME UNNEST(ARRAY_CONCAT(['xyz', '%z%'], ['%b%'])), + 'abc' LIKE SOME UNNEST(ARRAY_CONCAT(['ABC'], ['%B%'], ['%a%'])), + 'abc' LIKE SOME UNNEST(ARRAY_CONCAT(['x', '%y%'], ['%z%'], ['%x%'])), + 'abc' LIKE SOME UNNEST(ARRAY_CONCAT(['x', '%y%'], ['%c%'], ['%x%'])), +-- +ARRAY>[ + {NULL, true, NULL, true, true, false, true} +] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_some_array_with_arrayconact_function] +# Test "NOT LIKE SOME ARRAY" with ARRAY_CONCAT function. +SELECT + NULL NOT LIKE SOME UNNEST(ARRAY_CONCAT(['abc'], ['xyz', NULL])), + 'abc' NOT LIKE SOME UNNEST(ARRAY_CONCAT(['abc', NULL], [])), + 'abc' NOT LIKE SOME UNNEST(ARRAY_CONCAT(['xyz'], ARRAY[NULL])), + 'abc' NOT LIKE SOME UNNEST(ARRAY_CONCAT(['xyz', '%z%'], ['%b%'])), + 'abc' NOT LIKE SOME UNNEST(ARRAY_CONCAT(['abc'], ['%b%'], ['%a%'])), + 'abc' NOT LIKE SOME UNNEST(ARRAY_CONCAT(['%a%', '%b%'], ['%b%'], ['%c%'])), + 'abc' NOT LIKE SOME UNNEST(ARRAY_CONCAT(['x', '%y%'], ['%c%'], ['%x%'])), +-- +ARRAY>[ + {NULL, NULL, true, true, false, false, true} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_some_array_with_arrayconact_function_with_collation] +# Test "LIKE SOME ARRAY" with ARRAY_CONCAT function with collation enabled. 
+SELECT + NULL LIKE SOME UNNEST(ARRAY_CONCAT([collate('abc', 'und:ci')], ['xyz', NULL])), + collate('abc', 'und:ci') LIKE SOME UNNEST(ARRAY_CONCAT(['AbC', NULL], [])), + collate('abc', 'und:ci') LIKE SOME UNNEST(ARRAY_CONCAT(['xyz'], ARRAY[NULL])), + 'ABC' LIKE SOME UNNEST(ARRAY_CONCAT([collate('xyz', 'und:ci'), '%z%'], ['%b%'])), + collate('abc', 'und:ci') LIKE SOME UNNEST(ARRAY_CONCAT(['ABC'], ['%B%'], ['%a%'])), + 'ABC' LIKE SOME UNNEST(ARRAY_CONCAT(['x', '%y%'], [collate('%z%', 'und:ci')], ['%x%'])), + 'ABC' LIKE SOME UNNEST(ARRAY_CONCAT(['x', '%y%'], [collate('%c%', 'und:ci')], ['%x%'])), +-- +ARRAY>[ + {NULL, true, NULL, true, true, false, true} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_some_array_with_arrayconact_function_with_collation] +# Test "NOT LIKE SOME ARRAY" with ARRAY_CONCAT function with collation enabled. +SELECT + NULL NOT LIKE SOME UNNEST(ARRAY_CONCAT([collate('abc', 'und:ci')], ['xyz', NULL])), + collate('abc', 'und:ci') NOT LIKE SOME UNNEST(ARRAY_CONCAT(['AbC', NULL], [])), + collate('abc', 'und:ci') NOT LIKE SOME UNNEST(ARRAY_CONCAT(['xyz'], ARRAY[NULL])), + 'ABC' NOT LIKE SOME UNNEST(ARRAY_CONCAT([collate('xyz', 'und:ci'), '%z%'], ['%b%'])), + collate('abc', 'und:ci') NOT LIKE SOME UNNEST(ARRAY_CONCAT(['ABC'], ['%B%'], ['%a%'])), + 'ABC' NOT LIKE SOME UNNEST(ARRAY_CONCAT(['x', '%y%'], [collate('%z%', 'und:ci')], ['%x%'])), + 'ABC' NOT LIKE SOME UNNEST(ARRAY_CONCAT(['x', '%y%'], [collate('%c%', 'und:ci')], ['%x%'])), +-- +ARRAY>[ + {NULL, NULL, true, true, false, true, true} +] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_some_with_scalar_subquery] +SELECT + Value, + Value NOT LIKE SOME ((SELECT 'Value1')), + Value NOT LIKE SOME ((SELECT 'Value1'), 'Value2'), + Value NOT LIKE SOME ((SELECT 'Value1'), NULL), + Value NOT 
LIKE SOME ((SELECT 'Value1'), (SELECT 'Value2')),
+  Value NOT LIKE SOME ((SELECT 'Valu%1'), (SELECT 'V%lue1')),
+  Value NOT LIKE SOME ((SELECT 'Valu%2'), (SELECT 'V%lue2')),
+FROM KeyValue ORDER BY Value;
+--
+ARRAY>[known order:
+  {NULL, NULL, NULL, NULL, NULL, NULL, NULL},
+  {"Value1", false, true, NULL, true, false, true},
+  {"Value2", true, true, true, true, true, false}
+]
+==
+
+[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_SUBQUERY]
+[name=not_like_some_with_patterns_in_non_paranthesized_scalar_subquery]
+# TODO: LIKE ANY/ALL subqueries are currently not using the new
+# implementation of the 'NOT LIKE ANY/ALL' operator; this needs to be fixed
+# once the subquery feature is fully implemented. As of now, the subquery
+# variant of LIKE ANY/ALL is not completely ready.
+SELECT
+  Value,
+  Value NOT LIKE SOME (SELECT 'Value1'),
+  Value NOT LIKE SOME (SELECT Value FROM KeyValue WHERE Value in ('Value1')),
+FROM KeyValue ORDER BY Value;
+--
+ARRAY>[known order:
+  {NULL, NULL, NULL},
+  {"Value1", false, false},
+  {"Value2", true, true}
+]
+==
+
+[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY]
+[name=like_some_array_with_scalar_subquery_with_collation]
+SELECT
+  Value,
+  Value LIKE SOME UNNEST ([(SELECT collate('vALue1', 'und:ci'))]),
+  Value LIKE SOME UNNEST ([(SELECT collate('vALue1', 'und:ci')),
+                           collate('vALue2', 'und:ci')]),
+  Value LIKE SOME UNNEST ([(SELECT collate('vALue1', 'und:ci')), NULL]),
+  Value LIKE SOME UNNEST ([(SELECT collate('vALue1', 'und:ci')),
+                           (SELECT collate('%LUE2%', 'und:ci'))]),
+  collate(Value, 'und:ci') LIKE SOME UNNEST ([(SELECT 'valu%1'),
+                                              (SELECT 'v%lue1')]),
+  collate(Value, 'und:ci') LIKE SOME UNNEST ([(SELECT 'VaLU%2'),
+                                              (SELECT 'V%LUE2')]),
+FROM KeyValue ORDER BY Value;
+--
+ARRAY>[known order:
+  {NULL, NULL, NULL, NULL, NULL, NULL, NULL},
+  {"Value1", true, true, true, true, true, false},
+  {"Value2", false, true, NULL, true, false,
true} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_some_array_with_scalar_subquery_with_collation] +SELECT + Value, + Value NOT LIKE SOME UNNEST ([(SELECT collate('vALue1', 'und:ci'))]), + Value NOT LIKE SOME UNNEST ([(SELECT collate('vALue1', 'und:ci')), + collate('vALue2', 'und:ci')]), + Value NOT LIKE SOME UNNEST ([(SELECT collate('vALue1', 'und:ci')), NULL]), + Value NOT LIKE SOME UNNEST ([(SELECT collate('vALue1', 'und:ci')), + (SELECT collate('%LUE2%', 'und:ci'))]), + collate(Value, 'und:ci') NOT LIKE SOME UNNEST ([(SELECT 'valu%1'), + (SELECT 'v%lue1')]), + collate(Value, 'und:ci') NOT LIKE SOME UNNEST ([(SELECT 'VaLU%2'), + (SELECT 'V%LUE2')]), +FROM KeyValue ORDER BY Value; +-- +ARRAY>[known order: + {NULL, NULL, NULL, NULL, NULL, NULL, NULL}, + {"Value1", false, true, NULL, true, false, true}, + {"Value2", true, true, true, true, true, false} +] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=not_some_any_array_with_array_agg_function] +SELECT + 'a' LIKE SOME UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['b', NULL]) x)), + 'a' LIKE SOME UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['a', NULL]) x)), + 'a' LIKE SOME UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['a', 'b']) x)), + 'a' LIKE SOME UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['b', 'b']) x)), + 'a' LIKE SOME UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['%a%', 'z']) x)), +-- +ARRAY>[{NULL, true, true, false, true}] +== + +[required_features=V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=like_some_array_with_array_agg_function] +SELECT + 'a' NOT LIKE SOME UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['b', NULL]) x)), + 'a' NOT LIKE SOME UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['a', NULL]) x)), + 'a' NOT LIKE SOME UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['a', 'b']) x)), + 'a' 
NOT LIKE SOME UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['b', 'b']) x)), + 'a' NOT LIKE SOME UNNEST((SELECT ARRAY_AGG(x) FROM UNNEST(['%a%', 'z']) x)), +-- +ARRAY>[{true, NULL, true, true, true}] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY] +[name=like_some_array_with_array_agg_function_with_collation] +SELECT + 'a' LIKE SOME UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('b', 'und:ci'), NULL]) x)), + 'a' LIKE SOME UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('A', 'und:ci'), NULL]) x)), + 'A' LIKE SOME UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['a', collate('b', 'und:ci')]) x)), + 'A' LIKE SOME UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('b', 'und:ci'), '%A%']) x)), + collate('A', 'und:ci') LIKE SOME UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['%a%', 'z']) x)), + collate('A', 'und:ci') LIKE SOME UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['%b%', 'z']) x)), +-- +ARRAY>[ + {NULL, true, true, true, true, false} +] +== + +[required_features=V_1_3_ANNOTATION_FRAMEWORK,V_1_3_COLLATION_SUPPORT,V_1_3_LIKE_ANY_SOME_ALL,V_1_4_LIKE_ANY_SOME_ALL_ARRAY,V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL] +[name=not_like_some_array_with_array_agg_function_with_collation] +SELECT + 'a' NOT LIKE SOME UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('b', 'und:ci'), NULL]) x)), + 'a' NOT LIKE SOME UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('A', 'und:ci'), NULL]) x)), + 'A' NOT LIKE SOME UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['a', collate('b', 'und:ci')]) x)), + 'A' NOT LIKE SOME UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('b', 'und:ci'), '%A%']) x)), + 'A' NOT LIKE SOME UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST([collate('a', 'und:ci'), '%A%']) x)), + collate('A', 'und:ci') NOT LIKE SOME UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['%a%', 'z']) x)), + collate('A', 'und:ci') NOT LIKE SOME UNNEST( + (SELECT ARRAY_AGG(x) FROM UNNEST(['%a%', 'a']) x)), +-- 
+ARRAY>[ + {true, NULL, true, true, false, true, false} +] +== diff --git a/zetasql/compliance/testdata/limit_queries.test b/zetasql/compliance/testdata/limit_queries.test index 9faaf146c..347a15852 100644 --- a/zetasql/compliance/testdata/limit_queries.test +++ b/zetasql/compliance/testdata/limit_queries.test @@ -433,160 +433,27 @@ ARRAY>[{1}] == +[required_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] [name=limit_null] SELECT a FROM (SELECT 1 a) LIMIT NULL -- -ERROR: generic::invalid_argument: Syntax error: Unexpected keyword NULL [at 1:34] -SELECT a FROM (SELECT 1 a) LIMIT NULL - ^ -== -[name=limit_identifier] -SELECT a FROM (SELECT 1 a) LIMIT a --- -ERROR: generic::invalid_argument: Syntax error: Unexpected identifier "a" [at 1:34] -SELECT a FROM (SELECT 1 a) LIMIT a - ^ -== -[name=limit_limit] -SELECT a FROM (SELECT 1 a) LIMIT LIMIT --- -ERROR: generic::invalid_argument: Syntax error: Unexpected keyword LIMIT [at 1:34] -SELECT a FROM (SELECT 1 a) LIMIT LIMIT - ^ -== -[name=limit_1_limit_1] -SELECT a FROM (SELECT 1 a) LIMIT 1 LIMIT 1 --- -ERROR: generic::invalid_argument: Syntax error: Expected end of input but got keyword LIMIT [at 1:36] -SELECT a FROM (SELECT 1 a) LIMIT 1 LIMIT 1 - ^ -== - - -[name=offset_too_big_for_int64] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET 99999999999999999999999999999999999999 --- -ERROR: generic::invalid_argument: Invalid integer literal: 99999999999999999999999999999999999999 [at 1:43] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET 99999999999999999999999999999999999999 - ^ +ERROR: generic::out_of_range: Limit requires non-null count and offset == [name=offset_in_parethesis] SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET (1) -- -ERROR: generic::invalid_argument: Syntax error: Unexpected "(" [at 1:43] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET (1) - ^ -== -[name=offset_aliased] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET 1 AS ofst --- -ERROR: generic::invalid_argument: Syntax error: Expected end of input but got keyword AS [at 1:45] -SELECT a 
FROM (SELECT 1 a) LIMIT 1 OFFSET 1 AS ofst - ^ -== -[name=offset_negative] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET -1 --- -ERROR: generic::invalid_argument: Syntax error: Unexpected "-" [at 1:43] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET -1 - ^ -== -[name=offset_double] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET 1.0 --- -ERROR: generic::invalid_argument: Syntax error: Unexpected floating point literal "1.0" [at 1:43] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET 1.0 - ^ -== -[name=offset_string] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET '1' --- -ERROR: generic::invalid_argument: Syntax error: Unexpected string literal '1' [at 1:43] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET '1' - ^ +ARRAY>[] == +[required_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] [name=offset_null] SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET NULL -- -ERROR: generic::invalid_argument: Syntax error: Unexpected keyword NULL [at 1:43] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET NULL - ^ -== -[name=offset_identifier] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET a --- -ERROR: generic::invalid_argument: Syntax error: Unexpected identifier "a" [at 1:43] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET a - ^ -== -[name=offset_1_offset_1] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET 1 OFFSET 1 --- -ERROR: generic::invalid_argument: Syntax error: Expected end of input but got keyword OFFSET [at 1:45] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET 1 OFFSET 1 - ^ -== -[name=limit_offset_no_limit] -SELECT a FROM (SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3) OFFSET 1 --- -ERROR: generic::invalid_argument: Syntax error: Expected end of input but got integer literal "1" [at 1:73] -SELECT a FROM (SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3) OFFSET 1 - ^ -== -[name=limit_offset_limit] -SELECT a FROM (SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3) -OFFSET 1 LIMIT 1 --- -ERROR: generic::invalid_argument: Syntax error: Expected end of input but got integer literal "1" [at 2:8] -OFFSET 1 LIMIT 1 - ^ +ERROR: 
generic::out_of_range: Limit requires non-null count and offset == # ---------- @@ -635,22 +502,6 @@ SELECT a FROM (SELECT 1 a) LIMIT @lmt -- ERROR: generic::out_of_range: Limit requires non-negative count and offset == -[name=param_limit_string] -[parameters='1' as lmt] -SELECT a FROM (SELECT 1 a) LIMIT @lmt --- -ERROR: generic::invalid_argument: LIMIT expects an integer literal or parameter [at 1:34] -SELECT a FROM (SELECT 1 a) LIMIT @lmt - ^ -== -[name=param_limit_double] -[parameters=1.2 as lmt] -SELECT a FROM (SELECT 1 a) LIMIT @lmt --- -ERROR: generic::invalid_argument: LIMIT expects an integer literal or parameter [at 1:34] -SELECT a FROM (SELECT 1 a) LIMIT @lmt - ^ -== [name=param_limit_null] [parameters=NULL as lmt] SELECT a FROM (SELECT 1 a) LIMIT @lmt @@ -698,22 +549,6 @@ SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET @ofst -- ERROR: generic::out_of_range: Limit requires non-negative count and offset == -[name=param_offset_date] -[parameters=DATE '2000-01-01' as ofst] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET @ofst --- -ERROR: generic::invalid_argument: OFFSET expects an integer literal or parameter [at 1:43] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET @ofst - ^ -== -[name=param_offset_bool] -[parameters=false as ofst] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET @ofst --- -ERROR: generic::invalid_argument: OFFSET expects an integer literal or parameter [at 1:43] -SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET @ofst - ^ -== [name=param_offset_null] [parameters=NULL as ofst] SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET @ofst @@ -731,3 +566,71 @@ ARRAY>[{"*"}] SELECT a FROM (SELECT '*' a) LIMIT @lmt OFFSET @ofst -- ARRAY>[] +== + +# ---------- +# Expressions +# ---------- + +[name=expr_limit_0] +[required_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +[parameters=0 as lmt] +SELECT a FROM (SELECT 1 a) LIMIT MOD(@lmt, 2) +-- +ARRAY>[] +== +[name=expr_limit_1] +[required_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +[parameters=1 as lmt] +SELECT a FROM (SELECT 1 a) LIMIT MOD(@lmt, 
2) +-- +ARRAY>[{1}] +== +[name=expr_limit_negative] +[required_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +[parameters=1 as lmt] +SELECT a FROM (SELECT 1 a) LIMIT MOD(@lmt, 2) - 2 +-- +ERROR: generic::out_of_range: Limit requires non-negative count and offset +== +[name=expr_limit_null] +[required_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +[parameters=NULL as lmt] +SELECT a FROM (SELECT 1 a) LIMIT COALESCE(@lmt, NULL) +-- +ERROR: generic::out_of_range: Limit requires non-null count and offset +== +[name=expr_offset_0] +[required_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +[parameters=0 as ofst] +SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET MOD(@ofst, 2) +-- +ARRAY>[{1}] +== +[name=expr_offset_negative] +[required_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +[parameters=1 as ofst] +SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET MOD(@ofst, 2) - 2 +-- +ERROR: generic::out_of_range: Limit requires non-negative count and offset +== +[name=expr_offset_null] +[required_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +[parameters=NULL as ofst] +SELECT a FROM (SELECT 1 a) LIMIT 1 OFFSET COALESCE(@ofst, NULL) +-- +ERROR: generic::out_of_range: Limit requires non-null count and offset +== +[name=expr_limit_1_offset_0] +[required_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +[parameters=1 as lmt, 0 as ofst] +SELECT a FROM (SELECT '*' a) LIMIT MOD(@lmt, 2) OFFSET MOD(@ofst, 2) +-- +ARRAY>[{"*"}] +== +[name=expr_limit_max_offset] +[required_features=V_1_4_LIMIT_OFFSET_EXPRESSIONS] +[parameters=9223372036854775806 as lmt, 2 as ofst] +SELECT a FROM (SELECT '*' a) LIMIT MOD(@lmt, 9223372036854775807) + 1 OFFSET @ofst +-- +ARRAY>[] diff --git a/zetasql/compliance/testdata/map_functions.test b/zetasql/compliance/testdata/map_functions.test new file mode 100644 index 000000000..f15ec1e58 --- /dev/null +++ b/zetasql/compliance/testdata/map_functions.test @@ -0,0 +1,113 @@ +[default required_features=V_1_4_MAP_TYPE] + +[name=map_from_array_basic] +SELECT MAP_FROM_ARRAY([('a', true), ('b', false)]); +-- 
+ARRAY>>[{{"a": true, "b": false}}] +== + +[name=map_from_array_map_array_value] +SELECT MAP_FROM_ARRAY([('a', [50, 100]), ('b', [1, 2])]); +-- +ARRAY>>>[{{"a": [50, 100], "b": [1, 2]}}] +== + +[name=map_from_array_confusing_struct_names] +SELECT MAP_FROM_ARRAY([STRUCT('k1' as value, 0 as key), ('k2', 1)]); +-- +ARRAY>>[{{"k1": 0, "k2": 1}}] +== + +[name=map_from_array_empty_array] +SELECT MAP_FROM_ARRAY(CAST([] AS ARRAY>)); +-- +ARRAY>>[{{}}] +== + +[name=map_from_array_null_array] +SELECT MAP_FROM_ARRAY(CAST(NULL AS ARRAY>)); +-- +ARRAY>>[{NULL}] +== + +[name=map_from_array_nesting] +[required_features=V_1_4_MAP_TYPE,V_1_2_GROUP_BY_STRUCT] +SELECT MAP_FROM_ARRAY([ + ('a', MAP_FROM_ARRAY([ + ((1, 'b'), MAP_FROM_ARRAY([ + (1.5, MAP_FROM_ARRAY([ + (DATE("2020-01-01"), 'e') + ])) + ])) + ])) +]); +-- +ARRAY, MAP>>> + >>[{{"a": {{1, "b"}: {1.5: {2020-01-01: "e"}}}}}] +== + +[name=map_from_array_nested_containers] +[required_features=V_1_4_MAP_TYPE,V_1_2_GROUP_BY_STRUCT,V_1_2_GROUP_BY_ARRAY] +SELECT MAP_FROM_ARRAY([(('key_struct', ['array_key1', 'array_key2']), 'foo_val')]); +-- +ARRAY>, STRING>>>[ + {{{"key_struct", ["array_key1", "array_key2"]}: "foo_val"}} +] +== + +[name=map_from_array_repeated_key] +SELECT MAP_FROM_ARRAY([('a', [50, 100]), ('b', [50, 100]), ('b', [1, 2]), ('c', []), ('d', []), ('d', [50, 100])]); +-- +ERROR: generic::out_of_range: Duplicate map keys are not allowed, but got multiple instances of key: "b" +== + +[name=map_from_array_repeated_null_key] +SELECT MAP_FROM_ARRAY([(NULL, 0), (NULL, 0)]) +-- +ERROR: generic::out_of_range: Duplicate map keys are not allowed, but got multiple instances of key: NULL +== + +[name=map_from_array_repeated_null_key_2] +SELECT MAP_FROM_ARRAY([(NULL, 0), (A, 1)]) +FROM UNNEST([CAST(NULL AS DOUBLE)]) AS A; +-- +ERROR: generic::out_of_range: Duplicate map keys are not allowed, but got multiple instances of key: NULL +== + +[name=map_from_array_repeated_array_key] 
+[required_features=V_1_4_MAP_TYPE,V_1_2_GROUP_BY_ARRAY] +SELECT MAP_FROM_ARRAY([(['a'], 0), (['a'], 0)]) +-- +ERROR: generic::out_of_range: Duplicate map keys are not allowed, but got multiple instances of key: ARRAY["a"] +== + +[name=map_from_array_repeated_inf_float_key] +SELECT MAP_FROM_ARRAY([(CAST('inf' AS FLOAT), 0), (CAST('inf' AS FLOAT), 0)]); +-- +ERROR: generic::out_of_range: Duplicate map keys are not allowed, but got multiple instances of key: inf +== + +[name=map_from_array_repeated_nan_float_key] +SELECT MAP_FROM_ARRAY([(CAST('nan' AS FLOAT), 0), (CAST('nan' AS FLOAT), 0)]); +-- +ERROR: generic::out_of_range: Duplicate map keys are not allowed, but got multiple instances of key: nan +== + +[name=map_from_array_transformed_input] +[required_features=V_1_4_MAP_TYPE,V_1_4_ARRAY_ZIP] +SELECT MAP_FROM_ARRAY(ARRAY_ZIP(["a", "b", "c"], [1, 2, 3])); +-- +ARRAY>>[{{"a": 1, "b": 2, "c": 3}}] +== + +[name=map_from_array_correlated_subquery] +SELECT (SELECT MAP_FROM_ARRAY([(x, 1)])) + FROM UNNEST(['a', 'b', 'c']) as x; +-- +ARRAY>>[unknown order: + {{"b": 1}}, + {{"a": 1}}, + {{"c": 1}} +] +== diff --git a/zetasql/compliance/testdata/set_operation_full_corresponding_by.test b/zetasql/compliance/testdata/set_operation_full_corresponding_by.test index 5aebc5a09..7ae3fdff5 100644 --- a/zetasql/compliance/testdata/set_operation_full_corresponding_by.test +++ b/zetasql/compliance/testdata/set_operation_full_corresponding_by.test @@ -6,7 +6,7 @@ # CORRESPONDING BY Literal coercion FULL mode [name=corresponding_by_literal_coercion_full] -[default required_features=V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] +[default required_features=V_1_4_CORRESPONDING_FULL] SELECT 1 AS a, 1 AS b FULL UNION ALL CORRESPONDING BY (a, b) SELECT CAST(1 AS INT32) AS a diff --git a/zetasql/compliance/testdata/set_operation_full_mode.test b/zetasql/compliance/testdata/set_operation_full_mode.test index e262d395e..9618a7840 100644 --- 
a/zetasql/compliance/testdata/set_operation_full_mode.test +++ b/zetasql/compliance/testdata/set_operation_full_mode.test @@ -5,7 +5,7 @@ # CORRESPONDING literal coercion FULL mode [name=corresponding_literal_coercion_full] -[default required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] +[default required_features=V_1_4_CORRESPONDING_FULL] SELECT 1 AS a, 1 AS b FULL UNION ALL CORRESPONDING SELECT CAST(1 AS INT32) AS a @@ -126,7 +126,6 @@ ARRAY>[unknown order: # ===== Start section: FULL CORRESPONDING with EXCEPT ===== # # FULL CORRESPONDING both inputs have padded null. [row, row] EXCEPT ALL [row] = [row] -[default required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] [name=full_corresponding_both_input_padded_null_except_all_2row_1row] WITH Table1 AS (SELECT CAST(NULL AS INT64) AS a UNION ALL SELECT NULL) SELECT a FROM Table1 diff --git a/zetasql/compliance/testdata/set_operation_inner_corresponding_by.test b/zetasql/compliance/testdata/set_operation_inner_corresponding_by.test index 08caf2dde..a152cadc0 100644 --- a/zetasql/compliance/testdata/set_operation_inner_corresponding_by.test +++ b/zetasql/compliance/testdata/set_operation_inner_corresponding_by.test @@ -5,7 +5,7 @@ # CORRESPONDING BY literal coercion INNER mode [name=corresponding_by_literal_coercion_inner] -[default required_features=V_1_4_CORRESPONDING_BY] +[default required_features=V_1_4_CORRESPONDING_FULL] SELECT 1 AS b, 1 AS a UNION ALL CORRESPONDING BY (b) SELECT CAST(1 AS INT32) AS b diff --git a/zetasql/compliance/testdata/set_operation_left_corresponding.test b/zetasql/compliance/testdata/set_operation_left_corresponding.test index b5d07e694..fac31998c 100644 --- a/zetasql/compliance/testdata/set_operation_left_corresponding.test +++ b/zetasql/compliance/testdata/set_operation_left_corresponding.test @@ -5,7 +5,7 @@ # Literal coercion LEFT mode [name=literal_coercion_left] -[default 
required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] +[default required_features=V_1_4_CORRESPONDING_FULL] SELECT 1 AS a LEFT UNION ALL CORRESPONDING SELECT CAST(1 AS INT32) AS a, 1 AS b diff --git a/zetasql/compliance/testdata/set_operation_left_corresponding_by.test b/zetasql/compliance/testdata/set_operation_left_corresponding_by.test index daa2cfa2c..f34ee600e 100644 --- a/zetasql/compliance/testdata/set_operation_left_corresponding_by.test +++ b/zetasql/compliance/testdata/set_operation_left_corresponding_by.test @@ -5,7 +5,7 @@ # CORRESPONDING BY Literal coercion LEFT mode [name=corresponding_by_literal_coercion_left] -[default required_features=V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] +[default required_features=V_1_4_CORRESPONDING_FULL] SELECT 1 AS a LEFT UNION ALL CORRESPONDING BY (a) SELECT CAST(1 AS INT32) AS a, 1 AS b diff --git a/zetasql/compliance/testdata/set_operation_nested.test b/zetasql/compliance/testdata/set_operation_nested.test index fdf309c31..04a801e20 100644 --- a/zetasql/compliance/testdata/set_operation_nested.test +++ b/zetasql/compliance/testdata/set_operation_nested.test @@ -13,7 +13,7 @@ # nested set operation: LEFT CORRESPONDING [name=nested_strict_by_position_left_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] +[default required_features=V_1_4_CORRESPONDING_FULL] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -27,7 +27,6 @@ ARRAY>[unknown order:{1, 1}, {NULL, 1}] # nested set operation: BY POSITION [name=strict_by_position_nested_left_corresponding] -[required_features=V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] SELECT 1 AS a, 1 AS b EXCEPT ALL ( @@ -41,7 +40,6 @@ ARRAY>[] # nested set operation: LEFT CORRESPONDING [name=left_corresponding_nested_inner_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH 
Table1 AS ( SELECT CAST(2 AS INT64) AS a, CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c @@ -61,7 +59,6 @@ ARRAY>[{1, 1}] # nested set operation: INNER CORRESPONDING [name=nested_left_corresponding_inner_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -79,7 +76,6 @@ ARRAY>[{1}] # nested set operation: LEFT CORRESPONDING [name=nested_full_corresponding_left_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS (SELECT CAST(1 AS INT64) AS a UNION ALL SELECT 1), Table2 AS (SELECT CAST(1 AS INT64) AS b UNION ALL SELECT 1) @@ -96,7 +92,6 @@ ARRAY>[] # nested set operation: FULL CORRESPONDING [name=full_corresponding_nested_left_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c UNION ALL SELECT 1, 1 @@ -119,7 +114,6 @@ ARRAY>[] # nested set operation: LEFT CORRESPONDING [name=left_corresponding_nested_left_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c UNION ALL SELECT 1, 1 @@ -137,7 +131,6 @@ ARRAY>[unknown order:{1, 1}, {NULL, 1}, {NULL, 1}] # nested set operation: STRICT CORRESPONDING [name=nested_strict_by_position_strict_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -151,7 +144,6 @@ ARRAY>[unknown order:{1, 1}, {1, 1}] # nested set operation: BY POSITION [name=strict_by_position_nested_strict_corresponding] -[required_features=V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] SELECT 1 AS a, 1 AS b EXCEPT ALL ( @@ -165,7 +157,6 @@ ARRAY>[] # nested set operation: STRICT 
CORRESPONDING [name=strict_corresponding_nested_inner_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b, CAST(2 AS INT64) AS c @@ -185,7 +176,6 @@ ARRAY>[] # nested set operation: INNER CORRESPONDING [name=nested_strict_corresponding_inner_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -203,7 +193,6 @@ ARRAY>[{1}] # nested set operation: STRICT CORRESPONDING [name=nested_full_corresponding_strict_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS (SELECT CAST(1 AS INT64) AS a UNION ALL SELECT 1), Table2 AS (SELECT CAST(1 AS INT64) AS b UNION ALL SELECT 1) @@ -220,7 +209,6 @@ ARRAY>[] # nested set operation: FULL CORRESPONDING [name=full_corresponding_nested_strict_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c UNION ALL SELECT 1, 1 @@ -238,7 +226,6 @@ ARRAY>[] # nested set operation: STRICT CORRESPONDING [name=strict_corresponding_nested_left_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -256,7 +243,6 @@ ARRAY>[unknown order:{1, 1}, {1, 1}, {1, 1}] # nested set operation: LEFT CORRESPONDING [name=nested_strict_corresponding_left_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] ( SELECT 1 AS a, 1 AS b UNION ALL STRICT CORRESPONDING @@ -270,7 +256,6 @@ ARRAY>[unknown order:{1, 1}, {1, 1}] # nested set operation: STRICT CORRESPONDING [name=nested_strict_corresponding_strict_corresponding] 
-[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -288,7 +273,6 @@ ARRAY>[] # nested set operation: INNER CORRESPONDING_BY [name=nested_strict_by_position_inner_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -302,7 +286,6 @@ ARRAY>[unknown order:{1}, {1}] # nested set operation: BY POSITION [name=strict_by_position_nested_inner_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY] SELECT 1 AS a, 1 AS b EXCEPT ALL ( @@ -316,7 +299,6 @@ ARRAY>[] # nested set operation: INNER CORRESPONDING_BY [name=inner_corresponding_by_nested_inner_corresponding] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_CORRESPONDING] WITH Table1 AS ( SELECT CAST(2 AS INT64) AS a, CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c @@ -336,7 +318,6 @@ ARRAY>[] # nested set operation: INNER CORRESPONDING [name=nested_inner_corresponding_by_inner_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_CORRESPONDING_BY] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b, CAST(2 AS INT64) AS c @@ -356,7 +337,6 @@ ARRAY>[{1}] # nested set operation: INNER CORRESPONDING_BY [name=nested_full_corresponding_inner_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] WITH Table1 AS (SELECT CAST(1 AS INT64) AS a UNION ALL SELECT 1), Table2 AS (SELECT CAST(1 AS INT64) AS b UNION ALL SELECT 1) @@ -373,7 +353,6 @@ ARRAY>[] # nested set operation: FULL CORRESPONDING [name=full_corresponding_nested_inner_corresponding_by] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] WITH Table1 AS ( SELECT CAST(2 AS INT64) AS a, CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c @@ -393,7 +372,6 @@ ARRAY>[] # nested set operation: INNER 
CORRESPONDING_BY [name=inner_corresponding_by_nested_left_corresponding] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c UNION ALL SELECT 1, 1 @@ -411,7 +389,6 @@ ARRAY>[unknown order:{1}, {1}, {1}] # nested set operation: LEFT CORRESPONDING [name=nested_inner_corresponding_by_left_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] ( SELECT 1 AS b, 2 AS c, 1 AS a UNION ALL CORRESPONDING BY (a, b) @@ -425,7 +402,6 @@ ARRAY>[unknown order:{1, 1}, {1, 1}] # nested set operation: INNER CORRESPONDING_BY [name=nested_strict_corresponding_inner_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -443,7 +419,6 @@ ARRAY>[] # nested set operation: STRICT CORRESPONDING [name=strict_corresponding_nested_inner_corresponding_by] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b, CAST(2 AS INT64) AS c @@ -463,7 +438,6 @@ ARRAY>[{1, 1}] # nested set operation: INNER CORRESPONDING_BY [name=inner_corresponding_by_nested_inner_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY] WITH Table1 AS ( SELECT CAST(2 AS INT64) AS a, CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c @@ -483,7 +457,6 @@ ARRAY>[{1}] # nested set operation: FULL CORRESPONDING_BY [name=nested_strict_by_position_full_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -500,7 +473,6 @@ ARRAY>[unknown order: # nested set operation: BY POSITION 
[name=strict_by_position_nested_full_corresponding_by] -[required_features=V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] SELECT 1 AS a, 1 AS b EXCEPT ALL ( @@ -514,7 +486,6 @@ ARRAY>[{1, 1}] # nested set operation: FULL CORRESPONDING_BY [name=full_corresponding_by_nested_inner_corresponding] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] WITH Table1 AS ( SELECT CAST(2 AS INT64) AS a, CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c @@ -534,7 +505,6 @@ ARRAY>[{NULL, 1, 1}] # nested set operation: INNER CORRESPONDING [name=nested_full_corresponding_by_inner_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(2 AS INT64) AS c UNION ALL SELECT 1, 2 @@ -552,7 +522,6 @@ ARRAY>[unknown order:{NULL}, {1}] # nested set operation: FULL CORRESPONDING_BY [name=nested_full_corresponding_full_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] WITH Table1 AS (SELECT CAST(1 AS INT64) AS a UNION ALL SELECT 1), Table2 AS (SELECT CAST(1 AS INT64) AS b UNION ALL SELECT 1) @@ -569,7 +538,6 @@ ARRAY>[] # nested set operation: FULL CORRESPONDING [name=full_corresponding_nested_full_corresponding_by] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c UNION ALL SELECT 1, 1 @@ -590,7 +558,6 @@ ARRAY>[] # nested set operation: FULL CORRESPONDING_BY [name=full_corresponding_by_nested_left_corresponding] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c UNION ALL SELECT 1, 1 @@ -612,7 +579,6 @@ ARRAY>[unknown order: # nested set operation: LEFT CORRESPONDING 
[name=nested_full_corresponding_by_left_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] ( SELECT 1 AS a, 2 AS c FULL UNION ALL CORRESPONDING BY (a, b) @@ -626,7 +592,6 @@ ARRAY>[{1, NULL}] # nested set operation: FULL CORRESPONDING_BY [name=nested_strict_corresponding_full_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -644,7 +609,6 @@ ARRAY>[{NULL, 1, 1}] # nested set operation: STRICT CORRESPONDING [name=strict_corresponding_nested_full_corresponding_by] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(2 AS INT64) AS c UNION ALL SELECT 1, 2 @@ -662,7 +626,6 @@ ARRAY>[unknown order:{1, NULL}, {1, 1}] # nested set operation: FULL CORRESPONDING_BY [name=full_corresponding_by_nested_inner_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(2 AS INT64) AS a, CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c @@ -682,7 +645,6 @@ ARRAY>[] # nested set operation: INNER CORRESPONDING_BY [name=nested_full_corresponding_by_inner_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(2 AS INT64) AS c UNION ALL SELECT 1, 2 @@ -703,7 +665,6 @@ ARRAY>[] # nested set operation: FULL CORRESPONDING_BY [name=nested_full_corresponding_by_full_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(2 AS INT64) AS c UNION ALL SELECT 1, 2 @@ -725,7 +686,6 @@ ARRAY>[unknown order: # nested set operation: LEFT CORRESPONDING_BY 
[name=nested_strict_by_position_left_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -739,7 +699,6 @@ ARRAY>[unknown order:{1, 1}, {1, NULL}] # nested set operation: BY POSITION [name=strict_by_position_nested_left_corresponding_by] -[required_features=V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] SELECT 1 AS a, 1 AS b EXCEPT ALL ( @@ -753,7 +712,6 @@ ARRAY>[] # nested set operation: LEFT CORRESPONDING_BY [name=left_corresponding_by_nested_inner_corresponding] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] WITH Table1 AS ( SELECT CAST(2 AS INT64) AS a, CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c @@ -773,7 +731,6 @@ ARRAY>[{1, 1}] # nested set operation: INNER CORRESPONDING [name=nested_left_corresponding_by_inner_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -791,7 +748,6 @@ ARRAY>[{1}] # nested set operation: LEFT CORRESPONDING_BY [name=nested_full_corresponding_left_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] WITH Table1 AS (SELECT CAST(1 AS INT64) AS a UNION ALL SELECT 1), Table2 AS (SELECT CAST(1 AS INT64) AS b UNION ALL SELECT 1) @@ -808,7 +764,6 @@ ARRAY>[] # nested set operation: FULL CORRESPONDING [name=full_corresponding_nested_left_corresponding_by] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] WITH Table1 AS ( SELECT CAST(2 AS INT64) AS a, CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c @@ -831,7 +786,6 @@ ARRAY>[] # nested set operation: LEFT CORRESPONDING_BY [name=left_corresponding_by_nested_left_corresponding] 
-[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c UNION ALL SELECT 1, 1 @@ -849,7 +803,6 @@ ARRAY>[unknown order:{1, 1}, {1, NULL}, {1, NULL}] # nested set operation: LEFT CORRESPONDING [name=nested_left_corresponding_by_left_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] ( SELECT 1 AS a, 1 AS b LEFT UNION ALL CORRESPONDING BY (a, b) @@ -863,7 +816,6 @@ ARRAY>[unknown order:{NULL, NULL}, {1, 1}] # nested set operation: LEFT CORRESPONDING_BY [name=nested_strict_corresponding_left_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -881,7 +833,6 @@ ARRAY>[{1, 1}] # nested set operation: STRICT CORRESPONDING [name=strict_corresponding_nested_left_corresponding_by] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -899,7 +850,6 @@ ARRAY>[{1, 1}] # nested set operation: LEFT CORRESPONDING_BY [name=left_corresponding_by_nested_inner_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(2 AS INT64) AS a, CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c @@ -919,7 +869,6 @@ ARRAY>[] # nested set operation: INNER CORRESPONDING_BY [name=nested_left_corresponding_by_inner_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -940,7 +889,6 @@ ARRAY>[] # nested set operation: LEFT CORRESPONDING_BY 
[name=nested_full_corresponding_by_left_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(2 AS INT64) AS c UNION ALL SELECT 1, 2 @@ -958,7 +906,6 @@ ARRAY>[unknown order:{NULL, 1}, {NULL, 1}, {1, NULL}] # nested set operation: FULL CORRESPONDING_BY [name=full_corresponding_by_nested_left_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] SELECT 1 AS a, 1 AS b FULL EXCEPT ALL CORRESPONDING BY (c, b, a) ( @@ -972,7 +919,6 @@ ARRAY>[{NULL, 1, 1}] # nested set operation: LEFT CORRESPONDING_BY [name=left_corresponding_by_nested_left_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(2 AS INT64) AS a, CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c @@ -992,7 +938,6 @@ ARRAY>[{1, 1}] # nested set operation: STRICT CORRESPONDING_BY [name=nested_strict_by_position_strict_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -1006,7 +951,6 @@ ARRAY>[unknown order:{1, 1}, {1, 1}] # nested set operation: BY POSITION [name=strict_by_position_nested_strict_corresponding_by] -[required_features=V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] SELECT 1 AS a, 1 AS b EXCEPT ALL ( @@ -1020,7 +964,6 @@ ARRAY>[] # nested set operation: STRICT CORRESPONDING_BY [name=strict_corresponding_by_nested_inner_corresponding] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b, CAST(2 AS INT64) AS c @@ -1040,7 +983,6 @@ ARRAY>[] # nested set operation: INNER CORRESPONDING [name=nested_strict_corresponding_by_inner_corresponding] 
-[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -1058,7 +1000,6 @@ ARRAY>[{1}] # nested set operation: STRICT CORRESPONDING_BY [name=nested_full_corresponding_strict_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] WITH Table1 AS (SELECT CAST(1 AS INT64) AS a UNION ALL SELECT 1), Table2 AS (SELECT CAST(1 AS INT64) AS b UNION ALL SELECT 1) @@ -1075,7 +1016,6 @@ ARRAY>[] # nested set operation: FULL CORRESPONDING [name=full_corresponding_nested_strict_corresponding_by] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS b, CAST(1 AS INT64) AS c UNION ALL SELECT 1, 1 @@ -1093,7 +1033,6 @@ ARRAY>[] # nested set operation: STRICT CORRESPONDING_BY [name=strict_corresponding_by_nested_left_corresponding] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -1111,7 +1050,6 @@ ARRAY>[unknown order:{1, 1}, {1, 1}, {1, 1}] # nested set operation: LEFT CORRESPONDING [name=nested_strict_corresponding_by_left_corresponding] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] ( SELECT 1 AS b, 1 AS a UNION ALL STRICT CORRESPONDING BY (a, b) @@ -1125,7 +1063,6 @@ ARRAY>[unknown order:{1, 1}, {1, 1}] # nested set operation: STRICT CORRESPONDING_BY [name=nested_strict_corresponding_strict_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -1143,7 +1080,6 @@ ARRAY>[] # nested set operation: 
STRICT CORRESPONDING [name=strict_corresponding_nested_strict_corresponding_by] -[required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -1161,7 +1097,6 @@ ARRAY>[{1, 1}] # nested set operation: STRICT CORRESPONDING_BY [name=strict_corresponding_by_nested_inner_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b, CAST(2 AS INT64) AS c @@ -1181,7 +1116,6 @@ ARRAY>[{1, 1}] # nested set operation: INNER CORRESPONDING_BY [name=nested_strict_corresponding_by_inner_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -1199,7 +1133,6 @@ ARRAY>[{1}] # nested set operation: STRICT CORRESPONDING_BY [name=nested_full_corresponding_by_strict_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(2 AS INT64) AS c UNION ALL SELECT 1, 2 @@ -1217,7 +1150,6 @@ ARRAY>[unknown order:{1, NULL}, {1, NULL}, {1, 1}] # nested set operation: FULL CORRESPONDING_BY [name=full_corresponding_by_nested_strict_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] SELECT 1 AS a, 1 AS b FULL EXCEPT ALL CORRESPONDING BY (c, b, a) ( @@ -1231,7 +1163,6 @@ ARRAY>[{NULL, 1, 1}] # nested set operation: STRICT CORRESPONDING_BY [name=strict_corresponding_by_nested_left_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -1249,7 +1180,6 @@ ARRAY>[] # nested set operation: LEFT CORRESPONDING_BY 
[name=nested_strict_corresponding_by_left_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 @@ -1267,7 +1197,6 @@ ARRAY>[{1, NULL}] # nested set operation: STRICT CORRESPONDING_BY [name=nested_strict_corresponding_by_strict_corresponding_by] -[required_features=V_1_4_CORRESPONDING_BY,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] WITH Table1 AS ( SELECT CAST(1 AS INT64) AS a, CAST(1 AS INT64) AS b UNION ALL SELECT 1, 1 diff --git a/zetasql/compliance/testdata/set_operation_strict_corresponding.test b/zetasql/compliance/testdata/set_operation_strict_corresponding.test index b16926f3c..1efe60903 100644 --- a/zetasql/compliance/testdata/set_operation_strict_corresponding.test +++ b/zetasql/compliance/testdata/set_operation_strict_corresponding.test @@ -5,7 +5,7 @@ # Literal coercion STRICT mode [name=literal_coercion_strict] -[default required_features=V_1_4_CORRESPONDING,V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE] +[default required_features=V_1_4_CORRESPONDING_FULL] SELECT 1 AS b, 1 AS a UNION ALL STRICT CORRESPONDING SELECT 1 AS a, CAST(1 AS INT32) AS b diff --git a/zetasql/compliance/testdata/set_operation_strict_corresponding_by.test b/zetasql/compliance/testdata/set_operation_strict_corresponding_by.test index 1f979ed05..c52209bfb 100644 --- a/zetasql/compliance/testdata/set_operation_strict_corresponding_by.test +++ b/zetasql/compliance/testdata/set_operation_strict_corresponding_by.test @@ -5,7 +5,7 @@ # CORRESPONDING BY Literal coercion STRICT mode [name=corresponding_by_literal_coercion_strict] -[default required_features=V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE,V_1_4_CORRESPONDING_BY] +[default required_features=V_1_4_CORRESPONDING_FULL] SELECT 1 AS b, 1 AS a UNION ALL STRICT CORRESPONDING BY (b, a) SELECT 1 AS a, CAST(1 AS INT32) AS b diff --git a/zetasql/compliance/testdata/strings.test 
b/zetasql/compliance/testdata/strings.test index c511e8da3..a266fd6b9 100644 --- a/zetasql/compliance/testdata/strings.test +++ b/zetasql/compliance/testdata/strings.test @@ -125,7 +125,7 @@ SELECT "a"b" [name=strings_unescaped_quote_types_5] SELECT "a""b" -- -ERROR: generic::invalid_argument: Syntax error: Expected end of input but got string literal "b" [at 1:11] +ERROR: generic::invalid_argument: Syntax error: concatenated string literals must be separated by whitespace or comments [at 1:11] SELECT "a""b" ^ == @@ -141,7 +141,7 @@ SELECT 'a'b' [name=strings_unescaped_quote_types_7] SELECT 'a''b' -- -ERROR: generic::invalid_argument: Syntax error: Expected end of input but got string literal 'b' [at 1:11] +ERROR: generic::invalid_argument: Syntax error: concatenated string literals must be separated by whitespace or comments [at 1:11] SELECT 'a''b' ^ == @@ -157,7 +157,7 @@ SELECT '''a'''b''' [name=strings_unescaped_quote_types_9] SELECT '''a''''''b''' -- -ERROR: generic::invalid_argument: Syntax error: Expected end of input but got string literal '''b''' [at 1:15] +ERROR: generic::invalid_argument: Syntax error: concatenated string literals must be separated by whitespace or comments [at 1:15] SELECT '''a''''''b''' ^ == @@ -1353,3 +1353,46 @@ ARRAY>[unknown order: SELECT JSON_EXTRACT_SCALAR('"extract scalar"'), JSON_VALUE('"extract value"') -- ARRAY>[{"extract scalar", "extract value"}] +== +[name=json_query_extraction_bug_326974631] +SELECT JSON_QUERY('{"a" : [{"b" : [{"c": 1,"d": 2 } ]}, {"b" : [{"c" : 3,"d" : 4},{"c" : 5,"d" : 6}]}]}', "$.a[0].b[1].c") +-- +ARRAY>[{NULL}] +== +[name=json_value_extraction_bug_326974631] +SELECT JSON_VALUE('{"a" : [{"b" : [{"c": 1,"d": 2 } ]}, {"b" : [{"c" : 3,"d" : 4},{"c" : 5,"d" : 6}]}]}', "$.a[0].b[1].c") +-- +ARRAY>[{NULL}] +== +[name=json_extract_extraction_bug_326974631] +SELECT JSON_EXTRACT('{"a" : [{"b" : [{"c": 1,"d": 2 } ]}, {"b" : [{"c" : 3,"d" : 4},{"c" : 5,"d" : 6}]}]}', "$.a[0].b[1].c") +-- +ARRAY>[{NULL}] +== 
+[name=json_extract_scalar_extraction_bug_326974631] +SELECT JSON_EXTRACT_SCALAR('{"a" : [{"b" : [{"c": 1,"d": 2 } ]}, {"b" : [{"c" : 3,"d" : 4},{"c" : 5,"d" : 6}]}]}', "$.a[0].b[1].c") +-- +ARRAY>[{NULL}] +== +[required_features=JSON_ARRAY_FUNCTIONS] +[name=json_query_array_extraction_bug_326974631] +SELECT JSON_QUERY_ARRAY('{"a" : [{"b" : [{"c": [1,2],"d": [3,4] } ]}, {"b" : [{"c" : [5,6],"d" : [7,8]},{"c" : [9,10],"d" : [11,12]}]}]}', "$.a[0].b[1].c") +-- +ARRAY>>[{ARRAY(NULL)}] +== +[required_features=JSON_ARRAY_FUNCTIONS] +[name=json_value_array_extraction_bug_326974631] +SELECT JSON_VALUE_ARRAY('{"a" : [{"b" : [{"c": [1,2],"d": [3,4] } ]}, {"b" : [{"c" : [5,6],"d" : [7,8]},{"c" : [9,10],"d" : [11,12]}]}]}', "$.a[0].b[1].c") +-- +ARRAY>>[{ARRAY(NULL)}] +== +[name=json_extract_array_extraction_bug_326974631] +SELECT JSON_EXTRACT_ARRAY('{"a" : [{"b" : [{"c": [1,2],"d": [3,4] } ]}, {"b" : [{"c" : [5,6],"d" : [7,8]},{"c" : [9,10],"d" : [11,12]}]}]}', "$.a[0].b[1].c") +-- +ARRAY>>[{ARRAY(NULL)}] +== +[required_features=JSON_ARRAY_FUNCTIONS] +[name=json_extract_string_array_extraction_bug_326974631] +SELECT JSON_EXTRACT_STRING_ARRAY('{"a" : [{"b" : [{"c": [1,2],"d": [3,4] } ]}, {"b" : [{"c" : [5,6],"d" : [7,8]},{"c" : [9,10],"d" : [11,12]}]}]}', "$.a[0].b[1].c") +-- +ARRAY>>[{ARRAY(NULL)}] diff --git a/zetasql/compliance/testdata/unnest_multiway.test b/zetasql/compliance/testdata/unnest_multiway.test new file mode 100644 index 000000000..31eb3645b --- /dev/null +++ b/zetasql/compliance/testdata/unnest_multiway.test @@ -0,0 +1,724 @@ +# This file covers the following dimensions of tests: +# 1. UNNEST with 2, 3 or 10 arguments. +# 2. Multiway UNNEST with different ARRAY_ZIP_MODE values in mode argument. +# 3. NULL input in array or/and mode argument. +# 4. Empty array argument. +# 5. Array inputs of the same length or different lengths. +# 6. Unordered array input in multiway UNNEST. +# 7. Array inputs of literals or column references, with and without explicit alias. 
+[default required_features=V_1_4_MULTIWAY_UNNEST] +[load_proto_files=zetasql/testdata/test_schema.proto] +[load_proto_names=zetasql_test__.KitchenSinkPB] + +[name=multiway_unnest_unequal_literal_arrays_default_mode] +SELECT * +FROM UNNEST([1,2], [2]) WITH OFFSET +-- +ARRAY>[unknown order: + {1, 2, 0}, + {2, NULL, 1} +] +== + +[name=multiway_unnest_unequal_literal_arrays_strict_mode] +SELECT * +FROM UNNEST([1,2], ["a"], mode => "STRICT") +-- +ERROR: generic::out_of_range: Unnested arrays under STRICT mode must have equal lengths +== + +[name=multiway_unnest_unequal_literal_arrays_truncate_mode] +SELECT * +FROM UNNEST([1,2], ["a"], mode => "TRUNCATE") +-- +ARRAY>[{1, "a"}] +== + +[name=multiway_unnest_unequal_literal_arrays_pad_mode] +SELECT * +FROM UNNEST([1,2], [1.0], ["a", "b", "c"], mode => "PAD") +-- +ARRAY>[unknown order: + {1, 1, "a"}, + {2, NULL, "b"}, + {NULL, NULL, "c"} +] +== + +[name=multiway_unnest_unequal_literal_arrays_null_mode] +SELECT * +FROM UNNEST([1,2], [1.0], ["a", "b", "c"], mode => CAST(NULL AS ARRAY_ZIP_MODE)) +-- +ERROR: generic::out_of_range: UNNEST does not allow NULL mode argument +== + +[name=multiway_unnest_empty_literal_arrays_default_mode] +SELECT * +FROM UNNEST([1,2], []) WITH OFFSET +-- +ARRAY>[unknown order: + {1, NULL, 0}, + {2, NULL, 1} +] +== + +[name=multiway_unnest_empty_literal_arrays_strict_mode] +SELECT * +FROM UNNEST([1,2], [], mode => "STRICT") +-- +ERROR: generic::out_of_range: Unnested arrays under STRICT mode must have equal lengths +== + +[name=multiway_unnest_empty_literal_arrays_truncate_mode] +SELECT * +FROM UNNEST([1,2], [], mode => "TRUNCATE") +-- +ARRAY>[] +== + +[name=multiway_unnest_empty_literal_arrays_pad_mode] +SELECT * +FROM UNNEST([1,2], [], ["a", "b", "c"], mode => "PAD") +-- +ARRAY>[unknown order: + {1, NULL, "a"}, + {2, NULL, "b"}, + {NULL, NULL, "c"} +] +== + +[name=multiway_unnest_empty_literal_arrays_null_mode] +SELECT * +FROM UNNEST([1,2], [], ["a", "b", "c"], mode => CAST(NULL AS ARRAY_ZIP_MODE)) 
+-- +ERROR: generic::out_of_range: UNNEST does not allow NULL mode argument +== + +[name=multiway_unnest_one_null_array_default_mode] +SELECT * +FROM UNNEST([1,2], CAST(NULL AS ARRAY)) WITH OFFSET +-- +ARRAY>[unknown order: + {1, NULL, 0}, + {2, NULL, 1} +] +== + +[name=multiway_unnest_one_null_array_strict_mode] +SELECT * +FROM UNNEST([1,2], CAST(NULL AS ARRAY), mode => "STRICT") +-- +ERROR: generic::out_of_range: Unnested arrays under STRICT mode must have equal lengths +== + +[name=multiway_unnest_one_null_array_truncate_mode] +SELECT * +FROM UNNEST([1,2], CAST(NULL AS ARRAY), mode => "TRUNCATE") +-- +ARRAY>[] +== + +[name=multiway_unnest_one_null_array_pad_mode] +SELECT * +FROM UNNEST([1,2], CAST(NULL AS ARRAY), ["a", "b", "c"], mode => "PAD") +-- +ARRAY>[unknown order: + {1, NULL, "a"}, + {2, NULL, "b"}, + {NULL, NULL, "c"} +] +== + +[name=multiway_unnest_one_null_array_null_mode] +SELECT * +FROM UNNEST([1,2], CAST(NULL AS ARRAY), ["a", "b", "c"], mode => CAST(NULL AS ARRAY_ZIP_MODE)) +-- +ERROR: generic::out_of_range: UNNEST does not allow NULL mode argument +== + +[name=multiway_unnest_all_null_arrays_default_mode] +SELECT * +FROM UNNEST(CAST(NULL AS ARRAY), CAST(NULL AS ARRAY)) WITH OFFSET +-- +ARRAY>[] +== + +[name=multiway_unnest_all_null_arrays_strict_mode] +SELECT * +FROM UNNEST(CAST(NULL AS ARRAY), CAST(NULL AS ARRAY), mode => "STRICT") +-- +ARRAY>[] +== + +[name=multiway_unnest_all_null_arrays_truncate_mode] +SELECT * +FROM UNNEST(CAST(NULL AS ARRAY), CAST(NULL AS ARRAY), mode => "TRUNCATE") +-- +ARRAY>[] +== + +[name=multiway_unnest_all_null_arrays_pad_mode] +SELECT * +FROM UNNEST(CAST(NULL AS ARRAY), CAST(NULL AS ARRAY), CAST(NULL AS ARRAY), mode => "PAD") +-- +ARRAY>[] +== + +[name=multiway_unnest_all_null_arrays_null_mode] +SELECT * +FROM UNNEST(CAST(NULL AS ARRAY), CAST(NULL AS ARRAY), CAST(NULL AS ARRAY), mode => CAST(NULL AS ARRAY_ZIP_MODE)) +-- +ERROR: generic::out_of_range: UNNEST does not allow NULL mode argument +== + 
+[name=multiway_unnest_all_empty_arrays_strict_mode] +SELECT * +FROM UNNEST([], CAST([] AS ARRAY), CAST([] AS ARRAY), mode => "STRICT") +-- +ARRAY>[] +== + +[name=multiway_unnest_aliased_literal_arrays_strict_mode] +SELECT * +FROM UNNEST([1,2] AS arr1, ["a"] AS arr2, mode => "STRICT") WITH OFFSET +-- +ERROR: generic::out_of_range: Unnested arrays under STRICT mode must have equal lengths +== + +[name=multiway_unnest_aliased_literal_arrays_truncate_mode] +SELECT * +FROM UNNEST([1,2] AS arr1, ["a"] AS arr2, mode => "TRUNCATE") WITH OFFSET +-- +ARRAY>[{1, "a", 0}] +== + +[name=multiway_unnest_aliased_literal_arrays_pad_mode] +SELECT * +FROM UNNEST([1,2] AS arr1, [1.0] AS arr2, ["a", "b", "c"] AS arr3, mode => "PAD") WITH OFFSET +ORDER BY offset +-- +ARRAY>[known order: + {1, 1, "a", 0}, + {2, NULL, "b", 1}, + {NULL, NULL, "c", 2} +] +== + +[name=multiway_unnest_aliased_literal_arrays_null_mode] +SELECT * +FROM UNNEST([1,2] AS arr1, [1.0] AS arr2, ["a", "b", "c"] AS arr3, mode => CAST(NULL AS ARRAY_ZIP_MODE)) WITH OFFSET +ORDER BY offset +-- +ERROR: generic::out_of_range: UNNEST does not allow NULL mode argument +== + +[name=multiway_unnest_arrays_unordered_default_mode] +WITH T1 AS ( + SELECT ARRAY_AGG(x) AS unordered_arr1 + FROM ( + SELECT 1 AS x + UNION ALL + SELECT 2 AS x + UNION ALL + SELECT 3 AS x + ) +), T2 AS ( + SELECT ["hello", "world"] AS arr2 +) +SELECT unordered_arr1, arr2 +FROM T1, T2, UNNEST( + T1.unordered_arr1, + T2.arr2 AS arr2) +-- +ARRAY>[unknown order: + {1, "world"}, + {2, "hello"}, + {3, NULL} +] + +NOTE: Reference implementation reports non-determinism. 
+== + +[name=multiway_unnest_all_arrays_unordered_default_mode] +WITH T1 AS ( + SELECT ARRAY_AGG(x) AS unordered_arr1 + FROM ( + SELECT 1 AS x + UNION ALL + SELECT 2 AS x + UNION ALL + SELECT 3 AS x + ) +), T2 AS ( + SELECT ARRAY_AGG(y) AS unordered_arr2 + FROM ( + SELECT "hello" AS y + UNION ALL + SELECT "world" AS y + ) +) +SELECT unordered_arr1, unordered_arr2 +FROM T1, T2, UNNEST( + T1.unordered_arr1, + T2.unordered_arr2 AS unordered_arr2) +-- +ARRAY> +[unknown order:{1, "hello"}, {2, "world"}, {3, NULL}] + +NOTE: Reference implementation reports non-determinism. +== + +[name=multiway_unnest_arrays_unordered_with_offset_default_mode] +WITH T1 AS ( + SELECT ARRAY_AGG(x) AS unordered_arr1 + FROM ( + SELECT 1 AS x + UNION ALL + SELECT 2 AS x + UNION ALL + SELECT 3 AS x + ) +), T2 AS ( + SELECT ["hello", "world"] AS arr2 +) +SELECT unordered_arr1, arr2, offset +FROM T1, T2, UNNEST( + T1.unordered_arr1 AS unordered_arr1, + T2.arr2 AS arr2) WITH OFFSET +-- +ARRAY>[unknown order: + {1, "world", 1}, + {2, "hello", 0}, + {3, NULL, 2} +] + +NOTE: Reference implementation reports non-determinism. 
+== + +[name=multiway_unnest_three_unequal_arrays_column_ref_default_mode] +WITH T AS ( + SELECT ['a', 'b'] AS str_array, [1] AS int64_array, [1.0, 5.0, 10.0] AS double_array +) +SELECT * +FROM T, UNNEST( + T.str_array AS arr1, + T.int64_array AS arr2, + T.double_array AS arr3) WITH OFFSET +-- +ARRAY, + int64_array ARRAY<>, + double_array ARRAY<>, + arr1 STRING, + arr2 INT64, + arr3 DOUBLE, + offset INT64>> +[unknown order:{ + ARRAY[known order:"a", "b"], + ARRAY[1], + ARRAY[known order:1, 5, 10], + "b", + NULL, + 5, + 1 + }, + { + ARRAY[known order:"a", "b"], + ARRAY[1], + ARRAY[known order:1, 5, 10], + "a", + 1, + 1, + 0 + }, + { + ARRAY[known order:"a", "b"], + ARRAY[1], + ARRAY[known order:1, 5, 10], + NULL, + NULL, + 10, + 2 + }] +== + +[name=multiway_unnest_three_unequal_arrays_column_ref_strict_mode] +WITH T AS ( + SELECT ['a', 'b', 'c'] AS str_array, [1, 2, 3] AS int64_array, [1.0, 5.0] AS double_array +) +SELECT * +FROM T, UNNEST( + T.str_array AS arr1, + T.int64_array AS arr2, + T.double_array AS arr3, + mode => "STRICT") WITH OFFSET +-- +ERROR: generic::out_of_range: Unnested arrays under STRICT mode must have equal lengths +== + +[name=multiway_unnest_three_unequal_arrays_column_ref_truncate_mode] +WITH T AS ( + SELECT ['a', 'b', 'c'] AS str_array, [1, 2, 3] AS int64_array, [1.0, 5.0] AS double_array +) +SELECT * +FROM T, UNNEST( + T.str_array AS arr1, + T.int64_array AS arr2, + T.double_array AS arr3, + mode => "TRUNCATE") WITH OFFSET +-- +ARRAY, + int64_array ARRAY<>, + double_array ARRAY<>, + arr1 STRING, + arr2 INT64, + arr3 DOUBLE, + offset INT64>> +[unknown order:{ + ARRAY[known order:"a", "b", "c"], + ARRAY[known order:1, 2, 3], + ARRAY[known order:1, 5], + "b", + 2, + 5, + 1 + }, + { + ARRAY[known order:"a", "b", "c"], + ARRAY[known order:1, 2, 3], + ARRAY[known order:1, 5], + "a", + 1, + 1, + 0 + }] +== + +[name=multiway_unnest_three_unequal_arrays_column_ref_pad_mode] +WITH T AS ( + SELECT ['a', 'b', 'c'] AS str_array, [1, 2, 3] AS 
int64_array, [1.0, 5.0] AS double_array +) +SELECT * +FROM T, UNNEST( + T.str_array AS arr1, + T.int64_array AS arr2, + T.double_array AS arr3, + mode => "PAD") WITH OFFSET +-- +ARRAY, + int64_array ARRAY<>, + double_array ARRAY<>, + arr1 STRING, + arr2 INT64, + arr3 DOUBLE, + offset INT64>> +[unknown order:{ + ARRAY[known order:"a", "b", "c"], + ARRAY[known order:1, 2, 3], + ARRAY[known order:1, 5], + "b", + 2, + 5, + 1 + }, + { + ARRAY[known order:"a", "b", "c"], + ARRAY[known order:1, 2, 3], + ARRAY[known order:1, 5], + "a", + 1, + 1, + 0 + }, + { + ARRAY[known order:"a", "b", "c"], + ARRAY[known order:1, 2, 3], + ARRAY[known order:1, 5], + "c", + 3, + NULL, + 2 + }] +== + +[name=multiway_unnest_three_unequal_arrays_column_ref_null_mode] +WITH T AS ( + SELECT ['a', 'b', 'c'] AS str_array, [1, 2, 3] AS int64_array, [1.0, 5.0] AS double_array +) +SELECT * +FROM T, UNNEST( + T.str_array AS arr1, + T.int64_array AS arr2, + T.double_array AS arr3, + mode => CAST(NULL AS ARRAY_ZIP_MODE)) WITH OFFSET +-- +ERROR: generic::out_of_range: UNNEST does not allow NULL mode argument +== + +[name=multiway_unnest_ten_unequal_arrays_column_ref_default_mode] +WITH T AS ( + SELECT + ['a', 'b'] AS arr1, + [1] AS arr2, + ARRAY[1, 2] AS arr3, + ARRAY[1, 2] AS arr4, + ARRAY[1, 2] AS arr5, + [1.0] AS arr6, + [NEW zetasql_test__.KitchenSinkPB(1 AS int64_key_1, 2 AS int64_key_2), + NEW zetasql_test__.KitchenSinkPB(10 AS int64_key_1, 20 AS int64_key_2)] AS arr7, + [STRUCT(1, "a")] AS arr8, + [b'a', b'b'] AS arr9, + [STRUCT(100 AS field1, "st1" AS field2), STRUCT(200 AS field1, "st2" AS field2)] AS arr10, +) +SELECT arr1, arr2, arr3, arr4, arr5, arr6, arr7, arr8, arr9, arr10, offset +FROM T, UNNEST( + T.arr1, + T.arr2, + T.arr3, + T.arr4, + T.arr5, + T.arr6, + T.arr7, + T.arr8, + T.arr9, + T.arr10) WITH OFFSET +-- +ARRAY, + arr8 STRUCT, + arr9 BYTES, + arr10 STRUCT, + offset INT64 + >> +[unknown order:{"b", + NULL, + 2, + 2, + 2, + NULL, + { + int64_key_1: 10 + int64_key_2: 20 + }, + NULL, 
+ b"b", + {200, "st2"}, + 1}, + {"a", + 1, + 1, + 1, + 1, + 1, + { + int64_key_1: 1 + int64_key_2: 2 + }, + {1, "a"}, + b"a", + {100, "st1"}, + 0}] +== + +[name=multiway_unnest_ten_unequal_arrays_column_ref_strict_mode] +WITH T AS ( + SELECT + ['a', 'b'] AS arr1, + [1] AS arr2, + ARRAY[1, 2] AS arr3, + ARRAY[1, 2] AS arr4, + ARRAY[1, 2] AS arr5, + [1.0] AS arr6, + [NEW zetasql_test__.KitchenSinkPB(1 AS int64_key_1, 2 AS int64_key_2), + NEW zetasql_test__.KitchenSinkPB(10 AS int64_key_1, 20 AS int64_key_2)] AS arr7, + [STRUCT(1, "a")] AS arr8, + [b'a', b'b'] AS arr9, + [STRUCT(100 AS field1, "st1" AS field2), STRUCT(200 AS field1, "st2" AS field2)] AS arr10, +) +SELECT arr1, arr2, arr3, arr4, arr5, arr6, arr7, arr8, arr9, arr10, offset +FROM T, UNNEST( + T.arr1, + T.arr2, + T.arr3, + T.arr4, + T.arr5, + T.arr6, + T.arr7, + T.arr8, + T.arr9, + T.arr10, + mode => "STRICT") WITH OFFSET +-- +ERROR: generic::out_of_range: Unnested arrays under STRICT mode must have equal lengths +== + +[name=multiway_unnest_ten_unequal_arrays_column_ref_truncate_mode] +WITH T AS ( + SELECT + ['a', 'b'] AS arr1, + [1] AS arr2, + ARRAY[1, 2] AS arr3, + ARRAY[1, 2] AS arr4, + ARRAY[1, 2] AS arr5, + [1.0] AS arr6, + [NEW zetasql_test__.KitchenSinkPB(1 AS int64_key_1, 2 AS int64_key_2), + NEW zetasql_test__.KitchenSinkPB(10 AS int64_key_1, 20 AS int64_key_2)] AS arr7, + [STRUCT(1, "a")] AS arr8, + [b'a', b'b'] AS arr9, + [STRUCT(100 AS field1, "st1" AS field2), STRUCT(200 AS field1, "st2" AS field2)] AS arr10, +) +SELECT arr1, arr2, arr3, arr4, arr5, arr6, arr7, arr8, arr9, arr10, offset +FROM T, UNNEST( + T.arr1, + T.arr2, + T.arr3, + T.arr4, + T.arr5, + T.arr6, + T.arr7, + T.arr8, + T.arr9, + T.arr10, + mode => "TRUNCATE") WITH OFFSET +-- +ARRAY, + arr8 STRUCT, + arr9 BYTES, + arr10 STRUCT, + offset INT64 + >> +[{"a", + 1, + 1, + 1, + 1, + 1, + { + int64_key_1: 1 + int64_key_2: 2 + }, + {1, "a"}, + b"a", + {100, "st1"}, + 0}] +== + 
+[name=multiway_unnest_ten_unequal_arrays_column_ref_pad_mode] +WITH T AS ( + SELECT + ['a', 'b'] AS arr1, + [1] AS arr2, + ARRAY[1, 2] AS arr3, + ARRAY[1, 2] AS arr4, + ARRAY[1, 2] AS arr5, + [1.0] AS arr6, + [NEW zetasql_test__.KitchenSinkPB(1 AS int64_key_1, 2 AS int64_key_2), + NEW zetasql_test__.KitchenSinkPB(10 AS int64_key_1, 20 AS int64_key_2)] AS arr7, + [STRUCT(1, "a")] AS arr8, + [b'a', b'b'] AS arr9, + [STRUCT(100 AS field1, "st1" AS field2), STRUCT(200 AS field1, "st2" AS field2)] AS arr10, +) +SELECT arr1, arr2, arr3, arr4, arr5, arr6, arr7, arr8, arr9, arr10, offset +FROM T, UNNEST( + T.arr1, + T.arr2, + T.arr3, + T.arr4, + T.arr5, + T.arr6, + T.arr7, + T.arr8, + T.arr9, + T.arr10, + mode => "PAD") WITH OFFSET +-- +ARRAY, + arr8 STRUCT, + arr9 BYTES, + arr10 STRUCT, + offset INT64 + >> +[unknown order:{"b", + NULL, + 2, + 2, + 2, + NULL, + { + int64_key_1: 10 + int64_key_2: 20 + }, + NULL, + b"b", + {200, "st2"}, + 1}, + {"a", + 1, + 1, + 1, + 1, + 1, + { + int64_key_1: 1 + int64_key_2: 2 + }, + {1, "a"}, + b"a", + {100, "st1"}, + 0}] +== + +[name=multiway_unnest_ten_unequal_arrays_column_ref_null_mode] +WITH T AS ( + SELECT + ['a', 'b'] AS arr1, + [1] AS arr2, + ARRAY[1, 2] AS arr3, + ARRAY[1, 2] AS arr4, + ARRAY[1, 2] AS arr5, + [1.0] AS arr6, + [NEW zetasql_test__.KitchenSinkPB(1 AS int64_key_1, 2 AS int64_key_2), + NEW zetasql_test__.KitchenSinkPB(10 AS int64_key_1, 20 AS int64_key_2)] AS arr7, + [STRUCT(1, "a")] AS arr8, + [b'a', b'b'] AS arr9, + [STRUCT(100 AS field1, "st1" AS field2), STRUCT(200 AS field1, "st2" AS field2)] AS arr10, +) +SELECT arr1, arr2, arr3, arr4, arr5, arr6, arr7, arr8, arr9, arr10, offset +FROM T, UNNEST( + T.arr1, + T.arr2, + T.arr3, + T.arr4, + T.arr5, + T.arr6, + T.arr7, + T.arr8, + T.arr9, + T.arr10, + mode => CAST(NULL AS ARRAY_ZIP_MODE)) WITH OFFSET +-- +ERROR: generic::out_of_range: UNNEST does not allow NULL mode argument +== + diff --git a/zetasql/local_service/local_service_test.cc 
b/zetasql/local_service/local_service_test.cc index 02d0ac321..da745d47b 100644 --- a/zetasql/local_service/local_service_test.cc +++ b/zetasql/local_service/local_service_test.cc @@ -726,83 +726,67 @@ TEST_F(ZetaSqlLocalServiceImplTest, Parse) { ParseResponse expectedResponse; google::protobuf::TextFormat::ParseFromString( R"pb(parsed_statement { - ast_query_statement_node { - parent { - parent { - parse_location_range { - filename: "" - start: 0 - end: 8 - } - } - } - query { - parent { - parent { - parse_location_range { - filename: "" - start: 0 - end: 8 - } - } - parenthesized: false - } - query_expr { - ast_select_node { - parent { - parent { - parse_location_range { - filename: "" - start: 0 - end: 8 - } - } - parenthesized: false - } - distinct: false - select_list { - parent { - parse_location_range { - filename: "" - start: 7 - end: 8 - } - } - columns { - parent { - parse_location_range { - filename: "" - start: 7 - end: 8 - } - } - expression { - ast_leaf_node { - ast_int_literal_node { - parent { - parent { - parent { - parse_location_range { - filename: "" - start: 7 - end: 8 - } - } - parenthesized: false - } - image: "9" - } - } - } - } - } - } - } - } - is_nested: false - is_pivot_input: false - } - } + ast_query_statement_node { + parent { + parent { + parse_location_range { filename: "" start: 0 end: 8 } + } + } + query { + parent { + parent { + parse_location_range { filename: "" start: 0 end: 8 } + } + parenthesized: false + } + query_expr { + ast_select_node { + parent { + parent { + parse_location_range { filename: "" start: 0 end: 8 } + } + parenthesized: false + } + distinct: false + select_list { + parent { + parse_location_range { filename: "" start: 7 end: 8 } + } + columns { + parent { + parse_location_range { filename: "" start: 7 end: 8 } + } + expression { + ast_leaf_node { + ast_printable_leaf_node { + ast_int_literal_node { + parent { + parent { + parent { + parent { + parse_location_range { + filename: "" + start: 7 + end: 8 + } + 
} + parenthesized: false + } + } + image: "9" + } + } + } + } + } + } + } + } + } + is_nested: false + is_pivot_input: false + } + } })pb", &expectedResponse); EXPECT_THAT(response, EqualsProto(expectedResponse)); @@ -843,9 +827,9 @@ TEST_F(ZetaSqlLocalServiceImplTest, ParseScript) { ParseResponse expectedResponse; google::protobuf::TextFormat::ParseFromString( R"pb(parsed_script { - parent { parse_location_range { filename: "" start: 5 end: 53 } } + parent { parse_location_range { filename: "" start: 5 end: 50 } } statement_list_node { - parent { parse_location_range { filename: "" start: 5 end: 53 } } + parent { parse_location_range { filename: "" start: 5 end: 50 } } statement_list { ast_script_statement_node { ast_variable_declaration_node { @@ -950,19 +934,23 @@ TEST_F(ZetaSqlLocalServiceImplTest, ParseScript) { } expression { ast_leaf_node { - ast_int_literal_node { - parent { + ast_printable_leaf_node { + ast_int_literal_node { parent { parent { - parse_location_range { - filename: "" - start: 48 - end: 49 + parent { + parent { + parse_location_range { + filename: "" + start: 48 + end: 49 + } + } + parenthesized: false } } - parenthesized: false + image: "1" } - image: "1" } } } @@ -2435,6 +2423,7 @@ TEST_F(ZetaSqlLocalServiceImplTest, GetBuiltinFunctions) { supports_safe_error_mode: false supports_having_modifier: true uses_upper_case_sql_name: true + may_suppress_side_effects: false })", &function1); google::protobuf::TextFormat::ParseFromString(R"( @@ -2485,6 +2474,7 @@ TEST_F(ZetaSqlLocalServiceImplTest, GetBuiltinFunctions) { supports_safe_error_mode: true supports_having_modifier: true uses_upper_case_sql_name: true + may_suppress_side_effects: false })", &function2); function1.mutable_options()->set_supports_clamped_between_modifier(false); @@ -2588,6 +2578,7 @@ TEST_F(ZetaSqlLocalServiceImplTest, GetBuiltinFunctionsReturnsTypes) { supports_having_modifier: true uses_upper_case_sql_name: true supports_clamped_between_modifier: false + 
may_suppress_side_effects: false })", &expected_round_function); diff --git a/zetasql/parser/BUILD b/zetasql/parser/BUILD index 6ac145a12..8220edba2 100644 --- a/zetasql/parser/BUILD +++ b/zetasql/parser/BUILD @@ -47,6 +47,8 @@ py_binary( srcs = ["gen_extra_files.py"], main = "gen_extra_files.py", python_version = "PY3", + deps = [ + ], ) genrule( @@ -186,8 +188,6 @@ cc_library( "ast_node_internal.h", "bison_parser.cc", "bison_parser.h", - "flex_tokenizer.cc", - "flex_tokenizer.h", "parser.cc", "unparser.cc", ], @@ -210,10 +210,10 @@ cc_library( ":bison_parser_generated_lib", ":bison_parser_mode", ":flex_istream", + ":flex_tokenizer", ":keywords", - ":location", ":parse_tree", - ":tokenizer", + ":token_disambiguator", "//bazel:flex", "//zetasql/base", "//zetasql/base:arena", @@ -227,6 +227,8 @@ cc_library( "//zetasql/common:thread_stack", "//zetasql/common:timer_util", "//zetasql/common:utf_util", + "//zetasql/parser/macros:macro_catalog", + "//zetasql/parser/macros:macro_expander", "//zetasql/proto:internal_error_location_cc_proto", "//zetasql/public:error_location_cc_proto", "//zetasql/public:id_string", @@ -262,24 +264,74 @@ cc_library( ], ) +cc_library( + name = "flex_tokenizer", + srcs = ["flex_tokenizer.cc"], + hdrs = ["flex_tokenizer.h"], + deps = [ + ":bison_parser_generated_lib", # buildcleaner: keep + ":bison_parser_mode", + ":bison_token_codes", + ":flex_istream", + ":keywords", + "//bazel:flex", + "//zetasql/base:check", + "//zetasql/base:status", + "//zetasql/common:errors", + "//zetasql/parser/macros:token_with_location", + "//zetasql/public:language_options", + "//zetasql/public:parse_location", + "@com_google_absl//absl/flags:flag", + "@com_google_absl//absl/status", + "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/strings", + ], +) + +cc_library( + name = "token_disambiguator", + srcs = ["token_disambiguator.cc"], + hdrs = ["token_disambiguator.h"], + deps = [ + ":bison_parser_mode", + ":bison_token_codes", + ":flex_tokenizer", 
+ "//zetasql/base:arena", + "//zetasql/base:check", + "//zetasql/base:ret_check", + "//zetasql/base:status", + "//zetasql/common:errors", + "//zetasql/parser/macros:flex_token_provider", + "//zetasql/parser/macros:macro_catalog", + "//zetasql/parser/macros:macro_expander", + "//zetasql/parser/macros:token_with_location", + "//zetasql/public:error_helpers", + "//zetasql/public:language_options", + "//zetasql/public:parse_location", + "@com_google_absl//absl/base:core_headers", + "@com_google_absl//absl/log", + "@com_google_absl//absl/memory", + "@com_google_absl//absl/status", + "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/strings", + ], +) + cc_test( name = "flex_tokenizer_test", srcs = [ - "flex_tokenizer.h", "flex_tokenizer_test.cc", ], deps = [ - ":bison_parser_generated_lib", - ":location", - ":parser", - ":tokenizer", - "//bazel:flex", + ":bison_parser_mode", + ":bison_token_codes", + ":flex_tokenizer", + ":token_disambiguator", + "//zetasql/base:check", + "//zetasql/base:status", + "//zetasql/base/testing:status_matchers", "//zetasql/base/testing:zetasql_gtest_main", - "//zetasql/parser:bison_parser_mode", - "//zetasql/parser:bison_token_codes", "//zetasql/public:language_options", - "//zetasql/public:parse_location", - "@com_google_absl//absl/flags:flag", "@com_google_absl//absl/status", "@com_google_absl//absl/strings", ], @@ -304,16 +356,6 @@ cc_test( ], ) -cc_library( - name = "tokenizer", - hdrs = ["tokenizer.h"], - deps = [ - ":location", - "@com_google_absl//absl/status", - "@com_google_absl//absl/strings", - ], -) - cc_library( name = "bison_parser_generated_lib", srcs = [ @@ -346,9 +388,7 @@ cc_library( deps = [ ":bison_parser_mode", ":keywords", - ":location", ":parse_tree", - ":tokenizer", "//bazel:flex", "//zetasql/base", "//zetasql/base:arena", @@ -358,6 +398,8 @@ cc_library( "//zetasql/base:strings", "//zetasql/common:errors", "//zetasql/common:timer_util", + "//zetasql/parser/macros:macro_catalog", + 
"//zetasql/parser/macros:token_with_location", "//zetasql/public:id_string", "//zetasql/public:language_options", "//zetasql/public:options_cc_proto", @@ -370,6 +412,7 @@ cc_library( "@com_google_absl//absl/flags:flag", "@com_google_absl//absl/memory", "@com_google_absl//absl/status", + "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", "@com_google_absl//absl/strings:str_format", "@com_google_absl//absl/time", @@ -415,6 +458,7 @@ cc_library( "//zetasql/base:arena", "//zetasql/base:check", "//zetasql/base:status", + "//zetasql/parser/macros:macro_catalog", "//zetasql/public:language_options", "//zetasql/public:options_cc_proto", "//zetasql/public/proto:logging_cc_proto", @@ -623,6 +667,7 @@ cc_test( "//zetasql/base/testing:zetasql_gtest_main", "@com_google_absl//absl/flags:flag", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", "@com_googlesource_code_re2//:re2", ], ) @@ -638,12 +683,17 @@ cc_library( "//zetasql/public:catalog", "//zetasql/public:function", "//zetasql/public:language_options", + "//zetasql/public:options_cc_proto", "//zetasql/public:parse_resume_location", "//zetasql/public:simple_catalog", "//zetasql/public:strings", + "//zetasql/public:type_cc_proto", "//zetasql/public/types", + "@com_google_absl//absl/container:flat_hash_map", + "@com_google_absl//absl/container:flat_hash_set", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", + "@com_google_absl//absl/strings:str_format", ], ) @@ -652,6 +702,7 @@ cc_test( srcs = ["deidentify_test.cc"], deps = [ ":deidentify", + ":parse_tree", "//zetasql/base/testing:status_matchers", "//zetasql/base/testing:zetasql_gtest_main", ], @@ -671,18 +722,10 @@ genlex( genyacc( name = "bison_parser_generated", src = "bison_parser.y", - extra_outs = [ - "location.hh", - ], header_out = "bison_parser.bison.h", source_out = "bison_parser.bison.cc", ) -cc_library( - name = "location", - hdrs = ["location.hh"], -) - [gen_parser_test( filename = 
f.split("/")[-1], ) for f in glob( diff --git a/zetasql/parser/ast_enums.proto b/zetasql/parser/ast_enums.proto index a6bfa50f3..2ec1b364d 100644 --- a/zetasql/parser/ast_enums.proto +++ b/zetasql/parser/ast_enums.proto @@ -49,6 +49,7 @@ enum SchemaObjectKind { kTableFunction = 13; kView = 14; kSnapshotTable = 15; + kExternalSchema = 18; } message ASTBinaryExpressionEnums { diff --git a/zetasql/parser/bison_parser.cc b/zetasql/parser/bison_parser.cc index 4f639c33f..789750973 100644 --- a/zetasql/parser/bison_parser.cc +++ b/zetasql/parser/bison_parser.cc @@ -25,14 +25,19 @@ #include #include +#include "zetasql/base/arena.h" #include "zetasql/common/errors.h" #include "zetasql/common/timer_util.h" #include "zetasql/common/utf_util.h" #include "zetasql/parser/ast_node.h" #include "zetasql/parser/bison_parser.bison.h" #include "zetasql/parser/keywords.h" +#include "zetasql/parser/macros/macro_catalog.h" +#include "zetasql/parser/token_disambiguator.h" #include "zetasql/public/id_string.h" +#include "zetasql/public/language_options.h" #include "zetasql/public/options.pb.h" +#include "zetasql/public/parse_location.h" #include "zetasql/public/proto/logging.pb.h" #include "absl/cleanup/cleanup.h" #include "absl/flags/flag.h" @@ -168,7 +173,7 @@ static std::string ShortenBytesLiteralForError(absl::string_view literal) { // BisonParser::Parse(). It is required that 'bison_error_message' is the actual // error message produced by the bison parser for the given inputs. 
static absl::StatusOr GenerateImprovedBisonSyntaxError( - const LanguageOptions& language_options, ParseLocationPoint error_location, + const LanguageOptions& language_options, ParseLocationPoint& error_location, absl::string_view bison_error_message, BisonParserMode mode, absl::string_view input, int start_offset) { // Bison error messages are always of the form "syntax error, unexpected X, @@ -260,11 +265,12 @@ static absl::StatusOr GenerateImprovedBisonSyntaxError( // token at the start to indicate the statement type. That token interferes // with errors at offset 0. auto tokenizer = std::make_unique( - BisonParserMode::kTokenizer, error_location.filename(), input, - start_offset, language_options); + BisonParserMode::kTokenizerPreserveComments, error_location.filename(), + input, start_offset, language_options); ParseLocationRange token_location; int token = -1; while (token != 0) { + ParseLocationPoint last_token_location_end = token_location.end(); ZETASQL_RETURN_IF_ERROR(tokenizer->GetNextToken(&token_location, &token)); // Bison always returns parse errors at token boundaries, so this should // never happen. @@ -272,16 +278,22 @@ static absl::StatusOr GenerateImprovedBisonSyntaxError( token_location.start().GetByteOffset()); if (token == 0 || error_location.GetByteOffset() == token_location.start().GetByteOffset()) { - const absl::string_view token_text = - absl::ClippedSubstr(input, token_location.start().GetByteOffset(), - token_location.end().GetByteOffset() - - token_location.start().GetByteOffset()); + const absl::string_view token_text = token_location.GetTextFrom(input); std::string actual_token_description; if (token == 0) { // The error location was at end-of-input, so this is an - // unexpected-end-of error. + // unexpected-end-of error. Format with a better string, and move its + // location to the end of the last token. 
actual_token_description = absl::StrCat("end of ", GetBisonParserModeName(mode)); + if (last_token_location_end.IsValid()) { + error_location = last_token_location_end; + } else { + // There was not even a comment, the input is just whitespace. Move + // the error to skip it all. + error_location = token_location.start(); + error_location.SetByteOffset(start_offset); + } } else if (token == zetasql_bison_parser::BisonParserImpl::token::KW_OVER) { // When the OVER keyword is used in the wrong place, we tell the user @@ -340,11 +352,6 @@ static absl::StatusOr GenerateImprovedBisonSyntaxError( } else if (token == zetasql_bison_parser::BisonParserImpl::token::KW_DOUBLE_AT) { actual_token_description = "\"@@\""; - } else if (token == - zetasql_bison_parser::BisonParserImpl::token::KW_DOT_STAR) { - // This is a single token for ".*", but we want to expose this as "." - // externally. - actual_token_description = "\".\""; } else { actual_token_description = absl::StrCat("\"", token_text, "\""); } @@ -364,13 +371,16 @@ static absl::StatusOr GenerateImprovedBisonSyntaxError( static absl::Status ParseWithBison( BisonParser* parser, absl::string_view filename, absl::string_view input, BisonParserMode mode, int start_byte_offset, - const LanguageOptions& language_options, ASTNode*& output_node, - std::string& error_message, ParseLocationPoint& error_location, + const LanguageOptions& language_options, + const macros::MacroCatalog* macro_catalog, zetasql_base::UnsafeArena* arena, + ASTNode*& output_node, std::string& error_message, + ParseLocationPoint& error_location, ASTStatementProperties* ast_statement_properties, bool& move_error_location_past_whitespace, int* statement_end_byte_offset, bool& format_error, int64_t& out_num_lexical_tokens) { - auto tokenizer = std::make_unique( - mode, filename, input, start_byte_offset, language_options); + ZETASQL_ASSIGN_OR_RETURN(auto tokenizer, DisambiguatorLexer::Create( + mode, filename, input, start_byte_offset, + language_options, 
macro_catalog, arena)); zetasql_bison_parser::BisonParserImpl bison_parser_impl( tokenizer.get(), parser, &output_node, ast_statement_properties, @@ -439,14 +449,15 @@ static absl::Status InitNodes( absl::Status BisonParser::Parse( BisonParserMode mode, absl::string_view filename, absl::string_view input, int start_byte_offset, IdStringPool* id_string_pool, zetasql_base::UnsafeArena* arena, - const LanguageOptions& language_options, std::unique_ptr* output, + const LanguageOptions& language_options, + const macros::MacroCatalog* macro_catalog, std::unique_ptr* output, std::vector>* other_allocated_ast_nodes, ASTStatementProperties* ast_statement_properties, int* statement_end_byte_offset) { - absl::Status status = - ParseInternal(mode, filename, input, start_byte_offset, id_string_pool, - arena, language_options, output, other_allocated_ast_nodes, - ast_statement_properties, statement_end_byte_offset); + absl::Status status = ParseInternal( + mode, filename, input, start_byte_offset, id_string_pool, arena, + language_options, macro_catalog, output, other_allocated_ast_nodes, + ast_statement_properties, statement_end_byte_offset); if (!status.ok() && absl::GetFlag(FLAGS_zetasql_parser_strip_errors)) { return absl::InvalidArgumentError("Syntax error"); } @@ -456,7 +467,8 @@ absl::Status BisonParser::Parse( absl::Status BisonParser::ParseInternal( BisonParserMode mode, absl::string_view filename, absl::string_view input, int start_byte_offset, IdStringPool* id_string_pool, zetasql_base::UnsafeArena* arena, - const LanguageOptions& language_options, std::unique_ptr* output, + const LanguageOptions& language_options, + const macros::MacroCatalog* macro_catalog, std::unique_ptr* output, std::vector>* other_allocated_ast_nodes, ASTStatementProperties* ast_statement_properties, int* statement_end_byte_offset) { @@ -493,9 +505,9 @@ absl::Status BisonParser::ParseInternal( int64_t num_lexical_tokens; absl::Status parse_status = ParseWithBison( this, filename, input, mode, 
start_byte_offset, language_options, - output_node, error_message, error_location, ast_statement_properties, - move_error_location_past_whitespace, statement_end_byte_offset, - format_error, num_lexical_tokens); + macro_catalog, arena, output_node, error_message, error_location, + ast_statement_properties, move_error_location_past_whitespace, + statement_end_byte_offset, format_error, num_lexical_tokens); parser_runtime_info_->add_lexical_tokens(num_lexical_tokens); if (parse_status.ok()) { @@ -513,13 +525,15 @@ absl::Status BisonParser::ParseInternal( // missing. In those cases we typically don't have the position of the // next token available, and the parser can request that the error // location be moved past any whitespace onto the next token. - ZetaSqlFlexTokenizer skip_whitespace_tokenizer( - BisonParserMode::kTokenizer, filename_.ToStringView(), input_, - error_location.GetByteOffset(), this->language_options()); + ZETASQL_ASSIGN_OR_RETURN(auto skip_whitespace_tokenizer, + DisambiguatorLexer::Create( + BisonParserMode::kTokenizer, filename_.ToStringView(), + input_, error_location.GetByteOffset(), + this->language_options(), macro_catalog, arena)); ParseLocationRange next_token_location; int token; ZETASQL_RETURN_IF_ERROR( - skip_whitespace_tokenizer.GetNextToken(&next_token_location, &token)); + skip_whitespace_tokenizer->GetNextToken(&next_token_location, &token)); // Ignore the token. We only care about its starting location. 
error_location = next_token_location.start(); } diff --git a/zetasql/parser/bison_parser.h b/zetasql/parser/bison_parser.h index 6db9753e1..3617bead3 100644 --- a/zetasql/parser/bison_parser.h +++ b/zetasql/parser/bison_parser.h @@ -25,7 +25,7 @@ #include "zetasql/base/arena_allocator.h" #include "zetasql/common/errors.h" #include "zetasql/parser/bison_parser_mode.h" -#include "zetasql/parser/location.hh" +#include "zetasql/parser/macros/macro_catalog.h" #include "zetasql/parser/parse_tree.h" #include "zetasql/parser/parser_runtime_info.h" #include "zetasql/parser/statement_properties.h" @@ -103,6 +103,7 @@ class BisonParser { BisonParserMode mode, absl::string_view filename, absl::string_view input, int start_byte_offset, IdStringPool* id_string_pool, zetasql_base::UnsafeArena* arena, const LanguageOptions& language_options, + const macros::MacroCatalog* macro_catalog, std::unique_ptr* output, std::vector>* other_allocated_ast_nodes, ASTStatementProperties* ast_statement_properties, @@ -112,10 +113,7 @@ class BisonParser { // returned characters will remain valid throughout Parse(). template absl::string_view GetInputText(const Location& bison_location) const { - ABSL_DCHECK_GE(bison_location.end.column, bison_location.begin.column); - return absl::string_view( - input_.data() + bison_location.begin.column, - bison_location.end.column - bison_location.begin.column); + return bison_location.GetTextFrom(input_); } // Creates an ASTNode of type T with a unique id. Sets its location to @@ -176,10 +174,8 @@ class BisonParser { // 'bison_location'. 
template void SetNodeLocation(const Location& bison_location, ASTNode* node) { - node->set_start_location(zetasql::ParseLocationPoint::FromByteOffset( - filename_.ToStringView(), bison_location.begin.column)); - node->set_end_location(zetasql::ParseLocationPoint::FromByteOffset( - filename_.ToStringView(), bison_location.end.column)); + node->set_start_location(bison_location.start()); + node->set_end_location(bison_location.end()); } // Sets the node location of 'node' to the ZetaSQL equivalent of the @@ -188,26 +184,15 @@ class BisonParser { template void SetNodeLocation(Location& bison_location_start, Location& bison_location_end, ASTNode* node) { - node->set_start_location(zetasql::ParseLocationPoint::FromByteOffset( - filename_.ToStringView(), bison_location_start.begin.column)); - node->set_end_location(zetasql::ParseLocationPoint::FromByteOffset( - filename_.ToStringView(), bison_location_end.end.column)); - } - - static zetasql_bison_parser::location GetBisonLocation( - const zetasql::ParseLocationRange& location_range) { - zetasql_bison_parser::location result; - result.begin.column = location_range.start().GetByteOffset(); - result.end.column = location_range.end().GetByteOffset(); - return result; + node->set_start_location(bison_location_start.start()); + node->set_end_location(bison_location_end.end()); } // Sets the start location of 'node' to the start of 'location', and returns // 'node'. template ASTNodeType* WithStartLocation(ASTNodeType* node, const Location& location) { - node->set_start_location(ParseLocationPoint::FromByteOffset( - filename_.ToStringView(), location.begin.column)); + node->set_start_location(location.start()); return node; } @@ -215,18 +200,15 @@ class BisonParser { // 'node'. 
template ASTNodeType* WithEndLocation(ASTNodeType* node, const Location& location) { - node->set_end_location(ParseLocationPoint::FromByteOffset( - filename_.ToStringView(), location.end.column)); + node->set_end_location(location.end()); return node; } // Sets the location of 'node' to 'location', and returns 'node'. template ASTNodeType* WithLocation(ASTNodeType* node, const Location& location) { - node->set_start_location(ParseLocationPoint::FromByteOffset( - filename_.ToStringView(), location.begin.column)); - node->set_end_location(ParseLocationPoint::FromByteOffset( - filename_.ToStringView(), location.end.column)); + node->set_start_location(location.start()); + node->set_end_location(location.end()); return node; } @@ -254,7 +236,7 @@ class BisonParser { // Returns true if there is whitespace between `left` and `right`. template bool HasWhitespace(const Location& left, const Location& right) { - return left.end.column != right.begin.column; + return left.end().GetByteOffset() != right.start().GetByteOffset(); } template @@ -301,6 +283,7 @@ class BisonParser { BisonParserMode mode, absl::string_view filename, absl::string_view input, int start_byte_offset, IdStringPool* id_string_pool, zetasql_base::UnsafeArena* arena, const LanguageOptions& language_options, + const macros::MacroCatalog* macro_catalog, std::unique_ptr* output, std::vector>* other_allocated_ast_nodes, ASTStatementProperties* ast_statement_properties, diff --git a/zetasql/parser/bison_parser.y b/zetasql/parser/bison_parser.y index 191dd438c..90f5a558e 100644 --- a/zetasql/parser/bison_parser.y +++ b/zetasql/parser/bison_parser.y @@ -24,12 +24,14 @@ // (Do NOT set the --report-file to a path on citc, because then the file will // be truncated at 1MB for some reason.) 
-#include "zetasql/parser/location.hh" +#include + #include "zetasql/parser/bison_parser.h" #include "zetasql/parser/join_processor.h" #include "zetasql/parser/parse_tree.h" #include "zetasql/parser/parser_internal.h" #include "zetasql/parser/statement_properties.h" +#include "zetasql/public/parse_location.h" #include "zetasql/public/strings.h" #include "zetasql/base/case.h" #include "absl/memory/memory.h" @@ -43,12 +45,28 @@ #define YYDEBUG 0 #endif +// Define the handling of our custom ParseLocationRange. +# define YYLLOC_DEFAULT(Cur, Rhs, N) \ +do \ + if (N) \ + { \ + (Cur).set_start(YYRHSLOC(Rhs, 1).start()); \ + (Cur).set_end(YYRHSLOC(Rhs, N).end()); \ + } \ + else \ + { \ + (Cur).set_start(YYRHSLOC(Rhs, 0).end()); \ + (Cur).set_end(YYRHSLOC(Rhs, 0).end()); \ + } \ +while (0) } %defines %skeleton "lalr1.cc" %define parse.error verbose %define api.parser.class {BisonParserImpl} +%define api.location.type {zetasql::ParseLocationRange} + %initial-action { #if YYDEBUG @@ -68,8 +86,8 @@ // Parameters for the parser. The tokenizer gets passed through into the lexer // as well, so it is declared with "%lex-param" too. 
-%lex-param {zetasql::parser::ZetaSqlFlexTokenizer* tokenizer} -%parse-param {zetasql::parser::ZetaSqlFlexTokenizer* tokenizer} +%lex-param {zetasql::parser::DisambiguatorLexer* tokenizer} +%parse-param {zetasql::parser::DisambiguatorLexer* tokenizer} %parse-param {zetasql::parser::BisonParser* parser} %parse-param {zetasql::ASTNode** ast_node_result} %parse-param {zetasql::parser::ASTStatementProperties* @@ -262,6 +280,10 @@ %union { bool boolean; int64_t int64_val; + struct { + const char* str; + size_t len; + } string_view; const char* string_constant; zetasql::TypeKind type_kind; zetasql::ASTFunctionCall::NullHandlingModifier null_handling_modifier; @@ -323,6 +345,10 @@ zetasql::ASTUnpivotClause* unpivot_clause; zetasql::ASTSetOperationType* set_operation_type; zetasql::ASTSetOperationAllOrDistinct* set_operation_all_or_distinct; + zetasql::ASTBytesLiteral* bytes_literal; + zetasql::ASTBytesLiteralComponent* bytes_literal_component; + zetasql::ASTStringLiteral* string_literal; + zetasql::ASTStringLiteralComponent* string_literal_component; struct { zetasql::ASTPivotClause* pivot_clause; zetasql::ASTUnpivotClause* unpivot_clause; @@ -373,6 +399,13 @@ zetasql::ASTQuery* query; zetasql::ASTPathExpression* replica_source; } query_or_replica_source_info; + struct { + zetasql::ASTNode* hint; + } group_by_preamble; + zetasql::ASTStructBracedConstructor* struct_braced_constructor; + zetasql::ASTBracedConstructor* braced_constructor; + zetasql::ASTBracedConstructorField* braced_constructor_field; + zetasql::ASTBracedConstructorFieldValue* braced_constructor_field_value; } // YYEOF is a special token used to indicate the end of the input. It's alias // defaults to "end of file", but "end of input" is more appropriate for us. 
@@ -391,7 +424,8 @@ %token BYTES_LITERAL "bytes literal" %token INTEGER_LITERAL "integer literal" %token FLOATING_POINT_LITERAL "floating point literal" -%token IDENTIFIER "identifier" +%token IDENTIFIER "identifier" +%token BACKSLASH // Only for lenient macro expansion // Script labels. This is set apart from IDENTIFIER for two reasons: // - Identifiers should still be disallowed at statement beginnings in all @@ -435,8 +469,7 @@ %token '/' "/" %token '~' "~" %token '.' "." -%token KW_DOT_STAR ".*" -%token KW_OPEN_HINT "@{" +%token KW_OPEN_HINT "@ for hint" %token '}' "}" %token '?' "?" %token KW_OPEN_INTEGER_HINT "@n" @@ -528,6 +561,7 @@ using namespace zetasql::parser_internal; %token KW_CROSS "CROSS" %token KW_CURRENT "CURRENT" %token KW_DEFAULT "DEFAULT" +%token KW_DEFINE_FOR_MACROS "DEFINE for macros" %token KW_DEFINE "DEFINE" %token KW_DESC "DESC" %token KW_DISTINCT "DISTINCT" @@ -541,7 +575,6 @@ using namespace zetasql::parser_internal; %token KW_FOLLOWING "FOLLOWING" %token KW_FROM "FROM" %token KW_FULL "FULL" -%token KW_FULL_IN_SET_OP %token KW_GROUP "GROUP" %token KW_GROUPING "GROUPING" %token KW_HASH "HASH" @@ -622,6 +655,7 @@ using namespace zetasql::parser_internal; %token KW_WITH_STARTING_WITH_EXPRESSION "WITH starting with expression" %token KW_EXCEPT_IN_SET_OP "EXCEPT in set operation" +%token KW_FULL_IN_SET_OP // This is a different token because using KW_NOT for BETWEEN/IN/LIKE would // confuse the operator precedence parsing. Boolean NOT has a different // precedence than NOT BETWEEN/IN/LIKE. @@ -679,6 +713,7 @@ using namespace zetasql::parser_internal; %token KW_DEFINER "DEFINER" %token KW_DELETE "DELETE" %token KW_DELETION "DELETION" +%token KW_DEPTH "DEPTH" %token KW_DESCRIBE "DESCRIBE" %token KW_DESCRIPTOR "DESCRIPTOR" %token KW_DETERMINISTIC "DETERMINISTIC" @@ -841,6 +876,7 @@ using namespace zetasql::parser_internal; // enumerate all token kinds to implement the macro body rule. 
%token MACRO_BODY_TOKEN +%token CUSTOM_MODE_START // Gets disambiguated to one of the modes below. %token MODE_STATEMENT %token MODE_SCRIPT %token MODE_NEXT_STATEMENT @@ -871,7 +907,8 @@ using namespace zetasql::parser_internal; %type maybe_dashed_path_expression_with_scope %type bignumeric_literal %type boolean_literal -%type bytes_literal +%type bytes_literal +%type bytes_literal_component %type call_statement %type call_statement_with_args_prefix %type case_expression @@ -967,7 +1004,7 @@ using namespace zetasql::parser_internal; %type grantee_list_with_parens_prefix %type group_by_clause_prefix %type group_by_all -%type group_by_preamble +%type group_by_preamble %type grouping_item %type grouping_set %type grouping_set_list @@ -1032,6 +1069,7 @@ using namespace zetasql::parser_internal; %type insert_values_row %type insert_values_row_prefix %type int_literal_or_parameter +%type opt_int_literal_or_parameter %type integer_literal %type join %type join_input @@ -1054,13 +1092,14 @@ using namespace zetasql::parser_internal; %type new_constructor_arg %type new_constructor_prefix %type new_constructor_prefix_no_arg -%type braced_constructor_field_value -%type braced_constructor_field -%type braced_constructor_extension -%type braced_constructor_start -%type braced_constructor_prefix -%type braced_constructor +%type braced_constructor_field_value +%type braced_constructor_field +%type braced_constructor_extension +%type braced_constructor_start +%type braced_constructor_prefix +%type braced_constructor %type braced_new_constructor +%type struct_braced_constructor %type next_statement %type next_script_statement %type null_literal @@ -1255,6 +1294,7 @@ using namespace zetasql::parser_internal; %type show_target %type signed_numerical_literal %type simple_column_schema_inner +%type string_literal_component %type sql_function_body %type star_except_list %type star_except_list_prefix @@ -1266,9 +1306,8 @@ using namespace zetasql::parser_internal; %type sql_statement_body 
%type statement_list %type script -%type non_empty_statement_list %type unterminated_non_empty_statement_list -%type string_literal +%type string_literal %type string_literal_or_parameter %type struct_constructor %type struct_constructor_arg @@ -1298,7 +1337,6 @@ using namespace zetasql::parser_internal; %type table_primary %type table_subquery %type templated_parameter_type -%type terminated_statement %type transaction_mode %type transaction_mode_list %type truncate_statement @@ -1337,6 +1375,9 @@ using namespace zetasql::parser_internal; %type with_connection_clause %type aliased_query %type aliased_query_list +%type aliased_query_modifiers +%type possibly_unbounded_int_literal_or_parameter +%type recursion_depth_modifier %type opt_with_connection_clause %type alter_action_list %type alter_action @@ -1371,6 +1412,7 @@ using namespace zetasql::parser_internal; %type opt_index_type %type opt_access %type opt_aggregate +%type asc_or_desc %type opt_asc_or_desc %type opt_filter %type opt_if_exists @@ -1484,8 +1526,7 @@ next_script_statement: unterminated_statement ";" { // The semicolon marks the end of the statement. - tokenizer->SetForceTerminate(); - *statement_end_byte_offset = @2.end.column; + SetForceTerminate(tokenizer, statement_end_byte_offset); $$ = $1; } | unterminated_statement @@ -1500,8 +1541,7 @@ next_statement: unterminated_sql_statement ";" { // The semicolon marks the end of the statement. 
- tokenizer->SetForceTerminate(); - *statement_end_byte_offset = @2.end.column; + SetForceTerminate(tokenizer, statement_end_byte_offset); $$ = $1; } | unterminated_sql_statement @@ -1540,13 +1580,6 @@ unterminated_script_statement: | raise_statement ; -terminated_statement: - unterminated_statement ";" - { - $$ = $1; - } - ; - sql_statement_body: query_statement | alter_statement @@ -1602,20 +1635,34 @@ sql_statement_body: ; define_macro_statement: - "DEFINE" "MACRO" identifier[name] + // Use a special version of KW_DEFINE which indicates that this macro + // definition was "original", not expanded from other macros. + "DEFINE for macros" "MACRO" identifier[name] { if (!parser->language_options().LanguageFeatureEnabled( zetasql::FEATURE_V_1_4_SQL_MACROS)) { YYERROR_AND_ABORT_AT(@2, "Macros are not supported"); } - tokenizer->PushBisonParserMode( + PushBisonParserMode(tokenizer, zetasql::parser::BisonParserMode::kMacroBody); } macro_body[tokens] { - tokenizer->PopBisonParserMode(); + PopBisonParserMode(tokenizer); $$ = MAKE_NODE(ASTDefineMacroStatement, @$, {$name, $tokens}); } + | "DEFINE" "MACRO"[kw_macro] + { + if (!parser->language_options().LanguageFeatureEnabled( + zetasql::FEATURE_V_1_4_SQL_MACROS)) { + YYERROR_AND_ABORT_AT(@2, "Macros are not supported"); + } + // Rule to capture wrong usage, where DEFINE MACRO is resulting from + // expanding other macros, instead of being original user input. 
+ YYERROR_AND_ABORT_AT(@kw_macro, + "Syntax error: DEFINE MACRO statements cannot be " + "composed from other expansions."); + } ; macro_body: @@ -1946,6 +1993,8 @@ schema_object_kind: $$ = zetasql::SchemaObjectKind::kExternalTable; } } + | "EXTERNAL" "SCHEMA" + { $$ = zetasql::SchemaObjectKind::kExternalSchema; } | "FUNCTION" { $$ = zetasql::SchemaObjectKind::kFunction; } | "INDEX" @@ -1987,6 +2036,8 @@ alter_statement: node = MAKE_NODE(ASTAlterDatabaseStatement, @$); } else if ($2 == zetasql::SchemaObjectKind::kSchema) { node = MAKE_NODE(ASTAlterSchemaStatement, @$); + } else if ($2 == zetasql::SchemaObjectKind::kExternalSchema) { + node = MAKE_NODE(ASTAlterExternalSchemaStatement, @$); } else if ($2 == zetasql::SchemaObjectKind::kView) { node = MAKE_NODE(ASTAlterViewStatement, @$); } else if ($2 == zetasql::SchemaObjectKind::kMaterializedView) { @@ -2288,7 +2339,7 @@ unordered_options_body: parser->AddWarning(parser->GenerateWarning( "The preferred style places the OPTIONS clause before the " "function body.", - (@options).begin.column)); + (@options).start().GetByteOffset())); } $$.options = $options; $$.body = $body; @@ -2931,13 +2982,13 @@ create_external_schema_statement: undrop_statement: "UNDROP" schema_object_kind opt_if_not_exists path_expression - opt_at_system_time + opt_at_system_time opt_options_list { if ($schema_object_kind != zetasql::SchemaObjectKind::kSchema) { YYERROR_AND_ABORT_AT(@schema_object_kind, absl::StrCat("UNDROP ", absl::AsciiStrToUpper( parser->GetInputText(@schema_object_kind)), " is not supported")); } - auto* undrop = MAKE_NODE(ASTUndropStatement, @$, {$path_expression, $opt_at_system_time}); + auto* undrop = MAKE_NODE(ASTUndropStatement, @$, {$path_expression, $opt_at_system_time, $opt_options_list}); undrop->set_schema_object_kind($schema_object_kind); undrop->set_is_if_not_exists($opt_if_not_exists); $$ = undrop; @@ -4101,14 +4152,15 @@ create_view_statement: } ; query_or_replica_source: - query + query[q] { - $$.query = $1; + 
$$ = {.query = $q, .replica_source = nullptr }; } | - "REPLICA" "OF" maybe_dashed_path_expression + "REPLICA" "OF" maybe_dashed_path_expression[path] { - $$.replica_source = static_cast($3); + $$ = {.query = nullptr, + .replica_source = static_cast($path)}; } ; @@ -4826,13 +4878,13 @@ hint_entry: ; hint_with_body_prefix: - KW_OPEN_INTEGER_HINT integer_literal "@{" hint_entry + KW_OPEN_INTEGER_HINT integer_literal KW_OPEN_HINT "{" hint_entry[entry] { - $$ = MAKE_NODE(ASTHint, @$, {$2, $4}); + $$ = MAKE_NODE(ASTHint, @$, {$2, $entry}); } - | "@{" hint_entry + | KW_OPEN_HINT "{" hint_entry[entry] { - $$ = MAKE_NODE(ASTHint, @$, {$2}); + $$ = MAKE_NODE(ASTHint, @$, {$entry}); } | hint_with_body_prefix "," hint_entry { @@ -4954,15 +5006,15 @@ select_column: auto* alias = MAKE_NODE(ASTAlias, @2, {$2}); $$ = MAKE_NODE(ASTSelectColumn, @$, {$1, alias}); } - | expression ".*" + | expression[expr] "." "*" { - auto* dot_star = MAKE_NODE(ASTDotStar, @1, @2, {$1}); + auto* dot_star = MAKE_NODE(ASTDotStar, @$, {$expr}); $$ = MAKE_NODE(ASTSelectColumn, @$, {dot_star}); } - | expression ".*" star_modifiers + | expression "." 
"*" star_modifiers[modifiers] { auto* dot_star_with_modifiers = - MAKE_NODE(ASTDotStarWithModifiers, @1, @3, {$1, $3}); + MAKE_NODE(ASTDotStarWithModifiers, @$, {$1, $modifiers}); $$ = MAKE_NODE(ASTSelectColumn, @$, {dot_star_with_modifiers}); } | "*" @@ -5019,6 +5071,10 @@ opt_natural: opt_outer: "OUTER" | %empty ; +opt_int_literal_or_parameter: + int_literal_or_parameter + | %empty { $$ = nullptr; } + int_literal_or_parameter: integer_literal | parameter_expression @@ -5257,15 +5313,13 @@ opt_pivot_or_unpivot_clause_and_alias: | "AS" identifier pivot_clause opt_as_alias { $$.alias = MAKE_NODE(ASTAlias, @1, {$2}); $$.alias = parser->WithEndLocation($$.alias, @2); - $$.pivot_clause = WithExtraChildren($3, - {static_cast($4)}); + $$.pivot_clause = WithExtraChildren($3, {$4}); $$.unpivot_clause = nullptr; } | "AS" identifier unpivot_clause opt_as_alias { $$.alias = MAKE_NODE(ASTAlias, @1, {$2}); $$.alias = parser->WithEndLocation($$.alias, @2); - $$.unpivot_clause = WithExtraChildren($3, - {static_cast($4)}); + $$.unpivot_clause = WithExtraChildren($3, {$4}); $$.pivot_clause = nullptr; } | "AS" identifier qualify_clause_nonreserved { @@ -5276,14 +5330,12 @@ opt_pivot_or_unpivot_clause_and_alias: } | identifier pivot_clause opt_as_alias { $$.alias = MAKE_NODE(ASTAlias, @1, {$1}); - $$.pivot_clause = WithExtraChildren($2, - {static_cast($3)}); + $$.pivot_clause = WithExtraChildren($2, {$3}); $$.unpivot_clause = nullptr; } | identifier unpivot_clause opt_as_alias { $$.alias = MAKE_NODE(ASTAlias, @1, {$1}); - $$.unpivot_clause = WithExtraChildren($2, - {static_cast($3)}); + $$.unpivot_clause = WithExtraChildren($2, {$3}); $$.pivot_clause = nullptr; } | identifier qualify_clause_nonreserved { @@ -5294,14 +5346,12 @@ opt_pivot_or_unpivot_clause_and_alias: } | pivot_clause opt_as_alias { $$.alias = nullptr; - $$.pivot_clause = WithExtraChildren($1, - {static_cast($2)}); + $$.pivot_clause = WithExtraChildren($1, {$2}); $$.unpivot_clause = nullptr; } | unpivot_clause 
opt_as_alias { $$.alias = nullptr; - $$.unpivot_clause = WithExtraChildren($1, - {static_cast($2)}); + $$.unpivot_clause = WithExtraChildren($1, {$2}); $$.pivot_clause = nullptr; } | qualify_clause_nonreserved { @@ -5963,16 +6013,18 @@ grouping_item: ; group_by_preamble: - "GROUP" opt_hint "BY" + "GROUP" opt_hint + "BY" { - $$ = $opt_hint; + $$.hint = $opt_hint; } ; group_by_clause_prefix: - group_by_preamble[hint] grouping_item[item] + group_by_preamble[preamble] grouping_item[item] { - $$ = MAKE_NODE(ASTGroupBy, @$, {$hint, $item}); + auto* node = MAKE_NODE(ASTGroupBy, @$, {$preamble.hint, $item}); + $$ = node; } | group_by_clause_prefix[prefix] "," grouping_item[item] { @@ -5981,10 +6033,11 @@ group_by_clause_prefix: ; group_by_all: - group_by_preamble[hint] KW_ALL[all] + group_by_preamble[preamble] KW_ALL[all] { auto* group_by_all = MAKE_NODE(ASTGroupByAll, @all, {}); - $$ = MAKE_NODE(ASTGroupBy, @$, {$hint, group_by_all}); + auto* node = MAKE_NODE(ASTGroupBy, @$, {$preamble.hint, group_by_all}); + $$ = node; } ; @@ -6068,12 +6121,12 @@ qualify_clause_nonreserved: ; limit_offset_clause: - "LIMIT" possibly_cast_int_literal_or_parameter - "OFFSET" possibly_cast_int_literal_or_parameter + "LIMIT" expression + "OFFSET" expression { $$ = MAKE_NODE(ASTLimitOffset, @$, {$2, $4}); } - | "LIMIT" possibly_cast_int_literal_or_parameter + | "LIMIT" expression { $$ = MAKE_NODE(ASTLimitOffset, @$, {$2}); } @@ -6138,10 +6191,57 @@ opt_null_handling_modifier: } ; +possibly_unbounded_int_literal_or_parameter: + int_literal_or_parameter { $$ = MAKE_NODE(ASTIntOrUnbounded, @$, {$1}); } + | "UNBOUNDED" { $$ = MAKE_NODE(ASTIntOrUnbounded, @$, {}); } + ; + +recursion_depth_modifier: + "WITH" "DEPTH" opt_as_alias_with_required_as[alias] + { + auto empty_location = LocationFromOffset(@alias.end()); + + // By default, they're unbounded when unspecified. 
+ auto* lower_bound = MAKE_NODE(ASTIntOrUnbounded, empty_location, {}); + auto* upper_bound = MAKE_NODE(ASTIntOrUnbounded, empty_location, {}); + $$ = MAKE_NODE(ASTRecursionDepthModifier, @$, + {$alias, lower_bound, upper_bound}); + } + // TODO: Clean up BETWEEN ... AND ... syntax once + // we move to TextMapper. + | "WITH" "DEPTH" opt_as_alias_with_required_as[alias] + "BETWEEN" possibly_unbounded_int_literal_or_parameter[lower_bound] + "AND for BETWEEN" possibly_unbounded_int_literal_or_parameter[upper_bound] + { + $$ = MAKE_NODE(ASTRecursionDepthModifier, @$, + {$alias, $lower_bound, $upper_bound}); + } + | "WITH" "DEPTH" opt_as_alias_with_required_as[alias] + "MAX" possibly_unbounded_int_literal_or_parameter[upper_bound] + { + auto empty_location = LocationFromOffset(@alias.end()); + + // Lower bound is unspecified in this case. + auto* lower_bound = MAKE_NODE(ASTIntOrUnbounded, empty_location, {}); + $$ = MAKE_NODE(ASTRecursionDepthModifier, @$, + {$alias, lower_bound, $upper_bound}); + } + ; + +aliased_query_modifiers: + recursion_depth_modifier + { + $$ = MAKE_NODE(ASTAliasedQueryModifiers, @$, + {$recursion_depth_modifier}); + } + | %empty { $$ = nullptr; } + ; + aliased_query: identifier "AS" parenthesized_query[query] + aliased_query_modifiers[modifiers] { - $$ = MAKE_NODE(ASTAliasedQuery, @$, {$1, $query}); + $$ = MAKE_NODE(ASTAliasedQuery, @$, {$1, $query, $modifiers}); } ; @@ -6190,10 +6290,13 @@ with_clause_with_trailing_comma: } ; -// Returns true for DESC, false for ASC (which is the default). 
-opt_asc_or_desc: +asc_or_desc: "ASC" { $$ = zetasql::ASTOrderingExpression::ASC; } | "DESC" { $$ = zetasql::ASTOrderingExpression::DESC; } + ; + +opt_asc_or_desc: + asc_or_desc { $$ = $1; } | %empty { $$ = zetasql::ASTOrderingExpression::UNSPECIFIED; } ; @@ -6214,7 +6317,7 @@ opt_null_order: ; string_literal_or_parameter: - string_literal + string_literal { $$ = $string_literal; } | parameter_expression | system_variable_expression; @@ -6585,8 +6688,8 @@ expression_maybe_parenthesized: expression_not_parenthesized: null_literal | boolean_literal - | string_literal - | bytes_literal + | string_literal { $$ = $string_literal; } + | bytes_literal { $$ = $bytes_literal; } | integer_literal | numeric_literal | bignumeric_literal @@ -6598,8 +6701,9 @@ expression_not_parenthesized: | system_variable_expression | array_constructor | new_constructor - | braced_constructor + | braced_constructor[ctor] { $$ = $ctor; } | braced_new_constructor + | struct_braced_constructor | case_expression | cast_expression | extract_expression @@ -7623,12 +7727,9 @@ raw_type: type_parameter: integer_literal | boolean_literal - | string_literal - | bytes_literal - | floating_point_literal - { - $$ = $1; - } + | string_literal { $$ = $string_literal; } + | bytes_literal { $$ = $bytes_literal; } + | floating_point_literal { $$ = $floating_point_literal; } | "MAX" { $$ = MAKE_NODE(ASTMaxLiteral, @1, {}); @@ -7755,6 +7856,7 @@ braced_constructor_field_value: ":" expression { $$ = MAKE_NODE(ASTBracedConstructorFieldValue, @$, {$2}); + $$->set_colon_prefixed(true); } | braced_constructor { @@ -7803,6 +7905,7 @@ braced_constructor_prefix: | braced_constructor_prefix "," braced_constructor_field { $$ = WithExtraChildren($1, {$3}); + $3->set_comma_separated(true); } | braced_constructor_prefix braced_constructor_field { @@ -7821,6 +7924,7 @@ braced_constructor_prefix: | braced_constructor_prefix "," braced_constructor_extension { $$ = WithExtraChildren($1, {$3}); + $3->set_comma_separated(true); } ; 
@@ -7847,6 +7951,17 @@ braced_new_constructor: } ; +struct_braced_constructor: + struct_type[type] braced_constructor[ctor] + { + $$ = MAKE_NODE(ASTStructBracedConstructor, @$, {$type, $ctor}); + } + | "STRUCT" braced_constructor[ctor] + { + $$ = MAKE_NODE(ASTStructBracedConstructor, @$, {$ctor}); + } + ; + case_no_value_expression_prefix: "CASE" "WHEN" expression "THEN" expression { @@ -8524,7 +8639,7 @@ boolean_literal: } ; -string_literal: +string_literal_component: STRING_LITERAL { const absl::string_view input_text = parser->GetInputText(@1); @@ -8535,7 +8650,7 @@ string_literal: input_text, &str, &error_string, &error_offset); if (!parse_status.ok()) { auto location = @1; - location.begin.column += error_offset; + location.mutable_start().IncrementByteOffset(error_offset); if (!error_string.empty()) { YYERROR_AND_ABORT_AT(location, absl::StrCat("Syntax error: ", error_string)); @@ -8546,7 +8661,7 @@ string_literal: parse_status.message())); } - auto* literal = MAKE_NODE(ASTStringLiteral, @1); + auto* literal = MAKE_NODE(ASTStringLiteralComponent, @1); literal->set_string_value(std::move(str)); // TODO: Migrate to absl::string_view or avoid having to // set this at all if the client isn't interested. @@ -8555,7 +8670,35 @@ string_literal: } ; -bytes_literal: +// Can be a concatenation of multiple string literals +string_literal: + string_literal_component[component] + { + $$ = MAKE_NODE(ASTStringLiteral, @$, {$component}); + $$->set_string_value($component->string_value()); + } + | string_literal[list] string_literal_component[component] + { + if (@component.start().GetByteOffset() == @list.end().GetByteOffset()) { + YYERROR_AND_ABORT_AT(@2, "Syntax error: concatenated string literals must be separated by whitespace or comments"); + } + + $$ = WithExtraChildren($list, {$component}); + // TODO: append the value in place, instead of StrCat() + // then set(). 
+ $$->set_string_value( + absl::StrCat($$->string_value(), $component->string_value())); + } + | string_literal bytes_literal_component[component] + { + // Capture this case to provide a better error message + YYERROR_AND_ABORT_AT( + @component, + "Syntax error: string and bytes literals cannot be concatenated."); + } + ; + +bytes_literal_component: BYTES_LITERAL { const absl::string_view input_text = parser->GetInputText(@1); @@ -8566,7 +8709,7 @@ input_text, &bytes, &error_string, &error_offset); if (!parse_status.ok()) { auto location = @1; - location.begin.column += error_offset; + location.mutable_start().IncrementByteOffset(error_offset); if (!error_string.empty()) { YYERROR_AND_ABORT_AT(location, absl::StrCat("Syntax error: ", error_string)); @@ -8580,7 +8723,7 @@ // The identifier is parsed *again* in the resolver. The output of the // parser maintains the original image. // TODO: Fix this wasted work when the JavaCC parser is gone. - auto* literal = MAKE_NODE(ASTBytesLiteral, @1); + auto* literal = MAKE_NODE(ASTBytesLiteralComponent, @1); literal->set_bytes_value(std::move(bytes)); // TODO: Migrate to absl::string_view or avoid having to // set this at all if the client isn't interested. @@ -8589,6 +8732,35 @@ } ; + +// Can be a concatenation of multiple bytes literals +bytes_literal: + bytes_literal_component[component] + { + $$ = MAKE_NODE(ASTBytesLiteral, @$, {$component}); + $$->set_bytes_value($component->bytes_value()); + } + | bytes_literal[list] bytes_literal_component[component] + { + if (@component.start().GetByteOffset() == @list.end().GetByteOffset()) { + YYERROR_AND_ABORT_AT( + @2, + "Syntax error: concatenated bytes literals must be separated by whitespace or comments"); + } + + $$ = WithExtraChildren($list, {$component}); + // TODO: append the value in place, instead of StrCat() + // then set().
+ $$->set_bytes_value( + absl::StrCat($$->bytes_value(), $2->bytes_value())); + } + | bytes_literal string_literal_component[component] + { + // Capture this case to provide a better error message + YYERROR_AND_ABORT_AT(@component, "Syntax error: string and bytes literals cannot be concatenated."); + } + ; + integer_literal: INTEGER_LITERAL { @@ -8604,11 +8776,9 @@ numeric_literal_prefix: ; numeric_literal: - numeric_literal_prefix STRING_LITERAL + numeric_literal_prefix string_literal { - auto* literal = MAKE_NODE(ASTNumericLiteral, @$); - literal->set_image(std::string(parser->GetInputText(@2))); - $$ = literal; + $$ = MAKE_NODE(ASTNumericLiteral, @$, {$2}); } ; @@ -8618,20 +8788,16 @@ bignumeric_literal_prefix: ; bignumeric_literal: - bignumeric_literal_prefix STRING_LITERAL + bignumeric_literal_prefix string_literal { - auto* literal = MAKE_NODE(ASTBigNumericLiteral, @$); - literal->set_image(std::string(parser->GetInputText(@2))); - $$ = literal; + $$ = MAKE_NODE(ASTBigNumericLiteral, @$, {$2}); } ; json_literal: - "JSON" STRING_LITERAL + "JSON" string_literal { - auto* literal = MAKE_NODE(ASTJSONLiteral, @$); - literal->set_image(std::string(parser->GetInputText(@2))); - $$ = literal; + $$ = MAKE_NODE(ASTJSONLiteral, @$, {$2}); } ; @@ -8647,7 +8813,7 @@ floating_point_literal: identifier: IDENTIFIER { - const absl::string_view identifier_text = parser->GetInputText(@1); + const absl::string_view identifier_text($1.str, $1.len); // The tokenizer rule already validates that the identifier is valid, // except for backquoted identifiers. 
if (identifier_text[0] == '`') { @@ -8659,7 +8825,7 @@ identifier: identifier_text, &str, &error_string, &error_offset); if (!parse_status.ok()) { auto location = @1; - location.begin.column += error_offset; + location.mutable_start().IncrementByteOffset(error_offset); if (!error_string.empty()) { YYERROR_AND_ABORT_AT(location, absl::StrCat("Syntax error: ", @@ -8696,7 +8862,7 @@ label: label_text, &str, &error_string, &error_offset); if (!parse_status.ok()) { auto location = @1; - location.begin.column += error_offset; + location.mutable_start().IncrementByteOffset(error_offset); if (!error_string.empty()) { YYERROR_AND_ABORT_AT(location, absl::StrCat("Syntax error: ", @@ -8772,6 +8938,7 @@ keyword_as_identifier: | "DEFINER" | "DELETE" | "DELETION" + | "DEPTH" | "DESCRIBE" | "DETERMINISTIC" | "DO" @@ -8854,7 +9021,7 @@ keyword_as_identifier: // we have the engine-specific root URI to use. parser->AddWarning(parser->GenerateWarningForFutureKeywordReservation( zetasql::parser::kQualify, - (@1).begin.column)); + (@1).start().GetByteOffset())); } | "RAISE" | "READ" @@ -9203,18 +9370,18 @@ insert_statement_prefix: if (element->node_kind() != zetasql::AST_PATH_EXPRESSION) { if (element->node_kind() == zetasql::AST_DEFAULT_LITERAL) { YYERROR_AND_ABORT_AT( - parser->GetBisonLocation(element->GetParseLocationRange()), + element->GetParseLocationRange(), "Syntax error: Expected column name, got keyword DEFAULT"); } YYERROR_AND_ABORT_AT( - parser->GetBisonLocation(element->GetParseLocationRange()), + element->GetParseLocationRange(), "Syntax error: Expected column name"); } auto* path_expression = element->GetAsOrDie(); if (path_expression->num_children() != 1) { YYERROR_AND_ABORT_AT( - parser->GetBisonLocation(element->GetParseLocationRange()), + element->GetParseLocationRange(), "Syntax error: Expected column name"); } column_list->AddChild(path_expression->mutable_child(0)); @@ -9225,8 +9392,7 @@ insert_statement_prefix: // first list for being correct as a column list, 
because we assume // that the user intended it as a VALUES list. YYERROR_AND_ABORT_AT( - parser->GetBisonLocation( - row_list->child(1)->GetParseLocationRange()), + row_list->child(1)->GetParseLocationRange(), "Syntax error: Unexpected multiple column lists"); } insert->AddChild(column_list); @@ -9880,7 +10046,6 @@ drop_statement: } ; - index_type: KW_SEARCH { $$ = IndexTypeKeywords::kSearch; } @@ -9891,24 +10056,14 @@ opt_index_type: index_type | %empty { $$ = IndexTypeKeywords::kNone; }; -non_empty_statement_list: - terminated_statement - { - $$ = MAKE_NODE(ASTStatementList, @$, {$1}); - } - | non_empty_statement_list terminated_statement - { - $$ = parser->WithEndLocation(WithExtraChildren($1, {$2}), @$); - }; - unterminated_non_empty_statement_list: - unterminated_statement + unterminated_statement[stmt] { - $$ = MAKE_NODE(ASTStatementList, @$, {$1}); + $$ = MAKE_NODE(ASTStatementList, @$, {$stmt}); } - | non_empty_statement_list unterminated_statement + | unterminated_non_empty_statement_list ';' unterminated_statement[new_stmt] { - $$ = parser->WithEndLocation(WithExtraChildren($1, {$2}), @$); + $$ = parser->WithEndLocation(WithExtraChildren($1, {$new_stmt}), @$); }; opt_execute_into_clause: @@ -9967,15 +10122,15 @@ execute_immediate: ; script: - non_empty_statement_list + unterminated_non_empty_statement_list { $1->set_variable_declarations_allowed(true); $$ = MAKE_NODE(ASTScript, @$, {$1}); } - | unterminated_non_empty_statement_list + | unterminated_non_empty_statement_list ';' { $1->set_variable_declarations_allowed(true); - $$ = MAKE_NODE(ASTScript, @$, {$1}); + $$ = MAKE_NODE(ASTScript, @$, {parser->WithEndLocation($1, @$)}); } | %empty { @@ -9987,9 +10142,9 @@ script: ; statement_list: - non_empty_statement_list + unterminated_non_empty_statement_list ';' { - $$ = $1; + $$ = parser->WithEndLocation($1, @$); } | %empty { @@ -10342,7 +10497,7 @@ next_statement_kind: // The parser will complain about the remainder of the input if we let // the tokenizer 
continue to produce tokens, because we don't have any // grammar for the rest of the input. - tokenizer->SetForceTerminate(); + SetForceTerminate(tokenizer, /*end_byte_offset=*/nullptr); $$ = $2; } ; @@ -10382,7 +10537,7 @@ next_statement_kind_without_hint: | next_statement_kind_parenthesized_select | "DEFINE" "TABLE" { $$ = zetasql::ASTDefineTableStatement::kConcreteNodeKind; } - | "DEFINE" "MACRO" + | "DEFINE for macros" "MACRO" { $$ = zetasql::ASTDefineMacroStatement::kConcreteNodeKind; } | "EXECUTE" "IMMEDIATE" { $$ = zetasql::ASTExecuteImmediateStatement::kConcreteNodeKind; } @@ -10444,6 +10599,7 @@ next_statement_kind_without_hint: } } | "GRANT" { $$ = zetasql::ASTGrantStatement::kConcreteNodeKind; } + | "GRAPH" { $$ = zetasql::ASTQueryStatement::kConcreteNodeKind; } | "REVOKE" { $$ = zetasql::ASTRevokeStatement::kConcreteNodeKind; } | "RENAME" { $$ = zetasql::ASTRenameStatement::kConcreteNodeKind; } | "START" { $$ = zetasql::ASTBeginStatement::kConcreteNodeKind; } @@ -10471,6 +10627,8 @@ next_statement_kind_without_hint: { $$ = zetasql::ASTAlterDatabaseStatement::kConcreteNodeKind; } | "ALTER" "SCHEMA" { $$ = zetasql::ASTAlterSchemaStatement::kConcreteNodeKind; } + | "ALTER" "EXTERNAL" "SCHEMA" + { $$ = zetasql::ASTAlterExternalSchemaStatement::kConcreteNodeKind; } | "ALTER" "TABLE" { $$ = zetasql::ASTAlterTableStatement::kConcreteNodeKind; } | "ALTER" "PRIVILEGE" @@ -10751,9 +10909,8 @@ spanner_set_on_delete_action: %% void zetasql_bison_parser::BisonParserImpl::error( - const zetasql_bison_parser::location& loc, + const zetasql::ParseLocationRange& loc, const std::string& msg) { *error_message = msg; - *error_location = zetasql::ParseLocationPoint::FromByteOffset( - parser->filename().ToStringView(), loc.begin.column); + *error_location = loc.start(); } diff --git a/zetasql/parser/deidentify.cc b/zetasql/parser/deidentify.cc index 60c1f8642..2dada2bd0 100644 --- a/zetasql/parser/deidentify.cc +++ b/zetasql/parser/deidentify.cc @@ -19,21 +19,26 @@ #include 
#include #include +#include #include -#include #include +#include "zetasql/parser/ast_node_kind.h" #include "zetasql/parser/parse_tree.h" #include "zetasql/parser/parser.h" #include "zetasql/parser/unparser.h" #include "zetasql/public/builtin_function_options.h" #include "zetasql/public/catalog.h" #include "zetasql/public/language_options.h" +#include "zetasql/public/options.pb.h" #include "zetasql/public/parse_resume_location.h" #include "zetasql/public/simple_catalog.h" #include "zetasql/public/strings.h" #include "zetasql/public/table_valued_function.h" +#include "zetasql/public/type.pb.h" #include "zetasql/public/types/type.h" +#include "absl/container/flat_hash_map.h" +#include "absl/container/flat_hash_set.h" #include "absl/status/statusor.h" #include "absl/strings/str_cat.h" #include "absl/strings/string_view.h" @@ -43,6 +48,18 @@ namespace zetasql { namespace parser { namespace { +// Returns the images of components in this literal concatenation, separated by +// a single space. +static std::string ComposeImageForLiteralConcatenation( + const ASTStringLiteral* string_literal) { + std::string composed_image; + for (const auto& component : string_literal->components()) { + absl::StrAppend(&composed_image, component->image(), " "); + } + composed_image.pop_back(); // remove trailing space + return composed_image; +} + // Deidentify SQL by replacing identifiers and literals. // // Unfortunately, we cannot replace all literals just with ? 
parameters as that @@ -57,48 +74,187 @@ namespace { class DeidentifyingUnparser : public Unparser { public: DeidentifyingUnparser(Catalog& catalog, - const LanguageOptions& language_options) + const LanguageOptions& language_options, + std::set deidentified_ast_node_kinds, + std::set remapped_ast_node_kinds) : zetasql::parser::Unparser(&unparsed_output_), catalog_(catalog), - language_options_(language_options) {} + language_options_(language_options), + deidentified_ast_node_kinds_(deidentified_ast_node_kinds), + remapped_ast_node_kinds_(remapped_ast_node_kinds) {} absl::string_view GetUnparsedOutput() const { return unparsed_output_; } + absl::flat_hash_map GetRemappedIdentifiers() const { + return remapped_identifiers_; + } void ResetOutput() { unparsed_output_.clear(); } private: void visitASTIntLiteral(const ASTIntLiteral* node, void* data) override { - formatter_.Format("0"); + if (deidentified_ast_node_kinds_.find(node->node_kind()) != + deidentified_ast_node_kinds_.end()) { + formatter_.Format("0"); + return; + } + if (remapped_ast_node_kinds_.find(node->node_kind()) == + remapped_ast_node_kinds_.end()) { + formatter_.Format(node->image()); + return; + } + absl::string_view literal = node->image(); + if (ShouldRemapLiteral(literal)) { + formatter_.Format(RemapLiteral(TypeKind::TYPE_INT32, literal)); + } else { + Unparser::visitASTIntLiteral(node, data); + } } void visitASTNumericLiteral(const ASTNumericLiteral* node, void* data) override { - formatter_.Format("0"); + if (deidentified_ast_node_kinds_.find(node->node_kind()) != + deidentified_ast_node_kinds_.end()) { + formatter_.Format("0"); + return; + } + + std::string composed_image = + ComposeImageForLiteralConcatenation(node->string_literal()); + + if (remapped_ast_node_kinds_.find(node->node_kind()) == + remapped_ast_node_kinds_.end()) { + formatter_.Format(absl::StrCat("NUMERIC ", composed_image)); + return; + } + + if (ShouldRemapLiteral(composed_image)) { + 
formatter_.Format(RemapLiteral(TypeKind::TYPE_NUMERIC, composed_image)); + } else { + Unparser::visitASTNumericLiteral(node, data); + } } void visitASTBigNumericLiteral(const ASTBigNumericLiteral* node, void* data) override { - formatter_.Format("0"); + if (deidentified_ast_node_kinds_.find(node->node_kind()) != + deidentified_ast_node_kinds_.end()) { + formatter_.Format("0"); + return; + } + + std::string composed_image = + ComposeImageForLiteralConcatenation(node->string_literal()); + + if (remapped_ast_node_kinds_.find(node->node_kind()) == + remapped_ast_node_kinds_.end()) { + formatter_.Format(absl::StrCat("BIGNUMERIC ", composed_image)); + return; + } + if (ShouldRemapLiteral(composed_image)) { + formatter_.Format( + RemapLiteral(TypeKind::TYPE_BIGNUMERIC, composed_image)); + } else { + Unparser::visitASTBigNumericLiteral(node, data); + } } void visitASTJSONLiteral(const ASTJSONLiteral* node, void* data) override { - formatter_.Format("JSON '{\"\": null}'"); + if (deidentified_ast_node_kinds_.find(node->node_kind()) != + deidentified_ast_node_kinds_.end()) { + formatter_.Format("JSON '{\"\": null}'"); + return; + } + + std::string composed_image = + ComposeImageForLiteralConcatenation(node->string_literal()); + + if (remapped_ast_node_kinds_.find(node->node_kind()) == + remapped_ast_node_kinds_.end()) { + formatter_.Format(absl::StrCat("JSON ", composed_image)); + return; + } + + if (ShouldRemapLiteral(composed_image)) { + formatter_.Format(RemapLiteral(TypeKind::TYPE_JSON, composed_image)); + } else { + Unparser::visitASTJSONLiteral(node, data); + } } void visitASTFloatLiteral(const ASTFloatLiteral* node, void* data) override { - formatter_.Format("0.0"); + if (deidentified_ast_node_kinds_.find(node->node_kind()) != + deidentified_ast_node_kinds_.end()) { + formatter_.Format("0.0"); + return; + } + if (remapped_ast_node_kinds_.find(node->node_kind()) == + remapped_ast_node_kinds_.end()) { + formatter_.Format(node->image()); + return; + } + absl::string_view 
literal = node->image(); + if (ShouldRemapLiteral(literal)) { + formatter_.Format(RemapLiteral(TypeKind::TYPE_FLOAT, literal)); + } else { + Unparser::visitASTFloatLiteral(node, data); + } } + void visitASTStringLiteral(const ASTStringLiteral* node, void* data) override { - formatter_.Format("\"\""); + if (deidentified_ast_node_kinds_.find(node->node_kind()) != + deidentified_ast_node_kinds_.end()) { + formatter_.Format("\"\""); + return; + } + + std::string composed_image = ComposeImageForLiteralConcatenation(node); + if (remapped_ast_node_kinds_.find(node->node_kind()) == + remapped_ast_node_kinds_.end()) { + formatter_.Format(composed_image); + return; + } + + if (ShouldRemapLiteral(composed_image)) { + formatter_.Format(RemapLiteral(TypeKind::TYPE_STRING, composed_image)); + } else { + Unparser::visitASTStringLiteral(node, data); + } } void visitASTBytesLiteral(const ASTBytesLiteral* node, void* data) override { formatter_.Format("?"); } void visitASTDateOrTimeLiteral(const ASTDateOrTimeLiteral* node, void* data) override { - formatter_.Format("?"); + if (deidentified_ast_node_kinds_.find(node->node_kind()) != + deidentified_ast_node_kinds_.end()) { + formatter_.Format("?"); + return; + } + std::string composed_image = + ComposeImageForLiteralConcatenation(node->string_literal()); + if (remapped_ast_node_kinds_.find(node->node_kind()) == + remapped_ast_node_kinds_.end()) { + formatter_.Format(absl::StrCat( + Type::TypeKindToString(node->type_kind(), PRODUCT_INTERNAL), " ", + composed_image)); + return; + } + if (ShouldRemapLiteral(composed_image)) { + formatter_.Format(RemapLiteral(node->type_kind(), composed_image)); + } else { + Unparser::visitASTDateOrTimeLiteral(node, data); + } } void visitASTRangeLiteral(const ASTRangeLiteral* node, void* data) override { formatter_.Format("?"); } void visitASTIdentifier(const ASTIdentifier* node, void* data) override { + if (deidentified_ast_node_kinds_.find(node->node_kind()) != + deidentified_ast_node_kinds_.end()) { + 
return; + } + if (remapped_ast_node_kinds_.find(node->node_kind()) == + remapped_ast_node_kinds_.end()) { + formatter_.Format(node->GetAsStringView()); + return; + } if (ShouldRemapIdentifier(node->GetAsStringView())) { formatter_.Format(RemapIdentifier(node->GetAsStringView())); } else { @@ -107,6 +263,15 @@ class DeidentifyingUnparser : public Unparser { } void visitASTAlias(const ASTAlias* node, void* data) override { + if (deidentified_ast_node_kinds_.find(node->node_kind()) != + deidentified_ast_node_kinds_.end()) { + return; + } + if (remapped_ast_node_kinds_.find(node->node_kind()) == + remapped_ast_node_kinds_.end()) { + formatter_.Format(node->identifier()->GetAsStringView()); + return; + } absl::string_view identifier = node->identifier()->GetAsStringView(); if (ShouldRemapIdentifier(identifier)) { print(absl::StrCat("AS ", RemapIdentifier(identifier))); @@ -150,8 +315,23 @@ class DeidentifyingUnparser : public Unparser { return true; } + // TODO: This function should always return true, i.e., should be removed. 
+ bool ShouldRemapLiteral(absl::string_view literal_value) { + if (zetasql::IsKeyword(literal_value)) { + return false; + } + if (language_options_.GenericEntityTypeSupported(literal_value)) { + return false; + } + if (language_options_.GenericSubEntityTypeSupported(literal_value)) { + return false; + } + + return true; + } + std::string RemapIdentifier(absl::string_view identifier) { - if (auto i = remapped_identifiers_.find(std::string(identifier)); + if (auto i = remapped_identifiers_.find(identifier); i != remapped_identifiers_.end()) { return i->second; } @@ -171,17 +351,68 @@ class DeidentifyingUnparser : public Unparser { remapped_name += "_"; } - remapped_identifiers_[std::string(identifier)] = remapped_name; + remapped_identifiers_[identifier] = remapped_name; + return remapped_name; + } + + std::string GetLiteralPrefix(TypeKind kind) { + if (kind == TypeKind::TYPE_NUMERIC) { + return "NUMERIC "; + } else if (kind == TypeKind::TYPE_BIGNUMERIC) { + return "BIGNUMERIC "; + } else if (kind == TypeKind::TYPE_JSON) { + return "JSON "; + } else if (kind == TypeKind::TYPE_DATE) { + return "DATE "; + } else if (kind == TypeKind::TYPE_TIME) { + return "TIME "; + } else if (kind == TypeKind::TYPE_DATETIME) { + return "DATETIME "; + } + return ""; + } + + std::string RemapLiteral(TypeKind kind, absl::string_view literal) { + std::string literal_prefix = GetLiteralPrefix(kind); + if (auto i = remapped_identifiers_.find(literal); + i != remapped_identifiers_.end()) { + return i->second; + } + + std::string remapped_name = ""; + size_t index = remapped_identifiers_.size(); + + while (index > 0) { + char current_digit = 'A' + (index % 26); + remapped_name += current_digit; + index /= 26; + } + std::reverse(remapped_name.begin(), remapped_name.end()); + + while (zetasql::IsKeyword(remapped_name)) { + remapped_name += "_"; + } + + remapped_identifiers_[literal_prefix.append(std::string(literal))] = + remapped_name; return remapped_name; } Catalog& catalog_; const 
zetasql::LanguageOptions& language_options_; std::string unparsed_output_; - std::unordered_map<std::string, std::string> remapped_identifiers_; + absl::flat_hash_map<std::string, std::string> remapped_identifiers_; + std::set<ASTNodeKind> deidentified_ast_node_kinds_; + std::set<ASTNodeKind> remapped_ast_node_kinds_; }; } // namespace +static const auto all_literals = new std::set<ASTNodeKind>{ + ASTNodeKind::AST_INT_LITERAL, ASTNodeKind::AST_FLOAT_LITERAL, + ASTNodeKind::AST_STRING_LITERAL, ASTNodeKind::AST_NUMERIC_LITERAL, + ASTNodeKind::AST_BIGNUMERIC_LITERAL, ASTNodeKind::AST_JSON_LITERAL, + ASTNodeKind::AST_DATE_OR_TIME_LITERAL}; + absl::StatusOr<std::string> DeidentifySQLIdentifiersAndLiterals( absl::string_view input, const zetasql::LanguageOptions& language_options) { @@ -195,7 +426,9 @@ absl::StatusOr<std::string> DeidentifySQLIdentifiersAndLiterals( SimpleCatalog catalog("allowed identifiers for deidentification"); ZETASQL_RETURN_IF_ERROR(catalog.AddBuiltinFunctionsAndTypes( zetasql::BuiltinFunctionOptions::AllReleasedFunctions())); - DeidentifyingUnparser unparser(catalog, language_options); + DeidentifyingUnparser unparser( + catalog, language_options, *all_literals, + {ASTNodeKind::AST_IDENTIFIER, ASTNodeKind::AST_ALIAS}); do { ZETASQL_RETURN_IF_ERROR(zetasql::ParseNextStatement( @@ -209,5 +442,37 @@ return all_deidentified; } +absl::StatusOr<DeidentificationResult> DeidentifySQLWithMapping( + absl::string_view input, std::set<ASTNodeKind> deidentified_kinds, + std::set<ASTNodeKind> remapped_kinds, + const zetasql::LanguageOptions& language_options) { + ParserOptions parser_options(language_options); + std::unique_ptr<ParserOutput> parser_output; + zetasql::ParseResumeLocation parse_resume = + zetasql::ParseResumeLocation::FromStringView(input); + DeidentificationResult remapped_identifiers = {}; + + std::string all_deidentified; + SimpleCatalog catalog("allowed identifiers for deidentification"); + ZETASQL_RETURN_IF_ERROR(catalog.AddBuiltinFunctionsAndTypes( + zetasql::BuiltinFunctionOptions::AllReleasedFunctions())); + DeidentifyingUnparser unparser(catalog, language_options, 
deidentified_kinds, + remapped_kinds); + + for (bool at_end_of_input = false; !at_end_of_input;) { + ZETASQL_RETURN_IF_ERROR(zetasql::ParseNextStatement( + &parse_resume, parser_options, &parser_output, &at_end_of_input)); + parser_output->node()->Accept(&unparser, nullptr); + unparser.FlushLine(); + + absl::StrAppend(&all_deidentified, unparser.GetUnparsedOutput()); + remapped_identifiers.remappings.merge(unparser.GetRemappedIdentifiers()); + unparser.ResetOutput(); + } + + remapped_identifiers.deidentified_sql = all_deidentified; + return remapped_identifiers; +} + } // namespace parser } // namespace zetasql diff --git a/zetasql/parser/deidentify.h b/zetasql/parser/deidentify.h index 910f07283..3b70fe4b5 100644 --- a/zetasql/parser/deidentify.h +++ b/zetasql/parser/deidentify.h @@ -17,21 +17,56 @@ #ifndef ZETASQL_PARSER_DEIDENTIFY_H_ #define ZETASQL_PARSER_DEIDENTIFY_H_ +#include <set> #include <string> +#include "zetasql/parser/ast_node_kind.h" #include "zetasql/public/language_options.h" +#include "absl/container/flat_hash_map.h" #include "absl/status/statusor.h" +#include "absl/strings/str_format.h" +#include "absl/strings/str_join.h" #include "absl/strings/string_view.h" namespace zetasql { namespace parser { +// Result from deidentification mapping. Includes the deidentified SQL along +// with a map of anonymized identifiers and literals that can be used to rebuild +// an equivalent SQL statement to the input. +struct DeidentificationResult { + absl::flat_hash_map<std::string, std::string> remappings; + std::string deidentified_sql; + + template <typename Sink> + friend void AbslStringify(Sink& sink, const DeidentificationResult& res) { + sink.Append(absl::StrFormat( + "SQL: %s\n%s\n", res.deidentified_sql, + absl::StrJoin(res.remappings.begin(), res.remappings.end(), "\n", + absl::PairFormatter(absl::AlphaNumFormatter(), ": ", + absl::StreamFormatter())))); + } +}; + // Return cleaned SQL with comments stripped, all identifiers relabelled // consistently starting from A and then literals replaced by ? 
like parameters. absl::StatusOr<std::string> DeidentifySQLIdentifiersAndLiterals( absl::string_view input, const zetasql::LanguageOptions& language_options = zetasql::LanguageOptions::MaximumFeatures()); + +// Return cleaned SQL with comments stripped, identifiers and literals +// relabelled based on the provided set of kinds. Updated nodes are labeled +// consistently starting from A, avoiding any keywords. +// The deidentified_kinds parameter changes any matching node to a zero, +// redacted or '?' value like parameters. The remapped_kinds will replace each +// matching identifier or literal node with a label and return the mapping in +// the result. +absl::StatusOr<DeidentificationResult> DeidentifySQLWithMapping( + absl::string_view input, std::set<ASTNodeKind> deidentified_kinds, + std::set<ASTNodeKind> remapped_kinds, + const zetasql::LanguageOptions& language_options = + zetasql::LanguageOptions::MaximumFeatures()); } // namespace parser } // namespace zetasql diff --git a/zetasql/parser/deidentify_test.cc b/zetasql/parser/deidentify_test.cc index 81e8beb63..361d0a3b2 100644 --- a/zetasql/parser/deidentify_test.cc +++ b/zetasql/parser/deidentify_test.cc @@ -17,6 +17,7 @@ #include "zetasql/parser/deidentify.h" #include "zetasql/base/testing/status_matchers.h" +#include "zetasql/parser/ast_node_kind.h" #include "gmock/gmock.h" #include "gtest/gtest.h" @@ -24,10 +25,15 @@ namespace zetasql { namespace parser { namespace { +using testing::FieldsAre; +using testing::Pair; +using testing::UnorderedElementsAre; +using zetasql_base::testing::IsOkAndHolds; + TEST(Deidentify, SimpleExample) { EXPECT_THAT(DeidentifySQLIdentifiersAndLiterals( "SELECT X + 1232, USER_FUNCTION(X) FROM business.financial"), - zetasql_base::testing::IsOkAndHolds( + IsOkAndHolds( R"(SELECT A + 0, B(A) @@ -39,7 +45,7 @@ FROM TEST(Deidentify, AsExample) { EXPECT_THAT(DeidentifySQLIdentifiersAndLiterals( "SELECT X AS private_name FROM business.financial"), - zetasql_base::testing::IsOkAndHolds( + IsOkAndHolds( R"(SELECT A AS B FROM @@ -162,6 +168,242 @@ 
TEST(Deidentify, BigExample) { )")); } +TEST(DeidentifyWithMapping, SimpleExample) { + EXPECT_THAT( + DeidentifySQLWithMapping( + "SELECT X + 1232, USER_FUNCTION(X) FROM business.financial", {}, + {ASTNodeKind::AST_IDENTIFIER, ASTNodeKind::AST_ALIAS}), + IsOkAndHolds(FieldsAre( + UnorderedElementsAre(Pair("X", "A"), Pair("USER_FUNCTION", "B"), + Pair("business", "C"), Pair("financial", "D")), + R"(SELECT + A + 1232, + B(A) +FROM + C.D +)"))); +} + +TEST(DeidentifyWithMapping, SimpleExampleWithIntLiteral) { + EXPECT_THAT( + DeidentifySQLWithMapping( + "SELECT X + 1232, USER_FUNCTION(X) FROM business.financial", {}, + {ASTNodeKind::AST_IDENTIFIER, ASTNodeKind::AST_ALIAS, + ASTNodeKind::AST_INT_LITERAL}), + IsOkAndHolds(FieldsAre( + UnorderedElementsAre(Pair("X", "A"), Pair("1232", "B"), + Pair("USER_FUNCTION", "C"), + Pair("business", "D"), Pair("financial", "E")), + R"(SELECT + A + B, + C(A) +FROM + D.E +)"))); +} + +TEST(DeidentifyWithMapping, SimpleExampleWithLiterals) { + EXPECT_THAT( + DeidentifySQLWithMapping( + "SELECT X,Y,Z FROM business.financial WHERE X=2.3 AND Y=\"FOO\" AND " + "Z=NUMERIC '-9.876e-3'", + {}, + {ASTNodeKind::AST_IDENTIFIER, ASTNodeKind::AST_ALIAS, + ASTNodeKind::AST_INT_LITERAL, ASTNodeKind::AST_STRING_LITERAL, + ASTNodeKind::AST_NUMERIC_LITERAL, ASTNodeKind::AST_FLOAT_LITERAL}), + IsOkAndHolds(FieldsAre( + UnorderedElementsAre(Pair("X", "A"), Pair("Y", "B"), Pair("Z", "C"), + Pair("business", "D"), Pair("financial", "E"), + Pair("2.3", "F"), Pair("\"FOO\"", "G"), + Pair("NUMERIC '-9.876e-3'", "H")), + R"(SELECT + A, + B, + C +FROM + D.E +WHERE + A = F AND B = G AND C = H +)"))); +} + +TEST(DeidentifyWithMapping, SimpleExampleWithComplexLiterals) { + EXPECT_THAT( + DeidentifySQLWithMapping( + "SELECT X,Y,Z FROM business.financial WHERE X=DATETIME '2014-09-27 " + "12:30:00.45' AND Y=BIGNUMERIC '12345e123' AND Z=JSON '{\"id\": 10}'", + {}, + {ASTNodeKind::AST_IDENTIFIER, ASTNodeKind::AST_ALIAS, + ASTNodeKind::AST_DATE_OR_TIME_LITERAL, + 
ASTNodeKind::AST_BIGNUMERIC_LITERAL, ASTNodeKind::AST_JSON_LITERAL}), + IsOkAndHolds(FieldsAre( + UnorderedElementsAre(Pair("X", "A"), Pair("Y", "B"), Pair("Z", "C"), + Pair("business", "D"), Pair("financial", "E"), + Pair("DATETIME '2014-09-27 12:30:00.45'", "F"), + Pair("BIGNUMERIC '12345e123'", "G"), + Pair("JSON '{\"id\": 10}'", "H")), + R"(SELECT + A, + B, + C +FROM + D.E +WHERE + A = F AND B = G AND C = H +)"))); +} + +TEST(DeidentifyWithMapping, BigExample) { + EXPECT_THAT( + DeidentifySQLWithMapping( + "SELECT " + "a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19," + "a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36," + "a37,a38,a39,a40,a41,a42,a43,a44,a45,a46,a47,a48,a49,a50,a51,a52,a53," + "a54,a55,a56,a57,a58,a59,a60,a61,a62,a63,a64,a65,a66,a67,a68,a69,a70," + "a71,a72,a73,a74,a75,a76,a77,a78,a79,a80,a81,a82,a83,a84,a85,a86,a87," + "a88,a89,a90,a91,a92,a93,a94,a95,a96,a97,a98,a99,a100", + {}, {ASTNodeKind::AST_IDENTIFIER, ASTNodeKind::AST_ALIAS}), + IsOkAndHolds( + FieldsAre(UnorderedElementsAre( + Pair("a11", "K"), Pair("a80", "CB"), Pair("a82", "CD"), + Pair("a8", "H"), Pair("a26", "Z"), Pair("a60", "BH"), + Pair("a7", "G"), Pair("a37", "AK"), Pair("a9", "I"), + Pair("a72", "BT"), Pair("a51", "AY"), Pair("a30", "AD"), + Pair("a40", "AN"), Pair("a33", "AG"), Pair("a57", "BE"), + Pair("a75", "BW"), Pair("a53", "BA"), Pair("a54", "BB"), + Pair("a31", "AE"), Pair("a27", "AA"), Pair("a36", "AJ"), + Pair("a79", "CA"), Pair("a43", "AQ"), Pair("a67", "BO"), + Pair("a63", "BK"), Pair("a45", "AS_"), + Pair("a62", "BJ"), Pair("a3", "C"), Pair("a2", "B"), + Pair("a97", "CS"), Pair("a76", "BX"), Pair("a23", "W"), + Pair("a10", "J"), Pair("a78", "BZ"), Pair("a91", "CM"), + Pair("a74", "BV"), Pair("a21", "U"), Pair("a18", "R"), + Pair("a87", "CI"), Pair("a16", "P"), Pair("a55", "BC"), + Pair("a90", "CL"), Pair("a68", "BP"), Pair("a44", "AR"), + Pair("a52", "AZ"), Pair("a49", "AW"), Pair("a70", "BR"), + Pair("a73", "BU"), 
Pair("a29", "AC"), Pair("a19", "S"), + Pair("a47", "AU"), Pair("a28", "AB"), Pair("a14", "N"), + Pair("a69", "BQ"), Pair("a50", "AX"), Pair("a95", "CQ"), + Pair("a5", "E"), Pair("a77", "BY_"), Pair("a39", "AM"), + Pair("a1", "A"), Pair("a85", "CG"), Pair("a15", "O"), + Pair("a92", "CN"), Pair("a93", "CO"), Pair("a66", "BN"), + Pair("a38", "AL"), Pair("a58", "BF"), Pair("a12", "L"), + Pair("a13", "M"), Pair("a65", "BM"), Pair("a59", "BG"), + Pair("a4", "D"), Pair("a46", "AT_"), Pair("a34", "AH"), + Pair("a48", "AV"), Pair("a71", "BS"), Pair("a32", "AF"), + Pair("a99", "CU"), Pair("a24", "X"), Pair("a100", "CV"), + Pair("a56", "BD"), Pair("a17", "Q"), Pair("a81", "CC"), + Pair("a94", "CP"), Pair("a35", "AI"), Pair("a41", "AO"), + Pair("a22", "V"), Pair("a25", "Y"), Pair("a96", "CR"), + Pair("a6", "F"), Pair("a88", "CJ"), Pair("a89", "CK"), + Pair("a61", "BI"), Pair("a86", "CH"), Pair("a98", "CT"), + Pair("a20", "T"), Pair("a42", "AP"), Pair("a64", "BL"), + Pair("a83", "CE"), Pair("a84", "CF")), + R"(SELECT + A, + B, + C, + D, + E, + F, + G, + H, + I, + J, + K, + L, + M, + N, + O, + P, + Q, + R, + S, + T, + U, + V, + W, + X, + Y, + Z, + AA, + AB, + AC, + AD, + AE, + AF, + AG, + AH, + AI, + AJ, + AK, + AL, + AM, + AN, + AO, + AP, + AQ, + AR, + AS_, + AT_, + AU, + AV, + AW, + AX, + AY, + AZ, + BA, + BB, + BC, + BD, + BE, + BF, + BG, + BH, + BI, + BJ, + BK, + BL, + BM, + BN, + BO, + BP, + BQ, + BR, + BS, + BT, + BU, + BV, + BW, + BX, + BY_, + BZ, + CA, + CB, + CC, + CD, + CE, + CF, + CG, + CH, + CI, + CJ, + CK, + CL, + CM, + CN, + CO, + CP, + CQ, + CR, + CS, + CT, + CU, + CV +)"))); +} + } // namespace } // namespace parser } // namespace zetasql diff --git a/zetasql/parser/flex_tokenizer.cc b/zetasql/parser/flex_tokenizer.cc index d348368da..d6a5deeaf 100644 --- a/zetasql/parser/flex_tokenizer.cc +++ b/zetasql/parser/flex_tokenizer.cc @@ -16,252 +16,67 @@ #include "zetasql/parser/flex_tokenizer.h" +#include #include #include #include #include "zetasql/common/errors.h" 
-#include "zetasql/parser/bison_parser.bison.h" +#include "zetasql/parser/bison_parser_mode.h" +#include "zetasql/parser/bison_token_codes.h" #include "zetasql/parser/flex_istream.h" #include "zetasql/parser/keywords.h" -#include "zetasql/parser/location.hh" +#include "zetasql/parser/macros/token_with_location.h" +#include "zetasql/public/language_options.h" #include "zetasql/public/parse_location.h" #include "absl/flags/flag.h" +#include "zetasql/base/check.h" +#include "absl/status/status.h" +#include "absl/strings/str_cat.h" +#include "absl/strings/string_view.h" +#include "zetasql/base/status_macros.h" +#ifndef ZETASQL_TOKEN_DISMABIGUATOR // TODO: The end state is to turn on everywhere and remove this // flag. Before that, we'll turn on this feature in test environment and soak // for a while. Then roll out to Evenflow prod instances and eventually // deprecate this flag. ABSL_FLAG(bool, zetasql_use_customized_flex_istream, true, "If true, use customized StringStreamWithSentinel to read input."); +#endif namespace zetasql { namespace parser { // Include the helpful type aliases in the namespace within the C++ file so // that they are useful for free helper functions as well as class member // functions. -using Token = zetasql_bison_parser::BisonParserImpl::token; +using Token = TokenKinds; using TokenKind = int; -using Location = zetasql_bison_parser::location; - -static bool IsReservedKeywordToken(TokenKind token) { - // We need to add sentinels before and after each block of keywords to make - // this safe. - return token > Token::SENTINEL_RESERVED_KW_START && - token < Token::SENTINEL_RESERVED_KW_END; -} - -static bool IsNonreservedKeywordToken(TokenKind token) { - // We need to add sentinels before and after each block of keywords to make - // this safe. 
- return token > Token::SENTINEL_NONRESERVED_KW_START && - token < Token::SENTINEL_NONRESERVED_KW_END; -} - -int ZetaSqlFlexTokenizer::GetNextToken(Location* location) { - return GetNextTokenFlex(location); -} - -// The token disambiguation rules are allowed to see a fixed-length sequence of -// tokens produced by the lexical rules in flex_tokenzer.h and may change the -// kind of `token` based on the kinds of the other tokens in the window. -// -// For now, the window available is: -// [prev_dispensed_token_, token, Lookahead1()] -// -// `prev_dispensed_token_` is the token most recently dispensed to the consuming -// component (usually the parser). -// `token` is the token that is about to be dispensed to the consuming -// component. -// `Lookahead1()` is the next token that will be disambiguated on the subsequent -// call to GetNextToken. -// -// USE WITH CAUTION: -// For any given sequence of tokens, there may be many different shift/reduce -// sequences in the parser that "accept" that token sequence. It's critical -// when adding a token disambiguation rule that all parts of the grammar that -// accept the sequence of tokens are identified to verify that changing the kind -// of `token` does not break any unanticipated cases where that sequence would -// currently be accepted. -TokenKind ZetaSqlFlexTokenizer::ApplyTokenDisambiguation( - TokenKind token, const Location& location) { - switch (mode_) { - case BisonParserMode::kTokenizer: - case BisonParserMode::kTokenizerPreserveComments: - // Tokenizer modes are used to extract tokens for error messages among - // other things. The rules below are mostly intended to support the bison - // parser, and aren't necessary in tokenizer mode. 
- return token; - case BisonParserMode::kMacroBody: - switch (token) { - case ';': - case Token::YYEOF: - return token; - default: - return Token::MACRO_BODY_TOKEN; - } - default: - break; - } - - switch (prev_dispensed_token_) { - case '@': - case Token::KW_DOUBLE_AT: - // The only place in the grammar that '@' or KW_DOUBLE_AT appear is in - // rules for query parameters and system variables respectively. And - // macros rules, but those accept all tokens. For both query parameters - // and system variables, the following token is always treated as an - // identifier even if it is otherwise a reserved keyword. - // - // This rule results in minor improvements to both the implemented parser - // (the generated code) in terms of number of states and the grammar file - // by eliminating a few lines. The significant improvement is greatly - // reducing the complexity of analyzing subsequent rules added here. - // Without this rule, any reserved keyword can be the last token in an - // expression and can thus be followed by many potential sequences of - // reduces. Even though that analysis can only be reliably done by a tool, - // reducing the number of paths the tool needs to explore reduces the tool - // runtime from hours to minutes. - // - // The negative side of this rule, like similar rules embedded in the - // lexer's regexps, is that it effectively means `@` or `@@` operator - // followed by a keyword will interpret the keyword as an identifier in - // *all contexts*. That prevents using `@` or `@@` as other operators. - // - // It looks like we intended to support `SELECT @param_name`, but - // accidentally supported `SELECT @ param_name` as well. If we deprecate - // and remove the ability to put whitespace in a parameter reference, then - // we should probably remove this rule and instead change the lexer to - // directly produce tokens for parameter and system variable names - // directly. 
- // TODO: Remove this rule and add lex parameter and system - // variable references as their own token kinds. - if (IsReservedKeywordToken(token) || IsNonreservedKeywordToken(token)) { - // The dot-identifier mini-parser is triggered based on the value - // of prev_flex_token_ being identifier or non-reserved keyword. We - // don't want to update prev_flex_token_ from this code since it might - // actually reflect the lookahead token and not the current token (for - // the case where the lookahead is populated). The long term healthy - // solution for this is to pull the lexer modes forward into this layer, - // but for now we engage directly with them. - if (IsReservedKeywordToken(token) && Lookahead1(location) == '.') { - yy_push_state(/*DOT_IDENTIFIER*/ 1); - } - return Token::IDENTIFIER; - } - break; - default: - break; +using Location = ParseLocationRange; +using TokenWithLocation = macros::TokenWithLocation; + +absl::StatusOr ZetaSqlFlexTokenizer::GetNextToken( + Location* location) { + int token = GetNextTokenFlexImpl(location); + if (!override_error_.ok()) { + return override_error_; } - - switch (token) { - case Token::KW_NOT: - // This returns a different token because returning KW_NOT would confuse - // the operator precedence parsing. Boolean NOT has a different - // precedence than NOT BETWEEN/IN/LIKE/DISTINCT. - switch (Lookahead1(location)) { - case Token::KW_BETWEEN: - case Token::KW_IN: - case Token::KW_LIKE: - case Token::KW_DISTINCT: - return Token::KW_NOT_SPECIAL; - default: - break; - } - break; - case Token::KW_WITH: - // The WITH expression uses a function-call like syntax and is followed by - // the open parenthesis. - if (Lookahead1(location) == '(') { - return Token::KW_WITH_STARTING_WITH_EXPRESSION; - } - break; - case Token::KW_EXCEPT: - // EXCEPT is used in two locations of the language. And when the parser is - // exploding the rules it detects that two rules can be used for the same - // syntax. 
- // - // This rule generates a special token for an EXCEPT that is followed by a - // hint, ALL or DISTINCT which is distinctly the set operator use. - switch (Lookahead1(location)) { - case '(': - // This is the SELECT * EXCEPT (column...) case. - return Token::KW_EXCEPT; - case Token::KW_ALL: - case Token::KW_DISTINCT: - case Token::KW_OPEN_HINT: - case Token::KW_OPEN_INTEGER_HINT: - // This is the {query} EXCEPT {opt_hint} ALL|DISTINCT {query} case. - return Token::KW_EXCEPT_IN_SET_OP; - default: - SetOverrideError( - location, "EXCEPT must be followed by ALL, DISTINCT, or \"(\""); - break; - } - break; - default: - break; - } - + num_lexical_tokens_++; + prev_flex_token_ = token; return token; } -int ZetaSqlFlexTokenizer::Lookahead1(const Location& current_token_location) { - if (!lookahead_1_.has_value()) { - lookahead_1_ = {.token = 0, .token_location = current_token_location}; - prev_flex_token_ = GetNextTokenFlexImpl(&lookahead_1_->token_location); - lookahead_1_->token = prev_flex_token_; - } - return lookahead_1_->token; -} - -// Returns the next token id, returning its location in 'yylloc'. On input, -// 'yylloc' must be the location of the previous token that was returned. -int ZetaSqlFlexTokenizer::GetNextTokenFlex(Location* yylloc) { - TokenKind token = 0; - if (lookahead_1_.has_value()) { - // Get the next token from the lookahead buffer and advance the buffer. If - // force_terminate_ was set, we still need the location from the buffer, - // with Token::YYEOF as the token. - token = force_terminate_ ? Token::YYEOF : lookahead_1_->token; - *yylloc = lookahead_1_->token_location; - lookahead_1_.reset(); - } else { - // The lookahead buffer is empty, so get a token from the underlying lexer. 
- prev_flex_token_ = GetNextTokenFlexImpl(yylloc); - token = prev_flex_token_; - } - token = ApplyTokenDisambiguation(token, *yylloc); - if (override_error_.ok()) { - num_lexical_tokens_++; - } - prev_dispensed_token_ = token; - return token; +static absl::Status MakeError(absl::string_view error_message, + const Location& yylloc) { + return MakeSqlErrorAtPoint(yylloc.start()) << error_message; } absl::Status ZetaSqlFlexTokenizer::GetNextToken(ParseLocationRange* location, TokenKind* token) { - Location bison_location; - bison_location.begin.column = location->start().GetByteOffset(); - bison_location.end.column = location->end().GetByteOffset(); - *token = GetNextTokenFlex(&bison_location); - location->set_start( - ParseLocationPoint::FromByteOffset(filename_, - bison_location.begin.column)); - location->set_end( - ParseLocationPoint::FromByteOffset(filename_, - bison_location.end.column)); + ZETASQL_ASSIGN_OR_RETURN(*token, GetNextToken(location)); return override_error_; } -void ZetaSqlFlexTokenizer::SetForceTerminate() { - force_terminate_ = true; - // Ensure that the lookahead buffer immediately reflects the termination. - if (lookahead_1_.has_value()) { - lookahead_1_->token = Token::YYEOF; - } -} - bool ZetaSqlFlexTokenizer::IsDotGeneralizedIdentifierPrefixToken( TokenKind bison_token) const { if (bison_token == Token::IDENTIFIER || bison_token == ')' || @@ -315,13 +130,41 @@ int ZetaSqlFlexTokenizer::GetIdentifierLength(absl::string_view text) { return static_cast(text.size()); } +// Returns true if the given parser mode requires a start token. +// Any new mode should have its own start token. 
+static bool ModeRequiresStartToken(BisonParserMode mode) { + switch (mode) { + case BisonParserMode::kStatement: + case BisonParserMode::kScript: + case BisonParserMode::kNextStatement: + case BisonParserMode::kNextScriptStatement: + case BisonParserMode::kNextStatementKind: + case BisonParserMode::kExpression: + case BisonParserMode::kType: + return true; + case BisonParserMode::kTokenizer: + case BisonParserMode::kTokenizerPreserveComments: + case BisonParserMode::kMacroBody: + return false; + } +} + +static bool ShouldTerminateAfterNextStatement(BisonParserMode mode) { + return mode == BisonParserMode::kNextStatement || + mode == BisonParserMode::kNextScriptStatement || + mode == BisonParserMode::kNextStatementKind; +} + ZetaSqlFlexTokenizer::ZetaSqlFlexTokenizer( BisonParserMode mode, absl::string_view filename, absl::string_view input, int start_offset, const LanguageOptions& language_options) : filename_(filename), + input_(input), start_offset_(start_offset), input_size_(static_cast<int>(input.size())), - mode_(mode), + generate_custom_mode_start_token_(ModeRequiresStartToken(mode)), + terminate_after_statement_(ShouldTerminateAfterNextStatement(mode)), + preserve_comments_(mode == BisonParserMode::kTokenizerPreserveComments), language_options_(language_options) { if (absl::GetFlag(FLAGS_zetasql_use_customized_flex_istream)) { input_stream_ = std::make_unique<StringStreamWithSentinel>(input); } @@ -338,9 +181,7 @@ ZetaSqlFlexTokenizer::ZetaSqlFlexTokenizer( void ZetaSqlFlexTokenizer::SetOverrideError(const Location& yylloc, absl::string_view error_message) { - override_error_ = MakeSqlErrorAtPoint(ParseLocationPoint::FromByteOffset( - filename_, yylloc.begin.column)) - << error_message; + override_error_ = MakeError(error_message, yylloc); } void ZetaSqlFlexTokenizer::LexerError(const char* msg) { @@ -351,15 +192,9 @@ bool ZetaSqlFlexTokenizer::AreMacrosEnabled() const { return language_options_.LanguageFeatureEnabled(FEATURE_V_1_4_SQL_MACROS); } -void 
ZetaSqlFlexTokenizer::PushBisonParserMode(BisonParserMode mode) { - restore_modes_.push(mode_); - mode_ = mode; -} - -void ZetaSqlFlexTokenizer::PopBisonParserMode() { - ABSL_DCHECK(!restore_modes_.empty()); - mode_ = restore_modes_.top(); - restore_modes_.pop(); +bool ZetaSqlFlexTokenizer::EnforceStrictMacros() const { + return language_options_.LanguageFeatureEnabled( + FEATURE_V_1_4_ENFORCE_STRICT_MACROS); } bool ZetaSqlFlexTokenizer::AreAlterArrayOptionsEnabled() const { diff --git a/zetasql/parser/flex_tokenizer.h b/zetasql/parser/flex_tokenizer.h index 25052643d..6414777ce 100644 --- a/zetasql/parser/flex_tokenizer.h +++ b/zetasql/parser/flex_tokenizer.h @@ -20,9 +20,8 @@ #include #include #include -#include -#include -#include + +#include "absl/status/statusor.h" // Some contortions to avoid duplicate inclusion of FlexLexer.h in the // generated flex_tokenizer.flex.cc. @@ -31,8 +30,6 @@ #include #include "zetasql/parser/bison_parser_mode.h" -#include "zetasql/parser/location.hh" -#include "zetasql/parser/tokenizer.h" #include "zetasql/public/language_options.h" #include "zetasql/public/parse_location.h" #include "absl/flags/declare.h" @@ -45,9 +42,12 @@ namespace zetasql { namespace parser { // Flex-based tokenizer for the ZetaSQL Bison parser. -class ZetaSqlFlexTokenizer final : public ZetaSqlFlexTokenizerBase, - public Tokenizer { +class ZetaSqlFlexTokenizer final : public ZetaSqlFlexTokenizerBase { public: + // Type aliases to improve readability of API. + using Location = ParseLocationRange; + using TokenKind = int; + // Constructs a simple wrapper around a flex generated tokenizer. 'mode' // controls the first token that is returned to the bison parser, which // determines the starting production used by the parser. 
@@ -59,17 +59,7 @@ class ZetaSqlFlexTokenizer final : public ZetaSqlFlexTokenizerBase, ZetaSqlFlexTokenizer(const ZetaSqlFlexTokenizer&) = delete; ZetaSqlFlexTokenizer& operator=(const ZetaSqlFlexTokenizer&) = delete; - std::unique_ptr<Tokenizer> GetNewInstance( - absl::string_view filename, absl::string_view input) const override { - return std::make_unique<ZetaSqlFlexTokenizer>( - mode_, filename, input, start_offset_, language_options_); - } - - int GetNextToken(Location* location) override; - - // Returns the next token id, returning its location in 'yylloc'. On input, - // 'yylloc' must be the location of the previous token that was returned. - TokenKind GetNextTokenFlex(Location* yylloc); + absl::StatusOr<TokenKind> GetNextToken(Location* location); // This is the "nice" API for the tokenizer, to be used by GetParseTokens(). // On input, 'location' must be the location of the previous token that was @@ -77,32 +67,16 @@ class ZetaSqlFlexTokenizer final : public ZetaSqlFlexTokenizerBase, // in 'location'. Returns an error if the tokenizer sets override_error. absl::Status GetNextToken(ParseLocationRange* location, TokenKind* token); - // Returns a non-OK error status if the tokenizer encountered an error. This - // error takes priority over a parser error, because the parser error is - // always a consequence of the tokenizer error. - absl::Status GetOverrideError() const override { return override_error_; } - - // Ensures that the next token returned will be EOF, even if we're not at the - // end of the input. - void SetForceTerminate(); - - // Some sorts of statements need to change the mode after the parser consumes - // the preamble of the statement. DEFINE MACRO is an example, it wants to - // consume the macro body as raw tokens. - void PushBisonParserMode(BisonParserMode mode); - // Restore the BisonParserMode to its value before the previous Push. - void PopBisonParserMode(); - // Helper function for determining if the given 'bison_token' followed by "." 
// should trigger the generalized identifier tokenizer mode. bool IsDotGeneralizedIdentifierPrefixToken(TokenKind bison_token) const; int64_t num_lexical_tokens() const { return num_lexical_tokens_; } - private: - // This friend is used by the unit test to help test internals. - friend class TokenTestThief; + absl::string_view filename() const { return filename_; } + absl::string_view input() const { return input_; } + private: void SetOverrideError(const Location& yylloc, absl::string_view error_message); @@ -111,18 +85,6 @@ class ZetaSqlFlexTokenizer final : public ZetaSqlFlexTokenizerBase, // Returns the next token id, returning its location in 'yylloc'. TokenKind GetNextTokenFlexImpl(Location* yylloc); - // If the N+1 token is already buffered we simply return the token value from - // the buffer. Otherwise we read the next token from `GetNextTokenFlexImpl` - // and put it in the lookahead buffer before returning it. - int Lookahead1(const Location& current_token_location); - - // Applies a set of rules based on previous and successive token kinds and if - // any rule matches, returns the token kind specified by the rule. Otherwise - // when no rule matches, returns `token`. `location` is used when requesting - // Lookahead tokens and also to generate error messages for - // `SetOverrideError`. - TokenKind ApplyTokenDisambiguation(TokenKind token, const Location& location); - // This is called by flex when it is wedged. void LexerError(const char* msg) override; @@ -134,6 +96,7 @@ class ZetaSqlFlexTokenizer final : public ZetaSqlFlexTokenizerBase, bool IsReservedKeyword(absl::string_view text) const; bool AreMacrosEnabled() const; + bool EnforceStrictMacros() const; bool AreAlterArrayOptionsEnabled() const; @@ -160,13 +123,14 @@ class ZetaSqlFlexTokenizer final : public ZetaSqlFlexTokenizerBase, // The kind of the most recently generated token from the flex layer. 
TokenKind prev_flex_token_ = 0; - // The kind of the most recently dispensed token to the consuming component - // (usually the last token dispensed to the parser). - TokenKind prev_dispensed_token_ = 0; // The (optional) filename from which the statement is being parsed. absl::string_view filename_; + // The input we are tokenizing from. Note that if `start_offset_` is + // not zero, tokenization starts from the specified offset. + absl::string_view input_; + // The offset in the input of the first byte that is tokenized. This is used // to determine the returned location for the first token. const int start_offset_ = 0; @@ -177,20 +141,22 @@ class ZetaSqlFlexTokenizer final : public ZetaSqlFlexTokenizerBase, // sentinel. std::unique_ptr input_stream_; - // This determines the first token returned to the bison parser, which - // determines the mode that we'll run in. - BisonParserMode mode_; - std::stack<BisonParserMode> restore_modes_; + // When true, returns CUSTOM_MODE_START as the first token before working on + // the input. + bool generate_custom_mode_start_token_; - // The tokenizer may want to return an error directly. It does this by - // returning EOF to the bison parser, which then may or may not spew out its - // own error message. The BisonParser wrapper then grabs the error from the - // tokenizer instead. - absl::Status override_error_; + // If set, the lexer terminates if it encounters a semicolon, instead of + // continuing into the next statement. + bool terminate_after_statement_; + + // If set, comments are preserved. Used only in raw tokenization for the + // formatter. + bool preserve_comments_; - // If this is set to true, the next token returned will be EOF, even if we're - // not at the end of the input. - bool force_terminate_ = false; + // The Flex-generated tokenizer does not work with absl::StatusOr, so it + // stores the error in this field. GetNextToken() grabs the status from here + // when returning the result. 
+ absl::Status override_error_; // LanguageOptions passed in from parser, used to decide if reservable // keywords are reserved or not. @@ -198,19 +164,6 @@ class ZetaSqlFlexTokenizer final : public ZetaSqlFlexTokenizerBase, // Count of lexical tokens returned int64_t num_lexical_tokens_ = 0; - - // The lookahead_N_ fields implement the token lookahead buffer. There are a - // fixed number of fields here, each represented by an optional, rather than a - // deque or vector because, ideally we only do token disambiguation on small - // windows (e.g. no more than two or three lookaheads). - - // A token in the lookahead buffer. - struct TokenInfo { - TokenKind token; - Location token_location; - }; - // The lookahead buffer slot for token N+1. - std::optional lookahead_1_; }; } // namespace parser diff --git a/zetasql/parser/flex_tokenizer.l b/zetasql/parser/flex_tokenizer.l index b3d1ae181..3c7c7ab4d 100644 --- a/zetasql/parser/flex_tokenizer.l +++ b/zetasql/parser/flex_tokenizer.l @@ -95,7 +95,7 @@ using namespace zetasql::parser_internal; #undef YY_DECL #define YY_DECL \ int zetasql::parser::ZetaSqlFlexTokenizer::GetNextTokenFlexImpl( \ - zetasql_bison_parser::location* yylloc) + Location* yylloc) // This action is executed for every token that is matched, before the defined // actions are executed. We use this to: @@ -105,10 +105,9 @@ using namespace zetasql::parser_internal; // the parsing process when it has read enough of a prefix, e.g. for // multi-statement parsing or for determining the next statement kind. #define YY_USER_ACTION \ - if (force_terminate_) yyterminate(); \ /* Note that we store byte offsets in the 'column' field. 
*/ \ - yylloc->begin.column = yylloc->end.column; \ - yylloc->end.column += yyleng; + yylloc->set_start(ParseLocationPoint::FromByteOffset(filename_, yylloc->end().GetByteOffset())); \ + yylloc->set_end(ParseLocationPoint::FromByteOffset(filename_, yylloc->end().GetByteOffset() + yyleng)); // Call this in an action to return only a prefix of the match of // 'prefix_length' bytes. @@ -116,7 +115,7 @@ using namespace zetasql::parser_internal; do { \ const int prefix_length_result = (prefix_length); \ yyless(prefix_length_result); \ - yylloc->end.column = yylloc->begin.column + prefix_length_result; \ + yylloc->mutable_end().SetByteOffset(yylloc->start().GetByteOffset() + prefix_length_result); \ } while (0) constexpr char ::zetasql::parser::ZetaSqlFlexTokenizer::kEofSentinelInput[]; @@ -318,31 +317,13 @@ comment ({cs_comment}|{dash_comment}|{pound_comment}) yy_push_state(STACK_BOTTOM); yy_push_state(INITIAL); // Note that we store byte offsets in the 'column' field. - yylloc->begin.column = yylloc->end.column = start_offset_; - switch (mode_) { - case BisonParserMode::kStatement: - return BisonParserImpl::token::MODE_STATEMENT; - case BisonParserMode::kScript: - return BisonParserImpl::token::MODE_SCRIPT; - case BisonParserMode::kNextStatement: - return BisonParserImpl::token::MODE_NEXT_STATEMENT; - case BisonParserMode::kNextScriptStatement: - return BisonParserImpl::token::MODE_NEXT_SCRIPT_STATEMENT; - case BisonParserMode::kNextStatementKind: - return BisonParserImpl::token::MODE_NEXT_STATEMENT_KIND; - case BisonParserMode::kExpression: - return BisonParserImpl::token::MODE_EXPRESSION; - case BisonParserMode::kType: - return BisonParserImpl::token::MODE_TYPE; - case BisonParserMode::kMacroBody: - case BisonParserMode::kTokenizer: - case BisonParserMode::kTokenizerPreserveComments: - // Don't generate a mode token when we are doing raw tokenization. - // With or without comments. 
- break; + yylloc->mutable_start().SetByteOffset(start_offset_); + yylloc->mutable_end().SetByteOffset(start_offset_); + if (generate_custom_mode_start_token_) { + return BisonParserImpl::token::CUSTOM_MODE_START; } } - yylloc->begin = yylloc->end; + yylloc->set_start(yylloc->end()); %} /* IMPORTANT: This rule must come before keywords, since it conditionally @@ -396,11 +377,6 @@ and { if (YY_START == IN_BETWEEN) { // See IN_BETWEEN tokenizer mode description. yy_pop_state(); - if (mode_ == BisonParserMode::kTokenizer || - mode_ == BisonParserMode::kTokenizerPreserveComments || - mode_ == BisonParserMode::kMacroBody) { - return BisonParserImpl::token::KW_AND; - } return BisonParserImpl::token::KW_AND_FOR_BETWEEN; } return BisonParserImpl::token::KW_AND; @@ -462,6 +438,7 @@ define { return BisonParserImpl::token::KW_DEFINE; } definer { return BisonParserImpl::token::KW_DEFINER; } delete { return BisonParserImpl::token::KW_DELETE; } deletion { return BisonParserImpl::token::KW_DELETION; } +depth { return BisonParserImpl::token::KW_DEPTH; } desc { return BisonParserImpl::token::KW_DESC; } descriptor { return BisonParserImpl::token::KW_DESCRIPTOR; } describe { return BisonParserImpl::token::KW_DESCRIBE; } @@ -497,12 +474,6 @@ for { return BisonParserImpl::token::KW_FOR; } foreign { return BisonParserImpl::token::KW_FOREIGN; } format { return BisonParserImpl::token::KW_FORMAT; } from { return BisonParserImpl::token::KW_FROM; } -full/{whitespace}(outer{whitespace})?(union|intersect|except) { - if (mode_ == BisonParserMode::kTokenizer) { - return BisonParserImpl::token::KW_FULL; - } - return BisonParserImpl::token::KW_FULL_IN_SET_OP; -} full { return BisonParserImpl::token::KW_FULL; } function { return BisonParserImpl::token::KW_FUNCTION; } generated { return BisonParserImpl::token::KW_GENERATED; } @@ -543,9 +514,6 @@ last { return BisonParserImpl::token::KW_LAST; } lateral { return BisonParserImpl::token::KW_LATERAL; } leave { return BisonParserImpl::token::KW_LEAVE; } 
left/{whitespace}(outer{whitespace})?(union|intersect|except) { - if (mode_ == BisonParserMode::kTokenizer) { - return BisonParserImpl::token::KW_LEFT; - } return BisonParserImpl::token::KW_LEFT_IN_SET_OP; } left { return BisonParserImpl::token::KW_LEFT; } @@ -602,9 +570,7 @@ project { return BisonParserImpl::token::KW_PROJECT; } proto { return BisonParserImpl::token::KW_PROTO; } public { return BisonParserImpl::token::KW_PUBLIC; } qualify { - return IsReservedKeyword("QUALIFY") ? - BisonParserImpl::token::KW_QUALIFY_RESERVED : - BisonParserImpl::token::KW_QUALIFY_NONRESERVED; + return BisonParserImpl::token::KW_QUALIFY_NONRESERVED; } raise { return BisonParserImpl::token::KW_RAISE; } range { return BisonParserImpl::token::KW_RANGE; } @@ -762,17 +728,31 @@ zone { return BisonParserImpl::token::KW_ZONE; } instance, 123abc should be error, and we don't want it to be parsed as 123 [AS] abc. */ {decimal_digits}[A-Z_] { - yylloc->begin.column += YYLeng() - 1; - SetOverrideError( - *yylloc, "Syntax error: Missing whitespace between literal and alias"); - yyterminate(); + if (AreMacrosEnabled() && !EnforceStrictMacros()) { + // Lenient macros: instead of error, try again in DOT_IDENTIFIER state. + yy_push_state(DOT_IDENTIFIER); + SET_RETURN_PREFIX_LENGTH(0); + } else { + yylloc->mutable_start().IncrementByteOffset(YYLeng() - 1); + SetOverrideError( + *yylloc, "Syntax error: Missing whitespace between literal and alias"); + yyterminate(); + } } + {hex_integer}[G-Z_] { + if (AreMacrosEnabled() && !EnforceStrictMacros()) { + // Lenient macros: instead of error, try again in DOT_IDENTIFIER state. 
+ yy_push_state(DOT_IDENTIFIER); + SET_RETURN_PREFIX_LENGTH(0); + } else { + yylloc->mutable_start().IncrementByteOffset(YYLeng() - 1); + SetOverrideError( + *yylloc, "Syntax error: Missing whitespace between literal and alias"); + yyterminate(); + } } + {floating_point_literal}/[A-Z_] { // If the floating point literal starts with a ".", and the preceding token // is an identifier or unreserved keyword, then we should tokenize this @@ -783,23 +763,36 @@ zone { return BisonParserImpl::token::KW_ZONE; } SET_RETURN_PREFIX_LENGTH(1); return '.'; } - // Trigger the missing-whitespace error, but only if the floating point - // literal ends in a digit, e.g. "123.456abc". (Note that this rule only - // matches the floating point literal itself, so the last character in - // YYText() is the last character in {floating_point_literal}. We don't - // trigger the missing whitespace error for cases that don't end in a digit, - // e.g. "123.abc". It's a case that is less likely to be an error, and the - // JavaCC parser doesn't trigger the missing whitespace warning in this case - // either. - // TODO: Consider making this an error too. It's not that likely to - // be correct either. - if (isdigit(YYText()[YYLeng() - 1])) { - yylloc->begin.column += YYLeng(); - SetOverrideError( - *yylloc, "Syntax error: Missing whitespace between literal and alias"); - yyterminate(); + + if (AreMacrosEnabled() && !EnforceStrictMacros()) { + // Lenient macros: instead of error, try again in DOT_IDENTIFIER state. + // But return the leading dot separately if starting with a dot. + yy_push_state(DOT_IDENTIFIER); + if (YYText()[0] == '.') { + SET_RETURN_PREFIX_LENGTH(1); + return '.'; + } + + SET_RETURN_PREFIX_LENGTH(0); + } else { + // Trigger the missing-whitespace error, but only if the floating point + // literal ends in a digit, e.g. "123.456abc". (Note that this rule only + // matches the floating point literal itself, so the last character in + // YYText() is the last character in {floating_point_literal}. 
We don't + // trigger the missing whitespace error for cases that don't end in a digit, + // e.g. "123.abc". It's a case that is less likely to be an error, and the + // JavaCC parser doesn't trigger the missing whitespace warning in this case + // either. + // TODO: Consider making this an error too. It's not that likely to + // be correct either. + if (isdigit(YYText()[YYLeng() - 1])) { + yylloc->mutable_start().IncrementByteOffset(YYLeng()); + SetOverrideError( + *yylloc, "Syntax error: Missing whitespace between literal and alias"); + yyterminate(); + } + return BisonParserImpl::token::FLOATING_POINT_LITERAL; } - return BisonParserImpl::token::FLOATING_POINT_LITERAL; } {decimal_digits} { return BisonParserImpl::token::INTEGER_LITERAL; } @@ -863,13 +856,6 @@ zone { return BisonParserImpl::token::KW_ZONE; } return YYText()[0]; } -"."{opt_whitespace}"*" { - if (mode_ == BisonParserMode::kTokenizerPreserveComments) { - SET_RETURN_PREFIX_LENGTH(1); - return '.'; - } - return BisonParserImpl::token::KW_DOT_STAR; -} "*" { return '*'; } "," { return ','; } @@ -943,21 +929,20 @@ zone { return BisonParserImpl::token::KW_ZONE; } "?" { return '?'; } "!" { return '!'; } "%" { return '%'; } -"@"{opt_whitespace}"{" { - // "{" needs to suspend special modes such as IN_BETWEEN. This is popped - // again in the "}" rule. - if (mode_ == BisonParserMode::kTokenizerPreserveComments) { - SET_RETURN_PREFIX_LENGTH(1); - return '@'; - } - yy_push_state(INITIAL); +"@"/{opt_whitespace}"{" { return BisonParserImpl::token::KW_OPEN_HINT; } "@"/{opt_whitespace}({decimal_digits}|{hex_integer}) { return BisonParserImpl::token::KW_OPEN_INTEGER_HINT; } -"@" { return '@'; } -"@@" { return BisonParserImpl::token::KW_DOUBLE_AT; } +"@" { + yy_push_state(DOT_IDENTIFIER); + return '@'; +} +"@@" { + yy_push_state(DOT_IDENTIFIER); + return BisonParserImpl::token::KW_DOUBLE_AT; +} "." 
{ if (IsDotGeneralizedIdentifierPrefixToken(prev_flex_token_) ) { // When an identifier or unreserved keyword is followed by a dot, always @@ -969,22 +954,17 @@ zone { return BisonParserImpl::token::KW_ZONE; } } ":" { return ':'; } - -";"{opt_whitespace} { - if (mode_ == BisonParserMode::kTokenizerPreserveComments) { - SET_RETURN_PREFIX_LENGTH(1); - return ';'; - } else if (yylloc->end.column == input_size_ + 1) { - // Don't return the final \n. It is handled by the whitespace rule and will - // trigger EOF. - SET_RETURN_PREFIX_LENGTH(YYLeng() - 1); - } else if (mode_ == BisonParserMode::kNextStatement || - mode_ == BisonParserMode::kNextStatementKind || - mode_ == BisonParserMode::kNextScriptStatement) { - // Don't return anything more if we're just looking at a single statement. - // Only return the semicolon, not the whitespace. - SET_RETURN_PREFIX_LENGTH(1); + /* Used only for ZetaSQL macros when not in strict mode. + We still lex it in all modes because it can be passed as an argument in an + invocation (outside of a macro definition) */ +"\\" { + if (AreMacrosEnabled() && !EnforceStrictMacros()) { + return BisonParserImpl::token::BACKSLASH; } + RETURN_ILLEGAL_CHARACTER_ERROR +} + +";" { return ';'; } @@ -1027,20 +1007,22 @@ zone { return BisonParserImpl::token::KW_ZONE; } See also the comment for kEofSentinelInput in flex_tokenizer.h. */ <*>{whitespace_no_comments} { - if (yylloc->end.column == input_size_ + 1) { + if (yylloc->end().GetByteOffset() == input_size_ + 1) { // The whitespace is adjacent to the end of the input, and includes the // \n that we add to the end of the input. Return EOF at the start of the // whitespace, with zero length. This produces better errors, because the // "unexpected EOF" errors will be adjacent to the last token. 
- yylloc->end.column = yylloc->begin.column; + int end_offset = yylloc->end().GetByteOffset() - 1; + yylloc->mutable_start().SetByteOffset(end_offset); + yylloc->mutable_end().SetByteOffset(end_offset); yyterminate(); } // The whitespace is not at the end of input. Just skip it. } <*>{comment} { - if (mode_ == BisonParserMode::kTokenizerPreserveComments) { - if (yylloc->end.column == input_size_ + 1) { + if (preserve_comments_) { + if (yylloc->end().GetByteOffset() == input_size_ + 1) { // Don't return the final \n. It is handled by the whitespace rule and // will trigger EOF. If we didn't do this, the <<EOF>> rule would trigger // instead and return an error. @@ -1048,12 +1030,14 @@ zone { return BisonParserImpl::token::KW_ZONE; } return BisonParserImpl::token::COMMENT; } - if (yylloc->end.column == input_size_ + 1) { + if (yylloc->end().GetByteOffset() == input_size_ + 1) { // The comment is adjacent to the end of the input, and includes the // \n that we add to the end of the input. Return EOF at the end of the // comment, excluding the extra \n, with zero length. This puts the // "unexpected EOF" errors at the line after end-of-line comments. - yylloc->begin.column = yylloc->end.column = yylloc->end.column - 1; + int end_offset = yylloc->end().GetByteOffset() - 1; + yylloc->mutable_start().SetByteOffset(end_offset); + yylloc->mutable_end().SetByteOffset(end_offset); yyterminate(); } // The comment is not at the end of input and we are not preserving comments. 
diff --git a/zetasql/parser/flex_tokenizer_test.cc b/zetasql/parser/flex_tokenizer_test.cc index ed29beb1e..45a8681ad 100644 --- a/zetasql/parser/flex_tokenizer_test.cc +++ b/zetasql/parser/flex_tokenizer_test.cc @@ -18,26 +18,39 @@ #include +#include "zetasql/base/testing/status_matchers.h" #include "zetasql/parser/bison_parser_mode.h" #include "zetasql/parser/bison_token_codes.h" +#include "zetasql/parser/token_disambiguator.h" +#include "zetasql/public/language_options.h" #include "gmock/gmock.h" #include "gtest/gtest.h" +#include "zetasql/base/check.h" +#include "absl/status/status.h" +#include "absl/strings/string_view.h" +#include "zetasql/base/status_macros.h" namespace zetasql::parser { using ::testing::ElementsAre; +using ::zetasql_base::testing::IsOkAndHolds; +using ::zetasql_base::testing::StatusIs; using Token = TokenKinds; using TokenKind = int; +using Location = ZetaSqlFlexTokenizer::Location; // This class is a friend of the tokenizer so that it can help us test the // private API. 
class TokenTestThief { public: - static TokenKind Lookahead1( - ZetaSqlFlexTokenizer& tokenizer, - const ZetaSqlFlexTokenizer::Location& location) { - return tokenizer.Lookahead1(location); + static TokenKind Lookahead1(DisambiguatorLexer& lexer, + const Location& location) { + return lexer.Lookahead1(location); + } + static TokenKind Lookahead2(DisambiguatorLexer& lexer, + const Location& location) { + return lexer.Lookahead2(location); } }; @@ -45,11 +58,15 @@ class FlexTokenizerTest : public ::testing::Test { public: std::vector<TokenKind> GetAllTokens(BisonParserMode mode, absl::string_view sql) { - ZetaSqlFlexTokenizer tokenizer(mode, "fake_file", sql, 0, options_); - ZetaSqlFlexTokenizer::Location location; + auto tokenizer = DisambiguatorLexer::Create( + mode, "fake_file", sql, 0, options_, /*macro_catalog=*/nullptr, + /*arena=*/nullptr); + ZETASQL_DCHECK_OK(tokenizer); + Location location; std::vector<TokenKind> tokens; do { - tokens.emplace_back(tokenizer.GetNextTokenFlex(&location)); + absl::string_view text; + tokens.emplace_back(tokenizer.value()->GetNextToken(&text, &location)); } while (tokens.back() != Token::YYEOF); return tokens; } @@ -66,7 +83,7 @@ TEST_F(FlexTokenizerTest, ParameterKeywordStatementMode) { TEST_F(FlexTokenizerTest, ParameterKeywordTokenizerMode) { EXPECT_THAT(GetAllTokens(BisonParserMode::kTokenizer, "a @select c"), - ElementsAre(Token::IDENTIFIER, '@', Token::KW_SELECT, + ElementsAre(Token::IDENTIFIER, '@', Token::IDENTIFIER, Token::IDENTIFIER, Token::YYEOF)); } @@ -80,7 +97,7 @@ TEST_F(FlexTokenizerTest, SysvarKeywordStatementMode) { TEST_F(FlexTokenizerTest, SysvarKeywordTokenizerMode) { EXPECT_THAT(GetAllTokens(BisonParserMode::kTokenizer, "a @@where c"), ElementsAre(Token::IDENTIFIER, Token::KW_DOUBLE_AT, - Token::KW_WHERE, Token::IDENTIFIER, Token::YYEOF)); + Token::IDENTIFIER, Token::IDENTIFIER, Token::YYEOF)); } TEST_F(FlexTokenizerTest, QueryParamCurrentDate) { @@ -99,45 +116,213 @@ TEST_F(FlexTokenizerTest, SysvarWithDotId) { 
Token::IDENTIFIER, '.', Token::IDENTIFIER, Token::YYEOF)); } +absl::StatusOr<TokenKind> GetNextToken(DisambiguatorLexer& tokenizer, + Location& location) { + TokenKind token_kind; + ZETASQL_RETURN_IF_ERROR(tokenizer.GetNextToken(&location, &token_kind)); + return token_kind; +} + TEST_F(FlexTokenizerTest, Lookahead1) { - ZetaSqlFlexTokenizer tokenizer(BisonParserMode::kStatement, "fake_file", - "a 1 SELECT", 0, options_); - ZetaSqlFlexTokenizer::Location location; - EXPECT_EQ(tokenizer.GetNextTokenFlex(&location), Token::MODE_STATEMENT); + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto lexer, DisambiguatorLexer::Create( + BisonParserMode::kStatement, "fake_file", "a 1 SELECT", 0, + options_, /*macro_catalog=*/nullptr, /*arena=*/nullptr)); + Location location; + DisambiguatorLexer& tokenizer = *lexer; + EXPECT_THAT(GetNextToken(tokenizer, location), + IsOkAndHolds(Token::MODE_STATEMENT)); EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::IDENTIFIER); EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::IDENTIFIER); EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::IDENTIFIER); - EXPECT_EQ(tokenizer.GetNextTokenFlex(&location), Token::IDENTIFIER); + EXPECT_THAT(GetNextToken(tokenizer, location), + IsOkAndHolds(Token::IDENTIFIER)); EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::INTEGER_LITERAL); - EXPECT_EQ(tokenizer.GetNextTokenFlex(&location), Token::INTEGER_LITERAL); + + EXPECT_THAT(GetNextToken(tokenizer, location), + IsOkAndHolds(Token::INTEGER_LITERAL)); EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::KW_SELECT); - EXPECT_EQ(tokenizer.GetNextTokenFlex(&location), Token::KW_SELECT); + + EXPECT_THAT(GetNextToken(tokenizer, location), + IsOkAndHolds(Token::KW_SELECT)); EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::YYEOF); - EXPECT_EQ(tokenizer.GetNextTokenFlex(&location), Token::YYEOF); + + EXPECT_THAT(GetNextToken(tokenizer, location), IsOkAndHolds(Token::YYEOF)); // Then even after 
YYEOF EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::YYEOF); - EXPECT_EQ(tokenizer.GetNextTokenFlex(&location), Token::YYEOF); + // TODO: b/324273431 - This should not produce an error. + EXPECT_THAT(GetNextToken(tokenizer, location), + StatusIs(absl::StatusCode::kInvalidArgument, + "Internal error: Encountered real EOF")); } TEST_F(FlexTokenizerTest, Lookahead1WithForceTerminate) { - ZetaSqlFlexTokenizer tokenizer(BisonParserMode::kStatement, "fake_file", - "a 1 SELECT", 0, options_); - ZetaSqlFlexTokenizer::Location location; - EXPECT_EQ(tokenizer.GetNextTokenFlex(&location), Token::MODE_STATEMENT); + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto lexer, DisambiguatorLexer::Create( + BisonParserMode::kStatement, "fake_file", "a 1 SELECT", 0, + options_, /*macro_catalog=*/nullptr, + /*arena=*/nullptr)); + + Location location; + DisambiguatorLexer& tokenizer = *lexer; + EXPECT_THAT(GetNextToken(tokenizer, location), + IsOkAndHolds(Token::MODE_STATEMENT)); EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::IDENTIFIER); - EXPECT_EQ(tokenizer.GetNextTokenFlex(&location), Token::IDENTIFIER); + + EXPECT_THAT(GetNextToken(tokenizer, location), + IsOkAndHolds(Token::IDENTIFIER)); EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::INTEGER_LITERAL); - tokenizer.SetForceTerminate(); - EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::YYEOF); - EXPECT_EQ(tokenizer.GetNextTokenFlex(&location), Token::YYEOF); + + tokenizer.SetForceTerminate(/*end_byte_offset=*/nullptr); + EXPECT_THAT(GetNextToken(tokenizer, location), IsOkAndHolds(Token::YYEOF)); + EXPECT_THAT(GetNextToken(tokenizer, location), IsOkAndHolds(Token::YYEOF)); // Then even after YYEOF EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::YYEOF); - EXPECT_EQ(tokenizer.GetNextTokenFlex(&location), Token::YYEOF); + EXPECT_THAT(GetNextToken(tokenizer, location), IsOkAndHolds(Token::YYEOF)); +} + +TEST_F(FlexTokenizerTest, Lookahead2) { + 
ZETASQL_ASSERT_OK_AND_ASSIGN( + auto lexer, DisambiguatorLexer::Create( + BisonParserMode::kStatement, "fake_file", "a 1 SELECT", 0, + options_, /*macro_catalog=*/nullptr, /*arena=*/nullptr)); + + Location location; + DisambiguatorLexer& tokenizer = *lexer; + + EXPECT_THAT(GetNextToken(tokenizer, location), + IsOkAndHolds(Token::MODE_STATEMENT)); + EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::IDENTIFIER); + EXPECT_EQ(TokenTestThief::Lookahead2(tokenizer, location), + Token::INTEGER_LITERAL); + + EXPECT_THAT(GetNextToken(tokenizer, location), + IsOkAndHolds(Token::IDENTIFIER)); + EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), + Token::INTEGER_LITERAL); + EXPECT_EQ(TokenTestThief::Lookahead2(tokenizer, location), Token::KW_SELECT); + + EXPECT_THAT(GetNextToken(tokenizer, location), + IsOkAndHolds(Token::INTEGER_LITERAL)); + EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::KW_SELECT); + EXPECT_EQ(TokenTestThief::Lookahead2(tokenizer, location), Token::YYEOF); + + EXPECT_THAT(GetNextToken(tokenizer, location), + IsOkAndHolds(Token::KW_SELECT)); + EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::YYEOF); + EXPECT_EQ(TokenTestThief::Lookahead2(tokenizer, location), Token::YYEOF); + + // TODO: b/324273431 - This should not produce an error. 
+ EXPECT_THAT(GetNextToken(tokenizer, location), + StatusIs(absl::StatusCode::kInvalidArgument, + "Internal error: Encountered real EOF")); + EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::YYEOF); + EXPECT_EQ(TokenTestThief::Lookahead2(tokenizer, location), Token::YYEOF); +} + +TEST_F(FlexTokenizerTest, Lookahead2BeforeLookahead1) { + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto lexer, DisambiguatorLexer::Create( + BisonParserMode::kStatement, "fake_file", "a 1 SELECT", 0, + options_, /*macro_catalog=*/nullptr, /*arena=*/nullptr)); + + Location location; + DisambiguatorLexer& tokenizer = *lexer; + + EXPECT_THAT(GetNextToken(tokenizer, location), + IsOkAndHolds(Token::MODE_STATEMENT)); + // Calling Lookahead2 before Lookahead1 returns the correct token. + EXPECT_EQ(TokenTestThief::Lookahead2(tokenizer, location), + Token::INTEGER_LITERAL); + EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::IDENTIFIER); + + // Repeatedly calling Lookahead2 returns the same token. + EXPECT_EQ(TokenTestThief::Lookahead2(tokenizer, location), + Token::INTEGER_LITERAL); + EXPECT_EQ(TokenTestThief::Lookahead2(tokenizer, location), + Token::INTEGER_LITERAL); +} + +TEST_F(FlexTokenizerTest, Lookahead2NoEnoughTokens) { + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto lexer, DisambiguatorLexer::Create( + BisonParserMode::kStatement, "fake_file", "", 0, options_, + /*macro_catalog=*/nullptr, /*arena=*/nullptr)); + + Location location; + DisambiguatorLexer& tokenizer = *lexer; + + EXPECT_THAT(GetNextToken(tokenizer, location), + IsOkAndHolds(Token::MODE_STATEMENT)); + EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::YYEOF); + EXPECT_EQ(TokenTestThief::Lookahead2(tokenizer, location), Token::YYEOF); + + // TODO: b/324273431 - This should not produce an error. 
+ EXPECT_THAT(GetNextToken(tokenizer, location), + StatusIs(absl::StatusCode::kInvalidArgument, + "Internal error: Encountered real EOF")); + EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::YYEOF); + EXPECT_EQ(TokenTestThief::Lookahead2(tokenizer, location), Token::YYEOF); +} + +TEST_F(FlexTokenizerTest, Lookahead2ForceTerminate) { + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto lexer, DisambiguatorLexer::Create( + BisonParserMode::kStatement, "fake_file", "a 1 SELECT", 0, + options_, /*macro_catalog=*/nullptr, /*arena=*/nullptr)); + + Location location; + DisambiguatorLexer& tokenizer = *lexer; + + EXPECT_THAT(GetNextToken(tokenizer, location), + IsOkAndHolds(Token::MODE_STATEMENT)); + EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::IDENTIFIER); + EXPECT_EQ(TokenTestThief::Lookahead2(tokenizer, location), + Token::INTEGER_LITERAL); + + tokenizer.SetForceTerminate(/*end_byte_offset=*/nullptr); + + // After the force termination both lookaheads return YYEOF. + EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::YYEOF); + EXPECT_EQ(TokenTestThief::Lookahead2(tokenizer, location), Token::YYEOF); + + // Fetching more tokens returns YYEOF. 
+ EXPECT_THAT(GetNextToken(tokenizer, location), IsOkAndHolds(Token::YYEOF)); + EXPECT_EQ(TokenTestThief::Lookahead1(tokenizer, location), Token::YYEOF); + EXPECT_EQ(TokenTestThief::Lookahead2(tokenizer, location), Token::YYEOF); +} + +TEST_F(FlexTokenizerTest, DisambiguatorReturnsYyeofWhenErrors) { + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto lexer, + DisambiguatorLexer::Create(BisonParserMode::kStatement, "fake_file", + "SELECT * EXCEPT 1", 0, options_, + /*macro_catalog=*/nullptr, /*arena=*/nullptr)); + + Location location; + DisambiguatorLexer& tokenizer = *lexer; + + EXPECT_THAT(GetNextToken(tokenizer, location), + IsOkAndHolds(Token::MODE_STATEMENT)); + EXPECT_THAT(GetNextToken(tokenizer, location), + IsOkAndHolds(Token::KW_SELECT)); + EXPECT_THAT(GetNextToken(tokenizer, location), IsOkAndHolds('*')); + + int token_kind; + absl::Status status = tokenizer.GetNextToken(&location, &token_kind); + EXPECT_THAT( + status, + StatusIs( + absl::StatusCode::kInvalidArgument, + R"err_msg(EXCEPT must be followed by ALL, DISTINCT, or "(")err_msg")); + // The returned token should be YYEOF rather than KW_EXCEPT because an error + // is produced. 
+ EXPECT_EQ(token_kind, Token::YYEOF); } } // namespace zetasql::parser diff --git a/zetasql/parser/gen_parse_tree.py b/zetasql/parser/gen_parse_tree.py index a5b4553fc..9855703bf 100644 --- a/zetasql/parser/gen_parse_tree.py +++ b/zetasql/parser/gen_parse_tree.py @@ -39,7 +39,7 @@ from zetasql.parser.generator_utils import Trim from zetasql.parser.generator_utils import UpperCamelCase -NEXT_NODE_TAG_ID = 440 +NEXT_NODE_TAG_ID = 469 ROOT_NODE_NAME = 'ASTNode' @@ -301,37 +301,41 @@ class Visibility(enum.Enum): PROTECTED = 1 -def Field(name, - ctype, - tag_id, - field_loader=FieldLoaderMethod.OPTIONAL, - comment=None, - private_comment=None, - gen_setters_and_getters=True, - getter_is_override=False, - visibility=Visibility.PRIVATE): +def Field( + name, + ctype, + tag_id, + field_loader=FieldLoaderMethod.OPTIONAL, + comment=None, + private_comment=None, + gen_setters_and_getters=True, + getter_is_override=False, + visibility=Visibility.PRIVATE, + serialize_default_value=True, +): """Make a field to put in a node class. Args: name: field name - ctype: c++ type for this field - Should be a ScalarType like an int, string or enum type, - or the name of a node class type (e.g. ASTExpression). - Cannot be a pointer type, and should not include modifiers like - const. + ctype: c++ type for this field. Should be a ScalarType like an int, string or + enum type, or the name of a node class type (e.g. ASTExpression). Cannot + be a pointer type, and should not include modifiers like const. tag_id: Unique sequential id for this field within the node, beginning with - 2, which should not change. - field_loader: FieldLoaderMethod enum specifies which FieldLoader method - to use for this field. Ignored when Node has gen_init_fields=False. - Not applicable to scalar types. + 2, which should not change. + field_loader: FieldLoaderMethod enum specifies which FieldLoader method to + use for this field. Ignored when Node has gen_init_fields=False. Not + applicable to scalar types. 
comment: Comment for this field's public getter/setter method. Text will be - stripped and de-indented. + stripped and de-indented. private_comment: Comment for the field in the protected/private section. gen_setters_and_getters: When False, suppress generation of default - template-based get and set methods. Non-standard alternatives - may be supplied in extra_public_defs. + template-based get and set methods. Non-standard alternatives may be + supplied in extra_public_defs. getter_is_override: Indicates getter overrides virtual method in superclass. visibility: Indicates whether field is private or protected. + serialize_default_value: If false, the serializer will skip this field when + it has a default value. + Returns: The newly created field. @@ -428,8 +432,10 @@ def Field(name, 'proto_optional_or_repeated': proto_optional_or_repeated, 'is_enum': is_enum, 'enum_value': enum_value, + 'serialize_default_value': serialize_default_value, } + # Ancestor holds the subset of a NodeDict needed to serialize its # immediate ancestor. Ancestor = collections.namedtuple('Ancestor', [ @@ -906,8 +912,10 @@ def main(argv): 'columns', 'ASTSelectColumn', tag_id=2, - field_loader=FieldLoaderMethod.REST_AS_REPEATED), - ]) + field_loader=FieldLoaderMethod.REST_AS_REPEATED, + ), + ], + ) gen.AddNode( name='ASTSelectColumn', @@ -918,12 +926,11 @@ def main(argv): 'expression', 'ASTExpression', tag_id=2, - field_loader=FieldLoaderMethod.REQUIRED), - Field( - 'alias', - 'ASTAlias', - tag_id=3) - ]) + field_loader=FieldLoaderMethod.REQUIRED, + ), + Field('alias', 'ASTAlias', tag_id=3), + ], + ) gen.AddNode( name='ASTExpression', @@ -950,31 +957,68 @@ def main(argv): tag_id=9, parent='ASTExpression', is_abstract=True, + extra_public_defs=""" + bool IsLeaf() const override { return true; } + """, + fields=[], + comment=""" + The name ASTLeaf is kept for backward compatibility alone. However, not + all subclasses are necessarily leaf nodes. 
ASTStringLiteral and + ASTBytesLiteral both have children which are the one or more components + of literal concatenations. Similarly, ASTDateOrTimeLiteral and + ASTRangeLiteral each contain a child ASTStringLiteral, which itself is not + a leaf. + + The grouping does not make much sense at this point, given that it + encompasses not only literals, but also ASTStar. + + Its main function was intended to be the nodes that get printed through + image(), but this is no longer applicable. This functionality is now + handled by a stricter abstract class ASTPrintableLeaf. + + This class should be removed, and subclasses should directly inherit from + ASTExpression (just as ASTDateOrTimeLiteral does right now). Once all + callers have been updated as such, we should remove this class from the + hierarchy and directly inherit from ASTExpression. + """, + ) + + gen.AddNode( + name='ASTPrintableLeaf', + tag_id=452, + parent='ASTLeaf', + is_abstract=True, use_custom_debug_string=True, extra_public_defs=""" // image() references data with the same lifetime as this ASTLeaf object. - absl::string_view image() const { return image_; } void set_image(std::string image) { image_ = std::move(image); } - - bool IsLeaf() const override { return true; } + absl::string_view image() const { return image_; } """, fields=[ - Field( - 'image', - SCALAR_STRING, - tag_id=2, - gen_setters_and_getters=False) - ]) + Field('image', SCALAR_STRING, tag_id=2, gen_setters_and_getters=False) + ], + comment=""" + Intermediate subclass of ASTLeaf which is the parent of nodes that are + still using image(). Ideally image() should be hidden, and only used to + print back to the user, but it is currently being abused in some places + to represent the value as well, such as with ASTIntLiteral and + ASTFloatLiteral. + + Generally, image() should be removed, and location offsets of the node, + leaf or not, should be enough to print back the image, for example within + error messages. 
+ """, + ) gen.AddNode( name='ASTIntLiteral', tag_id=10, - parent='ASTLeaf', + parent='ASTPrintableLeaf', extra_public_defs=""" bool is_hex() const; """, - ) + ) gen.AddNode( name='ASTIdentifier', @@ -1217,14 +1261,11 @@ def main(argv): gen.AddNode( name='ASTBooleanLiteral', tag_id=19, - parent='ASTLeaf', + parent='ASTPrintableLeaf', fields=[ - Field( - 'value', - SCALAR_BOOL, - tag_id=2), + Field('value', SCALAR_BOOL, tag_id=2), ], - ) + ) gen.AddNode( name='ASTAndExpr', @@ -1287,12 +1328,44 @@ def main(argv): name='ASTStringLiteral', tag_id=22, parent='ASTLeaf', + fields=[ + Field( + 'components', + 'ASTStringLiteralComponent', + tag_id=2, + field_loader=FieldLoaderMethod.REPEATING_WHILE_IS_EXPRESSION, + ), + Field( + 'string_value', + SCALAR_STRING, + tag_id=3, + gen_setters_and_getters=False, + ), + ], + extra_public_defs=""" + // The parsed and validated value of this literal. + const std::string& string_value() const { return string_value_; } + void set_string_value(absl::string_view string_value) { + string_value_ = std::string(string_value); + } + """, + comment=""" + Represents a string literal which could be just a singleton or a whole + concatenation. + """, + ) + + gen.AddNode( + name='ASTStringLiteralComponent', + tag_id=453, + parent='ASTPrintableLeaf', fields=[ Field( 'string_value', SCALAR_STRING, tag_id=2, - gen_setters_and_getters=False), + gen_setters_and_getters=False, + ), ], extra_public_defs=""" // The parsed and validated value of this literal. 
The raw input value can be @@ -1301,14 +1374,14 @@ def main(argv): void set_string_value(std::string string_value) { string_value_ = std::move(string_value); } - """ - ) + """, + ) gen.AddNode( name='ASTStar', tag_id=23, - parent='ASTLeaf', - ) + parent='ASTPrintableLeaf', + ) gen.AddNode( name='ASTOrExpr', @@ -1326,6 +1399,42 @@ def main(argv): """ ) + gen.AddNode( + name='ASTOrderingExpression', + tag_id=27, + parent='ASTNode', + use_custom_debug_string=True, + fields=[ + Field( + 'expression', + 'ASTExpression', + tag_id=2, + field_loader=FieldLoaderMethod.REQUIRED, + ), + Field('collate', 'ASTCollate', tag_id=3), + Field('null_order', 'ASTNullOrder', tag_id=4), + Field('ordering_spec', SCALAR_ORDERING_SPEC, tag_id=5), + ], + extra_public_defs=""" + bool descending() const { return ordering_spec_ == DESC; } + """, + ) + + gen.AddNode( + name='ASTOrderBy', + tag_id=28, + parent='ASTNode', + fields=[ + Field('hint', 'ASTHint', tag_id=2), + Field( + 'ordering_expressions', + 'ASTOrderingExpression', + tag_id=3, + field_loader=FieldLoaderMethod.REST_AS_REPEATED, + ), + ], + ) + gen.AddNode( name='ASTGroupingItem', tag_id=25, @@ -1371,10 +1480,7 @@ def main(argv): tag_id=26, parent='ASTNode', fields=[ - Field( - 'hint', - 'ASTHint', - tag_id=2), + Field('hint', 'ASTHint', tag_id=2), Field( 'all', 'ASTGroupByAll', @@ -1389,8 +1495,10 @@ def main(argv): 'grouping_items', 'ASTGroupingItem', tag_id=4, - field_loader=FieldLoaderMethod.REST_AS_REPEATED), - ]) + field_loader=FieldLoaderMethod.REST_AS_REPEATED, + ), + ], + ) gen.AddNode( name='ASTGroupByAll', @@ -1402,52 +1510,6 @@ def main(argv): """, ) - gen.AddNode( - name='ASTOrderingExpression', - tag_id=27, - parent='ASTNode', - use_custom_debug_string=True, - fields=[ - Field( - 'expression', - 'ASTExpression', - tag_id=2, - field_loader=FieldLoaderMethod.REQUIRED), - Field( - 'collate', - 'ASTCollate', - tag_id=3), - Field( - 'null_order', - 'ASTNullOrder', - tag_id=4), - Field( - 'ordering_spec', - 
SCALAR_ORDERING_SPEC, - tag_id=5) - - ], - extra_public_defs=""" - bool descending() const { return ordering_spec_ == DESC; } - """, - ) - - gen.AddNode( - name='ASTOrderBy', - tag_id=28, - parent='ASTNode', - fields=[ - Field( - 'hint', - 'ASTHint', - tag_id=2), - Field( - 'ordering_expressions', - 'ASTOrderingExpression', - tag_id=3, - field_loader=FieldLoaderMethod.REST_AS_REPEATED), - ]) - gen.AddNode( name='ASTLimitOffset', tag_id=29, @@ -1474,14 +1536,14 @@ def main(argv): gen.AddNode( name='ASTFloatLiteral', tag_id=30, - parent='ASTLeaf', - ) + parent='ASTPrintableLeaf', + ) gen.AddNode( name='ASTNullLiteral', tag_id=31, - parent='ASTLeaf', - ) + parent='ASTPrintableLeaf', + ) gen.AddNode( name='ASTOnClause', @@ -1512,6 +1574,11 @@ def main(argv): tag_id=3, field_loader=FieldLoaderMethod.REQUIRED, ), + Field( + 'modifiers', + 'ASTAliasedQueryModifiers', + tag_id=4, + ), ], ) @@ -2316,17 +2383,66 @@ def main(argv): gen.AddNode( name='ASTNumericLiteral', tag_id=53, - parent='ASTLeaf') + parent='ASTLeaf', + fields=[ + Field( + 'string_literal', + 'ASTStringLiteral', + tag_id=2, + field_loader=FieldLoaderMethod.REQUIRED, + ), + ], + ) gen.AddNode( name='ASTBigNumericLiteral', tag_id=54, - parent='ASTLeaf') + parent='ASTLeaf', + fields=[ + Field( + 'string_literal', + 'ASTStringLiteral', + tag_id=2, + field_loader=FieldLoaderMethod.REQUIRED, + ), + ], + ) gen.AddNode( name='ASTBytesLiteral', tag_id=55, parent='ASTLeaf', + fields=[ + Field( + 'components', + 'ASTBytesLiteralComponent', + tag_id=2, + field_loader=FieldLoaderMethod.REPEATING_WHILE_IS_EXPRESSION, + ), + Field( + 'bytes_value', + SCALAR_STRING, + tag_id=3, + gen_setters_and_getters=False, + ), + ], + extra_public_defs=""" + // The parsed and validated value of this literal. 
+ const std::string& bytes_value() const { return bytes_value_; } + void set_bytes_value(std::string bytes_value) { + bytes_value_ = std::move(bytes_value); + } + """, + comment=""" + Represents a bytes literal which could be just a singleton or a whole + concatenation. + """, + ) + + gen.AddNode( + name='ASTBytesLiteralComponent', + tag_id=454, + parent='ASTPrintableLeaf', extra_public_defs=""" // The parsed and validated value of this literal. The raw input value can be // found in image(). @@ -2337,7 +2453,8 @@ def main(argv): """, extra_private_defs=""" std::string bytes_value_; - """) + """, + ) gen.AddNode( name='ASTDateOrTimeLiteral', @@ -2359,16 +2476,26 @@ def main(argv): gen.AddNode( name='ASTMaxLiteral', tag_id=57, - parent='ASTLeaf', + parent='ASTPrintableLeaf', comment=""" This represents the value MAX that shows up in type parameter lists. It will not show up as a general expression anywhere else. - """) + """, + ) gen.AddNode( name='ASTJSONLiteral', tag_id=58, - parent='ASTLeaf') + parent='ASTLeaf', + fields=[ + Field( + 'string_literal', + 'ASTStringLiteral', + tag_id=2, + field_loader=FieldLoaderMethod.REQUIRED, + ), + ], + ) gen.AddNode( name='ASTCaseValueExpression', @@ -4384,13 +4511,22 @@ def main(argv): gen.AddNode( name='ASTBracedConstructorFieldValue', tag_id=330, - parent='ASTExpression', + parent='ASTNode', fields=[ Field( 'expression', 'ASTExpression', tag_id=2, field_loader=FieldLoaderMethod.REQUIRED), + Field( + 'colon_prefixed', + SCALAR_BOOL, + tag_id=3, + comment=""" + True if "field:value" syntax is used. + False if "field value" syntax is used. + The latter is only allowed in proto instead of struct. + """), ]) gen.AddNode( @@ -4411,6 +4547,16 @@ def main(argv): 'ASTBracedConstructorFieldValue', tag_id=4, field_loader=FieldLoaderMethod.REQUIRED), + Field( + 'comma_separated', + SCALAR_BOOL, + tag_id=5, + comment=""" + True if this field is separated by comma from the previous one, + e.g. "a:1,b:2". 
+ False if separated by whitespace, e.g. "a:1 b:2". + The latter is only allowed in proto instead of struct. + """), ]) gen.AddNode( @@ -4442,6 +4588,23 @@ def main(argv): field_loader=FieldLoaderMethod.REQUIRED), ]) + gen.AddNode( + name='ASTStructBracedConstructor', + tag_id=462, + parent='ASTExpression', + fields=[ + Field( + 'type_name', + 'ASTType', + tag_id=2, + field_loader=FieldLoaderMethod.OPTIONAL_TYPE), + Field( + 'braced_constructor', + 'ASTBracedConstructor', + tag_id=3, + field_loader=FieldLoaderMethod.REQUIRED), + ]) + gen.AddNode( name='ASTOptionsList', tag_id=148, @@ -5096,8 +5259,9 @@ def main(argv): gen.AddNode( name='ASTIndexAllColumns', tag_id=171, - parent='ASTLeaf', - comment="Represents 'ALL COLUMNS' index key expression.") + parent='ASTPrintableLeaf', + comment="Represents 'ALL COLUMNS' index key expression.", + ) gen.AddNode( name='ASTIndexItemList', @@ -7454,21 +7618,24 @@ def main(argv): 'string_label', 'ASTStringLiteral', tag_id=2, - gen_setters_and_getters=False), + gen_setters_and_getters=False, + ), Field( 'int_label', 'ASTIntLiteral', tag_id=3, - gen_setters_and_getters=False), + gen_setters_and_getters=False, + ), ], extra_public_defs=""" - const ASTLeaf* label() const { + const ASTExpression* label() const { if (string_label_ != nullptr) { return string_label_; } return int_label_; } - """) + """, + ) gen.AddNode( name='ASTDescriptor', @@ -8620,6 +8787,11 @@ def main(argv): tag_id=307, parent='ASTAlterStatementBase') + gen.AddNode( + name='ASTAlterExternalSchemaStatement', + tag_id=440, + parent='ASTAlterStatementBase') + gen.AddNode( name='ASTAlterTableStatement', tag_id=308, @@ -9208,10 +9380,11 @@ def main(argv): gen.AddNode( name='ASTMacroBody', tag_id=368, - parent='ASTLeaf', + parent='ASTPrintableLeaf', comment=""" Represents the body of a DEFINE MACRO statement. 
- """) + """, + ) gen.AddNode( name='ASTDefineMacroStatement', @@ -9250,6 +9423,7 @@ def main(argv): ), Field('is_if_not_exists', SCALAR_BOOL, tag_id=4), Field('for_system_time', 'ASTForSystemTime', tag_id=5), + Field('options_list', 'ASTOptionsList', tag_id=6), ], extra_public_defs=""" const ASTPathExpression* GetDdlTarget() const override { return name_; } @@ -9343,6 +9517,68 @@ def main(argv): ], ) + gen.AddNode( + name='ASTAliasedQueryModifiers', + tag_id=463, + parent='ASTNode', + fields=[ + Field( + 'recursion_depth_modifier', + 'ASTRecursionDepthModifier', + tag_id=2, + ), + ], + ) + + gen.AddNode( + name='ASTIntOrUnbounded', + tag_id=464, + parent='ASTExpression', + comment=''' + This represents an integer or an unbounded integer. + The semantic of unbounded integer depends on the context. + ''', + fields=[ + Field( + 'bound', + 'ASTExpression', + tag_id=2, + field_loader=FieldLoaderMethod.OPTIONAL_EXPRESSION, + ), + ], + ) + + gen.AddNode( + name='ASTRecursionDepthModifier', + tag_id=465, + parent='ASTNode', + fields=[ + Field( + 'alias', + 'ASTAlias', + tag_id=2, + ), + Field( + 'lower_bound', + 'ASTIntOrUnbounded', + tag_id=3, + field_loader=FieldLoaderMethod.REQUIRED, + comment=""" + lower bound is 0 when the node's `bound` field is unset. + """, + ), + Field( + 'upper_bound', + 'ASTIntOrUnbounded', + tag_id=4, + field_loader=FieldLoaderMethod.REQUIRED, + comment=""" + upper_bound is infinity when the node's `bound` field is unset. 
+ """, + ), + ], + ) + gen.Generate(output_path, template_path=template_path) diff --git a/zetasql/parser/join_processor.cc b/zetasql/parser/join_processor.cc index 68f122071..f7763db2e 100644 --- a/zetasql/parser/join_processor.cc +++ b/zetasql/parser/join_processor.cc @@ -16,15 +16,13 @@ #include "zetasql/parser/join_processor.h" -#include #include #include #include -#include "zetasql/common/errors.h" #include "zetasql/parser/ast_node_kind.h" -#include "zetasql/parser/bison_parser.bison.h" -#include "absl/memory/memory.h" +#include "zetasql/parser/parse_tree.h" +#include "zetasql/public/parse_location.h" #include "absl/strings/string_view.h" namespace zetasql { @@ -154,17 +152,16 @@ static std::stack FlattenJoinExpression(ASTNode* node) { return q; } -ASTNode* MakeInternalError( - ErrorInfo* error_info, - const zetasql_bison_parser::location& error_location, - absl::string_view error_message) { +ASTNode* MakeInternalError(ErrorInfo* error_info, + const ParseLocationRange& error_location, + absl::string_view error_message) { error_info->location = error_location; error_info->message = absl::StrCat("Internal error: ", error_message); return nullptr; } ASTNode* MakeSyntaxError(ErrorInfo* error_info, - const zetasql_bison_parser::location& error_location, + const ParseLocationRange& error_location, absl::string_view error_message) { error_info->location = error_location; error_info->message = absl::StrCat("Syntax error: ", error_message); @@ -258,8 +255,7 @@ static ASTNode* GenerateError(const JoinErrorTracker& error_tracker, const BisonParser* parser, std::stack* stack, ErrorInfo* error_info) { - zetasql_bison_parser::location location = - parser->GetBisonLocation(stack->top()->GetParseLocationRange()); + ParseLocationRange location = stack->top()->GetParseLocationRange(); // We report the error at the first qualified join of the join block, which is // the bottomest qualified, not created, join in the stack. 
Search for this @@ -284,8 +280,7 @@ static ASTNode* GenerateError(const JoinErrorTracker& error_tracker, ? "INNER" : join->GetSQLForJoinType(); return MakeSyntaxError( - error_info, - parser->GetBisonLocation(join->join_location()->GetParseLocationRange()), + error_info, join->join_location()->GetParseLocationRange(), absl::StrCat("The number of join conditions is ", error_tracker.join_condition_count(), " but the number of joins that require a join condition is ", @@ -296,8 +291,8 @@ static ASTNode* GenerateError(const JoinErrorTracker& error_tracker, ASTNode* ProcessFlattenedJoinExpression( BisonParser* parser, std::stack* flattened_join_expression, ErrorInfo* error_info) { - zetasql_bison_parser::location location = parser->GetBisonLocation( - flattened_join_expression->top()->GetParseLocationRange()); + ParseLocationRange location = + flattened_join_expression->top()->GetParseLocationRange(); std::stack stack; JoinErrorTracker error_tracker; @@ -310,10 +305,8 @@ ASTNode* ProcessFlattenedJoinExpression( // If it's CROSS, COMMA, or NATURAL JOIN, create a new CROSS, COMMA, // NATURAL JOIN. 
if (stack.empty()) { - return MakeInternalError( - error_info, - parser->GetBisonLocation(item->GetParseLocationRange()), - "Stack should not be empty"); + return MakeInternalError(error_info, item->GetParseLocationRange(), + "Stack should not be empty"); } ASTNode* lhs = stack.top(); @@ -322,12 +315,10 @@ ASTNode* ProcessFlattenedJoinExpression( flattened_join_expression->pop(); ASTLocation* join_location = - parser->MakeLocation(parser->GetBisonLocation( - join->join_location()->GetParseLocationRange())); + parser->MakeLocation(join->join_location()->GetParseLocationRange()); ASTJoin* new_join = parser->CreateASTNode( - zetasql_bison_parser::location(), - {lhs, GetJoinHint(join), join_location, rhs}); + ParseLocationRange(), {lhs, GetJoinHint(join), join_location, rhs}); new_join->set_join_type(join->join_type()); new_join->set_join_hint(join->join_hint()); new_join->set_natural(join->natural()); @@ -341,8 +332,7 @@ ASTNode* ProcessFlattenedJoinExpression( // Create JOIN with on/using clause. 
if (stack.size() < 3) { return MakeInternalError( - error_info, - parser->GetBisonLocation(item->GetParseLocationRange()), + error_info, item->GetParseLocationRange(), "Stack should contain at least 3 items at this point"); } error_tracker.increment_join_condition_count(); @@ -355,8 +345,7 @@ ASTNode* ProcessFlattenedJoinExpression( stack.pop(); ASTLocation* join_location = - parser->MakeLocation(parser->GetBisonLocation( - join->join_location()->GetParseLocationRange())); + parser->MakeLocation(join->join_location()->GetParseLocationRange()); ASTJoin* new_join = parser->CreateASTNode( location, {lhs, GetJoinHint(join), join_location, rhs, item}); @@ -414,9 +403,9 @@ static const ASTJoin::ParseError* GetParseError(const ASTNode* node) { return nullptr; } -ASTNode* JoinRuleAction(const zetasql_bison_parser::location& start_location, - const zetasql_bison_parser::location& end_location, - ASTNode* lhs, bool natural, ASTJoin::JoinType join_type, +ASTNode* JoinRuleAction(const ParseLocationRange& start_location, + const ParseLocationRange& end_location, ASTNode* lhs, + bool natural, ASTJoin::JoinType join_type, ASTJoin::JoinHint join_hint, ASTNode* hint, ASTNode* table_primary, ASTNode* on_or_using_clause_list, @@ -442,8 +431,7 @@ ASTNode* JoinRuleAction(const zetasql_bison_parser::location& start_location, if (ContainsCommaJoin(lhs)) { auto* error_node = clause_list->child(1); return MakeSyntaxError( - error_info, - parser->GetBisonLocation(error_node->GetParseLocationRange()), + error_info, error_node->GetParseLocationRange(), absl::StrCat( "Unexpected keyword ", (error_node->node_kind() == AST_ON_CLAUSE ? "ON" : "USING"))); @@ -493,10 +481,8 @@ ASTNode* JoinRuleAction(const zetasql_bison_parser::location& start_location, if (clause_count >= 2) { // Consecutive ON/USING clauses are used. Returns the error in this case. 
- return MakeSyntaxError( - error_info, - parser->GetBisonLocation(error_node->GetParseLocationRange()), - message); + return MakeSyntaxError(error_info, error_node->GetParseLocationRange(), + message); } else { // Does not throw the error to maintain the backward compatibility. Saves // the error instead. @@ -508,11 +494,11 @@ ASTNode* JoinRuleAction(const zetasql_bison_parser::location& start_location, return join; } -ASTNode* CommaJoinRuleAction( - const zetasql_bison_parser::location& start_location, - const zetasql_bison_parser::location& end_location, ASTNode* lhs, - ASTNode* table_primary, ASTLocation* comma_location, BisonParser* parser, - ErrorInfo* error_info) { +ASTNode* CommaJoinRuleAction(const ParseLocationRange& start_location, + const ParseLocationRange& end_location, + ASTNode* lhs, ASTNode* table_primary, + ASTLocation* comma_location, BisonParser* parser, + ErrorInfo* error_info) { if (IsTransformationNeeded(lhs)) { return MakeSyntaxError( error_info, start_location, diff --git a/zetasql/parser/join_processor.h b/zetasql/parser/join_processor.h index 3fd448830..1bbe3e6fe 100644 --- a/zetasql/parser/join_processor.h +++ b/zetasql/parser/join_processor.h @@ -19,7 +19,10 @@ #include +#include "zetasql/parser/ast_node.h" #include "zetasql/parser/bison_parser.h" +#include "zetasql/parser/parse_tree.h" +#include "zetasql/public/parse_location.h" namespace zetasql { namespace parser { @@ -384,17 +387,16 @@ namespace parser { // error. struct ErrorInfo { - zetasql_bison_parser::location location; + ParseLocationRange location; std::string message; }; // The action to run when the join/from_clause_contents rule is matched. // On success, returns the ASTNode that should be assigned to $$. Returns // nullptr on failure, and 'error_info' will contain the error information. 
-ASTNode* JoinRuleAction(const zetasql_bison_parser::location& start_location, - const zetasql_bison_parser::location& end_location, - ASTNode* lhs, bool opt_natural, - ASTJoin::JoinType join_type, +ASTNode* JoinRuleAction(const ParseLocationRange& start_location, + const ParseLocationRange& end_location, ASTNode* lhs, + bool opt_natural, ASTJoin::JoinType join_type, ASTJoin::JoinHint join_hint, ASTNode* opt_hint, ASTNode* table_primary, ASTNode* opt_on_or_using_clause_list, @@ -406,17 +408,17 @@ ASTNode* JoinRuleAction(const zetasql_bison_parser::location& start_location, // is matched. // Returns the ASTNode that should be assigned to $$ on success. Returns // nullptr on failure, and 'error_info' will contain the error information. -ASTNode* CommaJoinRuleAction( - const zetasql_bison_parser::location& start_location, - const zetasql_bison_parser::location& end_location, ASTNode* lhs, - ASTNode* table_primary, ASTLocation* comma_location, BisonParser* parser, - ErrorInfo* error_info); +ASTNode* CommaJoinRuleAction(const ParseLocationRange& start_location, + const ParseLocationRange& end_location, + ASTNode* lhs, ASTNode* table_primary, + ASTLocation* comma_location, BisonParser* parser, + ErrorInfo* error_info); // Performs the transformation algorithm on the expression 'node'. // On success, returns the created ASTNode. Returns nullptr on failure, // and 'error_info' will contain the error information. 
-ASTNode* TransformJoinExpression(ASTNode* node, - BisonParser* parser, ErrorInfo* error_info); +ASTNode* TransformJoinExpression(ASTNode* node, BisonParser* parser, + ErrorInfo* error_info); } // namespace parser } // namespace zetasql diff --git a/zetasql/parser/keywords.cc b/zetasql/parser/keywords.cc index 813978265..56457a3f2 100644 --- a/zetasql/parser/keywords.cc +++ b/zetasql/parser/keywords.cc @@ -133,6 +133,7 @@ constexpr KeywordInfoPOD kAllKeywords[] = { {"definer", KW_DEFINER}, {"delete", KW_DELETE}, {"deletion", KW_DELETION}, + {"depth", KW_DEPTH}, {"desc", KW_DESC, kReserved}, {"describe", KW_DESCRIBE}, {"descriptor", KW_DESCRIPTOR}, diff --git a/zetasql/parser/keywords_test.cc b/zetasql/parser/keywords_test.cc index c7a95355a..800079d01 100644 --- a/zetasql/parser/keywords_test.cc +++ b/zetasql/parser/keywords_test.cc @@ -29,6 +29,7 @@ #include "absl/flags/flag.h" #include "absl/strings/str_cat.h" #include "absl/strings/str_join.h" +#include "absl/types/span.h" #include "re2/re2.h" namespace zetasql { @@ -104,7 +105,7 @@ std::vector GetSectionFromFile( // returns them as a set. Recognizes both the quoted form (e.g. "foo") and the // direct form (e.g. KW_FOO). std::set ExtractKeywordsFromLines( - const std::vector& input) { + absl::Span input) { std::set result; RE2 extract_quoted_keyword(".*\"([A-Za-z_]+)\".*"); RE2 extract_unquoted_keyword(".*KW_([A-Za-z_]+).*"); @@ -128,7 +129,7 @@ std::set ExtractKeywordsFromLines( // followed by a space. Rules with trailing context (e.g. "foo/bar") are // ignored. 
std::set ExtractTokenizerKeywordsFromLines( - const std::vector& input) { + absl::Span input) { std::set result; RE2 extract_tokenizer_keyword("^([A-Za-z_]+) "); for (const std::string& line : input) { diff --git a/zetasql/parser/macros/BUILD b/zetasql/parser/macros/BUILD new file mode 100644 index 000000000..a66054223 --- /dev/null +++ b/zetasql/parser/macros/BUILD @@ -0,0 +1,207 @@ +# +# Copyright 2019 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +cc_library( + name = "quoting", + srcs = ["quoting.cc"], + hdrs = ["quoting.h"], + deps = [ + "//zetasql/base:ret_check", + "//zetasql/base:status", + "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/strings", + "@com_google_absl//absl/strings:str_format", + ], +) + +cc_test( + name = "quoting_test", + srcs = ["quoting_test.cc"], + deps = [ + ":quoting", + "//zetasql/base/testing:status_matchers", + "//zetasql/base/testing:zetasql_gtest_main", + "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/strings", + "@com_google_absl//absl/strings:str_format", + ], +) + +cc_library( + name = "token_with_location", + hdrs = ["token_with_location.h"], + visibility = [ + "//zetasql/parser:__pkg__", + "//zetasql/tools/execute_query:__pkg__", + ], + deps = [ + "//zetasql/public:parse_location", + "@com_google_absl//absl/strings", + ], +) + +cc_library( + name = "macro_catalog", + hdrs = ["macro_catalog.h"], + visibility = [ + "//zetasql/parser:__pkg__", + 
"//zetasql/tools/execute_query:__pkg__", + ], + deps = [ + "@com_google_absl//absl/container:flat_hash_map", + ], +) + +cc_library( + name = "flex_token_provider", + srcs = ["flex_token_provider.cc"], + hdrs = ["flex_token_provider.h"], + visibility = [ + "//zetasql/parser:__pkg__", + "//zetasql/tools/execute_query:__pkg__", + ], + deps = [ + ":token_with_location", + "//zetasql/base:check", + "//zetasql/base:status", + "//zetasql/parser:bison_parser_mode", + "//zetasql/parser:bison_token_codes", + "//zetasql/parser:flex_tokenizer", + "//zetasql/public:language_options", + "//zetasql/public:parse_location", + "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/strings", + ], +) + +cc_test( + name = "flex_token_provider_test", + srcs = ["flex_token_provider_test.cc"], + deps = [ + ":flex_token_provider", + ":token_with_location", + "//zetasql/base/testing:status_matchers", + "//zetasql/base/testing:zetasql_gtest_main", + "//zetasql/parser:bison_parser_mode", + "//zetasql/parser:bison_token_codes", + "//zetasql/parser:token_disambiguator", # build_cleaner: keep + "//zetasql/public:language_options", + "//zetasql/public:parse_location", + "@com_google_absl//absl/strings", + "@com_google_absl//absl/strings:str_format", + ], +) + +cc_library( + name = "token_splicing_utils", + srcs = ["token_splicing_utils.cc"], + hdrs = ["token_splicing_utils.h"], + deps = [ + ":token_with_location", + "//zetasql/parser:bison_token_codes", + "//zetasql/parser:keywords", + "@com_google_absl//absl/strings", + ], +) + +cc_library( + name = "macro_expander", + srcs = [ + "macro_expander.cc", + ], + hdrs = [ + "macro_expander.h", + ], + visibility = [ + "//zetasql/parser:__pkg__", + "//zetasql/tools/execute_query:__pkg__", + ], + deps = [ + ":flex_token_provider", + ":macro_catalog", + ":quoting", + ":token_splicing_utils", + ":token_with_location", + "//zetasql/base:arena", + "//zetasql/base:arena_allocator", + "//zetasql/base:check", + "//zetasql/base:ret_check", + 
"//zetasql/base:status", + "//zetasql/common:errors", + "//zetasql/common:thread_stack", + "//zetasql/parser:bison_parser_mode", + "//zetasql/parser:bison_token_codes", + "//zetasql/proto:internal_error_location_cc_proto", + "//zetasql/public:error_helpers", + "//zetasql/public:error_location_cc_proto", + "//zetasql/public:language_options", + "//zetasql/public:options_cc_proto", + "//zetasql/public:parse_location", + "//zetasql/public/functions:convert_string", + "@com_google_absl//absl/memory", + "@com_google_absl//absl/status", + "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/strings", + "@com_google_absl//absl/strings:str_format", + ], +) + +cc_test( + name = "macro_expander_test", + size = "small", + srcs = ["macro_expander_test.cc"], + deps = [ + ":flex_token_provider", + ":macro_expander", + ":quoting", + ":standalone_macro_expansion", + ":token_with_location", + "//zetasql/base:arena", + "//zetasql/base/testing:status_matchers", + "//zetasql/base/testing:zetasql_gtest_main", + "//zetasql/parser:bison_parser_mode", + "//zetasql/parser:bison_token_codes", + "//zetasql/parser:token_disambiguator", # build_cleaner: keep + "//zetasql/parser/macros:macro_catalog", + "//zetasql/public:error_helpers", + "//zetasql/public:language_options", + "//zetasql/public:options_cc_proto", + "//zetasql/public:parse_location", + "@com_google_absl//absl/container:flat_hash_map", + "@com_google_absl//absl/status", + "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/strings", + "@com_google_absl//absl/strings:str_format", + ], +) + +cc_library( + name = "standalone_macro_expansion", + srcs = ["standalone_macro_expansion.cc"], + hdrs = ["standalone_macro_expansion.h"], + visibility = [ + "//zetasql/parser:__pkg__", + "//zetasql/tools/execute_query:__pkg__", + ], + deps = [ + ":token_splicing_utils", + ":token_with_location", + "//zetasql/base:check", + "//zetasql/parser:bison_token_codes", + "@com_google_absl//absl/strings", + 
"@com_google_absl//absl/types:span", + ], +) diff --git a/zetasql/parser/macros/flex_token_provider.cc b/zetasql/parser/macros/flex_token_provider.cc new file mode 100644 index 000000000..ca87c69ad --- /dev/null +++ b/zetasql/parser/macros/flex_token_provider.cc @@ -0,0 +1,76 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#include "zetasql/parser/macros/flex_token_provider.h" + +#include +#include + +#include "zetasql/parser/bison_parser_mode.h" +#include "zetasql/parser/bison_token_codes.h" +#include "zetasql/parser/flex_tokenizer.h" +#include "zetasql/parser/macros/token_with_location.h" +#include "zetasql/public/language_options.h" +#include "zetasql/public/parse_location.h" +#include "zetasql/base/check.h" +#include "absl/status/statusor.h" +#include "absl/strings/string_view.h" +#include "zetasql/base/status_macros.h" + +namespace zetasql { +namespace parser { +namespace macros { + +static absl::string_view GetTextBetween(absl::string_view input, size_t start, + size_t end) { + ABSL_DCHECK_LE(start, end); + ABSL_DCHECK_LE(start, input.length()); + size_t len = end - start; + ABSL_DCHECK_LE(len, input.length()); + return absl::ClippedSubstr(input, start, len); +} + +FlexTokenProvider::FlexTokenProvider(BisonParserMode mode, + absl::string_view filename, + absl::string_view input, int start_offset, + const LanguageOptions& language_options) + : mode_(mode), + language_options_(language_options), + 
tokenizer_(std::make_unique<ZetaSqlFlexTokenizer>( + mode, filename, input, start_offset, language_options)), + location_(ParseLocationPoint::FromByteOffset(filename, -1), + ParseLocationPoint::FromByteOffset(filename, -1)) {} + +absl::StatusOr<TokenWithLocation> FlexTokenProvider::GetFlexToken() { + int last_token_end_offset = location_.end().GetByteOffset(); + if (last_token_end_offset == -1) { + last_token_end_offset = 0; + } + + ZETASQL_ASSIGN_OR_RETURN(int token_kind, tokenizer_->GetNextToken(&location_)); + + absl::string_view prev_whitespaces; + absl::string_view input = tokenizer_->input(); + prev_whitespaces = GetTextBetween(input, last_token_end_offset, + location_.start().GetByteOffset()); + + return { + {token_kind, location_, location_.GetTextFrom(input), prev_whitespaces}}; +} + +} // namespace macros +} // namespace parser +} // namespace zetasql diff --git a/zetasql/parser/macros/flex_token_provider.h b/zetasql/parser/macros/flex_token_provider.h new file mode 100644 index 000000000..7dbe75c20 --- /dev/null +++ b/zetasql/parser/macros/flex_token_provider.h @@ -0,0 +1,107 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// + +#ifndef ZETASQL_PARSER_MACROS_FLEX_TOKEN_PROVIDER_H_ +#define ZETASQL_PARSER_MACROS_FLEX_TOKEN_PROVIDER_H_ + +#include <memory> +#include <queue> + +#include "zetasql/parser/bison_parser_mode.h" +#include "zetasql/parser/flex_tokenizer.h" +#include "zetasql/parser/macros/token_with_location.h" +#include "zetasql/public/language_options.h" +#include "absl/status/statusor.h" +#include "absl/strings/string_view.h" +#include "zetasql/base/status_macros.h" + +namespace zetasql { +namespace parser { +namespace macros { + +// Provides the next token from a Flex tokenizer without any macro expansion. +// This is the normal case, where we only have the text and we need to +// tokenize it from the start. +class FlexTokenProvider { + public: + FlexTokenProvider(BisonParserMode mode, absl::string_view filename, + absl::string_view input, int start_offset, + const LanguageOptions& language_options); + + FlexTokenProvider(const FlexTokenProvider&) = delete; + FlexTokenProvider& operator=(const FlexTokenProvider&) = delete; + + // Peeks the next token, but does not consume it. + absl::StatusOr<TokenWithLocation> PeekNextToken() { + if (input_token_buffer_.empty()) { + ZETASQL_ASSIGN_OR_RETURN(TokenWithLocation next_token, GetFlexToken()); + input_token_buffer_.push(next_token); + return next_token; + } + return input_token_buffer_.front(); + } + + absl::string_view filename() const { return tokenizer_->filename(); } + absl::string_view input() const { return tokenizer_->input(); } + int num_consumed_tokens() const { return num_consumed_tokens_; } + const LanguageOptions& language_options() const { return language_options_; } + + absl::StatusOr<TokenWithLocation> ConsumeNextToken() { + ZETASQL_ASSIGN_OR_RETURN(TokenWithLocation next_token, ConsumeNextTokenImpl()); + num_consumed_tokens_++; + return next_token; + } + + private: + // Consumes the next token from the buffer, or pulls one from Flex if the + // buffer is empty. 
+ absl::StatusOr<TokenWithLocation> ConsumeNextTokenImpl() { + if (!input_token_buffer_.empty()) { + // Check for any unused tokens first, before we pull any more + const TokenWithLocation front_token = input_token_buffer_.front(); + input_token_buffer_.pop(); + return front_token; + } + + return GetFlexToken(); + } + + // Pulls the next token from Flex. + absl::StatusOr<TokenWithLocation> GetFlexToken(); + + // The parsing mode used when creating this object. + const BisonParserMode mode_; + + const LanguageOptions& language_options_; + + // Used as a buffer when we need a lookahead from the tokenizer. + // Any tokens here are still unprocessed by the expander. + std::queue<TokenWithLocation> input_token_buffer_; + + // The ZetaSQL tokenizer which gives us all the tokens. + std::unique_ptr<ZetaSqlFlexTokenizer> tokenizer_; + + // Location into the current input, used by the tokenizer. + Location location_; + + int num_consumed_tokens_ = 0; +}; + +} // namespace macros +} // namespace parser +} // namespace zetasql + +#endif // ZETASQL_PARSER_MACROS_FLEX_TOKEN_PROVIDER_H_ diff --git a/zetasql/parser/macros/flex_token_provider_test.cc b/zetasql/parser/macros/flex_token_provider_test.cc new file mode 100644 index 000000000..a2654b204 --- /dev/null +++ b/zetasql/parser/macros/flex_token_provider_test.cc @@ -0,0 +1,209 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// + +#include "zetasql/parser/macros/flex_token_provider.h" + +#include <ostream> + +#include "zetasql/base/testing/status_matchers.h" +#include "zetasql/parser/bison_parser_mode.h" +#include "zetasql/parser/bison_token_codes.h" +#include "zetasql/parser/macros/token_with_location.h" +#include "zetasql/public/language_options.h" +#include "zetasql/public/parse_location.h" +#include "gtest/gtest.h" +#include "absl/strings/str_format.h" +#include "absl/strings/string_view.h" + +namespace zetasql { +namespace parser { +namespace macros { + +using ::testing::_; +using ::testing::Eq; +using ::testing::ExplainMatchResult; +using ::testing::FieldsAre; +using ::testing::HasSubstr; +using ::zetasql_base::testing::IsOk; +using ::zetasql_base::testing::StatusIs; + +// Template specialization to print tokens in failed test messages. +static void PrintTo(const TokenWithLocation& token, std::ostream* os) { + *os << absl::StrFormat( + "(kind: %i, location: %s, text: '%s', prev_spaces: '%s')", token.kind, + token.location.GetString(), token.text, token.preceding_whitespaces); +} + +MATCHER_P(IsOkAndHoldsToken, expected, "") { + return ExplainMatchResult(IsOk(), arg, result_listener) && + ExplainMatchResult( + FieldsAre(Eq(expected.kind), Eq(expected.location), + Eq(expected.text), Eq(expected.preceding_whitespaces)), + arg.value(), result_listener); +} + +static absl::string_view kFileName = "<filename>"; + +static Location MakeLocation(int start_offset, int end_offset) { + Location location; + location.set_start( + ParseLocationPoint::FromByteOffset(kFileName, start_offset)); + location.set_end(ParseLocationPoint::FromByteOffset(kFileName, end_offset)); + return location; +} + +static FlexTokenProvider MakeTokenProvider(BisonParserMode mode, + absl::string_view input) { + return FlexTokenProvider(mode, kFileName, input, + /*start_offset=*/0, LanguageOptions()); +} + +static FlexTokenProvider MakeTokenProvider(absl::string_view input) { + return MakeTokenProvider(BisonParserMode::kTokenizer, 
input); +} + +TEST(FlexTokenProviderTest, RawTokenizerMode) { + absl::string_view input = "/*comment*/ 123"; + + EXPECT_THAT( + MakeTokenProvider(BisonParserMode::kTokenizer, input).ConsumeNextToken(), + IsOkAndHoldsToken(TokenWithLocation{ + .kind = INTEGER_LITERAL, + .location = MakeLocation(12, 15), + .text = "123", + .preceding_whitespaces = "/*comment*/ ", + })); +} + +TEST(FlexTokenProviderTest, RawTokenizerPreserveCommentsMode) { + absl::string_view input = "/*comment*/ 123"; + FlexTokenProvider provider = + MakeTokenProvider(BisonParserMode::kTokenizerPreserveComments, input); + EXPECT_THAT(provider.ConsumeNextToken(), IsOkAndHoldsToken(TokenWithLocation{ + .kind = COMMENT, + .location = MakeLocation(0, 11), + .text = "/*comment*/", + .preceding_whitespaces = "", + })); + EXPECT_THAT(provider.ConsumeNextToken(), IsOkAndHoldsToken(TokenWithLocation{ + .kind = INTEGER_LITERAL, + .location = MakeLocation(12, 15), + .text = "123", + .preceding_whitespaces = " ", + })); +} + +TEST(FlexTokenProviderTest, RawTokenizerNextStatementMode) { + absl::string_view input = "/*comment*/ 123"; + FlexTokenProvider provider = + MakeTokenProvider(BisonParserMode::kNextStatement, input); + EXPECT_THAT(provider.ConsumeNextToken(), IsOkAndHoldsToken(TokenWithLocation{ + .kind = CUSTOM_MODE_START, + .location = MakeLocation(0, 0), + .text = "", + .preceding_whitespaces = "", + })); + EXPECT_THAT(provider.ConsumeNextToken(), + IsOkAndHoldsToken(TokenWithLocation{ + .kind = INTEGER_LITERAL, + .location = MakeLocation(12, 15), + .text = "123", + .preceding_whitespaces = "/*comment*/ ", + })); +} + +TEST(FlexTokenProviderTest, AlwaysEndsWithEOF) { + absl::string_view input = "\t\t"; + FlexTokenProvider flex_token_provider = MakeTokenProvider(input); + EXPECT_THAT(flex_token_provider.ConsumeNextToken(), + IsOkAndHoldsToken(TokenWithLocation{ + .kind = TokenKinds::YYEOF, + .location = MakeLocation(2, 2), + .text = "", + .preceding_whitespaces = "\t\t", + })); + 
EXPECT_THAT(flex_token_provider.ConsumeNextToken(), + StatusIs(_, HasSubstr("Internal error: Encountered real EOF"))); +} + +TEST(FlexTokenProviderTest, CanPeekToken) { + absl::string_view input = "\t123 identifier"; + FlexTokenProvider flex_token_provider = MakeTokenProvider(input); + const TokenWithLocation int_token{ + .kind = TokenKinds::INTEGER_LITERAL, + .location = MakeLocation(1, 4), + .text = "123", + .preceding_whitespaces = "\t", + }; + EXPECT_THAT(flex_token_provider.PeekNextToken(), + IsOkAndHoldsToken(int_token)); + EXPECT_THAT(flex_token_provider.PeekNextToken(), + IsOkAndHoldsToken(int_token)); + EXPECT_THAT(flex_token_provider.ConsumeNextToken(), + IsOkAndHoldsToken(int_token)); + + const TokenWithLocation identifier_token{ + .kind = TokenKinds::IDENTIFIER, + .location = MakeLocation(5, 15), + .text = "identifier", + .preceding_whitespaces = " ", + }; + EXPECT_THAT(flex_token_provider.PeekNextToken(), + IsOkAndHoldsToken(identifier_token)); + EXPECT_THAT(flex_token_provider.PeekNextToken(), + IsOkAndHoldsToken(identifier_token)); + EXPECT_THAT(flex_token_provider.ConsumeNextToken(), + IsOkAndHoldsToken(identifier_token)); +} + +TEST(FlexTokenProviderTest, TracksCountOfConsumedTokensIncludingEOF) { + absl::string_view input = "SELECT"; + FlexTokenProvider flex_token_provider = MakeTokenProvider(input); + + TokenWithLocation first_token{ + .kind = TokenKinds::KW_SELECT, + .location = MakeLocation(0, 6), + .text = "SELECT", + .preceding_whitespaces = "", + }; + + EXPECT_THAT(flex_token_provider.PeekNextToken(), + IsOkAndHoldsToken(first_token)); + EXPECT_EQ(flex_token_provider.num_consumed_tokens(), 0); + + EXPECT_THAT(flex_token_provider.ConsumeNextToken(), + IsOkAndHoldsToken(first_token)); + EXPECT_EQ(flex_token_provider.num_consumed_tokens(), 1); + + TokenWithLocation second_token{ + .kind = TokenKinds::YYEOF, + .location = MakeLocation(6, 6), + .text = "", + .preceding_whitespaces = "", + }; + + EXPECT_THAT(flex_token_provider.PeekNextToken(), + 
IsOkAndHoldsToken(second_token)); + EXPECT_EQ(flex_token_provider.num_consumed_tokens(), 1); + + EXPECT_THAT(flex_token_provider.ConsumeNextToken(), + IsOkAndHoldsToken(second_token)); + EXPECT_EQ(flex_token_provider.num_consumed_tokens(), 2); +} + +} // namespace macros +} // namespace parser +} // namespace zetasql diff --git a/zetasql/parser/macros/macro_catalog.h b/zetasql/parser/macros/macro_catalog.h new file mode 100644 index 000000000..826ff798a --- /dev/null +++ b/zetasql/parser/macros/macro_catalog.h @@ -0,0 +1,38 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#ifndef ZETASQL_PARSER_MACROS_MACRO_CATALOG_H_ +#define ZETASQL_PARSER_MACROS_MACRO_CATALOG_H_ + +#include <string> + +#include "absl/container/flat_hash_map.h" + +namespace zetasql { +namespace parser { +namespace macros { + +// Represents the catalog of existing macros and their definitions. +// This will likely develop into an interface for a more sophisticated catalog +// in the future, like catalog.h, with multi-part paths. +// For now, it matches existing catalog APIs for compatibility. 
+using MacroCatalog = absl::flat_hash_map<std::string, std::string>; + +} // namespace macros +} // namespace parser +} // namespace zetasql + +#endif // ZETASQL_PARSER_MACROS_MACRO_CATALOG_H_ diff --git a/zetasql/parser/macros/macro_expander.cc b/zetasql/parser/macros/macro_expander.cc new file mode 100644 index 000000000..77c7f7d17 --- /dev/null +++ b/zetasql/parser/macros/macro_expander.cc @@ -0,0 +1,1031 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#include "zetasql/parser/macros/macro_expander.h" + +#include <algorithm> +#include <cctype> +#include <memory> +#include <optional> +#include <queue> +#include <stack> +#include <string> +#include <utility> +#include <vector> + +#include "zetasql/base/arena.h" +#include "zetasql/base/arena_allocator.h" +#include "zetasql/common/errors.h" +#include "zetasql/common/thread_stack.h" +#include "zetasql/parser/bison_parser_mode.h" +#include "zetasql/parser/bison_token_codes.h" +#include "zetasql/parser/macros/flex_token_provider.h" +#include "zetasql/parser/macros/macro_catalog.h" +#include "zetasql/parser/macros/quoting.h" +#include "zetasql/parser/macros/token_splicing_utils.h" +#include "zetasql/parser/macros/token_with_location.h" +#include "zetasql/proto/internal_error_location.pb.h" +#include "zetasql/public/error_helpers.h" +#include "zetasql/public/functions/convert_string.h" +#include "zetasql/public/language_options.h" +#include "zetasql/public/options.pb.h" +#include "zetasql/public/parse_location.h" +#include "zetasql/base/check.h" +#include "absl/memory/memory.h" 
+#include "absl/status/status.h" +#include "absl/status/statusor.h" +#include "absl/strings/match.h" +#include "absl/strings/str_cat.h" +#include "absl/strings/str_format.h" +#include "absl/strings/string_view.h" +#include "zetasql/base/ret_check.h" +#include "zetasql/base/status_macros.h" + +namespace zetasql { +namespace parser { +namespace macros { + +// This is a workaround until ZetaSQL's Bison is updated to a newer version +// and has this symbol. +#define YYUNDEF 257 + +// TODO: we should also employ a cycle detector to find infinitie +// macro call recursion and give the users a good error message. +#define RETURN_ERROR_IF_OUT_OF_STACK_SPACE() \ + ZETASQL_RETURN_IF_NOT_ENOUGH_STACK( \ + "Out of stack space due to deeply nested macro calls.") + +static absl::string_view GetMacroName( + const TokenWithLocation& macro_invocation_token) { + return macro_invocation_token.text.substr(1); +} + +// Note: end_offset is exclusive +static absl::string_view GetTextBetween(absl::string_view input, size_t start, + size_t end) { + ABSL_DCHECK_LE(start, end); + ABSL_DCHECK_LE(start, input.length()); + size_t len = end - start; + ABSL_DCHECK_LE(len, input.length()); + return absl::ClippedSubstr(input, start, len); +} + +absl::StatusOr ParseMacroArgIndex(absl::string_view text) { + ZETASQL_RET_CHECK_GE(text.length(), 1); + ZETASQL_RET_CHECK_EQ(text.front(), '$'); + int arg_index; + absl::Status error = absl::OkStatus(); + if (!functions::StringToNumeric(text.substr(1), &arg_index, &error)) { + ZETASQL_RET_CHECK(!error.ok()); + return error; + } + ZETASQL_RET_CHECK_OK(error); + return arg_index; +} + +// Returns true if the two tokens can be spliced into one +static bool CanSplice(const TokenWithLocation& previous_token, + int current_token_kind, + absl::string_view current_token_text) { + if (!IsKeywordOrUnquotedIdentifier(previous_token)) { + return false; + } + + return current_token_kind == INTEGER_LITERAL || + IsKeywordOrUnquotedIdentifier(current_token_kind, 
current_token_text); +} + +// Convenience overload +static bool CanSplice(const TokenWithLocation& previous, + const TokenWithLocation& current) { + return CanSplice(previous, current.kind, current.text); +} + +static bool AreSame(const QuotingSpec& q1, const QuotingSpec& q2) { + ABSL_DCHECK(q1.literal_kind() == q2.literal_kind()); + + if (q1.quote_kind() != q2.quote_kind()) { + return false; + } + + if (q1.prefix().length() != q2.prefix().length()) { + return false; + } + + return q1.prefix().length() == 2 || + zetasql_base::CaseEqual(q1.prefix(), q2.prefix()); +} + +bool MacroExpander::IsStrict() const { + return token_provider_->language_options().LanguageFeatureEnabled( + FEATURE_V_1_4_ENFORCE_STRICT_MACROS); +} + +static std::unique_ptr<zetasql_base::UnsafeArena> CreateUnsafeArena() { + return std::make_unique<zetasql_base::UnsafeArena>(/*block_size=*/4096); +} + +MacroExpander::MacroExpander(std::unique_ptr<FlexTokenProvider> token_provider, + const MacroCatalog& macro_catalog, + zetasql_base::UnsafeArena* arena, + ErrorMessageOptions error_message_options, + StackFrame* parent_location) + : MacroExpander(std::move(token_provider), macro_catalog, arena, + /*call_arguments=*/{}, error_message_options, + parent_location) {} + +absl::StatusOr<ExpansionOutput> MacroExpander::ExpandMacros( + absl::string_view filename, absl::string_view input, + const MacroCatalog& macro_catalog, const LanguageOptions& language_options, + ErrorMessageOptions error_message_options) { + ExpansionOutput expansion_output; + expansion_output.arena = CreateUnsafeArena(); + auto token_provider = std::make_unique<FlexTokenProvider>( + BisonParserMode::kTokenizer, filename, input, /*start_offset=*/0, + language_options); + ZETASQL_RETURN_IF_ERROR(ExpandMacrosInternal( + std::move(token_provider), macro_catalog, expansion_output.arena.get(), + /*call_arguments=*/{}, error_message_options, /*parent_location=*/nullptr, + expansion_output.expanded_tokens, expansion_output.warnings, + /*out_max_arg_ref_index=*/nullptr)); + return expansion_output; +} + +absl::Status MacroExpander::MakeSqlErrorAt(const 
ParseLocationPoint& location, + absl::string_view message) { + zetasql_base::StatusBuilder status = MakeSqlError() << message; + InternalErrorLocation internal_location = location.ToInternalErrorLocation(); + StackFrame* next_ancestor = parent_location_; + while (next_ancestor != nullptr) { + *internal_location.add_error_source() = next_ancestor->error_source; + next_ancestor = next_ancestor->parent; + } + status.AttachPayload(std::move(internal_location)); + return MaybeUpdateErrorFromPayload(error_message_options_, + token_provider_->input(), status); +} + +absl::StatusOr<TokenWithLocation> MacroExpander::GetNextToken() { + // The loop is needed to skip chunks that all expand into empty. For example: + // $empty $empty $empty $empty (...etc) + // Eventually we will hit a nonempty expansion, or YYEOF in the output buffer. + while (output_token_buffer_.empty()) { + ZETASQL_RETURN_IF_ERROR(LoadPotentiallySplicingTokens()); + ZETASQL_RETURN_IF_ERROR(ExpandPotentiallySplicingTokens()); + } + + TokenWithLocation token = output_token_buffer_.ConsumeToken(); + if (token.kind == ';' || token.kind == YYEOF) { + at_statement_start_ = true; + inside_macro_definition_ = false; + ZETASQL_RET_CHECK(output_token_buffer_.empty()); + } + return token; +} + +int MacroExpander::num_unexpanded_tokens_consumed() const { + return token_provider_->num_consumed_tokens(); +} + +// Returns true if `token` may splice from the right of `last_token`. +// `last_was_macro_invocation` is passed separately because a macro invocation +// has multiple tokens when it has arguments, so last_token would only reflect +// the closing parenthesis in that case. +static bool CanUnexpandedTokensSplice(bool last_was_macro_invocation, +  const TokenWithLocation& last_token, + const TokenWithLocation& token) { + if (token.start_offset() > last_token.end_offset()) { + // Forced space: we will never splice! 
+ return false; + } + + if (token.kind == MACRO_INVOCATION || + token.kind == MACRO_ARGUMENT_REFERENCE) { + // Invocations and macro args can splice with pretty much anything. + // Safer to always load them. + return true; + } + + if (token.kind == INTEGER_LITERAL) { + // Those will never splice with a previous non-macro token. + // Otherwise, it'd have already lexed with it. + return last_was_macro_invocation; + } + + // Everything else: we have 2 categories: + // 1. Unquoted identifiers and keywords: those can splice with macro + // invocations or argument references. + // 2. Everything else, e.g. symbols, quoted IDs, string literals, etc, does + // not splice. + if (!IsKeywordOrUnquotedIdentifier(token.kind, token.text)) { + return false; + } + return last_was_macro_invocation || + last_token.kind == MACRO_ARGUMENT_REFERENCE; +} + +static absl::string_view AllocateString(absl::string_view str, + zetasql_base::UnsafeArena* arena) { + return *::zetasql_base::NewInArena<std::basic_string< + char, std::char_traits<char>, zetasql_base::ArenaAllocator<char>>>( + arena, str, arena); +} + +absl::string_view MacroExpander::MaybeAllocateConcatenation( + absl::string_view a, absl::string_view b) { + if (a.empty()) { + return b; + } + if (b.empty()) { + return a; + } + + return AllocateString(absl::StrCat(a, b), arena_); +} + +absl::StatusOr<TokenWithLocation> MacroExpander::ConsumeInputToken() { + ZETASQL_ASSIGN_OR_RETURN(TokenWithLocation token, + token_provider_->ConsumeNextToken()); + if (IsStrict()) { + return token; + } + + // Add a warning if this is a lenient token (Backslash or a generalized + // identifier that starts with a number, e.g. 30d or 1ab23cd). + if (token.kind == BACKSLASH || + (token.kind == IDENTIFIER && std::isdigit(token.text.front()) && + !std::isdigit(token.text.back()))) { + ZETASQL_RETURN_IF_ERROR(RaiseErrorOrAddWarning(MakeSqlErrorAt( + token.location.start(), + absl::StrFormat( + "Invalid token (%s). 
Did you mean to use it in a literal?", + token.text)))); + } + return token; +} + +absl::Status MacroExpander::LoadPotentiallySplicingTokens() { + ZETASQL_RET_CHECK(splicing_buffer_.empty()); + ZETASQL_RET_CHECK(output_token_buffer_.empty()); + + // Special logic to find top-level DEFINE MACRO statements, which do not get + // expanded. + if (at_statement_start_ && call_arguments_.empty()) { + ZETASQL_ASSIGN_OR_RETURN(TokenWithLocation token, token_provider_->PeekNextToken()); + if (token.kind == CUSTOM_MODE_START) { + ZETASQL_ASSIGN_OR_RETURN(token, ConsumeInputToken()); + splicing_buffer_.push(token); + ZETASQL_ASSIGN_OR_RETURN(token, token_provider_->PeekNextToken()); + } + if (token.kind == KW_DEFINE) { + ZETASQL_ASSIGN_OR_RETURN(token, ConsumeInputToken()); + splicing_buffer_.push(token); + + ZETASQL_ASSIGN_OR_RETURN(token, token_provider_->PeekNextToken()); + if (token.kind == KW_MACRO) { + // Mark the leading DEFINE keyword as the special one marking a DEFINE + // MACRO statement. + splicing_buffer_.back().kind = KW_DEFINE_FOR_MACROS; + inside_macro_definition_ = true; + at_statement_start_ = false; + return absl::OkStatus(); + } + } + } + + at_statement_start_ = false; + bool last_was_macro_invocation = false; + + // Unquoted identifiers and INT_LITERALs may only splice to the left if what + // came before them was a macro invocation. Unquoted identifiers may further + // splice with an argument reference to their left. 
+ while (true) { + ZETASQL_ASSIGN_OR_RETURN(TokenWithLocation token, token_provider_->PeekNextToken()); + if (!splicing_buffer_.empty() && + !CanUnexpandedTokensSplice(last_was_macro_invocation, + splicing_buffer_.back(), token)) { + return absl::OkStatus(); + } + ZETASQL_ASSIGN_OR_RETURN(token, ConsumeInputToken()); + splicing_buffer_.push(token); + if (token.kind == YYEOF) { + return absl::OkStatus(); + } + + if (token.kind == MACRO_INVOCATION) { + ZETASQL_RETURN_IF_ERROR(LoadArgsIfAny()); + last_was_macro_invocation = true; + } else { + last_was_macro_invocation = false; + } + } + + ZETASQL_RET_CHECK_FAIL() << "We should never hit this path. We always return from " + "inside the loop"; +} + +absl::Status MacroExpander::LoadArgsIfAny() { + ZETASQL_RET_CHECK(!splicing_buffer_.empty()) + << "Splicing buffer cannot be empty. This method should not be " + "called except after a macro invocation has been loaded"; + ZETASQL_RET_CHECK(splicing_buffer_.back().kind == MACRO_INVOCATION) + << "This method should not be called except after a macro invocation " + "has been loaded"; + ZETASQL_ASSIGN_OR_RETURN(TokenWithLocation token, token_provider_->PeekNextToken()); + if (token.kind != '(' || + token.start_offset() > splicing_buffer_.back().end_offset()) { + // The next token is not an opening parenthesis, or is an open parenthesis + // that is separated by some whitespace. This means that the current + // invocation does not have an argument list. 
+ return absl::OkStatus(); + } + + ZETASQL_ASSIGN_OR_RETURN(token, ConsumeInputToken()); + splicing_buffer_.push(token); + return LoadUntilParenthesesBalance(); +} + +absl::Status MacroExpander::LoadUntilParenthesesBalance() { + ZETASQL_RET_CHECK(!splicing_buffer_.empty()); + ZETASQL_RET_CHECK_EQ(splicing_buffer_.back().kind, '('); + int num_open_parens = 1; + while (num_open_parens > 0) { + ZETASQL_ASSIGN_OR_RETURN(TokenWithLocation token, ConsumeInputToken()); + switch (token.kind) { + case '(': + num_open_parens++; + break; + case ')': + num_open_parens--; + break; + case ';': + case YYEOF: + // Always an error, even when not in strict mode. + return MakeSqlErrorAt( + token.location.start(), + "Unbalanced parentheses in macro argument list. Make sure that " + "parentheses are balanced even inside macro arguments."); + default: + break; + } + splicing_buffer_.push(token); + } + + return absl::OkStatus(); +} + +// Returns the given status as error if expanding in strict mode, or adds it +// as a warning otherwise. +// Note that not all problematic conditions can be relegated to warnings. +// For example, a macro invocation with unbalanced parens is always an error. +absl::Status MacroExpander::RaiseErrorOrAddWarning(absl::Status status) { + ZETASQL_RET_CHECK(!status.ok()); + if (IsStrict()) { + return status; + } + if (warnings_.size() < max_warnings_) { + warnings_.push_back(std::move(status)); + } else if (warnings_.size() == max_warnings_) { + // Add a "sentinel" warning indicating there were more. + warnings_.push_back(absl::InvalidArgumentError( + "Warning count limit reached. 
Truncating further warnings")); + } + return absl::OkStatus(); +} + +absl::StatusOr<TokenWithLocation> MacroExpander::Splice( + TokenWithLocation pending_token, absl::string_view incoming_token_text, + const ParseLocationPoint& location) { + ZETASQL_RET_CHECK(!incoming_token_text.empty()); + ZETASQL_RET_CHECK(!pending_token.text.empty()); + ZETASQL_RET_CHECK_NE(pending_token.kind, YYUNDEF); + + ZETASQL_RETURN_IF_ERROR(RaiseErrorOrAddWarning(MakeSqlErrorAt( + location, absl::StrFormat("Splicing tokens (%s) and (%s)", + pending_token.text, incoming_token_text)))); + + pending_token.text = + MaybeAllocateConcatenation(pending_token.text, incoming_token_text); + + // The splicing token could be a keyword, e.g. FR+OM becoming FROM. But for + // expansion purposes, these are all identifiers, and likely will splice with + // more characters and not be a keyword. Re-lexing happens at the very end. + // TODO: strict mode should produce an error when forming keywords through + // splicing. + pending_token.kind = IDENTIFIER; + return pending_token; +} + +absl::StatusOr<TokenWithLocation> MacroExpander::ExpandAndMaybeSpliceMacroItem( + TokenWithLocation unexpanded_macro_token, TokenWithLocation pending_token) { + std::vector<TokenWithLocation> expanded_tokens; + if (unexpanded_macro_token.kind == MACRO_ARGUMENT_REFERENCE) { + ZETASQL_RETURN_IF_ERROR( + ExpandMacroArgumentReference(unexpanded_macro_token, expanded_tokens)); + } else { + ZETASQL_RET_CHECK_EQ(unexpanded_macro_token.kind, MACRO_INVOCATION); + + absl::string_view macro_name = GetMacroName(unexpanded_macro_token); + const auto& it = macro_catalog_.find(macro_name); + if (it == macro_catalog_.end()) { + ZETASQL_RETURN_IF_ERROR(RaiseErrorOrAddWarning(MakeSqlErrorAt( + unexpanded_macro_token.location.start(), + absl::StrFormat("Macro '%s' not found.", macro_name)))); + // In lenient mode, just return this token as is, and let the argument + // list expand as if it were just normal text. 
+ return AdvancePendingToken(std::move(pending_token), + std::move(unexpanded_macro_token)); + } + ZETASQL_RETURN_IF_ERROR(ExpandMacroInvocation(unexpanded_macro_token, it->second, + expanded_tokens)); + } + + ZETASQL_RET_CHECK(!expanded_tokens.empty()) + << "A proper expansion should have at least the YYEOF token at " + "the end. Failure was when expanding " + << unexpanded_macro_token.text; + ZETASQL_RET_CHECK(expanded_tokens.back().kind == YYEOF); + // Pop the trailing space at the end of the expansion, which is tacked + // on to the YYEOF. + expanded_tokens.pop_back(); + + // Empty expansions are tricky. There are 2 cases, the 3rd is impossible: + // 1. pending_token == YYUNDEF: only has whitespace, potentially carried from + // previous chunks as in: $empty $empty \t $empty + // 2. pending_token may splice, e.g. part of an identifier (which means there + // can never be whitespace separation, or we wouldn't have chunked them + // together). + // 3. (impossible) pending_token never splices, e.g. '+' (in which case we + // wouldn't be in the same chunk anyway) + if (expanded_tokens.empty()) { + if (!unexpanded_macro_token.preceding_whitespaces.empty()) { + ZETASQL_RET_CHECK(pending_token.kind == YYUNDEF); + pending_token.preceding_whitespaces = MaybeAllocateConcatenation( + pending_token.preceding_whitespaces, + unexpanded_macro_token.preceding_whitespaces); + } + return pending_token; + } + + if (CanSplice(pending_token, expanded_tokens.front())) { + ZETASQL_RET_CHECK(unexpanded_macro_token.preceding_whitespaces.empty()); + // TODO: warning on implicit splicing here and everywhere else + // The first token will splice with what we currently have. + // TODO: if we already have another token, do we + // allow splicing identifiers into a keyword? e.g. FR + OM = FROM. + // 1. Each on its own is an identifier. Together they're a + // keyword. This is token splicing. We probably should make + // this an error. + // However: + // 2. 
For all we know, more identifiers may come up. So the + // final token is FROMother. + // 3. This is an open question, probably to be done at the final + // result like a validation pass. But even that likely won't + // get it, because we can't tell the difference between a + // quoted and an unquoted identifier. + // 4. Long term, we'd like to stop splicing, and introduce an + // explicit splice operator like C's ##. + // 5. Furthermore, a keyword like FROM may still be understood + // by the parser to be an identifier anyway, such as when it + // is preceded by a dot, e.g. 'a.FROM'. + ZETASQL_ASSIGN_OR_RETURN( + pending_token, + Splice(std::move(pending_token), expanded_tokens.front().text, + unexpanded_macro_token.location.start())); + } else { + // The first expanded token becomes the pending token, and takes on the + // previous whitespace (for example from a series of $empty), as well as + // the whitespace before the original macro token (but drops the leading + // whitespaces from the definition.) + expanded_tokens.front().preceding_whitespaces = + unexpanded_macro_token.preceding_whitespaces; + ZETASQL_ASSIGN_OR_RETURN(pending_token, + AdvancePendingToken(std::move(pending_token), + std::move(expanded_tokens.front()))); + } + + if (expanded_tokens.size() > 1) { + // Leading and trailing spaces around the expansion results are + // discarded, but not the whitespaces within. Consequently, the + // first and last tokens in an expansion may splice with the + // calling context. Everything in the middle is kept as is, with + // no splicing. + output_token_buffer_.Push(std::move(pending_token)); + + // Everything in the middle (all except the first and last) will + // not splice. 
+ for (int i = 1; i < expanded_tokens.size() - 1; i++) { + // Use the unexpanded token's location + output_token_buffer_.Push(std::move(expanded_tokens[i])); + } + pending_token = expanded_tokens.back(); + } + return pending_token; +} + +absl::StatusOr<TokenWithLocation> MacroExpander::AdvancePendingToken( + TokenWithLocation pending_token, TokenWithLocation incoming_token) { + if (pending_token.kind != YYUNDEF) { + output_token_buffer_.Push(std::move(pending_token)); + } else { + ZETASQL_RET_CHECK(pending_token.text.empty()); + // Prepend any pending whitespaces. If there's none, we skip allocating + // a copy of incoming_token.preceding_whitespaces. + incoming_token.preceding_whitespaces = + MaybeAllocateConcatenation(pending_token.preceding_whitespaces, + incoming_token.preceding_whitespaces); + } + return incoming_token; +} + +static bool IsQuotedLiteral(const TokenWithLocation& token) { + return token.kind == STRING_LITERAL || token.kind == BYTES_LITERAL || + IsQuotedIdentifier(token); +} + +absl::Status MacroExpander::ExpandPotentiallySplicingTokens() { + ZETASQL_RET_CHECK(!splicing_buffer_.empty()); + TokenWithLocation pending_token{ + .kind = YYUNDEF, + .location = Location{}, + .text = "", + .preceding_whitespaces = pending_whitespaces_}; + // Do not forget to reset pending_whitespaces_ + pending_whitespaces_ = ""; + while (!splicing_buffer_.empty()) { + TokenWithLocation token = splicing_buffer_.front(); + splicing_buffer_.pop(); + + if (inside_macro_definition_) { + // Check the pending token in case there were some pending spaces from the + // previous statement. But there shouldn't be any splicing. 
+ ZETASQL_ASSIGN_OR_RETURN( + pending_token, + AdvancePendingToken(std::move(pending_token), std::move(token))); + } else if (token.kind == MACRO_ARGUMENT_REFERENCE || + token.kind == MACRO_INVOCATION) { + ZETASQL_ASSIGN_OR_RETURN(pending_token, + ExpandAndMaybeSpliceMacroItem(std::move(token), + std::move(pending_token))); + } else if (IsQuotedLiteral(token)) { + ZETASQL_ASSIGN_OR_RETURN(pending_token, ExpandLiteral(std::move(pending_token), + std::move(token))); + } else if (CanSplice(pending_token, token.kind, token.text)) { + ZETASQL_ASSIGN_OR_RETURN( + pending_token, + Splice(std::move(pending_token), token.text, token.location.start())); + } else { + ZETASQL_ASSIGN_OR_RETURN( + pending_token, + AdvancePendingToken(std::move(pending_token), std::move(token))); + } + } + + ZETASQL_RET_CHECK(pending_whitespaces_.empty()); + if (pending_token.kind != YYUNDEF) { + output_token_buffer_.Push(std::move(pending_token)); + pending_whitespaces_ = ""; + } else if (!pending_token.preceding_whitespaces.empty()) { + // This is the case where the last part of our chunk has all been empty + // expansions. We need to hold onto these whitespaces for the next chunk. 
+ pending_whitespaces_ = + AllocateString(pending_token.preceding_whitespaces, arena_); + } + return absl::OkStatus(); +} + +absl::Status MacroExpander::ParseAndExpandArgs( + const TokenWithLocation& unexpanded_macro_invocation_token, + std::vector<std::vector<TokenWithLocation>>& expanded_args) { + RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); + ZETASQL_RET_CHECK(expanded_args.empty()); + // The first argument is the macro name + expanded_args.push_back(std::vector<TokenWithLocation>{ + {.kind = IDENTIFIER, + .location = unexpanded_macro_invocation_token.location, + .text = GetMacroName(unexpanded_macro_invocation_token), + .preceding_whitespaces = ""}, + {.kind = YYEOF, + .location = unexpanded_macro_invocation_token.location, + .text = "", + .preceding_whitespaces = ""}}); + + if (splicing_buffer_.empty() || splicing_buffer_.front().kind != '(') { + ZETASQL_RETURN_IF_ERROR(RaiseErrorOrAddWarning( + MakeSqlErrorAt(unexpanded_macro_invocation_token.location.end(), + "Macro invocation missing argument list."))); + return absl::OkStatus(); + } + + const TokenWithLocation opening_paren = splicing_buffer_.front(); + splicing_buffer_.pop(); + ZETASQL_RET_CHECK(opening_paren.kind == '('); + int arg_start_offset = opening_paren.end_offset(); + + // Parse arguments, each being a sequence of tokens. + struct ParseRange { + int start_offset = -1; + int end_offset = -1; + }; + std::vector<ParseRange> unexpanded_args; + ZETASQL_RET_CHECK(!splicing_buffer_.empty()); + if (splicing_buffer_.front().kind == ')') { + // Special case: empty parentheses mean zero arguments, not a single + // empty argument! + splicing_buffer_.pop(); + } else { + int num_open_parens = 1; + while (num_open_parens > 0) { + ZETASQL_RET_CHECK(!splicing_buffer_.empty()); + TokenWithLocation token = splicing_buffer_.front(); + splicing_buffer_.pop(); + + // The current argument ends at the next top-level comma, or the closing
parenthesis. Note that an argument may itself have parentheses and + // commas, for example: + // $m( x( a , b ), y ) + // The arguments to the invocation of $m are `x(a,b)` and y. + // The comma between `a` and `b` is internal, not top-level. + if (token.kind == '(') { + num_open_parens++; + } else if (token.kind == ',' && num_open_parens == 1) { + // Top-level comma means the end of the current argument + unexpanded_args.push_back( + {.start_offset = arg_start_offset, + .end_offset = + token.start_offset() - + static_cast<int>(token.preceding_whitespaces.length())}); + arg_start_offset = token.end_offset(); + } else if (token.kind == ')') { + num_open_parens--; + if (num_open_parens == 0) { + // This was the last argument. + unexpanded_args.push_back( + {.start_offset = arg_start_offset, + .end_offset = + token.start_offset() - + static_cast<int>(token.preceding_whitespaces.length())}); + break; + } + } + } + } + + for (const auto& [arg_start_offset, arg_end_offset] : unexpanded_args) { + std::vector<TokenWithLocation> expanded_arg; + int max_arg_ref_index_in_current_arg; + ZETASQL_RETURN_IF_ERROR(ExpandMacrosInternal( + // Expand this arg, but note that we expand in raw tokenization mode: + // We are not necessarily starting a statement or a script. + // Pass the input from the beginning of the file, up to the end offset. + // Note that the start offset is passed separately, in order to reflect + // the correct location of each token. + // If we only pass the arg as the input, the location offsets will start + // from 0.
+ std::make_unique<FlexTokenProvider>( + BisonParserMode::kTokenizer, token_provider_->filename(), + GetTextBetween(token_provider_->input(), 0, arg_end_offset), + arg_start_offset, token_provider_->language_options()), + macro_catalog_, arena_, call_arguments_, error_message_options_, + parent_location_, expanded_arg, warnings_, + &max_arg_ref_index_in_current_arg)); + + max_arg_ref_index_ = + std::max(max_arg_ref_index_, max_arg_ref_index_in_current_arg); + ZETASQL_RET_CHECK(!expanded_arg.empty()) << "A proper expansion should have at " + "least the YYEOF token at the end"; + ZETASQL_RET_CHECK(expanded_arg.back().kind == YYEOF); + expanded_args.push_back(std::move(expanded_arg)); + } + + return absl::OkStatus(); +} + +absl::Status MacroExpander::ExpandMacroArgumentReference( + const TokenWithLocation& token, + std::vector<TokenWithLocation>& expanded_tokens) { + RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); + ZETASQL_RET_CHECK(expanded_tokens.empty()); + if (call_arguments_.empty()) { + // This is the top-level, not the body of a macro (otherwise, we'd have at + // least the macro name as arg #0). This means we just leave the arg ref + // unexpanded, as opposed to assuming it was passed as if empty. + expanded_tokens = {token, + {.kind = YYEOF, + .location = token.location, + .text = "", + .preceding_whitespaces = ""}}; + return absl::OkStatus(); + } + + ZETASQL_ASSIGN_OR_RETURN(int arg_index, ParseMacroArgIndex(token.text)); + max_arg_ref_index_ = std::max(arg_index, max_arg_ref_index_); + + if (arg_index >= call_arguments_.size()) { + // TODO: provide the location of the invocation when we have + // location stacking. + ZETASQL_RETURN_IF_ERROR(RaiseErrorOrAddWarning(MakeSqlErrorAt( + token.location.start(), + absl::StrFormat( + "Argument index %s out of range.
Invocation was provided " + "only %d arguments.", + token.text, call_arguments_.size())))); + expanded_tokens = {TokenWithLocation{.kind = YYEOF, + .location = token.location, + .text = "", + .preceding_whitespaces = ""}}; + return absl::OkStatus(); + } + + // Copy the list, since the same argument may be referenced + // elsewhere. + expanded_tokens = call_arguments_[arg_index]; + return absl::OkStatus(); +} + +absl::StatusOr<StackFrame> MacroExpander::MakeStackFrame( + const ParseLocationPoint& location) const { + ErrorSource error_source; + error_source.set_error_message( + absl::StrCat("Expanded from ", token_provider_->filename())); + + ParseLocationTranslator location_translator(token_provider_->input()); + std::pair<int, int> line_and_column; + ZETASQL_ASSIGN_OR_RETURN( + line_and_column, + location_translator.GetLineAndColumnAfterTabExpansion(location), + _ << "Location " << location.GetString() << " not found in:\n" + << token_provider_->input()); + + ErrorLocation* err_loc = error_source.mutable_error_location(); + err_loc->set_filename(token_provider_->filename()); + err_loc->set_line(line_and_column.first); + err_loc->set_column(line_and_column.second); + + error_source.set_error_message_caret_string( + GetErrorStringWithCaret(token_provider_->input(), *err_loc)); + + return StackFrame{.error_source = error_source, .parent = parent_location_}; +} + +// Expands the macro invocation starting at the given token. +// REQUIRES: Any arguments must have already been loaded into the splicing +// buffer. +absl::Status MacroExpander::ExpandMacroInvocation( + const TokenWithLocation& token, absl::string_view macro_definition, + std::vector<TokenWithLocation>& expanded_tokens) { + ZETASQL_RET_CHECK(!token.text.empty()); + ZETASQL_RET_CHECK_EQ(token.text.front(), '$'); + ZETASQL_RET_CHECK(token.kind == MACRO_INVOCATION); + + // We expand arguments regardless, even if the macro being invoked does not + // exist.
+ std::vector<std::vector<TokenWithLocation>> expanded_args; + ZETASQL_RETURN_IF_ERROR(ParseAndExpandArgs(token, expanded_args)); + + absl::string_view macro_name_as_source = + MaybeAllocateConcatenation("macro:", GetMacroName(token)); + + // The macro definition can contain anything, not necessarily a statement or + // a script. Expanding a definition for an invocation always occurs in raw + // tokenization, without carrying over comments. + auto child_token_provider = std::make_unique<FlexTokenProvider>( + BisonParserMode::kTokenizer, macro_name_as_source, macro_definition, + /*start_offset=*/0, token_provider_->language_options()); + + int num_args = static_cast<int>(expanded_args.size()) - 1; + + ZETASQL_ASSIGN_OR_RETURN(StackFrame stack_frame, + MakeStackFrame(token.location.start())); + int max_arg_ref_in_definition; + ZETASQL_RETURN_IF_ERROR(ExpandMacrosInternal( + std::move(child_token_provider), macro_catalog_, arena_, + std::move(expanded_args), error_message_options_, &stack_frame, + expanded_tokens, warnings_, &max_arg_ref_in_definition)); + if (num_args > max_arg_ref_in_definition) { + ZETASQL_RETURN_IF_ERROR(RaiseErrorOrAddWarning(MakeSqlErrorAt( + token.location.start(), + absl::StrFormat("Macro invocation has too many arguments (%d) while " + "the definition only references up to %d arguments", + num_args, max_arg_ref_in_definition)))); + } + return absl::OkStatus(); +} + +absl::StatusOr<TokenWithLocation> MacroExpander::ExpandLiteral( + TokenWithLocation pending_token, TokenWithLocation literal_token) { + RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); + + if (IsStrict()) { + // Strict mode does not expand literals.
+ return AdvancePendingToken(std::move(pending_token), + std::move(literal_token)); + } + + absl::string_view literal_contents; + ZETASQL_ASSIGN_OR_RETURN( + QuotingSpec quoting, + QuotingSpec::FindQuotingKind(literal_token.text, literal_contents)); + + // For error offset calculation + int literal_content_start_offset = static_cast<int>( + quoting.prefix().length() + QuoteStr(quoting.quote_kind()).length()); + + // We cannot expand like normal, because literals can contain anything, + // which means we cannot count on our parser. For example, 3m is not a + // SQL token. Furthermore, consider expanding the literal "#$a('#)", what is + // a comment and what is just actual text? For tractability, we do not allow + // arguments. We just have our own miniparser here. + std::string content; + for (int num_chars_read = 0; num_chars_read < literal_contents.length(); + num_chars_read++) { + char cur_char = literal_contents[num_chars_read]; + + // If this is not a macro invocation or a macro argument, or if this is + // a dollar sign at the end, then no expansion will occur. Append + // directly. + if (cur_char != '$' || num_chars_read == literal_contents.length() - 1 || + !IsIdentifierCharacter(literal_contents[num_chars_read + 1])) { + absl::StrAppend(&content, std::string(1, cur_char)); + continue; + } + + // This is a macro token: either an invocation, a macro argument, or a + // standalone dollar sign with other stuff behind it.
+ int end_index = num_chars_read + 1; + ZETASQL_RET_CHECK(end_index < literal_contents.length()); + if (CanCharStartAnIdentifier(literal_contents[end_index])) { + do { + end_index++; + } while (end_index < literal_contents.size() && + IsIdentifierCharacter(literal_contents[end_index])); + + if (end_index < literal_contents.size() && + literal_contents[end_index] == '(') { + ParseLocationPoint paren_location = literal_token.location.start(); + paren_location.IncrementByteOffset(literal_content_start_offset + + end_index); + + ZETASQL_RETURN_IF_ERROR(RaiseErrorOrAddWarning(MakeSqlErrorAt( + paren_location, "Argument lists are not allowed inside literals"))); + + // Best-effort expansion of this invocation, if we can compose an + // argument list. No quotes or other parens inside. + int token_end = end_index + 1; + const auto disallowed_in_expansion_in_literal = [](const char c) { + return c == '(' || c == ')' || c == '\'' || c == '"'; + }; + while ( + token_end < literal_contents.size() && + !disallowed_in_expansion_in_literal(literal_contents[token_end])) { + token_end++; + } + if (token_end == literal_contents.size() || + literal_contents[token_end] != ')') { + return MakeSqlErrorAt( + paren_location, + "Nested macro argument lists inside literals are not allowed"); + } + end_index = token_end + 1; + } + } else { + // If this is an argument reference, pick up the argument position. 
+ while (end_index < literal_contents.size() && + std::isdigit(literal_contents[end_index])) { + end_index++; + } + if (end_index > num_chars_read) { + ZETASQL_ASSIGN_OR_RETURN(int arg_index, + ParseMacroArgIndex(literal_contents.substr( + num_chars_read, end_index - num_chars_read))); + max_arg_ref_index_ = std::max(arg_index, max_arg_ref_index_); + } + } + + std::vector<TokenWithLocation> expanded_tokens; + absl::string_view unexpanded_macro_item = + GetTextBetween(literal_contents, num_chars_read, end_index); + + ParseLocationPoint unexpanded_macro_start_point = + literal_token.location.start(); + unexpanded_macro_start_point.IncrementByteOffset( + literal_content_start_offset + num_chars_read); + num_chars_read = end_index - 1; + + auto child_token_provider = std::make_unique<FlexTokenProvider>( + BisonParserMode::kTokenizer, token_provider_->filename(), + unexpanded_macro_item, /*start_offset=*/0, + token_provider_->language_options()); + + ZETASQL_ASSIGN_OR_RETURN(StackFrame stack_frame, + MakeStackFrame(unexpanded_macro_start_point)); + + ZETASQL_RETURN_IF_ERROR( + ExpandMacrosInternal(std::move(child_token_provider), macro_catalog_, + arena_, call_arguments_, error_message_options_, + &stack_frame, expanded_tokens, warnings_, + /*out_max_arg_ref_index=*/nullptr)); + + ZETASQL_RET_CHECK(!expanded_tokens.empty()) + << "A proper expansion should have at least the YYEOF token at " + "the end. Failure was when expanding " + << literal_token.text; + ZETASQL_RET_CHECK_EQ(expanded_tokens.back().kind, YYEOF); + // Pop the trailing space, which is stored on the YYEOF's + // preceding_whitespaces.
+ expanded_tokens.pop_back(); + + for (TokenWithLocation& expanded_token : expanded_tokens) { + // Ensure the incoming text can be safely placed in the current literal + if (!IsQuotedLiteral(expanded_token)) { + absl::StrAppend(&content, expanded_token.preceding_whitespaces); + absl::StrAppend(&content, expanded_token.text); + } else { + absl::string_view extracted_token_content; + ZETASQL_ASSIGN_OR_RETURN(QuotingSpec inner_spec, + QuotingSpec::FindQuotingKind(expanded_token.text, + extracted_token_content)); + if (quoting.literal_kind() != inner_spec.literal_kind()) { + return MakeSqlErrorAt( + unexpanded_macro_start_point, + absl::StrFormat("Cannot expand a %s into a %s.", + inner_spec.Description(), quoting.Description())); + } + + if (AreSame(quoting, inner_spec)) { + // Expand in the same literal since it's identical, but remove the + // quotes. + ZETASQL_RET_CHECK(expanded_token.preceding_whitespaces.empty()); + absl::StrAppend(&content, extracted_token_content); + } else { + // We need to break the literal + TokenWithLocation current_literal = literal_token; + current_literal.text = + AllocateString(QuoteText(content, quoting), arena_); + ZETASQL_ASSIGN_OR_RETURN(current_literal, + AdvancePendingToken(std::move(pending_token), + std::move(current_literal))); + output_token_buffer_.Push(std::move(current_literal)); + if (expanded_token.preceding_whitespaces.empty()) { + // The literal components need some whitespace separation. + // see (broken link). 
+ expanded_token.preceding_whitespaces = " "; + } + output_token_buffer_.Push(std::move(expanded_token)); + + pending_token = {.kind = YYEOF, + .location = literal_token.location, + .text = "", + .preceding_whitespaces = ""}; + + content.clear(); + literal_token.preceding_whitespaces = " "; + } + } + } + } + + literal_token.text = AllocateString(QuoteText(content, quoting), arena_); + return literal_token; +} + +absl::Status MacroExpander::ExpandMacrosInternal( + std::unique_ptr<FlexTokenProvider> token_provider, + const MacroCatalog& macro_catalog, zetasql_base::UnsafeArena* arena, + const std::vector<std::vector<TokenWithLocation>>& call_arguments, + ErrorMessageOptions error_message_options, StackFrame* parent_location, + std::vector<TokenWithLocation>& output_token_list, + std::vector<absl::Status>& warnings, int* out_max_arg_ref_index) { + auto expander = absl::WrapUnique(new MacroExpander( + std::move(token_provider), macro_catalog, arena, call_arguments, + error_message_options, parent_location)); + do { + ZETASQL_ASSIGN_OR_RETURN(TokenWithLocation token, expander->GetNextToken()); + output_token_list.push_back(std::move(token)); + } while (output_token_list.back().kind != YYEOF); + + std::vector<absl::Status> new_warnings = expander->ReleaseWarnings(); + std::move(new_warnings.begin(), new_warnings.end(), + std::back_inserter(warnings)); + if (out_max_arg_ref_index != nullptr) { + *out_max_arg_ref_index = expander->max_arg_ref_index_; + } + return absl::OkStatus(); +} + +} // namespace macros +} // namespace parser +} // namespace zetasql diff --git a/zetasql/parser/macros/macro_expander.h b/zetasql/parser/macros/macro_expander.h new file mode 100644 index 000000000..47d75138f --- /dev/null +++ b/zetasql/parser/macros/macro_expander.h @@ -0,0 +1,327 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#ifndef ZETASQL_PARSER_MACROS_MACRO_EXPANDER_H_ +#define ZETASQL_PARSER_MACROS_MACRO_EXPANDER_H_ + +#include +#include +#include +#include + +#include "zetasql/base/arena.h" +#include "zetasql/parser/bison_parser_mode.h" +#include "zetasql/parser/macros/flex_token_provider.h" +#include "zetasql/parser/macros/macro_catalog.h" +#include "zetasql/parser/macros/token_with_location.h" +#include "zetasql/public/error_helpers.h" +#include "zetasql/public/error_location.pb.h" +#include "zetasql/public/language_options.h" +#include "zetasql/public/parse_location.h" +#include "zetasql/base/check.h" +#include "absl/status/status.h" +#include "absl/status/statusor.h" +#include "absl/strings/string_view.h" + +namespace zetasql { +namespace parser { +namespace macros { + +// Interface for the macro expander +class MacroExpanderBase { + public: + virtual ~MacroExpanderBase() = default; + + virtual absl::StatusOr<TokenWithLocation> GetNextToken() = 0; + virtual int num_unexpanded_tokens_consumed() const = 0; +}; + +// Encapsulates outputs for the non-streaming API +struct ExpansionOutput { + std::vector<TokenWithLocation> expanded_tokens; + std::vector<absl::Status> warnings; + std::unique_ptr<zetasql_base::UnsafeArena> arena; +}; + +// ZetaSQL's implementation of the macro expander.
+class MacroExpander final : public MacroExpanderBase { + public: + struct StackFrame { + ErrorSource error_source; + StackFrame* parent; + }; + + MacroExpander(BisonParserMode mode, absl::string_view filename, + absl::string_view input, int start_offset, + const LanguageOptions& language_options, + const MacroCatalog& macro_catalog, zetasql_base::UnsafeArena* arena, + ErrorMessageOptions error_message_options, + StackFrame* parent_location) + : MacroExpander( + std::make_unique<FlexTokenProvider>(mode, filename, input, + start_offset, language_options), + macro_catalog, arena, error_message_options, parent_location) {} + + MacroExpander(std::unique_ptr<FlexTokenProvider> token_provider, + const MacroCatalog& macro_catalog, zetasql_base::UnsafeArena* arena, + ErrorMessageOptions error_message_options, + StackFrame* parent_location); + + MacroExpander(const MacroExpander&) = delete; + MacroExpander& operator=(const MacroExpander&) = delete; + + absl::StatusOr<TokenWithLocation> GetNextToken() override; + + std::vector<absl::Status> ReleaseWarnings() { + std::vector<absl::Status> tmp; + std::swap(tmp, warnings_); + return tmp; + } + + int num_unexpanded_tokens_consumed() const override; + + // Convenient non-streaming API to return all expanded tokens. + static absl::StatusOr<ExpansionOutput> ExpandMacros( + absl::string_view filename, absl::string_view input, + const MacroCatalog& macro_catalog, + const LanguageOptions& language_options, + ErrorMessageOptions error_message_options = {}); + + private: + MacroExpander( + std::unique_ptr<FlexTokenProvider> token_provider, + const MacroCatalog& macro_catalog, zetasql_base::UnsafeArena* arena, + const std::vector<std::vector<TokenWithLocation>> call_arguments, + ErrorMessageOptions error_message_options, StackFrame* parent_location) + : token_provider_(std::move(token_provider)), + macro_catalog_(macro_catalog), + arena_(arena), + call_arguments_(std::move(call_arguments)), + error_message_options_(error_message_options), + parent_location_(parent_location) {} + + // Because this function may be called internally (e.g.
when expanding + // a nested macro), it appends to `out_warnings`, instead of replacing it. + static absl::Status ExpandMacrosInternal( + std::unique_ptr<FlexTokenProvider> token_provider, + const MacroCatalog& macro_catalog, zetasql_base::UnsafeArena* arena, + const std::vector<std::vector<TokenWithLocation>>& call_arguments, + ErrorMessageOptions error_message_options, StackFrame* parent_location, + std::vector<TokenWithLocation>& output_token_list, + std::vector<absl::Status>& out_warnings, int* out_max_arg_ref_index); + + class TokenBuffer { + public: + void Push(TokenWithLocation token) { tokens_.push(std::move(token)); } + + bool empty() const { return tokens_.empty(); } + + // Consumes the next token from the buffer. + // REQUIRES: the buffer must not be empty. + TokenWithLocation ConsumeToken() { + ABSL_DCHECK(!tokens_.empty()); + TokenWithLocation token = std::move(tokens_.front()); + tokens_.pop(); + return token; + } + + private: + std::queue<TokenWithLocation> tokens_; + }; + + // Loads the next chunk of tokens that might be needed to splice the next + // token, until we hit EOF or a token that we know will absolutely never + // contribute to the current token. The candidates are loaded into + // `splicing_buffer_`. + // REQUIRES: `splicing_buffer_` must be empty. + absl::Status LoadPotentiallySplicingTokens(); + + // If we have an argument list, read it to be part of the splicing buffer. + absl::Status LoadArgsIfAny(); + + // We have already consumed the opening parenthesis. Keep reading until + // they are balanced back. + absl::Status LoadUntilParenthesesBalance(); + + // Expands everything in 'splicing_buffer_' and puts the resulting finalized + // tokens into 'output_token_buffer_' + absl::Status ExpandPotentiallySplicingTokens(); + + // Pushes `incoming_token` to `pending_token`. + // 1. If `pending_token` already has a token and is not just pending + // whitespaces, it is first flushed to the output buffer. + // 2. Otherwise, any pending whitespaces on `pending_token` are prepended to + // the preceding whitespaces of `incoming_token`.
+ absl::StatusOr<TokenWithLocation> AdvancePendingToken( + TokenWithLocation pending_token, TokenWithLocation incoming_token); + + // Parses the invocation arguments (each argument must have balanced + // parentheses) and expands the arguments. + absl::Status ParseAndExpandArgs( + const TokenWithLocation& unexpanded_macro_invocation_token, + std::vector<std::vector<TokenWithLocation>>& expanded_args); + + // Expands the given macro invocation or argument reference and handles any + // splicing needed with the tokens around the invocation/argument reference. + // Returns the updated pending_token to reflect the state needed when + // deciding how to splice with the next token after the macro item. + // + // REQUIRES: For an invocation, the full argument list must have already + // been loaded into the splicing buffer. + // + // The method simply expands the given macro item, and handles any necessary + // splicing as follows: + // 1. If the expansion is empty, preserve the space before the invocation. + // This is reflected in the pending_token. + // 2. Otherwise, splice the first token with the pending token if needed. + // 3. The function returns with the last token set as the pending token. (It + // could be the first token if there is only one, possibly already spliced + // with the previous pending token.) + absl::StatusOr<TokenWithLocation> ExpandAndMaybeSpliceMacroItem( + TokenWithLocation unexpanded_macro_token, + TokenWithLocation pending_token); + + absl::Status ExpandMacroArgumentReference( + const TokenWithLocation& token, + std::vector<TokenWithLocation>& expanded_tokens); + + // Expands the macro invocation starting at the given token. + // REQUIRES: The macro definition must have already been loaded from the + // macro catalog. + absl::Status ExpandMacroInvocation( + const TokenWithLocation& token, absl::string_view macro_definition, + std::vector<TokenWithLocation>& expanded_tokens); + + // Expands a string literal or a quoted identifier.
+ absl::StatusOr<TokenWithLocation> ExpandLiteral( + TokenWithLocation pending_token, TokenWithLocation literal_token); + + // Creates a new token by appending the new text. + // Location is passed separately because the spliced tokens can come from + // different expansions. We need to report at the common level of expansion + // to get the line & column number translation correct. + // REQUIRES: neither `pending_token` nor `incoming_token_text` can be empty. + absl::StatusOr<TokenWithLocation> Splice( + TokenWithLocation pending_token, absl::string_view incoming_token_text, + const ParseLocationPoint& location); + + // Returns the given status as error if expanding in strict mode, or adds it + // as a warning otherwise. + // Note that not all problematic conditions can be relegated to warnings. + // For example, a macro invocation with unbalanced parens is always an error. + absl::Status RaiseErrorOrAddWarning(absl::Status status); + + // Returns a string_view over the concatenation of the 2 input strings. + // If both inputs are non-empty, the concatenation is stored on `arena_`. + // Otherwise, the returned string_view points to the non-empty input. + // If both are empty, can return either. + absl::string_view MaybeAllocateConcatenation(absl::string_view a, + absl::string_view b); + + // Returns true if this expander is strict, and false if it is lenient. + bool IsStrict() const; + + // Consumes the next token from the input buffer, raising a warning on invalid + // tokens when in lenient mode. + absl::StatusOr<TokenWithLocation> ConsumeInputToken(); + + // Creates an INVALID_ARGUMENT status with the given message at the given + // location, based on this expander's `error_message_options_`. + absl::Status MakeSqlErrorAt(const ParseLocationPoint& location, + absl::string_view message); + + // Creates a stackframe from the given location, which must be valid for + // the filename and input of the underlying `token_provider_`.
+ absl::StatusOr<StackFrame> MakeStackFrame( + const ParseLocationPoint& location) const; + + std::unique_ptr<FlexTokenProvider> token_provider_; + + // The macro catalog which contains current definitions. + // Never changes during the expansion of a statement. + const MacroCatalog& macro_catalog_; + + // Used to allocate strings for spliced tokens. Must stay valid as long as + // the tokens referring to the spliced strings are still alive. + // IMPORTANT: The strings in the arena should never be modified, because they + // store their buffers in the arena as well. AllocateString() returns a + // string_view to enforce this. + zetasql_base::UnsafeArena* arena_ = nullptr; + + // Used when we are expanding potentially splicing tokens, for example: + // $prefix(arg1)some_id$suffix1($somearg(a))$suffix2 + // When it is not empty, it means that we need to expand these tokens + // before we are sure that we are ready to output a token. + std::queue<TokenWithLocation> splicing_buffer_; + + // Contains finalized tokens. Anything here will never splice with something + // coming after. + TokenBuffer output_token_buffer_; + + // If we are in a macro invocation, contains the expanded arguments of the + // call. This list is never empty, except at the top level, outside of any + // invocations. In an invocation, $0 is the macro name, and even if no args + // are passed, the 0th argument is the macro name. Every argument, including + // the 0th one, ends in YYEOF. + const std::vector<std::vector<TokenWithLocation>> call_arguments_; + + // Controls error message options. + ErrorMessageOptions error_message_options_; + + // Holds whitespaces that will prepend whatever comes next. For example, when + // expanding ` $empty $empty$empty 123`, we do not assume the first $empty + // will splice with anything, so when it turns out it has an empty expansion, + // the spaces before need to be held somewhere, and so on with all $empty + // expansions until we hit the first token (or EOF) that we can emit.
+ // + // Always backed by a string in the arena, except when it is reset, where it + // takes an empty literal, just like the initialization here. + absl::string_view pending_whitespaces_ = ""; + + // Warnings generated by this expander. + std::vector<absl::Status> warnings_; + + // Maximum number of warnings to report. + const int max_warnings_ = 20; + + // Holds the highest index seen for a macro argument reference, e.g. $1, $2, + // etc. Useful when expanding an invocation, to report back the highest index + // seen so that the invocation can give a warning or error on unused + // arguments. + int max_arg_ref_index_ = 0; + + // True only at the beginning, or after a semicolon. Useful when detecting + // top-level DEFINE MACRO statements. + // IMPORTANT: it is relevant only at the top-level (i.e., call_arguments_ + // is empty) + bool at_statement_start_ = true; + + // This is a mini-parser to detect when we are in the body of a macro + // definition, in which case nothing is expanded until we exit, either at + // EOI or semicolon. + // IMPORTANT: it is relevant only at the top-level (i.e., call_arguments_ + // is empty) + bool inside_macro_definition_ = false; + + // Tracks the current stack of macro expansions up to the parent. + StackFrame* parent_location_ = nullptr; +}; + +} // namespace macros +} // namespace parser +} // namespace zetasql + +#endif // ZETASQL_PARSER_MACROS_MACRO_EXPANDER_H_ diff --git a/zetasql/parser/macros/macro_expander_test.cc b/zetasql/parser/macros/macro_expander_test.cc new file mode 100644 index 000000000..b653e9e31 --- /dev/null +++ b/zetasql/parser/macros/macro_expander_test.cc @@ -0,0 +1,1353 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#include "zetasql/parser/macros/macro_expander.h" + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "zetasql/base/arena.h" +#include "zetasql/base/testing/status_matchers.h" +#include "zetasql/parser/bison_parser_mode.h" +#include "zetasql/parser/bison_token_codes.h" +#include "zetasql/parser/macros/flex_token_provider.h" +#include "zetasql/parser/macros/macro_catalog.h" +#include "zetasql/parser/macros/quoting.h" +#include "zetasql/parser/macros/standalone_macro_expansion.h" +#include "zetasql/parser/macros/token_with_location.h" +#include "zetasql/public/error_helpers.h" +#include "zetasql/public/language_options.h" +#include "zetasql/public/options.pb.h" +#include "zetasql/public/parse_location.h" +#include "gmock/gmock.h" +#include "gtest/gtest.h" +#include "absl/container/flat_hash_map.h" +#include "absl/status/status.h" +#include "absl/status/statusor.h" +#include "absl/strings/match.h" +#include "absl/strings/str_cat.h" +#include "absl/strings/str_format.h" +#include "absl/strings/string_view.h" + +namespace zetasql { +namespace parser { +namespace macros { + +using ::testing::_; +using ::testing::AllOf; +using ::testing::Bool; +using ::testing::Combine; +using ::testing::Each; +using ::testing::ElementsAre; +using ::testing::Eq; +using ::testing::ExplainMatchResult; +using ::testing::FieldsAre; +using ::testing::HasSubstr; +using ::testing::IsEmpty; +using ::testing::Pointwise; +using ::testing::SizeIs; +using ::testing::Values; +using ::testing::ValuesIn; +using 
::zetasql_base::testing::IsOkAndHolds;
+using ::zetasql_base::testing::StatusIs;
+
+// Template specializations to print tokens & warnings in failed test messages.
+static void PrintTo(const TokenWithLocation& token, std::ostream* os) {
+  *os << absl::StrFormat(
+      "(kind: %i, location: %s, text: '%s', prev_spaces: '%s')", token.kind,
+      token.location.GetString(), token.text, token.preceding_whitespaces);
+}
+
+template <typename T>
+static void PrintTo(const std::vector<T>& tokens, std::ostream* os) {
+  *os << "Tokens: [\n";
+  for (const auto& token : tokens) {
+    *os << "  ";
+    PrintTo(token, os);
+    *os << "\n";
+  }
+  *os << "]\n";
+}
+
+template <typename T>
+static std::string ToString(const std::vector<T>& tokens) {
+  std::stringstream os;
+  PrintTo(tokens, &os);
+  return os.str();
+}
+
+static void PrintTo(const std::vector<absl::Status>& warnings,
+                    std::ostream* os) {
+  *os << "Warnings: [\n";
+  for (const auto& warning : warnings) {
+    *os << "  " << warning << "\n";
+  }
+  *os << "]\n";
+}
+
+static void PrintTo(const ExpansionOutput& expansion_output, std::ostream* os) {
+  *os << "ExpansionOutput: {\n";
+  PrintTo(expansion_output.expanded_tokens, os);
+  PrintTo(expansion_output.warnings, os);
+  *os << "}\n";
+}
+
+MATCHER_P(TokenIs, expected, "") {
+  return ExplainMatchResult(
+      FieldsAre(Eq(expected.kind), Eq(expected.location), Eq(expected.text),
+                Eq(expected.preceding_whitespaces)),
+      arg, result_listener);
+}
+
+MATCHER(TokenEq, "") {
+  return ExplainMatchResult(TokenIs(::testing::get<0>(arg)),
+                            ::testing::get<1>(arg), result_listener);
+}
+
+MATCHER_P(TokensEq, expected, ToString(expected)) {
+  return ExplainMatchResult(Pointwise(TokenEq(), expected), arg.expanded_tokens,
+                            result_listener);
+}
+
+template <typename M>
+std::string StatusMatcherToString(const M& m) {
+  std::stringstream os;
+  ((::testing::Matcher<std::vector<absl::Status>>)m).DescribeTo(&os);
+  return os.str();
+}
+
+MATCHER_P(HasTokens, m,
+          absl::StrCat("Has expanded tokens that ", StatusMatcherToString(m))) {
+  return ExplainMatchResult(m, arg.expanded_tokens,
result_listener);
+}
+
+MATCHER_P(HasWarnings, m,
+          absl::StrCat("Has warnings that ", StatusMatcherToString(m))) {
+  return ExplainMatchResult(m, arg.warnings, result_listener);
+}
+
+static LanguageOptions GetLanguageOptions(bool is_strict) {
+  LanguageOptions language_options = LanguageOptions();
+  language_options.EnableLanguageFeature(FEATURE_V_1_4_SQL_MACROS);
+  if (is_strict) {
+    language_options.EnableLanguageFeature(FEATURE_V_1_4_ENFORCE_STRICT_MACROS);
+  }
+  return language_options;
+}
+
+static Location MakeLocation(absl::string_view filename, int start_offset,
+                             int end_offset) {
+  return Location(ParseLocationPoint::FromByteOffset(filename, start_offset),
+                  ParseLocationPoint::FromByteOffset(filename, end_offset));
+}
+
+static absl::string_view kTopFileName = "top_file.sql";
+
+static Location MakeLocation(int start_offset, int end_offset) {
+  return MakeLocation(kTopFileName, start_offset, end_offset);
+}
+
+static absl::StatusOr<ExpansionOutput> ExpandMacros(
+    const absl::string_view text, const MacroCatalog& macro_catalog,
+    const LanguageOptions& language_options) {
+  return MacroExpander::ExpandMacros(
+      kTopFileName, text, macro_catalog, language_options,
+      {.mode = ErrorMessageMode::ERROR_MESSAGE_ONE_LINE});
+}
+
+TEST(MacroExpanderTest, ExpandsEmptyMacros) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"empty", ""});
+
+  EXPECT_THAT(ExpandMacros("\t$empty\r\n$empty$empty", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+                  {YYEOF, MakeLocation(21, 21), "", "\t\r\n"}})));
+}
+
+TEST(MacroExpanderTest, TrailingWhitespaceIsMovedToEofToken) {
+  MacroCatalog macro_catalog;
+
+  EXPECT_THAT(ExpandMacros(";\t", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+                  {';', MakeLocation(0, 1), ";", ""},
+                  {YYEOF, MakeLocation(2, 2), "", "\t"}})));
+}
+
+TEST(MacroExpanderTest, ErrorsCanPrintLocation) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"m", 
"$unknown"});
+
+  EXPECT_THAT(
+      ExpandMacros("\t$empty\r\n$empty$empty", macro_catalog,
+                   GetLanguageOptions(/*is_strict=*/true)),
+      StatusIs(_, Eq("Macro 'empty' not found. [at top_file.sql:1:9]")));
+}
+
+TEST(MacroExpanderTest, TracksCountOfUnexpandedTokensConsumedIncludingEOF) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"empty", ""});
+
+  LanguageOptions options = GetLanguageOptions(/*is_strict=*/false);
+
+  auto token_provider = std::make_unique<FlexTokenProvider>(
+      BisonParserMode::kTokenizer, kTopFileName, "\t$empty\r\n$empty$empty",
+      /*start_offset=*/0, options);
+  auto arena = std::make_unique<zetasql_base::UnsafeArena>(/*block_size=*/1024);
+  MacroExpander expander(std::move(token_provider), macro_catalog, arena.get(),
+                         ErrorMessageOptions{}, /*parent_location=*/nullptr);
+
+  ASSERT_THAT(expander.GetNextToken(),
+              IsOkAndHolds(TokenIs(TokenWithLocation{
+                  YYEOF, MakeLocation(21, 21), "", "\t\r\n"})));
+  EXPECT_EQ(expander.num_unexpanded_tokens_consumed(), 4);
+}
+
+TEST(MacroExpanderTest,
+     TracksCountOfUnexpandedTokensConsumedButNotFromDefinitions) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"m", "1 2 3"});
+
+  LanguageOptions options = GetLanguageOptions(/*is_strict=*/false);
+
+  auto token_provider = std::make_unique<FlexTokenProvider>(
+      BisonParserMode::kTokenizer, kTopFileName, "$m", /*start_offset=*/0,
+      options);
+  auto arena = std::make_unique<zetasql_base::UnsafeArena>(/*block_size=*/1024);
+  MacroExpander expander(std::move(token_provider), macro_catalog, arena.get(),
+                         ErrorMessageOptions{}, /*parent_location=*/nullptr);
+
+  ASSERT_THAT(expander.GetNextToken(),
+              IsOkAndHolds(TokenIs(TokenWithLocation{
+                  INTEGER_LITERAL, MakeLocation("macro:m", 0, 1), "1", ""})));
+  ASSERT_THAT(expander.GetNextToken(),
+              IsOkAndHolds(TokenIs(TokenWithLocation{
+                  INTEGER_LITERAL, MakeLocation("macro:m", 2, 3), "2", " "})));
+  ASSERT_THAT(expander.GetNextToken(),
+              IsOkAndHolds(TokenIs(TokenWithLocation{
+                  INTEGER_LITERAL, MakeLocation("macro:m", 4, 5), "3", " "})));
+  ASSERT_THAT(expander.GetNextToken(),
+              
IsOkAndHolds(TokenIs(
+                  TokenWithLocation{YYEOF, MakeLocation(2, 2), "", ""})));
+
+  // We count 2 unexpanded tokens: $m and YYEOF. Tokens in $m's definition
+  // do not count.
+  EXPECT_EQ(expander.num_unexpanded_tokens_consumed(), 2);
+}
+
+TEST(MacroExpanderTest, ExpandsEmptyMacrosSplicedWithIntLiterals) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"empty", ""});
+  macro_catalog.insert({"int", "123"});
+
+  EXPECT_THAT(ExpandMacros("\n$empty()1", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+                  {INTEGER_LITERAL, MakeLocation(9, 10), "1", "\n"},
+                  {YYEOF, MakeLocation(10, 10), "", ""}})));
+
+  EXPECT_THAT(
+      ExpandMacros("\n$empty()$int", macro_catalog,
+                   GetLanguageOptions(/*is_strict=*/false)),
+      IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+          // Location is from the macro it was pulled from.
+          // We can stack it later for nested invocations.
+          {INTEGER_LITERAL, MakeLocation("macro:int", 0, 3), "123", "\n"},
+          {YYEOF, MakeLocation(13, 13), "", ""}})));
+
+  EXPECT_THAT(ExpandMacros("\n1$empty()", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+                  {INTEGER_LITERAL, MakeLocation(1, 2), "1", "\n"},
+                  {YYEOF, MakeLocation(10, 10), "", ""}})));
+
+  EXPECT_THAT(
+      ExpandMacros("\n$int()$empty", macro_catalog,
+                   GetLanguageOptions(/*is_strict=*/false)),
+      IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+          {INTEGER_LITERAL, MakeLocation("macro:int", 0, 3), "123", "\n"},
+          {YYEOF, MakeLocation(13, 13), "", ""}})));
+}
+
+TEST(MacroExpanderTest,
+     ExpandsIdentifiersSplicedWithEmptyMacrosAndIntLiterals) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"identifier", "abc"});
+  macro_catalog.insert({"empty", ""});
+  macro_catalog.insert({"int", "123"});
+
+  EXPECT_THAT(ExpandMacros("\na$empty()1", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+                  {IDENTIFIER, MakeLocation(1, 2), "a1", "\n"},
+                  {YYEOF, MakeLocation(11, 11), "", ""}})));
+
+  
EXPECT_THAT(ExpandMacros("\na$empty()$int", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+                  {IDENTIFIER, MakeLocation(1, 2), "a123", "\n"},
+                  {YYEOF, MakeLocation(14, 14), "", ""}})));
+
+  EXPECT_THAT(
+      ExpandMacros("\n$identifier$empty()1", macro_catalog,
+                   GetLanguageOptions(/*is_strict=*/false)),
+      IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+          {IDENTIFIER, MakeLocation("macro:identifier", 0, 3), "abc1", "\n"},
+          {YYEOF, MakeLocation(21, 21), "", ""}})));
+
+  EXPECT_THAT(
+      ExpandMacros("\n$identifier$empty()$int", macro_catalog,
+                   GetLanguageOptions(/*is_strict=*/false)),
+      IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+          {IDENTIFIER, MakeLocation("macro:identifier", 0, 3), "abc123", "\n"},
+          {YYEOF, MakeLocation(24, 24), "", ""}})));
+}
+
+TEST(MacroExpanderTest, CanExpandWithoutArgsAndNoSplicing) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"prefix", "xyz"});
+  macro_catalog.insert({"suffix1", "123"});
+  macro_catalog.insert({"suffix2", "456 abc"});
+  macro_catalog.insert({"empty", ""});
+
+  EXPECT_THAT(
+      ExpandMacros("select abc tbl_\t$empty+ $suffix1 $suffix2 $prefix\t",
+                   macro_catalog, GetLanguageOptions(/*is_strict=*/false)),
+      IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+          {KW_SELECT, MakeLocation(0, 6), "select", ""},
+          {IDENTIFIER, MakeLocation(7, 10), "abc", " "},
+          {IDENTIFIER, MakeLocation(11, 15), "tbl_", " "},
+          {'+', MakeLocation(22, 23), "+", "\t"},
+          {INTEGER_LITERAL, MakeLocation("macro:suffix1", 0, 3), "123", " "},
+          {INTEGER_LITERAL, MakeLocation("macro:suffix2", 0, 3), "456", " "},
+          {IDENTIFIER, MakeLocation("macro:suffix2", 4, 7), "abc", " "},
+          {IDENTIFIER, MakeLocation("macro:prefix", 0, 3), "xyz", " "},
+          {YYEOF, MakeLocation(50, 50), "", "\t"}})));
+}
+
+TEST(MacroExpanderTest, CanExpandWithoutArgsWithSplicing) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"identifier", "xyz"});
+  macro_catalog.insert({"numbers", "123"});
+  macro_catalog.insert({"multiple_tokens", "456 pq abc"});
+  
macro_catalog.insert({"empty", " "});
+
+  EXPECT_THAT(
+      ExpandMacros("select tbl_$numbers$multiple_tokens$empty$identifier "
+                   "$empty$identifier$numbers$empty a+b",
+                   macro_catalog, GetLanguageOptions(/*is_strict=*/false)),
+      IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+          {KW_SELECT, MakeLocation(0, 6), "select", ""},
+          {IDENTIFIER, MakeLocation(7, 11), "tbl_123456", " "},
+          {IDENTIFIER, MakeLocation("macro:multiple_tokens", 4, 6), "pq", " "},
+          {IDENTIFIER, MakeLocation("macro:multiple_tokens", 7, 10), "abcxyz",
+           " "},
+          {IDENTIFIER, MakeLocation("macro:identifier", 0, 3), "xyz123", " "},
+          {IDENTIFIER, MakeLocation(85, 86), "a", " "},
+          {'+', MakeLocation(86, 87), "+", ""},
+          {IDENTIFIER, MakeLocation(87, 88), "b", ""},
+          {YYEOF, MakeLocation(88, 88), "", ""},
+      })));
+}
+
+TEST(MacroExpanderTest, KeywordsCanSpliceToFormIdentifiers) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"suffix", "123"});
+  macro_catalog.insert({"empty", " "});
+
+  EXPECT_THAT(
+      ExpandMacros("FROM$suffix $empty()FROM$suffix FROM$empty$suffix "
+                   "$empty()FROM$empty()2",
+                   macro_catalog, GetLanguageOptions(/*is_strict=*/false)),
+      IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+          // Note: spliced tokens take the first unexpanded location
+          {IDENTIFIER, MakeLocation(0, 4), "FROM123", ""},
+          {IDENTIFIER, MakeLocation(21, 25), "FROM123", " "},
+          {IDENTIFIER, MakeLocation(34, 38), "FROM123", " "},
+          {IDENTIFIER, MakeLocation(61, 65), "FROM2", " "},
+          {YYEOF, MakeLocation(74, 74), "", ""}})));
+}
+
+TEST(MacroExpanderTest, SpliceMacroInvocationWithIdentifier_Lenient) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"m", " a "});
+
+  EXPECT_THAT(
+      ExpandMacros("$m()b", macro_catalog,
+                   GetLanguageOptions(/*is_strict=*/false)),
+      IsOkAndHolds(AllOf(
+          TokensEq(std::vector<TokenWithLocation>{
+              {IDENTIFIER, MakeLocation("macro:m", 2, 3), "ab", ""},
+              {YYEOF, MakeLocation(5, 5), "", ""}}),
+          HasWarnings(ElementsAre(StatusIs(
+              _, Eq("Splicing tokens (a) and (b) [at top_file.sql:1:5]")))))));
+}
+
+TEST(MacroExpanderTest, SpliceMacroInvocationWithIdentifier_Strict) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"m", " a "});
+
+  std::vector<absl::Status> warnings;
+  EXPECT_THAT(
+      ExpandMacros("$m()b", macro_catalog,
+                   GetLanguageOptions(/*is_strict=*/true)),
+      StatusIs(_, Eq("Splicing tokens (a) and (b) [at top_file.sql:1:5]")));
+}
+
+TEST(MacroExpanderTest, StrictProducesErrorOnIncompatibleQuoting) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"single_quoted", "'sq'"});
+
+  ASSERT_THAT(ExpandMacros("select `ab$single_quoted`", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/true)),
+              IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+                  {KW_SELECT, MakeLocation(0, 6), "select", ""},
+                  {IDENTIFIER, MakeLocation(7, 25), "`ab$single_quoted`", " "},
+                  {YYEOF, MakeLocation(25, 25), "", ""},
+              })));
+}
+
+TEST(MacroExpanderTest, LenientCanHandleMixedQuoting) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"single_quoted", "'sq'"});
+
+  ASSERT_THAT(
+      ExpandMacros("select `ab$single_quoted`", macro_catalog,
+                   GetLanguageOptions(/*is_strict=*/false)),
+      StatusIs(_,
+               Eq("Cannot expand a string literal(single "
+                  "quote) into a quoted identifier. 
[at top_file.sql:1:11]")));
+}
+
+TEST(MacroExpanderTest, CanHandleUnicode) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"unicode", "'😀'"});
+
+  EXPECT_THAT(ExpandMacros("'$😀$unicode'", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+                  {STRING_LITERAL, MakeLocation(0, 15), "'$😀😀'", ""},
+                  {YYEOF, MakeLocation(15, 15), "", ""},
+              })));
+}
+
+TEST(MacroExpanderTest, ExpandsWithinQuotedIdentifiers_SingleToken) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"identifier", "xyz"});
+  macro_catalog.insert({"numbers", "456"});
+  macro_catalog.insert({"inner_quoted_id", "`bq`"});
+  macro_catalog.insert({"empty", " "});
+
+  EXPECT_THAT(
+      ExpandMacros("select `ab$identifier$numbers$empty$inner_quoted_id`",
+                   macro_catalog, GetLanguageOptions(/*is_strict=*/false)),
+      IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+          {KW_SELECT, MakeLocation(0, 6), "select", ""},
+          {IDENTIFIER, MakeLocation(7, 52), "`abxyz456bq`", " "},
+          {YYEOF, MakeLocation(52, 52), "", ""},
+      })));
+}
+
+TEST(MacroExpanderTest, ExpandsWithinQuotedIdentifiers_SingleToken_WithSpaces) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"identifier", "xyz"});
+  macro_catalog.insert({"numbers", "456"});
+  macro_catalog.insert({"inner_quoted_id", "`bq`"});
+  macro_catalog.insert({"empty", " "});
+
+  EXPECT_THAT(
+      ExpandMacros("select ` ab$identifier$numbers$empty$inner_quoted_id `\n",
+                   macro_catalog, GetLanguageOptions(/*is_strict=*/false)),
+      IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+          {KW_SELECT, MakeLocation(0, 6), "select", ""},
+          {IDENTIFIER, MakeLocation(7, 56), "` abxyz456bq `", " "},
+          {YYEOF, MakeLocation(57, 57), "", "\n"},
+      })));
+}
+
+TEST(MacroExpanderTest, ExpandsWithinQuotedIdentifiers_MultipleTokens) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"identifier", "xyz"});
+  macro_catalog.insert({"numbers", "123 456"});
+  macro_catalog.insert({"inner_quoted_id", "`bq`"});
+  macro_catalog.insert({"empty", " "});
+
+  EXPECT_THAT(
+      
ExpandMacros(
+          "select\n` ab$identifier$numbers$empty$inner_quoted_id cd\t\t`\n",
+          macro_catalog, GetLanguageOptions(/*is_strict=*/false)),
+
+      IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+          {KW_SELECT, MakeLocation(0, 6), "select", ""},
+          {IDENTIFIER, MakeLocation(7, 60), "` abxyz123 456bq cd\t\t`", "\n"},
+          {YYEOF, MakeLocation(61, 61), "", "\n"},
+      })));
+}
+
+TEST(MacroExpanderTest, UnknownMacrosAreLeftUntouched) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"ints", "\t\t\t1 2\t\t\t"});
+
+  EXPECT_THAT(
+      ExpandMacros(" $x$y(a\n,\t$ints\n\t,\t$z,\n$w(\t\tb\n\n,\t\r)\t\n) ",
+                   macro_catalog, GetLanguageOptions(/*is_strict=*/false)),
+      IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+          {MACRO_INVOCATION, MakeLocation(2, 4), "$x", " "},
+          {MACRO_INVOCATION, MakeLocation(4, 6), "$y", ""},
+          {'(', MakeLocation(6, 7), "(", ""},
+          {IDENTIFIER, MakeLocation(7, 8), "a", ""},
+          {',', MakeLocation(9, 10), ",", "\n"},
+          {INTEGER_LITERAL, MakeLocation("macro:ints", 3, 4), "1", "\t"},
+          {INTEGER_LITERAL, MakeLocation("macro:ints", 5, 6), "2", " "},
+          {',', MakeLocation(18, 19), ",", "\n\t"},
+          {MACRO_INVOCATION, MakeLocation(20, 22), "$z", "\t"},
+          {',', MakeLocation(22, 23), ",", ""},
+          {MACRO_INVOCATION, MakeLocation(24, 26), "$w", "\n"},
+          {'(', MakeLocation(26, 27), "(", ""},
+          {IDENTIFIER, MakeLocation(29, 30), "b", "\t\t"},
+          {',', MakeLocation(32, 33), ",", "\n\n"},
+          {')', MakeLocation(35, 36), ")", "\t\r"},
+          {')', MakeLocation(38, 39), ")", "\t\n"},
+          {YYEOF, MakeLocation(42, 42), "", " "}})));
+}
+
+TEST(MacroExpanderTest, UnknownMacrosAreLeftUntouched_EmptyArgList) {
+  MacroCatalog macro_catalog;
+
+  EXPECT_THAT(ExpandMacros("\t$w(\n$z(\n)\t) ", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+                  {MACRO_INVOCATION, MakeLocation(1, 3), "$w", "\t"},
+                  {'(', MakeLocation(3, 4), "(", ""},
+                  {MACRO_INVOCATION, MakeLocation(5, 7), "$z", "\n"},
+                  {'(', MakeLocation(7, 8), "(", ""},
+                  {')', MakeLocation(9, 10), ")", "\n"},
+                  {')',
MakeLocation(11, 12), ")", "\t"},
+                  {YYEOF, MakeLocation(15, 15), "", " "}})));
+}
+
+TEST(MacroExpanderTest, LeavesPlxParamsUndisturbed) {
+  MacroCatalog macro_catalog;
+
+  EXPECT_THAT(ExpandMacros(" a${x} ", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+                  {IDENTIFIER, MakeLocation(2, 3), "a", " "},
+                  {DOLLAR_SIGN, MakeLocation(3, 4), "$", ""},
+                  {'{', MakeLocation(4, 5), "{", ""},
+                  {IDENTIFIER, MakeLocation(5, 6), "x", ""},
+                  {'}', MakeLocation(6, 7), "}", ""},
+                  {YYEOF, MakeLocation(10, 10), "", " "}})));
+}
+
+TEST(MacroExpanderTest, DoesNotStrictlyTokenizeLiteralContents) {
+  MacroCatalog macro_catalog;
+  EXPECT_THAT(ExpandMacros(R"("30d\a's")", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+                  {STRING_LITERAL, MakeLocation(0, 9), R"("30d\a's")", ""},
+                  {YYEOF, MakeLocation(9, 9), "", ""}})));
+}
+
+TEST(MacroExpanderTest, SkipsQuotesInLiterals) {
+  MacroCatalog macro_catalog;
+
+  EXPECT_THAT(
+      ExpandMacros(R"(SELECT "Doesn't apply")", macro_catalog,
+                   GetLanguageOptions(/*is_strict=*/false)),
+      IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+          {KW_SELECT, MakeLocation(0, 6), "SELECT", ""},
+          {STRING_LITERAL, MakeLocation(7, 22), R"("Doesn't apply")", " "},
+          {YYEOF, MakeLocation(22, 22), "", ""}})));
+}
+
+TEST(MacroExpanderTest, SeparateParenthesesAreNotArgLists) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"m", " bc "});
+
+  EXPECT_THAT(ExpandMacros("a$m (x, y)", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+                  {IDENTIFIER, MakeLocation(0, 1), "abc", ""},
+                  {'(', MakeLocation(4, 5), "(", " "},
+                  {IDENTIFIER, MakeLocation(5, 6), "x", ""},
+                  {',', MakeLocation(6, 7), ",", ""},
+                  {IDENTIFIER, MakeLocation(8, 9), "y", " "},
+                  {')', MakeLocation(9, 10), ")", ""},
+                  {YYEOF, MakeLocation(10, 10), "", ""}})));
+}
+
+TEST(MacroExpanderTest,
+     ProducesCorrectErrorOnUnbalancedParenthesesInMacroArgumentLists) {
+  
MacroCatalog macro_catalog; + + EXPECT_THAT( + ExpandMacros("a$m((x, y)", macro_catalog, + GetLanguageOptions(/*is_strict=*/false)), + StatusIs( + absl::StatusCode::kInvalidArgument, + Eq("Unbalanced parentheses in macro argument list. Make sure that " + "parentheses are balanced even inside macro arguments. [at " + "top_file.sql:1:11]"))); + + EXPECT_THAT( + ExpandMacros("$m(x;)", macro_catalog, + GetLanguageOptions(/*is_strict=*/false)), + StatusIs( + absl::StatusCode::kInvalidArgument, + Eq("Unbalanced parentheses in macro argument list. Make sure that " + "parentheses are balanced even inside macro arguments. [at " + "top_file.sql:1:5]"))); +} + +TEST(MacroExpanderTest, ArgsNotAllowedInsideLiterals) { + MacroCatalog macro_catalog; + + EXPECT_THAT( + ExpandMacros("'$a()'", macro_catalog, + GetLanguageOptions(/*is_strict=*/false)), + IsOkAndHolds(HasWarnings(ElementsAre( + StatusIs(_, Eq("Argument lists are not allowed inside " + "literals [at top_file.sql:1:4]")), + StatusIs(_, + HasSubstr("Macro 'a' not found. 
[at top_file.sql:1:1]"))))));
+
+  EXPECT_THAT(ExpandMacros("'$a('", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              StatusIs(_, Eq("Nested macro argument lists inside literals are "
+                             "not allowed [at top_file.sql:1:4]")));
+}
+
+TEST(MacroExpanderTest, SpliceArgWithIdentifier) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"m", "$1a"});
+
+  EXPECT_THAT(ExpandMacros("$m(x)", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+                  {IDENTIFIER, MakeLocation(3, 4), "xa", ""},
+                  {YYEOF, MakeLocation(5, 5), "", ""},
+              })));
+}
+
+TEST(MacroExpanderTest, ExpandsArgs) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"select_list", "a , $1, $2 "});
+  macro_catalog.insert({"from_clause", "FROM tbl_$1$empty"});
+  macro_catalog.insert({"splice", " $1$2 "});
+  macro_catalog.insert({"numbers", "123 456"});
+  macro_catalog.insert({"inner_quoted_id", "`bq`"});
+  macro_catalog.insert({"empty", " "});
+  macro_catalog.insert({"MakeLiteral", "'$1'"});
+
+  EXPECT_THAT(
+      ExpandMacros("select $MakeLiteral( x ) $splice( b , 89 ), "
+                   "$select_list( b, c )123 $from_clause( 123 ) ",
+                   macro_catalog, GetLanguageOptions(/*is_strict=*/false)),
+      IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+          {KW_SELECT, MakeLocation(0, 6), "select", ""},
+          {STRING_LITERAL, MakeLocation("macro:MakeLiteral", 0, 4), "'x'", " "},
+          {IDENTIFIER, MakeLocation(37, 38), "b89", " "},
+          {',', MakeLocation(49, 50), ",", ""},
+          {IDENTIFIER, MakeLocation("macro:select_list", 0, 1), "a", " "},
+          {',', MakeLocation("macro:select_list", 2, 3), ",", " "},
+          {IDENTIFIER, MakeLocation(66, 67), "b", " "},
+          {',', MakeLocation("macro:select_list", 6, 7), ",", ""},
+          {IDENTIFIER, MakeLocation(69, 70), "c123", " "},
+          {KW_FROM, MakeLocation("macro:from_clause", 0, 4), "FROM", " "},
+          {IDENTIFIER, MakeLocation("macro:from_clause", 5, 9), "tbl_123", " "},
+          {YYEOF, MakeLocation(98, 98), "", " "},
+      })));
+}
+
+TEST(MacroExpanderTest, ExpandsArgsThatHaveParensAndCommas) {
+  
MacroCatalog macro_catalog;
+  macro_catalog.insert({"repeat", "$1, $1, $2, $2"});
+
+  EXPECT_THAT(ExpandMacros("select $repeat(1, (2,3))", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/true)),
+              IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+                  {KW_SELECT, MakeLocation(0, 6), "select", ""},
+                  {INTEGER_LITERAL, MakeLocation(15, 16), "1", " "},
+                  {',', MakeLocation("macro:repeat", 2, 3), ",", ""},
+                  {INTEGER_LITERAL, MakeLocation(15, 16), "1", " "},
+                  {',', MakeLocation("macro:repeat", 6, 7), ",", ""},
+                  {'(', MakeLocation(18, 19), "(", " "},
+                  {INTEGER_LITERAL, MakeLocation(19, 20), "2", ""},
+                  {',', MakeLocation(20, 21), ",", ""},
+                  {INTEGER_LITERAL, MakeLocation(21, 22), "3", ""},
+                  {')', MakeLocation(22, 23), ")", ""},
+                  {',', MakeLocation("macro:repeat", 10, 11), ",", ""},
+                  {'(', MakeLocation(18, 19), "(", " "},
+                  {INTEGER_LITERAL, MakeLocation(19, 20), "2", ""},
+                  {',', MakeLocation(20, 21), ",", ""},
+                  {INTEGER_LITERAL, MakeLocation(21, 22), "3", ""},
+                  {')', MakeLocation(22, 23), ")", ""},
+                  {YYEOF, MakeLocation(24, 24), "", ""}})));
+}
+
+TEST(MacroExpanderTest, ExtraArgsProduceWarningOrError) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"empty", ""});
+
+  EXPECT_THAT(ExpandMacros("$empty(x)", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              IsOkAndHolds(AllOf(
+                  TokensEq(std::vector<TokenWithLocation>{
+                      {YYEOF, MakeLocation(9, 9), "", ""}}),
+                  HasWarnings(ElementsAre(StatusIs(
+                      _, Eq("Macro invocation has too many arguments (1) "
+                            "while the definition only references up to 0 "
+                            "arguments [at top_file.sql:1:1]")))))));
+
+  EXPECT_THAT(
+      ExpandMacros("$empty(x)", macro_catalog,
+                   GetLanguageOptions(/*is_strict=*/true)),
+      StatusIs(_, Eq("Macro invocation has too many arguments (1) while "
+                     "the definition only references up to 0 arguments "
+                     "[at top_file.sql:1:1]")));
+}
+
+TEST(MacroExpanderTest, UnknownArgsAreLeftUntouched) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"empty", ""});
+  macro_catalog.insert({"unknown_not_in_a_literal", 
"\nx$3\n"});
+  macro_catalog.insert({"unknown_inside_literal", "'\ty$3\t'"});
+
+  EXPECT_THAT(
+      ExpandMacros(
+          " a$empty()$1$unknown_not_in_a_literal '$unknown_not_in_a_literal' "
+          "$unknown_inside_literal\t$1\n'$2'",
+          macro_catalog, GetLanguageOptions(/*is_strict=*/false)),
+      IsOkAndHolds(AllOf(
+          TokensEq(std::vector<TokenWithLocation>{
+              {IDENTIFIER, MakeLocation(2, 3), "a", " "},
+              {MACRO_ARGUMENT_REFERENCE, MakeLocation(11, 13), "$1", ""},
+              {IDENTIFIER, MakeLocation("macro:unknown_not_in_a_literal", 1, 2),
+               "x", ""},
+              {STRING_LITERAL, MakeLocation(40, 67), "'x'", " "},
+              {STRING_LITERAL,
+               MakeLocation("macro:unknown_inside_literal", 0, 7), "'\ty\t'",
+               " "},
+              {MACRO_ARGUMENT_REFERENCE, MakeLocation(92, 94), "$1", "\t"},
+              {STRING_LITERAL, MakeLocation(95, 99), "'$2'", "\n"},
+              {YYEOF, MakeLocation(99, 99), "", ""}}),
+          HasWarnings(ElementsAre(
+              StatusIs(_, Eq("Macro invocation missing argument list. "
+                             "[at top_file.sql:1:39]")),
+              StatusIs(_, Eq("Argument index $3 out of range. Invocation was "
+                             "provided only 1 arguments. [at "
+                             "macro:unknown_not_in_a_literal:2:2]; Expanded "
+                             "from top_file.sql [at top_file.sql:1:14]")),
+              StatusIs(_, Eq("Macro invocation missing argument list. [at "
+                             "top_file.sql:1:26]; Expanded from top_file.sql "
+                             "[at top_file.sql:1:42]")),
+              StatusIs(_,
+                       Eq("Argument index $3 out of range. Invocation was "
+                          "provided only 1 arguments. [at "
+                          "macro:unknown_not_in_a_literal:2:2]; Expanded from "
+                          "top_file.sql [at top_file.sql:1:42]; Expanded from "
+                          "top_file.sql [at top_file.sql:1:1]")),
+              StatusIs(_, Eq("Macro invocation missing argument list. "
+                             "[at top_file.sql:1:92]")),
+              StatusIs(
+                  _,
+                  Eq("Argument index $3 out of range. Invocation was provided "
+                     "only 1 arguments. 
[at macro:unknown_inside_literal:1:1]; "
+                     "Expanded from top_file.sql [at top_file.sql:1:69]; "
+                     "Expanded from macro:unknown_inside_literal [at "
+                     "macro:unknown_inside_literal:1:10]")))))));
+
+  EXPECT_THAT(
+      ExpandMacros("unknown_inside_literal() $unknown_not_in_a_literal()",
+                   macro_catalog, GetLanguageOptions(/*is_strict=*/true)),
+      StatusIs(_,
+               Eq("Argument index $3 out of range. Invocation was provided "
+                  "only 1 arguments. [at macro:unknown_not_in_a_literal:2:2]; "
+                  "Expanded from top_file.sql [at top_file.sql:1:26]")));
+}
+
+TEST(MacroExpanderTest, ExpandsStandaloneDollarSignsAtTopLevel) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"empty", "\n\n"});
+
+  EXPECT_THAT(ExpandMacros("\t\t$empty${a} \n$$$empty${b}$", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+                  {DOLLAR_SIGN, MakeLocation(8, 9), "$", "\t\t"},
+                  {'{', MakeLocation(9, 10), "{", ""},
+                  {IDENTIFIER, MakeLocation(10, 11), "a", ""},
+                  {'}', MakeLocation(11, 12), "}", ""},
+                  {DOLLAR_SIGN, MakeLocation(16, 17), "$", " \n"},
+                  {DOLLAR_SIGN, MakeLocation(17, 18), "$", ""},
+                  {DOLLAR_SIGN, MakeLocation(24, 25), "$", ""},
+                  {'{', MakeLocation(25, 26), "{", ""},
+                  {IDENTIFIER, MakeLocation(26, 27), "b", ""},
+                  {'}', MakeLocation(27, 28), "}", ""},
+                  {DOLLAR_SIGN, MakeLocation(28, 29), "$", ""},
+                  {YYEOF, MakeLocation(29, 29), "", ""}})));
+}
+
+TEST(MacroExpanderTest, ProducesWarningOrErrorOnMissingArgumentLists) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"m", "1"});
+
+  EXPECT_THAT(ExpandMacros("$m", macro_catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              IsOkAndHolds(AllOf(
+                  TokensEq(std::vector<TokenWithLocation>{
+                      {INTEGER_LITERAL, MakeLocation("macro:m", 0, 1), "1", ""},
+                      {YYEOF, MakeLocation(2, 2), "", ""}}),
+                  HasWarnings(ElementsAre(
+                      StatusIs(_, Eq("Macro invocation missing argument "
+                                     "list. 
[at top_file.sql:1:3]")))))));
+
+  EXPECT_THAT(
+      ExpandMacros("$m", macro_catalog, GetLanguageOptions(/*is_strict=*/true)),
+      StatusIs(_, Eq("Macro invocation missing argument list. [at "
+                     "top_file.sql:1:3]")));
+}
+
+TEST(MacroExpanderTest, ExpandsAllFormsOfLiterals) {
+  MacroCatalog macro_catalog;
+  std::vector<std::string> string_literals{
+      "'a'",  "'''a'''",  R"("a")",  R"("""a""")",
+      "r'a'", "r'''a'''", R"(r"a")", R"(r"""a""")",
+  };
+
+  for (const std::string& string_literal : string_literals) {
+    int literal_length = static_cast<int>(string_literal.length());
+    EXPECT_THAT(
+        ExpandMacros(string_literal, macro_catalog,
+                     GetLanguageOptions(/*is_strict=*/false)),
+        IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+            {STRING_LITERAL, MakeLocation(0, literal_length), string_literal,
+             ""},
+            {YYEOF, MakeLocation(literal_length, literal_length), "", ""}})));
+  }
+
+  std::vector<std::string> byte_literals{
+      "b'''a'''",  R"(b"a")",  R"(b"""a""")",  "rb'a'",
+      "rb'''a'''", R"(rb"a")", R"(rb"""a""")",
+  };
+
+  for (const std::string& byte_literal : byte_literals) {
+    int literal_length = static_cast<int>(byte_literal.length());
+    EXPECT_THAT(
+        ExpandMacros(byte_literal, macro_catalog,
+                     GetLanguageOptions(/*is_strict=*/false)),
+        IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+            {BYTES_LITERAL, MakeLocation(0, literal_length), byte_literal, ""},
+            {YYEOF, MakeLocation(literal_length, literal_length), "", ""}})));
+  }
+}
+
+static int FinalTokenKind(const QuotingSpec quoting_spec) {
+  switch (quoting_spec.literal_kind()) {
+    case LiteralTokenKind::kNonLiteral:
+      // Tests using this helper only use int literals
+      return INTEGER_LITERAL;
+    case LiteralTokenKind::kBacktickedIdentifier:
+      return IDENTIFIER;
+    case LiteralTokenKind::kStringLiteral:
+      return STRING_LITERAL;
+    case LiteralTokenKind::kBytesLiteral:
+      return BYTES_LITERAL;
+  }
+}
+
+class PositiveNestedLiteralTest
+    : public ::testing::TestWithParam<std::tuple<QuotingSpec, QuotingSpec>> {};
+
+TEST_P(PositiveNestedLiteralTest, PositiveNestedLiteralTestParameterized) {
+  const QuotingSpec outer_quoting = 
std::get<0>(GetParam());
+  const QuotingSpec inner_quoting = std::get<1>(GetParam());
+
+  MacroCatalog catalog;
+  std::string macro_def = QuoteText("1", inner_quoting);
+  catalog.insert({"a", macro_def});
+  std::string unexpanded_sql = QuoteText("$a", outer_quoting);
+  int unexpanded_length = static_cast<int>(unexpanded_sql.length());
+
+  std::string expanded = QuoteText("1", outer_quoting);
+
+  int final_token_kind = FinalTokenKind(outer_quoting);
+
+  if (inner_quoting.literal_kind() == LiteralTokenKind::kNonLiteral ||
+      (outer_quoting.quote_kind() == inner_quoting.quote_kind() &&
+       outer_quoting.prefix().length() == inner_quoting.prefix().length())) {
+    EXPECT_THAT(
+        ExpandMacros(unexpanded_sql, catalog,
+                     GetLanguageOptions(/*is_strict=*/false)),
+        IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+            {final_token_kind, MakeLocation(0, unexpanded_length),
+             QuoteText("1", outer_quoting), ""},
+            {YYEOF, MakeLocation(unexpanded_length, unexpanded_length), "", ""},
+        })));
+  } else {
+    ASSERT_TRUE(outer_quoting.literal_kind() == inner_quoting.literal_kind());
+    EXPECT_THAT(
+        ExpandMacros(unexpanded_sql, catalog,
+                     GetLanguageOptions(/*is_strict=*/false)),
+        IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+            {final_token_kind, MakeLocation(0, unexpanded_length),
+             QuoteText("", outer_quoting), ""},
+            {final_token_kind,
+             MakeLocation("macro:a", 0, static_cast<int>(macro_def.length())),
+             macro_def, " "},
+            {final_token_kind, MakeLocation(0, unexpanded_length),
+             QuoteText("", outer_quoting), " "},
+            {YYEOF, MakeLocation(unexpanded_length, unexpanded_length), "", ""},
+        })));
+  }
+}
+
+class NegativeNestedLiteralTest
+    : public ::testing::TestWithParam<std::tuple<QuotingSpec, QuotingSpec>> {};
+
+TEST_P(NegativeNestedLiteralTest, NegativeNestedLiteralTestParameterized) {
+  const QuotingSpec outer_quoting = std::get<0>(GetParam());
+  const QuotingSpec inner_quoting = std::get<1>(GetParam());
+
+  MacroCatalog catalog;
+  catalog.insert({"a", QuoteText("1", inner_quoting)});
+  std::string unexpanded_sql = QuoteText("$a", outer_quoting);
+
+  
EXPECT_THAT(ExpandMacros(unexpanded_sql, catalog,
+                           GetLanguageOptions(/*is_strict=*/false)),
+              StatusIs(_, HasSubstr("Cannot expand a ")));
+}
+
+const QuoteKind kAllLiteralQuoteKinds[] = {
+    QuoteKind::kOneSingleQuote, QuoteKind::kThreeSingleQuotes,
+    QuoteKind::kOneDoubleQuote, QuoteKind::kThreeDoubleQuotes};
+
+static std::vector<QuotingSpec> AllStringLiteralQuotingSpecs() {
+  const std::vector<absl::string_view> string_literal_prefixes{"", "r", "R"};
+  std::vector<QuotingSpec> string_literals;
+
+  for (QuoteKind quote_kind : kAllLiteralQuoteKinds) {
+    for (absl::string_view prefix : string_literal_prefixes) {
+      string_literals.push_back(QuotingSpec::StringLiteral(prefix, quote_kind));
+    }
+  }
+
+  return string_literals;
+}
+
+static std::vector<QuotingSpec> AllBytesLiteralQuotingSpecs() {
+  const std::vector<absl::string_view> bytes_literal_prefixes{
+      "b", "B", "br", "bR", "Br", "BR", "rb", "rB", "Rb", "RB",
+  };
+  std::vector<QuotingSpec> bytes_literals;
+
+  for (QuoteKind quote_kind : kAllLiteralQuoteKinds) {
+    for (absl::string_view prefix : bytes_literal_prefixes) {
+      bytes_literals.push_back(QuotingSpec::BytesLiteral(prefix, quote_kind));
+    }
+  }
+
+  return bytes_literals;
+}
+
+static std::vector<QuotingSpec> AllStringsAndBytesLiteralsQuotingSpecs() {
+  std::vector<QuotingSpec> quoting_specs;
+  for (const QuotingSpec& string_spec : AllStringLiteralQuotingSpecs()) {
+    quoting_specs.push_back(string_spec);
+  }
+  for (const QuotingSpec& bytes_spec : AllBytesLiteralQuotingSpecs()) {
+    quoting_specs.push_back(bytes_spec);
+  }
+  return quoting_specs;
+}
+
+// Excludes NON_LITERAL
+static std::vector<QuotingSpec> AllQuotingSpecs() {
+  std::vector<QuotingSpec> quoting_specs;
+  quoting_specs.push_back(QuotingSpec::BacktickedIdentifier());
+  for (const QuotingSpec& bytes_spec :
+       AllStringsAndBytesLiteralsQuotingSpecs()) {
+    quoting_specs.push_back(bytes_spec);
+  }
+
+  return quoting_specs;
+}
+
+// NON_LITERAL expanded within literal.
+INSTANTIATE_TEST_SUITE_P(NonLiteralCanBeNestedEverywhere,
+                         PositiveNestedLiteralTest,
+                         Combine(ValuesIn(AllQuotingSpecs()),
+                                 Values(QuotingSpec::NonLiteral())));
+
+// Backticked identifier only works with backticked identifier on both sides
+// (Interaction with non-literals is already covered above).
+INSTANTIATE_TEST_SUITE_P(BacktickedIdentifierAcceptsItself,
+                         PositiveNestedLiteralTest,
+                         Combine(Values(QuotingSpec::BacktickedIdentifier()),
+                                 Values(QuotingSpec::BacktickedIdentifier())));
+INSTANTIATE_TEST_SUITE_P(
+    NestingInsideBacktickedIdentifierIsDisallowed, NegativeNestedLiteralTest,
+    Combine(Values(QuotingSpec::BacktickedIdentifier()),
+            ValuesIn(AllStringsAndBytesLiteralsQuotingSpecs())));
+
+static std::vector<std::tuple<QuotingSpec, QuotingSpec>>
+IdenticalLiteralTestCases() {
+  std::vector<std::tuple<QuotingSpec, QuotingSpec>> test_cases;
+  for (const QuotingSpec& quoting : AllStringsAndBytesLiteralsQuotingSpecs()) {
+    test_cases.push_back(std::make_tuple(quoting, quoting));
+  }
+  return test_cases;
+}
+
+INSTANTIATE_TEST_SUITE_P(LiteralsCanNestInIdenticalQuoting,
+                         PositiveNestedLiteralTest,
+                         ValuesIn(IdenticalLiteralTestCases()));
+
+static std::vector<std::tuple<QuotingSpec, QuotingSpec>>
+DifferentLiteralKindTestCases() {
+  std::vector<QuotingSpec> literal_quoting_specs =
+      AllStringsAndBytesLiteralsQuotingSpecs();
+
+  std::vector<std::tuple<QuotingSpec, QuotingSpec>> test_cases;
+  for (const QuotingSpec& outer : literal_quoting_specs) {
+    for (const QuotingSpec& inner : literal_quoting_specs) {
+      if (outer.literal_kind() != inner.literal_kind()) {
+        test_cases.push_back(std::make_tuple(outer, inner));
+      }
+    }
+  }
+
+  return test_cases;
+}
+
+INSTANTIATE_TEST_SUITE_P(LiteralsCannotNestInDifferentTokenKinds,
+                         NegativeNestedLiteralTest,
+                         ValuesIn(DifferentLiteralKindTestCases()));
+
+INSTANTIATE_TEST_SUITE_P(StringLiteralsCanExpandInAnyStringLiteral,
+                         PositiveNestedLiteralTest,
+                         Combine(ValuesIn(AllStringLiteralQuotingSpecs()),
+                                 ValuesIn(AllStringLiteralQuotingSpecs())));
+INSTANTIATE_TEST_SUITE_P(BytesLiteralsCanExpandInAnyBytesLiteral,
+                         PositiveNestedLiteralTest,
Combine(ValuesIn(AllBytesLiteralQuotingSpecs()),
+                                 ValuesIn(AllBytesLiteralQuotingSpecs())));
+
+TEST(MacroExpanderTest, DoesNotReexpand) {
+  absl::flat_hash_map<std::string, std::string> macro_catalog;
+  macro_catalog.insert({"select_list", "a , $1, $2 "});
+  macro_catalog.insert({"splice_invoke", " $$1$2 "});
+  macro_catalog.insert({"inner_quoted_id", "`bq`"});
+  macro_catalog.insert({"empty", " "});
+
+  EXPECT_THAT(
+      ExpandMacros("$splice_invoke( inner_\t,\t\tquoted_id ), "
+                   "$$select_list( b, c )123\t",
+                   macro_catalog, GetLanguageOptions(/*is_strict=*/false)),
+      IsOkAndHolds(TokensEq(std::vector<TokenWithLocation>{
+          // Note: the dollar sign is not spliced with the identifier into a new
+          // macro invocation token, because we do not reexpand.
+          {DOLLAR_SIGN, MakeLocation("macro:splice_invoke", 3, 4), "$", ""},
+          {IDENTIFIER, MakeLocation(17, 23), "inner_quoted_id", ""},
+          {',', MakeLocation(40, 41), ",", ""},
+          {DOLLAR_SIGN, MakeLocation(42, 43), "$", " "},
+          {IDENTIFIER, MakeLocation("macro:select_list", 0, 1), "a", ""},
+          {',', MakeLocation("macro:select_list", 2, 3), ",", " "},
+          {IDENTIFIER, MakeLocation(58, 59), "b", " "},
+          {',', MakeLocation("macro:select_list", 6, 7), ",",
+           ""},  // Comes from the macro def
+          {IDENTIFIER, MakeLocation(61, 62), "c123", " "},
+          {YYEOF, MakeLocation(69, 69), "", "\t"},
+      })));
+}
+
+TEST(MacroExpanderTest,
+     CapsWarningCountAndAddsSentinelWarningWhenTooManyAreProduced) {
+  MacroCatalog macro_catalog;
+
+  int num_slashes = 30;
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      ExpansionOutput result,
+      ExpandMacros(std::string(num_slashes, '\\'), macro_catalog,
+                   GetLanguageOptions(/*is_strict=*/false)));
+
+  // Add 1 for the YYEOF
+  EXPECT_EQ(result.expanded_tokens.size(), num_slashes + 1);
+  for (int i = 0; i < num_slashes - 1; ++i) {
+    EXPECT_EQ(result.expanded_tokens[i].kind, TokenKinds::BACKSLASH);
+  }
+  EXPECT_EQ(result.expanded_tokens.back().kind, YYEOF);
+
+  // The number of warnings is the cap (20) + 1 for the sentinel indicating that
+  // more warnings were truncated.
+  EXPECT_EQ(result.warnings.size(), 21);
+  absl::Status sentinel_status = std::move(result.warnings.back());
+  result.warnings.pop_back();
+  EXPECT_THAT(
+      result.warnings,
+      Each(StatusIs(
+          _,
+          HasSubstr(
+              R"(Invalid token (\). Did you mean to use it in a literal?)"))));
+  EXPECT_THAT(
+      sentinel_status,
+      StatusIs(_,
+               Eq("Warning count limit reached. Truncating further warnings")));
+}
+
+// Ignores location
+static TokenWithLocation MakeToken(
+    int kind, absl::string_view text,
+    absl::string_view preceding_whitespaces = "") {
+  return {.kind = kind,
+          .location = MakeLocation(-1, -1),
+          .text = text,
+          .preceding_whitespaces = preceding_whitespaces};
+}
+
+TEST(TokensToStringTest, CanGenerateStringFromTokens) {
+  std::vector<TokenWithLocation> tokens = {MakeToken(IDENTIFIER, "a", "\t"),
+                                           MakeToken(KW_FROM, "FROM"),
+                                           MakeToken(YYEOF, "", "\n")};
+
+  // Note the forced space between `a` and `FROM`.
+  EXPECT_EQ(TokensToString(tokens, /*standardize_to_single_whitespace=*/false),
+            "\ta FROM\n");
+  EXPECT_EQ(TokensToString(tokens, /*standardize_to_single_whitespace=*/true),
+            "a FROM");
+}
+
+TEST(TokensToStringTest, DoesNotSpliceTokensEvenWhenNoOriginalSpacesExist) {
+  std::vector<TokenWithLocation> tokens = {
+      MakeToken(IDENTIFIER, "a", "\t"),
+      MakeToken(KW_FROM, "FROM"),
+      MakeToken(INTEGER_LITERAL, "0x1A"),
+      MakeToken(IDENTIFIER, "b", "\t"),
+      MakeToken(KW_FROM, "FROM"),
+      MakeToken(FLOATING_POINT_LITERAL, "1."),
+      MakeToken(IDENTIFIER, "x", "\t"),
+      MakeToken(MACRO_INVOCATION, "$a"),
+      MakeToken(INTEGER_LITERAL, "123"),
+      MakeToken(MACRO_INVOCATION, "$1"),
+      MakeToken(INTEGER_LITERAL, "23"),
+      MakeToken('*', "*"),
+      MakeToken(YYEOF, "", "\n")};
+
+  // Note the forced spaces
+  EXPECT_EQ(TokensToString(tokens, /*standardize_to_single_whitespace=*/false),
+            "\ta FROM 0x1A\tb FROM 1.\tx$a 123$1 23*\n");
+}
+
+TEST(TokensToStringTest, DoesNotCauseCommentOuts) {
+  std::vector<TokenWithLocation> tokens = {
+      MakeToken('-', "-"), MakeToken('-', "-"), MakeToken('/', "/"),
+      MakeToken('/', "/"), MakeToken('/', "/"), MakeToken('*', "*"),
MakeToken(YYEOF, "")}; + + // Note the forced spaces, except for -/ + EXPECT_EQ(TokensToString(tokens, /*standardize_to_single_whitespace=*/false), + "- -/ / / *"); +} + +TEST(TokensToStringTest, AlwaysSeparatesNumericLiterals) { + std::vector tokens = { + MakeToken(INTEGER_LITERAL, "0x1"), + MakeToken(INTEGER_LITERAL, "2"), + MakeToken(INTEGER_LITERAL, "3"), + MakeToken(FLOATING_POINT_LITERAL, "4."), + MakeToken(INTEGER_LITERAL, "5"), + MakeToken(FLOATING_POINT_LITERAL, ".6"), + MakeToken(INTEGER_LITERAL, "7"), + MakeToken(FLOATING_POINT_LITERAL, ".8e9"), + MakeToken(INTEGER_LITERAL, "10"), + MakeToken(STRING_LITERAL, "'11'"), + MakeToken(YYEOF, "")}; + + // Note the forced spaces + EXPECT_EQ(TokensToString(tokens, /*standardize_to_single_whitespace=*/false), + "0x1 2 3 4. 5 .6 7 .8e9 10'11'"); +} + +class MacroExpanderLenientTokensInStrictModeTest + : public ::testing::TestWithParam {}; + +TEST_P(MacroExpanderLenientTokensInStrictModeTest, + LenientTokensDisallowedInStrictMode) { + absl::string_view text = GetParam(); + MacroCatalog macro_catalog; + macro_catalog.insert({"m", std::string(text)}); + + EXPECT_THAT( + ExpandMacros(text, macro_catalog, GetLanguageOptions(/*is_strict=*/true)), + StatusIs(_, HasSubstr("Syntax error"))); + + EXPECT_THAT(ExpandMacros("$m()", macro_catalog, + GetLanguageOptions(/*is_strict=*/true)), + StatusIs(_, HasSubstr("Syntax error"))); +} + +INSTANTIATE_TEST_SUITE_P(LenientTokensDisallowedInStrictMode, + MacroExpanderLenientTokensInStrictModeTest, + Values("\\", "3d", "12abc", "0x123xyz", "3.4dab", + ".4dab", "1.2e3ab", "1.2e-3ab")); + +class MacroExpanderLenientTokensTest + : public ::testing::TestWithParam {}; + +TEST_P(MacroExpanderLenientTokensTest, LenientTokensAllowedOnlyWhenLenient) { + absl::string_view text = GetParam(); + MacroCatalog macro_catalog; + macro_catalog.insert({"m", std::string(text)}); + + int expected_token_count = 2; // One token + YYEOF + if (absl::StrContains(text, '.')) { + // The dot itself, and maybe 
the other part + expected_token_count += absl::StartsWith(text, ".") ? 1 : 2; + if (absl::StrContains(text, "e-")) { + expected_token_count += + 2; // The negative exponent adds '-' & the exp part + } + } + + EXPECT_THAT(ExpandMacros(text, macro_catalog, + GetLanguageOptions(/*is_strict=*/false)), + IsOkAndHolds(HasTokens(SizeIs(expected_token_count)))); + + EXPECT_THAT(ExpandMacros("$m", macro_catalog, + GetLanguageOptions(/*is_strict=*/false)), + IsOkAndHolds(HasTokens(SizeIs(expected_token_count)))); + + // When preceding with 'a', we need to update the expected token count as the + // 'a' will splice with the expanded text unless it's a symbol. + if (!std::isalpha(text.front()) && !std::isdigit(text.front())) { + expected_token_count++; + } + + EXPECT_THAT(ExpandMacros("a$m", macro_catalog, + GetLanguageOptions(/*is_strict=*/false)), + IsOkAndHolds(HasTokens(SizeIs(expected_token_count)))); +} + +INSTANTIATE_TEST_SUITE_P(LenientTokens, MacroExpanderLenientTokensTest, + Values("\\", "3d", "12abc", "0x123xyz", "3.4dab", + ".4dab", "1.2e3ab", "1.2e-3ab")); + +TEST(MacroExpanderTest, ProducesWarningOnLenientTokens) { + MacroCatalog macro_catalog; + + EXPECT_THAT(ExpandMacros("\\", macro_catalog, + GetLanguageOptions(/*is_strict=*/false)), + IsOkAndHolds(HasWarnings(ElementsAre( + StatusIs(_, Eq("Invalid token (\\). Did you mean to use it " + "in a literal? 
[at top_file.sql:1:1]"))))));
+}
+
+class MacroExpanderParameterizedTest : public ::testing::TestWithParam<bool> {};
+
+TEST_P(MacroExpanderParameterizedTest, DoesNotExpandDefineMacroStatements) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"x", "a"});
+  macro_catalog.insert({"y", "b"});
+
+  EXPECT_THAT(
+      ExpandMacros("DEFINE MACRO $x $y; DEFINE MACRO $y $x", macro_catalog,
+                   GetLanguageOptions(/*is_strict=*/GetParam())),
+      IsOkAndHolds(
+          AllOf(TokensEq(std::vector<TokenWithLocation>{
+                    {KW_DEFINE_FOR_MACROS, MakeLocation(0, 6), "DEFINE", ""},
+                    {KW_MACRO, MakeLocation(7, 12), "MACRO", " "},
+                    {MACRO_INVOCATION, MakeLocation(13, 15), "$x", " "},
+                    {MACRO_INVOCATION, MakeLocation(16, 18), "$y", " "},
+                    {';', MakeLocation(18, 19), ";", ""},
+                    {KW_DEFINE_FOR_MACROS, MakeLocation(20, 26), "DEFINE", " "},
+                    {KW_MACRO, MakeLocation(27, 32), "MACRO", " "},
+                    {MACRO_INVOCATION, MakeLocation(33, 35), "$y", " "},
+                    {MACRO_INVOCATION, MakeLocation(36, 38), "$x", " "},
+                    {YYEOF, MakeLocation(38, 38), "", ""},
+                }),
+                HasWarnings(IsEmpty()))));
+}
+
+TEST_P(MacroExpanderParameterizedTest,
+       DefineStatementsAreNotSpecialExceptAtTheStart) {
+  MacroCatalog macro_catalog;
+  macro_catalog.insert({"x", "a"});
+
+  // This will expand the definition, but it will be a syntax error when sent
+  // to the parser, because the KW_DEFINE has not been marked as the special
+  // KW_DEFINE_FOR_MACROS which has to be at the start, and original (not
+  // expanded from a macro).
+ EXPECT_THAT(ExpandMacros("SELECT DEFINE MACRO $x() 1", macro_catalog, + GetLanguageOptions(/*is_strict=*/GetParam())), + IsOkAndHolds(AllOf( + TokensEq(std::vector{ + {KW_SELECT, MakeLocation(0, 6), "SELECT", ""}, + {KW_DEFINE, MakeLocation(7, 13), "DEFINE", " "}, + {KW_MACRO, MakeLocation(14, 19), "MACRO", " "}, + {IDENTIFIER, MakeLocation("macro:x", 0, 1), "a", " "}, + {INTEGER_LITERAL, MakeLocation(25, 26), "1", " "}, + {YYEOF, MakeLocation(26, 26), "", ""}, + }), + HasWarnings(IsEmpty())))); +} + +TEST_P(MacroExpanderParameterizedTest, + DoesNotRecognizeGeneratedDefineMacroStatements) { + MacroCatalog macro_catalog; + macro_catalog.insert({"def", "DEFINE MACRO"}); + + EXPECT_THAT( + ExpandMacros("$def() a 1", macro_catalog, + GetLanguageOptions(/*is_strict=*/GetParam())), + IsOkAndHolds( + AllOf(TokensEq(std::vector{ + // Note that the first token is KW_DEFINE, not + // KW_DEFINE_FOR_MACROS + {KW_DEFINE, MakeLocation("macro:def", 0, 6), "DEFINE", ""}, + {KW_MACRO, MakeLocation("macro:def", 7, 12), "MACRO", " "}, + {IDENTIFIER, MakeLocation(7, 8), "a", " "}, + {INTEGER_LITERAL, MakeLocation(9, 10), "1", " "}, + {YYEOF, MakeLocation(10, 10), "", ""}, + }), + HasWarnings(IsEmpty())))); +} + +TEST_P(MacroExpanderParameterizedTest, HandlesInfiniteRecursion) { + MacroCatalog macro_catalog; + macro_catalog.insert({"a", "$b()"}); + macro_catalog.insert({"b", "$a()"}); + + EXPECT_THAT( + ExpandMacros("$a()", macro_catalog, + GetLanguageOptions(/*is_strict=*/GetParam())), + StatusIs(absl::StatusCode::kResourceExhausted, + Eq("Out of stack space due to deeply nested macro calls."))); +} + +INSTANTIATE_TEST_SUITE_P(MacroExpanderParameterizedTest, + MacroExpanderParameterizedTest, Bool()); + +} // namespace macros +} // namespace parser +} // namespace zetasql diff --git a/zetasql/parser/macros/quoting.cc b/zetasql/parser/macros/quoting.cc new file mode 100644 index 000000000..21b8add41 --- /dev/null +++ b/zetasql/parser/macros/quoting.cc @@ -0,0 +1,158 @@ +// +// 
Copyright 2019 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+
+#include "zetasql/parser/macros/quoting.h"
+
+#include <string>
+
+#include "absl/status/statusor.h"
+#include "absl/strings/str_cat.h"
+#include "absl/strings/str_format.h"
+#include "absl/strings/string_view.h"
+#include "zetasql/base/ret_check.h"
+#include "zetasql/base/status_macros.h"
+
+namespace zetasql {
+namespace parser {
+namespace macros {
+
+// Returns the actual string of a quote kind.
+absl::string_view QuoteStr(const QuoteKind& quote_kind) {
+  switch (quote_kind) {
+    case QuoteKind::kNone:
+      return "";
+    case QuoteKind::kBacktick:
+      return "`";
+    case QuoteKind::kOneSingleQuote:
+      return "'";
+    case QuoteKind::kThreeSingleQuotes:
+      return "'''";
+    case QuoteKind::kOneDoubleQuote:
+      return "\"";
+    case QuoteKind::kThreeDoubleQuotes:
+      return "\"\"\"";
+  }
+}
+
+static absl::string_view QuoteKindDescription(const QuoteKind quote_kind) {
+  switch (quote_kind) {
+    case QuoteKind::kNone:
+      return "";
+    case QuoteKind::kBacktick:
+      return "backtick";
+    case QuoteKind::kOneSingleQuote:
+      return "single quote";
+    case QuoteKind::kThreeSingleQuotes:
+      return "three single quotes";
+    case QuoteKind::kOneDoubleQuote:
+      return "double quote";
+    case QuoteKind::kThreeDoubleQuotes:
+      return "three double quotes";
+  }
+}
+
+std::string QuotingSpec::Description() const {
+  switch (literal_kind()) {
+    case LiteralTokenKind::kNonLiteral:
+      return "non literal";
+    case
LiteralTokenKind::kBacktickedIdentifier:
+      return "quoted identifier";
+    case LiteralTokenKind::kStringLiteral:
+      return absl::StrFormat("string literal(%s%s)",
+                             QuoteKindDescription(quote_kind()),
+                             prefix().length() == 1 ? " raw" : "");
+    case LiteralTokenKind::kBytesLiteral:
+      return absl::StrFormat("bytes literal(%s%s)",
+                             QuoteKindDescription(quote_kind()),
+                             prefix().length() == 2 ? " raw" : "");
+  }
+}
+
+// Given a valid SQL literal or backticked identifier, retrieve the string value
+// of its contents.
+// REQUIRES: the input text must represent a single, valid SQL literal or
+// backticked identifier, and the input quoting must be the result of
+// calling FindQuotingKind() on the input literal.
+static absl::string_view GetLiteralContents(const absl::string_view literal,
+                                            const QuotingSpec quoting_spec) {
+  absl::string_view quote = QuoteStr(quoting_spec.quote_kind());
+  return literal.substr(
+      quoting_spec.prefix().length() + quote.length(),
+      literal.length() - quoting_spec.prefix().length() - 2 * quote.length());
+}
+
+absl::StatusOr<QuotingSpec> QuotingSpec::FindQuotingKind(
+    const absl::string_view text) {
+  ZETASQL_RET_CHECK_GE(text.length(), 2);
+
+  if (text.front() == '`') {
+    ZETASQL_RET_CHECK_EQ(text.front(), text.back());
+    return QuotingSpec::BacktickedIdentifier();
+  }
+
+  // This is a string or a bytes literal
+  int prefix_end_offset = 0;
+  while (prefix_end_offset < text.length() && text[prefix_end_offset] != '\'' &&
+         text[prefix_end_offset] != '"') {
+    prefix_end_offset++;
+  }
+
+  absl::string_view prefix = text.substr(0, prefix_end_offset);
+  ZETASQL_RET_CHECK_LE(prefix.length(), 2);
+  bool is_bytes = false;
+  for (char c : prefix) {
+    if (c == 'b' || c == 'B') {
+      is_bytes = true;
+    }
+  }
+
+  absl::string_view literal_without_prefix = text.substr(prefix_end_offset);
+  ZETASQL_RET_CHECK_EQ(literal_without_prefix.front(), literal_without_prefix.back());
+
+  bool is_triple_quoted =
+      (literal_without_prefix.length() >= 6 &&
literal_without_prefix.front() == literal_without_prefix[1]);
+
+  QuoteKind quote = QuoteKind::kNone;
+  if (literal_without_prefix.front() == '\'') {
+    quote = is_triple_quoted ? QuoteKind::kThreeSingleQuotes
+                             : QuoteKind::kOneSingleQuote;
+  } else {
+    ZETASQL_RET_CHECK_EQ(literal_without_prefix.front(), '"');
+    quote = is_triple_quoted ? QuoteKind::kThreeDoubleQuotes
+                             : QuoteKind::kOneDoubleQuote;
+  }
+
+  return is_bytes ? QuotingSpec::BytesLiteral(prefix, quote)
+                  : QuotingSpec::StringLiteral(prefix, quote);
+}
+
+absl::StatusOr<QuotingSpec> QuotingSpec::FindQuotingKind(
+    const absl::string_view text, absl::string_view& out_literal_contents) {
+  ZETASQL_ASSIGN_OR_RETURN(QuotingSpec quoting, QuotingSpec::FindQuotingKind(text));
+  out_literal_contents = GetLiteralContents(text, quoting);
+  return quoting;
+}
+
+std::string QuoteText(const absl::string_view escaped_text,
+                      const QuotingSpec& quoting_spec) {
+  absl::string_view quote = QuoteStr(quoting_spec.quote_kind());
+  return absl::StrCat(quoting_spec.prefix(), quote, escaped_text, quote);
+}
+
+}  // namespace macros
+}  // namespace parser
+}  // namespace zetasql
diff --git a/zetasql/parser/macros/quoting.h b/zetasql/parser/macros/quoting.h
new file mode 100644
index 000000000..5e6dcc667
--- /dev/null
+++ b/zetasql/parser/macros/quoting.h
@@ -0,0 +1,129 @@
+//
+// Copyright 2019 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+
+#ifndef ZETASQL_PARSER_MACROS_QUOTING_H_
+#define ZETASQL_PARSER_MACROS_QUOTING_H_
+
+#include <string>
+
+#include "absl/status/statusor.h"
+#include "absl/strings/string_view.h"
+#include "zetasql/base/ret_check.h"
+
+namespace zetasql {
+namespace parser {
+namespace macros {
+
+enum class LiteralTokenKind {
+  kNonLiteral,
+  kBacktickedIdentifier,
+  kStringLiteral,
+  kBytesLiteral,
+};
+
+enum class QuoteKind {
+  kNone,  // Used for non-literals and for backticked identifiers
+  kBacktick,
+  kOneSingleQuote,
+  kOneDoubleQuote,
+  kThreeSingleQuotes,
+  kThreeDoubleQuotes,
+};
+
+// Describes the quoting spec of a SQL token. There are 4 main kinds,
+// represented by literal_token_kind. STRING and BYTES literals have variations,
+// e.g. one single quote, raw bytes with three double quotes, etc.
+// 'quote_kind' and 'prefix' capture these finer variations. The 'prefix'
+// contains the optional 'r' or 'b' modifiers that indicate raw or bytes.
+// Note that the prefix doesn't care about modifier order or case, e.g. 'bR' and
+// 'rB' are equivalent. Instead of normalizing them, we keep the original prefix
+// for round-tripping, to avoid changing the user input when serializing the
+// final expanded result.
+// The 'prefix' is a string_view from the input used in creating the
+// QuotingSpec. The caller is expected to make the source string outlive this
+// QuotingSpec.
+class QuotingSpec {
+ public:
+  static QuotingSpec NonLiteral() {
+    return QuotingSpec(LiteralTokenKind::kNonLiteral, QuoteKind::kNone,
+                       /*prefix=*/"");
+  }
+
+  static QuotingSpec BacktickedIdentifier() {
+    return QuotingSpec(LiteralTokenKind::kBacktickedIdentifier,
+                       QuoteKind::kBacktick,
+                       /*prefix=*/"");
+  }
+
+  static QuotingSpec StringLiteral(absl::string_view prefix,
+                                   QuoteKind quote_kind) {
+    ABSL_DCHECK(quote_kind != QuoteKind::kNone);
+    ABSL_DCHECK(quote_kind != QuoteKind::kBacktick);
+    ABSL_DCHECK_LE(prefix.length(), 1);
+    return QuotingSpec(LiteralTokenKind::kStringLiteral, quote_kind, prefix);
+  }
+
+  static QuotingSpec BytesLiteral(absl::string_view prefix,
+                                  QuoteKind quote_kind) {
+    ABSL_DCHECK(quote_kind != QuoteKind::kNone);
+    ABSL_DCHECK(quote_kind != QuoteKind::kBacktick);
+    ABSL_DCHECK_GE(prefix.length(), 1);
+    ABSL_DCHECK_LE(prefix.length(), 2);
+    return QuotingSpec(LiteralTokenKind::kBytesLiteral, quote_kind, prefix);
+  }
+
+  // Finds the quoting kind (e.g. one single quote, raw 3 double quotes, ...
+  // etc) of the given token. If successful, 'out_literal_contents' takes on
+  // the contents of the literal. out_literal_contents is always a substring of
+  // the input 'text' - ultimately, it's a view into the input to the expander.
+  // REQUIRES: the input text must represent a single, valid SQL token.
+  static absl::StatusOr<QuotingSpec> FindQuotingKind(
+      absl::string_view text, absl::string_view& out_literal_contents);
+
+  LiteralTokenKind literal_kind() const { return literal_kind_; }
+  QuoteKind quote_kind() const { return quote_kind_; }
+  absl::string_view prefix() const { return prefix_; }
+
+  // Returns a user-friendly string describing this quoting spec. Used only in
+  // error messages, does not need to be round-trippable.
+  std::string Description() const;
+
+ private:
+  QuotingSpec(LiteralTokenKind literal_kind, QuoteKind quote_kind,
+              absl::string_view prefix)
+      : literal_kind_(literal_kind), quote_kind_(quote_kind), prefix_(prefix) {}
+
+  // REQUIRES: the input text must represent a single, valid SQL token.
+  static absl::StatusOr<QuotingSpec> FindQuotingKind(absl::string_view text);
+
+  const LiteralTokenKind literal_kind_;
+  const QuoteKind quote_kind_;
+  const absl::string_view prefix_;
+};
+
+// Returns a string representation of the quote (without the prefix, and only
+// one side). For example, for kThreeDoubleQuotes, returns R"(""")".
+absl::string_view QuoteStr(const QuoteKind& quote_kind);
+
+// Wraps the input text with the given quoting spec, *WITHOUT* escaping.
+std::string QuoteText(absl::string_view text, const QuotingSpec& quoting_spec);
+
+}  // namespace macros
+}  // namespace parser
+}  // namespace zetasql
+
+#endif  // ZETASQL_PARSER_MACROS_QUOTING_H_
diff --git a/zetasql/parser/macros/quoting_test.cc b/zetasql/parser/macros/quoting_test.cc
new file mode 100644
index 000000000..653acdb58
--- /dev/null
+++ b/zetasql/parser/macros/quoting_test.cc
@@ -0,0 +1,156 @@
+//
+// Copyright 2019 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+
+#include "zetasql/parser/macros/quoting.h"
+
+#include <tuple>
+
+#include "zetasql/base/testing/status_matchers.h"
+#include "gtest/gtest.h"
+#include "absl/status/statusor.h"
+#include "absl/strings/str_format.h"
+#include "absl/strings/string_view.h"
+
+namespace zetasql {
+namespace parser {
+namespace macros {
+
+using ::testing::AllOf;
+using ::testing::Combine;
+using ::testing::Eq;
+using ::testing::ExplainMatchResult;
+using ::testing::Not;
+using ::testing::Property;
+using ::testing::Values;
+using ::zetasql_base::testing::IsOk;
+
+MATCHER_P(QuotingMatches, quoting, "") {
+  return ExplainMatchResult(
+      AllOf(Property(&QuotingSpec::literal_kind, Eq(quoting.literal_kind())),
+            Property(&QuotingSpec::quote_kind, Eq(quoting.quote_kind())),
+            Property(&QuotingSpec::prefix, Eq(quoting.prefix()))),
+      arg, result_listener);
+}
+
+MATCHER_P2(HasQuotingAndContents, expected_quoting, expected_contents, "") {
+  absl::string_view actual_contents;
+  absl::StatusOr<QuotingSpec> actual_quoting =
+      QuotingSpec::FindQuotingKind(arg, actual_contents);
+
+  if (!ExplainMatchResult(IsOk(), actual_quoting.status(), result_listener)) {
+    return false;
+  }
+  return ExplainMatchResult(QuotingMatches(expected_quoting),
+                            actual_quoting.value(), result_listener) &&
+         ExplainMatchResult(Eq(expected_contents), actual_contents,
+                            result_listener);
+}
+
+MATCHER_P4(MatchesQuotingAndContents, expected_token_kind, expected_quote,
+           expected_prefix, expected_contents, "") {
+  absl::string_view actual_contents;
+  absl::StatusOr<QuotingSpec> actual_quoting =
+      QuotingSpec::FindQuotingKind(arg, actual_contents);
+
+  if (!ExplainMatchResult(IsOk(), actual_quoting.status(), result_listener)) {
+    return false;
+  }
+  return ExplainMatchResult(
+             AllOf(
+                 Property(&QuotingSpec::literal_kind, Eq(expected_token_kind)),
+                 Property(&QuotingSpec::quote_kind, Eq(expected_quote)),
+                 Property(&QuotingSpec::prefix, Eq(expected_prefix))),
+             actual_quoting.value(), result_listener) &&
+         ExplainMatchResult(Eq(expected_contents),
actual_contents,
+                            result_listener);
+}
+
+TEST(QuotingTest, DetectsInvalidQuotingTokens) {
+  // None of these are literal tokens
+  absl::string_view contents;
+  EXPECT_THAT(QuotingSpec::FindQuotingKind("abc", contents), Not(IsOk()));
+  EXPECT_THAT(QuotingSpec::FindQuotingKind("+", contents), Not(IsOk()));
+  EXPECT_THAT(QuotingSpec::FindQuotingKind("-", contents), Not(IsOk()));
+}
+
+TEST(QuotingTest, DetectsCorrectQuotingFromBacktickedIdentifiers) {
+  EXPECT_THAT("`a`",
+              HasQuotingAndContents(QuotingSpec::BacktickedIdentifier(), "a"));
+  EXPECT_THAT(
+      "`unic😀de`",
+      HasQuotingAndContents(QuotingSpec::BacktickedIdentifier(), "unic😀de"));
+}
+
+class QuotingStringLiteralTest
+    : public ::testing::TestWithParam<std::tuple<
+          absl::string_view, absl::string_view, LiteralTokenKind>> {};
+
+TEST_P(QuotingStringLiteralTest, DetectsCorrectQuotingFromLiterals) {
+  const auto& [prefix, contents, expected_literal_kind] = GetParam();
+  EXPECT_THAT(
+      absl::StrFormat("%s'%s'", prefix, contents),
+      MatchesQuotingAndContents(expected_literal_kind,
+                                QuoteKind::kOneSingleQuote, prefix, contents));
+  EXPECT_THAT(
+      absl::StrFormat("%s\"%s\"", prefix, contents),
+      MatchesQuotingAndContents(expected_literal_kind,
+                                QuoteKind::kOneDoubleQuote, prefix, contents));
+  EXPECT_THAT(absl::StrFormat("%s'''%s'''", prefix, contents),
+              MatchesQuotingAndContents(expected_literal_kind,
+                                        QuoteKind::kThreeSingleQuotes, prefix,
+                                        contents));
+  EXPECT_THAT(absl::StrFormat(R"(%s"""%s""")", prefix, contents),
+              MatchesQuotingAndContents(expected_literal_kind,
+                                        QuoteKind::kThreeDoubleQuotes, prefix,
+                                        contents));
+}
+
+INSTANTIATE_TEST_SUITE_P(StringLiteralQuotingDetectionTest,
+                         QuotingStringLiteralTest,
+                         Combine(Values("", "r", "R"),
+                                 Values("", "a", "xyz 123", "unic😀de"),
+                                 Values(LiteralTokenKind::kStringLiteral)));
+
+INSTANTIATE_TEST_SUITE_P(
+    RawStringLiteralQuotingDetectionTest_MultilineAndEscapes,
+    QuotingStringLiteralTest,
+    Combine(Values("r", "R"),
+            Values(R"(line1 unic😀de
+line2)",
+                   "\n'\\\"\"\r"),
Values(LiteralTokenKind::kStringLiteral))); + +INSTANTIATE_TEST_SUITE_P(BytesLiteralQuotingDetectionTest, + QuotingStringLiteralTest, + Combine(Values("b", "B", "rb", "rB", "Rb", "RB", "br", + "bR", "Br", "BR"), + Values("", "a", "xyz 123", "unic😀de"), + Values(LiteralTokenKind::kBytesLiteral))); + +INSTANTIATE_TEST_SUITE_P( + RawBytesLiteralQuotingDetectionTest_MultilineAndEscapes, + QuotingStringLiteralTest, + Combine(Values("rb", "rB", "Rb", "RB", "br", "bR", "Br", "BR"), + Values(R"(line1 unic😀de + line2)", + "\n'\\\"\"\r"), + Values(LiteralTokenKind::kBytesLiteral))); + +} // namespace macros +} // namespace parser +} // namespace zetasql diff --git a/zetasql/parser/macros/standalone_macro_expansion.cc b/zetasql/parser/macros/standalone_macro_expansion.cc new file mode 100644 index 000000000..c45251129 --- /dev/null +++ b/zetasql/parser/macros/standalone_macro_expansion.cc @@ -0,0 +1,112 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+//
+
+#include "zetasql/parser/macros/standalone_macro_expansion.h"
+
+#include <cctype>
+#include <string>
+
+#include "zetasql/parser/bison_token_codes.h"
+#include "zetasql/parser/macros/token_splicing_utils.h"
+#include "zetasql/parser/macros/token_with_location.h"
+#include "zetasql/base/check.h"
+#include "absl/strings/str_cat.h"
+#include "absl/strings/string_view.h"
+#include "absl/types/span.h"
+
+namespace zetasql {
+namespace parser {
+namespace macros {
+
+static bool IsIntegerOrFloatingPointLiteral(const TokenWithLocation& token) {
+  return token.kind == INTEGER_LITERAL || token.kind == FLOATING_POINT_LITERAL;
+}
+
+static bool SplicingTokensCouldStartComment(
+    const TokenWithLocation& previous_token,
+    const TokenWithLocation& current_token) {
+  return (previous_token.kind == '-' && current_token.kind == '-') ||
+         (previous_token.kind == '/' && current_token.kind == '/') ||
+         (previous_token.kind == '/' && current_token.kind == '*');
+}
+
+static bool TokensRequireExplicitSeparation(
+    const TokenWithLocation& previous_token,
+    const TokenWithLocation& current_token) {
+  if (current_token.text.empty()) {
+    // YYEOF doesn't need separation.
+    return false;
+  }
+
+  // Macro invocation, keyword or unquoted identifier followed by a character
+  // that can continue it.
+  if (previous_token.kind == MACRO_INVOCATION ||
+      IsKeywordOrUnquotedIdentifier(previous_token)) {
+    return IsIdentifierCharacter(current_token.text.front());
+  }
+  // Macro argument reference followed by a decimal digit.
+  if (previous_token.kind == MACRO_ARGUMENT_REFERENCE) {
+    return std::isdigit(current_token.text.front());
+  }
+
+  // Avoid comment-outs, where symbols inadvertently become the start of a
+  // comment.
+  if (SplicingTokensCouldStartComment(previous_token, current_token)) {
+    return true;
+  }
+
+  // Integer and floating-point literals should not splice
+  if (IsIntegerOrFloatingPointLiteral(previous_token) &&
+      IsIntegerOrFloatingPointLiteral(current_token)) {
+    return true;
+  }
+
+  // OK to have no space.
+  return false;
+}
+
+std::string TokensToString(absl::Span<const TokenWithLocation> tokens,
+                           bool standardize_to_single_whitespace) {
+  std::string expanded_sql;
+  for (auto it = tokens.begin(); it != tokens.end(); ++it) {
+    const auto& current_token = *it;
+    absl::string_view whitespace = current_token.preceding_whitespaces;
+    if (standardize_to_single_whitespace) {
+      whitespace = " ";
+      if (it == tokens.begin() || it == tokens.end() - 1) {
+        // No space before the first token.
+        // Also at the end it's YYEOF, so we also drop spaces before it (which
+        // would be trailing to the content).
+        ABSL_DCHECK(it != tokens.end() - 1 || current_token.text.empty());
+        whitespace = "";
+      }
+    }
+    if (whitespace.empty() && it != tokens.begin()) {
+      const TokenWithLocation& previous_token = *(it - 1);
+      if (TokensRequireExplicitSeparation(previous_token, current_token)) {
+        // Prevent token splicing by forcing an extra space.
+        whitespace = " ";
+      }
+    }
+    absl::StrAppend(&expanded_sql, whitespace, current_token.text);
+  }
+  return expanded_sql;
+}
+
+}  // namespace macros
+}  // namespace parser
+}  // namespace zetasql
diff --git a/zetasql/parser/macros/standalone_macro_expansion.h b/zetasql/parser/macros/standalone_macro_expansion.h
new file mode 100644
index 000000000..7f3ef5d25
--- /dev/null
+++ b/zetasql/parser/macros/standalone_macro_expansion.h
@@ -0,0 +1,50 @@
+//
+// Copyright 2019 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#ifndef ZETASQL_PARSER_MACROS_STANDALONE_MACRO_EXPANSION_H_ +#define ZETASQL_PARSER_MACROS_STANDALONE_MACRO_EXPANSION_H_ + +#include +#include + +#include "zetasql/parser/macros/token_with_location.h" +#include "absl/types/span.h" + +namespace zetasql { +namespace parser { +namespace macros { + +// Converts the given tokens to a string. `standardize_to_single_whitespace` +// controls whether to preserve the whitespace on the tokens, or to always place +// exactly one whitespace between tokens. +// +// IMPORTANT: The function prevents splicing even if +// `standardize_to_single_whitespace` is false by inserting an extra single +// whitespace where needed: +// 1. Unquoted identifier, keyword, or a macro invocation followed by a token +// that starts with a character that could continue the previous token. +// Any potential lenient splicing should have already been done by the +// expander. +// 2. Symbols that may cause comment-out, i.e. --, /*, or // +// 3. Integer and floating point literals, e.g. 
`1.` and `2` +std::string TokensToString(absl::Span<const TokenWithLocation> tokens, + bool standardize_to_single_whitespace); + +} // namespace macros +} // namespace parser +} // namespace zetasql + +#endif // ZETASQL_PARSER_MACROS_STANDALONE_MACRO_EXPANSION_H_ diff --git a/zetasql/parser/macros/token_splicing_utils.cc b/zetasql/parser/macros/token_splicing_utils.cc new file mode 100644 index 000000000..cc81c1852 --- /dev/null +++ b/zetasql/parser/macros/token_splicing_utils.cc @@ -0,0 +1,54 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License.
+// + +#include "zetasql/parser/macros/token_splicing_utils.h" + +#include <cctype> + +#include "zetasql/parser/bison_token_codes.h" +#include "zetasql/parser/keywords.h" +#include "zetasql/parser/macros/token_with_location.h" +#include "absl/strings/string_view.h" + +namespace zetasql { +namespace parser { +namespace macros { + +bool CanCharStartAnIdentifier(const char c) { + return c == '_' || std::isalpha(c); +} + +bool IsQuotedIdentifier(const TokenWithLocation& token) { + return token.kind == IDENTIFIER && token.text.front() == '`'; +} + +bool IsIdentifierCharacter(const char c) { + return CanCharStartAnIdentifier(c) || std::isdigit(c); +} + +bool IsKeywordOrUnquotedIdentifier(int token_kind, + absl::string_view token_text) { + return GetKeywordInfo(token_text) != nullptr || + (token_kind == IDENTIFIER && token_text.front() != '`'); +} + +bool IsKeywordOrUnquotedIdentifier(const TokenWithLocation& token) { + return IsKeywordOrUnquotedIdentifier(token.kind, token.text); +} + +} // namespace macros +} // namespace parser +} // namespace zetasql diff --git a/zetasql/parser/macros/token_splicing_utils.h b/zetasql/parser/macros/token_splicing_utils.h new file mode 100644 index 000000000..49e9f4725 --- /dev/null +++ b/zetasql/parser/macros/token_splicing_utils.h @@ -0,0 +1,50 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License.
+// + +#ifndef ZETASQL_PARSER_MACROS_TOKEN_SPLICING_UTILS_H_ +#define ZETASQL_PARSER_MACROS_TOKEN_SPLICING_UTILS_H_ + +#include "zetasql/parser/macros/token_with_location.h" +#include "absl/strings/string_view.h" + +namespace zetasql { +namespace parser { +namespace macros { + +// Indicates whether the given character `c` can start an unquoted identifier. +bool CanCharStartAnIdentifier(char c); + +// Returns true if `token` is a quoted identifier. Note that `token.kind` is +// IDENTIFIER, whether quoted or not, so the text is needed to tell the +// difference. +bool IsQuotedIdentifier(const TokenWithLocation& token); + +// Indicates whether this character can be part of an unquoted identifier. +bool IsIdentifierCharacter(char c); + +// Returns true if the token's text is a keyword or an unquoted identifier. +bool IsKeywordOrUnquotedIdentifier(int token_kind, + absl::string_view token_text); + +// Returns true if the token's text is a keyword or an unquoted identifier. +// Convenience overload. +bool IsKeywordOrUnquotedIdentifier(const TokenWithLocation& token); + +} // namespace macros +} // namespace parser +} // namespace zetasql + +#endif // ZETASQL_PARSER_MACROS_TOKEN_SPLICING_UTILS_H_ diff --git a/zetasql/parser/macros/token_with_location.h b/zetasql/parser/macros/token_with_location.h new file mode 100644 index 000000000..2a2412deb --- /dev/null +++ b/zetasql/parser/macros/token_with_location.h @@ -0,0 +1,50 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and +// limitations under the License. +// + +#ifndef ZETASQL_PARSER_MACROS_TOKEN_WITH_LOCATION_H_ +#define ZETASQL_PARSER_MACROS_TOKEN_WITH_LOCATION_H_ + +#include "zetasql/public/parse_location.h" +#include "absl/strings/string_view.h" + +namespace zetasql { +namespace parser { +namespace macros { + +using Location = ParseLocationRange; + +// Represents one token in the unexpanded input stream. 'Kind' is the Flex token +// kind, e.g. STRING_LITERAL, IDENTIFIER, KW_SELECT. +// Offsets and 'text' refer to the token *without* any whitespaces, consistent +// with Flex. +// Preceding whitespaces are stored in their own string_view; they +// are used when serializing the end result to preserve the original user +// spacing. +struct TokenWithLocation { + int kind; + Location location; + absl::string_view text; + absl::string_view preceding_whitespaces; + + int start_offset() const { return location.start().GetByteOffset(); } + int end_offset() const { return location.end().GetByteOffset(); } +}; + +} // namespace macros +} // namespace parser +} // namespace zetasql + +#endif // ZETASQL_PARSER_MACROS_TOKEN_WITH_LOCATION_H_ diff --git a/zetasql/parser/parse_tree.cc b/zetasql/parser/parse_tree.cc index 7824a85ff..88047852d 100644 --- a/zetasql/parser/parse_tree.cc +++ b/zetasql/parser/parse_tree.cc @@ -1539,9 +1539,9 @@ std::string ASTFilterFieldsArg::SingleNodeDebugString() const { GetSQLForOperator(), ")"); } -std::string ASTLeaf::SingleNodeDebugString() const { +std::string ASTPrintableLeaf::SingleNodeDebugString() const { return absl::StrCat(std::string(ASTNode::SingleNodeDebugString()), "(", - image_, ")"); + image_, ")"); } std::string ASTWithClause::SingleNodeDebugString() const { @@ -1622,6 +1622,8 @@ absl::string_view SchemaObjectKindToName(SchemaObjectKind schema_object_kind) { return "DATABASE"; case SchemaObjectKind::kExternalTable: return "EXTERNAL TABLE"; + case 
SchemaObjectKind::kExternalSchema: + return "EXTERNAL SCHEMA"; case SchemaObjectKind::kFunction: return "FUNCTION"; case SchemaObjectKind::kIndex: diff --git a/zetasql/parser/parse_tree_serializer.cc.template b/zetasql/parser/parse_tree_serializer.cc.template index 5913f4c96..891919d5f 100644 --- a/zetasql/parser/parse_tree_serializer.cc.template +++ b/zetasql/parser/parse_tree_serializer.cc.template @@ -121,6 +121,10 @@ absl::Status ParseTreeSerializer::Serialize(const {{node.name}}* node, proto->set_{{field.name}}(static_cast<{{field.enum_value}}>(node->{{field.member_name}})); # elif field.member_type is eq('IdString') proto->set_{{field.name}}(node->{{field.member_name}}.ToString()); + # elif not field.serialize_default_value + if (node->{{field.member_name}}) { + proto->set_{{field.name}}(node->{{field.member_name}}); + } # else {# This case includes bool, int, std::string, TypeKind. #} proto->set_{{field.name}}(node->{{field.member_name}}); diff --git a/zetasql/parser/parser.cc b/zetasql/parser/parser.cc index f2fea323d..8d6978cc4 100644 --- a/zetasql/parser/parser.cc +++ b/zetasql/parser/parser.cc @@ -22,47 +22,61 @@ #include #include -#include "zetasql/base/logging.h" +#include "zetasql/base/arena.h" #include "zetasql/common/errors.h" +#include "zetasql/parser/ast_node_kind.h" #include "zetasql/parser/bison_parser.h" #include "zetasql/parser/bison_parser_mode.h" +#include "zetasql/parser/macros/macro_catalog.h" #include "zetasql/parser/parse_tree.h" #include "zetasql/parser/parser_runtime_info.h" +#include "zetasql/parser/statement_properties.h" #include "zetasql/public/id_string.h" +#include "zetasql/public/language_options.h" #include "zetasql/public/options.pb.h" #include "zetasql/public/parse_resume_location.h" #include "absl/container/flat_hash_map.h" +#include "zetasql/base/check.h" #include "absl/memory/memory.h" #include "absl/status/status.h" +#include "absl/strings/str_cat.h" +#include "absl/strings/string_view.h" #include 
"zetasql/base/ret_check.h" -#include "zetasql/base/status.h" #include "zetasql/base/status_macros.h" namespace zetasql { -using parser::BisonParserMode; using parser::BisonParser; +using parser::BisonParserMode; +using MacroCatalog = parser::macros::MacroCatalog; ParserOptions::ParserOptions() : ParserOptions(LanguageOptions{}) {} ParserOptions::ParserOptions(std::shared_ptr id_string_pool, std::shared_ptr arena, - const LanguageOptions* language_options) - : arena_(std::move(arena)), id_string_pool_(std::move(id_string_pool)) { - set_language_options(language_options); -} + const LanguageOptions* language_options, + const MacroCatalog* macro_catalog) + : arena_(std::move(arena)), + id_string_pool_(std::move(id_string_pool)), + language_options_(language_options ? *language_options + : LanguageOptions()), + macro_catalog_(macro_catalog) {} -ParserOptions::ParserOptions(LanguageOptions language_options) +ParserOptions::ParserOptions(LanguageOptions language_options, + const MacroCatalog* macro_catalog) : arena_(std::make_shared(/*block_size=*/4096)), id_string_pool_(std::make_shared(arena_)), - language_options_(std::move(language_options)) {} + language_options_(std::move(language_options)), + macro_catalog_(macro_catalog) {} ParserOptions::ParserOptions(std::shared_ptr id_string_pool, std::shared_ptr arena, - LanguageOptions language_options) + LanguageOptions language_options, + const MacroCatalog* macro_catalog) : arena_(std::move(arena)), id_string_pool_(std::move(id_string_pool)), - language_options_(std::move(language_options)) {} + language_options_(std::move(language_options)), + macro_catalog_(macro_catalog) {} ParserOptions::~ParserOptions() = default; @@ -110,7 +124,8 @@ absl::Status ParseStatement(absl::string_view statement_string, BisonParserMode::kStatement, /*filename=*/absl::string_view(), statement_string, /*start_byte_offset=*/0, parser_options.id_string_pool().get(), parser_options.arena().get(), - parser_options.language_options(), &ast_node, 
&other_allocated_ast_nodes, + parser_options.language_options(), parser_options.macro_catalog(), + &ast_node, &other_allocated_ast_nodes, /*ast_statement_properties=*/nullptr, /*statement_end_byte_offset=*/nullptr); ZETASQL_RETURN_IF_ERROR( @@ -141,7 +156,7 @@ absl::Status ParseScript(absl::string_view script_string, BisonParserMode::kScript, /*filename=*/absl::string_view(), script_string, /*start_byte_offset=*/0, parser_options.id_string_pool().get(), parser_options.arena().get(), parser_options.language_options(), - &ast_node, &other_allocated_ast_nodes, + parser_options.macro_catalog(), &ast_node, &other_allocated_ast_nodes, /*ast_statement_properties=*/nullptr, /*statement_end_byte_offset=*/nullptr); @@ -151,7 +166,11 @@ absl::Status ParseScript(absl::string_view script_string, script = absl::WrapUnique(ast_node.release()->GetAsOrDie()); } ZETASQL_RETURN_IF_ERROR(ConvertInternalErrorLocationAndAdjustErrorString( - error_message_mode, keep_error_location_payload, script_string, status)); + ErrorMessageOptions{ + .mode = error_message_mode, + .attach_error_location_payload = keep_error_location_payload, + .stability = ERROR_MESSAGE_STABILITY_UNSPECIFIED}, + script_string, status)); *output = std::make_unique( parser_options.id_string_pool(), parser_options.arena(), std::move(other_allocated_ast_nodes), std::move(script), @@ -182,7 +201,7 @@ absl::Status ParseNextStatementInternal(ParseResumeLocation* resume_location, mode, resume_location->filename(), resume_location->input(), resume_location->byte_position(), parser_options.id_string_pool().get(), parser_options.arena().get(), parser_options.language_options(), - &ast_node, &other_allocated_ast_nodes, + parser_options.macro_catalog(), &ast_node, &other_allocated_ast_nodes, /*ast_statement_properties=*/nullptr, &next_statement_byte_offset); ZETASQL_RETURN_IF_ERROR( ConvertInternalErrorLocationToExternal(status, resume_location->input())); @@ -239,7 +258,7 @@ absl::Status ParseType(absl::string_view type_string, 
BisonParserMode::kType, /* filename = */ absl::string_view(), type_string, 0 /* offset */, parser_options.id_string_pool().get(), parser_options.arena().get(), parser_options.language_options(), - &ast_node, &other_allocated_ast_nodes, + parser_options.macro_catalog(), &ast_node, &other_allocated_ast_nodes, /*ast_statement_properties=*/nullptr, /*statement_end_byte_offset=*/nullptr); ZETASQL_RETURN_IF_ERROR(ConvertInternalErrorLocationToExternal(status, type_string)); @@ -267,7 +286,7 @@ absl::Status ParseExpression(absl::string_view expression_string, BisonParserMode::kExpression, /* filename = */ absl::string_view(), expression_string, 0 /* offset */, parser_options.id_string_pool().get(), parser_options.arena().get(), parser_options.language_options(), - &ast_node, &other_allocated_ast_nodes, + parser_options.macro_catalog(), &ast_node, &other_allocated_ast_nodes, /*ast_statement_properties=*/nullptr, /*statement_end_byte_offset=*/nullptr); ZETASQL_RETURN_IF_ERROR( @@ -297,7 +316,8 @@ absl::Status ParseExpression(const ParseResumeLocation& resume_location, BisonParserMode::kExpression, resume_location.filename(), resume_location.input(), resume_location.byte_position(), parser_options.id_string_pool().get(), parser_options.arena().get(), - parser_options.language_options(), &ast_node, &other_allocated_ast_nodes, + parser_options.language_options(), parser_options.macro_catalog(), + &ast_node, &other_allocated_ast_nodes, /*ast_statement_properties=*/nullptr, /*statement_end_byte_offset=*/nullptr); ZETASQL_RETURN_IF_ERROR( @@ -333,7 +353,8 @@ ASTNodeKind ParseNextStatementKind(const ParseResumeLocation& resume_location, parser .Parse(BisonParserMode::kNextStatementKind, resume_location.filename(), resume_location.input(), resume_location.byte_position(), - &id_string_pool, &arena, language_options, /*output=*/nullptr, + &id_string_pool, &arena, language_options, + /*macro_catalog=*/nullptr, /*output=*/nullptr, &other_allocated_ast_nodes, &ast_statement_properties, 
/*statement_end_byte_offset=*/nullptr) .IgnoreError(); @@ -366,8 +387,9 @@ absl::Status ParseNextStatementProperties( BisonParserMode::kNextStatementKind, resume_location.filename(), resume_location.input(), resume_location.byte_position(), parser_options.id_string_pool().get(), parser_options.arena().get(), - parser_options.language_options(), &output, allocated_ast_nodes, - ast_statement_properties, /*statement_end_byte_offset=*/nullptr); + parser_options.language_options(), parser_options.macro_catalog(), + &output, allocated_ast_nodes, ast_statement_properties, + /*statement_end_byte_offset=*/nullptr); // In kNextStatementKind mode, the bison parser places the statement level // hint in the output parameter. diff --git a/zetasql/parser/parser.h b/zetasql/parser/parser.h index 20bed8e61..03b12ffad 100644 --- a/zetasql/parser/parser.h +++ b/zetasql/parser/parser.h @@ -25,6 +25,7 @@ #include "zetasql/base/arena.h" #include "zetasql/parser/ast_node_kind.h" +#include "zetasql/parser/macros/macro_catalog.h" #include "zetasql/parser/parser_runtime_info.h" #include "zetasql/parser/statement_properties.h" #include "zetasql/public/language_options.h" @@ -50,18 +51,22 @@ class ParserOptions { public: ABSL_DEPRECATED("Use the overload that accepts LanguageOptions.") ParserOptions(); - explicit ParserOptions(LanguageOptions language_options); + explicit ParserOptions( + LanguageOptions language_options, + const parser::macros::MacroCatalog* macro_catalog = nullptr); // This will make a _copy_ of language_options. It is not referenced after // construction. 
ABSL_DEPRECATED("Inline me!") ParserOptions(std::shared_ptr id_string_pool, std::shared_ptr arena, - const LanguageOptions* language_options); + const LanguageOptions* language_options, + const parser::macros::MacroCatalog* macro_catalog = nullptr); ParserOptions(std::shared_ptr id_string_pool, std::shared_ptr arena, - LanguageOptions language_options = {}); + LanguageOptions language_options = {}, + const parser::macros::MacroCatalog* macro_catalog = nullptr); ~ParserOptions(); // Sets an IdStringPool for storing strings used in parsing. If it is not set, @@ -91,22 +96,14 @@ class ParserOptions { return arena_ != nullptr && id_string_pool_ != nullptr; } - // If nullptr, resets language options to default. Otherwise makes a copy - // of language options. - ABSL_DEPRECATED("Inline me!") - void set_language_options(const LanguageOptions* language_options) { - if (language_options == nullptr) { - language_options_ = LanguageOptions(); - } else { - language_options_ = *language_options; - } - } - void set_language_options(LanguageOptions language_options) { language_options_ = std::move(language_options); } const LanguageOptions& language_options() const { return language_options_; } + const parser::macros::MacroCatalog* macro_catalog() const { + return macro_catalog_; + } private: // Allocate all AST nodes in this arena. @@ -118,6 +115,7 @@ class ParserOptions { std::shared_ptr id_string_pool_; LanguageOptions language_options_; + const parser::macros::MacroCatalog* macro_catalog_ = nullptr; }; // Output of a parse operation. 
The output parse tree can be accessed via diff --git a/zetasql/parser/parser_internal.h b/zetasql/parser/parser_internal.h index f86c31355..2c692d0df 100644 --- a/zetasql/parser/parser_internal.h +++ b/zetasql/parser/parser_internal.h @@ -22,17 +22,13 @@ #include #include "zetasql/parser/bison_parser.h" -#include "zetasql/parser/flex_tokenizer.h" -#include "zetasql/parser/location.hh" +#include "zetasql/parser/bison_parser_mode.h" #include "zetasql/parser/parse_tree.h" -#include "zetasql/parser/statement_properties.h" -#include "zetasql/public/strings.h" -#include "zetasql/base/case.h" -#include "absl/memory/memory.h" +#include "zetasql/public/parse_location.h" #include "absl/status/status.h" #include "absl/strings/match.h" -#include "absl/strings/str_format.h" #include "absl/strings/str_join.h" +#include "absl/strings/string_view.h" // Shorthand to call parser->CreateASTNode<>(). The "node_type" must be a // AST... class from the zetasql namespace. The "..." are the arguments to @@ -87,7 +83,24 @@ namespace zetasql { +// Forward declarations to avoid an interface and a v-table lookup on every +// token. +namespace parser { +class DisambiguatorLexer; +} // namespace parser + namespace parser_internal { + +// Forward declarations of wrappers so that the generated parser can call the +// disambiguator without an interface and a v-table lookup on every token. 
+using zetasql::parser::BisonParserMode; +using zetasql::parser::DisambiguatorLexer; + +void SetForceTerminate(DisambiguatorLexer*, int*); +void PushBisonParserMode(DisambiguatorLexer*, BisonParserMode); +void PopBisonParserMode(DisambiguatorLexer*); +int GetNextToken(DisambiguatorLexer*, absl::string_view*, ParseLocationRange*); + enum class NotKeywordPresence { kPresent, kAbsent }; enum class AllOrDistinctKeyword { @@ -217,11 +230,14 @@ class SeparatedIdentifierTmpNode final : public zetasql::ASTNode { }; template -inline int zetasql_bison_parserlex( - SemanticType* yylval, zetasql_bison_parser::location* yylloc, - zetasql::parser::ZetaSqlFlexTokenizer* tokenizer) { +inline int zetasql_bison_parserlex(SemanticType* yylval, + ParseLocationRange* yylloc, + DisambiguatorLexer* tokenizer) { ABSL_DCHECK(tokenizer != nullptr); - return tokenizer->GetNextTokenFlex(yylloc); + absl::string_view text; + int token = GetNextToken(tokenizer, &text, yylloc); + yylval->string_view = {text.data(), text.length()}; + return token; } // Adds 'children' to 'node' and then returns 'node'. @@ -240,10 +256,10 @@ inline ASTNodeType* WithExtraChildren( // locations are nonempty, returns the first location. template inline Location FirstNonEmptyLocation(const Location& a, const Location& b) { - if (a.begin.column != a.end.column) { + if (a.start().GetByteOffset() != a.end().GetByteOffset()) { return a; } - if (b.begin.column != b.end.column) { + if (b.start().GetByteOffset() != b.end().GetByteOffset()) { return b; } return a; @@ -254,15 +270,15 @@ inline Location NonEmptyRangeLocation(const Location& first_location, const MoreLocations&... 
locations) { std::optional range; for (const Location& location : {first_location, locations...}) { - if (location.begin.column != location.end.column) { + if (location.start().GetByteOffset() != location.end().GetByteOffset()) { if (!range.has_value()) { range = location; } else { - if (location.begin.column < range->begin.column) { - range->begin = location.begin; + if (location.start().GetByteOffset() < range->start().GetByteOffset()) { + range->set_start(location.start()); } - if (location.end.column > range->end.column) { - range->end = location.end; + if (location.end().GetByteOffset() > range->end().GetByteOffset()) { + range->set_end(location.end()); } } } @@ -279,6 +295,15 @@ inline bool IsUnparenthesizedNotExpression(zetasql::ASTNode* node) { expr->op() == ASTUnaryExpression::NOT; } +// Makes a zero-length location range: [point, point). +// This is to simulate a required AST node whose child nodes are all optional. +// The location range of the node when all children are unspecified is an empty +// range. +template +inline ParseLocationRange LocationFromOffset(const LocationPoint& point) { + return ParseLocationRange(point, point); +} + using zetasql::ASTInsertStatement; } // namespace parser_internal diff --git a/zetasql/parser/parser_macro_expansion_test.cc b/zetasql/parser/parser_macro_expansion_test.cc new file mode 100644 index 000000000..5179108b9 --- /dev/null +++ b/zetasql/parser/parser_macro_expansion_test.cc @@ -0,0 +1,116 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and +// limitations under the License. +// + +#include <memory> + +#include "zetasql/parser/macros/macro_catalog.h" +#include "zetasql/parser/parser.h" +#include "zetasql/proto/internal_error_location.pb.h" +#include "zetasql/public/language_options.h" +#include "zetasql/public/options.pb.h" +#include "zetasql/public/parse_resume_location.h" +#include "zetasql/public/proto/logging.pb.h" +#include "gmock/gmock.h" +#include "gtest/gtest.h" +#include "absl/strings/string_view.h" + +namespace zetasql { + +using MacroCatalog = parser::macros::MacroCatalog; + +using ::testing::_; +using ::testing::HasSubstr; +using ::zetasql_base::testing::StatusIs; + +static LanguageOptions GetLanguageOptions() { + LanguageOptions language_options; + language_options.EnableMaximumLanguageFeatures(); + language_options.EnableLanguageFeature(FEATURE_V_1_4_SQL_MACROS); + return language_options; +} + +TEST(ParserMacroExpansionTest, ExpandsMacros) { + MacroCatalog macro_catalog; + macro_catalog.insert({"m", "$1 b"}); + ParserOptions parser_options(GetLanguageOptions(), &macro_catalog); + + ParseResumeLocation resume_location = + ParseResumeLocation::FromStringView("SELECT a$m(x)2"); + bool at_end_of_input; + std::unique_ptr<ParserOutput> parser_output; + ZETASQL_ASSERT_OK(ParseNextStatement(&resume_location, parser_options, &parser_output, + &at_end_of_input)); + EXPECT_TRUE(at_end_of_input); + EXPECT_EQ(parser_output->runtime_info().num_lexical_tokens(), 9); + + EXPECT_EQ(parser_output->statement()->DebugString(), + R"(QueryStatement [0-macro:m:4] + Query [0-macro:m:4] + Select [0-macro:m:4] + SelectList [7-macro:m:4] + SelectColumn [7-macro:m:4] + PathExpression [7-8] + Identifier(ax) [7-8] + Alias [macro:m:3-4] + Identifier(b2) [macro:m:3-4] +)"); +} + +TEST(ParserMacroExpansionTest, RecognizesOnlyOriginalDefineMacroStatements) { + MacroCatalog macro_catalog; + macro_catalog.insert({"def", "DEFINE MACRO x 1"}); + ParserOptions
parser_options(GetLanguageOptions(), &macro_catalog); + + ParseResumeLocation resume_location = + ParseResumeLocation::FromStringView("DEFINE MACRO m 1; $def"); + bool at_end_of_input; + std::unique_ptr<ParserOutput> parser_output; + ZETASQL_ASSERT_OK(ParseNextStatement(&resume_location, parser_options, &parser_output, + &at_end_of_input)); + EXPECT_FALSE(at_end_of_input); + EXPECT_EQ(parser_output->runtime_info().num_lexical_tokens(), 7); + + EXPECT_EQ(parser_output->statement()->DebugString(), + R"(DefineMacroStatement [0-16] + Identifier(m) [13-14] + MacroBody(1) [15-16] +)"); + + EXPECT_THAT( + ParseNextStatement(&resume_location, parser_options, &parser_output, + &at_end_of_input), + StatusIs(_, HasSubstr("Syntax error: DEFINE MACRO statements cannot be " + "composed from other expansions"))); +} + +TEST(ParserMacroExpansionTest, + CorrectErrorOnPartiallyGeneratedDefineMacroStatement) { + MacroCatalog macro_catalog; + macro_catalog.insert({"def", "define"}); + ParserOptions parser_options(GetLanguageOptions(), &macro_catalog); + + ParseResumeLocation resume_location = + ParseResumeLocation::FromStringView("$def MACRO m 1"); + bool at_end_of_input; + std::unique_ptr<ParserOutput> parser_output; + EXPECT_THAT( + ParseNextStatement(&resume_location, parser_options, &parser_output, + &at_end_of_input), + StatusIs(_, HasSubstr("Syntax error: DEFINE MACRO statements cannot be " + "composed from other expansions"))); +} + +} // namespace zetasql diff --git a/zetasql/parser/parser_test.cc b/zetasql/parser/parser_test.cc new file mode 100644 index 000000000..0c0b10fcd --- /dev/null +++ b/zetasql/parser/parser_test.cc @@ -0,0 +1,702 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+
+#include "zetasql/parser/parser.h"
+
+#include <memory>
+#include <string>
+#include <utility>
+#include <vector>
+
+#include "zetasql/base/commandlineflags.h"
+#include "zetasql/base/logging.h"
+#include "zetasql/base/helpers.h"
+#include "zetasql/base/options.h"
+#include "parsers/sql/sql_parser_test_helpers.h"
+#include "zetasql/parser/ast_node_kind.h"
+#include "zetasql/parser/keywords.h"
+#include "zetasql/parser/parser_runtime_info.h"
+#include "zetasql/proto/internal_error_location.pb.h"
+#include "zetasql/public/language_options.h"
+#include "zetasql/public/options.pb.h"
+#include "zetasql/public/parse_location.h"
+#include "zetasql/public/parse_resume_location.h"
+#include "zetasql/public/proto/logging.pb.h"
+#include "benchmark/benchmark.h"
+#include "gmock/gmock.h"
+#include "gtest/gtest.h"
+#include "absl/flags/declare.h"
+#include "absl/flags/flag.h"
+#include "zetasql/base/check.h"
+#include "absl/status/status.h"
+#include "absl/strings/str_cat.h"
+#include "absl/strings/string_view.h"
+#include "zetasql/base/status_payload.h"
+
+ABSL_FLAG(std::string, parser_benchmark_query,
+          "SELECT Column FROM Table WHERE OtherColumn = 123",
+          "Test sql to run in BM_ParseQueryFromFlag benchmark");
+
+ABSL_FLAG(std::string, parser_benchmark_query_file, "",
+          "File containing sql to run in BM_ParseQueryFromFile benchmark");
+
+ABSL_DECLARE_FLAG(bool, zetasql_parser_strip_errors);
+
+namespace zetasql {
+
+using ::absl::StatusCode::kInvalidArgument;
+using ::testing::AllOf;
+using ::testing::Contains;
+using ::testing::ElementsAre;
+using ::testing::Eq;
+using ::testing::ExplainMatchResult;
+using ::testing::HasSubstr;
+using ::testing::Not;
+using ::zetasql_base::testing::StatusIs;
+
+MATCHER_P(StatusHasByteOffset, byte_offset, "") {
+  const InternalErrorLocation& location =
+      ::util::GetPayload<InternalErrorLocation>(arg);
+  return ExplainMatchResult(Eq(byte_offset), location.byte_offset(),
+                            result_listener);
+}
+
+MATCHER_P2(WarningHasSubstrAndByteOffset, substr, byte_offset, "") {
+  return ExplainMatchResult(
+      AllOf(StatusIs(absl::StatusCode::kInvalidArgument, HasSubstr(substr)),
+            StatusHasByteOffset(byte_offset)),
+      arg, result_listener);
+}
+
+TEST(ParserTest, IssuesWarningWhenQualifyIsUsedAsAnIdentifier) {
+  std::unique_ptr<ParserOutput> parser_output;
+  LanguageOptions options_with_qualify_feature;
+  options_with_qualify_feature.EnableLanguageFeature(FEATURE_V_1_3_QUALIFY);
+  ZETASQL_ASSERT_OK(
+      options_with_qualify_feature.EnableReservableKeyword("QUALIFY",
+                                                           /*reserved=*/false));
+
+  // Unambiguous places produce no warning. Same for quoted identifiers.
+  // Note: the tokenizer doesn't recognize keywords as keywords when prefixed.
+  // It tags them as identifiers instead.
+  ZETASQL_ASSERT_OK(ParseStatement(
+      "select `qualify`.qualify FROM `QUALIFY` WHERE a = 1 "
+      "QUALIFY `qualify` = 1",
+      ParserOptions(options_with_qualify_feature), &parser_output));
+
+  EXPECT_TRUE(parser_output->warnings().empty());
+
+  const absl::string_view sql_causing_warnings =
+      "select qualify(1), qualify AS qUaLifY FROM qualify(), qualify AS "
+      "qualify WHERE qualify = 1 QUALIFY qualify";
+  ZETASQL_ASSERT_OK(ParseStatement(sql_causing_warnings,
+                           ParserOptions(options_with_qualify_feature),
+                           &parser_output));
+
+  // Note the offsets: everything here causes a warning, except the legitimate
+  // one, which is fully capitalized in this example.
+  const absl::string_view kQualifyWarning =
+      "QUALIFY is used as an identifier. QUALIFY may become a reserved word in "
+      "the future. To make this statement robust, add backticks around QUALIFY "
+      "to make the identifier unambiguous";
+  EXPECT_THAT(parser_output->warnings(),
+              ElementsAre(WarningHasSubstrAndByteOffset(kQualifyWarning, 7),
+                          WarningHasSubstrAndByteOffset(kQualifyWarning, 19),
+                          WarningHasSubstrAndByteOffset(kQualifyWarning, 30),
+                          WarningHasSubstrAndByteOffset(kQualifyWarning, 43),
+                          WarningHasSubstrAndByteOffset(kQualifyWarning, 54),
+                          WarningHasSubstrAndByteOffset(kQualifyWarning, 65),
+                          WarningHasSubstrAndByteOffset(kQualifyWarning, 79),
+                          WarningHasSubstrAndByteOffset(kQualifyWarning, 99)));
+
+  // Verify that the legitimate use (which we capitalized above) does not
+  // generate a warning
+  const int legitimate_usage_start_offset = 91;
+  EXPECT_EQ(sql_causing_warnings.substr(legitimate_usage_start_offset,
+                                        std::string("QUALIFY").length()),
+            "QUALIFY");
+  EXPECT_THAT(parser_output->warnings(),
+              Not(Contains(WarningHasSubstrAndByteOffset(
+                  kQualifyWarning, legitimate_usage_start_offset))));
+}
+
+TEST(ParserTest, IssuesWarningWhenFunctionOptionsPlacedAfterBody) {
+  std::unique_ptr<ParserOutput> parser_output;
+
+  ZETASQL_ASSERT_OK(ParseStatement(
+      R"sql(CREATE FUNCTION fn(s STRING) RETURNS STRING LANGUAGE testlang
+            OPTIONS ( a=b, bruce=springsteen ) AS "return 'a'";)sql",
+      ParserOptions(), &parser_output));
+  EXPECT_TRUE(parser_output->warnings().empty());
+
+  ZETASQL_ASSERT_OK(ParseStatement(
+      R"sql(CREATE FUNCTION fn(s STRING) RETURNS STRING LANGUAGE testlang
+            AS "return 'a'" OPTIONS ( a=b, bruce=springsteen );)sql",
+      ParserOptions(), &parser_output));
+  EXPECT_THAT(parser_output->warnings(),
+              ElementsAre(WarningHasSubstrAndByteOffset(
+                  "The preferred style places the OPTIONS clause before the "
+                  "function body",
+                  94)));
+}
+
+TEST(ParserTest, ErrorWhenConcatenatedLiteralsAreNotSeparate) {
+  std::unique_ptr<ParserOutput> parser_output;
+  LanguageOptions language_options;
+  EXPECT_THAT(
+      ParseStatement("select 'x''y'", ParserOptions(language_options),
+                     &parser_output),
+      StatusIs(kInvalidArgument,
+               HasSubstr("Syntax error: concatenated string literals must be "
+                         "separated by whitespace or comments")));
+
+  EXPECT_THAT(
+      ParseStatement("select \"x\"r'y'", ParserOptions(language_options),
+                     &parser_output),
+      StatusIs(kInvalidArgument,
+               HasSubstr("Syntax error: concatenated string literals must be "
+                         "separated by whitespace or comments")));
+
+  EXPECT_THAT(
+      ParseStatement("select b'x'b'y'", ParserOptions(language_options),
+                     &parser_output),
+      StatusIs(kInvalidArgument,
+               HasSubstr("Syntax error: concatenated bytes literals must be "
+                         "separated by whitespace or comments")));
+
+  EXPECT_THAT(
+      ParseStatement("select b\"x\"rb'y'", ParserOptions(language_options),
+                     &parser_output),
+      StatusIs(kInvalidArgument,
+               HasSubstr("Syntax error: concatenated bytes literals must be "
+                         "separated by whitespace or comments")));
+}
+
+class DeepStackQueriesTest
+    : public parsers_sql::LargeExpressionAtParserLimitTest {
+  void TryToParseExpression(
+      const int piece_repeat_count,
+      const parsers_sql::ExpressionTestSpec &test_spec,
+      bool* expression_parsed) override {
+    std::string query = "";
+    AppendTestExpression(piece_repeat_count, test_spec, &query);
+
+    std::unique_ptr<ParserOutput> parser_output;
+
+    // There should be no stack based limitations because Bison is not a
+    // recursive descent parser.
+    ZETASQL_EXPECT_OK(ParseExpression(query, ParserOptions(), &parser_output));
+  }
+};
+
+PARSE_LIMIT_POINT_TEST(DeepStackQueriesTest, true);
+
+// Test that comments at the end of queries do not cause errors.
+// File-based tests cannot be used since they would add a newline to the end
+// of the query.
+class ParseQueryEndingWithCommentTest : public ::testing::Test {
+ public:
+  absl::Status RunStatementTest(
+      absl::string_view test_case_input,
+      std::unique_ptr<ParserOutput>* parser_output) {
+    return ParseStatement(
+        test_case_input, ParserOptions(), parser_output);
+  }
+
+  absl::Status RunNextStatementTest(
+      absl::string_view test_case_input,
+      std::unique_ptr<ParserOutput>* parser_output) {
+    bool at_end_of_input;
+    ParseResumeLocation resume_location =
+        ParseResumeLocation::FromStringView(test_case_input);
+    absl::Status status = ParseNextStatement(
+        &resume_location, ParserOptions(), parser_output, &at_end_of_input);
+    return status;
+  }
+
+  absl::Status RunNextScriptStatementTest(
+      absl::string_view test_case_input,
+      std::unique_ptr<ParserOutput>* parser_output) {
+    bool at_end_of_input;
+    ParseResumeLocation resume_location =
+        ParseResumeLocation::FromStringView(test_case_input);
+    absl::Status status = ParseNextScriptStatement(
+        &resume_location, ParserOptions(), parser_output, &at_end_of_input);
+    return status;
+  }
+
+  ASTNodeKind RunStatementKindTest(
+      absl::string_view test_case_input) {
+    bool is_ctas;
+    LanguageOptions language_options;
+    return ParseStatementKind(
+        test_case_input, language_options, &is_ctas);
+  }
+
+  absl::Status RunScriptTest(
+      absl::string_view test_case_input,
+      std::unique_ptr<ParserOutput>* parser_output) {
+    return ParseScript(
+        test_case_input, ParserOptions(), ERROR_MESSAGE_WITH_PAYLOAD,
+        parser_output);
+  }
+};
+
+TEST_F(ParseQueryEndingWithCommentTest, DashComment) {
+  absl::string_view query = "select 0;--comment";
+  const int kStmtLength = 8;
+  std::unique_ptr<ParserOutput> parser_output;
+
+  ZETASQL_EXPECT_OK(RunStatementTest(query, &parser_output));
+  ASSERT_NE(parser_output->statement(), nullptr);
+  EXPECT_EQ(parser_output->statement()->GetParseLocationRange().start(),
+            ParseLocationPoint::FromByteOffset(0));
+  EXPECT_EQ(parser_output->statement()->GetParseLocationRange().end(),
+            ParseLocationPoint::FromByteOffset(kStmtLength));
+
ZETASQL_EXPECT_OK(RunNextStatementTest(query, &parser_output)); + ASSERT_NE(parser_output->statement(), nullptr); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().end(), + ParseLocationPoint::FromByteOffset(kStmtLength)); + + ZETASQL_EXPECT_OK(RunNextScriptStatementTest(query, &parser_output)); + ASSERT_NE(parser_output->statement(), nullptr); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().end(), + ParseLocationPoint::FromByteOffset(kStmtLength)); + + ASTNodeKind node_kind = RunStatementKindTest(query); + ASSERT_EQ(node_kind, AST_QUERY_STATEMENT); + + ZETASQL_EXPECT_OK(RunScriptTest(query, &parser_output)); + ASSERT_NE(parser_output->script(), nullptr); + EXPECT_EQ(parser_output->script()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ( + parser_output->script()->GetParseLocationRange().end().GetByteOffset(), + 9); +} + +TEST_F(ParseQueryEndingWithCommentTest, PoundComment) { + absl::string_view query = "select 0;#comment"; + const int kStmtLength = 8; + std::unique_ptr parser_output; + + ZETASQL_EXPECT_OK(RunStatementTest(query, &parser_output)); + ASSERT_NE(parser_output->statement(), nullptr); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().end(), + ParseLocationPoint::FromByteOffset(kStmtLength)); + + ZETASQL_EXPECT_OK(RunNextStatementTest(query, &parser_output)); + ASSERT_NE(parser_output->statement(), nullptr); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().end(), + 
ParseLocationPoint::FromByteOffset(kStmtLength)); + + ZETASQL_EXPECT_OK(RunNextScriptStatementTest(query, &parser_output)); + ASSERT_NE(parser_output->statement(), nullptr); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().end(), + ParseLocationPoint::FromByteOffset(kStmtLength)); + + ASTNodeKind node_kind = RunStatementKindTest(query); + ASSERT_EQ(node_kind, AST_QUERY_STATEMENT); + + ZETASQL_EXPECT_OK(RunScriptTest(query, &parser_output)); + ASSERT_NE(parser_output->script(), nullptr); + EXPECT_EQ(parser_output->script()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ( + parser_output->script()->GetParseLocationRange().end().GetByteOffset(), + 9); +} + +TEST_F(ParseQueryEndingWithCommentTest, CommentEndingWithWhitespace) { + absl::string_view query = "select 0;#comment "; + const int kStmtLength = 8; + std::unique_ptr parser_output; + + ZETASQL_EXPECT_OK(RunStatementTest(query, &parser_output)); + ASSERT_NE(parser_output->statement(), nullptr); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().end(), + ParseLocationPoint::FromByteOffset(kStmtLength)); + + ZETASQL_EXPECT_OK(RunNextStatementTest(query, &parser_output)); + ASSERT_NE(parser_output->statement(), nullptr); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().end(), + ParseLocationPoint::FromByteOffset(kStmtLength)); + + ZETASQL_EXPECT_OK(RunNextScriptStatementTest(query, &parser_output)); + ASSERT_NE(parser_output->statement(), nullptr); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + 
EXPECT_EQ(parser_output->statement()->GetParseLocationRange().end(), + ParseLocationPoint::FromByteOffset(kStmtLength)); + + ASTNodeKind node_kind = RunStatementKindTest(query); + ASSERT_EQ(node_kind, AST_QUERY_STATEMENT); + + ZETASQL_EXPECT_OK(RunScriptTest(query, &parser_output)); + ASSERT_NE(parser_output->script(), nullptr); + EXPECT_EQ(parser_output->script()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ( + parser_output->script()->GetParseLocationRange().end().GetByteOffset(), + 9); +} + +TEST_F(ParseQueryEndingWithCommentTest, EmptyComment) { + absl::string_view query = "select 0;--"; + const int kStmtLength = 8; + std::unique_ptr parser_output; + + ZETASQL_EXPECT_OK(RunStatementTest(query, &parser_output)); + ASSERT_NE(parser_output->statement(), nullptr); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().end(), + ParseLocationPoint::FromByteOffset(kStmtLength)); + + ZETASQL_EXPECT_OK(RunNextStatementTest(query, &parser_output)); + ASSERT_NE(parser_output->statement(), nullptr); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().end(), + ParseLocationPoint::FromByteOffset(kStmtLength)); + + ZETASQL_EXPECT_OK(RunNextScriptStatementTest(query, &parser_output)); + ASSERT_NE(parser_output->statement(), nullptr); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ(parser_output->statement()->GetParseLocationRange().end(), + ParseLocationPoint::FromByteOffset(kStmtLength)); + + ASTNodeKind node_kind = RunStatementKindTest(query); + ASSERT_EQ(node_kind, AST_QUERY_STATEMENT); + + ZETASQL_EXPECT_OK(RunScriptTest(query, &parser_output)); + ASSERT_NE(parser_output->script(), nullptr); + 
EXPECT_EQ(parser_output->script()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ( + parser_output->script()->GetParseLocationRange().end().GetByteOffset(), + 9); +} + +TEST_F(ParseQueryEndingWithCommentTest, JustComment) { + absl::string_view query = "--comment"; + std::unique_ptr parser_output; + EXPECT_THAT( + RunStatementTest(query, &parser_output), + zetasql_base::testing::StatusIs( + testing::_, + testing::HasSubstr("Syntax error: Unexpected end of statement"))); + + EXPECT_THAT( + RunNextStatementTest(query, &parser_output), + zetasql_base::testing::StatusIs( + testing::_, + testing::HasSubstr("Syntax error: Unexpected end of statement"))); + + EXPECT_THAT( + RunNextScriptStatementTest(query, &parser_output), + zetasql_base::testing::StatusIs( + testing::_, + testing::HasSubstr("Syntax error: Unexpected end of statement"))); + + ASTNodeKind node_kind = RunStatementKindTest(query); + ASSERT_EQ(node_kind, kUnknownASTNodeKind); + + ZETASQL_EXPECT_OK(RunScriptTest(query, &parser_output)); + ASSERT_NE(parser_output->script(), nullptr); + EXPECT_EQ(parser_output->script()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ( + parser_output->script()->GetParseLocationRange().end().GetByteOffset(), + 0); +} + +TEST_F(ParseQueryEndingWithCommentTest, JustEmptyComment) { + absl::string_view query = "--"; + std::unique_ptr parser_output; + EXPECT_THAT( + RunStatementTest(query, &parser_output), + zetasql_base::testing::StatusIs( + testing::_, + testing::HasSubstr("Syntax error: Unexpected end of statement"))); + + EXPECT_THAT( + RunNextStatementTest(query, &parser_output), + zetasql_base::testing::StatusIs( + testing::_, + testing::HasSubstr("Syntax error: Unexpected end of statement"))); + + EXPECT_THAT( + RunNextScriptStatementTest(query, &parser_output), + zetasql_base::testing::StatusIs( + testing::_, + testing::HasSubstr("Syntax error: Unexpected end of statement"))); + + ASTNodeKind 
node_kind = RunStatementKindTest(query); + ASSERT_EQ(node_kind, kUnknownASTNodeKind); + + ZETASQL_EXPECT_OK(RunScriptTest(query, &parser_output)); + ASSERT_NE(parser_output->script(), nullptr); + EXPECT_EQ(parser_output->script()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ( + parser_output->script()->GetParseLocationRange().end().GetByteOffset(), + 0); +} + +TEST_F(ParseQueryEndingWithCommentTest, IncompleteStmt) { + absl::string_view query = "SELECT --comment"; + std::unique_ptr parser_output; + EXPECT_THAT( + RunStatementTest(query, &parser_output), + zetasql_base::testing::StatusIs( + testing::_, + testing::HasSubstr("Syntax error: Unexpected end of statement"))); + + EXPECT_THAT( + RunNextStatementTest(query, &parser_output), + zetasql_base::testing::StatusIs( + testing::_, + testing::HasSubstr("Syntax error: Unexpected end of statement"))); + + EXPECT_THAT( + RunNextScriptStatementTest(query, &parser_output), + zetasql_base::testing::StatusIs( + testing::_, + testing::HasSubstr("Syntax error: Unexpected end of statement"))); + + ASTNodeKind node_kind = RunStatementKindTest(query); + ASSERT_EQ(node_kind, AST_QUERY_STATEMENT); + + EXPECT_THAT( + RunScriptTest(query, &parser_output), + zetasql_base::testing::StatusIs( + testing::_, + testing::HasSubstr("Syntax error: Unexpected end of script"))); +} + +TEST_F(ParseQueryEndingWithCommentTest, ParseType) { + absl::string_view type = "int64--comment"; + std::unique_ptr parser_output; + ZETASQL_EXPECT_OK(ParseType(type, ParserOptions(), &parser_output)); + ASSERT_NE(parser_output->type(), nullptr); + EXPECT_EQ(parser_output->type()->GetParseLocationRange().start(), + ParseLocationPoint::FromByteOffset(0)); + EXPECT_EQ(parser_output->type()->GetParseLocationRange().end(), + ParseLocationPoint::FromByteOffset(5)); +} + +TEST_F(ParseQueryEndingWithCommentTest, ParseExpression) { + absl::string_view expression = "x + 1--comment "; + std::unique_ptr parser_output; + 
ZETASQL_EXPECT_OK(ParseExpression(expression, ParserOptions(), &parser_output));
+  ASSERT_NE(parser_output->expression(), nullptr);
+  EXPECT_EQ(parser_output->expression()->GetParseLocationRange().start(),
+            ParseLocationPoint::FromByteOffset(0));
+  EXPECT_EQ(parser_output->expression()->GetParseLocationRange().end(),
+            ParseLocationPoint::FromByteOffset(5));
+}
+
+TEST_F(ParseQueryEndingWithCommentTest, ScriptDashComment) {
+  absl::string_view script = "select 0;select 1;--comment";
+  std::unique_ptr<ParserOutput> parser_output;
+  ZETASQL_EXPECT_OK(RunScriptTest(script, &parser_output));
+  ASSERT_NE(parser_output->script(), nullptr);
+  EXPECT_EQ(parser_output->script()->GetParseLocationRange().start(),
+            ParseLocationPoint::FromByteOffset(0));
+  EXPECT_EQ(
+      parser_output->script()->GetParseLocationRange().end().GetByteOffset(),
+      18);
+}
+
+TEST_F(ParseQueryEndingWithCommentTest, ScriptPoundCommentWithWhitespace) {
+  absl::string_view script = "select 0;select 1;#comment ";
+  std::unique_ptr<ParserOutput> parser_output;
+  ZETASQL_EXPECT_OK(RunScriptTest(script, &parser_output));
+  ASSERT_NE(parser_output->script(), nullptr);
+  EXPECT_EQ(parser_output->script()->GetParseLocationRange().start(),
+            ParseLocationPoint::FromByteOffset(0));
+  EXPECT_EQ(
+      parser_output->script()->GetParseLocationRange().end().GetByteOffset(),
+      18);
+}
+
+static int64_t TotalNanos(const google::protobuf::Duration duration) {
+  return 1000000000 * duration.seconds() + duration.nanos();
+}
+
+class LanguageOptionsMigrationTest : public ::testing::Test {};
+
+TEST_F(LanguageOptionsMigrationTest, LanguageOptionsTest) {
+  // For this test case, we use FEATURE_V_1_3_ALLOW_SLASH_PATHS, which is
+  // disabled by default, as a testbed to ensure language options are
+  // properly propagated and respected, regardless of how ParserOptions
+  // are constructed/copied/moved, etc.
+ constexpr absl::string_view query = "select 0 from /a/b"; + std::unique_ptr parser_output; + // By default, this should not parse, because FEATURE_V_1_3_ALLOW_SLASH_PATHS + // is disabled. + EXPECT_THAT( + ParseStatement(query, ParserOptions{LanguageOptions{}}, &parser_output), + zetasql_base::testing::StatusIs(absl::StatusCode::kInvalidArgument)); + + EXPECT_THAT( + ParseStatement(query, ParserOptions{nullptr, nullptr, LanguageOptions{}}, + &parser_output), + zetasql_base::testing::StatusIs(absl::StatusCode::kInvalidArgument)); + + // ... Also during migration, this is also disabled by default + EXPECT_THAT(ParseStatement(query, ParserOptions{}, &parser_output), + zetasql_base::testing::StatusIs(absl::StatusCode::kInvalidArgument)); + EXPECT_THAT(ParseStatement(query, ParserOptions{nullptr, nullptr, nullptr}, + &parser_output), + zetasql_base::testing::StatusIs(absl::StatusCode::kInvalidArgument)); + + // If we provide default language options explicitly, it should behave + // the same way. + LanguageOptions explicit_language_options; + EXPECT_THAT(ParseStatement(query, ParserOptions{explicit_language_options}, + &parser_output), + zetasql_base::testing::StatusIs(absl::StatusCode::kInvalidArgument)); + + EXPECT_THAT( + ParseStatement(query, + ParserOptions{nullptr, nullptr, explicit_language_options}, + &parser_output), + zetasql_base::testing::StatusIs(absl::StatusCode::kInvalidArgument)); + + EXPECT_THAT( + ParseStatement( + query, ParserOptions{nullptr, nullptr, &explicit_language_options}, + &parser_output), + zetasql_base::testing::StatusIs(absl::StatusCode::kInvalidArgument)); + + { + // Weird corner case during migration, clarify that we make a copy, and + // will ignore mutations to mutable_language_options after construction. 
+ LanguageOptions mutable_language_options; + + ParserOptions options{nullptr, nullptr, &mutable_language_options}; + EXPECT_THAT(ParseStatement(query, options, &parser_output), + zetasql_base::testing::StatusIs(absl::StatusCode::kInvalidArgument)); + mutable_language_options.EnableLanguageFeature( + FEATURE_V_1_3_ALLOW_SLASH_PATHS); + EXPECT_THAT(ParseStatement(query, options, &parser_output), + zetasql_base::testing::StatusIs(absl::StatusCode::kInvalidArgument)); + } +} + +static void BM_ParseQuery(absl::string_view query, benchmark::State& state) { + for (auto s : state) { + std::unique_ptr parser_output; + absl::Status status = + ParseStatement(query, ParserOptions(), &parser_output); + ZETASQL_EXPECT_OK(status); + + // Don't measure the destruction times. The AST is passed into the + // AnalyzerOutput and can be destroyed outside the critical path if needed. + state.PauseTiming(); + parser_output.reset(); + state.ResumeTiming(); + } +} + +// Generate expressions like "select 1+1+1+1+..." with additions. +// This is a baseline that doesn't demonstrate bad runtime. +static void BM_ParseAdd(benchmark::State& state) { + const int size = state.range(0); + + std::string query = "select 1"; + for (int i = 0; i < size; ++i) { + query += "+1"; + } + BM_ParseQuery(query, state); +} +BENCHMARK(BM_ParseAdd)->Range(1, 2048); + +// Generate expressions like "(((((1)))))" with parentheses. +// This demonstrates the bad quadratic performance of lookahead on parentheses +// in expressions. +static void BM_ParseParensExpr(benchmark::State& state) { + const int size = state.range(0); + + const std::string query = absl::StrCat("SELECT ", std::string(size, '('), "1", + std::string(size, ')')); + BM_ParseQuery(query, state); +} +BENCHMARK(BM_ParseParensExpr)->Range(1, 512); + +// Generate expressions like "(((((select 1)))))" with parentheses. +// This demonstrates the bad quadratic performance of lookahead on parentheses +// in queries. 
+static void BM_ParseParensQuery(benchmark::State& state) { + const int size = state.range(0); + + const std::string query = + absl::StrCat(std::string(size, '('), "select 1", std::string(size, ')')); + BM_ParseQuery(query, state); +} +BENCHMARK(BM_ParseParensQuery)->Range(1, 2048); + +// Generate expressions like "select 1 from (((((select 1)))))" +// with parentheses. This demonstrates the bad quadratic performance of +// lookahead on parentheses in FROM clauses (looking for joins). +static void BM_ParseParensJoin(benchmark::State& state) { + const int size = state.range(0); + + const std::string query = + absl::StrCat("select 1 from ", std::string(size, '('), "select 1", + std::string(size, ')')); + BM_ParseQuery(query, state); +} +BENCHMARK(BM_ParseParensJoin)->Range(1, 2048); + +static void BM_ParseQueryTrivial(benchmark::State& state) { + BM_ParseQuery("SELECT 1", state); +} +BENCHMARK(BM_ParseQueryTrivial); + +static void BM_ParseQueryFromFlag(benchmark::State& state) { + BM_ParseQuery(absl::GetFlag(FLAGS_parser_benchmark_query), state); +} +BENCHMARK(BM_ParseQueryFromFlag); + +static void BM_ParseQueryFromFile(benchmark::State& state) { + if (absl::GetFlag(FLAGS_parser_benchmark_query_file).empty()) return; + std::string query; + ZETASQL_QCHECK_OK(zetasql_base::GetContents(absl::GetFlag(FLAGS_parser_benchmark_query_file), + &query, ::zetasql_base::Defaults())); + BM_ParseQuery(query, state); +} +BENCHMARK(BM_ParseQueryFromFile); + +} // namespace zetasql diff --git a/zetasql/parser/run_parser_test.cc b/zetasql/parser/run_parser_test.cc index 1c9eb4239..56db07c76 100644 --- a/zetasql/parser/run_parser_test.cc +++ b/zetasql/parser/run_parser_test.cc @@ -150,7 +150,7 @@ class RunParserTest : public ::testing::Test { private: // Adds the test outputs in 'test_outputs' to 'annotated_outputs', annotated // with 'annotation'. 
-  void AddAnnotatedTestOutputs(const std::vector<std::string>& test_outputs,
+  void AddAnnotatedTestOutputs(absl::Span<const std::string> test_outputs,
                                absl::string_view annotation,
                                std::vector<std::string>* annotated_outputs) {
     for (const std::string& test_output : test_outputs) {
@@ -370,12 +370,12 @@ class RunParserTest : public ::testing::Test {
   // and just compare the shape of the tree for those. We also erase the
   // location information.
   static const RE2 cleanups[] = {
-      {R"((StringLiteral)\(('[^']*')\))"},
-      {R"((StringLiteral)\(("[^"]*")\))"},
-      {R"((StringLiteral)\((""".*""")\))"},
-      {R"((StringLiteral)\(('''.*''')\))"},
-      {R"((StringLiteral)\([^)]*\))"},
-      {R"((BytesLiteral)\([^)]*\))", RE2::Latin1},
+      {R"((StringLiteralComponent)\(('[^']*')\))"},
+      {R"((StringLiteralComponent)\(("[^"]*")\))"},
+      {R"((StringLiteralComponent)\((""".*""")\))"},
+      {R"((StringLiteralComponent)\(('''.*''')\))"},
+      {R"((StringLiteralComponent)\([^)]*\))"},
+      {R"((BytesLiteralComponent)\([^)]*\))", RE2::Latin1},
       {R"((FloatLiteral)\([^)]*\))"},
       {R"((IntLiteral)\([^)]*\))"},
       {R"((NumericLiteral)\([^)]*\))"},
@@ -595,7 +595,6 @@ class RunParserTest : public ::testing::Test {
   ZETASQL_ASSIGN_OR_RETURN(LanguageOptions::LanguageFeatureSet features,
                    GetRequiredLanguageFeatures(test_case_options_));
   language_options_->SetEnabledLanguageFeatures(features);
-  language_options_->EnableLanguageFeature(FEATURE_TEXTMAPPER_PARSER);
   language_options_->EnableLanguageFeature(FEATURE_SHADOW_PARSING);
 
   if (test_case_options_.GetBool(kQualifyReserved)) {
diff --git a/zetasql/parser/testdata/aggregation.test b/zetasql/parser/testdata/aggregation.test
index c882ead77..19d0915f3 100644
--- a/zetasql/parser/testdata/aggregation.test
+++ b/zetasql/parser/testdata/aggregation.test
@@ -605,7 +605,8 @@ QueryStatement [0-73] [select x,...group by x]
       PathExpression [32-33] [y]
         Identifier(y) [32-33] [y]
       Collate [34-49] [collate "en_US"]
-        StringLiteral("en_US") [42-49] ["en_US"]
+        StringLiteral [42-49] ["en_US"]
+          StringLiteralComponent("en_US") [42-49]
["en_US"] FromClause [56-62] [from T] TablePathExpression [61-62] [T] PathExpression [61-62] [T] @@ -2201,3 +2202,13 @@ GROUP BY GROUPING SETS(x, x+1 AS y, ROLLUP(x AS y), CUBE(z AS zz)) ERROR: Syntax error: Expected ")" or "," but got keyword AS [at 3:31] GROUP BY GROUPING SETS(x, x+1 AS y, ROLLUP(x AS y), CUBE(z AS zz)) ^ +== + +# Trailing commas don't work. It would be nice, but causes a parser conflict. +select COUNT(*) +from T +group by x,y, +-- +ERROR: Syntax error: Unexpected end of statement [at 3:14] +group by x,y, + ^ diff --git a/zetasql/parser/testdata/alter_column_set_drop_default.test b/zetasql/parser/testdata/alter_column_set_drop_default.test index 9b2332fd8..5b803d5d4 100644 --- a/zetasql/parser/testdata/alter_column_set_drop_default.test +++ b/zetasql/parser/testdata/alter_column_set_drop_default.test @@ -7,7 +7,8 @@ AlterTableStatement [0-57] [ALTER TABLE...my default'] AlterActionList [16-57] [ALTER COLUMN...my default'] AlterColumnSetDefaultAction [16-57] [ALTER COLUMN...my default'] Identifier(bar) [29-32] [bar] - StringLiteral('my default') [45-57] ['my default'] + StringLiteral [45-57] ['my default'] + StringLiteralComponent('my default') [45-57] ['my default'] -- ALTER TABLE foo ALTER COLUMN bar SET DEFAULT 'my default' == @@ -58,7 +59,8 @@ AlterTableStatement [0-66] [ALTER TABLE...collate(y, 'x')] Identifier(`collate`) [51-58] [collate] PathExpression [59-60] [y] Identifier(y) [59-60] [y] - StringLiteral('x') [62-65] ['x'] + StringLiteral [62-65] ['x'] + StringLiteralComponent('x') [62-65] ['x'] -- ALTER TABLE foo ALTER COLUMN `collate` SET DEFAULT `collate`(y, 'x') == @@ -119,4 +121,3 @@ AlterTableStatement [0-55] [ALTER TABLE...DROP DEFAULT] Identifier(bar) [39-42] [bar] -- ALTER TABLE foo ALTER COLUMN IF EXISTS bar DROP DEFAULT -== diff --git a/zetasql/parser/testdata/alter_column_type.test b/zetasql/parser/testdata/alter_column_type.test index b3bcd4d5e..0c93ee442 100644 --- a/zetasql/parser/testdata/alter_column_type.test +++ 
b/zetasql/parser/testdata/alter_column_type.test @@ -158,7 +158,8 @@ AlterTableStatement [0-56] [ALTER TABLE...foo="bar")] OptionsList [45-56] [(foo="bar")] OptionsEntry [46-55] [foo="bar"] Identifier(foo) [46-49] [foo] - StringLiteral("bar") [50-55] ["bar"] + StringLiteral [50-55] ["bar"] + StringLiteralComponent("bar") [50-55] ["bar"] -- ALTER TABLE foo ALTER COLUMN bar SET OPTIONS(foo = "bar") @@ -177,7 +178,8 @@ AlterTableStatement [0-66] [ALTER TABLE...foo="bar")] OptionsList [55-66] [(foo="bar")] OptionsEntry [56-65] [foo="bar"] Identifier(foo) [56-59] [foo] - StringLiteral("bar") [60-65] ["bar"] + StringLiteral [60-65] ["bar"] + StringLiteralComponent("bar") [60-65] ["bar"] -- ALTER TABLE foo ALTER COLUMN IF EXISTS bar SET OPTIONS(foo = "bar") @@ -257,7 +259,8 @@ AlterTableStatement [0-103] [ALTER TABLE...description")] OptionsList [72-103] [(description...description")] OptionsEntry [73-102] [description="new description"] Identifier(description) [73-84] [description] - StringLiteral("new description") [85-102] ["new description"] + StringLiteral [85-102] ["new description"] + StringLiteralComponent("new description") [85-102] ["new description"] -- ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE STRING NOT NULL OPTIONS(description = "new description") -- @@ -277,7 +280,8 @@ AlterTableStatement [0-112] [ALTER TABLE...description")] OptionsList [81-112] [(description...description")] OptionsEntry [82-111] [description="new description"] Identifier(description) [82-93] [description] - StringLiteral("new description") [94-111] ["new description"] + StringLiteral [94-111] ["new description"] + StringLiteralComponent("new description") [94-111] ["new description"] -- ALTER TABLE foo ALTER COLUMN IF EXISTS bar SET DATA TYPE STRING NOT NULL OPTIONS(description = "new description") == @@ -297,7 +301,8 @@ AlterTableStatement [0-75] [ALTER TABLE...unicode:ci'] PathExpression [48-54] [STRING] Identifier(STRING) [48-54] [STRING] Collate [55-75] [COLLATE 'unicode:ci'] - 
StringLiteral('unicode:ci') [63-75] ['unicode:ci'] + StringLiteral [63-75] ['unicode:ci'] + StringLiteralComponent('unicode:ci') [63-75] ['unicode:ci'] -- ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE STRING COLLATE 'unicode:ci' -- @@ -313,7 +318,8 @@ AlterTableStatement [0-84] [ALTER TABLE...unicode:ci'] PathExpression [57-63] [STRING] Identifier(STRING) [57-63] [STRING] Collate [64-84] [COLLATE 'unicode:ci'] - StringLiteral('unicode:ci') [72-84] ['unicode:ci'] + StringLiteral [72-84] ['unicode:ci'] + StringLiteralComponent('unicode:ci') [72-84] ['unicode:ci'] -- ALTER TABLE foo ALTER COLUMN IF EXISTS bar SET DATA TYPE STRING COLLATE 'unicode:ci' == @@ -335,7 +341,8 @@ AlterTableStatement [0-82] [ALTER TABLE...unicode:ci'>] PathExpression [54-60] [STRING] Identifier(STRING) [54-60] [STRING] Collate [61-81] [COLLATE 'unicode:ci'] - StringLiteral('unicode:ci') [69-81] ['unicode:ci'] + StringLiteral [69-81] ['unicode:ci'] + StringLiteralComponent('unicode:ci') [69-81] ['unicode:ci'] -- ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE ARRAY< STRING COLLATE 'unicode:ci' > -- @@ -352,7 +359,8 @@ AlterTableStatement [0-91] [ALTER TABLE...unicode:ci'>] PathExpression [63-69] [STRING] Identifier(STRING) [63-69] [STRING] Collate [70-90] [COLLATE 'unicode:ci'] - StringLiteral('unicode:ci') [78-90] ['unicode:ci'] + StringLiteral [78-90] ['unicode:ci'] + StringLiteralComponent('unicode:ci') [78-90] ['unicode:ci'] -- ALTER TABLE foo ALTER COLUMN IF EXISTS bar SET DATA TYPE ARRAY< STRING COLLATE 'unicode:ci' > == @@ -375,7 +383,8 @@ AlterTableStatement [0-96] [ALTER TABLE...b NUMERIC>] PathExpression [57-63] [STRING] Identifier(STRING) [57-63] [STRING] Collate [64-84] [COLLATE 'unicode:ci'] - StringLiteral('unicode:ci') [72-84] ['unicode:ci'] + StringLiteral [72-84] ['unicode:ci'] + StringLiteralComponent('unicode:ci') [72-84] ['unicode:ci'] StructColumnField [86-95] [b NUMERIC] Identifier(b) [86-87] [b] SimpleColumnSchema [88-95] [NUMERIC] @@ -399,7 +408,8 @@ AlterTableStatement 
[0-105] [ALTER TABLE...b NUMERIC>] PathExpression [66-72] [STRING] Identifier(STRING) [66-72] [STRING] Collate [73-93] [COLLATE 'unicode:ci'] - StringLiteral('unicode:ci') [81-93] ['unicode:ci'] + StringLiteral [81-93] ['unicode:ci'] + StringLiteralComponent('unicode:ci') [81-93] ['unicode:ci'] StructColumnField [95-104] [b NUMERIC] Identifier(b) [95-96] [b] SimpleColumnSchema [97-104] [NUMERIC] diff --git a/zetasql/parser/testdata/alter_row_access_policy.test b/zetasql/parser/testdata/alter_row_access_policy.test index 081e1130b..22e21f195 100644 --- a/zetasql/parser/testdata/alter_row_access_policy.test +++ b/zetasql/parser/testdata/alter_row_access_policy.test @@ -28,7 +28,8 @@ AlterRowAccessPolicyStatement [0-60] AlterActionList [33-60] GrantToClause [33-60] GranteeList [43-59] - StringLiteral('foo@google.com') [43-59] + StringLiteral [43-59] + StringLiteralComponent('foo@google.com') [43-59] -- ALTER ROW ACCESS POLICY p1 ON t1 GRANT TO ('foo@google.com') == @@ -60,7 +61,8 @@ AlterRowAccessPolicyStatement [0-63] AlterActionList [33-63] RevokeFromClause [33-63] GranteeList [46-62] - StringLiteral('bar@google.com') [46-62] + StringLiteral [46-62] + StringLiteralComponent('bar@google.com') [46-62] -- ALTER ROW ACCESS POLICY p1 ON t1 REVOKE FROM ('bar@google.com') == @@ -94,15 +96,18 @@ AlterRowAccessPolicyStatement [0-132] Identifier(p2) [43-45] GrantToClause [47-74] GranteeList [57-73] - StringLiteral("foo@google.com") [57-73] + StringLiteral [57-73] + StringLiteralComponent("foo@google.com") [57-73] RevokeFromClause [76-106] GranteeList [89-105] - StringLiteral("bar@google.com") [89-105] + StringLiteral [89-105] + StringLiteralComponent("bar@google.com") [89-105] FilterUsingClause [108-132] BinaryExpression(=) [122-131] PathExpression [122-123] Identifier(c) [122-123] - StringLiteral("foo") [126-131] + StringLiteral [126-131] + StringLiteralComponent("foo") [126-131] -- ALTER ROW ACCESS POLICY p1 ON t1 RENAME TO p2, GRANT TO ("foo@google.com"), REVOKE FROM 
("bar@google.com"), FILTER USING (c = "foo") @@ -119,13 +124,15 @@ AlterRowAccessPolicyStatement [0-117] Identifier(p2) [43-45] GrantToClause [47-74] GranteeList [57-73] - StringLiteral("foo@google.com") [57-73] + StringLiteral [57-73] + StringLiteralComponent("foo@google.com") [57-73] RevokeFromClause(is_revoke_from_all) [76-91] FilterUsingClause [93-117] BinaryExpression(=) [107-116] PathExpression [107-108] Identifier(c) [107-108] - StringLiteral("foo") [111-116] + StringLiteral [111-116] + StringLiteralComponent("foo") [111-116] -- ALTER ROW ACCESS POLICY p1 ON t1 RENAME TO p2, GRANT TO ("foo@google.com"), REVOKE FROM ALL, FILTER USING (c = "foo") @@ -142,12 +149,14 @@ AlterRowAccessPolicyStatement [0-101] Identifier(p2) [43-45] GrantToClause [47-74] GranteeList [57-73] - StringLiteral("foo@google.com") [57-73] + StringLiteral [57-73] + StringLiteralComponent("foo@google.com") [57-73] FilterUsingClause [77-101] BinaryExpression(=) [91-100] PathExpression [91-92] Identifier(c) [91-92] - StringLiteral("foo") [95-100] + StringLiteral [95-100] + StringLiteralComponent("foo") [95-100] -- ALTER ROW ACCESS POLICY p1 ON t1 RENAME TO p2, GRANT TO ("foo@google.com"), FILTER USING (c = "foo") -- @@ -163,12 +172,14 @@ AlterRowAccessPolicyStatement [0-104] Identifier(p2) [43-45] RevokeFromClause [48-78] GranteeList [61-77] - StringLiteral("bar@google.com") [61-77] + StringLiteral [61-77] + StringLiteralComponent("bar@google.com") [61-77] FilterUsingClause [80-104] BinaryExpression(=) [94-103] PathExpression [94-95] Identifier(c) [94-95] - StringLiteral("foo") [98-103] + StringLiteral [98-103] + StringLiteralComponent("foo") [98-103] -- ALTER ROW ACCESS POLICY p1 ON t1 RENAME TO p2, REVOKE FROM ("bar@google.com"), FILTER USING (c = "foo") -- @@ -187,7 +198,8 @@ AlterRowAccessPolicyStatement [0-89] BinaryExpression(=) [79-88] PathExpression [79-80] Identifier(c) [79-80] - StringLiteral("foo") [83-88] + StringLiteral [83-88] + StringLiteralComponent("foo") [83-88] -- ALTER ROW 
ACCESS POLICY p1 ON t1 RENAME TO p2, REVOKE FROM ALL, FILTER USING (c = "foo") -- @@ -205,7 +217,8 @@ AlterRowAccessPolicyStatement [0-73] BinaryExpression(=) [63-72] PathExpression [63-64] Identifier(c) [63-64] - StringLiteral("foo") [67-72] + StringLiteral [67-72] + StringLiteralComponent("foo") [67-72] -- ALTER ROW ACCESS POLICY p1 ON t1 RENAME TO p2, FILTER USING (c = "foo") -- @@ -218,15 +231,18 @@ AlterRowAccessPolicyStatement [0-119] AlterActionList [34-119] GrantToClause [34-61] GranteeList [44-60] - StringLiteral("foo@google.com") [44-60] + StringLiteral [44-60] + StringLiteralComponent("foo@google.com") [44-60] RevokeFromClause [63-93] GranteeList [76-92] - StringLiteral("bar@google.com") [76-92] + StringLiteral [76-92] + StringLiteralComponent("bar@google.com") [76-92] FilterUsingClause [95-119] BinaryExpression(=) [109-118] PathExpression [109-110] Identifier(c) [109-110] - StringLiteral("foo") [113-118] + StringLiteral [113-118] + StringLiteralComponent("foo") [113-118] -- ALTER ROW ACCESS POLICY p1 ON t1 GRANT TO ("foo@google.com"), REVOKE FROM ("bar@google.com"), FILTER USING (c = "foo") @@ -240,13 +256,15 @@ AlterRowAccessPolicyStatement [0-104] AlterActionList [34-104] GrantToClause [34-61] GranteeList [44-60] - StringLiteral("foo@google.com") [44-60] + StringLiteral [44-60] + StringLiteralComponent("foo@google.com") [44-60] RevokeFromClause(is_revoke_from_all) [63-78] FilterUsingClause [80-104] BinaryExpression(=) [94-103] PathExpression [94-95] Identifier(c) [94-95] - StringLiteral("foo") [98-103] + StringLiteral [98-103] + StringLiteralComponent("foo") [98-103] -- ALTER ROW ACCESS POLICY p1 ON t1 GRANT TO ("foo@google.com"), REVOKE FROM ALL, FILTER USING (c = "foo") -- @@ -259,12 +277,14 @@ AlterRowAccessPolicyStatement [0-88] AlterActionList [34-88] GrantToClause [34-61] GranteeList [44-60] - StringLiteral("foo@google.com") [44-60] + StringLiteral [44-60] + StringLiteralComponent("foo@google.com") [44-60] FilterUsingClause [64-88] 
BinaryExpression(=) [78-87] PathExpression [78-79] Identifier(c) [78-79] - StringLiteral("foo") [82-87] + StringLiteral [82-87] + StringLiteralComponent("foo") [82-87] -- ALTER ROW ACCESS POLICY p1 ON t1 GRANT TO ("foo@google.com"), FILTER USING (c = "foo") -- @@ -277,12 +297,14 @@ AlterRowAccessPolicyStatement [0-91] AlterActionList [35-91] RevokeFromClause [35-65] GranteeList [48-64] - StringLiteral("bar@google.com") [48-64] + StringLiteral [48-64] + StringLiteralComponent("bar@google.com") [48-64] FilterUsingClause [67-91] BinaryExpression(=) [81-90] PathExpression [81-82] Identifier(c) [81-82] - StringLiteral("foo") [85-90] + StringLiteral [85-90] + StringLiteralComponent("foo") [85-90] -- ALTER ROW ACCESS POLICY p1 ON t1 REVOKE FROM ("bar@google.com"), FILTER USING (c = "foo") -- @@ -298,7 +320,8 @@ AlterRowAccessPolicyStatement [0-76] BinaryExpression(=) [66-75] PathExpression [66-67] Identifier(c) [66-67] - StringLiteral("foo") [70-75] + StringLiteral [70-75] + StringLiteralComponent("foo") [70-75] -- ALTER ROW ACCESS POLICY p1 ON t1 REVOKE FROM ALL, FILTER USING (c = "foo") -- @@ -313,7 +336,8 @@ AlterRowAccessPolicyStatement [0-60] BinaryExpression(=) [50-59] PathExpression [50-51] Identifier(c) [50-51] - StringLiteral("foo") [54-59] + StringLiteral [54-59] + StringLiteralComponent("foo") [54-59] -- ALTER ROW ACCESS POLICY p1 ON t1 FILTER USING (c = "foo") == @@ -332,7 +356,8 @@ AlterRowAccessPolicyStatement [0-132] AlterActionList [33-132] RevokeFromClause [33-63] GranteeList [46-62] - StringLiteral("bar@google.com") [46-62] + StringLiteral [46-62] + StringLiteralComponent("bar@google.com") [46-62] RenameToClause [65-77] PathExpression [75-77] Identifier(p2) [75-77] @@ -340,10 +365,12 @@ AlterRowAccessPolicyStatement [0-132] BinaryExpression(=) [93-102] PathExpression [93-94] Identifier(c) [93-94] - StringLiteral("foo") [97-102] + StringLiteral [97-102] + StringLiteralComponent("foo") [97-102] GrantToClause [105-132] GranteeList [115-131] - 
StringLiteral("foo@google.com") [115-131] + StringLiteral [115-131] + StringLiteralComponent("foo@google.com") [115-131] -- ALTER ROW ACCESS POLICY p1 ON t1 REVOKE FROM ("bar@google.com"), RENAME TO p2, FILTER USING (c = "foo"), GRANT TO ("foo@google.com") @@ -357,13 +384,15 @@ AlterRowAccessPolicyStatement [0-107] AlterActionList [33-107] RevokeFromClause [33-63] GranteeList [46-62] - StringLiteral("bar@google.com") [46-62] + StringLiteral [46-62] + StringLiteralComponent("bar@google.com") [46-62] RenameToClause [65-77] PathExpression [75-77] Identifier(p2) [75-77] GrantToClause [80-107] GranteeList [90-106] - StringLiteral("foo@google.com") [90-106] + StringLiteral [90-106] + StringLiteralComponent("foo@google.com") [90-106] -- ALTER ROW ACCESS POLICY p1 ON t1 REVOKE FROM ("bar@google.com"), RENAME TO p2, GRANT TO ("foo@google.com") -- @@ -376,15 +405,18 @@ AlterRowAccessPolicyStatement [0-119] AlterActionList [33-119] RevokeFromClause [33-63] GranteeList [46-62] - StringLiteral("bar@google.com") [46-62] + StringLiteral [46-62] + StringLiteralComponent("bar@google.com") [46-62] FilterUsingClause [66-90] BinaryExpression(=) [80-89] PathExpression [80-81] Identifier(c) [80-81] - StringLiteral("foo") [84-89] + StringLiteral [84-89] + StringLiteralComponent("foo") [84-89] GrantToClause [92-119] GranteeList [102-118] - StringLiteral("foo@google.com") [102-118] + StringLiteral [102-118] + StringLiteralComponent("foo@google.com") [102-118] -- ALTER ROW ACCESS POLICY p1 ON t1 REVOKE FROM ("bar@google.com"), FILTER USING (c = "foo"), GRANT TO ("foo@google.com") -- @@ -397,10 +429,12 @@ AlterRowAccessPolicyStatement [0-94] AlterActionList [33-94] RevokeFromClause [33-63] GranteeList [46-62] - StringLiteral("bar@google.com") [46-62] + StringLiteral [46-62] + StringLiteralComponent("bar@google.com") [46-62] GrantToClause [67-94] GranteeList [77-93] - StringLiteral("foo@google.com") [77-93] + StringLiteral [77-93] + StringLiteralComponent("foo@google.com") [77-93] -- ALTER 
ROW ACCESS POLICY p1 ON t1 REVOKE FROM ("bar@google.com"), GRANT TO ("foo@google.com") -- @@ -419,10 +453,12 @@ AlterRowAccessPolicyStatement [0-117] BinaryExpression(=) [78-87] PathExpression [78-79] Identifier(c) [78-79] - StringLiteral("foo") [82-87] + StringLiteral [82-87] + StringLiteralComponent("foo") [82-87] GrantToClause [90-117] GranteeList [100-116] - StringLiteral("foo@google.com") [100-116] + StringLiteral [100-116] + StringLiteralComponent("foo@google.com") [100-116] -- ALTER ROW ACCESS POLICY p1 ON t1 REVOKE FROM ALL, RENAME TO p2, FILTER USING (c = "foo"), GRANT TO ("foo@google.com") -- @@ -439,7 +475,8 @@ AlterRowAccessPolicyStatement [0-92] Identifier(p2) [60-62] GrantToClause [65-92] GranteeList [75-91] - StringLiteral("foo@google.com") [75-91] + StringLiteral [75-91] + StringLiteralComponent("foo@google.com") [75-91] -- ALTER ROW ACCESS POLICY p1 ON t1 REVOKE FROM ALL, RENAME TO p2, GRANT TO ("foo@google.com") -- @@ -455,10 +492,12 @@ AlterRowAccessPolicyStatement [0-104] BinaryExpression(=) [65-74] PathExpression [65-66] Identifier(c) [65-66] - StringLiteral("foo") [69-74] + StringLiteral [69-74] + StringLiteralComponent("foo") [69-74] GrantToClause [77-104] GranteeList [87-103] - StringLiteral("foo@google.com") [87-103] + StringLiteral [87-103] + StringLiteralComponent("foo@google.com") [87-103] -- ALTER ROW ACCESS POLICY p1 ON t1 REVOKE FROM ALL, FILTER USING (c = "foo"), GRANT TO ("foo@google.com") -- @@ -472,7 +511,8 @@ AlterRowAccessPolicyStatement [0-79] RevokeFromClause(is_revoke_from_all) [33-48] GrantToClause [52-79] GranteeList [62-78] - StringLiteral("foo@google.com") [62-78] + StringLiteral [62-78] + StringLiteralComponent("foo@google.com") [62-78] -- ALTER ROW ACCESS POLICY p1 ON t1 REVOKE FROM ALL, GRANT TO ("foo@google.com") -- @@ -490,10 +530,12 @@ AlterRowAccessPolicyStatement [0-101] BinaryExpression(=) [62-71] PathExpression [62-63] Identifier(c) [62-63] - StringLiteral("foo") [66-71] + StringLiteral [66-71] + 
StringLiteralComponent("foo") [66-71] GrantToClause [74-101] GranteeList [84-100] - StringLiteral("foo@google.com") [84-100] + StringLiteral [84-100] + StringLiteralComponent("foo@google.com") [84-100] -- ALTER ROW ACCESS POLICY p1 ON t1 RENAME TO p2, FILTER USING (c = "foo"), GRANT TO ("foo@google.com") -- @@ -509,7 +551,8 @@ AlterRowAccessPolicyStatement [0-76] Identifier(p2) [44-46] GrantToClause [49-76] GranteeList [59-75] - StringLiteral("foo@google.com") [59-75] + StringLiteral [59-75] + StringLiteralComponent("foo@google.com") [59-75] -- ALTER ROW ACCESS POLICY p1 ON t1 RENAME TO p2, GRANT TO ("foo@google.com") -- @@ -524,10 +567,12 @@ AlterRowAccessPolicyStatement [0-88] BinaryExpression(=) [49-58] PathExpression [49-50] Identifier(c) [49-50] - StringLiteral("foo") [53-58] + StringLiteral [53-58] + StringLiteralComponent("foo") [53-58] GrantToClause [61-88] GranteeList [71-87] - StringLiteral("foo@google.com") [71-87] + StringLiteral [71-87] + StringLiteralComponent("foo@google.com") [71-87] -- ALTER ROW ACCESS POLICY p1 ON t1 FILTER USING (c = "foo"), GRANT TO ("foo@google.com") -- @@ -540,7 +585,8 @@ AlterRowAccessPolicyStatement [0-63] AlterActionList [36-63] GrantToClause [36-63] GranteeList [46-62] - StringLiteral("foo@google.com") [46-62] + StringLiteral [46-62] + StringLiteralComponent("foo@google.com") [46-62] -- ALTER ROW ACCESS POLICY p1 ON t1 GRANT TO ("foo@google.com") == @@ -685,7 +731,8 @@ AlterRowAccessPolicyStatement [0-84] AlterActionList [34-84] GrantToClause [34-61] GranteeList [44-60] - StringLiteral("foo@google.com") [44-60] + StringLiteral [44-60] + StringLiteralComponent("foo@google.com") [44-60] FilterUsingClause [63-84] BinaryExpression(=) [77-83] PathExpression [77-79] @@ -826,8 +873,10 @@ AlterRowAccessPolicyStatement [0-79] AlterActionList [36-79] GrantToClause [36-79] GranteeList [46-78] - StringLiteral('foo@google.com') [46-62] - StringLiteral('mdbgroup/bar') [64-78] + StringLiteral [46-62] + 
StringLiteralComponent('foo@google.com') [46-62] + StringLiteral [64-78] + StringLiteralComponent('mdbgroup/bar') [64-78] -- ALTER ROW ACCESS POLICY p1 ON n1.t1 GRANT TO ('foo@google.com', 'mdbgroup/bar') == @@ -874,8 +923,10 @@ AlterRowAccessPolicyStatement [0-75] AlterActionList [33-75] GrantToClause [33-75] GranteeList [43-74] - StringLiteral('foo@google.com') [43-59] - StringLiteral("mdbuser/bar") [61-74] + StringLiteral [43-59] + StringLiteralComponent('foo@google.com') [43-59] + StringLiteral [61-74] + StringLiteralComponent("mdbuser/bar") [61-74] -- ALTER ROW ACCESS POLICY p1 ON t1 GRANT TO ('foo@google.com', "mdbuser/bar") == @@ -980,7 +1031,8 @@ AlterRowAccessPolicyStatement [0-57] BinaryExpression(=) [46-56] PathExpression [46-48] Identifier(c1) [46-48] - StringLiteral('foo') [51-56] + StringLiteral [51-56] + StringLiteralComponent('foo') [51-56] -- ALTER ROW ACCESS POLICY p1 ON t1 FILTER USING (c1 = 'foo') == @@ -1011,8 +1063,10 @@ AlterRowAccessPolicyStatement [0-78] AlterActionList [33-78] RevokeFromClause [33-78] GranteeList [46-77] - StringLiteral('foo@google.com') [46-62] - StringLiteral("mdbuser/bar") [64-77] + StringLiteral [46-62] + StringLiteralComponent('foo@google.com') [46-62] + StringLiteral [64-77] + StringLiteralComponent("mdbuser/bar") [64-77] -- ALTER ROW ACCESS POLICY p1 ON t1 REVOKE FROM ('foo@google.com', "mdbuser/bar") == @@ -1194,8 +1248,10 @@ AlterAllRowAccessPoliciesStatement [0-84] Identifier(t1) [33-35] RevokeFromClause [36-84] GranteeList [49-83] - StringLiteral("foo@google.com") [49-65] - StringLiteral("bar@google.com") [67-83] + StringLiteral [49-65] + StringLiteralComponent("foo@google.com") [49-65] + StringLiteral [67-83] + StringLiteralComponent("bar@google.com") [67-83] -- ALTER ALL ROW ACCESS POLICIES ON t1 REVOKE FROM ("foo@google.com", "bar@google.com") == @@ -1222,7 +1278,8 @@ AlterAllRowAccessPoliciesStatement [0-82] Identifier(t1) [49-51] RevokeFromClause [52-82] GranteeList [65-81] - StringLiteral("foo@google.com") 
[65-81] + StringLiteral [65-81] + StringLiteralComponent("foo@google.com") [65-81] -- ALTER ALL ROW ACCESS POLICIES ON namespace.`all`.t1 REVOKE FROM ("foo@google.com") == diff --git a/zetasql/parser/testdata/alter_set_options.test b/zetasql/parser/testdata/alter_set_options.test index 72dce2800..745069cdd 100644 --- a/zetasql/parser/testdata/alter_set_options.test +++ b/zetasql/parser/testdata/alter_set_options.test @@ -2,7 +2,7 @@ # verified as correct. [default no_show_parse_location_text] -ALTER {{DATABASE|SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL|APPROX VIEW}} foo set OPTIONS +ALTER {{DATABASE|SCHEMA|EXTERNAL SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL|APPROX VIEW}} foo set OPTIONS -- ALTERNATION GROUP: DATABASE -- @@ -16,6 +16,12 @@ ERROR: Syntax error: Expected "(" but got end of statement [at 1:29] ALTER SCHEMA foo set OPTIONS ^ -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +ERROR: Syntax error: Expected "(" but got end of statement [at 1:38] +ALTER EXTERNAL SCHEMA foo set OPTIONS + ^ +-- ALTERNATION GROUP: TABLE -- ERROR: Syntax error: Expected "(" but got end of statement [at 1:28] @@ -48,7 +54,7 @@ ALTER APPROX VIEW foo set OPTIONS ^ == -ALTER {{DATABASE|SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL|APPROX VIEW}} foo set timestamp; +ALTER {{DATABASE|SCHEMA|EXTERNAL SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL|APPROX VIEW}} foo set timestamp; -- ALTERNATION GROUP: DATABASE -- @@ -63,6 +69,12 @@ ERROR: Syntax error: Expected keyword AS or keyword DEFAULT or keyword ON or key ALTER SCHEMA foo set timestamp; ^ -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +ERROR: Syntax error: Expected keyword AS or keyword DEFAULT or keyword ON or keyword OPTIONS but got keyword TIMESTAMP [at 1:31] +ALTER EXTERNAL SCHEMA foo set timestamp; + ^ +-- ALTERNATION GROUP: TABLE -- ERROR: Syntax error: Expected keyword AS or keyword DEFAULT or keyword ON or keyword OPTIONS but got keyword TIMESTAMP [at 1:21] @@ -94,7 +106,7 @@ ALTER APPROX VIEW foo set timestamp; ^ == -ALTER 
{{DATABASE|SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL|APPROX VIEW}} foo drop; +ALTER {{DATABASE|SCHEMA|EXTERNAL SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL|APPROX VIEW}} foo drop; -- ALTERNATION GROUP: DATABASE -- @@ -108,6 +120,12 @@ ERROR: Syntax error: Unexpected ";" [at 1:22] ALTER SCHEMA foo drop; ^ -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +ERROR: Syntax error: Unexpected ";" [at 1:31] +ALTER EXTERNAL SCHEMA foo drop; + ^ +-- ALTERNATION GROUP: TABLE -- ERROR: Syntax error: Unexpected ";" [at 1:21] @@ -139,7 +157,7 @@ ALTER APPROX VIEW foo drop; ^ == -ALTER {{AGGREGATE FUNCTION|CONSTANT|DATABASE|EXTERNAL TABLE|FUNCTION|INDEX|MATERIALIZED VIEW|MODEL|PROCEDURE|SCHEMA|TABLE FUNCTION|APPROX VIEW}}; +ALTER {{AGGREGATE FUNCTION|CONSTANT|DATABASE|EXTERNAL TABLE|FUNCTION|INDEX|MATERIALIZED VIEW|MODEL|PROCEDURE|SCHEMA|EXTERNAL SCHEMA|TABLE FUNCTION|APPROX VIEW}}; -- ALTERNATION GROUP: AGGREGATE FUNCTION -- @@ -201,6 +219,12 @@ ERROR: Syntax error: Unexpected ";" [at 1:13] ALTER SCHEMA; ^ -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +ERROR: Syntax error: Unexpected ";" [at 1:22] +ALTER EXTERNAL SCHEMA; + ^ +-- ALTERNATION GROUP: TABLE FUNCTION -- ERROR: Syntax error: Unexpected ";" [at 1:21] @@ -221,7 +245,7 @@ ALTER ; ^ == -ALTER {{Aggregate FUNCTION|CONSTANT|DATABASE|EXTERNAL TABLE|FUNCTION|INDEX|MATERIALIZED VIEW|MODEL|SCHEMA|TABLE|TABLE FUNCTION|APPROX VIEW}} adsdw.foo +ALTER {{Aggregate FUNCTION|CONSTANT|DATABASE|EXTERNAL TABLE|FUNCTION|INDEX|MATERIALIZED VIEW|MODEL|SCHEMA|EXTERNAL SCHEMA|TABLE|TABLE FUNCTION|APPROX VIEW}} adsdw.foo SET OPTIONS (); -- ALTERNATION GROUP: Aggregate FUNCTION @@ -302,6 +326,18 @@ AlterSchemaStatement [0-37] -- ALTER SCHEMA adsdw.foo SET OPTIONS() -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +AlterExternalSchemaStatement [0-46] + PathExpression [22-31] + Identifier(adsdw) [22-27] + Identifier(foo) [28-31] + AlterActionList [32-46] + SetOptionsAction [32-46] + OptionsList [44-46] +-- +ALTER EXTERNAL SCHEMA adsdw.foo SET OPTIONS() +-- 
ALTERNATION GROUP: TABLE -- AlterTableStatement [0-36] @@ -333,7 +369,7 @@ AlterApproxViewStatement [0-42] ALTER APPROX VIEW adsdw.foo SET OPTIONS() == -ALTER {{DATABASE|SCHEMA|TABLE|VIEW|MATERIALIZED VIEW}} adsdw.foo SET OPTIONS (), SET OPTIONS (), SET OPTIONS (); +ALTER {{DATABASE|SCHEMA|EXTERNAL SCHEMA|TABLE|VIEW|MATERIALIZED VIEW}} adsdw.foo SET OPTIONS (), SET OPTIONS (), SET OPTIONS (); -- ALTERNATION GROUP: DATABASE -- @@ -369,6 +405,23 @@ AlterSchemaStatement [0-69] -- ALTER SCHEMA adsdw.foo SET OPTIONS(), SET OPTIONS(), SET OPTIONS() -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +AlterExternalSchemaStatement [0-78] + PathExpression [22-31] + Identifier(adsdw) [22-27] + Identifier(foo) [28-31] + AlterActionList [32-78] + SetOptionsAction [32-46] + OptionsList [44-46] + SetOptionsAction [48-62] + OptionsList [60-62] + SetOptionsAction [64-78] + OptionsList [76-78] + +-- +ALTER EXTERNAL SCHEMA adsdw.foo SET OPTIONS(), SET OPTIONS(), SET OPTIONS() +-- ALTERNATION GROUP: TABLE -- AlterTableStatement [0-68] @@ -419,7 +472,7 @@ AlterMaterializedViewStatement [0-80] ALTER MATERIALIZED VIEW adsdw.foo SET OPTIONS(), SET OPTIONS(), SET OPTIONS() == -ALTER {{DATABASE|SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|APPROX VIEW}} IF EXISTS adsdw.foo SET OPTIONS (); +ALTER {{DATABASE|SCHEMA|EXTERNAL SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|APPROX VIEW}} IF EXISTS adsdw.foo SET OPTIONS (); -- ALTERNATION GROUP: DATABASE -- @@ -445,6 +498,18 @@ AlterSchemaStatement(is_if_exists) [0-47] -- ALTER SCHEMA IF EXISTS adsdw.foo SET OPTIONS() -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +AlterExternalSchemaStatement(is_if_exists) [0-56] + PathExpression [32-41] + Identifier(adsdw) [32-37] + Identifier(foo) [38-41] + AlterActionList [42-56] + SetOptionsAction [42-56] + OptionsList [54-56] +-- +ALTER EXTERNAL SCHEMA IF EXISTS adsdw.foo SET OPTIONS() +-- ALTERNATION GROUP: TABLE -- AlterTableStatement(is_if_exists) [0-46] @@ -494,7 +559,7 @@ AlterApproxViewStatement(is_if_exists) [0-52] ALTER APPROX VIEW 
IF EXISTS adsdw.foo SET OPTIONS() == -ALTER {{DATABASE|SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL|APPROX VIEW}} adsdw.foo SET OPTIONS ( +ALTER {{DATABASE|SCHEMA|EXTERNAL SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL|APPROX VIEW}} adsdw.foo SET OPTIONS ( quota_accounting_owner='adsdw-etl@google.com'); -- ALTERNATION GROUP: DATABASE @@ -508,7 +573,8 @@ AlterDatabaseStatement [0-87] OptionsList [37-87] OptionsEntry [41-86] Identifier(quota_accounting_owner) [41-63] - StringLiteral('adsdw-etl@google.com') [64-86] + StringLiteral [64-86] + StringLiteralComponent('adsdw-etl@google.com') [64-86] -- ALTER DATABASE adsdw.foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com') @@ -524,11 +590,29 @@ AlterSchemaStatement [0-85] OptionsList [35-85] OptionsEntry [39-84] Identifier(quota_accounting_owner) [39-61] - StringLiteral('adsdw-etl@google.com') [62-84] + StringLiteral [62-84] + StringLiteralComponent('adsdw-etl@google.com') [62-84] -- ALTER SCHEMA adsdw.foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com') -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +AlterExternalSchemaStatement [0-94] + PathExpression [22-31] + Identifier(adsdw) [22-27] + Identifier(foo) [28-31] + AlterActionList [32-94] + SetOptionsAction [32-94] + OptionsList [44-94] + OptionsEntry [48-93] + Identifier(quota_accounting_owner) [48-70] + StringLiteral [71-93] + StringLiteralComponent('adsdw-etl@google.com') [71-93] + +-- +ALTER EXTERNAL SCHEMA adsdw.foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com') +-- ALTERNATION GROUP: TABLE -- AlterTableStatement [0-84] @@ -540,7 +624,8 @@ AlterTableStatement [0-84] OptionsList [34-84] OptionsEntry [38-83] Identifier(quota_accounting_owner) [38-60] - StringLiteral('adsdw-etl@google.com') [61-83] + StringLiteral [61-83] + StringLiteralComponent('adsdw-etl@google.com') [61-83] -- ALTER TABLE adsdw.foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com') @@ -556,7 +641,8 @@ AlterViewStatement [0-83] OptionsList [33-83] OptionsEntry 
[37-82] Identifier(quota_accounting_owner) [37-59] - StringLiteral('adsdw-etl@google.com') [60-82] + StringLiteral [60-82] + StringLiteralComponent('adsdw-etl@google.com') [60-82] -- ALTER VIEW adsdw.foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com') @@ -572,7 +658,8 @@ AlterMaterializedViewStatement [0-96] OptionsList [46-96] OptionsEntry [50-95] Identifier(quota_accounting_owner) [50-72] - StringLiteral('adsdw-etl@google.com') [73-95] + StringLiteral [73-95] + StringLiteralComponent('adsdw-etl@google.com') [73-95] -- ALTER MATERIALIZED VIEW adsdw.foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com') -- @@ -587,7 +674,8 @@ AlterModelStatement [0-84] OptionsList [34-84] OptionsEntry [38-83] Identifier(quota_accounting_owner) [38-60] - StringLiteral('adsdw-etl@google.com') [61-83] + StringLiteral [61-83] + StringLiteralComponent('adsdw-etl@google.com') [61-83] -- ALTER MODEL adsdw.foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com') -- @@ -602,12 +690,13 @@ AlterApproxViewStatement [0-90] OptionsList [40-90] OptionsEntry [44-89] Identifier(quota_accounting_owner) [44-66] - StringLiteral('adsdw-etl@google.com') [67-89] + StringLiteral [67-89] + StringLiteralComponent('adsdw-etl@google.com') [67-89] -- ALTER APPROX VIEW adsdw.foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com') == -ALTER {{DATABASE|SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL|APPROX VIEW}} adsdw.foo SET OPTIONS (ttl_seconds=3600); +ALTER {{DATABASE|SCHEMA|EXTERNAL SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL|APPROX VIEW}} adsdw.foo SET OPTIONS (ttl_seconds=3600); -- ALTERNATION GROUP: DATABASE -- @@ -639,6 +728,21 @@ AlterSchemaStatement [0-53] -- ALTER SCHEMA adsdw.foo SET OPTIONS(ttl_seconds = 3600) -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +AlterExternalSchemaStatement [0-62] + PathExpression [22-31] + Identifier(adsdw) [22-27] + Identifier(foo) [28-31] + AlterActionList [32-62] + SetOptionsAction [32-62] + OptionsList [44-62] + OptionsEntry [45-61] + 
Identifier(ttl_seconds) [45-56] + IntLiteral(3600) [57-61] +-- +ALTER EXTERNAL SCHEMA adsdw.foo SET OPTIONS(ttl_seconds = 3600) +-- ALTERNATION GROUP: TABLE -- AlterTableStatement [0-52] @@ -715,7 +819,7 @@ AlterApproxViewStatement [0-58] ALTER APPROX VIEW adsdw.foo SET OPTIONS(ttl_seconds = 3600) == -ALTER {{DATABASE|SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL|APPROX VIEW}} adsdw.foo SET OPTIONS (ttl_seconds=NULL); +ALTER {{DATABASE|SCHEMA|EXTERNAL SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL|APPROX VIEW}} adsdw.foo SET OPTIONS (ttl_seconds=NULL); -- ALTERNATION GROUP: DATABASE -- @@ -747,6 +851,21 @@ AlterSchemaStatement [0-53] -- ALTER SCHEMA adsdw.foo SET OPTIONS(ttl_seconds = NULL) -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +AlterExternalSchemaStatement [0-62] + PathExpression [22-31] + Identifier(adsdw) [22-27] + Identifier(foo) [28-31] + AlterActionList [32-62] + SetOptionsAction [32-62] + OptionsList [44-62] + OptionsEntry [45-61] + Identifier(ttl_seconds) [45-56] + NullLiteral(NULL) [57-61] +-- +ALTER EXTERNAL SCHEMA adsdw.foo SET OPTIONS(ttl_seconds = NULL) +-- ALTERNATION GROUP: TABLE -- AlterTableStatement [0-52] @@ -823,7 +942,7 @@ AlterApproxViewStatement [0-58] ALTER APPROX VIEW adsdw.foo SET OPTIONS(ttl_seconds = NULL) == -ALTER {{DATABASE|SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL|APPROX VIEW}} foo SET OPTIONS ( +ALTER {{DATABASE|SCHEMA|EXTERNAL SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL|APPROX VIEW}} foo SET OPTIONS ( quota_accounting_owner = 'adsdw-etl@google.com', ttl_seconds=3600); -- ALTERNATION GROUP: DATABASE @@ -836,7 +955,8 @@ AlterDatabaseStatement [0-101] OptionsList [31-101] OptionsEntry [35-82] Identifier(quota_accounting_owner) [35-57] - StringLiteral('adsdw-etl@google.com') [60-82] + StringLiteral [60-82] + StringLiteralComponent('adsdw-etl@google.com') [60-82] OptionsEntry [84-100] Identifier(ttl_seconds) [84-95] IntLiteral(3600) [96-100] @@ -853,13 +973,33 @@ AlterSchemaStatement [0-99] OptionsList [29-99] OptionsEntry [33-80] 
Identifier(quota_accounting_owner) [33-55] - StringLiteral('adsdw-etl@google.com') [58-80] + StringLiteral [58-80] + StringLiteralComponent('adsdw-etl@google.com') [58-80] OptionsEntry [82-98] Identifier(ttl_seconds) [82-93] IntLiteral(3600) [94-98] -- ALTER SCHEMA foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com', ttl_seconds = 3600) -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +AlterExternalSchemaStatement [0-108] + PathExpression [22-25] + Identifier(foo) [22-25] + AlterActionList [26-108] + SetOptionsAction [26-108] + OptionsList [38-108] + OptionsEntry [42-89] + Identifier(quota_accounting_owner) [42-64] + StringLiteral [67-89] + StringLiteralComponent('adsdw-etl@google.com') [67-89] + OptionsEntry [91-107] + Identifier(ttl_seconds) [91-102] + IntLiteral(3600) [103-107] +-- +ALTER EXTERNAL SCHEMA foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com', ttl_seconds = + 3600) +-- ALTERNATION GROUP: TABLE -- AlterTableStatement [0-98] @@ -870,7 +1010,8 @@ AlterTableStatement [0-98] OptionsList [28-98] OptionsEntry [32-79] Identifier(quota_accounting_owner) [32-54] - StringLiteral('adsdw-etl@google.com') [57-79] + StringLiteral [57-79] + StringLiteralComponent('adsdw-etl@google.com') [57-79] OptionsEntry [81-97] Identifier(ttl_seconds) [81-92] IntLiteral(3600) [93-97] @@ -887,7 +1028,8 @@ AlterViewStatement [0-97] OptionsList [27-97] OptionsEntry [31-78] Identifier(quota_accounting_owner) [31-53] - StringLiteral('adsdw-etl@google.com') [56-78] + StringLiteral [56-78] + StringLiteralComponent('adsdw-etl@google.com') [56-78] OptionsEntry [80-96] Identifier(ttl_seconds) [80-91] IntLiteral(3600) [92-96] @@ -904,7 +1046,8 @@ AlterMaterializedViewStatement [0-110] OptionsList [40-110] OptionsEntry [44-91] Identifier(quota_accounting_owner) [44-66] - StringLiteral('adsdw-etl@google.com') [69-91] + StringLiteral [69-91] + StringLiteralComponent('adsdw-etl@google.com') [69-91] OptionsEntry [93-109] Identifier(ttl_seconds) [93-104] IntLiteral(3600) 
[105-109] @@ -922,7 +1065,8 @@ AlterModelStatement [0-98] OptionsList [28-98] OptionsEntry [32-79] Identifier(quota_accounting_owner) [32-54] - StringLiteral('adsdw-etl@google.com') [57-79] + StringLiteral [57-79] + StringLiteralComponent('adsdw-etl@google.com') [57-79] OptionsEntry [81-97] Identifier(ttl_seconds) [81-92] IntLiteral(3600) [93-97] @@ -939,7 +1083,8 @@ AlterApproxViewStatement [0-104] OptionsList [34-104] OptionsEntry [38-85] Identifier(quota_accounting_owner) [38-60] - StringLiteral('adsdw-etl@google.com') [63-85] + StringLiteral [63-85] + StringLiteralComponent('adsdw-etl@google.com') [63-85] OptionsEntry [87-103] Identifier(ttl_seconds) [87-98] IntLiteral(3600) [99-103] @@ -947,7 +1092,7 @@ AlterApproxViewStatement [0-104] ALTER APPROX VIEW foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com', ttl_seconds = 3600) == -ALTER {{DATABASE|SCHEMA|TABLE|VIEW|MATERIALIZED VIEW}} foo SET OPTIONS ( +ALTER {{DATABASE|SCHEMA|EXTERNAL SCHEMA|TABLE|VIEW|MATERIALIZED VIEW}} foo SET OPTIONS ( quota_accounting_owner = 'adsdw-etl@google.com', quota_accounting_owner = 'adsdw-etl@google.com'); -- @@ -961,10 +1106,12 @@ AlterDatabaseStatement [0-134] OptionsList [31-134] OptionsEntry [35-82] Identifier(quota_accounting_owner) [35-57] - StringLiteral('adsdw-etl@google.com') [60-82] + StringLiteral [60-82] + StringLiteralComponent('adsdw-etl@google.com') [60-82] OptionsEntry [86-133] Identifier(quota_accounting_owner) [86-108] - StringLiteral('adsdw-etl@google.com') [111-133] + StringLiteral [111-133] + StringLiteralComponent('adsdw-etl@google.com') [111-133] -- ALTER DATABASE foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com', quota_accounting_owner = 'adsdw-etl@google.com') @@ -979,14 +1126,36 @@ AlterSchemaStatement [0-132] OptionsList [29-132] OptionsEntry [33-80] Identifier(quota_accounting_owner) [33-55] - StringLiteral('adsdw-etl@google.com') [58-80] + StringLiteral [58-80] + StringLiteralComponent('adsdw-etl@google.com') [58-80] 
OptionsEntry [84-131] Identifier(quota_accounting_owner) [84-106] - StringLiteral('adsdw-etl@google.com') [109-131] + StringLiteral [109-131] + StringLiteralComponent('adsdw-etl@google.com') [109-131] -- ALTER SCHEMA foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com', quota_accounting_owner = 'adsdw-etl@google.com') -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +AlterExternalSchemaStatement [0-141] + PathExpression [22-25] + Identifier(foo) [22-25] + AlterActionList [26-141] + SetOptionsAction [26-141] + OptionsList [38-141] + OptionsEntry [42-89] + Identifier(quota_accounting_owner) [42-64] + StringLiteral [67-89] + StringLiteralComponent('adsdw-etl@google.com') [67-89] + OptionsEntry [93-140] + Identifier(quota_accounting_owner) [93-115] + StringLiteral [118-140] + StringLiteralComponent('adsdw-etl@google.com') [118-140] +-- +ALTER EXTERNAL SCHEMA foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com', quota_accounting_owner = + 'adsdw-etl@google.com') +-- ALTERNATION GROUP: TABLE -- AlterTableStatement [0-131] @@ -997,10 +1166,12 @@ AlterTableStatement [0-131] OptionsList [28-131] OptionsEntry [32-79] Identifier(quota_accounting_owner) [32-54] - StringLiteral('adsdw-etl@google.com') [57-79] + StringLiteral [57-79] + StringLiteralComponent('adsdw-etl@google.com') [57-79] OptionsEntry [83-130] Identifier(quota_accounting_owner) [83-105] - StringLiteral('adsdw-etl@google.com') [108-130] + StringLiteral [108-130] + StringLiteralComponent('adsdw-etl@google.com') [108-130] -- ALTER TABLE foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com', quota_accounting_owner = 'adsdw-etl@google.com') @@ -1015,10 +1186,12 @@ AlterViewStatement [0-130] OptionsList [27-130] OptionsEntry [31-78] Identifier(quota_accounting_owner) [31-53] - StringLiteral('adsdw-etl@google.com') [56-78] + StringLiteral [56-78] + StringLiteralComponent('adsdw-etl@google.com') [56-78] OptionsEntry [82-129] Identifier(quota_accounting_owner) [82-104] - 
StringLiteral('adsdw-etl@google.com') [107-129] + StringLiteral [107-129] + StringLiteralComponent('adsdw-etl@google.com') [107-129] -- ALTER VIEW foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com', quota_accounting_owner = 'adsdw-etl@google.com') @@ -1033,16 +1206,18 @@ AlterMaterializedViewStatement [0-143] OptionsList [40-143] OptionsEntry [44-91] Identifier(quota_accounting_owner) [44-66] - StringLiteral('adsdw-etl@google.com') [69-91] + StringLiteral [69-91] + StringLiteralComponent('adsdw-etl@google.com') [69-91] OptionsEntry [95-142] Identifier(quota_accounting_owner) [95-117] - StringLiteral('adsdw-etl@google.com') [120-142] + StringLiteral [120-142] + StringLiteralComponent('adsdw-etl@google.com') [120-142] -- ALTER MATERIALIZED VIEW foo SET OPTIONS(quota_accounting_owner = 'adsdw-etl@google.com', quota_accounting_owner = 'adsdw-etl@google.com') == -ALTER {{DATABASE|SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL}} foo +ALTER {{DATABASE|SCHEMA|EXTERNAL SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL}} foo SET OPTIONS (a = 'a', b = 1), SET OPTIONS (c = 'c'), SET OPTIONS (a = 'a', b = 1) @@ -1058,7 +1233,8 @@ AlterDatabaseStatement [0-100] OptionsList [31-47] OptionsEntry [32-39] Identifier(a) [32-33] - StringLiteral('a') [36-39] + StringLiteral [36-39] + StringLiteralComponent('a') [36-39] OptionsEntry [41-46] Identifier(b) [41-42] IntLiteral(1) [45-46] @@ -1066,12 +1242,14 @@ AlterDatabaseStatement [0-100] OptionsList [61-70] OptionsEntry [62-69] Identifier(c) [62-63] - StringLiteral('c') [66-69] + StringLiteral [66-69] + StringLiteralComponent('c') [66-69] SetOptionsAction [72-100] OptionsList [84-100] OptionsEntry [85-92] Identifier(a) [85-86] - StringLiteral('a') [89-92] + StringLiteral [89-92] + StringLiteralComponent('a') [89-92] OptionsEntry [94-99] Identifier(b) [94-95] IntLiteral(1) [98-99] @@ -1088,7 +1266,8 @@ AlterSchemaStatement [0-98] OptionsList [29-45] OptionsEntry [30-37] Identifier(a) [30-31] - StringLiteral('a') [34-37] + StringLiteral 
[34-37] + StringLiteralComponent('a') [34-37] OptionsEntry [39-44] Identifier(b) [39-40] IntLiteral(1) [43-44] @@ -1096,18 +1275,54 @@ AlterSchemaStatement [0-98] OptionsList [59-68] OptionsEntry [60-67] Identifier(c) [60-61] - StringLiteral('c') [64-67] + StringLiteral [64-67] + StringLiteralComponent('c') [64-67] SetOptionsAction [70-98] OptionsList [82-98] OptionsEntry [83-90] Identifier(a) [83-84] - StringLiteral('a') [87-90] + StringLiteral [87-90] + StringLiteralComponent('a') [87-90] OptionsEntry [92-97] Identifier(b) [92-93] IntLiteral(1) [96-97] -- ALTER SCHEMA foo SET OPTIONS(a = 'a', b = 1), SET OPTIONS(c = 'c'), SET OPTIONS(a = 'a', b = 1) -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +AlterExternalSchemaStatement [0-107] + PathExpression [22-25] + Identifier(foo) [22-25] + AlterActionList [26-107] + SetOptionsAction [26-54] + OptionsList [38-54] + OptionsEntry [39-46] + Identifier(a) [39-40] + StringLiteral [43-46] + StringLiteralComponent('a') [43-46] + OptionsEntry [48-53] + Identifier(b) [48-49] + IntLiteral(1) [52-53] + SetOptionsAction [56-77] + OptionsList [68-77] + OptionsEntry [69-76] + Identifier(c) [69-70] + StringLiteral [73-76] + StringLiteralComponent('c') [73-76] + SetOptionsAction [79-107] + OptionsList [91-107] + OptionsEntry [92-99] + Identifier(a) [92-93] + StringLiteral [96-99] + StringLiteralComponent('a') [96-99] + OptionsEntry [101-106] + Identifier(b) [101-102] + IntLiteral(1) [105-106] +-- +ALTER EXTERNAL SCHEMA foo SET OPTIONS(a = 'a', b = 1), SET OPTIONS(c = 'c'), SET OPTIONS(a = 'a', b = + 1) +-- ALTERNATION GROUP: TABLE -- AlterTableStatement [0-97] @@ -1118,7 +1333,8 @@ AlterTableStatement [0-97] OptionsList [28-44] OptionsEntry [29-36] Identifier(a) [29-30] - StringLiteral('a') [33-36] + StringLiteral [33-36] + StringLiteralComponent('a') [33-36] OptionsEntry [38-43] Identifier(b) [38-39] IntLiteral(1) [42-43] @@ -1126,12 +1342,14 @@ AlterTableStatement [0-97] OptionsList [58-67] OptionsEntry [59-66] Identifier(c) [59-60] - 
StringLiteral('c') [63-66] + StringLiteral [63-66] + StringLiteralComponent('c') [63-66] SetOptionsAction [69-97] OptionsList [81-97] OptionsEntry [82-89] Identifier(a) [82-83] - StringLiteral('a') [86-89] + StringLiteral [86-89] + StringLiteralComponent('a') [86-89] OptionsEntry [91-96] Identifier(b) [91-92] IntLiteral(1) [95-96] @@ -1148,7 +1366,8 @@ AlterViewStatement [0-96] OptionsList [27-43] OptionsEntry [28-35] Identifier(a) [28-29] - StringLiteral('a') [32-35] + StringLiteral [32-35] + StringLiteralComponent('a') [32-35] OptionsEntry [37-42] Identifier(b) [37-38] IntLiteral(1) [41-42] @@ -1156,12 +1375,14 @@ AlterViewStatement [0-96] OptionsList [57-66] OptionsEntry [58-65] Identifier(c) [58-59] - StringLiteral('c') [62-65] + StringLiteral [62-65] + StringLiteralComponent('c') [62-65] SetOptionsAction [68-96] OptionsList [80-96] OptionsEntry [81-88] Identifier(a) [81-82] - StringLiteral('a') [85-88] + StringLiteral [85-88] + StringLiteralComponent('a') [85-88] OptionsEntry [90-95] Identifier(b) [90-91] IntLiteral(1) [94-95] @@ -1178,7 +1399,8 @@ AlterMaterializedViewStatement [0-109] OptionsList [40-56] OptionsEntry [41-48] Identifier(a) [41-42] - StringLiteral('a') [45-48] + StringLiteral [45-48] + StringLiteralComponent('a') [45-48] OptionsEntry [50-55] Identifier(b) [50-51] IntLiteral(1) [54-55] @@ -1186,12 +1408,14 @@ AlterMaterializedViewStatement [0-109] OptionsList [70-79] OptionsEntry [71-78] Identifier(c) [71-72] - StringLiteral('c') [75-78] + StringLiteral [75-78] + StringLiteralComponent('c') [75-78] SetOptionsAction [81-109] OptionsList [93-109] OptionsEntry [94-101] Identifier(a) [94-95] - StringLiteral('a') [98-101] + StringLiteral [98-101] + StringLiteralComponent('a') [98-101] OptionsEntry [103-108] Identifier(b) [103-104] IntLiteral(1) [107-108] @@ -1209,7 +1433,8 @@ AlterModelStatement [0-97] OptionsList [28-44] OptionsEntry [29-36] Identifier(a) [29-30] - StringLiteral('a') [33-36] + StringLiteral [33-36] + StringLiteralComponent('a') 
[33-36] OptionsEntry [38-43] Identifier(b) [38-39] IntLiteral(1) [42-43] @@ -1217,12 +1442,14 @@ AlterModelStatement [0-97] OptionsList [58-67] OptionsEntry [59-66] Identifier(c) [59-60] - StringLiteral('c') [63-66] + StringLiteral [63-66] + StringLiteralComponent('c') [63-66] SetOptionsAction [69-97] OptionsList [81-97] OptionsEntry [82-89] Identifier(a) [82-83] - StringLiteral('a') [86-89] + StringLiteral [86-89] + StringLiteralComponent('a') [86-89] OptionsEntry [91-96] Identifier(b) [91-92] IntLiteral(1) [95-96] @@ -1230,7 +1457,7 @@ AlterModelStatement [0-97] ALTER MODEL foo SET OPTIONS(a = 'a', b = 1), SET OPTIONS(c = 'c'), SET OPTIONS(a = 'a', b = 1) == -ALTER {{DATABASE|SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL}} +ALTER {{DATABASE|SCHEMA|EXTERNAL SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL}} -- ALTERNATION GROUP: DATABASE -- @@ -1244,6 +1471,12 @@ ERROR: Syntax error: Unexpected end of statement [at 1:13] ALTER SCHEMA ^ -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +ERROR: Syntax error: Unexpected end of statement [at 1:22] +ALTER EXTERNAL SCHEMA + ^ +-- ALTERNATION GROUP: TABLE -- ERROR: Syntax error: Unexpected end of statement [at 1:12] @@ -1269,7 +1502,7 @@ ALTER MODEL ^ == -ALTER {{DATABASE|SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL}} foo; +ALTER {{DATABASE|SCHEMA|EXTERNAL SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL}} foo; -- ALTERNATION GROUP: DATABASE -- @@ -1283,6 +1516,12 @@ ERROR: Syntax error: Unexpected ";" [at 1:17] ALTER SCHEMA foo; ^ -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +ERROR: Syntax error: Unexpected ";" [at 1:26] +ALTER EXTERNAL SCHEMA foo; + ^ +-- ALTERNATION GROUP: TABLE -- ERROR: Syntax error: Unexpected ";" [at 1:16] @@ -1308,7 +1547,7 @@ ALTER MODEL foo; ^ == -ALTER {{DATABASE|SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL}} foo SET OPTIONS (a = 'a', b = 1), +ALTER {{DATABASE|SCHEMA|EXTERNAL SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL}} foo SET OPTIONS (a = 'a', b = 1), -- ALTERNATION GROUP: DATABASE -- @@ -1322,6 +1561,12 @@ ERROR: Syntax 
error: Unexpected end of statement [at 1:47] ALTER SCHEMA foo SET OPTIONS (a = 'a', b = 1), ^ -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +ERROR: Syntax error: Unexpected end of statement [at 1:56] +ALTER EXTERNAL SCHEMA foo SET OPTIONS (a = 'a', b = 1), + ^ +-- ALTERNATION GROUP: TABLE -- ERROR: Syntax error: Unexpected end of statement [at 1:46] @@ -1347,7 +1592,7 @@ ALTER MODEL foo SET OPTIONS (a = 'a', b = 1), ^ == -ALTER {{DATABASE|SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL}} foo SET OPTIONS (a = 'a', b = 1), SET +ALTER {{DATABASE|SCHEMA|EXTERNAL SCHEMA|TABLE|VIEW|MATERIALIZED VIEW|MODEL}} foo SET OPTIONS (a = 'a', b = 1), SET -- ALTERNATION GROUP: DATABASE -- @@ -1361,6 +1606,12 @@ ERROR: Syntax error: Expected keyword AS or keyword DEFAULT or keyword ON or key ALTER SCHEMA foo SET OPTIONS (a = 'a', b = 1), SET ^ -- +ALTERNATION GROUP: EXTERNAL SCHEMA +-- +ERROR: Syntax error: Expected keyword AS or keyword DEFAULT or keyword ON or keyword OPTIONS but got end of statement [at 1:60] +ALTER EXTERNAL SCHEMA foo SET OPTIONS (a = 'a', b = 1), SET + ^ +-- ALTERNATION GROUP: TABLE -- ERROR: Syntax error: Expected keyword AS or keyword DEFAULT or keyword ON or keyword OPTIONS but got end of statement [at 1:50] diff --git a/zetasql/parser/testdata/alter_table_add_column.test b/zetasql/parser/testdata/alter_table_add_column.test index f80e45e81..59b660771 100644 --- a/zetasql/parser/testdata/alter_table_add_column.test +++ b/zetasql/parser/testdata/alter_table_add_column.test @@ -387,7 +387,8 @@ AlterTableStatement [0-61] PathExpression [34-40] Identifier(STRING) [34-40] Collate [41-61] - StringLiteral('unicode:ci') [49-61] + StringLiteral [49-61] + StringLiteralComponent('unicode:ci') [49-61] -- ALTER TABLE foo ADD COLUMN column STRING COLLATE 'unicode:ci' == @@ -473,7 +474,8 @@ AlterTableStatement [0-62] SimpleColumnSchema [31-62] PathExpression [31-37] Identifier(STRING) [31-37] - StringLiteral("default string") [46-62] + StringLiteral [46-62] + 
StringLiteralComponent("default string") [46-62] -- ALTER TABLE foo ADD COLUMN bar STRING DEFAULT "default string" == diff --git a/zetasql/parser/testdata/analytic_functions.test b/zetasql/parser/testdata/analytic_functions.test index 6214b81ff..69b727eb3 100644 --- a/zetasql/parser/testdata/analytic_functions.test +++ b/zetasql/parser/testdata/analytic_functions.test @@ -1480,7 +1480,8 @@ QueryStatement [0-63] PathExpression [26-27] Identifier(k) [26-27] Collate [28-55] - StringLiteral("latin1_german2_ci") [36-55] + StringLiteral [36-55] + StringLiteralComponent("latin1_german2_ci") [36-55] FromClause [57-63] TablePathExpression [62-63] PathExpression [62-63] diff --git a/zetasql/parser/testdata/arrays.test b/zetasql/parser/testdata/arrays.test index 1337a1685..565554ba4 100644 --- a/zetasql/parser/testdata/arrays.test +++ b/zetasql/parser/testdata/arrays.test @@ -107,12 +107,14 @@ QueryStatement [0-47] Select [0-47] SelectList [7-47] SelectColumn [7-10] - StringLiteral("a") [7-10] + StringLiteral [7-10] + StringLiteralComponent("a") [7-10] SelectColumn [12-13] IntLiteral(3) [12-13] SelectColumn [15-23] ArrayConstructor [15-23] - StringLiteral("a") [16-19] + StringLiteral [16-19] + StringLiteralComponent("a") [16-19] IntLiteral(3) [21-22] SelectColumn [25-47] ArrayConstructor [25-47] @@ -121,7 +123,8 @@ QueryStatement [0-47] PathExpression [31-37] Identifier(string) [31-37] IntLiteral(3) [40-41] - StringLiteral("a") [43-46] + StringLiteral [43-46] + StringLiteralComponent("a") [43-46] -- SELECT "a", @@ -551,7 +554,8 @@ QueryStatement [0-29] PathExpression [14-15] Identifier(a) [14-15] Location [15-16] - StringLiteral('x') [16-19] + StringLiteral [16-19] + StringLiteralComponent('x') [16-19] SelectColumn [22-29] ArrayElement [22-29] PathExpression [22-23] diff --git a/zetasql/parser/testdata/assert.test b/zetasql/parser/testdata/assert.test index 85526952e..176533e3a 100644 --- a/zetasql/parser/testdata/assert.test +++ b/zetasql/parser/testdata/assert.test @@ -53,7 
+53,8 @@ AssertStatement [0-38] SelectColumn [15-16] IntLiteral(1) [15-16] IntLiteral(1) [20-21] - StringLiteral("simple test") [25-38] + StringLiteral [25-38] + StringLiteralComponent("simple test") [25-38] -- ASSERT( SELECT @@ -85,7 +86,8 @@ AssertStatement [0-33] ParameterExpr [7-13] Identifier(param) [8-13] IntLiteral(1) [16-17] - StringLiteral("param test") [21-33] + StringLiteral [21-33] + StringLiteralComponent("param test") [21-33] -- ASSERT @param = 1 AS "param test" == @@ -98,7 +100,8 @@ AssertStatement [0-36] PathExpression [9-15] Identifier(sysvar) [9-15] IntLiteral(1) [18-19] - StringLiteral("sysvar test") [23-36] + StringLiteral [23-36] + StringLiteralComponent("sysvar test") [23-36] -- ASSERT @@sysvar = 1 AS "sysvar test" == @@ -107,11 +110,14 @@ ASSERT "123" IN ("123", "456"); -- AssertStatement [0-30] InExpression(IN) [7-30] - StringLiteral("123") [7-12] + StringLiteral [7-12] + StringLiteralComponent("123") [7-12] Location [13-15] InList [17-29] - StringLiteral("123") [17-22] - StringLiteral("456") [24-29] + StringLiteral [17-22] + StringLiteralComponent("123") [17-22] + StringLiteral [24-29] + StringLiteralComponent("456") [24-29] -- ASSERT "123" IN ("123", "456") == @@ -129,9 +135,12 @@ AssertStatement [0-59] FunctionCall [24-50] PathExpression [24-33] Identifier(ENDS_WITH) [24-33] - StringLiteral("suffix") [34-42] - StringLiteral("fix") [44-49] - StringLiteral("abc") [54-59] + StringLiteral [34-42] + StringLiteralComponent("suffix") [34-42] + StringLiteral [44-49] + StringLiteralComponent("fix") [44-49] + StringLiteral [54-59] + StringLiteralComponent("abc") [54-59] -- ASSERT IS_NAN(NULL) AND ENDS_WITH("suffix", "fix") AS "abc" -- @@ -146,9 +155,12 @@ AssertStatement [0-58] FunctionCall [23-49] PathExpression [23-32] Identifier(ENDS_WITH) [23-32] - StringLiteral("suffix") [33-41] - StringLiteral("fix") [43-48] - StringLiteral("abc") [53-58] + StringLiteral [33-41] + StringLiteralComponent("suffix") [33-41] + StringLiteral [43-48] + 
StringLiteralComponent("fix") [43-48] + StringLiteral [53-58] + StringLiteralComponent("abc") [53-58] -- ASSERT IS_NAN(NULL) OR ENDS_WITH("suffix", "fix") AS "abc" == @@ -157,7 +169,8 @@ ASSERT "123" IS NOT NULL; -- AssertStatement [0-24] BinaryExpression(IS NOT) [7-24] - StringLiteral("123") [7-12] + StringLiteral [7-12] + StringLiteralComponent("123") [7-12] NullLiteral(NULL) [20-24] -- ASSERT "123" IS NOT NULL diff --git a/zetasql/parser/testdata/begin.test b/zetasql/parser/testdata/begin.test index 7dd957e48..d9fac6b59 100644 --- a/zetasql/parser/testdata/begin.test +++ b/zetasql/parser/testdata/begin.test @@ -20,8 +20,8 @@ END begin end; -- -Script [0-11] - StatementList [0-11] +Script [0-10] + StatementList [0-10] BeginEndBlock [0-9] StatementList [5-5] -- @@ -36,10 +36,10 @@ begin select 4; end; -- -Script [0-35] - StatementList [0-35] +Script [0-34] + StatementList [0-34] BeginEndBlock [0-33] - StatementList [8-30] + StatementList [8-29] QueryStatement [8-16] Query [8-16] Select [8-16] @@ -73,12 +73,12 @@ begin select 5; end; -- -Script [0-66] - StatementList [0-66] +Script [0-65] + StatementList [0-65] BeginEndBlock [0-64] - StatementList [8-61] + StatementList [8-60] BeginEndBlock [8-47] - StatementList [18-44] + StatementList [18-41] QueryStatement [18-26] Query [18-26] Select [18-26] @@ -130,9 +130,9 @@ end; begin select 3; -- -ERROR: Syntax error: Expected keyword END but got end of script [at 3:1] - -^ +ERROR: Syntax error: Expected keyword END but got end of script [at 2:12] + select 3; + ^ == # Error (extra semi-colon after BEGIN) diff --git a/zetasql/parser/testdata/between.test b/zetasql/parser/testdata/between.test index d895e59db..81b13440f 100644 --- a/zetasql/parser/testdata/between.test +++ b/zetasql/parser/testdata/between.test @@ -189,8 +189,10 @@ QueryStatement [0-48] Identifier(T) [22-23] Identifier(name) [24-28] Location [29-36] - StringLiteral('A') [37-40] - StringLiteral('B') [45-48] + StringLiteral [37-40] + 
StringLiteralComponent('A') [37-40] + StringLiteral [45-48] + StringLiteralComponent('B') [45-48] -- SELECT * diff --git a/zetasql/parser/testdata/break_continue.test b/zetasql/parser/testdata/break_continue.test index 06aa4461f..c5d36b873 100644 --- a/zetasql/parser/testdata/break_continue.test +++ b/zetasql/parser/testdata/break_continue.test @@ -6,16 +6,16 @@ -- ALTERNATION GROUP: BREAK -- -Script [0-7] - StatementList [0-7] +Script [0-6] + StatementList [0-6] BreakStatement [0-5] -- BREAK ; -- ALTERNATION GROUP: LEAVE -- -Script [0-7] - StatementList [0-7] +Script [0-6] + StatementList [0-6] BreakStatement [0-5] -- LEAVE ; @@ -25,16 +25,16 @@ LEAVE ; -- ALTERNATION GROUP: CONTINUE -- -Script [0-10] - StatementList [0-10] +Script [0-9] + StatementList [0-9] ContinueStatement [0-8] -- CONTINUE ; -- ALTERNATION GROUP: ITERATE -- -Script [0-9] - StatementList [0-9] +Script [0-8] + StatementList [0-8] ContinueStatement [0-7] -- ITERATE ; @@ -46,10 +46,10 @@ END LOOP; -- ALTERNATION GROUP: BREAK -- -Script [0-24] - StatementList [0-24] +Script [0-23] + StatementList [0-23] WhileStatement [0-22] - StatementList [7-14] + StatementList [7-13] BreakStatement [7-12] -- LOOP @@ -58,10 +58,10 @@ END LOOP ; -- ALTERNATION GROUP: LEAVE -- -Script [0-24] - StatementList [0-24] +Script [0-23] + StatementList [0-23] WhileStatement [0-22] - StatementList [7-14] + StatementList [7-13] BreakStatement [7-12] -- LOOP @@ -75,10 +75,10 @@ END LOOP; -- ALTERNATION GROUP: CONTINUE -- -Script [0-27] - StatementList [0-27] +Script [0-26] + StatementList [0-26] WhileStatement [0-25] - StatementList [7-17] + StatementList [7-16] ContinueStatement [7-15] -- LOOP @@ -87,10 +87,10 @@ END LOOP ; -- ALTERNATION GROUP: ITERATE -- -Script [0-26] - StatementList [0-26] +Script [0-25] + StatementList [0-25] WhileStatement [0-24] - StatementList [7-16] + StatementList [7-15] ContinueStatement [7-14] -- LOOP @@ -104,11 +104,11 @@ END WHILE; -- ALTERNATION GROUP: BREAK -- -Script [0-34] - StatementList 
[0-34] +Script [0-33] + StatementList [0-33] WhileStatement [0-32] BooleanLiteral(TRUE) [6-10] - StatementList [16-23] + StatementList [16-22] BreakStatement [16-21] -- WHILE TRUE DO @@ -117,11 +117,11 @@ END WHILE ; -- ALTERNATION GROUP: LEAVE -- -Script [0-34] - StatementList [0-34] +Script [0-33] + StatementList [0-33] WhileStatement [0-32] BooleanLiteral(TRUE) [6-10] - StatementList [16-23] + StatementList [16-22] BreakStatement [16-21] -- WHILE TRUE DO @@ -135,11 +135,11 @@ END WHILE; -- ALTERNATION GROUP: CONTINUE -- -Script [0-37] - StatementList [0-37] +Script [0-36] + StatementList [0-36] WhileStatement [0-35] BooleanLiteral(TRUE) [6-10] - StatementList [16-26] + StatementList [16-25] ContinueStatement [16-24] -- WHILE TRUE DO @@ -148,11 +148,11 @@ END WHILE ; -- ALTERNATION GROUP: ITERATE -- -Script [0-36] - StatementList [0-36] +Script [0-35] + StatementList [0-35] WhileStatement [0-34] BooleanLiteral(TRUE) [6-10] - StatementList [16-25] + StatementList [16-24] ContinueStatement [16-23] -- WHILE TRUE DO @@ -165,10 +165,10 @@ LOOP BREAK; END LOOP; -- -Script [0-36] - StatementList [0-36] +Script [0-35] + StatementList [0-35] WhileStatement [0-34] - StatementList [7-26] + StatementList [7-25] QueryStatement [7-15] Query [7-15] Select [7-15] @@ -193,16 +193,16 @@ LOOP END IF; END LOOP; -- -Script [0-69] - StatementList [0-69] +Script [0-68] + StatementList [0-68] WhileStatement [0-67] - StatementList [7-59] + StatementList [7-58] IfStatement [7-57] PathExpression [10-11] Identifier(x) [10-11] - StatementList [21-30] + StatementList [21-27] BreakStatement [21-26] - StatementList [39-51] + StatementList [39-48] ContinueStatement [39-47] -- LOOP @@ -224,20 +224,20 @@ LOOP END IF; END LOOP; -- -Script [0-97] - StatementList [0-97] +Script [0-96] + StatementList [0-96] WhileStatement [0-95] - StatementList [7-87] + StatementList [7-86] IfStatement [7-85] PathExpression [10-11] Identifier(x) [10-11] - StatementList [21-58] + StatementList [21-55] IfStatement 
[21-54] PathExpression [24-25] Identifier(y) [24-25] - StatementList [37-48] + StatementList [37-43] BreakStatement [37-42] - StatementList [67-79] + StatementList [67-76] ContinueStatement [67-75] -- LOOP @@ -260,18 +260,18 @@ LOOP END WHILE; END LOOP; -- -Script [0-94] - StatementList [0-94] +Script [0-93] + StatementList [0-93] WhileStatement [0-92] - StatementList [7-84] + StatementList [7-83] WhileStatement [7-82] PathExpression [13-14] Identifier(x) [13-14] - StatementList [22-73] + StatementList [22-70] IfStatement [22-55] PathExpression [25-26] Identifier(y) [25-26] - StatementList [38-49] + StatementList [38-44] BreakStatement [38-43] ContinueStatement [61-69] -- @@ -289,12 +289,12 @@ IF y THEN BREAK; END IF; -- -Script [0-27] - StatementList [0-27] +Script [0-26] + StatementList [0-26] IfStatement [0-25] PathExpression [3-4] Identifier(y) [3-4] - StatementList [12-19] + StatementList [12-18] BreakStatement [12-17] -- IF y THEN @@ -308,12 +308,12 @@ END WHILE; -- ALTERNATION GROUP: BREAK -- -Script [0-38] - StatementList [0-38] +Script [0-37] + StatementList [0-37] WhileStatement [0-36] PathExpression [6-11] Identifier(BREAK) [6-11] - StatementList [17-27] + StatementList [17-26] ContinueStatement [17-25] -- WHILE BREAK DO @@ -322,12 +322,12 @@ END WHILE ; -- ALTERNATION GROUP: LEAVE -- -Script [0-38] - StatementList [0-38] +Script [0-37] + StatementList [0-37] WhileStatement [0-36] PathExpression [6-11] Identifier(LEAVE) [6-11] - StatementList [17-27] + StatementList [17-26] ContinueStatement [17-25] -- WHILE LEAVE DO @@ -336,12 +336,12 @@ END WHILE ; -- ALTERNATION GROUP: CONTINUE -- -Script [0-41] - StatementList [0-41] +Script [0-40] + StatementList [0-40] WhileStatement [0-39] PathExpression [6-14] Identifier(CONTINUE) [6-14] - StatementList [20-30] + StatementList [20-29] ContinueStatement [20-28] -- WHILE CONTINUE DO @@ -350,12 +350,12 @@ END WHILE ; -- ALTERNATION GROUP: ITERATE -- -Script [0-40] - StatementList [0-40] +Script [0-39] + 
StatementList [0-39] WhileStatement [0-38] PathExpression [6-13] Identifier(ITERATE) [6-13] - StatementList [19-29] + StatementList [19-28] ContinueStatement [19-27] -- WHILE ITERATE DO diff --git a/zetasql/parser/testdata/call.test b/zetasql/parser/testdata/call.test index 94695c7d4..818b71bcb 100644 --- a/zetasql/parser/testdata/call.test +++ b/zetasql/parser/testdata/call.test @@ -33,7 +33,8 @@ CallStatement [0-50] IntLiteral(1) [17-18] IntLiteral(2) [21-22] TVFArgument [24-27] - StringLiteral("a") [24-27] + StringLiteral [24-27] + StringLiteralComponent("a") [24-27] TVFArgument [29-49] CastExpression [29-49] NullLiteral(NULL) [34-38] diff --git a/zetasql/parser/testdata/case.test b/zetasql/parser/testdata/case.test index 9978800dd..ff63791dc 100644 --- a/zetasql/parser/testdata/case.test +++ b/zetasql/parser/testdata/case.test @@ -219,7 +219,8 @@ QueryStatement [0-119] [select case...'bbb') end] PathExpression [63-65] [a3] Identifier(a3) [63-65] [a3] IntLiteral(4) [66-67] [4] - StringLiteral('ddd') [73-78] ['ddd'] + StringLiteral [73-78] ['ddd'] + StringLiteralComponent('ddd') [73-78] ['ddd'] FunctionCall [91-115] [if(a4 = 6, 'aaa', 'bbb')] PathExpression [91-93] [if] Identifier(`if`) [91-93] [if] @@ -227,8 +228,10 @@ QueryStatement [0-119] [select case...'bbb') end] PathExpression [94-96] [a4] Identifier(a4) [94-96] [a4] IntLiteral(6) [99-100] [6] - StringLiteral('aaa') [102-107] ['aaa'] - StringLiteral('bbb') [109-114] ['bbb'] + StringLiteral [102-107] ['aaa'] + StringLiteralComponent('aaa') [102-107] ['aaa'] + StringLiteral [109-114] ['bbb'] + StringLiteralComponent('bbb') [109-114] ['bbb'] -- SELECT CASE a1 + 57 diff --git a/zetasql/parser/testdata/clone_data.test b/zetasql/parser/testdata/clone_data.test index a9d0b8895..721469236 100644 --- a/zetasql/parser/testdata/clone_data.test +++ b/zetasql/parser/testdata/clone_data.test @@ -163,7 +163,8 @@ CloneDataStatement [0-218] [CLONE DATA...`zoo$1234`] BinaryExpression(=) [117-132] [location = 'US'] 
PathExpression [117-125] [location] Identifier(location) [117-125] [location] - StringLiteral('US') [128-132] ['US'] + StringLiteral [128-132] ['US'] + StringLiteralComponent('US') [128-132] ['US'] CloneDataSource [143-197] [baz FOR SYSTEM_TI...'premium'] PathExpression [143-146] [baz] Identifier(baz) [143-146] [baz] @@ -175,7 +176,8 @@ CloneDataStatement [0-218] [CLONE DATA...`zoo$1234`] BinaryExpression(=) [181-197] [type = 'premium'] PathExpression [181-185] [type] Identifier(type) [181-185] [type] - StringLiteral('premium') [188-197] ['premium'] + StringLiteral [188-197] ['premium'] + StringLiteralComponent('premium') [188-197] ['premium'] CloneDataSource [208-218] [`zoo$1234`] PathExpression [208-218] [`zoo$1234`] Identifier(`zoo$1234`) [208-218] [`zoo$1234`] diff --git a/zetasql/parser/testdata/create_constant.test b/zetasql/parser/testdata/create_constant.test index e962561ec..3f3381503 100644 --- a/zetasql/parser/testdata/create_constant.test +++ b/zetasql/parser/testdata/create_constant.test @@ -15,7 +15,8 @@ CreateConstantStatement [0-29] [create constant a.b.c = 'str'] Identifier(a) [16-17] [a] Identifier(b) [18-19] [b] Identifier(c) [20-21] [c] - StringLiteral('str') [24-29] ['str'] + StringLiteral [24-29] ['str'] + StringLiteralComponent('str') [24-29] ['str'] -- CREATE CONSTANT a.b.c = 'str' == @@ -364,7 +365,8 @@ create constant table = 'a'; CreateConstantStatement [0-27] [create constant table = 'a'] PathExpression [16-21] [table] Identifier(table) [16-21] [table] - StringLiteral('a') [24-27] ['a'] + StringLiteral [24-27] ['a'] + StringLiteralComponent('a') [24-27] ['a'] -- CREATE CONSTANT table = 'a' == diff --git a/zetasql/parser/testdata/create_database.test b/zetasql/parser/testdata/create_database.test index 48406b78e..99d26137d 100644 --- a/zetasql/parser/testdata/create_database.test +++ b/zetasql/parser/testdata/create_database.test @@ -73,7 +73,8 @@ CreateDatabaseStatement [0-51] [create database...ption_2='2')] IntLiteral(1) [36-37] [1] 
OptionsEntry [38-50] [option_2='2'] Identifier(option_2) [38-46] [option_2] - StringLiteral('2') [47-50] ['2'] + StringLiteral [47-50] ['2'] + StringLiteralComponent('2') [47-50] ['2'] -- CREATE DATABASE db OPTIONS(option_1 = 1, option_2 = '2') == @@ -89,7 +90,8 @@ CreateDatabaseStatement [0-56] [CREATE DATABASE...ption_2='2')] IntLiteral(1) [41-42] [1] OptionsEntry [43-55] [option_2='2'] Identifier(option_2) [43-51] [option_2] - StringLiteral('2') [52-55] ['2'] + StringLiteral [52-55] ['2'] + StringLiteralComponent('2') [52-55] ['2'] -- CREATE DATABASE OPTIONS OPTIONS(option_1 = 1, option_2 = '2') == diff --git a/zetasql/parser/testdata/create_external_schema.test b/zetasql/parser/testdata/create_external_schema.test index 2eb4652bc..2cdf65a34 100644 --- a/zetasql/parser/testdata/create_external_schema.test +++ b/zetasql/parser/testdata/create_external_schema.test @@ -128,7 +128,8 @@ CreateExternalSchemaStatement [0-73] [CREATE EXTERNAL..., c="def")] Identifier(b) [62-63] [b] OptionsEntry [65-72] [c="def"] Identifier(c) [65-66] [c] - StringLiteral("def") [67-72] ["def"] + StringLiteral [67-72] ["def"] + StringLiteralComponent("def") [67-72] ["def"] -- CREATE EXTERNAL SCHEMA foo WITH CONNECTION bar.baz OPTIONS(a = b, c = "def") @@ -206,4 +207,3 @@ CREATE EXTERNAL SCHEMA foo WITH CONNECTION bar ERROR: Syntax error: Expected keyword OPTIONS but got end of statement [at 1:47] CREATE EXTERNAL SCHEMA foo WITH CONNECTION bar ^ -== diff --git a/zetasql/parser/testdata/create_external_table.test b/zetasql/parser/testdata/create_external_table.test index 5707fa91a..226c09be6 100644 --- a/zetasql/parser/testdata/create_external_table.test +++ b/zetasql/parser/testdata/create_external_table.test @@ -12,7 +12,8 @@ CreateExternalTableStatement [0-47] [create external..., c="def")] Identifier(b) [36-37] [b] OptionsEntry [39-46] [c="def"] Identifier(c) [39-40] [c] - StringLiteral("def") [41-46] ["def"] + StringLiteral [41-46] ["def"] + StringLiteralComponent("def") [41-46] 
["def"] -- CREATE EXTERNAL TABLE t1 OPTIONS(a = b, c = "def") == @@ -563,10 +564,12 @@ CreateExternalTableStatement [0-104] [create external..., c="def")] OptionsList [65-104] [(a='{"jsonkey..., c="def")] OptionsEntry [66-94] [a='{"jsonkey": "jsonvalue"}'] Identifier(a) [66-67] [a] - StringLiteral('{"jsonkey": "jsonvalue"}') [68-94] ['{"jsonkey": "jsonvalue"}'] + StringLiteral [68-94] ['{"jsonkey": "jsonvalue"}'] + StringLiteralComponent('{"jsonkey": "jsonvalue"}') [68-94] ['{"jsonkey": "jsonvalue"}'] OptionsEntry [96-103] [c="def"] Identifier(c) [96-97] [c] - StringLiteral("def") [98-103] ["def"] + StringLiteral [98-103] ["def"] + StringLiteralComponent("def") [98-103] ["def"] -- CREATE EXTERNAL TABLE t1 ( @@ -735,7 +738,8 @@ CreateExternalTableStatement [0-49] [create external..., c="def")] Identifier(b) [38-39] [b] OptionsEntry [41-48] [c="def"] Identifier(c) [41-42] [c] - StringLiteral("def") [43-48] ["def"] + StringLiteral [43-48] ["def"] + StringLiteralComponent("def") [43-48] ["def"] -- CREATE EXTERNAL TABLE `t1-2` OPTIONS(a = b, c = "def") == diff --git a/zetasql/parser/testdata/create_function.test b/zetasql/parser/testdata/create_function.test index e4a4fc2e1..97abd46fe 100644 --- a/zetasql/parser/testdata/create_function.test +++ b/zetasql/parser/testdata/create_function.test @@ -440,7 +440,8 @@ CreateFunctionStatement [0-105] [create function...'\n'; """] PathExpression [39-45] [string] Identifier(string) [39-45] [string] Identifier(testlang) [55-63] [testlang] - StringLiteral(""" return + StringLiteral [67-105] [""" return...'\n'; """] + StringLiteralComponent(""" return "presto!" 
+ s + '\n'; """) [67-105] [""" return...'\n'; """] -- @@ -470,7 +471,8 @@ CreateFunctionStatement [0-107] [create function...return 'a';"] PathExpression [39-45] [string] Identifier(string) [39-45] [string] Identifier(testlang) [55-63] [testlang] - StringLiteral("return 'a';") [94-107] ["return 'a';"] + StringLiteral [94-107] ["return 'a';"] + StringLiteralComponent("return 'a';") [94-107] ["return 'a';"] OptionsList [72-90] [( a=b, bruce=lee )] OptionsEntry [74-77] [a=b] Identifier(a) [74-75] [a] @@ -505,7 +507,8 @@ CreateFunctionStatement [0-107] [create function...bruce=lee )] PathExpression [39-45] [string] Identifier(string) [39-45] [string] Identifier(testlang) [55-63] [testlang] - StringLiteral("return 'a';") [67-80] ["return 'a';"] + StringLiteral [67-80] ["return 'a';"] + StringLiteralComponent("return 'a';") [67-80] ["return 'a';"] OptionsList [89-107] [( a=b, bruce=lee )] OptionsEntry [91-94] [a=b] Identifier(a) [91-92] [a] @@ -561,7 +564,8 @@ CreateFunctionStatement [0-78] [create function...return 'a';"] PathExpression [37-43] [string] Identifier(string) [37-43] [string] Identifier(testlang) [53-61] [testlang] - StringLiteral("return 'a';") [65-78] ["return 'a';"] + StringLiteral [65-78] ["return 'a';"] + StringLiteralComponent("return 'a';") [65-78] ["return 'a';"] -- CREATE FUNCTION fn(string) RETURNS string LANGUAGE testlang AS "return 'a';" @@ -599,7 +603,8 @@ CreateFunctionStatement [0-103] [create function...return 'a'"] PathExpression [63-69] [string] Identifier(string) [63-69] [string] Identifier(testlang) [79-87] [testlang] - StringLiteral("return 'a'") [91-103] ["return 'a'"] + StringLiteral [91-103] ["return 'a'"] + StringLiteralComponent("return 'a'") [91-103] ["return 'a'"] -- CREATE FUNCTION fn(string, s string, int32, i int32) RETURNS string LANGUAGE testlang AS "return 'a'" @@ -1003,12 +1008,13 @@ CreateFunctionStatement [0-70] [create function...turns double] SimpleType [25-30] [int64] PathExpression [25-30] [int64] Identifier(int64) 
[25-30] [int64] - FunctionParameter(default_value=(StringLiteral("abc"))) [32-54] [b string default "abc"] + FunctionParameter(default_value=(StringLiteral)) [32-54] [b string default "abc"] Identifier(b) [32-33] [b] SimpleType [34-40] [string] PathExpression [34-40] [string] Identifier(string) [34-40] [string] - StringLiteral("abc") [49-54] ["abc"] + StringLiteral [49-54] ["abc"] + StringLiteralComponent("abc") [49-54] ["abc"] SimpleType [64-70] [double] PathExpression [64-70] [double] Identifier(double) [64-70] [double] @@ -1035,12 +1041,13 @@ CreateFunctionStatement [0-152] [CREATE FUNCTION...turns double] Identifier(int64) [25-30] [int64] UnaryExpression(-) [39-43] [-314] IntLiteral(314) [40-43] [314] - FunctionParameter(default_value=(StringLiteral("abc"))) [68-90] [b string DEFAULT "abc"] + FunctionParameter(default_value=(StringLiteral)) [68-90] [b string DEFAULT "abc"] Identifier(b) [68-69] [b] SimpleType [70-76] [string] PathExpression [70-76] [string] Identifier(string) [70-76] [string] - StringLiteral("abc") [85-90] ["abc"] + StringLiteral [85-90] ["abc"] + StringLiteralComponent("abc") [85-90] ["abc"] FunctionParameter(default_value=(FloatLiteral(3.14))) [115-136] [c double DEFAULT 3.14] Identifier(c) [115-116] [c] SimpleType [117-123] [double] @@ -1071,10 +1078,11 @@ CreateFunctionStatement [0-143] [Create Function...turns double] SimpleType [25-30] [int64] PathExpression [25-30] [int64] Identifier(int64) [25-30] [int64] - FunctionParameter(default_value=(StringLiteral("abc"))) [55-79] [b ANY TYPE Default "abc"] + FunctionParameter(default_value=(StringLiteral)) [55-79] [b ANY TYPE Default "abc"] Identifier(b) [55-56] [b] TemplatedParameterType [57-65] [ANY TYPE] - StringLiteral("abc") [74-79] ["abc"] + StringLiteral [74-79] ["abc"] + StringLiteralComponent("abc") [74-79] ["abc"] FunctionParameter(default_value=(FloatLiteral(3.14))) [104-127] [c ANY TYPE Default 3.14] Identifier(c) [104-105] [c] TemplatedParameterType [106-114] [ANY TYPE] @@ -1579,14 
+1587,16 @@ CreateFunctionStatement(is_public) [0-296] [CREATE PUBLIC...END )] PathExpression [112-123] [bucket.type] Identifier(bucket) [112-118] [bucket] Identifier(type) [119-123] [type] - StringLiteral("point") [133-140] ["point"] + StringLiteral [133-140] ["point"] + StringLiteralComponent("point") [133-140] ["point"] BinaryExpression(=) [146-166] [value = bucket.point] PathExpression [146-151] [value] Identifier(value) [146-151] [value] PathExpression [154-166] [bucket.point] Identifier(bucket) [154-160] [bucket] Identifier(point) [161-166] [point] - StringLiteral("range") [176-183] ["range"] + StringLiteral [176-183] ["range"] + StringLiteralComponent("range") [176-183] ["range"] AndExpr [189-244] [bucket.range...range.high] BinaryExpression(<=) [189-214] [bucket.range.low <= value] PathExpression [189-205] [bucket.range.low] @@ -1605,7 +1615,8 @@ CreateFunctionStatement(is_public) [0-296] [CREATE PUBLIC...END )] FunctionCall [254-286] [ERROR("Unsupporte...ket type")] PathExpression [254-259] [ERROR] Identifier(ERROR) [254-259] [ERROR] - StringLiteral("Unsupported bucket type") [260-285] ["Unsupported bucket type"] + StringLiteral [260-285] ["Unsupported bucket type"] + StringLiteralComponent("Unsupported bucket type") [260-285] ["Unsupported bucket type"] -- CREATE PUBLIC FUNCTION MatchesBucket(value DOUBLE, bucket project_namespace.Bucket) @@ -2013,7 +2024,8 @@ CreateFunctionStatement [0-70] [create function...) 
as (1)] PathExpression [37-42] [INT64] Identifier(INT64) [37-42] [INT64] Collate [44-60] [collate 'und:ci'] - StringLiteral('und:ci') [52-60] ['und:ci'] + StringLiteral [52-60] ['und:ci'] + StringLiteralComponent('und:ci') [52-60] ['und:ci'] SqlFunctionBody [67-70] [(1)] IntLiteral(1) [68-69] [1] -- diff --git a/zetasql/parser/testdata/create_index.test b/zetasql/parser/testdata/create_index.test index 85f24d6c5..5f3432c6a 100644 --- a/zetasql/parser/testdata/create_index.test +++ b/zetasql/parser/testdata/create_index.test @@ -1235,7 +1235,8 @@ CreateIndexStatement(SEARCH) [0-55] [CREATE SEARCH...OPTIONS()] Identifier(t1) [26-28] [t1] IndexItemList [29-45] [('ALL', COLUMNS)] OrderingExpression(ASC) [30-35] ['ALL'] - StringLiteral('ALL') [30-35] ['ALL'] + StringLiteral [30-35] ['ALL'] + StringLiteralComponent('ALL') [30-35] ['ALL'] OrderingExpression(ASC) [37-44] [COLUMNS] PathExpression [37-44] [COLUMNS] Identifier(COLUMNS) [37-44] [COLUMNS] @@ -1313,4 +1314,3 @@ ALTERNATION GROUP: or replace ERROR: Syntax error: Expected keyword INDEX but got keyword SEARCH [at 1:26] create or replace vector search index on i1 on t1(a); ^ -== diff --git a/zetasql/parser/testdata/create_model.test b/zetasql/parser/testdata/create_model.test index c78b6b8fd..1c30f43c1 100644 --- a/zetasql/parser/testdata/create_model.test +++ b/zetasql/parser/testdata/create_model.test @@ -709,7 +709,8 @@ CreateModelStatement(is_temp) [0-104] [create temporary...as select 2] IntLiteral(5) [39-40] [5] OptionsEntry [42-51] [y = 'abc'] Identifier(y) [42-43] [y] - StringLiteral('abc') [46-51] ['abc'] + StringLiteral [46-51] ['abc'] + StringLiteralComponent('abc') [46-51] ['abc'] OptionsEntry [53-63] [z = @param] Identifier(z) [53-54] [z] ParameterExpr [57-63] [@param] @@ -812,7 +813,8 @@ CreateModelStatement [0-54] [create model...as select 2] OptionsList [24-42] [(y='b.c', z=`b.c`)] OptionsEntry [25-32] [y='b.c'] Identifier(y) [25-26] [y] - StringLiteral('b.c') [27-32] ['b.c'] + StringLiteral [27-32] 
['b.c'] + StringLiteralComponent('b.c') [27-32] ['b.c'] OptionsEntry [34-41] [z=`b.c`] Identifier(z) [34-35] [z] PathExpression [36-41] [`b.c`] diff --git a/zetasql/parser/testdata/create_procedure.test b/zetasql/parser/testdata/create_procedure.test index 615ebd20b..c88aac751 100644 --- a/zetasql/parser/testdata/create_procedure.test +++ b/zetasql/parser/testdata/create_procedure.test @@ -57,7 +57,8 @@ CreateProcedureStatement [0-63] [CREATE PROCEDURE...BEGIN END] IntLiteral(1) [44-45] [1] OptionsEntry [47-52] [b="2"] Identifier(b) [47-48] [b] - StringLiteral("2") [49-52] ["2"] + StringLiteral [49-52] ["2"] + StringLiteralComponent("2") [49-52] ["2"] Script [54-63] [BEGIN END] StatementList [54-63] [BEGIN END] BeginEndBlock [54-63] [BEGIN END] @@ -157,7 +158,7 @@ CreateProcedureStatement [0-117] [CREATE PROCEDURE...test"; END] Script [52-117] [BEGIN DECLARE...test"; END] StatementList [52-117] [BEGIN DECLARE...test"; END] BeginEndBlock [52-117] [BEGIN DECLARE...test"; END] - StatementList [60-114] [DECLARE a...= "test";] + StatementList [60-113] [DECLARE a...= "test";] VariableDeclaration [60-75] [DECLARE a int32] IdentifierList [68-69] [a] Identifier(a) [68-69] [a] @@ -169,7 +170,8 @@ CreateProcedureStatement [0-117] [CREATE PROCEDURE...test"; END] IntLiteral(1) [87-88] [1] SingleAssignment [92-112] [SET param_a = "test"] Identifier(param_a) [96-103] [param_a] - StringLiteral("test") [106-112] ["test"] + StringLiteral [106-112] ["test"] + StringLiteralComponent("test") [106-112] ["test"] -- CREATE PROCEDURE procedure_name(OUT param_a string) BEGIN @@ -357,13 +359,13 @@ CreateProcedureStatement [0-101] [CREATE PROCEDURE...END IF; END] Script [48-101] [BEGIN IF...END IF; END] StatementList [48-101] [BEGIN IF...END IF; END] BeginEndBlock [48-101] [BEGIN IF...END IF; END] - StatementList [56-98] [IF param_a...END IF;] + StatementList [56-97] [IF param_a...END IF;] IfStatement [56-96] [IF param_a...; END IF] BinaryExpression(>) [59-70] [param_a > 0] PathExpression 
[59-66] [param_a] Identifier(param_a) [59-66] [param_a] IntLiteral(0) [69-70] [0] - StatementList [80-90] [RETURN;] + StatementList [80-87] [RETURN;] ReturnStatement [80-86] [RETURN] -- CREATE PROCEDURE procedure_name(param_a int32) @@ -483,8 +485,8 @@ BEGIN SELECT 1; END; -- -Script [0-115] [CREATE PROCEDURE...ECT 1; END;] - StatementList [0-115] [CREATE PROCEDURE...ECT 1; END;] +Script [0-114] [CREATE PROCEDURE...ECT 1; END;] + StatementList [0-114] [CREATE PROCEDURE...ECT 1; END;] CreateProcedureStatement [0-55] [CREATE PROCEDURE...LECT 1; END] PathExpression [17-31] [procedure_name] Identifier(procedure_name) [17-31] [procedure_name] @@ -492,7 +494,7 @@ Script [0-115] [CREATE PROCEDURE...ECT 1; END;] Script [34-55] [BEGIN SELECT 1; END] StatementList [34-55] [BEGIN SELECT 1; END] BeginEndBlock [34-55] [BEGIN SELECT 1; END] - StatementList [42-52] [SELECT 1;] + StatementList [42-51] [SELECT 1;] QueryStatement [42-50] [SELECT 1] Query [42-50] [SELECT 1] Select [42-50] [SELECT 1] @@ -506,7 +508,7 @@ Script [0-115] [CREATE PROCEDURE...ECT 1; END;] Script [92-113] [BEGIN SELECT 1; END] StatementList [92-113] [BEGIN SELECT 1; END] BeginEndBlock [92-113] [BEGIN SELECT 1; END] - StatementList [100-110] [SELECT 1;] + StatementList [100-109] [SELECT 1;] QueryStatement [100-108] [SELECT 1] Query [100-108] [SELECT 1] Select [100-108] [SELECT 1] @@ -542,7 +544,7 @@ CreateProcedureStatement [0-67] [CREATE PROCEDURE...TERVAL; END] Script [34-67] [BEGIN DECLARE...NTERVAL; END] StatementList [34-67] [BEGIN DECLARE...NTERVAL; END] BeginEndBlock [34-67] [BEGIN DECLARE...NTERVAL; END] - StatementList [42-64] [DECLARE ABC INTERVAL;] + StatementList [42-63] [DECLARE ABC INTERVAL;] VariableDeclaration [42-62] [DECLARE ABC INTERVAL] IdentifierList [50-53] [ABC] Identifier(ABC) [50-53] [ABC] @@ -596,7 +598,8 @@ CreateProcedureStatement [0-182] [CREATE PROCEDURE...ython code"] PathExpression [114-127] [connection_id] Identifier(connection_id) [114-127] [connection_id] Identifier(PYTHON) 
[159-165] [PYTHON] - StringLiteral("python code") [169-182] ["python code"] + StringLiteral [169-182] ["python code"] + StringLiteralComponent("python code") [169-182] ["python code"] -- CREATE PROCEDURE procedure_name(param_a int32, OUT param_b string, INOUT param_c ANY TYPE) WITH CONNECTION connection_id OPTIONS @@ -636,7 +639,8 @@ CreateProcedureStatement [0-170] [CREATE PROCEDURE...ython code"] PathExpression [114-127] [connection_id] Identifier(connection_id) [114-127] [connection_id] Identifier(PYTHON) [147-153] [PYTHON] - StringLiteral("python code") [157-170] ["python code"] + StringLiteral [157-170] ["python code"] + StringLiteralComponent("python code") [157-170] ["python code"] -- CREATE PROCEDURE procedure_name(param_a int32, OUT param_b string, INOUT param_c ANY TYPE) WITH CONNECTION connection_id OPTIONS @@ -679,7 +683,8 @@ CreateProcedureStatement [0-152] [CREATE PROCEDURE...ython code"] PathExpression [117-118] [d] Identifier(d) [117-118] [d] Identifier(PYTHON) [129-135] [PYTHON] - StringLiteral("python code") [139-152] ["python code"] + StringLiteral [139-152] ["python code"] + StringLiteralComponent("python code") [139-152] ["python code"] -- CREATE PROCEDURE procedure_name(param_a int32, OUT param_b string, INOUT param_c ANY TYPE) OPTIONS @@ -917,7 +922,7 @@ CreateProcedureStatement [0-90] [CREATE PROCEDURE...LECT 1; END] Script [69-90] [BEGIN SELECT 1; END] StatementList [69-90] [BEGIN SELECT 1; END] BeginEndBlock [69-90] [BEGIN SELECT 1; END] - StatementList [77-87] [SELECT 1;] + StatementList [77-86] [SELECT 1;] QueryStatement [77-85] [SELECT 1] Query [77-85] [SELECT 1] Select [77-85] [SELECT 1] diff --git a/zetasql/parser/testdata/create_row_access_policy.test b/zetasql/parser/testdata/create_row_access_policy.test index f2cbb68b2..fede47adb 100644 --- a/zetasql/parser/testdata/create_row_access_policy.test +++ b/zetasql/parser/testdata/create_row_access_policy.test @@ -5,12 +5,14 @@ CreateRowAccessPolicyStatement [0-83] [create row...c1 = 
'foo')] Identifier(t1) [28-30] [t1] GrantToClause [31-58] [grant to ('foo@google.com')] GranteeList [41-57] ['foo@google.com'] - StringLiteral('foo@google.com') [41-57] ['foo@google.com'] + StringLiteral [41-57] ['foo@google.com'] + StringLiteralComponent('foo@google.com') [41-57] ['foo@google.com'] FilterUsingClause [59-83] [filter using(c1 = 'foo')] BinaryExpression(=) [72-82] [c1 = 'foo'] PathExpression [72-74] [c1] Identifier(c1) [72-74] [c1] - StringLiteral('foo') [77-82] ['foo'] + StringLiteral [77-82] ['foo'] + StringLiteralComponent('foo') [77-82] ['foo'] -- CREATE ROW ACCESS POLICY ON t1 GRANT TO ('foo@google.com') FILTER USING (c1 = 'foo') == @@ -22,12 +24,14 @@ CreateRowAccessPolicyStatement [0-83] [create row...c2 = 'foo')] Identifier(t1) [31-33] [t1] GrantToClause [34-58] [grant to ('mdbuser/bar')] GranteeList [44-57] ['mdbuser/bar'] - StringLiteral('mdbuser/bar') [44-57] ['mdbuser/bar'] + StringLiteral [44-57] ['mdbuser/bar'] + StringLiteralComponent('mdbuser/bar') [44-57] ['mdbuser/bar'] FilterUsingClause [59-83] [filter using(c2 = 'foo')] BinaryExpression(=) [72-82] [c2 = 'foo'] PathExpression [72-74] [c2] Identifier(c2) [72-74] [c2] - StringLiteral('foo') [77-82] ['foo'] + StringLiteral [77-82] ['foo'] + StringLiteralComponent('foo') [77-82] ['foo'] PathExpression [25-27] [p1] Identifier(p1) [25-27] [p1] -- @@ -41,8 +45,10 @@ CreateRowAccessPolicyStatement [0-91] [create row...using(c1)] Identifier(t1) [28-30] [t1] GrantToClause [31-74] [grant to (...mdbgroup/bar')] GranteeList [41-73] ['foo@google...mdbgroup/bar'] - StringLiteral('foo@google.com') [41-57] ['foo@google.com'] - StringLiteral('mdbgroup/bar') [59-73] ['mdbgroup/bar'] + StringLiteral [41-57] ['foo@google.com'] + StringLiteralComponent('foo@google.com') [41-57] ['foo@google.com'] + StringLiteral [59-73] ['mdbgroup/bar'] + StringLiteralComponent('mdbgroup/bar') [59-73] ['mdbgroup/bar'] FilterUsingClause [75-91] [filter using(c1)] PathExpression [88-90] [c1] Identifier(c1) [88-90] [c1] @@ 
-58,8 +64,10 @@ CreateRowAccessPolicyStatement [0-94] [create row...using(c1)] Identifier(t1) [31-33] [t1] GrantToClause [34-77] [grant to (...mdbgroup/bar')] GranteeList [44-76] ['foo@google...mdbgroup/bar'] - StringLiteral('foo@google.com') [44-60] ['foo@google.com'] - StringLiteral('mdbgroup/bar') [62-76] ['mdbgroup/bar'] + StringLiteral [44-60] ['foo@google.com'] + StringLiteralComponent('foo@google.com') [44-60] ['foo@google.com'] + StringLiteral [62-76] ['mdbgroup/bar'] + StringLiteralComponent('mdbgroup/bar') [62-76] ['mdbgroup/bar'] FilterUsingClause [78-94] [filter using(c1)] PathExpression [91-93] [c1] Identifier(c1) [91-93] [c1] @@ -75,8 +83,10 @@ CreateRowAccessPolicyStatement [0-93] [create row...filter using(1)] Identifier(t1) [31-33] [t1] GrantToClause [34-77] [grant to (...mdbgroup/bar')] GranteeList [44-76] ['foo@google...mdbgroup/bar'] - StringLiteral('foo@google.com') [44-60] ['foo@google.com'] - StringLiteral('mdbgroup/bar') [62-76] ['mdbgroup/bar'] + StringLiteral [44-60] ['foo@google.com'] + StringLiteralComponent('foo@google.com') [44-60] ['foo@google.com'] + StringLiteral [62-76] ['mdbgroup/bar'] + StringLiteralComponent('mdbgroup/bar') [62-76] ['mdbgroup/bar'] FilterUsingClause [78-93] [filter using(1)] IntLiteral(1) [91-92] [1] -- @@ -198,7 +208,8 @@ CreateRowAccessPolicyStatement [0-63] [create row...using(true)] Identifier(t1) [28-30] [t1] GrantToClause [31-44] [grant to ("")] GranteeList [41-43] [""] - StringLiteral("") [41-43] [""] + StringLiteral [41-43] [""] + StringLiteralComponent("") [41-43] [""] FilterUsingClause [45-63] [filter using(true)] BooleanLiteral(true) [58-62] [true] -- @@ -269,12 +280,14 @@ CreateRowAccessPolicyStatement [0-90] [create row...region = "us")] Identifier(t1) [31-33] [t1] GrantToClause [34-61] [grant to ("bar@google.com")] GranteeList [44-60] ["bar@google.com"] - StringLiteral("bar@google.com") [44-60] ["bar@google.com"] + StringLiteral [44-60] ["bar@google.com"] + StringLiteralComponent("bar@google.com") 
[44-60] ["bar@google.com"] FilterUsingClause [62-90] [filter using (region = "us")] BinaryExpression(=) [76-89] [region = "us"] PathExpression [76-82] [region] Identifier(region) [76-82] [region] - StringLiteral("us") [85-89] ["us"] + StringLiteral [85-89] ["us"] + StringLiteralComponent("us") [85-89] ["us"] PathExpression [25-27] [p1] Identifier(p1) [25-27] [p1] -- @@ -287,12 +300,14 @@ CreateRowAccessPolicyStatement [0-84] [create row...region = "us")] Identifier(t1) [31-33] [t1] GrantToClause [34-61] [grant to ("bar@google.com")] GranteeList [44-60] ["bar@google.com"] - StringLiteral("bar@google.com") [44-60] ["bar@google.com"] + StringLiteral [44-60] ["bar@google.com"] + StringLiteralComponent("bar@google.com") [44-60] ["bar@google.com"] FilterUsingClause [61-84] [using (region = "us")] BinaryExpression(=) [70-83] [region = "us"] PathExpression [70-76] [region] Identifier(region) [70-76] [region] - StringLiteral("us") [79-83] ["us"] + StringLiteral [79-83] ["us"] + StringLiteralComponent("us") [79-83] ["us"] PathExpression [25-27] [p1] Identifier(p1) [25-27] [p1] -- @@ -317,12 +332,14 @@ CreateRowAccessPolicyStatement [0-84] [create row...region = "us")] Identifier(t1) [25-27] [t1] GrantToClause [28-55] [grant to ("bar@google.com")] GranteeList [38-54] ["bar@google.com"] - StringLiteral("bar@google.com") [38-54] ["bar@google.com"] + StringLiteral [38-54] ["bar@google.com"] + StringLiteralComponent("bar@google.com") [38-54] ["bar@google.com"] FilterUsingClause [56-84] [filter using (region = "us")] BinaryExpression(=) [70-83] [region = "us"] PathExpression [70-76] [region] Identifier(region) [70-76] [region] - StringLiteral("us") [79-83] ["us"] + StringLiteral [79-83] ["us"] + StringLiteralComponent("us") [79-83] ["us"] PathExpression [19-21] [p1] Identifier(p1) [19-21] [p1] -- @@ -335,12 +352,14 @@ CreateRowAccessPolicyStatement [0-78] [create row...region = "us")] Identifier(t1) [25-27] [t1] GrantToClause [28-55] [grant to ("bar@google.com")] GranteeList 
[38-54] ["bar@google.com"] - StringLiteral("bar@google.com") [38-54] ["bar@google.com"] + StringLiteral [38-54] ["bar@google.com"] + StringLiteralComponent("bar@google.com") [38-54] ["bar@google.com"] FilterUsingClause [55-78] [using (region = "us")] BinaryExpression(=) [64-77] [region = "us"] PathExpression [64-70] [region] Identifier(region) [64-70] [region] - StringLiteral("us") [73-77] ["us"] + StringLiteral [73-77] ["us"] + StringLiteralComponent("us") [73-77] ["us"] PathExpression [19-21] [p1] Identifier(p1) [19-21] [p1] -- diff --git a/zetasql/parser/testdata/create_row_policy.test b/zetasql/parser/testdata/create_row_policy.test index 0e966e352..c26a0036b 100644 --- a/zetasql/parser/testdata/create_row_policy.test +++ b/zetasql/parser/testdata/create_row_policy.test @@ -5,12 +5,14 @@ CreateRowAccessPolicyStatement [0-61] [create row...c1 = 'foo')] Identifier(t1) [21-23] [t1] GrantToClause [24-43] [to 'foo@google.com'] GranteeList [27-43] ['foo@google.com'] - StringLiteral('foo@google.com') [27-43] ['foo@google.com'] + StringLiteral [27-43] ['foo@google.com'] + StringLiteralComponent('foo@google.com') [27-43] ['foo@google.com'] FilterUsingClause [43-61] [using(c1 = 'foo')] BinaryExpression(=) [50-60] [c1 = 'foo'] PathExpression [50-52] [c1] Identifier(c1) [50-52] [c1] - StringLiteral('foo') [55-60] ['foo'] + StringLiteral [55-60] ['foo'] + StringLiteralComponent('foo') [55-60] ['foo'] -- CREATE ROW POLICY ON t1 TO 'foo@google.com' USING (c1 = 'foo') == @@ -22,12 +24,14 @@ CreateRowAccessPolicyStatement [0-61] [create row...c2 = 'foo')] Identifier(t1) [24-26] [t1] GrantToClause [27-43] [to 'mdbuser/bar'] GranteeList [30-43] ['mdbuser/bar'] - StringLiteral('mdbuser/bar') [30-43] ['mdbuser/bar'] + StringLiteral [30-43] ['mdbuser/bar'] + StringLiteralComponent('mdbuser/bar') [30-43] ['mdbuser/bar'] FilterUsingClause [43-61] [using(c2 = 'foo')] BinaryExpression(=) [50-60] [c2 = 'foo'] PathExpression [50-52] [c2] Identifier(c2) [50-52] [c2] - StringLiteral('foo') 
[55-60] ['foo'] + StringLiteral [55-60] ['foo'] + StringLiteralComponent('foo') [55-60] ['foo'] PathExpression [18-20] [p1] Identifier(p1) [18-20] [p1] -- @@ -41,8 +45,10 @@ CreateRowAccessPolicyStatement [0-69] [create row...using(c1)] Identifier(t1) [21-23] [t1] GrantToClause [24-59] [to 'foo@google...mdbgroup/bar'] GranteeList [27-59] ['foo@google...mdbgroup/bar'] - StringLiteral('foo@google.com') [27-43] ['foo@google.com'] - StringLiteral('mdbgroup/bar') [45-59] ['mdbgroup/bar'] + StringLiteral [27-43] ['foo@google.com'] + StringLiteralComponent('foo@google.com') [27-43] ['foo@google.com'] + StringLiteral [45-59] ['mdbgroup/bar'] + StringLiteralComponent('mdbgroup/bar') [45-59] ['mdbgroup/bar'] FilterUsingClause [59-69] [using(c1)] PathExpression [66-68] [c1] Identifier(c1) [66-68] [c1] @@ -58,8 +64,10 @@ CreateRowAccessPolicyStatement [0-72] [create row...using(c1)] Identifier(t1) [24-26] [t1] GrantToClause [27-62] [to 'foo@google...mdbgroup/bar'] GranteeList [30-62] ['foo@google...mdbgroup/bar'] - StringLiteral('foo@google.com') [30-46] ['foo@google.com'] - StringLiteral('mdbgroup/bar') [48-62] ['mdbgroup/bar'] + StringLiteral [30-46] ['foo@google.com'] + StringLiteralComponent('foo@google.com') [30-46] ['foo@google.com'] + StringLiteral [48-62] ['mdbgroup/bar'] + StringLiteralComponent('mdbgroup/bar') [48-62] ['mdbgroup/bar'] FilterUsingClause [62-72] [using(c1)] PathExpression [69-71] [c1] Identifier(c1) [69-71] [c1] @@ -75,8 +83,10 @@ CreateRowAccessPolicyStatement [0-71] [create row...' 
using(1)] Identifier(t1) [24-26] [t1] GrantToClause [27-62] [to 'foo@google...mdbgroup/bar'] GranteeList [30-62] ['foo@google...mdbgroup/bar'] - StringLiteral('foo@google.com') [30-46] ['foo@google.com'] - StringLiteral('mdbgroup/bar') [48-62] ['mdbgroup/bar'] + StringLiteral [30-46] ['foo@google.com'] + StringLiteralComponent('foo@google.com') [30-46] ['foo@google.com'] + StringLiteral [48-62] ['mdbgroup/bar'] + StringLiteralComponent('mdbgroup/bar') [48-62] ['mdbgroup/bar'] FilterUsingClause [62-71] [using(1)] IntLiteral(1) [69-70] [1] -- @@ -219,7 +229,8 @@ CreateRowAccessPolicyStatement [0-41] [create row...using(true)] Identifier(t1) [21-23] [t1] GrantToClause [24-29] [to ""] GranteeList [27-29] [""] - StringLiteral("") [27-29] [""] + StringLiteral [27-29] [""] + StringLiteralComponent("") [27-29] [""] FilterUsingClause [29-41] [using(true)] BooleanLiteral(true) [36-40] [true] -- diff --git a/zetasql/parser/testdata/create_schema.test b/zetasql/parser/testdata/create_schema.test index 10bc283b1..f07a3c5f7 100644 --- a/zetasql/parser/testdata/create_schema.test +++ b/zetasql/parser/testdata/create_schema.test @@ -65,10 +65,12 @@ CreateSchemaStatement [0-38] [CREATE SCHEMA...'a',y='b')] OptionsList [25-38] [(x='a',y='b')] OptionsEntry [26-31] [x='a'] Identifier(x) [26-27] [x] - StringLiteral('a') [28-31] ['a'] + StringLiteral [28-31] ['a'] + StringLiteralComponent('a') [28-31] ['a'] OptionsEntry [32-37] [y='b'] Identifier(y) [32-33] [y] - StringLiteral('b') [34-37] ['b'] + StringLiteral [34-37] ['b'] + StringLiteralComponent('b') [34-37] ['b'] -- CREATE SCHEMA foo OPTIONS(x = 'a', y = 'b') @@ -84,7 +86,8 @@ CreateSchemaStatement [0-62] [CREATE SCHEMA...options()] Identifier(foo) [16-19] [foo] Identifier(bar) [20-23] [bar] Collate [32-52] [COLLATE 'unicode:ci'] - StringLiteral('unicode:ci') [40-52] ['unicode:ci'] + StringLiteral [40-52] ['unicode:ci'] + StringLiteralComponent('unicode:ci') [40-52] ['unicode:ci'] OptionsList [60-62] [()] -- CREATE SCHEMA foo.bar 
DEFAULT COLLATE 'unicode:ci' @@ -97,7 +100,8 @@ CreateSchemaStatement(is_if_not_exists) [0-75] [CREATE SCHEMA...options()] Identifier(foo) [29-32] [foo] Identifier(bar) [33-36] [bar] Collate [45-65] [COLLATE 'unicode:ci'] - StringLiteral('unicode:ci') [53-65] ['unicode:ci'] + StringLiteral [53-65] ['unicode:ci'] + StringLiteralComponent('unicode:ci') [53-65] ['unicode:ci'] OptionsList [73-75] [()] -- CREATE SCHEMA IF NOT EXISTS foo.bar DEFAULT COLLATE 'unicode:ci' @@ -110,7 +114,8 @@ CreateSchemaStatement(is_or_replace) [0-72] [CREATE or...options()] Identifier(foo) [26-29] [foo] Identifier(bar) [30-33] [bar] Collate [42-62] [COLLATE 'unicode:ci'] - StringLiteral('unicode:ci') [50-62] ['unicode:ci'] + StringLiteral [50-62] ['unicode:ci'] + StringLiteralComponent('unicode:ci') [50-62] ['unicode:ci'] OptionsList [70-72] [()] -- CREATE OR REPLACE SCHEMA foo.bar DEFAULT COLLATE 'unicode:ci' @@ -123,7 +128,8 @@ CreateSchemaStatement(is_or_replace, is_if_not_exists) [0-85] [CREATE or...optio Identifier(foo) [39-42] [foo] Identifier(bar) [43-46] [bar] Collate [55-75] [COLLATE 'unicode:ci'] - StringLiteral('unicode:ci') [63-75] ['unicode:ci'] + StringLiteral [63-75] ['unicode:ci'] + StringLiteralComponent('unicode:ci') [63-75] ['unicode:ci'] OptionsList [83-85] [()] -- CREATE OR REPLACE SCHEMA IF NOT EXISTS foo.bar DEFAULT COLLATE 'unicode:ci' @@ -137,14 +143,17 @@ CreateSchemaStatement [0-67] [CREATE SCHEMA...'a',y='b')] PathExpression [14-17] [foo] Identifier(foo) [14-17] [foo] Collate [26-46] [COLLATE 'unicode:ci'] - StringLiteral('unicode:ci') [34-46] ['unicode:ci'] + StringLiteral [34-46] ['unicode:ci'] + StringLiteralComponent('unicode:ci') [34-46] ['unicode:ci'] OptionsList [54-67] [(x='a',y='b')] OptionsEntry [55-60] [x='a'] Identifier(x) [55-56] [x] - StringLiteral('a') [57-60] ['a'] + StringLiteral [57-60] ['a'] + StringLiteralComponent('a') [57-60] ['a'] OptionsEntry [61-66] [y='b'] Identifier(y) [61-62] [y] - StringLiteral('b') [63-66] ['b'] + StringLiteral 
[63-66] ['b'] + StringLiteralComponent('b') [63-66] ['b'] -- CREATE SCHEMA foo DEFAULT COLLATE 'unicode:ci' OPTIONS(x = 'a', y = 'b') @@ -177,10 +186,12 @@ CreateSchemaStatement [0-57] [CREATE SCHEMA...'a',y='b')] OptionsList [44-57] [(x='a',y='b')] OptionsEntry [45-50] [x='a'] Identifier(x) [45-46] [x] - StringLiteral('a') [47-50] ['a'] + StringLiteral [47-50] ['a'] + StringLiteralComponent('a') [47-50] ['a'] OptionsEntry [51-56] [y='b'] Identifier(y) [51-52] [y] - StringLiteral('b') [53-56] ['b'] + StringLiteral [53-56] ['b'] + StringLiteralComponent('b') [53-56] ['b'] -- CREATE SCHEMA foo DEFAULT COLLATE @a OPTIONS(x = 'a', y = 'b') @@ -197,10 +208,12 @@ CreateSchemaStatement [0-58] [CREATE SCHEMA...'a',y='b')] OptionsList [45-58] [(x='a',y='b')] OptionsEntry [46-51] [x='a'] Identifier(x) [46-47] [x] - StringLiteral('a') [48-51] ['a'] + StringLiteral [48-51] ['a'] + StringLiteralComponent('a') [48-51] ['a'] OptionsEntry [52-57] [y='b'] Identifier(y) [52-53] [y] - StringLiteral('b') [54-57] ['b'] + StringLiteral [54-57] ['b'] + StringLiteralComponent('b') [54-57] ['b'] -- CREATE SCHEMA foo DEFAULT COLLATE @@a OPTIONS(x = 'a', y = 'b') @@ -216,10 +229,12 @@ CreateSchemaStatement [0-56] [CREATE SCHEMA...'a',y='b')] OptionsList [43-56] [(x='a',y='b')] OptionsEntry [44-49] [x='a'] Identifier(x) [44-45] [x] - StringLiteral('a') [46-49] ['a'] + StringLiteral [46-49] ['a'] + StringLiteralComponent('a') [46-49] ['a'] OptionsEntry [50-55] [y='b'] Identifier(y) [50-51] [y] - StringLiteral('b') [52-55] ['b'] + StringLiteral [52-55] ['b'] + StringLiteralComponent('b') [52-55] ['b'] -- CREATE SCHEMA foo DEFAULT COLLATE ? 
OPTIONS(x = 'a', y = 'b') @@ -237,4 +252,3 @@ CREATE SCHEMA foo OPTIONS(a="b", c="d") DEFAULT COLLATE 'unicode:ci'; ERROR: Syntax error: Expected end of input but got keyword DEFAULT [at 1:41] CREATE SCHEMA foo OPTIONS(a="b", c="d") DEFAULT COLLATE 'unicode:ci'; ^ -== diff --git a/zetasql/parser/testdata/create_sql_function.test b/zetasql/parser/testdata/create_sql_function.test index c1fa6112f..37561657a 100644 --- a/zetasql/parser/testdata/create_sql_function.test +++ b/zetasql/parser/testdata/create_sql_function.test @@ -439,7 +439,8 @@ CreateFunctionStatement [0-114] [create function...ons', 99) )] PathExpression [89-94] [int32] Identifier(int32) [89-94] [int32] StructConstructorArg [96-106] ['balloons'] - StringLiteral('balloons') [96-106] ['balloons'] + StringLiteral [96-106] ['balloons'] + StringLiteralComponent('balloons') [96-106] ['balloons'] StructConstructorArg [108-110] [99] IntLiteral(99) [108-110] [99] -- @@ -472,7 +473,8 @@ CreateFunctionStatement [0-87] [create function...', true) )] Identifier(boolean) [56-63] [boolean] SqlFunctionBody [70-87] [( ('abc', true) )] StructConstructorWithParens [72-85] [('abc', true)] - StringLiteral('abc') [73-78] ['abc'] + StringLiteral [73-78] ['abc'] + StringLiteralComponent('abc') [73-78] ['abc'] BooleanLiteral(true) [80-84] [true] -- CREATE FUNCTION myfunc() @@ -495,7 +497,8 @@ CreateFunctionStatement [0-66] [create function...llo world' )] PathExpression [37-43] [string] Identifier(string) [37-43] [string] SqlFunctionBody [49-66] [( 'hello world' )] - StringLiteral('hello world') [51-64] ['hello world'] + StringLiteral [51-64] ['hello world'] + StringLiteralComponent('hello world') [51-64] ['hello world'] -- CREATE FUNCTION myfunc() RETURNS string AS ( @@ -522,7 +525,8 @@ CreateFunctionStatement [0-73] [create function...llo world' )] PathExpression [44-50] [string] Identifier(string) [44-50] [string] SqlFunctionBody [56-73] [( 'hello world' )] - StringLiteral('hello world') [58-71] ['hello world'] + 
StringLiteral [58-71] ['hello world'] + StringLiteralComponent('hello world') [58-71] ['hello world'] -- CREATE FUNCTION myfunc(string) RETURNS string AS ( @@ -563,7 +567,8 @@ CreateFunctionStatement [0-99] [create function...llo world' )] PathExpression [70-76] [string] Identifier(string) [70-76] [string] SqlFunctionBody [82-99] [( 'hello world' )] - StringLiteral('hello world') [84-97] ['hello world'] + StringLiteral [84-97] ['hello world'] + StringLiteralComponent('hello world') [84-97] ['hello world'] -- CREATE FUNCTION myfunc(string, s string, int32, i int32) RETURNS string AS ( diff --git a/zetasql/parser/testdata/create_table_as_select.test b/zetasql/parser/testdata/create_table_as_select.test index dc4f51e91..666b18a3b 100644 --- a/zetasql/parser/testdata/create_table_as_select.test +++ b/zetasql/parser/testdata/create_table_as_select.test @@ -340,7 +340,8 @@ CreateTableStatement(is_temp) [0-104] [create temporary...as select 2] IntLiteral(5) [39-40] [5] OptionsEntry [42-51] [y = 'abc'] Identifier(y) [42-43] [y] - StringLiteral('abc') [46-51] ['abc'] + StringLiteral [46-51] ['abc'] + StringLiteralComponent('abc') [46-51] ['abc'] OptionsEntry [53-63] [z = @param] Identifier(z) [53-54] [z] ParameterExpr [57-63] [@param] @@ -443,7 +444,8 @@ CreateTableStatement [0-54] [create table...as select 2] OptionsList [24-42] [(y='b.c', z=`b.c`)] OptionsEntry [25-32] [y='b.c'] Identifier(y) [25-26] [y] - StringLiteral('b.c') [27-32] ['b.c'] + StringLiteral [27-32] ['b.c'] + StringLiteralComponent('b.c') [27-32] ['b.c'] OptionsEntry [34-41] [z=`b.c`] Identifier(z) [34-35] [z] PathExpression [36-41] [`b.c`] @@ -697,7 +699,8 @@ CreateTableStatement [0-57] [create table...a, 'hi' b] Alias [48-49] [a] Identifier(a) [48-49] [a] SelectColumn [51-57] ['hi' b] - StringLiteral('hi') [51-55] ['hi'] + StringLiteral [51-55] ['hi'] + StringLiteralComponent('hi') [51-55] ['hi'] Alias [56-57] [b] Identifier(b) [56-57] [b] -- @@ -874,7 +877,8 @@ CreateTableStatement [0-72] [create 
table...a, 'hi' b] Alias [63-64] [a] Identifier(a) [63-64] [a] SelectColumn [66-72] ['hi' b] - StringLiteral('hi') [66-70] ['hi'] + StringLiteral [66-70] ['hi'] + StringLiteralComponent('hi') [66-70] ['hi'] Alias [71-72] [b] Identifier(b) [71-72] [b] -- @@ -950,7 +954,8 @@ CreateTableStatement [0-70] [create table...a, 'hi' b] Alias [61-62] [a] Identifier(a) [61-62] [a] SelectColumn [64-70] ['hi' b] - StringLiteral('hi') [64-68] ['hi'] + StringLiteral [64-68] ['hi'] + StringLiteralComponent('hi') [64-68] ['hi'] Alias [69-70] [b] Identifier(b) [69-70] [b] -- @@ -993,7 +998,8 @@ CreateTableStatement [0-107] [create table...a, 'hi' b] OptionsList [72-85] [(key='value')] OptionsEntry [73-84] [key='value'] Identifier(key) [73-76] [key] - StringLiteral('value') [77-84] ['value'] + StringLiteral [77-84] ['value'] + StringLiteralComponent('value') [77-84] ['value'] Query [89-107] [SELECT 1 a, 'hi' b] Select [89-107] [SELECT 1 a, 'hi' b] SelectList [96-107] [1 a, 'hi' b] @@ -1002,7 +1008,8 @@ CreateTableStatement [0-107] [create table...a, 'hi' b] Alias [98-99] [a] Identifier(a) [98-99] [a] SelectColumn [101-107] ['hi' b] - StringLiteral('hi') [101-105] ['hi'] + StringLiteral [101-105] ['hi'] + StringLiteralComponent('hi') [101-105] ['hi'] Alias [106-107] [b] Identifier(b) [106-107] [b] -- @@ -1095,7 +1102,8 @@ CreateTableStatement [0-60] [create table..., 'foo' b,] Alias [49-50] [a] Identifier(a) [49-50] [a] SelectColumn [52-59] ['foo' b] - StringLiteral('foo') [52-57] ['foo'] + StringLiteral [52-57] ['foo'] + StringLiteralComponent('foo') [52-57] ['foo'] Alias [58-59] [b] Identifier(b) [58-59] [b] -- @@ -1140,7 +1148,8 @@ CreateTableStatement [0-89] [create table...a, 'foo' b] Alias [79-80] [a] Identifier(a) [79-80] [a] SelectColumn [82-89] ['foo' b] - StringLiteral('foo') [82-87] ['foo'] + StringLiteral [82-87] ['foo'] + StringLiteralComponent('foo') [82-87] ['foo'] Alias [88-89] [b] Identifier(b) [88-89] [b] -- @@ -1204,7 +1213,8 @@ CreateTableStatement [0-110] [create 
table...a, 'foo' b] Alias [100-101] [a] Identifier(a) [100-101] [a] SelectColumn [103-110] ['foo' b] - StringLiteral('foo') [103-108] ['foo'] + StringLiteral [103-108] ['foo'] + StringLiteralComponent('foo') [103-108] ['foo'] Alias [109-110] [b] Identifier(b) [109-110] [b] -- @@ -1260,7 +1270,8 @@ CreateTableStatement [0-113] [create table...a, 'foo' b] Alias [103-104] [a] Identifier(a) [103-104] [a] SelectColumn [106-113] ['foo' b] - StringLiteral('foo') [106-111] ['foo'] + StringLiteral [106-111] ['foo'] + StringLiteralComponent('foo') [106-111] ['foo'] Alias [112-113] [b] Identifier(b) [112-113] [b] -- @@ -1317,7 +1328,8 @@ CreateTableStatement [0-123] [create table...a, 'foo' b] Alias [113-114] [a] Identifier(a) [113-114] [a] SelectColumn [116-123] ['foo' b] - StringLiteral('foo') [116-121] ['foo'] + StringLiteral [116-121] ['foo'] + StringLiteralComponent('foo') [116-121] ['foo'] Alias [122-123] [b] Identifier(b) [122-123] [b] -- @@ -1381,7 +1393,8 @@ CreateTableStatement [0-111] [create table...a, 'foo' b] Alias [101-102] [a] Identifier(a) [101-102] [a] SelectColumn [104-111] ['foo' b] - StringLiteral('foo') [104-109] ['foo'] + StringLiteral [104-109] ['foo'] + StringLiteralComponent('foo') [104-109] ['foo'] Alias [110-111] [b] Identifier(b) [110-111] [b] -- diff --git a/zetasql/parser/testdata/create_view.test b/zetasql/parser/testdata/create_view.test index 881ca81f6..accbc8e80 100644 --- a/zetasql/parser/testdata/create_view.test +++ b/zetasql/parser/testdata/create_view.test @@ -332,7 +332,8 @@ CreateViewStatement(is_temp) [0-104] [create temporary...as select 2] IntLiteral(5) [39-40] [5] OptionsEntry [42-51] [y = 'abc'] Identifier(y) [42-43] [y] - StringLiteral('abc') [46-51] ['abc'] + StringLiteral [46-51] ['abc'] + StringLiteralComponent('abc') [46-51] ['abc'] OptionsEntry [53-63] [z = @param] Identifier(z) [53-54] [z] ParameterExpr [57-63] [@param] @@ -447,7 +448,8 @@ CreateViewStatement [0-53] [create view...as select 2] OptionsList [23-41] 
[(y='b.c', z=`b.c`)] OptionsEntry [24-31] [y='b.c'] Identifier(y) [24-25] [y] - StringLiteral('b.c') [26-31] ['b.c'] + StringLiteral [26-31] ['b.c'] + StringLiteralComponent('b.c') [26-31] ['b.c'] OptionsEntry [33-40] [z=`b.c`] Identifier(z) [33-34] [z] PathExpression [35-40] [`b.c`] @@ -900,7 +902,8 @@ CreateViewStatement [0-75] [create view...select 1 a, 2 b] OptionsList [26-51] [(description="test view")] OptionsEntry [27-50] [description="test view"] Identifier(description) [27-38] [description] - StringLiteral("test view") [39-50] ["test view"] + StringLiteral [39-50] ["test view"] + StringLiteralComponent("test view") [39-50] ["test view"] ColumnWithOptions [53-55] [b2] Identifier(b2) [53-55] [b2] Query [60-75] [select 1 a, 2 b] @@ -931,7 +934,8 @@ CreateMaterializedViewStatement [0-87] [create materializ...t 1 a, 2 b] OptionsList [38-63] [(description="test view")] OptionsEntry [39-62] [description="test view"] Identifier(description) [39-50] [description] - StringLiteral("test view") [51-62] ["test view"] + StringLiteral [51-62] ["test view"] + StringLiteralComponent("test view") [51-62] ["test view"] ColumnWithOptions [65-67] [b2] Identifier(b2) [65-67] [b2] Query [72-87] [select 1 a, 2 b] @@ -962,7 +966,8 @@ CreateApproxViewStatement [0-81] [create approx...elect 1 a, 2 b] OptionsList [32-57] [(description="test view")] OptionsEntry [33-56] [description="test view"] Identifier(description) [33-44] [description] - StringLiteral("test view") [45-56] ["test view"] + StringLiteral [45-56] ["test view"] + StringLiteralComponent("test view") [45-56] ["test view"] ColumnWithOptions [59-61] [b2] Identifier(b2) [59-61] [b2] Query [66-81] [select 1 a, 2 b] @@ -998,7 +1003,8 @@ CreateViewStatement [0-85] [create view...select 1 a, 2 b] OptionsList [26-51] [(description="test view")] OptionsEntry [27-50] [description="test view"] Identifier(description) [27-38] [description] - StringLiteral("test view") [39-50] ["test view"] + StringLiteral [39-50] ["test view"] + 
StringLiteralComponent("test view") [39-50] ["test view"] ColumnWithOptions [53-65] [b2 OPTIONS()] Identifier(b2) [53-55] [b2] OptionsList [63-65] [()] @@ -1030,7 +1036,8 @@ CreateMaterializedViewStatement [0-97] [create materializ...t 1 a, 2 b] OptionsList [38-63] [(description="test view")] OptionsEntry [39-62] [description="test view"] Identifier(description) [39-50] [description] - StringLiteral("test view") [51-62] ["test view"] + StringLiteral [51-62] ["test view"] + StringLiteralComponent("test view") [51-62] ["test view"] ColumnWithOptions [65-77] [b2 OPTIONS()] Identifier(b2) [65-67] [b2] OptionsList [75-77] [()] @@ -1062,7 +1069,8 @@ CreateApproxViewStatement [0-91] [create approx...elect 1 a, 2 b] OptionsList [32-57] [(description="test view")] OptionsEntry [33-56] [description="test view"] Identifier(description) [33-44] [description] - StringLiteral("test view") [45-56] ["test view"] + StringLiteral [45-56] ["test view"] + StringLiteralComponent("test view") [45-56] ["test view"] ColumnWithOptions [59-71] [b2 OPTIONS()] Identifier(b2) [59-61] [b2] OptionsList [69-71] [()] diff --git a/zetasql/parser/testdata/define_table.test b/zetasql/parser/testdata/define_table.test index ac365ce91..a43d68dd4 100644 --- a/zetasql/parser/testdata/define_table.test +++ b/zetasql/parser/testdata/define_table.test @@ -9,7 +9,8 @@ DefineTableStatement [0-40] [define table....4,d=true)] IntLiteral(1) [19-20] [1] OptionsEntry [21-26] [b="a"] Identifier(b) [21-22] [b] - StringLiteral("a") [23-26] ["a"] + StringLiteral [23-26] ["a"] + StringLiteralComponent("a") [23-26] ["a"] OptionsEntry [27-32] [c=1.4] Identifier(c) [27-28] [c] FloatLiteral(1.4) [29-32] [1.4] @@ -31,11 +32,13 @@ DefineTableStatement [0-75] [define table...=@@sysvar)] OptionsList [25-75] [(x=''' foo...=@@sysvar)] OptionsEntry [26-38] [x=''' foo'''] Identifier(x) [26-27] [x] - StringLiteral(''' + StringLiteral [28-38] [''' foo'''] + StringLiteralComponent(''' foo''') [28-38] [''' foo'''] OptionsEntry [39-53] 
[y="2011-10-22"] Identifier(y) [39-40] [y] - StringLiteral("2011-10-22") [41-53] ["2011-10-22"] + StringLiteral [41-53] ["2011-10-22"] + StringLiteralComponent("2011-10-22") [41-53] ["2011-10-22"] OptionsEntry [54-62] [z=@param] Identifier(z) [54-55] [z] ParameterExpr [56-62] [@param] @@ -172,10 +175,12 @@ DefineTableStatement [0-48] [define table...hash="HASH")] OptionsList [16-48] [(PROTO="PROTO...hash="HASH")] OptionsEntry [17-35] [PROTO="PROTO"] Identifier(`PROTO`) [17-22] [PROTO] - StringLiteral("PROTO") [23-35] ["PROTO"] + StringLiteral [23-35] ["PROTO"] + StringLiteralComponent("PROTO") [23-35] ["PROTO"] OptionsEntry [36-47] [hash="HASH"] Identifier(`hash`) [36-40] [hash] - StringLiteral("HASH") [41-47] ["HASH"] + StringLiteral [41-47] ["HASH"] + StringLiteralComponent("HASH") [41-47] ["HASH"] -- DEFINE TABLE t1(`PROTO` = "PROTO", `hash` = "HASH") == diff --git a/zetasql/parser/testdata/dml_insert.test b/zetasql/parser/testdata/dml_insert.test index e90412995..ff1db32e2 100644 --- a/zetasql/parser/testdata/dml_insert.test +++ b/zetasql/parser/testdata/dml_insert.test @@ -478,7 +478,8 @@ InsertStatement(insert_mode=IGNORE) [0-79] [insert or...rt_rows_modified 5] IntLiteral(5) [43-44] [5] IntLiteral(6) [45-46] [6] InsertValuesRow [49-56] [('abc')] - StringLiteral('abc') [50-55] ['abc'] + StringLiteral [50-55] ['abc'] + StringLiteralComponent('abc') [50-55] ['abc'] AssertRowsModified [57-79] [assert_rows_modified 5] IntLiteral(5) [78-79] [5] -- @@ -499,7 +500,8 @@ InsertStatement(insert_mode=IGNORE) [0-71] [insert or...rt_rows_modified 5] IntLiteral(5) [35-36] [5] IntLiteral(6) [37-38] [6] InsertValuesRow [41-48] [('abc')] - StringLiteral('abc') [42-47] ['abc'] + StringLiteral [42-47] ['abc'] + StringLiteralComponent('abc') [42-47] ['abc'] AssertRowsModified [49-71] [assert_rows_modified 5] IntLiteral(5) [70-71] [5] -- @@ -522,7 +524,8 @@ InsertStatement(insert_mode=IGNORE) [0-82] [insert or...rt_rows_modified 5] IntLiteral(5) [46-47] [5] IntLiteral(6) [48-49] 
[6] InsertValuesRow [52-59] [('abc')] - StringLiteral('abc') [53-58] ['abc'] + StringLiteral [53-58] ['abc'] + StringLiteralComponent('abc') [53-58] ['abc'] AssertRowsModified [60-82] [assert_rows_modified 5] IntLiteral(5) [81-82] [5] -- @@ -542,7 +545,8 @@ InsertStatement(insert_mode=IGNORE) [0-74] [insert or...rt_rows_modified 5] IntLiteral(5) [38-39] [5] IntLiteral(6) [40-41] [6] InsertValuesRow [44-51] [('abc')] - StringLiteral('abc') [45-50] ['abc'] + StringLiteral [45-50] ['abc'] + StringLiteralComponent('abc') [45-50] ['abc'] AssertRowsModified [52-74] [assert_rows_modified 5] IntLiteral(5) [73-74] [5] -- @@ -566,7 +570,8 @@ InsertStatement(insert_mode=IGNORE) [0-75] [insert or...rt_rows_modified 5] IntLiteral(5) [39-40] [5] IntLiteral(6) [41-42] [6] InsertValuesRow [45-52] [('abc')] - StringLiteral('abc') [46-51] ['abc'] + StringLiteral [46-51] ['abc'] + StringLiteralComponent('abc') [46-51] ['abc'] AssertRowsModified [53-75] [assert_rows_modified 5] IntLiteral(5) [74-75] [5] -- @@ -587,7 +592,8 @@ InsertStatement(insert_mode=IGNORE) [0-67] [insert or...rt_rows_modified 5] IntLiteral(5) [31-32] [5] IntLiteral(6) [33-34] [6] InsertValuesRow [37-44] [('abc')] - StringLiteral('abc') [38-43] ['abc'] + StringLiteral [38-43] ['abc'] + StringLiteralComponent('abc') [38-43] ['abc'] AssertRowsModified [45-67] [assert_rows_modified 5] IntLiteral(5) [66-67] [5] -- @@ -610,7 +616,8 @@ InsertStatement(insert_mode=IGNORE) [0-78] [insert or...rt_rows_modified 5] IntLiteral(5) [42-43] [5] IntLiteral(6) [44-45] [6] InsertValuesRow [48-55] [('abc')] - StringLiteral('abc') [49-54] ['abc'] + StringLiteral [49-54] ['abc'] + StringLiteralComponent('abc') [49-54] ['abc'] AssertRowsModified [56-78] [assert_rows_modified 5] IntLiteral(5) [77-78] [5] -- @@ -630,7 +637,8 @@ InsertStatement(insert_mode=IGNORE) [0-70] [insert or...rt_rows_modified 5] IntLiteral(5) [34-35] [5] IntLiteral(6) [36-37] [6] InsertValuesRow [40-47] [('abc')] - StringLiteral('abc') [41-46] ['abc'] + 
StringLiteral [41-46] ['abc'] + StringLiteralComponent('abc') [41-46] ['abc'] AssertRowsModified [48-70] [assert_rows_modified 5] IntLiteral(5) [69-70] [5] -- @@ -654,7 +662,8 @@ InsertStatement(insert_mode=IGNORE) [0-77] [insert ignore...ws_modified 5] IntLiteral(5) [41-42] [5] IntLiteral(6) [43-44] [6] InsertValuesRow [47-54] [('abc')] - StringLiteral('abc') [48-53] ['abc'] + StringLiteral [48-53] ['abc'] + StringLiteralComponent('abc') [48-53] ['abc'] AssertRowsModified [55-77] [assert_rows_modified 5] IntLiteral(5) [76-77] [5] -- @@ -675,7 +684,8 @@ InsertStatement(insert_mode=IGNORE) [0-69] [insert ignore...ws_modified 5] IntLiteral(5) [33-34] [5] IntLiteral(6) [35-36] [6] InsertValuesRow [39-46] [('abc')] - StringLiteral('abc') [40-45] ['abc'] + StringLiteral [40-45] ['abc'] + StringLiteralComponent('abc') [40-45] ['abc'] AssertRowsModified [47-69] [assert_rows_modified 5] IntLiteral(5) [68-69] [5] -- @@ -698,7 +708,8 @@ InsertStatement(insert_mode=IGNORE) [0-80] [insert ignore...ws_modified 5] IntLiteral(5) [44-45] [5] IntLiteral(6) [46-47] [6] InsertValuesRow [50-57] [('abc')] - StringLiteral('abc') [51-56] ['abc'] + StringLiteral [51-56] ['abc'] + StringLiteralComponent('abc') [51-56] ['abc'] AssertRowsModified [58-80] [assert_rows_modified 5] IntLiteral(5) [79-80] [5] -- @@ -718,7 +729,8 @@ InsertStatement(insert_mode=IGNORE) [0-72] [insert ignore...ws_modified 5] IntLiteral(5) [36-37] [5] IntLiteral(6) [38-39] [6] InsertValuesRow [42-49] [('abc')] - StringLiteral('abc') [43-48] ['abc'] + StringLiteral [43-48] ['abc'] + StringLiteralComponent('abc') [43-48] ['abc'] AssertRowsModified [50-72] [assert_rows_modified 5] IntLiteral(5) [71-72] [5] -- @@ -742,7 +754,8 @@ InsertStatement(insert_mode=IGNORE) [0-73] [insert ignore...ws_modified 5] IntLiteral(5) [37-38] [5] IntLiteral(6) [39-40] [6] InsertValuesRow [43-50] [('abc')] - StringLiteral('abc') [44-49] ['abc'] + StringLiteral [44-49] ['abc'] + StringLiteralComponent('abc') [44-49] ['abc'] 
AssertRowsModified [51-73] [assert_rows_modified 5] IntLiteral(5) [72-73] [5] -- @@ -763,7 +776,8 @@ InsertStatement(insert_mode=IGNORE) [0-65] [insert ignore...ws_modified 5] IntLiteral(5) [29-30] [5] IntLiteral(6) [31-32] [6] InsertValuesRow [35-42] [('abc')] - StringLiteral('abc') [36-41] ['abc'] + StringLiteral [36-41] ['abc'] + StringLiteralComponent('abc') [36-41] ['abc'] AssertRowsModified [43-65] [assert_rows_modified 5] IntLiteral(5) [64-65] [5] -- @@ -786,7 +800,8 @@ InsertStatement(insert_mode=IGNORE) [0-76] [insert ignore...ws_modified 5] IntLiteral(5) [40-41] [5] IntLiteral(6) [42-43] [6] InsertValuesRow [46-53] [('abc')] - StringLiteral('abc') [47-52] ['abc'] + StringLiteral [47-52] ['abc'] + StringLiteralComponent('abc') [47-52] ['abc'] AssertRowsModified [54-76] [assert_rows_modified 5] IntLiteral(5) [75-76] [5] -- @@ -806,7 +821,8 @@ InsertStatement(insert_mode=IGNORE) [0-68] [insert ignore...ws_modified 5] IntLiteral(5) [32-33] [5] IntLiteral(6) [34-35] [6] InsertValuesRow [38-45] [('abc')] - StringLiteral('abc') [39-44] ['abc'] + StringLiteral [39-44] ['abc'] + StringLiteralComponent('abc') [39-44] ['abc'] AssertRowsModified [46-68] [assert_rows_modified 5] IntLiteral(5) [67-68] [5] -- diff --git a/zetasql/parser/testdata/drop.test b/zetasql/parser/testdata/drop.test index 0b09607b6..3bbbbb090 100644 --- a/zetasql/parser/testdata/drop.test +++ b/zetasql/parser/testdata/drop.test @@ -199,7 +199,7 @@ drop SNAPSHOT TABLE # The following object types are syntactically valid (no syntax errors), # although AGGREGATE FUNCTION and TABLE FUNCTION are not currently supported. # Other object types are not valid.
-drop {{AGGREGATE FUNCTION|CONSTANT|DATABASE|EXTERNAL TABLE|FUNCTION|INDEX|MATERIALIZED VIEW|MODEL|PROCEDURE|TABLE|TABLE FUNCTION|VIEW|EXTERNAL TABLE FUNCTION|SCHEMA|SNAPSHOT TABLE}} foo {{|RESTRICT|Restrict|CASCADE|INVALID_DROP_MODE|DROP_MODE_UNSPECIFIED}}; +drop {{AGGREGATE FUNCTION|CONSTANT|DATABASE|EXTERNAL TABLE|FUNCTION|INDEX|MATERIALIZED VIEW|MODEL|PROCEDURE|TABLE|TABLE FUNCTION|VIEW|EXTERNAL TABLE FUNCTION|SCHEMA|EXTERNAL SCHEMA|SNAPSHOT TABLE}} foo {{|RESTRICT|Restrict|CASCADE|INVALID_DROP_MODE|DROP_MODE_UNSPECIFIED}}; -- ALTERNATION GROUP: AGGREGATE FUNCTION, -- @@ -735,6 +735,44 @@ ERROR: Syntax error: Expected end of input but got identifier "DROP_MODE_UNSPECI drop SCHEMA foo DROP_MODE_UNSPECIFIED; ^ -- +ALTERNATION GROUP: EXTERNAL SCHEMA, +-- +DropStatement EXTERNAL SCHEMA [0-24] [drop EXTERNAL SCHEMA foo] + PathExpression [21-24] [foo] + Identifier(foo) [21-24] [foo] +-- +DROP EXTERNAL SCHEMA foo +-- +ALTERNATION GROUP: EXTERNAL SCHEMA,RESTRICT +-- +ERROR: Syntax error: 'RESTRICT' is not supported for DROP EXTERNAL SCHEMA [at 1:26] +drop EXTERNAL SCHEMA foo RESTRICT; + ^ +-- +ALTERNATION GROUP: EXTERNAL SCHEMA,Restrict +-- +ERROR: Syntax error: 'RESTRICT' is not supported for DROP EXTERNAL SCHEMA [at 1:26] +drop EXTERNAL SCHEMA foo Restrict; + ^ +-- +ALTERNATION GROUP: EXTERNAL SCHEMA,CASCADE +-- +ERROR: Syntax error: 'CASCADE' is not supported for DROP EXTERNAL SCHEMA [at 1:26] +drop EXTERNAL SCHEMA foo CASCADE; + ^ +-- +ALTERNATION GROUP: EXTERNAL SCHEMA,INVALID_DROP_MODE +-- +ERROR: Syntax error: Expected end of input but got identifier "INVALID_DROP_MODE" [at 1:26] +drop EXTERNAL SCHEMA foo INVALID_DROP_MODE; + ^ +-- +ALTERNATION GROUP: EXTERNAL SCHEMA,DROP_MODE_UNSPECIFIED +-- +ERROR: Syntax error: Expected end of input but got identifier "DROP_MODE_UNSPECIFIED" [at 1:26] +drop EXTERNAL SCHEMA foo DROP_MODE_UNSPECIFIED; + ^ +-- ALTERNATION GROUP: SNAPSHOT TABLE, -- DropSnapshotTableStatement [0-23] [drop SNAPSHOT TABLE foo] diff --git 
a/zetasql/parser/testdata/exception_handler.test b/zetasql/parser/testdata/exception_handler.test index 2a7ea3064..4862dabfc 100644 --- a/zetasql/parser/testdata/exception_handler.test +++ b/zetasql/parser/testdata/exception_handler.test @@ -25,10 +25,10 @@ BEGIN EXCEPTION WHEN ERROR THEN END; -- -Script [0-49] [BEGIN SELECT...THEN END;] - StatementList [0-49] [BEGIN SELECT...THEN END;] +Script [0-48] [BEGIN SELECT...THEN END;] + StatementList [0-48] [BEGIN SELECT...THEN END;] BeginEndBlock [0-47] [BEGIN SELECT...RROR THEN END] - StatementList [8-18] [SELECT 1;] + StatementList [8-17] [SELECT 1;] QueryStatement [8-16] [SELECT 1] Query [8-16] [SELECT 1] Select [8-16] [SELECT 1] @@ -52,13 +52,13 @@ END # Empty block, non-empty handler BEGIN EXCEPTION WHEN ERROR THEN SELECT 1; END; -- -Script [0-47] [BEGIN EXCEPTION...LECT 1; END;] - StatementList [0-47] [BEGIN EXCEPTION...LECT 1; END;] +Script [0-46] [BEGIN EXCEPTION...LECT 1; END;] + StatementList [0-46] [BEGIN EXCEPTION...LECT 1; END;] BeginEndBlock [0-45] [BEGIN EXCEPTION...ELECT 1; END] StatementList [5-5] [] - ExceptionHandlerList [6-42] [EXCEPTION...SELECT 1;] - ExceptionHandler [16-42] [WHEN ERROR THEN SELECT 1;] - StatementList [32-42] [SELECT 1;] + ExceptionHandlerList [6-41] [EXCEPTION...SELECT 1;] + ExceptionHandler [16-41] [WHEN ERROR THEN SELECT 1;] + StatementList [32-41] [SELECT 1;] QueryStatement [32-40] [SELECT 1] Query [32-40] [SELECT 1] Select [32-40] [SELECT 1] @@ -85,10 +85,10 @@ EXCEPTION WHEN ERROR THEN SELECT 4; END; -- -Script [0-85] [BEGIN SELECT...ELECT 4; END;] - StatementList [0-85] [BEGIN SELECT...ELECT 4; END;] +Script [0-84] [BEGIN SELECT...ELECT 4; END;] + StatementList [0-84] [BEGIN SELECT...ELECT 4; END;] BeginEndBlock [0-83] [BEGIN SELECT...SELECT 4; END] - StatementList [8-30] [SELECT 1; SELECT 2;] + StatementList [8-29] [SELECT 1; SELECT 2;] QueryStatement [8-16] [SELECT 1] Query [8-16] [SELECT 1] Select [8-16] [SELECT 1] @@ -101,9 +101,9 @@ Script [0-85] [BEGIN SELECT...ELECT 4; 
END;] SelectList [27-28] [2] SelectColumn [27-28] [2] IntLiteral(2) [27-28] [2] - ExceptionHandlerList [30-80] [EXCEPTION...SELECT 4;] - ExceptionHandler [40-80] [WHEN ERROR...SELECT 4;] - StatementList [58-80] [SELECT 3; SELECT 4;] + ExceptionHandlerList [30-79] [EXCEPTION...SELECT 4;] + ExceptionHandler [40-79] [WHEN ERROR...SELECT 4;] + StatementList [58-79] [SELECT 3; SELECT 4;] QueryStatement [58-66] [SELECT 3] Query [58-66] [SELECT 3] Select [58-66] [SELECT 3] @@ -146,10 +146,10 @@ EXCEPTION WHEN ERROR THEN SELECT 4; END; -- -Script [0-101] [BEGIN SELECT...ELECT 4; END;] - StatementList [0-101] [BEGIN SELECT...ELECT 4; END;] +Script [0-100] [BEGIN SELECT...ELECT 4; END;] + StatementList [0-100] [BEGIN SELECT...ELECT 4; END;] BeginEndBlock [0-99] [BEGIN SELECT...SELECT 4; END] - StatementList [8-46] [SELECT EXCEPTION...OM Foo.Bar;] + StatementList [8-45] [SELECT EXCEPTION...OM Foo.Bar;] QueryStatement [8-44] [SELECT EXCEPTION...ROM Foo.Bar] Query [8-44] [SELECT EXCEPTION...ROM Foo.Bar] Select [8-44] [SELECT EXCEPTION...ROM Foo.Bar] @@ -165,9 +165,9 @@ Script [0-101] [BEGIN SELECT...ELECT 4; END;] PathExpression [37-44] [Foo.Bar] Identifier(Foo) [37-40] [Foo] Identifier(Bar) [41-44] [Bar] - ExceptionHandlerList [46-96] [EXCEPTION...SELECT 4;] - ExceptionHandler [56-96] [WHEN ERROR...SELECT 4;] - StatementList [74-96] [SELECT 3; SELECT 4;] + ExceptionHandlerList [46-95] [EXCEPTION...SELECT 4;] + ExceptionHandler [56-95] [WHEN ERROR...SELECT 4;] + StatementList [74-95] [SELECT 3; SELECT 4;] QueryStatement [74-82] [SELECT 3] Query [74-82] [SELECT 3] Select [74-82] [SELECT 3] @@ -211,21 +211,21 @@ EXCEPTION WHEN ERROR THEN END; END; -- -Script [0-122] [BEGIN SELECT...END; END;] - StatementList [0-122] [BEGIN SELECT...END; END;] +Script [0-121] [BEGIN SELECT...END; END;] + StatementList [0-121] [BEGIN SELECT...END; END;] BeginEndBlock [0-120] [BEGIN SELECT...END; END] - StatementList [8-18] [SELECT 1;] + StatementList [8-17] [SELECT 1;] QueryStatement [8-16] [SELECT 
1] Query [8-16] [SELECT 1] Select [8-16] [SELECT 1] SelectList [15-16] [1] SelectColumn [15-16] [1] IntLiteral(1) [15-16] [1] - ExceptionHandlerList [18-117] [EXCEPTION...2; END;] - ExceptionHandler [28-117] [WHEN ERROR...2; END;] - StatementList [46-117] [BEGIN...2; END;] + ExceptionHandlerList [18-116] [EXCEPTION...2; END;] + ExceptionHandler [28-116] [WHEN ERROR...2; END;] + StatementList [46-116] [BEGIN...2; END;] BeginEndBlock [46-115] [BEGIN...SELECT 2; END] - StatementList [56-70] [SELECT foo;] + StatementList [56-67] [SELECT foo;] QueryStatement [56-66] [SELECT foo] Query [56-66] [SELECT foo] Select [56-66] [SELECT foo] @@ -233,9 +233,9 @@ Script [0-122] [BEGIN SELECT...END; END;] SelectColumn [63-66] [foo] PathExpression [63-66] [foo] Identifier(foo) [63-66] [foo] - ExceptionHandlerList [70-112] [EXCEPTION...SELECT 2;] - ExceptionHandler [80-112] [WHEN ERROR THEN SELECT 2;] - StatementList [100-112] [SELECT 2;] + ExceptionHandlerList [70-109] [EXCEPTION...SELECT 2;] + ExceptionHandler [80-109] [WHEN ERROR THEN SELECT 2;] + StatementList [100-109] [SELECT 2;] QueryStatement [100-108] [SELECT 2] Query [100-108] [SELECT 2] Select [100-108] [SELECT 2] diff --git a/zetasql/parser/testdata/execute_immediate.test b/zetasql/parser/testdata/execute_immediate.test index c84821ae5..44eaaaa75 100644 --- a/zetasql/parser/testdata/execute_immediate.test +++ b/zetasql/parser/testdata/execute_immediate.test @@ -18,7 +18,8 @@ EXECUTE IMMEDIATE {{"select 1"|CONCAT("a", "b")|x|@x|@@x}}; ALTERNATION GROUP: "select 1" -- ExecuteImmediateStatement [0-28] [EXECUTE IMMEDIATE "select 1"] - StringLiteral("select 1") [18-28] ["select 1"] + StringLiteral [18-28] ["select 1"] + StringLiteralComponent("select 1") [18-28] ["select 1"] -- EXECUTE IMMEDIATE "select 1" -- @@ -28,8 +29,10 @@ ExecuteImmediateStatement [0-34] [EXECUTE IMMEDIATE...("a", "b")] FunctionCall [18-34] [CONCAT("a", "b")] PathExpression [18-24] [CONCAT] Identifier(CONCAT) [18-24] [CONCAT] - StringLiteral("a") [25-28] 
["a"] - StringLiteral("b") [30-33] ["b"] + StringLiteral [25-28] ["a"] + StringLiteralComponent("a") [25-28] ["a"] + StringLiteral [30-33] ["b"] + StringLiteralComponent("b") [30-33] ["b"] -- EXECUTE IMMEDIATE CONCAT("a", "b") -- @@ -103,7 +106,8 @@ ALTERNATION GROUP: "?" USING 0 -- ExecuteImmediateStatement [0-29] [EXECUTE IMMEDIATE "?" USING 0] - StringLiteral("?") [18-21] ["?"] + StringLiteral [18-21] ["?"] + StringLiteralComponent("?") [18-21] ["?"] ExecuteUsingClause [28-29] [0] ExecuteUsingArgument [28-29] [0] IntLiteral(0) [28-29] [0] @@ -113,7 +117,8 @@ EXECUTE IMMEDIATE "?" USING 0 ALTERNATION GROUP: "? ?" USING 0, 1 -- ExecuteImmediateStatement [0-34] [EXECUTE IMMEDIATE...USING 0, 1] - StringLiteral("? ?") [18-23] ["? ?"] + StringLiteral [18-23] ["? ?"] + StringLiteralComponent("? ?") [18-23] ["? ?"] ExecuteUsingClause [30-34] [0, 1] ExecuteUsingArgument [30-31] [0] IntLiteral(0) [30-31] [0] @@ -286,7 +291,8 @@ FROM EXECUTE IMMEDIATE "select ?, @x" USING 0, 5 AS x; -- ExecuteImmediateStatement [0-48] [EXECUTE IMMEDIATE...0, 5 AS x] - StringLiteral("select ?, @x") [18-32] ["select ?, @x"] + StringLiteral [18-32] ["select ?, @x"] + StringLiteralComponent("select ?, @x") [18-32] ["select ?, @x"] ExecuteUsingClause [39-48] [0, 5 AS x] ExecuteUsingArgument [39-40] [0] IntLiteral(0) [39-40] [0] diff --git a/zetasql/parser/testdata/export_data.test b/zetasql/parser/testdata/export_data.test index e6f2af45c..5191a6f1e 100644 --- a/zetasql/parser/testdata/export_data.test +++ b/zetasql/parser/testdata/export_data.test @@ -146,7 +146,8 @@ ExportDataStatement [0-45] [export data...(select 1)] Identifier(b) [22-23] [b] OptionsEntry [25-30] [c='d'] Identifier(c) [25-26] [c] - StringLiteral('d') [27-30] ['d'] + StringLiteral [27-30] ['d'] + StringLiteralComponent('d') [27-30] ['d'] Query [36-44] [select 1] Select [36-44] [select 1] SelectList [43-44] [1] diff --git a/zetasql/parser/testdata/export_model.test b/zetasql/parser/testdata/export_model.test index 
dfe8ea5b1..6bcc9d624 100644 --- a/zetasql/parser/testdata/export_model.test +++ b/zetasql/parser/testdata/export_model.test @@ -12,7 +12,8 @@ ExportModelStatement [0-80] [export model...destination")] OptionsList [61-80] [(uri="destination")] OptionsEntry [62-79] [uri="destination"] Identifier(uri) [62-65] [uri] - StringLiteral("destination") [66-79] ["destination"] + StringLiteral [66-79] ["destination"] + StringLiteralComponent("destination") [66-79] ["destination"] -- EXPORT MODEL model.name WITH CONNECTION connection_id OPTIONS(uri = "destination") == @@ -27,7 +28,8 @@ ExportModelStatement [0-52] [export model...destination")] OptionsList [31-52] [(uri = "destination")] OptionsEntry [32-51] [uri = "destination"] Identifier(uri) [32-35] [uri] - StringLiteral("destination") [38-51] ["destination"] + StringLiteral [38-51] ["destination"] + StringLiteralComponent("destination") [38-51] ["destination"] -- EXPORT MODEL model.name OPTIONS(uri = "destination") == diff --git a/zetasql/parser/testdata/field_access.test b/zetasql/parser/testdata/field_access.test index 5840b44a6..ddcb25988 100644 --- a/zetasql/parser/testdata/field_access.test +++ b/zetasql/parser/testdata/field_access.test @@ -3438,9 +3438,11 @@ QueryStatement [0-44] [select 1,2...Customer where 5] SelectColumn [9-10] [2] IntLiteral(2) [9-10] [2] SelectColumn [11-16] ['abc'] - StringLiteral('abc') [11-16] ['abc'] + StringLiteral [11-16] ['abc'] + StringLiteralComponent('abc') [11-16] ['abc'] SelectColumn [17-22] ["def"] - StringLiteral("def") [17-22] ["def"] + StringLiteral [17-22] ["def"] + StringLiteralComponent("def") [17-22] ["def"] FromClause [23-36] [from Customer] TablePathExpression [28-36] [Customer] PathExpression [28-36] [Customer] diff --git a/zetasql/parser/testdata/from.test b/zetasql/parser/testdata/from.test index 7ae23b43a..a1d719859 100644 --- a/zetasql/parser/testdata/from.test +++ b/zetasql/parser/testdata/from.test @@ -1420,7 +1420,8 @@ QueryStatement [0-19] [select 1 a, 'foo' b] 
Alias [9-10] [a] Identifier(a) [9-10] [a] SelectColumn [12-19] ['foo' b] - StringLiteral('foo') [12-17] ['foo'] + StringLiteral [12-17] ['foo'] + StringLiteralComponent('foo') [12-17] ['foo'] Alias [18-19] [b] Identifier(b) [18-19] [b] -- @@ -1439,7 +1440,8 @@ QueryStatement [0-21] [select 1 a, 'foo' b ,] Alias [9-10] [a] Identifier(a) [9-10] [a] SelectColumn [12-19] ['foo' b] - StringLiteral('foo') [12-17] ['foo'] + StringLiteral [12-17] ['foo'] + StringLiteralComponent('foo') [12-17] ['foo'] Alias [18-19] [b] Identifier(b) [18-19] [b] -- diff --git a/zetasql/parser/testdata/function_call_argument_alias.test b/zetasql/parser/testdata/function_call_argument_alias.test index 2f5fc5be4..78ebed28c 100644 --- a/zetasql/parser/testdata/function_call_argument_alias.test +++ b/zetasql/parser/testdata/function_call_argument_alias.test @@ -177,7 +177,8 @@ QueryStatement [0-33] FunctionCall [7-25] PathExpression [7-11] Identifier(date) [7-11] - StringLiteral('2021-07-04') [12-24] + StringLiteral [12-24] + StringLiteralComponent('2021-07-04') [12-24] Alias [26-33] Identifier(date) [29-33] -- @@ -195,7 +196,8 @@ QueryStatement [0-41] PathExpression [7-11] Identifier(date) [7-11] ExpressionWithAlias [12-32] - StringLiteral('2021-07-04') [12-24] + StringLiteral [12-24] + StringLiteralComponent('2021-07-04') [12-24] Alias [25-32] Identifier(date) [28-32] Alias [34-41] diff --git a/zetasql/parser/testdata/function_call_with_group_rows.test b/zetasql/parser/testdata/function_call_with_group_rows.test index 95efa179a..42802211f 100644 --- a/zetasql/parser/testdata/function_call_with_group_rows.test +++ b/zetasql/parser/testdata/function_call_with_group_rows.test @@ -1110,7 +1110,8 @@ QueryStatement [0-40] [SELECT COUNT...UP_ROWS(1, "a")] TVFArgument [33-34] [1] IntLiteral(1) [33-34] [1] TVFArgument [36-39] ["a"] - StringLiteral("a") [36-39] ["a"] + StringLiteral [36-39] ["a"] + StringLiteralComponent("a") [36-39] ["a"] -- SELECT COUNT(f1) @@ -1400,7 +1401,8 @@ HintedStatement [0-91] 
[@{hint_statement_...c"} FROM T] IntLiteral(789) [54-57] [789] HintEntry [59-83] [hint_call_string="a b c"] Identifier(hint_call_string) [59-75] [hint_call_string] - StringLiteral("a b c") [76-83] ["a b c"] + StringLiteral [76-83] ["a b c"] + StringLiteralComponent("a b c") [76-83] ["a b c"] FromClause [85-91] [FROM T] TablePathExpression [90-91] [T] PathExpression [90-91] [T] diff --git a/zetasql/parser/testdata/generic_ddl.test b/zetasql/parser/testdata/generic_ddl.test index 4d6289481..4332435e7 100644 --- a/zetasql/parser/testdata/generic_ddl.test +++ b/zetasql/parser/testdata/generic_ddl.test @@ -53,7 +53,9 @@ CreateEntityStatement(is_or_replace, is_if_not_exists) [0-112] [CREATE OR..."val Identifier(c) [81-82] [c] PathExpression [83-84] [d] Identifier(d) [83-84] [d] - JSONLiteral('{"key": "value"}') [89-112] [JSON '{"key": "value"}'] + JSONLiteral [89-112] [JSON '{"key": "value"}'] + StringLiteral [94-112] ['{"key": "value"}'] + StringLiteralComponent('{"key": "value"}') [94-112] ['{"key": "value"}'] -- CREATE OR REPLACE RESERVATION IF NOT EXISTS myProject.myReservation OPTIONS(a = b, c = d) @@ -83,7 +85,8 @@ CreateEntityStatement [0-90] [CREATE EXPLORE...lore { } """] Identifier(c) [48-49] [c] PathExpression [50-51] [d] Identifier(d) [50-51] [d] - StringLiteral(""" + StringLiteral [56-90] [""" explore...explore { } """] + StringLiteralComponent(""" explore: test_explore { } @@ -164,7 +167,9 @@ AlterEntityStatement [0-71] [ALTER RESERVATION...1" : "v"}'] Identifier(myReservation) [28-41] [myReservation] AlterActionList [42-71] [SET AS JSON r'{"k\n1" : "v"}'] SetAsAction [42-71] [SET AS JSON r'{"k\n1" : "v"}'] - JSONLiteral(r'{"k\n1" : "v"}') [49-71] [JSON r'{"k\n1" : "v"}'] + JSONLiteral [49-71] [JSON r'{"k\n1" : "v"}'] + StringLiteral [54-71] [r'{"k\n1" : "v"}'] + StringLiteralComponent(r'{"k\n1" : "v"}') [54-71] [r'{"k\n1" : "v"}'] -- ALTER RESERVATION myProject.myReservation SET AS JSON r'{"k\n1" : "v"}' == @@ -186,7 +191,9 @@ AlterEntityStatement [0-95] 
[ALTER RESERVATION..."value"}'] PathExpression [59-60] [b] Identifier(b) [59-60] [b] SetAsAction [65-95] [SET AS JSON '{"key": "value"}'] - JSONLiteral('{"key": "value"}') [72-95] [JSON '{"key": "value"}'] + JSONLiteral [72-95] [JSON '{"key": "value"}'] + StringLiteral [77-95] ['{"key": "value"}'] + StringLiteralComponent('{"key": "value"}') [77-95] ['{"key": "value"}'] -- ALTER RESERVATION myProject.myReservation SET OPTIONS(a = b), SET AS JSON '{"key": "value"}' == @@ -200,7 +207,8 @@ AlterEntityStatement [0-68] [ALTER EXPLORE...t_explore { }"] Identifier(myExplore) [24-33] [myExplore] AlterActionList [34-68] [SET AS "explore...explore { }"] SetAsAction [34-68] [SET AS "explore...explore { }"] - StringLiteral("explore: test_explore { }") [41-68] ["explore: test_explore { }"] + StringLiteral [41-68] ["explore: test_explore { }"] + StringLiteralComponent("explore: test_explore { }") [41-68] ["explore: test_explore { }"] -- ALTER EXPLORE myProject.myExplore SET AS "explore: test_explore { }" == diff --git a/zetasql/parser/testdata/grant_and_revoke.test b/zetasql/parser/testdata/grant_and_revoke.test index e3d038f40..9d4959311 100644 --- a/zetasql/parser/testdata/grant_and_revoke.test +++ b/zetasql/parser/testdata/grant_and_revoke.test @@ -10,7 +10,8 @@ GrantStatement [0-54] [grant select...google.com'] PathExpression [30-33] [foo] Identifier(foo) [30-33] [foo] GranteeList [37-54] ['user@google.com'] - StringLiteral('user@google.com') [37-54] ['user@google.com'] + StringLiteral [37-54] ['user@google.com'] + StringLiteralComponent('user@google.com') [37-54] ['user@google.com'] -- GRANT `select`, `update` ON table foo TO 'user@google.com' == @@ -28,8 +29,10 @@ GrantStatement [0-83] [grant all...mdbuser/bar2'] SystemVariableExpr [44-51] [@@user2] PathExpression [46-51] [user2] Identifier(user2) [46-51] [user2] - StringLiteral('mdbuser/bar1') [53-67] ['mdbuser/bar1'] - StringLiteral('mdbuser/bar2') [69-83] ['mdbuser/bar2'] + StringLiteral [53-67] ['mdbuser/bar1'] + 
StringLiteralComponent('mdbuser/bar1') [53-67] ['mdbuser/bar1'] + StringLiteral [69-83] ['mdbuser/bar2'] + StringLiteralComponent('mdbuser/bar2') [69-83] ['mdbuser/bar2'] -- GRANT ALL PRIVILEGES ON view foo TO @user1, @@user2, 'mdbuser/bar1', 'mdbuser/bar2' @@ -46,7 +49,8 @@ GrantStatement [0-58] [grant select...google.com'] PathExpression [34-37] [foo] Identifier(foo) [34-37] [foo] GranteeList [41-58] ['user@google.com'] - StringLiteral('user@google.com') [41-58] ['user@google.com'] + StringLiteral [41-58] ['user@google.com'] + StringLiteralComponent('user@google.com') [41-58] ['user@google.com'] -- GRANT `select` ON materialized view foo TO 'user@google.com' == @@ -62,7 +66,8 @@ RevokeStatement [0-61] [revoke select...google.com'] PathExpression [35-38] [foo] Identifier(foo) [35-38] [foo] GranteeList [44-61] ['user@google.com'] - StringLiteral('user@google.com') [44-61] ['user@google.com'] + StringLiteral [44-61] ['user@google.com'] + StringLiteralComponent('user@google.com') [44-61] ['user@google.com'] -- REVOKE `select` ON materialized view foo FROM 'user@google.com' == @@ -75,7 +80,8 @@ GrantStatement [0-35] [grant all...foo to 'bar'] Identifier(datascape) [13-22] [datascape] Identifier(foo) [23-26] [foo] GranteeList [30-35] ['bar'] - StringLiteral('bar') [30-35] ['bar'] + StringLiteral [30-35] ['bar'] + StringLiteralComponent('bar') [30-35] ['bar'] -- GRANT ALL PRIVILEGES ON datascape.foo TO 'bar' == @@ -103,7 +109,8 @@ GrantStatement [0-77] [grant select...mdbgroup/bar'] PathExpression [56-59] [foo] Identifier(foo) [56-59] [foo] GranteeList [63-77] ['mdbgroup/bar'] - StringLiteral('mdbgroup/bar') [63-77] ['mdbgroup/bar'] + StringLiteral [63-77] ['mdbgroup/bar'] + StringLiteralComponent('mdbgroup/bar') [63-77] ['mdbgroup/bar'] -- GRANT `select`, insert(col1, col2, col3), `update`(col2) ON foo TO 'mdbgroup/bar' == @@ -119,7 +126,8 @@ GrantStatement [0-66] [grant execute...google.com'] Identifier(datascape) [24-33] [datascape] Identifier(script_foo) [34-44] 
[script_foo] GranteeList [48-66] ['group@google.com'] - StringLiteral('group@google.com') [48-66] ['group@google.com'] + StringLiteral [48-66] ['group@google.com'] + StringLiteralComponent('group@google.com') [48-66] ['group@google.com'] -- GRANT execute ON script datascape.script_foo TO 'group@google.com' == @@ -180,7 +188,8 @@ RevokeStatement [0-28] [revoke all on foo from 'bar'] PathExpression [14-17] [foo] Identifier(foo) [14-17] [foo] GranteeList [23-28] ['bar'] - StringLiteral('bar') [23-28] ['bar'] + StringLiteral [23-28] ['bar'] + StringLiteralComponent('bar') [23-28] ['bar'] -- REVOKE ALL PRIVILEGES ON foo FROM 'bar' == @@ -195,7 +204,8 @@ RevokeStatement [0-45] [revoke delete...mdbuser/bar'] PathExpression [23-26] [foo] Identifier(foo) [23-26] [foo] GranteeList [32-45] ['mdbuser/bar'] - StringLiteral('mdbuser/bar') [32-45] ['mdbuser/bar'] + StringLiteral [32-45] ['mdbuser/bar'] + StringLiteralComponent('mdbuser/bar') [32-45] ['mdbuser/bar'] -- REVOKE delete ON table foo FROM 'mdbuser/bar' == @@ -208,10 +218,12 @@ RevokeStatement [0-71] [revoke all...', @@user4] PathExpression [20-25] [table] Identifier(table) [20-25] [table] GranteeList [31-71] ['mdbuser/user...', @@user4] - StringLiteral('mdbuser/user') [31-45] ['mdbuser/user'] + StringLiteral [31-45] ['mdbuser/user'] + StringLiteralComponent('mdbuser/user') [31-45] ['mdbuser/user'] ParameterExpr [47-53] [@user2] Identifier(user2) [48-53] [user2] - StringLiteral('user3') [55-62] ['user3'] + StringLiteral [55-62] ['user3'] + StringLiteralComponent('user3') [55-62] ['user3'] SystemVariableExpr [64-71] [@@user4] PathExpression [66-71] [user4] Identifier(user4) [66-71] [user4] @@ -234,7 +246,8 @@ RevokeStatement [0-58] [revoke delete...mdbgroup/bar'] PathExpression [35-38] [foo] Identifier(foo) [35-38] [foo] GranteeList [44-58] ['mdbgroup/bar'] - StringLiteral('mdbgroup/bar') [44-58] ['mdbgroup/bar'] + StringLiteral [44-58] ['mdbgroup/bar'] + StringLiteralComponent('mdbgroup/bar') [44-58] ['mdbgroup/bar'] -- 
REVOKE delete, `update`(col2) ON view foo FROM 'mdbgroup/bar' == diff --git a/zetasql/parser/testdata/in.test b/zetasql/parser/testdata/in.test index 327bd002d..f547e2ed6 100644 --- a/zetasql/parser/testdata/in.test +++ b/zetasql/parser/testdata/in.test @@ -6,12 +6,16 @@ QueryStatement [0-29] [select 'a' IN ('a', 'b', 'c')] SelectList [7-29] ['a' IN ('a', 'b', 'c')] SelectColumn [7-29] ['a' IN ('a', 'b', 'c')] InExpression(IN) [7-29] ['a' IN ('a', 'b', 'c')] - StringLiteral('a') [7-10] ['a'] + StringLiteral [7-10] ['a'] + StringLiteralComponent('a') [7-10] ['a'] Location [11-13] [IN] InList [15-28] ['a', 'b', 'c'] - StringLiteral('a') [15-18] ['a'] - StringLiteral('b') [20-23] ['b'] - StringLiteral('c') [25-28] ['c'] + StringLiteral [15-18] ['a'] + StringLiteralComponent('a') [15-18] ['a'] + StringLiteral [20-23] ['b'] + StringLiteralComponent('b') [20-23] ['b'] + StringLiteral [25-28] ['c'] + StringLiteralComponent('c') [25-28] ['c'] -- SELECT 'a' IN ('a', 'b', 'c') @@ -28,7 +32,8 @@ QueryStatement [0-20] [select true IN ('a')] BooleanLiteral(true) [7-11] [true] Location [12-14] [IN] InList [16-19] ['a'] - StringLiteral('a') [16-19] ['a'] + StringLiteral [16-19] ['a'] + StringLiteralComponent('a') [16-19] ['a'] -- SELECT true IN ('a') @@ -59,7 +64,8 @@ QueryStatement [0-32] [select 5 IN...5, f(b.a))] IntLiteral(5) [7-8] [5] Location [9-11] [IN] InList [13-31] ['a', 4 + 5, f(b.a)] - StringLiteral('a') [13-16] ['a'] + StringLiteral [13-16] ['a'] + StringLiteralComponent('a') [13-16] ['a'] BinaryExpression(+) [18-23] [4 + 5] IntLiteral(4) [18-19] [4] IntLiteral(5) [22-23] [5] @@ -144,7 +150,8 @@ QueryStatement [0-56] [select col...AND f(x,y)] Identifier(col) [7-10] [col] Location [11-13] [IN] InList [15-18] ['a'] - StringLiteral('a') [15-18] ['a'] + StringLiteral [15-18] ['a'] + StringLiteralComponent('a') [15-18] ['a'] BetweenExpression(NOT BETWEEN) [24-56] [col NOT BETWEEN...AND f(x,y)] PathExpression [24-27] [col] Identifier(col) [24-27] [col] diff --git 
a/zetasql/parser/testdata/keywords.test b/zetasql/parser/testdata/keywords.test index 63da01b50..f4efc8628 100644 --- a/zetasql/parser/testdata/keywords.test +++ b/zetasql/parser/testdata/keywords.test @@ -104,7 +104,8 @@ QueryStatement [0-109] [select lang...by language] BinaryExpression(LIKE) [54-73] [language like 'sv%'] PathExpression [54-62] [language] Identifier(language) [54-62] [language] - StringLiteral('sv%') [68-73] ['sv%'] + StringLiteral [68-73] ['sv%'] + StringLiteralComponent('sv%') [68-73] ['sv%'] GroupBy [74-91] [group by language] GroupingItem [83-91] [language] PathExpression [83-91] [language] @@ -155,7 +156,8 @@ QueryStatement [0-109] [select func...by function] BinaryExpression(LIKE) [54-73] [function like 'sv%'] PathExpression [54-62] [function] Identifier(`function`) [54-62] [function] - StringLiteral('sv%') [68-73] ['sv%'] + StringLiteral [68-73] ['sv%'] + StringLiteralComponent('sv%') [68-73] ['sv%'] GroupBy [74-91] [group by function] GroupingItem [83-91] [function] PathExpression [83-91] [function] @@ -206,7 +208,8 @@ QueryStatement [0-104] [select func...by returns] BinaryExpression(LIKE) [52-70] [returns like 'sv%'] PathExpression [52-59] [returns] Identifier(returns) [52-59] [returns] - StringLiteral('sv%') [65-70] ['sv%'] + StringLiteral [65-70] ['sv%'] + StringLiteralComponent('sv%') [65-70] ['sv%'] GroupBy [71-87] [group by returns] GroupingItem [80-87] [returns] PathExpression [80-87] [returns] @@ -1636,7 +1639,8 @@ QueryStatement [0-214] [select system...of @system)] SelectList [58-120] [time '10:20...system_time] SelectColumn [58-83] [time '10:20:30' as system] DateOrTimeLiteral(TYPE_TIME) [58-73] [time '10:20:30'] - StringLiteral('10:20:30') [63-73] ['10:20:30'] + StringLiteral [63-73] ['10:20:30'] + StringLiteralComponent('10:20:30') [63-73] ['10:20:30'] Alias [74-83] [as system] Identifier(system) [77-83] [system] SelectColumn [85-120] [cast(system...system_time] diff --git a/zetasql/parser/testdata/limit.test 
b/zetasql/parser/testdata/limit.test index b59a1b57e..c35dbea4e 100644 --- a/zetasql/parser/testdata/limit.test +++ b/zetasql/parser/testdata/limit.test @@ -130,9 +130,26 @@ LIMIT CAST(@@sysvar1 AS int32) OFFSET CAST(@@sysvar2 AS string) SELECT 1 LIMIT cast(cast(1 as int32) as int32); -- -ERROR: Syntax error: Expected "@" or "@@" or integer literal but got keyword CAST [at 1:21] -SELECT 1 LIMIT cast(cast(1 as int32) as int32); - ^ +QueryStatement [0-46] [SELECT 1 LIMIT...as int32)] + Query [0-46] [SELECT 1 LIMIT...as int32)] + Select [0-8] [SELECT 1] + SelectList [7-8] [1] + SelectColumn [7-8] [1] + IntLiteral(1) [7-8] [1] + LimitOffset [9-46] [LIMIT cast...as int32)] + CastExpression [15-46] [cast(cast(...as int32)] + CastExpression [20-36] [cast(1 as int32)] + IntLiteral(1) [25-26] [1] + SimpleType [30-35] [int32] + PathExpression [30-35] [int32] + Identifier(int32) [30-35] [int32] + SimpleType [40-45] [int32] + PathExpression [40-45] [int32] + Identifier(int32) [40-45] [int32] +-- +SELECT + 1 +LIMIT CAST(CAST(1 AS int32) AS int32) == SELECT 1 LIMIT 1 OFFSET 1 @@ -365,20 +382,187 @@ ORDER BY 1 LIMIT 2 OFFSET 1 == -# OFFSET is not a reserved keyword, but is rejected here because we don't -# expect an identifier here. -select 1 limit offset; +# LIMIT can take an identifier for argument. 
+select a from t order by a, b LIMIT a OFFSET 10; -- -ERROR: Syntax error: Unexpected keyword OFFSET [at 1:16] +QueryStatement [0-47] [select a from...OFFSET 10] + Query [0-47] [select a from...OFFSET 10] + Select [0-15] [select a from t] + SelectList [7-8] [a] + SelectColumn [7-8] [a] + PathExpression [7-8] [a] + Identifier(a) [7-8] [a] + FromClause [9-15] [from t] + TablePathExpression [14-15] [t] + PathExpression [14-15] [t] + Identifier(t) [14-15] [t] + OrderBy [16-29] [order by a, b] + OrderingExpression(ASC) [25-26] [a] + PathExpression [25-26] [a] + Identifier(a) [25-26] [a] + OrderingExpression(ASC) [28-29] [b] + PathExpression [28-29] [b] + Identifier(b) [28-29] [b] + LimitOffset [30-47] [LIMIT a OFFSET 10] + PathExpression [36-37] [a] + Identifier(a) [36-37] [a] + IntLiteral(10) [45-47] [10] +-- +SELECT + a +FROM + t +ORDER BY a, b +LIMIT a OFFSET 10 +== + +# OFFSET is not a reserved keyword and so it is parsed as an identifier here. select 1 limit offset; - ^ +-- +QueryStatement [0-21] [select 1 limit offset] + Query [0-21] [select 1 limit offset] + Select [0-8] [select 1] + SelectList [7-8] [1] + SelectColumn [7-8] [1] + IntLiteral(1) [7-8] [1] + LimitOffset [9-21] [limit offset] + PathExpression [15-21] [offset] + Identifier(offset) [15-21] [offset] +-- +SELECT + 1 +LIMIT offset == select 1 from t limit 5 offset offset; -- -ERROR: Syntax error: Unexpected keyword OFFSET [at 1:32] -select 1 from t limit 5 offset offset; - ^ +QueryStatement [0-37] [select 1 from...offset offset] + Query [0-37] [select 1 from...offset offset] + Select [0-15] [select 1 from t] + SelectList [7-8] [1] + SelectColumn [7-8] [1] + IntLiteral(1) [7-8] [1] + FromClause [9-15] [from t] + TablePathExpression [14-15] [t] + PathExpression [14-15] [t] + Identifier(t) [14-15] [t] + LimitOffset [16-37] [limit 5 offset offset] + IntLiteral(5) [22-23] [5] + PathExpression [31-37] [offset] + Identifier(offset) [31-37] [offset] +-- +SELECT + 1 +FROM + t +LIMIT 5 OFFSET offset +== + +# 
Negative arguments to LIMIT/OFFSET will parse but later get rejected by +# analyzer. +SELECT 1 LIMIT -1; +-- +QueryStatement [0-17] [SELECT 1 LIMIT -1] + Query [0-17] [SELECT 1 LIMIT -1] + Select [0-8] [SELECT 1] + SelectList [7-8] [1] + SelectColumn [7-8] [1] + IntLiteral(1) [7-8] [1] + LimitOffset [9-17] [LIMIT -1] + UnaryExpression(-) [15-17] [-1] + IntLiteral(1) [16-17] [1] +-- +SELECT + 1 +LIMIT -1 +== + +SELECT 1 LIMIT 10 OFFSET -1; +-- +QueryStatement [0-27] [SELECT 1 LIMIT 10 OFFSET -1] + Query [0-27] [SELECT 1 LIMIT 10 OFFSET -1] + Select [0-8] [SELECT 1] + SelectList [7-8] [1] + SelectColumn [7-8] [1] + IntLiteral(1) [7-8] [1] + LimitOffset [9-27] [LIMIT 10 OFFSET -1] + IntLiteral(10) [15-17] [10] + UnaryExpression(-) [25-27] [-1] + IntLiteral(1) [26-27] [1] +-- +SELECT + 1 +LIMIT 10 OFFSET -1 +== + +SELECT 1 LIMIT -10 OFFSET -22; +-- +QueryStatement [0-29] [SELECT 1 LIMIT -10 OFFSET -22] + Query [0-29] [SELECT 1 LIMIT -10 OFFSET -22] + Select [0-8] [SELECT 1] + SelectList [7-8] [1] + SelectColumn [7-8] [1] + IntLiteral(1) [7-8] [1] + LimitOffset [9-29] [LIMIT -10 OFFSET -22] + UnaryExpression(-) [15-18] [-10] + IntLiteral(10) [16-18] [10] + UnaryExpression(-) [26-29] [-22] + IntLiteral(22) [27-29] [22] +-- +SELECT + 1 +LIMIT -10 OFFSET -22 +== + +# Non-INT64 argument to LIMIT/OFFSET will parse but later get rejected by analyzer. 
+SELECT 1 LIMIT 1.5; +-- +QueryStatement [0-18] [SELECT 1 LIMIT 1.5] + Query [0-18] [SELECT 1 LIMIT 1.5] + Select [0-8] [SELECT 1] + SelectList [7-8] [1] + SelectColumn [7-8] [1] + IntLiteral(1) [7-8] [1] + LimitOffset [9-18] [LIMIT 1.5] + FloatLiteral(1.5) [15-18] [1.5] +-- +SELECT + 1 +LIMIT 1.5 +== + +SELECT 1 LIMIT 'abc'; +-- +QueryStatement [0-20] [SELECT 1 LIMIT 'abc'] + Query [0-20] [SELECT 1 LIMIT 'abc'] + Select [0-8] [SELECT 1] + SelectList [7-8] [1] + SelectColumn [7-8] [1] + IntLiteral(1) [7-8] [1] + LimitOffset [9-20] [LIMIT 'abc'] + StringLiteral [15-20] ['abc'] + StringLiteralComponent('abc') [15-20] ['abc'] +-- +SELECT + 1 +LIMIT 'abc' +== + +SELECT 1 LIMIT @param; +-- +QueryStatement [0-21] [SELECT 1 LIMIT @param] + Query [0-21] [SELECT 1 LIMIT @param] + Select [0-8] [SELECT 1] + SelectList [7-8] [1] + SelectColumn [7-8] [1] + IntLiteral(1) [7-8] [1] + LimitOffset [9-21] [LIMIT @param] + ParameterExpr [15-21] [@param] + Identifier(param) [16-21] [param] +-- +SELECT + 1 +LIMIT @param == # Expected Errors Below @@ -396,13 +580,6 @@ select 1 limit 5 offset; ^ == -select a from t order by a, b LIMIT a OFFSET 10; --- -ERROR: Syntax error: Unexpected identifier "a" [at 1:37] -select a from t order by a, b LIMIT a OFFSET 10; - ^ -== - select 1 limit 1 order by 1; -- ERROR: Syntax error: Expected end of input but got keyword ORDER [at 1:18] @@ -438,53 +615,34 @@ SELECT 1 LIMIT 10 OFFSET 1 LIMIT 1; ^ == -SELECT 1 LIMIT -1; --- -ERROR: Syntax error: Unexpected "-" [at 1:16] -SELECT 1 LIMIT -1; - ^ -== - -SELECT 1 LIMIT 10 OFFSET -1; --- -ERROR: Syntax error: Unexpected "-" [at 1:26] -SELECT 1 LIMIT 10 OFFSET -1; - ^ -== - -SELECT 1 LIMIT -10 OFFSET -22; --- -ERROR: Syntax error: Unexpected "-" [at 1:16] -SELECT 1 LIMIT -10 OFFSET -22; - ^ -== - -SELECT 1 LIMIT 1.5; +SELECT 1 LIMIT NULL -- -ERROR: Syntax error: Unexpected floating point literal "1.5" [at 1:16] -SELECT 1 LIMIT 1.5; - ^ -== - -SELECT 1 LIMIT 'abc'; +QueryStatement [0-19] [SELECT 1 LIMIT NULL] + 
Query [0-19] [SELECT 1 LIMIT NULL] + Select [0-8] [SELECT 1] + SelectList [7-8] [1] + SelectColumn [7-8] [1] + IntLiteral(1) [7-8] [1] + LimitOffset [9-19] [LIMIT NULL] + NullLiteral(NULL) [15-19] [NULL] -- -ERROR: Syntax error: Unexpected string literal 'abc' [at 1:16] -SELECT 1 LIMIT 'abc'; - ^ +SELECT + 1 +LIMIT NULL == -SELECT 1 LIMIT @param; +SELECT 1 LIMIT 1 OFFSET NULL -- -QueryStatement [0-21] [SELECT 1 LIMIT @param] - Query [0-21] [SELECT 1 LIMIT @param] +QueryStatement [0-28] [SELECT 1 LIMIT 1 OFFSET NULL] + Query [0-28] [SELECT 1 LIMIT 1 OFFSET NULL] Select [0-8] [SELECT 1] SelectList [7-8] [1] SelectColumn [7-8] [1] IntLiteral(1) [7-8] [1] - LimitOffset [9-21] [LIMIT @param] - ParameterExpr [15-21] [@param] - Identifier(param) [16-21] [param] + LimitOffset [9-28] [LIMIT 1 OFFSET NULL] + IntLiteral(1) [15-16] [1] + NullLiteral(NULL) [24-28] [NULL] -- SELECT 1 -LIMIT @param +LIMIT 1 OFFSET NULL diff --git a/zetasql/parser/testdata/literal_concatenation.test b/zetasql/parser/testdata/literal_concatenation.test new file mode 100644 index 000000000..ea8fc1e8f --- /dev/null +++ b/zetasql/parser/testdata/literal_concatenation.test @@ -0,0 +1,156 @@ +# STRING literals concatenate and raw modifiers apply only to their immediate +# component. 
+select r'\n' '\n' "b" """c"d"e""" '''f'g'h''' "1" "2" +-- + +QueryStatement [0-53] [select r'\...'' "1" "2"] + Query [0-53] [select r'\...'' "1" "2"] + Select [0-53] [select r'\...'' "1" "2"] + SelectList [7-53] [r'\n' '\n'...'' "1" "2"] + SelectColumn [7-53] [r'\n' '\n'...'' "1" "2"] + StringLiteral [7-53] [r'\n' '\n'...'' "1" "2"] + StringLiteralComponent(r'\n') [7-12] [r'\n'] + StringLiteralComponent('\n') [13-17] ['\n'] + StringLiteralComponent("b") [18-21] ["b"] + StringLiteralComponent("""c"d"e""") [22-33] ["""c"d"e"""] + StringLiteralComponent('''f'g'h''') [34-45] ['''f'g'h'''] + StringLiteralComponent("1") [46-49] ["1"] + StringLiteralComponent("2") [50-53] ["2"] +-- +SELECT + r'\n' '\n' "b" """c"d"e""" '''f'g'h''' "1" "2" +== + +# BYTES literals concatenate and raw modifiers apply only to their immediate +# component. +select br'\n' b'\n' b"b" b"""c"d"e""" b'''f'g'h''' b"1" b"2" +-- + +QueryStatement [0-60] [select br'...b"1" b"2"] + Query [0-60] [select br'...b"1" b"2"] + Select [0-60] [select br'...b"1" b"2"] + SelectList [7-60] [br'\n' b'\...b"1" b"2"] + SelectColumn [7-60] [br'\n' b'\...b"1" b"2"] + BytesLiteral [7-60] [br'\n' b'\...b"1" b"2"] + BytesLiteralComponent(br'\n') [7-13] [br'\n'] + BytesLiteralComponent(b'\n') [14-19] [b'\n'] + BytesLiteralComponent(b"b") [20-24] [b"b"] + BytesLiteralComponent(b"""c"d"e""") [25-37] [b"""c"d"e"""] + BytesLiteralComponent(b'''f'g'h''') [38-50] [b'''f'g'h'''] + BytesLiteralComponent(b"1") [51-55] [b"1"] + BytesLiteralComponent(b"2") [56-60] [b"2"] +-- +SELECT + br'\n' b'\n' b"b" b"""c"d"e""" b'''f'g'h''' b"1" b"2" +== + +# STRING and BYTES literals do not concatenate +select '1' b'2' +-- + +ERROR: Syntax error: string and bytes literals cannot be concatenated. [at 1:12] +select '1' b'2' + ^ +== + +# BYTES and STRING literals do not concatenate +select b'1' '2' +-- + +ERROR: Syntax error: string and bytes literals cannot be concatenated. [at 1:13] +select b'1' '2' + ^ +== + +# JSON literal with concatenation. 
Comments can interleave. +# In this example, we broke a string value that has quotes and slashes. +select JSON /*some_comment*/ '{"number": 1, "str_with_slashes": ' /*begin_val*/ r'"\escaped\"' /*end_val*/ ' "other": 2}' +-- + +QueryStatement [0-121] [select JSON...other": 2}'] + Query [0-121] [select JSON...other": 2}'] + Select [0-121] [select JSON...other": 2}'] + SelectList [7-121] [JSON /*some_comme...ther": 2}'] + SelectColumn [7-121] [JSON /*some_comme...ther": 2}'] + JSONLiteral [7-121] [JSON /*some_comme...ther": 2}'] + StringLiteral [29-121] ['{"number"...other": 2}'] + StringLiteralComponent('{"number": 1, "str_with_slashes": ') [29-65] ['{"number"..._with_slashes": '] + StringLiteralComponent(r'"\escaped\"') [80-94] [r'"\escaped\"'] + StringLiteralComponent(' "other": 2}') [107-121] [' "other": 2}'] +-- +SELECT + JSON '{"number": 1, "str_with_slashes": ' r'"\escaped\"' ' "other": 2}' +== + +# Other literals can also concat +SELECT + NUMERIC "1" r'2', + DECIMAL /*whole:*/ '1' /*fractional:*/ ".23" /*exponent=*/ "e+6", + BIGNUMERIC '1' r"2", + BIGDECIMAL /*sign*/ '-' /*whole:*/ '1' /*fractional:*/ ".23" /*exponent=*/ "e+6", + RANGE '[2014-01-01,' /*comment*/ "2015-01-01)", + DATE '2014' "-01-01", + DATETIME '2016-01-01' r"12:00:00", + TIMESTAMP '2018-10-01' "12:00:00+08" +-- +QueryStatement [0-357] [SELECT NUMERIC...:00:00+08"] + Query [0-357] [SELECT NUMERIC...:00:00+08"] + Select [0-357] [SELECT NUMERIC...:00:00+08"] + SelectList [9-357] [NUMERIC "1...:00:00+08"] + SelectColumn [9-25] [NUMERIC "1" r'2'] + NumericLiteral [9-25] [NUMERIC "1" r'2'] + StringLiteral [17-25] ["1" r'2'] + StringLiteralComponent("1") [17-20] ["1"] + StringLiteralComponent(r'2') [21-25] [r'2'] + SelectColumn [29-93] [DECIMAL /*...exponent=*/ "e+6"] + NumericLiteral [29-93] [DECIMAL /*...exponent=*/ "e+6"] + StringLiteral [48-93] ['1' /*fractional...nt=*/ "e+6"] + StringLiteralComponent('1') [48-51] ['1'] + StringLiteralComponent(".23") [68-73] [".23"] + 
StringLiteralComponent("e+6") [88-93] ["e+6"] + SelectColumn [97-116] [BIGNUMERIC '1' r"2"] + BigNumericLiteral [97-116] [BIGNUMERIC '1' r"2"] + StringLiteral [108-116] ['1' r"2"] + StringLiteralComponent('1') [108-111] ['1'] + StringLiteralComponent(r"2") [112-116] [r"2"] + SelectColumn [120-200] [BIGDECIMAL...exponent=*/ "e+6"] + BigNumericLiteral [120-200] [BIGDECIMAL...exponent=*/ "e+6"] + StringLiteral [140-200] ['-' /*whole...xponent=*/ "e+6"] + StringLiteralComponent('-') [140-143] ['-'] + StringLiteralComponent('1') [155-158] ['1'] + StringLiteralComponent(".23") [175-180] [".23"] + StringLiteralComponent("e+6") [195-200] ["e+6"] + SelectColumn [204-256] [RANGE] + SimpleType [210-214] [DATE] + PathExpression [210-214] [DATE] + Identifier(DATE) [210-214] [DATE] + StringLiteral [216-256] ['[2014-01-...2015-01-01)"] + StringLiteralComponent('[2014-01-01,') [216-230] ['[2014-01-01,'] + StringLiteralComponent("2015-01-01)") [243-256] ["2015-01-01)"] + SelectColumn [260-280] [DATE '2014' "-01-01"] + DateOrTimeLiteral(TYPE_DATE) [260-280] [DATE '2014' "-01-01"] + StringLiteral [265-280] ['2014' "-01-01"] + StringLiteralComponent('2014') [265-271] ['2014'] + StringLiteralComponent("-01-01") [272-280] ["-01-01"] + SelectColumn [284-317] [DATETIME '..."12:00:00"] + DateOrTimeLiteral(TYPE_DATETIME) [284-317] [DATETIME '..."12:00:00"] + StringLiteral [293-317] ['2016-01-01' r"12:00:00"] + StringLiteralComponent('2016-01-01') [293-305] ['2016-01-01'] + StringLiteralComponent(r"12:00:00") [306-317] [r"12:00:00"] + SelectColumn [321-357] [TIMESTAMP...:00:00+08"] + DateOrTimeLiteral(TYPE_TIMESTAMP) [321-357] [TIMESTAMP...:00:00+08"] + StringLiteral [331-357] ['2018-10-01' "12:00:00+08"] + StringLiteralComponent('2018-10-01') [331-343] ['2018-10-01'] + StringLiteralComponent("12:00:00+08") [344-357] ["12:00:00+08"] +-- +SELECT + NUMERIC "1" r'2', + NUMERIC '1' ".23" "e+6", + BIGNUMERIC '1' r"2", + BIGNUMERIC '-' '1' ".23" "e+6", + RANGE< DATE > '[2014-01-01,' "2015-01-01)", 
+ DATE '2014' "-01-01", + DATETIME '2016-01-01' r"12:00:00", + TIMESTAMP '2018-10-01' "12:00:00+08" diff --git a/zetasql/parser/testdata/literals.test b/zetasql/parser/testdata/literals.test index 8397d5320..4d7751064 100644 --- a/zetasql/parser/testdata/literals.test +++ b/zetasql/parser/testdata/literals.test @@ -9,9 +9,11 @@ QueryStatement [0-36] [select NULL...y", 1, 1.0] SelectColumn [13-17] [TRUE] BooleanLiteral(TRUE) [13-17] [TRUE] SelectColumn [19-22] ["x"] - StringLiteral("x") [19-22] ["x"] + StringLiteral [19-22] ["x"] + StringLiteralComponent("x") [19-22] ["x"] SelectColumn [24-28] [b"y"] - BytesLiteral(b"y") [24-28] [b"y"] + BytesLiteral [24-28] [b"y"] + BytesLiteralComponent(b"y") [24-28] [b"y"] SelectColumn [30-31] [1] IntLiteral(1) [30-31] [1] SelectColumn [33-36] [1.0] @@ -67,17 +69,23 @@ QueryStatement [0-61] [select 'abc...U00012346"] Select [0-61] [select 'abc...U00012346"] SelectList [7-61] ['abc', 'a\...U00012346"] SelectColumn [7-12] ['abc'] - StringLiteral('abc') [7-12] ['abc'] + StringLiteral [7-12] ['abc'] + StringLiteralComponent('abc') [7-12] ['abc'] SelectColumn [14-19] ['a\b'] - StringLiteral('a\b') [14-19] ['a\b'] + StringLiteral [14-19] ['a\b'] + StringLiteralComponent('a\b') [14-19] ['a\b'] SelectColumn [21-26] ["def"] - StringLiteral("def") [21-26] ["def"] + StringLiteral [21-26] ["def"] + StringLiteralComponent("def") [21-26] ["def"] SelectColumn [28-35] ['\\x53'] - StringLiteral('\\x53') [28-35] ['\\x53'] + StringLiteral [28-35] ['\\x53'] + StringLiteralComponent('\\x53') [28-35] ['\\x53'] SelectColumn [37-46] ['\\u1235'] - StringLiteral('\\u1235') [37-46] ['\\u1235'] + StringLiteral [37-46] ['\\u1235'] + StringLiteralComponent('\\u1235') [37-46] ['\\u1235'] SelectColumn [48-61] ["\\U00012346"] - StringLiteral("\\U00012346") [48-61] ["\\U00012346"] + StringLiteral [48-61] ["\\U00012346"] + StringLiteralComponent("\\U00012346") [48-61] ["\\U00012346"] -- SELECT 'abc', @@ -95,9 +103,11 @@ QueryStatement [0-31] [select 
"""...''line1'''] Select [0-31] [select """...''line1'''] SelectList [7-31] ["""line1""", '''line1'''] SelectColumn [7-18] ["""line1"""] - StringLiteral("""line1""") [7-18] ["""line1"""] + StringLiteral [7-18] ["""line1"""] + StringLiteralComponent("""line1""") [7-18] ["""line1"""] SelectColumn [20-31] ['''line1'''] - StringLiteral('''line1''') [20-31] ['''line1'''] + StringLiteral [20-31] ['''line1'''] + StringLiteralComponent('''line1''') [20-31] ['''line1'''] -- SELECT """line1""", @@ -111,9 +121,11 @@ QueryStatement [0-31] [select '''...''abc\\'''] Select [0-31] [select '''...''abc\\'''] SelectList [7-31] ['''abc\'''', '''abc\\'''] SelectColumn [7-18] ['''abc\''''] - StringLiteral('''abc\'''') [7-18] ['''abc\''''] + StringLiteral [7-18] ['''abc\''''] + StringLiteralComponent('''abc\'''') [7-18] ['''abc\''''] SelectColumn [20-31] ['''abc\\'''] - StringLiteral('''abc\\''') [20-31] ['''abc\\'''] + StringLiteral [20-31] ['''abc\\'''] + StringLiteralComponent('''abc\\''') [20-31] ['''abc\\'''] -- SELECT '''abc\'''', @@ -128,9 +140,11 @@ QueryStatement [0-43] [select '''...'\\'def'''] Select [0-43] [select '''...'\\'def'''] SelectList [7-43] ['''abc'\\'...'\\'def'''] SelectColumn [7-24] ['''abc'\\''def'''] - StringLiteral('''abc'\\''def''') [7-24] ['''abc'\\''def'''] + StringLiteral [7-24] ['''abc'\\''def'''] + StringLiteralComponent('''abc'\\''def''') [7-24] ['''abc'\\''def'''] SelectColumn [26-43] ['''abc''\\'def'''] - StringLiteral('''abc''\\'def''') [26-43] ['''abc''\\'def'''] + StringLiteral [26-43] ['''abc''\\'def'''] + StringLiteralComponent('''abc''\\'def''') [26-43] ['''abc''\\'def'''] -- SELECT '''abc'\\''def''', @@ -145,9 +159,11 @@ QueryStatement [0-43] [select """..."\\"def"""] Select [0-43] [select """..."\\"def"""] SelectList [7-43] ["""abc"\\"..."\\"def"""] SelectColumn [7-24] ["""abc"\\""def"""] - StringLiteral("""abc"\\""def""") [7-24] ["""abc"\\""def"""] + StringLiteral [7-24] ["""abc"\\""def"""] + StringLiteralComponent("""abc"\\""def""") [7-24] 
["""abc"\\""def"""] SelectColumn [26-43] ["""abc""\\"def"""] - StringLiteral("""abc""\\"def""") [26-43] ["""abc""\\"def"""] + StringLiteral [26-43] ["""abc""\\"def"""] + StringLiteralComponent("""abc""\\"def""") [26-43] ["""abc""\\"def"""] -- SELECT """abc"\\""def""", @@ -169,7 +185,8 @@ QueryStatement [0-17] [select """"a" """] Select [0-17] [select """"a" """] SelectList [7-17] [""""a" """] SelectColumn [7-17] [""""a" """] - StringLiteral(""""a" """) [7-17] [""""a" """] + StringLiteral [7-17] [""""a" """] + StringLiteralComponent(""""a" """) [7-17] [""""a" """] -- SELECT """"a" """ @@ -182,7 +199,8 @@ QueryStatement [0-75] [select """...U000022FD"""] Select [0-75] [select """...U000022FD"""] SelectList [7-75] ["""line1 '...U000022FD"""] SelectColumn [7-75] ["""line1 '...U000022FD"""] - StringLiteral("""line1 'single_quote' "double_quote" \\x41g \\u22FD \\U000022FD""") [7-75] ["""line1 '...U000022FD"""] + StringLiteral [7-75] ["""line1 '...U000022FD"""] + StringLiteralComponent("""line1 'single_quote' "double_quote" \\x41g \\u22FD \\U000022FD""") [7-75] ["""line1 '...U000022FD"""] -- SELECT """line1 'single_quote' "double_quote" \\x41g \\u22FD \\U000022FD""" @@ -195,7 +213,8 @@ QueryStatement [0-75] [select '''...U000022FD'''] Select [0-75] [select '''...U000022FD'''] SelectList [7-75] ['''line1 '...U000022FD'''] SelectColumn [7-75] ['''line1 '...U000022FD'''] - StringLiteral('''line1 'single_quote' "double_quote" \\x41g \\u22FD \\U000022FD''') [7-75] ['''line1 '...U000022FD'''] + StringLiteral [7-75] ['''line1 '...U000022FD'''] + StringLiteralComponent('''line1 'single_quote' "double_quote" \\x41g \\u22FD \\U000022FD''') [7-75] ['''line1 '...U000022FD'''] -- SELECT '''line1 'single_quote' "double_quote" \\x41g \\u22FD \\U000022FD''' @@ -228,13 +247,16 @@ QueryStatement [0-85] [select """...line2 line3'''] Select [0-85] [select """...line2 line3'''] SelectList [7-85] ["""line1 line2...ine2 line3'''] SelectColumn [7-30] ["""line1 line2 line3"""] - 
StringLiteral("""line1 + StringLiteral [7-30] ["""line1 line2 line3"""] + StringLiteralComponent("""line1 line2 line3""") [7-30] ["""line1 line2 line3"""] SelectColumn [32-60] ['--------------------------'] - StringLiteral('--------------------------') [32-60] ['--------------------------'] + StringLiteral [32-60] ['--------------------------'] + StringLiteralComponent('--------------------------') [32-60] ['--------------------------'] SelectColumn [62-85] ['''line1 line2 line3'''] - StringLiteral('''line1 + StringLiteral [62-85] ['''line1 line2 line3'''] + StringLiteralComponent('''line1 line2 line3''') [62-85] ['''line1 line2 line3'''] -- @@ -275,12 +297,15 @@ QueryStatement [0-80] [select """...\ line2'''] Select [0-80] [select """...\ line2'''] SelectList [7-80] ["""line1\\...\ line2'''] SelectColumn [7-26] ["""line1\\ line2"""] - StringLiteral("""line1\\ + StringLiteral [7-26] ["""line1\\ line2"""] + StringLiteralComponent("""line1\\ line2""") [7-26] ["""line1\\ line2"""] SelectColumn [28-59] ['---------...---------'] - StringLiteral('-----------------------------') [28-59] ['---------...---------'] + StringLiteral [28-59] ['---------...---------'] + StringLiteralComponent('-----------------------------') [28-59] ['---------...---------'] SelectColumn [61-80] ['''line1\\ line2'''] - StringLiteral('''line1\\ + StringLiteral [61-80] ['''line1\\ line2'''] + StringLiteralComponent('''line1\\ line2''') [61-80] ['''line1\\ line2'''] -- SELECT @@ -307,7 +332,8 @@ QueryStatement [0-22] [select """a'''a'''a"""] Select [0-22] [select """a'''a'''a"""] SelectList [7-22] ["""a'''a'''a"""] SelectColumn [7-22] ["""a'''a'''a"""] - StringLiteral("""a'''a'''a""") [7-22] ["""a'''a'''a"""] + StringLiteral [7-22] ["""a'''a'''a"""] + StringLiteralComponent("""a'''a'''a""") [7-22] ["""a'''a'''a"""] -- SELECT """a'''a'''a""" @@ -320,28 +346,13 @@ QueryStatement [0-22] [select '''a"""a"""a'''] Select [0-22] [select '''a"""a"""a'''] SelectList [7-22] ['''a"""a"""a'''] SelectColumn 
[7-22] ['''a"""a"""a'''] - StringLiteral('''a"""a"""a''') [7-22] ['''a"""a"""a'''] + StringLiteral [7-22] ['''a"""a"""a'''] + StringLiteralComponent('''a"""a"""a''') [7-22] ['''a"""a"""a'''] -- SELECT '''a"""a"""a''' == -select """line1""""""line2""""""line3""" --- -ERROR: Syntax error: Expected end of input but got string literal """line2""" [at 1:19] -select """line1""""""line2""""""line3""" - ^ -== - -select """line1""" -"""line2""" -"""line3""" --- -ERROR: Syntax error: Expected end of input but got string literal """line2""" [at 2:1] -"""line2""" -^ -== - select """line1""", """line2""", @@ -352,11 +363,14 @@ QueryStatement [0-44] [select """...""line3"""] Select [0-44] [select """...""line3"""] SelectList [7-44] ["""line1""...""line3"""] SelectColumn [7-18] ["""line1"""] - StringLiteral("""line1""") [7-18] ["""line1"""] + StringLiteral [7-18] ["""line1"""] + StringLiteralComponent("""line1""") [7-18] ["""line1"""] SelectColumn [20-31] ["""line2"""] - StringLiteral("""line2""") [20-31] ["""line2"""] + StringLiteral [20-31] ["""line2"""] + StringLiteralComponent("""line2""") [20-31] ["""line2"""] SelectColumn [33-44] ["""line3"""] - StringLiteral("""line3""") [33-44] ["""line3"""] + StringLiteral [33-44] ["""line3"""] + StringLiteralComponent("""line3""") [33-44] ["""line3"""] -- SELECT """line1""", @@ -371,9 +385,11 @@ QueryStatement [0-54] [select /*...*/ """a"""] Select [0-54] [select /*...*/ """a"""] SelectList [24-54] ["a", /* comment """a*/ """a"""] SelectColumn [24-27] ["a"] - StringLiteral("a") [24-27] ["a"] + StringLiteral [24-27] ["a"] + StringLiteralComponent("a") [24-27] ["a"] SelectColumn [47-54] ["""a"""] - StringLiteral("""a""") [47-54] ["""a"""] + StringLiteral [47-54] ["""a"""] + StringLiteralComponent("""a""") [47-54] ["""a"""] -- SELECT "a", @@ -387,9 +403,11 @@ QueryStatement [0-48] [select "a...comment */"""] Select [0-48] [select "a...comment */"""] SelectList [7-48] ["a /* comment...comment */"""] SelectColumn [7-25] ["a /* comment */ "] 
- StringLiteral("a /* comment */ ") [7-25] ["a /* comment */ "] + StringLiteral [7-25] ["a /* comment */ "] + StringLiteralComponent("a /* comment */ ") [7-25] ["a /* comment */ "] SelectColumn [27-48] ["""a /* comment */"""] - StringLiteral("""a /* comment */""") [27-48] ["""a /* comment */"""] + StringLiteral [27-48] ["""a /* comment */"""] + StringLiteralComponent("""a /* comment */""") [27-48] ["""a /* comment */"""] -- SELECT "a /* comment */ ", @@ -403,13 +421,17 @@ QueryStatement [0-67] [select """...*/ jkl """] Select [0-67] [select """...*/ jkl """] SelectList [7-67] ["""abc # "...*/ jkl """] SelectColumn [7-19] ["""abc # """] - StringLiteral("""abc # """) [7-19] ["""abc # """] + StringLiteral [7-19] ["""abc # """] + StringLiteralComponent("""abc # """) [7-19] ["""abc # """] SelectColumn [21-35] [""" def -- """] - StringLiteral(""" def -- """) [21-35] [""" def -- """] + StringLiteral [21-35] [""" def -- """] + StringLiteralComponent(""" def -- """) [21-35] [""" def -- """] SelectColumn [37-51] [""" ghi /* """] - StringLiteral(""" ghi /* """) [37-51] [""" ghi /* """] + StringLiteral [37-51] [""" ghi /* """] + StringLiteralComponent(""" ghi /* """) [37-51] [""" ghi /* """] SelectColumn [53-67] [""" */ jkl """] - StringLiteral(""" */ jkl """) [53-67] [""" */ jkl """] + StringLiteral [53-67] [""" */ jkl """] + StringLiteralComponent(""" */ jkl """) [53-67] [""" */ jkl """] -- SELECT """abc # """, @@ -445,83 +467,103 @@ QueryStatement [0-695] [select '\x53...long_UTF8_char] Select [0-695] [select '\x53...long_UTF8_char] SelectList [7-695] ['\x53'...long_UTF8_char] SelectColumn [7-34] ['\x53' as OneHexByte] - StringLiteral('\x53') [7-13] ['\x53'] + StringLiteral [7-13] ['\x53'] + StringLiteralComponent('\x53') [7-13] ['\x53'] Alias [21-34] [as OneHexByte] Identifier(OneHexByte) [24-34] [OneHexByte] SelectColumn [42-73] ['\X41'...AnotherHexByte] - StringLiteral('\X41') [42-48] ['\X41'] + StringLiteral [42-48] ['\X41'] + StringLiteralComponent('\X41') [42-48] 
['\X41'] Alias [56-73] [as AnotherHexByte] Identifier(AnotherHexByte) [59-73] [AnotherHexByte] SelectColumn [81-110] ['\001' as OneOctalByte] - StringLiteral('\001') [81-87] ['\001'] + StringLiteral [81-87] ['\001'] + StringLiteralComponent('\001') [81-87] ['\001'] Alias [95-110] [as OneOctalByte] Identifier(OneOctalByte) [98-110] [OneOctalByte] SelectColumn [118-136] ['\a...' as a] - StringLiteral('\a...') [118-125] ['\a...'] + StringLiteral [118-125] ['\a...'] + StringLiteralComponent('\a...') [118-125] ['\a...'] Alias [132-136] [as a] Identifier(a) [135-136] [a] SelectColumn [144-162] ['\b...' as b] - StringLiteral('\b...') [144-151] ['\b...'] + StringLiteral [144-151] ['\b...'] + StringLiteralComponent('\b...') [144-151] ['\b...'] Alias [158-162] [as b] Identifier(b) [161-162] [b] SelectColumn [170-188] ['\f...' as f] - StringLiteral('\f...') [170-177] ['\f...'] + StringLiteral [170-177] ['\f...'] + StringLiteralComponent('\f...') [170-177] ['\f...'] Alias [184-188] [as f] Identifier(f) [187-188] [f] SelectColumn [196-214] ['\n...' as n] - StringLiteral('\n...') [196-203] ['\n...'] + StringLiteral [196-203] ['\n...'] + StringLiteralComponent('\n...') [196-203] ['\n...'] Alias [210-214] [as n] Identifier(n) [213-214] [n] SelectColumn [222-240] ['\r...' as r] - StringLiteral('\r...') [222-229] ['\r...'] + StringLiteral [222-229] ['\r...'] + StringLiteralComponent('\r...') [222-229] ['\r...'] Alias [236-240] [as r] Identifier(r) [239-240] [r] SelectColumn [248-266] ['\t...' as t] - StringLiteral('\t...') [248-255] ['\t...'] + StringLiteral [248-255] ['\t...'] + StringLiteralComponent('\t...') [248-255] ['\t...'] Alias [262-266] [as t] Identifier(t) [265-266] [t] SelectColumn [274-292] ['\v...' as v] - StringLiteral('\v...') [274-281] ['\v...'] + StringLiteral [274-281] ['\v...'] + StringLiteralComponent('\v...') [274-281] ['\v...'] Alias [288-292] [as v] Identifier(v) [291-292] [v] SelectColumn [300-325] ['\\...' 
as backslash] - StringLiteral('\\...') [300-307] ['\\...'] + StringLiteral [300-307] ['\\...'] + StringLiteralComponent('\\...') [300-307] ['\\...'] Alias [313-325] [as backslash] Identifier(backslash) [316-325] [backslash] SelectColumn [333-358] ['\?...' as question] - StringLiteral('\?...') [333-340] ['\?...'] + StringLiteral [333-340] ['\?...'] + StringLiteralComponent('\?...') [333-340] ['\?...'] Alias [347-358] [as question] Identifier(question) [350-358] [question] SelectColumn [366-402] ['\"...'...single_double_quote] - StringLiteral('\"...') [366-373] ['\"...'] + StringLiteral [366-373] ['\"...'] + StringLiteralComponent('\"...') [366-373] ['\"...'] Alias [380-402] [as single_double_quote] Identifier(single_double_quote) [383-402] [single_double_quote] SelectColumn [410-446] ['\'...'...single_single_quote] - StringLiteral('\'...') [410-417] ['\'...'] + StringLiteral [410-417] ['\'...'] + StringLiteralComponent('\'...') [410-417] ['\'...'] Alias [424-446] [as single_single_quote] Identifier(single_single_quote) [427-446] [single_single_quote] SelectColumn [454-487] ['\`...'...single_back_tick] - StringLiteral('\`...') [454-461] ['\`...'] + StringLiteral [454-461] ['\`...'] + StringLiteralComponent('\`...') [454-461] ['\`...'] Alias [468-487] [as single_back_tick] Identifier(single_back_tick) [471-487] [single_back_tick] SelectColumn [495-531] ["\"..."...double_double_quote] - StringLiteral("\"...") [495-502] ["\"..."] + StringLiteral [495-502] ["\"..."] + StringLiteralComponent("\"...") [495-502] ["\"..."] Alias [509-531] [as double_double_quote] Identifier(double_double_quote) [512-531] [double_double_quote] SelectColumn [539-575] ["\'..."...double_single_quote] - StringLiteral("\'...") [539-546] ["\'..."] + StringLiteral [539-546] ["\'..."] + StringLiteralComponent("\'...") [539-546] ["\'..."] Alias [553-575] [as double_single_quote] Identifier(double_single_quote) [556-575] [double_single_quote] SelectColumn [583-616] ["\`..."...double_back_tick] - 
StringLiteral("\`...") [583-590] ["\`..."] + StringLiteral [583-590] ["\`..."] + StringLiteralComponent("\`...") [583-590] ["\`..."] Alias [597-616] [as double_back_tick] Identifier(double_back_tick) [600-616] [double_back_tick] SelectColumn [624-656] ['\uabcd'...short_UTF8_char] - StringLiteral('\uabcd') [624-632] ['\uabcd'] + StringLiteral [624-632] ['\uabcd'] + StringLiteralComponent('\uabcd') [624-632] ['\uabcd'] Alias [638-656] [as short_UTF8_char] Identifier(short_UTF8_char) [641-656] [short_UTF8_char] SelectColumn [664-695] ['\U0010FFFF...long_UTF8_char] - StringLiteral('\U0010FFFF') [664-676] ['\U0010FFFF'] + StringLiteral [664-676] ['\U0010FFFF'] + StringLiteralComponent('\U0010FFFF') [664-676] ['\U0010FFFF'] Alias [678-695] [as long_UTF8_char] Identifier(long_UTF8_char) [681-695] [long_UTF8_char] -- @@ -556,7 +598,8 @@ QueryStatement [0-16] [select 'ab\x41g'] Select [0-16] [select 'ab\x41g'] SelectList [7-16] ['ab\x41g'] SelectColumn [7-16] ['ab\x41g'] - StringLiteral('ab\x41g') [7-16] ['ab\x41g'] + StringLiteral [7-16] ['ab\x41g'] + StringLiteralComponent('ab\x41g') [7-16] ['ab\x41g'] -- SELECT 'ab\x41g' @@ -570,7 +613,8 @@ QueryStatement [0-23] [select 'ab\012\345\067'] Select [0-23] [select 'ab\012\345\067'] SelectList [7-23] ['ab\012\345\067'] SelectColumn [7-23] ['ab\012\345\067'] - StringLiteral('ab\012\345\067') [7-23] ['ab\012\345\067'] + StringLiteral [7-23] ['ab\012\345\067'] + StringLiteralComponent('ab\012\345\067') [7-23] ['ab\012\345\067'] -- SELECT 'ab\012\345\067' @@ -584,9 +628,11 @@ QueryStatement [0-27] [select 'ab\000A', 'ab\377B'] Select [0-27] [select 'ab\000A', 'ab\377B'] SelectList [7-27] ['ab\000A', 'ab\377B'] SelectColumn [7-16] ['ab\000A'] - StringLiteral('ab\000A') [7-16] ['ab\000A'] + StringLiteral [7-16] ['ab\000A'] + StringLiteralComponent('ab\000A') [7-16] ['ab\000A'] SelectColumn [18-27] ['ab\377B'] - StringLiteral('ab\377B') [18-27] ['ab\377B'] + StringLiteral [18-27] ['ab\377B'] + StringLiteralComponent('ab\377B') 
[18-27] ['ab\377B'] -- SELECT 'ab\000A', @@ -728,7 +774,8 @@ QueryStatement [0-25] [select 'abc\x00\x99\xffd'] Select [0-25] [select 'abc\x00\x99\xffd'] SelectList [7-25] ['abc\x00\x99\xffd'] SelectColumn [7-25] ['abc\x00\x99\xffd'] - StringLiteral('abc\x00\x99\xffd') [7-25] ['abc\x00\x99\xffd'] + StringLiteral [7-25] ['abc\x00\x99\xffd'] + StringLiteralComponent('abc\x00\x99\xffd') [7-25] ['abc\x00\x99\xffd'] -- SELECT 'abc\x00\x99\xffd' @@ -813,7 +860,8 @@ QueryStatement [0-10] [select ""a] Select [0-10] [select ""a] SelectList [7-10] [""a] SelectColumn [7-10] [""a] - StringLiteral("") [7-9] [""] + StringLiteral [7-9] [""] + StringLiteralComponent("") [7-9] [""] Alias [9-10] [a] Identifier(a) [9-10] [a] -- @@ -829,13 +877,17 @@ QueryStatement [0-29] [select '', "", """""", ''''''] Select [0-29] [select '', "", """""", ''''''] SelectList [7-29] ['', "", """""", ''''''] SelectColumn [7-9] [''] - StringLiteral('') [7-9] [''] + StringLiteral [7-9] [''] + StringLiteralComponent('') [7-9] [''] SelectColumn [11-13] [""] - StringLiteral("") [11-13] [""] + StringLiteral [11-13] [""] + StringLiteralComponent("") [11-13] [""] SelectColumn [15-21] [""""""] - StringLiteral("""""") [15-21] [""""""] + StringLiteral [15-21] [""""""] + StringLiteralComponent("""""") [15-21] [""""""] SelectColumn [23-29] [''''''] - StringLiteral('''''') [23-29] [''''''] + StringLiteral [23-29] [''''''] + StringLiteralComponent('''''') [23-29] [''''''] -- SELECT '', @@ -851,17 +903,23 @@ QueryStatement [0-45] [select b'abc...B'`', B"`"] Select [0-45] [select b'abc...B'`', B"`"] SelectList [7-45] [b'abc', B"...B'`', B"`"] SelectColumn [7-13] [b'abc'] - BytesLiteral(b'abc') [7-13] [b'abc'] + BytesLiteral [7-13] [b'abc'] + BytesLiteralComponent(b'abc') [7-13] [b'abc'] SelectColumn [15-21] [B"def"] - BytesLiteral(B"def") [15-21] [B"def"] + BytesLiteral [15-21] [B"def"] + BytesLiteralComponent(B"def") [15-21] [B"def"] SelectColumn [23-27] [B'"'] - BytesLiteral(B'"') [23-27] [B'"'] + BytesLiteral [23-27] 
[B'"'] + BytesLiteralComponent(B'"') [23-27] [B'"'] SelectColumn [29-33] [B"'"] - BytesLiteral(B"'") [29-33] [B"'"] + BytesLiteral [29-33] [B"'"] + BytesLiteralComponent(B"'") [29-33] [B"'"] SelectColumn [35-39] [B'`'] - BytesLiteral(B'`') [35-39] [B'`'] + BytesLiteral [35-39] [B'`'] + BytesLiteralComponent(B'`') [35-39] [B'`'] SelectColumn [41-45] [B"`"] - BytesLiteral(B"`") [41-45] [B"`"] + BytesLiteral [41-45] [B"`"] + BytesLiteralComponent(B"`") [41-45] [B"`"] -- SELECT b'abc', @@ -879,17 +937,23 @@ QueryStatement [0-77] [select b""..., B"""`"""] Select [0-77] [select b""..., B"""`"""] SelectList [7-77] [b"""abc"""..., B"""`"""] SelectColumn [7-17] [b"""abc"""] - BytesLiteral(b"""abc""") [7-17] [b"""abc"""] + BytesLiteral [7-17] [b"""abc"""] + BytesLiteralComponent(b"""abc""") [7-17] [b"""abc"""] SelectColumn [19-29] [B'''def'''] - BytesLiteral(B'''def''') [19-29] [B'''def'''] + BytesLiteral [19-29] [B'''def'''] + BytesLiteralComponent(B'''def''') [19-29] [B'''def'''] SelectColumn [31-45] [B"""'''a'''"""] - BytesLiteral(B"""'''a'''""") [31-45] [B"""'''a'''"""] + BytesLiteral [31-45] [B"""'''a'''"""] + BytesLiteralComponent(B"""'''a'''""") [31-45] [B"""'''a'''"""] SelectColumn [47-57] [b'''"a"'''] - BytesLiteral(b'''"a"''') [47-57] [b'''"a"'''] + BytesLiteral [47-57] [b'''"a"'''] + BytesLiteralComponent(b'''"a"''') [47-57] [b'''"a"'''] SelectColumn [59-67] [B'''`'''] - BytesLiteral(B'''`''') [59-67] [B'''`'''] + BytesLiteral [59-67] [B'''`'''] + BytesLiteralComponent(B'''`''') [59-67] [B'''`'''] SelectColumn [69-77] [B"""`"""] - BytesLiteral(B"""`""") [69-77] [B"""`"""] + BytesLiteral [69-77] [B"""`"""] + BytesLiteralComponent(B"""`""") [69-77] [B"""`"""] -- SELECT b"""abc""", @@ -923,71 +987,88 @@ QueryStatement [0-509] [select b'\...double_back_tick] Select [0-509] [select b'\...double_back_tick] SelectList [7-509] [b'\x53'...double_back_tick] SelectColumn [7-30] [b'\x53' as OneHexByte] - BytesLiteral(b'\x53') [7-14] [b'\x53'] + BytesLiteral [7-14] [b'\x53'] + 
BytesLiteralComponent(b'\x53') [7-14] [b'\x53'] Alias [17-30] [as OneHexByte] Identifier(OneHexByte) [20-30] [OneHexByte] SelectColumn [38-63] [b'\001' as OneOctalByte] - BytesLiteral(b'\001') [38-45] [b'\001'] + BytesLiteral [38-45] [b'\001'] + BytesLiteralComponent(b'\001') [38-45] [b'\001'] Alias [48-63] [as OneOctalByte] Identifier(OneOctalByte) [51-63] [OneOctalByte] SelectColumn [71-85] [b'\a...' as a] - BytesLiteral(b'\a...') [71-79] [b'\a...'] + BytesLiteral [71-79] [b'\a...'] + BytesLiteralComponent(b'\a...') [71-79] [b'\a...'] Alias [81-85] [as a] Identifier(a) [84-85] [a] SelectColumn [93-107] [b'\b...' as b] - BytesLiteral(b'\b...') [93-101] [b'\b...'] + BytesLiteral [93-101] [b'\b...'] + BytesLiteralComponent(b'\b...') [93-101] [b'\b...'] Alias [103-107] [as b] Identifier(b) [106-107] [b] SelectColumn [115-129] [b'\f...' as f] - BytesLiteral(b'\f...') [115-123] [b'\f...'] + BytesLiteral [115-123] [b'\f...'] + BytesLiteralComponent(b'\f...') [115-123] [b'\f...'] Alias [125-129] [as f] Identifier(f) [128-129] [f] SelectColumn [137-151] [b'\n...' as n] - BytesLiteral(b'\n...') [137-145] [b'\n...'] + BytesLiteral [137-145] [b'\n...'] + BytesLiteralComponent(b'\n...') [137-145] [b'\n...'] Alias [147-151] [as n] Identifier(n) [150-151] [n] SelectColumn [159-173] [b'\r...' as r] - BytesLiteral(b'\r...') [159-167] [b'\r...'] + BytesLiteral [159-167] [b'\r...'] + BytesLiteralComponent(b'\r...') [159-167] [b'\r...'] Alias [169-173] [as r] Identifier(r) [172-173] [r] SelectColumn [181-195] [b'\t...' as t] - BytesLiteral(b'\t...') [181-189] [b'\t...'] + BytesLiteral [181-189] [b'\t...'] + BytesLiteralComponent(b'\t...') [181-189] [b'\t...'] Alias [191-195] [as t] Identifier(t) [194-195] [t] SelectColumn [203-217] [b'\v...' as v] - BytesLiteral(b'\v...') [203-211] [b'\v...'] + BytesLiteral [203-211] [b'\v...'] + BytesLiteralComponent(b'\v...') [203-211] [b'\v...'] Alias [213-217] [as v] Identifier(v) [216-217] [v] SelectColumn [225-246] [b'\\...' 
as backslash] - BytesLiteral(b'\\...') [225-233] [b'\\...'] + BytesLiteral [225-233] [b'\\...'] + BytesLiteralComponent(b'\\...') [225-233] [b'\\...'] Alias [234-246] [as backslash] Identifier(backslash) [237-246] [backslash] SelectColumn [254-275] [b'\?...' as question] - BytesLiteral(b'\?...') [254-262] [b'\?...'] + BytesLiteral [254-262] [b'\?...'] + BytesLiteralComponent(b'\?...') [254-262] [b'\?...'] Alias [264-275] [as question] Identifier(question) [267-275] [question] SelectColumn [283-315] [b'\"...'...single_double_quote] - BytesLiteral(b'\"...') [283-291] [b'\"...'] + BytesLiteral [283-291] [b'\"...'] + BytesLiteralComponent(b'\"...') [283-291] [b'\"...'] Alias [293-315] [as single_double_quote] Identifier(single_double_quote) [296-315] [single_double_quote] SelectColumn [323-355] [b'\'...'...single_single_quote] - BytesLiteral(b'\'...') [323-331] [b'\'...'] + BytesLiteral [323-331] [b'\'...'] + BytesLiteralComponent(b'\'...') [323-331] [b'\'...'] Alias [333-355] [as single_single_quote] Identifier(single_single_quote) [336-355] [single_single_quote] SelectColumn [363-392] [b'\`...' as single_back_tick] - BytesLiteral(b'\`...') [363-371] [b'\`...'] + BytesLiteral [363-371] [b'\`...'] + BytesLiteralComponent(b'\`...') [363-371] [b'\`...'] Alias [373-392] [as single_back_tick] Identifier(single_back_tick) [376-392] [single_back_tick] SelectColumn [400-432] [b"\"..."...double_double_quote] - BytesLiteral(b"\"...") [400-408] [b"\"..."] + BytesLiteral [400-408] [b"\"..."] + BytesLiteralComponent(b"\"...") [400-408] [b"\"..."] Alias [410-432] [as double_double_quote] Identifier(double_double_quote) [413-432] [double_double_quote] SelectColumn [440-472] [b"\'..."...double_single_quote] - BytesLiteral(b"\'...") [440-448] [b"\'..."] + BytesLiteral [440-448] [b"\'..."] + BytesLiteralComponent(b"\'...") [440-448] [b"\'..."] Alias [450-472] [as double_single_quote] Identifier(double_single_quote) [453-472] [double_single_quote] SelectColumn [480-509] [b"\`..." 
as double_back_tick] - BytesLiteral(b"\`...") [480-488] [b"\`..."] + BytesLiteral [480-488] [b"\`..."] + BytesLiteralComponent(b"\`...") [480-488] [b"\`..."] Alias [490-509] [as double_back_tick] Identifier(double_back_tick) [493-509] [double_back_tick] -- @@ -1034,71 +1115,88 @@ QueryStatement [0-577] [select b''...double_back_tick] Select [0-577] [select b''...double_back_tick] SelectList [7-577] [b'''\x53''...double_back_tick] SelectColumn [7-34] [b'''\x53''' as OneHexByte] - BytesLiteral(b'''\x53''') [7-18] [b'''\x53'''] + BytesLiteral [7-18] [b'''\x53'''] + BytesLiteralComponent(b'''\x53''') [7-18] [b'''\x53'''] Alias [21-34] [as OneHexByte] Identifier(OneHexByte) [24-34] [OneHexByte] SelectColumn [42-71] [b'''\001''' as OneOctalByte] - BytesLiteral(b'''\001''') [42-53] [b'''\001'''] + BytesLiteral [42-53] [b'''\001'''] + BytesLiteralComponent(b'''\001''') [42-53] [b'''\001'''] Alias [56-71] [as OneOctalByte] Identifier(OneOctalByte) [59-71] [OneOctalByte] SelectColumn [79-97] [b'''\a...''' as a] - BytesLiteral(b'''\a...''') [79-91] [b'''\a...'''] + BytesLiteral [79-91] [b'''\a...'''] + BytesLiteralComponent(b'''\a...''') [79-91] [b'''\a...'''] Alias [93-97] [as a] Identifier(a) [96-97] [a] SelectColumn [105-123] [b'''\b...''' as b] - BytesLiteral(b'''\b...''') [105-117] [b'''\b...'''] + BytesLiteral [105-117] [b'''\b...'''] + BytesLiteralComponent(b'''\b...''') [105-117] [b'''\b...'''] Alias [119-123] [as b] Identifier(b) [122-123] [b] SelectColumn [131-149] [b'''\f...''' as f] - BytesLiteral(b'''\f...''') [131-143] [b'''\f...'''] + BytesLiteral [131-143] [b'''\f...'''] + BytesLiteralComponent(b'''\f...''') [131-143] [b'''\f...'''] Alias [145-149] [as f] Identifier(f) [148-149] [f] SelectColumn [157-175] [b'''\n...''' as n] - BytesLiteral(b'''\n...''') [157-169] [b'''\n...'''] + BytesLiteral [157-169] [b'''\n...'''] + BytesLiteralComponent(b'''\n...''') [157-169] [b'''\n...'''] Alias [171-175] [as n] Identifier(n) [174-175] [n] SelectColumn [183-201] 
[b'''\r...''' as r] - BytesLiteral(b'''\r...''') [183-195] [b'''\r...'''] + BytesLiteral [183-195] [b'''\r...'''] + BytesLiteralComponent(b'''\r...''') [183-195] [b'''\r...'''] Alias [197-201] [as r] Identifier(r) [200-201] [r] SelectColumn [209-227] [b'''\t...''' as t] - BytesLiteral(b'''\t...''') [209-221] [b'''\t...'''] + BytesLiteral [209-221] [b'''\t...'''] + BytesLiteralComponent(b'''\t...''') [209-221] [b'''\t...'''] Alias [223-227] [as t] Identifier(t) [226-227] [t] SelectColumn [235-253] [b'''\v...''' as v] - BytesLiteral(b'''\v...''') [235-247] [b'''\v...'''] + BytesLiteral [235-247] [b'''\v...'''] + BytesLiteralComponent(b'''\v...''') [235-247] [b'''\v...'''] Alias [249-253] [as v] Identifier(v) [252-253] [v] SelectColumn [261-286] [b'''\\...''' as backslash] - BytesLiteral(b'''\\...''') [261-273] [b'''\\...'''] + BytesLiteral [261-273] [b'''\\...'''] + BytesLiteralComponent(b'''\\...''') [261-273] [b'''\\...'''] Alias [274-286] [as backslash] Identifier(backslash) [277-286] [backslash] SelectColumn [294-319] [b'''\?...''' as question] - BytesLiteral(b'''\?...''') [294-306] [b'''\?...'''] + BytesLiteral [294-306] [b'''\?...'''] + BytesLiteralComponent(b'''\?...''') [294-306] [b'''\?...'''] Alias [308-319] [as question] Identifier(question) [311-319] [question] SelectColumn [327-363] [b'''\"...'...ngle_double_quote] - BytesLiteral(b'''\"...''') [327-339] [b'''\"...'''] + BytesLiteral [327-339] [b'''\"...'''] + BytesLiteralComponent(b'''\"...''') [327-339] [b'''\"...'''] Alias [341-363] [as single_double_quote] Identifier(single_double_quote) [344-363] [single_double_quote] SelectColumn [371-407] [b'''\'...'...ngle_single_quote] - BytesLiteral(b'''\'...''') [371-383] [b'''\'...'''] + BytesLiteral [371-383] [b'''\'...'''] + BytesLiteralComponent(b'''\'...''') [371-383] [b'''\'...'''] Alias [385-407] [as single_single_quote] Identifier(single_single_quote) [388-407] [single_single_quote] SelectColumn [415-448] [b'''\`...'...single_back_tick] - 
BytesLiteral(b'''\`...''') [415-427] [b'''\`...'''] + BytesLiteral [415-427] [b'''\`...'''] + BytesLiteralComponent(b'''\`...''') [415-427] [b'''\`...'''] Alias [429-448] [as single_back_tick] Identifier(single_back_tick) [432-448] [single_back_tick] SelectColumn [456-492] [b"""\"..."...uble_double_quote] - BytesLiteral(b"""\"...""") [456-468] [b"""\"..."""] + BytesLiteral [456-468] [b"""\"..."""] + BytesLiteralComponent(b"""\"...""") [456-468] [b"""\"..."""] Alias [470-492] [as double_double_quote] Identifier(double_double_quote) [473-492] [double_double_quote] SelectColumn [500-536] [b"""\'..."...uble_single_quote] - BytesLiteral(b"""\'...""") [500-512] [b"""\'..."""] + BytesLiteral [500-512] [b"""\'..."""] + BytesLiteralComponent(b"""\'...""") [500-512] [b"""\'..."""] Alias [514-536] [as double_single_quote] Identifier(double_single_quote) [517-536] [double_single_quote] SelectColumn [544-577] [b"""\`..."...double_back_tick] - BytesLiteral(b"""\`...""") [544-556] [b"""\`..."""] + BytesLiteral [544-556] [b"""\`..."""] + BytesLiteralComponent(b"""\`...""") [544-556] [b"""\`..."""] Alias [558-577] [as double_back_tick] Identifier(double_back_tick) [561-577] [double_back_tick] -- @@ -1194,7 +1292,8 @@ QueryStatement [0-32] [select b''......'''] Select [0-32] [select b''......'''] SelectList [7-32] [b'''... ...'''] SelectColumn [7-32] [b'''... ...'''] - BytesLiteral(b'''... + BytesLiteral [7-32] [b'''... ...'''] + BytesLiteralComponent(b'''... ...''') [7-32] [b'''... 
...'''] -- SELECT @@ -1210,13 +1309,17 @@ QueryStatement [0-33] [select b''...', B""""""] Select [0-33] [select b''...', B""""""] SelectList [7-33] [b'', B"", b'''''', B""""""] SelectColumn [7-10] [b''] - BytesLiteral(b'') [7-10] [b''] + BytesLiteral [7-10] [b''] + BytesLiteralComponent(b'') [7-10] [b''] SelectColumn [12-15] [B""] - BytesLiteral(B"") [12-15] [B""] + BytesLiteral [12-15] [B""] + BytesLiteralComponent(B"") [12-15] [B""] SelectColumn [17-24] [b''''''] - BytesLiteral(b'''''') [17-24] [b''''''] + BytesLiteral [17-24] [b''''''] + BytesLiteralComponent(b'''''') [17-24] [b''''''] SelectColumn [26-33] [B""""""] - BytesLiteral(B"""""") [26-33] [B""""""] + BytesLiteral [26-33] [B""""""] + BytesLiteralComponent(B"""""") [26-33] [B""""""] -- SELECT b'', @@ -1232,9 +1335,11 @@ QueryStatement [0-33] [select b''...''abc\\'''] Select [0-33] [select b''...''abc\\'''] SelectList [7-33] [b'''abc\'''', B'''abc\\'''] SelectColumn [7-19] [b'''abc\''''] - BytesLiteral(b'''abc\'''') [7-19] [b'''abc\''''] + BytesLiteral [7-19] [b'''abc\''''] + BytesLiteralComponent(b'''abc\'''') [7-19] [b'''abc\''''] SelectColumn [21-33] [B'''abc\\'''] - BytesLiteral(B'''abc\\''') [21-33] [B'''abc\\'''] + BytesLiteral [21-33] [B'''abc\\'''] + BytesLiteralComponent(B'''abc\\''') [21-33] [B'''abc\\'''] -- SELECT b'''abc\'''', @@ -1263,29 +1368,41 @@ QueryStatement [0-165] [select r"1....*),def\?'] Select [0-165] [select r"1....*),def\?'] SelectList [7-165] [r"1",....*),def\?'] SelectColumn [7-11] [r"1"] - StringLiteral(r"1") [7-11] [r"1"] + StringLiteral [7-11] [r"1"] + StringLiteralComponent(r"1") [7-11] [r"1"] SelectColumn [20-27] [r"\x53"] - StringLiteral(r"\x53") [20-27] [r"\x53"] + StringLiteral [20-27] [r"\x53"] + StringLiteralComponent(r"\x53") [20-27] [r"\x53"] SelectColumn [29-37] [r"\x123"] - StringLiteral(r"\x123") [29-37] [r"\x123"] + StringLiteral [29-37] [r"\x123"] + StringLiteralComponent(r"\x123") [29-37] [r"\x123"] SelectColumn [39-46] [r'\001'] - StringLiteral(r'\001') 
[39-46] [r'\001'] + StringLiteral [39-46] [r'\001'] + StringLiteralComponent(r'\001') [39-46] [r'\001'] SelectColumn [48-57] [r'a\444A'] - StringLiteral(r'a\444A') [48-57] [r'a\444A'] + StringLiteral [48-57] [r'a\444A'] + StringLiteralComponent(r'a\444A') [48-57] [r'a\444A'] SelectColumn [66-72] [r'a\e'] - StringLiteral(r'a\e') [66-72] [r'a\e'] + StringLiteral [66-72] [r'a\e'] + StringLiteralComponent(r'a\e') [66-72] [r'a\e'] SelectColumn [74-80] [r'\ea'] - StringLiteral(r'\ea') [74-80] [r'\ea'] + StringLiteral [74-80] [r'\ea'] + StringLiteralComponent(r'\ea') [74-80] [r'\ea'] SelectColumn [89-98] [r"\U1234"] - StringLiteral(r"\U1234") [89-98] [r"\U1234"] + StringLiteral [89-98] [r"\U1234"] + StringLiteralComponent(r"\U1234") [89-98] [r"\U1234"] SelectColumn [100-105] [R"\u"] - StringLiteral(R"\u") [100-105] [R"\u"] + StringLiteral [100-105] [R"\u"] + StringLiteralComponent(R"\u") [100-105] [R"\u"] SelectColumn [114-123] [r'\xc2\\'] - StringLiteral(r'\xc2\\') [114-123] [r'\xc2\\'] + StringLiteral [114-123] [r'\xc2\\'] + StringLiteralComponent(r'\xc2\\') [114-123] [r'\xc2\\'] SelectColumn [125-136] [r'|\xc2|\\'] - StringLiteral(r'|\xc2|\\') [125-136] [r'|\xc2|\\'] + StringLiteral [125-136] [r'|\xc2|\\'] + StringLiteralComponent(r'|\xc2|\\') [125-136] [r'|\xc2|\\'] SelectColumn [145-165] [r'f\(abc,(.*),def\?'] - StringLiteral(r'f\(abc,(.*),def\?') [145-165] [r'f\(abc,(.*),def\?'] + StringLiteral [145-165] [r'f\(abc,(.*),def\?'] + StringLiteralComponent(r'f\(abc,(.*),def\?') [145-165] [r'f\(abc,(.*),def\?'] -- SELECT r"1", @@ -1315,29 +1432,41 @@ QueryStatement [0-177] [select rb"....*),def\?'] Select [0-177] [select rb"....*),def\?'] SelectList [7-177] [rb"1",....*),def\?'] SelectColumn [7-12] [rb"1"] - BytesLiteral(rb"1") [7-12] [rb"1"] + BytesLiteral [7-12] [rb"1"] + BytesLiteralComponent(rb"1") [7-12] [rb"1"] SelectColumn [21-29] [rb"\x53"] - BytesLiteral(rb"\x53") [21-29] [rb"\x53"] + BytesLiteral [21-29] [rb"\x53"] + BytesLiteralComponent(rb"\x53") [21-29] 
[rb"\x53"] SelectColumn [31-40] [rb"\x123"] - BytesLiteral(rb"\x123") [31-40] [rb"\x123"] + BytesLiteral [31-40] [rb"\x123"] + BytesLiteralComponent(rb"\x123") [31-40] [rb"\x123"] SelectColumn [42-50] [rb'\001'] - BytesLiteral(rb'\001') [42-50] [rb'\001'] + BytesLiteral [42-50] [rb'\001'] + BytesLiteralComponent(rb'\001') [42-50] [rb'\001'] SelectColumn [52-62] [rb'a\444A'] - BytesLiteral(rb'a\444A') [52-62] [rb'a\444A'] + BytesLiteral [52-62] [rb'a\444A'] + BytesLiteralComponent(rb'a\444A') [52-62] [rb'a\444A'] SelectColumn [71-78] [rb'a\e'] - BytesLiteral(rb'a\e') [71-78] [rb'a\e'] + BytesLiteral [71-78] [rb'a\e'] + BytesLiteralComponent(rb'a\e') [71-78] [rb'a\e'] SelectColumn [80-87] [rb'\ea'] - BytesLiteral(rb'\ea') [80-87] [rb'\ea'] + BytesLiteral [80-87] [rb'\ea'] + BytesLiteralComponent(rb'\ea') [80-87] [rb'\ea'] SelectColumn [96-106] [rb"\U1234"] - BytesLiteral(rb"\U1234") [96-106] [rb"\U1234"] + BytesLiteral [96-106] [rb"\U1234"] + BytesLiteralComponent(rb"\U1234") [96-106] [rb"\U1234"] SelectColumn [108-114] [RB"\u"] - BytesLiteral(RB"\u") [108-114] [RB"\u"] + BytesLiteral [108-114] [RB"\u"] + BytesLiteralComponent(RB"\u") [108-114] [RB"\u"] SelectColumn [123-133] [rb'\xc2\\'] - BytesLiteral(rb'\xc2\\') [123-133] [rb'\xc2\\'] + BytesLiteral [123-133] [rb'\xc2\\'] + BytesLiteralComponent(rb'\xc2\\') [123-133] [rb'\xc2\\'] SelectColumn [135-147] [rb'|\xc2|\\'] - BytesLiteral(rb'|\xc2|\\') [135-147] [rb'|\xc2|\\'] + BytesLiteral [135-147] [rb'|\xc2|\\'] + BytesLiteralComponent(rb'|\xc2|\\') [135-147] [rb'|\xc2|\\'] SelectColumn [156-177] [rb'f\(abc,(.*),def\?'] - BytesLiteral(rb'f\(abc,(.*),def\?') [156-177] [rb'f\(abc,(.*),def\?'] + BytesLiteral [156-177] [rb'f\(abc,(.*),def\?'] + BytesLiteralComponent(rb'f\(abc,(.*),def\?') [156-177] [rb'f\(abc,(.*),def\?'] -- SELECT rb"1", @@ -1367,29 +1496,41 @@ QueryStatement [0-225] [select rb"...),def\?'''] Select [0-225] [select rb"...),def\?'''] SelectList [7-225] [rb"""1""",...),def\?'''] SelectColumn [7-16] 
[rb"""1"""] - BytesLiteral(rb"""1""") [7-16] [rb"""1"""] + BytesLiteral [7-16] [rb"""1"""] + BytesLiteralComponent(rb"""1""") [7-16] [rb"""1"""] SelectColumn [25-37] [rb"""\x53"""] - BytesLiteral(rb"""\x53""") [25-37] [rb"""\x53"""] + BytesLiteral [25-37] [rb"""\x53"""] + BytesLiteralComponent(rb"""\x53""") [25-37] [rb"""\x53"""] SelectColumn [39-52] [rb"""\x123"""] - BytesLiteral(rb"""\x123""") [39-52] [rb"""\x123"""] + BytesLiteral [39-52] [rb"""\x123"""] + BytesLiteralComponent(rb"""\x123""") [39-52] [rb"""\x123"""] SelectColumn [54-66] [rb'''\001'''] - BytesLiteral(rb'''\001''') [54-66] [rb'''\001'''] + BytesLiteral [54-66] [rb'''\001'''] + BytesLiteralComponent(rb'''\001''') [54-66] [rb'''\001'''] SelectColumn [68-82] [rb'''a\444A'''] - BytesLiteral(rb'''a\444A''') [68-82] [rb'''a\444A'''] + BytesLiteral [68-82] [rb'''a\444A'''] + BytesLiteralComponent(rb'''a\444A''') [68-82] [rb'''a\444A'''] SelectColumn [91-102] [rb'''a\e'''] - BytesLiteral(rb'''a\e''') [91-102] [rb'''a\e'''] + BytesLiteral [91-102] [rb'''a\e'''] + BytesLiteralComponent(rb'''a\e''') [91-102] [rb'''a\e'''] SelectColumn [104-115] [rb'''\ea'''] - BytesLiteral(rb'''\ea''') [104-115] [rb'''\ea'''] + BytesLiteral [104-115] [rb'''\ea'''] + BytesLiteralComponent(rb'''\ea''') [104-115] [rb'''\ea'''] SelectColumn [124-138] [rb"""\U1234"""] - BytesLiteral(rb"""\U1234""") [124-138] [rb"""\U1234"""] + BytesLiteral [124-138] [rb"""\U1234"""] + BytesLiteralComponent(rb"""\U1234""") [124-138] [rb"""\U1234"""] SelectColumn [140-150] [RB"""\u"""] - BytesLiteral(RB"""\u""") [140-150] [RB"""\u"""] + BytesLiteral [140-150] [RB"""\u"""] + BytesLiteralComponent(RB"""\u""") [140-150] [RB"""\u"""] SelectColumn [159-173] [rb'''\xc2\\'''] - BytesLiteral(rb'''\xc2\\''') [159-173] [rb'''\xc2\\'''] + BytesLiteral [159-173] [rb'''\xc2\\'''] + BytesLiteralComponent(rb'''\xc2\\''') [159-173] [rb'''\xc2\\'''] SelectColumn [175-191] [rb'''|\xc2|\\'''] - BytesLiteral(rb'''|\xc2|\\''') [175-191] [rb'''|\xc2|\\'''] + 
BytesLiteral [175-191] [rb'''|\xc2|\\'''] + BytesLiteralComponent(rb'''|\xc2|\\''') [175-191] [rb'''|\xc2|\\'''] SelectColumn [200-225] [rb'''f\(abc,(.*),def\?'''] - BytesLiteral(rb'''f\(abc,(.*),def\?''') [200-225] [rb'''f\(abc,(.*),def\?'''] + BytesLiteral [200-225] [rb'''f\(abc,(.*),def\?'''] + BytesLiteralComponent(rb'''f\(abc,(.*),def\?''') [200-225] [rb'''f\(abc,(.*),def\?'''] -- SELECT rb"""1""", @@ -1419,29 +1560,41 @@ QueryStatement [0-177] [select br"....*),def\?'] Select [0-177] [select br"....*),def\?'] SelectList [7-177] [br"1",....*),def\?'] SelectColumn [7-12] [br"1"] - BytesLiteral(br"1") [7-12] [br"1"] + BytesLiteral [7-12] [br"1"] + BytesLiteralComponent(br"1") [7-12] [br"1"] SelectColumn [21-29] [br"\x53"] - BytesLiteral(br"\x53") [21-29] [br"\x53"] + BytesLiteral [21-29] [br"\x53"] + BytesLiteralComponent(br"\x53") [21-29] [br"\x53"] SelectColumn [31-40] [br"\x123"] - BytesLiteral(br"\x123") [31-40] [br"\x123"] + BytesLiteral [31-40] [br"\x123"] + BytesLiteralComponent(br"\x123") [31-40] [br"\x123"] SelectColumn [42-50] [br'\001'] - BytesLiteral(br'\001') [42-50] [br'\001'] + BytesLiteral [42-50] [br'\001'] + BytesLiteralComponent(br'\001') [42-50] [br'\001'] SelectColumn [52-62] [br'a\444A'] - BytesLiteral(br'a\444A') [52-62] [br'a\444A'] + BytesLiteral [52-62] [br'a\444A'] + BytesLiteralComponent(br'a\444A') [52-62] [br'a\444A'] SelectColumn [71-78] [br'a\e'] - BytesLiteral(br'a\e') [71-78] [br'a\e'] + BytesLiteral [71-78] [br'a\e'] + BytesLiteralComponent(br'a\e') [71-78] [br'a\e'] SelectColumn [80-87] [br'\ea'] - BytesLiteral(br'\ea') [80-87] [br'\ea'] + BytesLiteral [80-87] [br'\ea'] + BytesLiteralComponent(br'\ea') [80-87] [br'\ea'] SelectColumn [96-106] [br"\U1234"] - BytesLiteral(br"\U1234") [96-106] [br"\U1234"] + BytesLiteral [96-106] [br"\U1234"] + BytesLiteralComponent(br"\U1234") [96-106] [br"\U1234"] SelectColumn [108-114] [BR"\u"] - BytesLiteral(BR"\u") [108-114] [BR"\u"] + BytesLiteral [108-114] [BR"\u"] + 
BytesLiteralComponent(BR"\u") [108-114] [BR"\u"] SelectColumn [123-133] [br'\xc2\\'] - BytesLiteral(br'\xc2\\') [123-133] [br'\xc2\\'] + BytesLiteral [123-133] [br'\xc2\\'] + BytesLiteralComponent(br'\xc2\\') [123-133] [br'\xc2\\'] SelectColumn [135-147] [br'|\xc2|\\'] - BytesLiteral(br'|\xc2|\\') [135-147] [br'|\xc2|\\'] + BytesLiteral [135-147] [br'|\xc2|\\'] + BytesLiteralComponent(br'|\xc2|\\') [135-147] [br'|\xc2|\\'] SelectColumn [156-177] [br'f\(abc,(.*),def\?'] - BytesLiteral(br'f\(abc,(.*),def\?') [156-177] [br'f\(abc,(.*),def\?'] + BytesLiteral [156-177] [br'f\(abc,(.*),def\?'] + BytesLiteralComponent(br'f\(abc,(.*),def\?') [156-177] [br'f\(abc,(.*),def\?'] -- SELECT br"1", @@ -1471,29 +1624,41 @@ QueryStatement [0-225] [select br"...),def\?'''] Select [0-225] [select br"...),def\?'''] SelectList [7-225] [br"""1""",...),def\?'''] SelectColumn [7-16] [br"""1"""] - BytesLiteral(br"""1""") [7-16] [br"""1"""] + BytesLiteral [7-16] [br"""1"""] + BytesLiteralComponent(br"""1""") [7-16] [br"""1"""] SelectColumn [25-37] [br"""\x53"""] - BytesLiteral(br"""\x53""") [25-37] [br"""\x53"""] + BytesLiteral [25-37] [br"""\x53"""] + BytesLiteralComponent(br"""\x53""") [25-37] [br"""\x53"""] SelectColumn [39-52] [br"""\x123"""] - BytesLiteral(br"""\x123""") [39-52] [br"""\x123"""] + BytesLiteral [39-52] [br"""\x123"""] + BytesLiteralComponent(br"""\x123""") [39-52] [br"""\x123"""] SelectColumn [54-66] [br'''\001'''] - BytesLiteral(br'''\001''') [54-66] [br'''\001'''] + BytesLiteral [54-66] [br'''\001'''] + BytesLiteralComponent(br'''\001''') [54-66] [br'''\001'''] SelectColumn [68-82] [br'''a\444A'''] - BytesLiteral(br'''a\444A''') [68-82] [br'''a\444A'''] + BytesLiteral [68-82] [br'''a\444A'''] + BytesLiteralComponent(br'''a\444A''') [68-82] [br'''a\444A'''] SelectColumn [91-102] [br'''a\e'''] - BytesLiteral(br'''a\e''') [91-102] [br'''a\e'''] + BytesLiteral [91-102] [br'''a\e'''] + BytesLiteralComponent(br'''a\e''') [91-102] [br'''a\e'''] SelectColumn [104-115] 
[br'''\ea'''] - BytesLiteral(br'''\ea''') [104-115] [br'''\ea'''] + BytesLiteral [104-115] [br'''\ea'''] + BytesLiteralComponent(br'''\ea''') [104-115] [br'''\ea'''] SelectColumn [124-138] [br"""\U1234"""] - BytesLiteral(br"""\U1234""") [124-138] [br"""\U1234"""] + BytesLiteral [124-138] [br"""\U1234"""] + BytesLiteralComponent(br"""\U1234""") [124-138] [br"""\U1234"""] SelectColumn [140-150] [BR"""\u"""] - BytesLiteral(BR"""\u""") [140-150] [BR"""\u"""] + BytesLiteral [140-150] [BR"""\u"""] + BytesLiteralComponent(BR"""\u""") [140-150] [BR"""\u"""] SelectColumn [159-173] [br'''\xc2\\'''] - BytesLiteral(br'''\xc2\\''') [159-173] [br'''\xc2\\'''] + BytesLiteral [159-173] [br'''\xc2\\'''] + BytesLiteralComponent(br'''\xc2\\''') [159-173] [br'''\xc2\\'''] SelectColumn [175-191] [br'''|\xc2|\\'''] - BytesLiteral(br'''|\xc2|\\''') [175-191] [br'''|\xc2|\\'''] + BytesLiteral [175-191] [br'''|\xc2|\\'''] + BytesLiteralComponent(br'''|\xc2|\\''') [175-191] [br'''|\xc2|\\'''] SelectColumn [200-225] [br'''f\(abc,(.*),def\?'''] - BytesLiteral(br'''f\(abc,(.*),def\?''') [200-225] [br'''f\(abc,(.*),def\?'''] + BytesLiteral [200-225] [br'''f\(abc,(.*),def\?'''] + BytesLiteralComponent(br'''f\(abc,(.*),def\?''') [200-225] [br'''f\(abc,(.*),def\?'''] -- SELECT br"""1""", @@ -1532,9 +1697,11 @@ QueryStatement [0-21] [select r'\\', r'\\\\'] Select [0-21] [select r'\\', r'\\\\'] SelectList [7-21] [r'\\', r'\\\\'] SelectColumn [7-12] [r'\\'] - StringLiteral(r'\\') [7-12] [r'\\'] + StringLiteral [7-12] [r'\\'] + StringLiteralComponent(r'\\') [7-12] [r'\\'] SelectColumn [14-21] [r'\\\\'] - StringLiteral(r'\\\\') [14-21] [r'\\\\'] + StringLiteral [14-21] [r'\\\\'] + StringLiteralComponent(r'\\\\') [14-21] [r'\\\\'] -- SELECT r'\\', @@ -1577,21 +1744,29 @@ QueryStatement [0-127] [select r'a..."a""\"b"""] Select [0-127] [select r'a..."a""\"b"""] SelectList [7-127] [r'a\'b', r..."a""\"b"""] SelectColumn [7-14] [r'a\'b'] - StringLiteral(r'a\'b') [7-14] [r'a\'b'] + StringLiteral [7-14] 
[r'a\'b'] + StringLiteralComponent(r'a\'b') [7-14] [r'a\'b'] SelectColumn [16-23] [r"a\"b"] - StringLiteral(r"a\"b") [16-23] [r"a\"b"] + StringLiteral [16-23] [r"a\"b"] + StringLiteralComponent(r"a\"b") [16-23] [r"a\"b"] SelectColumn [32-45] [r'''a\'''b'''] - StringLiteral(r'''a\'''b''') [32-45] [r'''a\'''b'''] + StringLiteral [32-45] [r'''a\'''b'''] + StringLiteralComponent(r'''a\'''b''') [32-45] [r'''a\'''b'''] SelectColumn [47-60] [r'''a'\''b'''] - StringLiteral(r'''a'\''b''') [47-60] [r'''a'\''b'''] + StringLiteral [47-60] [r'''a'\''b'''] + StringLiteralComponent(r'''a'\''b''') [47-60] [r'''a'\''b'''] SelectColumn [62-75] [r'''a''\'b'''] - StringLiteral(r'''a''\'b''') [62-75] [r'''a''\'b'''] + StringLiteral [62-75] [r'''a''\'b'''] + StringLiteralComponent(r'''a''\'b''') [62-75] [r'''a''\'b'''] SelectColumn [84-97] [r"""a\"""b"""] - StringLiteral(r"""a\"""b""") [84-97] [r"""a\"""b"""] + StringLiteral [84-97] [r"""a\"""b"""] + StringLiteralComponent(r"""a\"""b""") [84-97] [r"""a\"""b"""] SelectColumn [99-112] [r"""a"\""b"""] - StringLiteral(r"""a"\""b""") [99-112] [r"""a"\""b"""] + StringLiteral [99-112] [r"""a"\""b"""] + StringLiteralComponent(r"""a"\""b""") [99-112] [r"""a"\""b"""] SelectColumn [114-127] [r"""a""\"b"""] - StringLiteral(r"""a""\"b""") [114-127] [r"""a""\"b"""] + StringLiteral [114-127] [r"""a""\"b"""] + StringLiteralComponent(r"""a""\"b""") [114-127] [r"""a""\"b"""] -- SELECT r'a\'b', @@ -1646,22 +1821,28 @@ QueryStatement [0-213] [select r''......'''] Select [0-213] [select r''......'''] SelectList [7-213] [r'''.........'''] SelectColumn [7-32] [r'''... ...'''] - StringLiteral(r'''... + StringLiteral [7-32] [r'''... ...'''] + StringLiteralComponent(r'''... ...''') [7-32] [r'''... ...'''] SelectColumn [41-67] [r'''...\ ...'''] - StringLiteral(r'''...\ + StringLiteral [41-67] [r'''...\ ...'''] + StringLiteralComponent(r'''...\ ...''') [41-67] [r'''...\ ...'''] SelectColumn [76-103] [br'''... ...'''] - BytesLiteral(br'''... 
+ BytesLiteral [76-103] [br'''... ...'''] + BytesLiteralComponent(br'''... ...''') [76-103] [br'''... ...'''] SelectColumn [112-140] [br'''...\ ...'''] - BytesLiteral(br'''...\ + BytesLiteral [112-140] [br'''...\ ...'''] + BytesLiteralComponent(br'''...\ ...''') [112-140] [br'''...\ ...'''] SelectColumn [149-176] [rb'''... ...'''] - BytesLiteral(rb'''... + BytesLiteral [149-176] [rb'''... ...'''] + BytesLiteralComponent(rb'''... ...''') [149-176] [rb'''... ...'''] SelectColumn [185-213] [rb'''...\ ...'''] - BytesLiteral(rb'''...\ + BytesLiteral [185-213] [rb'''...\ ...'''] + BytesLiteralComponent(rb'''...\ ...''') [185-213] [rb'''...\ ...'''] -- SELECT @@ -1687,13 +1868,17 @@ QueryStatement [0-33] [select r''...', r""""""] Select [0-33] [select r''...', r""""""] SelectList [7-33] [r'', r"", r'''''', r""""""] SelectColumn [7-10] [r''] - StringLiteral(r'') [7-10] [r''] + StringLiteral [7-10] [r''] + StringLiteralComponent(r'') [7-10] [r''] SelectColumn [12-15] [r""] - StringLiteral(r"") [12-15] [r""] + StringLiteral [12-15] [r""] + StringLiteralComponent(r"") [12-15] [r""] SelectColumn [17-24] [r''''''] - StringLiteral(r'''''') [17-24] [r''''''] + StringLiteral [17-24] [r''''''] + StringLiteralComponent(r'''''') [17-24] [r''''''] SelectColumn [26-33] [r""""""] - StringLiteral(r"""""") [26-33] [r""""""] + StringLiteral [26-33] [r""""""] + StringLiteralComponent(r"""""") [26-33] [r""""""] -- SELECT r'', @@ -1714,13 +1899,17 @@ QueryStatement [0-158] [select '...quotes'''] Select [0-158] [select '...quotes'''] SelectList [9-158] ['Sentence...quotes'''] SelectColumn [9-32] ['Sentence with ) in it'] - StringLiteral('Sentence with ) in it') [9-32] ['Sentence with ) in it'] + StringLiteral [9-32] ['Sentence with ) in it'] + StringLiteralComponent('Sentence with ) in it') [9-32] ['Sentence with ) in it'] SelectColumn [36-70] ["Also with...double quotes"] - StringLiteral("Also with ) and in double quotes") [36-70] ["Also with...double quotes"] + StringLiteral [36-70] ["Also 
with...double quotes"] + StringLiteralComponent("Also with ) and in double quotes") [36-70] ["Also with...double quotes"] SelectColumn [74-114] ["""with )...quotes"""] - StringLiteral("""with ) and in triple double quotes""") [74-114] ["""with )...quotes"""] + StringLiteral [74-114] ["""with )...quotes"""] + StringLiteralComponent("""with ) and in triple double quotes""") [74-114] ["""with )...quotes"""] SelectColumn [118-158] ['''with )...quotes'''] - StringLiteral('''with ) and in triple single quotes''') [118-158] ['''with )...quotes'''] + StringLiteral [118-158] ['''with )...quotes'''] + StringLiteralComponent('''with ) and in triple single quotes''') [118-158] ['''with )...quotes'''] -- SELECT 'Sentence with ) in it', @@ -1737,21 +1926,29 @@ QueryStatement [0-69] [select rb'..., br""""""] Select [0-69] [select rb'..., br""""""] SelectList [7-69] [rb'', br''..., br""""""] SelectColumn [7-11] [rb''] - BytesLiteral(rb'') [7-11] [rb''] + BytesLiteral [7-11] [rb''] + BytesLiteralComponent(rb'') [7-11] [rb''] SelectColumn [13-17] [br''] - BytesLiteral(br'') [13-17] [br''] + BytesLiteral [13-17] [br''] + BytesLiteralComponent(br'') [13-17] [br''] SelectColumn [19-23] [rb""] - BytesLiteral(rb"") [19-23] [rb""] + BytesLiteral [19-23] [rb""] + BytesLiteralComponent(rb"") [19-23] [rb""] SelectColumn [25-29] [br""] - BytesLiteral(br"") [25-29] [br""] + BytesLiteral [25-29] [br""] + BytesLiteralComponent(br"") [25-29] [br""] SelectColumn [31-39] [rb''''''] - BytesLiteral(rb'''''') [31-39] [rb''''''] + BytesLiteral [31-39] [rb''''''] + BytesLiteralComponent(rb'''''') [31-39] [rb''''''] SelectColumn [41-49] [br''''''] - BytesLiteral(br'''''') [41-49] [br''''''] + BytesLiteral [41-49] [br''''''] + BytesLiteralComponent(br'''''') [41-49] [br''''''] SelectColumn [51-59] [rb""""""] - BytesLiteral(rb"""""") [51-59] [rb""""""] + BytesLiteral [51-59] [rb""""""] + BytesLiteralComponent(rb"""""") [51-59] [rb""""""] SelectColumn [61-69] [br""""""] - BytesLiteral(br"""""") [61-69] 
[br""""""] + BytesLiteral [61-69] [br""""""] + BytesLiteralComponent(br"""""") [61-69] [br""""""] -- SELECT rb'', @@ -1960,53 +2157,101 @@ QueryStatement [0-693] [select NUMERIC...NUMERIC '-'] Select [0-693] [select NUMERIC...NUMERIC '-'] SelectList [7-693] [NUMERIC '1...NUMERIC '-'] SelectColumn [7-20] [NUMERIC '1.1'] - NumericLiteral('1.1') [7-20] [NUMERIC '1.1'] + NumericLiteral [7-20] [NUMERIC '1.1'] + StringLiteral [15-20] ['1.1'] + StringLiteralComponent('1.1') [15-20] ['1.1'] SelectColumn [29-44] [NUMERIC '0.123'] - NumericLiteral('0.123') [29-44] [NUMERIC '0.123'] + NumericLiteral [29-44] [NUMERIC '0.123'] + StringLiteral [37-44] ['0.123'] + StringLiteralComponent('0.123') [37-44] ['0.123'] SelectColumn [53-68] [NUMERIC '456.0'] - NumericLiteral('456.0') [53-68] [NUMERIC '456.0'] + NumericLiteral [53-68] [NUMERIC '456.0'] + StringLiteral [61-68] ['456.0'] + StringLiteralComponent('456.0') [61-68] ['456.0'] SelectColumn [77-90] [NUMERIC '123'] - NumericLiteral('123') [77-90] [NUMERIC '123'] + NumericLiteral [77-90] [NUMERIC '123'] + StringLiteral [85-90] ['123'] + StringLiteralComponent('123') [85-90] ['123'] SelectColumn [99-112] [NUMERIC "123"] - NumericLiteral("123") [99-112] [NUMERIC "123"] + NumericLiteral [99-112] [NUMERIC "123"] + StringLiteral [107-112] ["123"] + StringLiteralComponent("123") [107-112] ["123"] SelectColumn [121-132] [NUMERIC '0'] - NumericLiteral('0') [121-132] [NUMERIC '0'] + NumericLiteral [121-132] [NUMERIC '0'] + StringLiteral [129-132] ['0'] + StringLiteralComponent('0') [129-132] ['0'] SelectColumn [141-155] [NUMERIC '-1.1'] - NumericLiteral('-1.1') [141-155] [NUMERIC '-1.1'] + NumericLiteral [141-155] [NUMERIC '-1.1'] + StringLiteral [149-155] ['-1.1'] + StringLiteralComponent('-1.1') [149-155] ['-1.1'] SelectColumn [164-180] [NUMERIC '-0.123'] - NumericLiteral('-0.123') [164-180] [NUMERIC '-0.123'] + NumericLiteral [164-180] [NUMERIC '-0.123'] + StringLiteral [172-180] ['-0.123'] + StringLiteralComponent('-0.123') [172-180] 
['-0.123'] SelectColumn [189-205] [NUMERIC '-456.0'] - NumericLiteral('-456.0') [189-205] [NUMERIC '-456.0'] + NumericLiteral [189-205] [NUMERIC '-456.0'] + StringLiteral [197-205] ['-456.0'] + StringLiteralComponent('-456.0') [197-205] ['-456.0'] SelectColumn [214-228] [NUMERIC '-123'] - NumericLiteral('-123') [214-228] [NUMERIC '-123'] + NumericLiteral [214-228] [NUMERIC '-123'] + StringLiteral [222-228] ['-123'] + StringLiteralComponent('-123') [222-228] ['-123'] SelectColumn [237-251] [NUMERIC "-123"] - NumericLiteral("-123") [237-251] [NUMERIC "-123"] + NumericLiteral [237-251] [NUMERIC "-123"] + StringLiteral [245-251] ["-123"] + StringLiteralComponent("-123") [245-251] ["-123"] SelectColumn [260-281] [NUMERIC '0.999999999'] - NumericLiteral('0.999999999') [260-281] [NUMERIC '0.999999999'] + NumericLiteral [260-281] [NUMERIC '0.999999999'] + StringLiteral [268-281] ['0.999999999'] + StringLiteralComponent('0.999999999') [268-281] ['0.999999999'] SelectColumn [290-339] [NUMERIC '99999999...999999999'] - NumericLiteral('99999999999999999999999999999.999999999') [290-339] [NUMERIC '99999999...999999999'] + NumericLiteral [290-339] [NUMERIC '99999999...999999999'] + StringLiteral [298-339] ['9999999999999999...999999999'] + StringLiteralComponent('99999999999999999999999999999.999999999') [298-339] ['9999999999999999...999999999'] SelectColumn [348-370] [NUMERIC '-0.999999999'] - NumericLiteral('-0.999999999') [348-370] [NUMERIC '-0.999999999'] + NumericLiteral [348-370] [NUMERIC '-0.999999999'] + StringLiteral [356-370] ['-0.999999999'] + StringLiteralComponent('-0.999999999') [356-370] ['-0.999999999'] SelectColumn [379-429] [NUMERIC '-...999999999'] - NumericLiteral('-99999999999999999999999999999.999999999') [379-429] [NUMERIC '-...999999999'] + NumericLiteral [379-429] [NUMERIC '-...999999999'] + StringLiteral [387-429] ['-999999999999999...999999999'] + StringLiteralComponent('-99999999999999999999999999999.999999999') [387-429] 
['-999999999999999...999999999'] SelectColumn [438-450] [NUMERIC '-0'] - NumericLiteral('-0') [438-450] [NUMERIC '-0'] + NumericLiteral [438-450] [NUMERIC '-0'] + StringLiteral [446-450] ['-0'] + StringLiteralComponent('-0') [446-450] ['-0'] SelectColumn [459-471] [NUMERIC '+5'] - NumericLiteral('+5') [459-471] [NUMERIC '+5'] + NumericLiteral [459-471] [NUMERIC '+5'] + StringLiteral [467-471] ['+5'] + StringLiteralComponent('+5') [467-471] ['+5'] SelectColumn [480-496] [NUMERIC '+5.123'] - NumericLiteral('+5.123') [480-496] [NUMERIC '+5.123'] + NumericLiteral [480-496] [NUMERIC '+5.123'] + StringLiteral [488-496] ['+5.123'] + StringLiteralComponent('+5.123') [488-496] ['+5.123'] SelectColumn [505-515] [NUMERIC ''] - NumericLiteral('') [505-515] [NUMERIC ''] + NumericLiteral [505-515] [NUMERIC ''] + StringLiteral [513-515] [''] + StringLiteralComponent('') [513-515] [''] SelectColumn [524-537] [NUMERIC 'abc'] - NumericLiteral('abc') [524-537] [NUMERIC 'abc'] + NumericLiteral [524-537] [NUMERIC 'abc'] + StringLiteral [532-537] ['abc'] + StringLiteralComponent('abc') [532-537] ['abc'] SelectColumn [546-594] [NUMERIC '99999999...999999999'] - NumericLiteral('99999999999999999999999999999999999999') [546-594] [NUMERIC '99999999...999999999'] + NumericLiteral [546-594] [NUMERIC '99999999...999999999'] + StringLiteral [554-594] ['9999999999999999...999999999'] + StringLiteralComponent('99999999999999999999999999999999999999') [554-594] ['9999999999999999...999999999'] SelectColumn [603-653] [NUMERIC '0...9999999999999999'] - NumericLiteral('0.99999999999999999999999999999999999999') [603-653] [NUMERIC '0...9999999999999999'] + NumericLiteral [603-653] [NUMERIC '0...9999999999999999'] + StringLiteral [611-653] ['0.99999999999999...999999999'] + StringLiteralComponent('0.99999999999999999999999999999999999999') [611-653] ['0.99999999999999...999999999'] SelectColumn [662-673] [NUMERIC '+'] - NumericLiteral('+') [662-673] [NUMERIC '+'] + NumericLiteral [662-673] [NUMERIC 
'+'] + StringLiteral [670-673] ['+'] + StringLiteralComponent('+') [670-673] ['+'] SelectColumn [682-693] [NUMERIC '-'] - NumericLiteral('-') [682-693] [NUMERIC '-'] + NumericLiteral [682-693] [NUMERIC '-'] + StringLiteral [690-693] ['-'] + StringLiteralComponent('-') [690-693] ['-'] -- SELECT NUMERIC '1.1', @@ -2042,7 +2287,9 @@ QueryStatement [0-30] [select NUMERIC '100' `NUMERIC`] Select [0-30] [select NUMERIC '100' `NUMERIC`] SelectList [7-30] [NUMERIC '100' `NUMERIC`] SelectColumn [7-30] [NUMERIC '100' `NUMERIC`] - NumericLiteral('100') [7-20] [NUMERIC '100'] + NumericLiteral [7-20] [NUMERIC '100'] + StringLiteral [15-20] ['100'] + StringLiteralComponent('100') [15-20] ['100'] Alias [21-30] [`NUMERIC`] Identifier(NUMERIC) [21-30] [`NUMERIC`] -- @@ -2057,7 +2304,9 @@ QueryStatement [0-28] [select NUMERIC '100' numeric] Select [0-28] [select NUMERIC '100' numeric] SelectList [7-28] [NUMERIC '100' numeric] SelectColumn [7-28] [NUMERIC '100' numeric] - NumericLiteral('100') [7-20] [NUMERIC '100'] + NumericLiteral [7-20] [NUMERIC '100'] + StringLiteral [15-20] ['100'] + StringLiteralComponent('100') [15-20] ['100'] Alias [21-28] [numeric] Identifier(numeric) [21-28] [numeric] -- @@ -2084,8 +2333,12 @@ QueryStatement [0-38] [select NUMERIC...NUMERIC '200'] SelectList [7-38] [NUMERIC '100...NUMERIC '200'] SelectColumn [7-38] [NUMERIC '100...NUMERIC '200'] BinaryExpression(+) [7-38] [NUMERIC '100...NUMERIC '200'] - NumericLiteral('100.1') [7-22] [NUMERIC '100.1'] - NumericLiteral('200') [25-38] [NUMERIC '200'] + NumericLiteral [7-22] [NUMERIC '100.1'] + StringLiteral [15-22] ['100.1'] + StringLiteralComponent('100.1') [15-22] ['100.1'] + NumericLiteral [25-38] [NUMERIC '200'] + StringLiteral [33-38] ['200'] + StringLiteralComponent('200') [33-38] ['200'] -- SELECT NUMERIC '100.1' + NUMERIC '200' @@ -2102,7 +2355,9 @@ QueryStatement [0-21] [select -NUMERIC '100'] SelectList [7-21] [-NUMERIC '100'] SelectColumn [7-21] [-NUMERIC '100'] UnaryExpression(-) [7-21] [-NUMERIC 
'100'] - NumericLiteral('100') [8-21] [NUMERIC '100'] + NumericLiteral [8-21] [NUMERIC '100'] + StringLiteral [16-21] ['100'] + StringLiteralComponent('100') [16-21] ['100'] -- SELECT -NUMERIC '100' @@ -2141,53 +2396,101 @@ QueryStatement [0-941] [select BIGNUMERIC...UMERIC '-'] Select [0-941] [select BIGNUMERIC...UMERIC '-'] SelectList [7-941] [BIGNUMERIC...BIGNUMERIC '-'] SelectColumn [7-23] [BIGNUMERIC '1.1'] - BigNumericLiteral('1.1') [7-23] [BIGNUMERIC '1.1'] + BigNumericLiteral [7-23] [BIGNUMERIC '1.1'] + StringLiteral [18-23] ['1.1'] + StringLiteralComponent('1.1') [18-23] ['1.1'] SelectColumn [32-50] [BIGNUMERIC '0.123'] - BigNumericLiteral('0.123') [32-50] [BIGNUMERIC '0.123'] + BigNumericLiteral [32-50] [BIGNUMERIC '0.123'] + StringLiteral [43-50] ['0.123'] + StringLiteralComponent('0.123') [43-50] ['0.123'] SelectColumn [59-77] [BIGNUMERIC '456.0'] - BigNumericLiteral('456.0') [59-77] [BIGNUMERIC '456.0'] + BigNumericLiteral [59-77] [BIGNUMERIC '456.0'] + StringLiteral [70-77] ['456.0'] + StringLiteralComponent('456.0') [70-77] ['456.0'] SelectColumn [86-102] [BIGNUMERIC '123'] - BigNumericLiteral('123') [86-102] [BIGNUMERIC '123'] + BigNumericLiteral [86-102] [BIGNUMERIC '123'] + StringLiteral [97-102] ['123'] + StringLiteralComponent('123') [97-102] ['123'] SelectColumn [111-127] [BIGNUMERIC "123"] - BigNumericLiteral("123") [111-127] [BIGNUMERIC "123"] + BigNumericLiteral [111-127] [BIGNUMERIC "123"] + StringLiteral [122-127] ["123"] + StringLiteralComponent("123") [122-127] ["123"] SelectColumn [136-150] [BIGNUMERIC '0'] - BigNumericLiteral('0') [136-150] [BIGNUMERIC '0'] + BigNumericLiteral [136-150] [BIGNUMERIC '0'] + StringLiteral [147-150] ['0'] + StringLiteralComponent('0') [147-150] ['0'] SelectColumn [159-176] [BIGNUMERIC '-1.1'] - BigNumericLiteral('-1.1') [159-176] [BIGNUMERIC '-1.1'] + BigNumericLiteral [159-176] [BIGNUMERIC '-1.1'] + StringLiteral [170-176] ['-1.1'] + StringLiteralComponent('-1.1') [170-176] ['-1.1'] SelectColumn [185-204] 
[BIGNUMERIC '-0.123'] - BigNumericLiteral('-0.123') [185-204] [BIGNUMERIC '-0.123'] + BigNumericLiteral [185-204] [BIGNUMERIC '-0.123'] + StringLiteral [196-204] ['-0.123'] + StringLiteralComponent('-0.123') [196-204] ['-0.123'] SelectColumn [213-232] [BIGNUMERIC '-456.0'] - BigNumericLiteral('-456.0') [213-232] [BIGNUMERIC '-456.0'] + BigNumericLiteral [213-232] [BIGNUMERIC '-456.0'] + StringLiteral [224-232] ['-456.0'] + StringLiteralComponent('-456.0') [224-232] ['-456.0'] SelectColumn [241-258] [BIGNUMERIC '-123'] - BigNumericLiteral('-123') [241-258] [BIGNUMERIC '-123'] + BigNumericLiteral [241-258] [BIGNUMERIC '-123'] + StringLiteral [252-258] ['-123'] + StringLiteralComponent('-123') [252-258] ['-123'] SelectColumn [267-284] [BIGNUMERIC "-123"] - BigNumericLiteral("-123") [267-284] [BIGNUMERIC "-123"] + BigNumericLiteral [267-284] [BIGNUMERIC "-123"] + StringLiteral [278-284] ["-123"] + StringLiteralComponent("-123") [278-284] ["-123"] SelectColumn [293-346] [BIGNUMERIC...9999999999999999'] - BigNumericLiteral('0.99999999999999999999999999999999999999') [293-346] [BIGNUMERIC...9999999999999999'] + BigNumericLiteral [293-346] [BIGNUMERIC...9999999999999999'] + StringLiteral [304-346] ['0.99999999999999...999999999'] + StringLiteralComponent('0.99999999999999999999999999999999999999') [304-346] ['0.99999999999999...999999999'] SelectColumn [355-446] [BIGNUMERIC...2003956564819967'] - BigNumericLiteral('578960446186580977117854925043439539266.34992332820282019728792003956564819967') [355-446] [BIGNUMERIC...2003956564819967'] + BigNumericLiteral [355-446] [BIGNUMERIC...2003956564819967'] + StringLiteral [366-446] ['5789604461865809...564819967'] + StringLiteralComponent('578960446186580977117854925043439539266.34992332820282019728792003956564819967') [366-446] ['5789604461865809...564819967'] SelectColumn [455-509] [BIGNUMERIC...9999999999999999'] - BigNumericLiteral('-0.99999999999999999999999999999999999999') [455-509] [BIGNUMERIC...9999999999999999'] + 
BigNumericLiteral [455-509] [BIGNUMERIC...9999999999999999'] + StringLiteral [466-509] ['-0.9999999999999...999999999'] + StringLiteralComponent('-0.99999999999999999999999999999999999999') [466-509] ['-0.9999999999999...999999999'] SelectColumn [518-610] [BIGNUMERIC...2003956564819968'] - BigNumericLiteral('-578960446186580977117854925043439539266.34992332820282019728792003956564819968') [518-610] [BIGNUMERIC...2003956564819968'] + BigNumericLiteral [518-610] [BIGNUMERIC...2003956564819968'] + StringLiteral [529-610] ['-578960446186580...564819968'] + StringLiteralComponent('-578960446186580977117854925043439539266.34992332820282019728792003956564819968') [529-610] ['-578960446186580...564819968'] SelectColumn [619-634] [BIGNUMERIC '-0'] - BigNumericLiteral('-0') [619-634] [BIGNUMERIC '-0'] + BigNumericLiteral [619-634] [BIGNUMERIC '-0'] + StringLiteral [630-634] ['-0'] + StringLiteralComponent('-0') [630-634] ['-0'] SelectColumn [643-658] [BIGNUMERIC '+5'] - BigNumericLiteral('+5') [643-658] [BIGNUMERIC '+5'] + BigNumericLiteral [643-658] [BIGNUMERIC '+5'] + StringLiteral [654-658] ['+5'] + StringLiteralComponent('+5') [654-658] ['+5'] SelectColumn [667-686] [BIGNUMERIC '+5.123'] - BigNumericLiteral('+5.123') [667-686] [BIGNUMERIC '+5.123'] + BigNumericLiteral [667-686] [BIGNUMERIC '+5.123'] + StringLiteral [678-686] ['+5.123'] + StringLiteralComponent('+5.123') [678-686] ['+5.123'] SelectColumn [695-708] [BIGNUMERIC ''] - BigNumericLiteral('') [695-708] [BIGNUMERIC ''] + BigNumericLiteral [695-708] [BIGNUMERIC ''] + StringLiteral [706-708] [''] + StringLiteralComponent('') [706-708] [''] SelectColumn [717-733] [BIGNUMERIC 'abc'] - BigNumericLiteral('abc') [717-733] [BIGNUMERIC 'abc'] + BigNumericLiteral [717-733] [BIGNUMERIC 'abc'] + StringLiteral [728-733] ['abc'] + StringLiteralComponent('abc') [728-733] ['abc'] SelectColumn [742-814] [BIGNUMERIC...9999999999999999'] - BigNumericLiteral('99999999999999999999999999999999999999999999999999999999999') [742-814] 
[BIGNUMERIC...9999999999999999'] + BigNumericLiteral [742-814] [BIGNUMERIC...9999999999999999'] + StringLiteral [753-814] ['9999999999999999...999999999'] + StringLiteralComponent('99999999999999999999999999999999999999999999999999999999999') [753-814] ['9999999999999999...999999999'] SelectColumn [823-895] [BIGNUMERIC...9999999999999999'] - BigNumericLiteral('0.999999999999999999999999999999999999999999999999999999999') [823-895] [BIGNUMERIC...9999999999999999'] + BigNumericLiteral [823-895] [BIGNUMERIC...9999999999999999'] + StringLiteral [834-895] ['0.99999999999999...999999999'] + StringLiteralComponent('0.999999999999999999999999999999999999999999999999999999999') [834-895] ['0.99999999999999...999999999'] SelectColumn [904-918] [BIGNUMERIC '+'] - BigNumericLiteral('+') [904-918] [BIGNUMERIC '+'] + BigNumericLiteral [904-918] [BIGNUMERIC '+'] + StringLiteral [915-918] ['+'] + StringLiteralComponent('+') [915-918] ['+'] SelectColumn [927-941] [BIGNUMERIC '-'] - BigNumericLiteral('-') [927-941] [BIGNUMERIC '-'] + BigNumericLiteral [927-941] [BIGNUMERIC '-'] + StringLiteral [938-941] ['-'] + StringLiteralComponent('-') [938-941] ['-'] -- SELECT BIGNUMERIC '1.1', @@ -2223,7 +2526,9 @@ QueryStatement [0-36] [select BIGNUMERIC...IGNUMERIC`] Select [0-36] [select BIGNUMERIC...IGNUMERIC`] SelectList [7-36] [BIGNUMERIC '100' `BIGNUMERIC`] SelectColumn [7-36] [BIGNUMERIC '100' `BIGNUMERIC`] - BigNumericLiteral('100') [7-23] [BIGNUMERIC '100'] + BigNumericLiteral [7-23] [BIGNUMERIC '100'] + StringLiteral [18-23] ['100'] + StringLiteralComponent('100') [18-23] ['100'] Alias [24-36] [`BIGNUMERIC`] Identifier(BIGNUMERIC) [24-36] [`BIGNUMERIC`] -- @@ -2238,7 +2543,9 @@ QueryStatement [0-31] [select BIGNUMERIC...0' numeric] Select [0-31] [select BIGNUMERIC...0' numeric] SelectList [7-31] [BIGNUMERIC '100' numeric] SelectColumn [7-31] [BIGNUMERIC '100' numeric] - BigNumericLiteral('100') [7-23] [BIGNUMERIC '100'] + BigNumericLiteral [7-23] [BIGNUMERIC '100'] + StringLiteral 
[18-23] ['100'] + StringLiteralComponent('100') [18-23] ['100'] Alias [24-31] [numeric] Identifier(numeric) [24-31] [numeric] -- @@ -2265,8 +2572,12 @@ QueryStatement [0-44] [select BIGNUMERIC...ERIC '200'] SelectList [7-44] [BIGNUMERIC...BIGNUMERIC '200'] SelectColumn [7-44] [BIGNUMERIC...BIGNUMERIC '200'] BinaryExpression(+) [7-44] [BIGNUMERIC...BIGNUMERIC '200'] - BigNumericLiteral('100.1') [7-25] [BIGNUMERIC '100.1'] - BigNumericLiteral('200') [28-44] [BIGNUMERIC '200'] + BigNumericLiteral [7-25] [BIGNUMERIC '100.1'] + StringLiteral [18-25] ['100.1'] + StringLiteralComponent('100.1') [18-25] ['100.1'] + BigNumericLiteral [28-44] [BIGNUMERIC '200'] + StringLiteral [39-44] ['200'] + StringLiteralComponent('200') [39-44] ['200'] -- SELECT BIGNUMERIC '100.1' + BIGNUMERIC '200' @@ -2283,7 +2594,9 @@ QueryStatement [0-24] [select -BIGNUMERIC '100'] SelectList [7-24] [-BIGNUMERIC '100'] SelectColumn [7-24] [-BIGNUMERIC '100'] UnaryExpression(-) [7-24] [-BIGNUMERIC '100'] - BigNumericLiteral('100') [8-24] [BIGNUMERIC '100'] + BigNumericLiteral [8-24] [BIGNUMERIC '100'] + StringLiteral [19-24] ['100'] + StringLiteralComponent('100') [19-24] ['100'] -- SELECT -BIGNUMERIC '100' @@ -2296,7 +2609,9 @@ QueryStatement [0-15] [select JSON '1'] Select [0-15] [select JSON '1'] SelectList [7-15] [JSON '1'] SelectColumn [7-15] [JSON '1'] - JSONLiteral('1') [7-15] [JSON '1'] + JSONLiteral [7-15] [JSON '1'] + StringLiteral [12-15] ['1'] + StringLiteralComponent('1') [12-15] ['1'] -- SELECT JSON '1' @@ -2326,41 +2641,77 @@ QueryStatement [0-412] [select JSON...JSON "-123"] Select [0-412] [select JSON...JSON "-123"] SelectList [7-412] [JSON "true...JSON "-123"] SelectColumn [7-18] [JSON "true"] - JSONLiteral("true") [7-18] [JSON "true"] + JSONLiteral [7-18] [JSON "true"] + StringLiteral [12-18] ["true"] + StringLiteralComponent("true") [12-18] ["true"] SelectColumn [27-43] [JSON '\'value\''] - JSONLiteral('\'value\'') [27-43] [JSON '\'value\''] + JSONLiteral [27-43] [JSON '\'value\''] 
+ StringLiteral [32-43] ['\'value\''] + StringLiteralComponent('\'value\'') [32-43] ['\'value\''] SelectColumn [52-60] [JSON "t"] - JSONLiteral("t") [52-60] [JSON "t"] + JSONLiteral [52-60] [JSON "t"] + StringLiteral [57-60] ["t"] + StringLiteralComponent("t") [57-60] ["t"] SelectColumn [69-80] [JSON 'true'] - JSONLiteral('true') [69-80] [JSON 'true'] + JSONLiteral [69-80] [JSON 'true'] + StringLiteral [74-80] ['true'] + StringLiteralComponent('true') [74-80] ['true'] SelectColumn [89-99] [JSON "'v'"] - JSONLiteral("'v'") [89-99] [JSON "'v'"] + JSONLiteral [89-99] [JSON "'v'"] + StringLiteral [94-99] ["'v'"] + StringLiteralComponent("'v'") [94-99] ["'v'"] SelectColumn [108-119] [JSON 'null'] - JSONLiteral('null') [108-119] [JSON 'null'] + JSONLiteral [108-119] [JSON 'null'] + StringLiteral [113-119] ['null'] + StringLiteralComponent('null') [113-119] ['null'] SelectColumn [128-144] [JSON '[1, 2, 3]'] - JSONLiteral('[1, 2, 3]') [128-144] [JSON '[1, 2, 3]'] + JSONLiteral [128-144] [JSON '[1, 2, 3]'] + StringLiteral [133-144] ['[1, 2, 3]'] + StringLiteralComponent('[1, 2, 3]') [133-144] ['[1, 2, 3]'] SelectColumn [153-194] [JSON '{ "k1..., false]}'] - JSONLiteral('{ "k1": "v1", "k2": [true, false]}') [153-194] [JSON '{ "k1..., false]}'] + JSONLiteral [153-194] [JSON '{ "k1..., false]}'] + StringLiteral [158-194] ['{ "k1": "..., false]}'] + StringLiteralComponent('{ "k1": "v1", "k2": [true, false]}') [158-194] ['{ "k1": "..., false]}'] SelectColumn [203-225] [JSON '{\n"k" : "v"\n}'] - JSONLiteral('{\n"k" : "v"\n}') [203-225] [JSON '{\n"k" : "v"\n}'] + JSONLiteral [203-225] [JSON '{\n"k" : "v"\n}'] + StringLiteral [208-225] ['{\n"k" : "v"\n}'] + StringLiteralComponent('{\n"k" : "v"\n}') [208-225] ['{\n"k" : "v"\n}'] SelectColumn [234-256] [JSON r'{"k\n1" : "v"}'] - JSONLiteral(r'{"k\n1" : "v"}') [234-256] [JSON r'{"k\n1" : "v"}'] + JSONLiteral [234-256] [JSON r'{"k\n1" : "v"}'] + StringLiteral [239-256] [r'{"k\n1" : "v"}'] + StringLiteralComponent(r'{"k\n1" : "v"}') 
[239-256] [r'{"k\n1" : "v"}'] SelectColumn [265-272] [JSON ''] - JSONLiteral('') [265-272] [JSON ''] + JSONLiteral [265-272] [JSON ''] + StringLiteral [270-272] [''] + StringLiteralComponent('') [270-272] [''] SelectColumn [281-293] [JSON '0.123'] - JSONLiteral('0.123') [281-293] [JSON '0.123'] + JSONLiteral [281-293] [JSON '0.123'] + StringLiteral [286-293] ['0.123'] + StringLiteralComponent('0.123') [286-293] ['0.123'] SelectColumn [302-314] [JSON '456.0'] - JSONLiteral('456.0') [302-314] [JSON '456.0'] + JSONLiteral [302-314] [JSON '456.0'] + StringLiteral [307-314] ['456.0'] + StringLiteralComponent('456.0') [307-314] ['456.0'] SelectColumn [323-333] [JSON '123'] - JSONLiteral('123') [323-333] [JSON '123'] + JSONLiteral [323-333] [JSON '123'] + StringLiteral [328-333] ['123'] + StringLiteralComponent('123') [328-333] ['123'] SelectColumn [342-352] [JSON "123"] - JSONLiteral("123") [342-352] [JSON "123"] + JSONLiteral [342-352] [JSON "123"] + StringLiteral [347-352] ["123"] + StringLiteralComponent("123") [347-352] ["123"] SelectColumn [361-372] [JSON '-1.1'] - JSONLiteral('-1.1') [361-372] [JSON '-1.1'] + JSONLiteral [361-372] [JSON '-1.1'] + StringLiteral [366-372] ['-1.1'] + StringLiteralComponent('-1.1') [366-372] ['-1.1'] SelectColumn [381-392] [JSON '-123'] - JSONLiteral('-123') [381-392] [JSON '-123'] + JSONLiteral [381-392] [JSON '-123'] + StringLiteral [386-392] ['-123'] + StringLiteralComponent('-123') [386-392] ['-123'] SelectColumn [401-412] [JSON "-123"] - JSONLiteral("-123") [401-412] [JSON "-123"] + JSONLiteral [401-412] [JSON "-123"] + StringLiteral [406-412] ["-123"] + StringLiteralComponent("-123") [406-412] ["-123"] -- SELECT JSON "true", @@ -2390,7 +2741,9 @@ QueryStatement [0-22] [select JSON '100' json] Select [0-22] [select JSON '100' json] SelectList [7-22] [JSON '100' json] SelectColumn [7-22] [JSON '100' json] - JSONLiteral('100') [7-17] [JSON '100'] + JSONLiteral [7-17] [JSON '100'] + StringLiteral [12-17] ['100'] + 
StringLiteralComponent('100') [12-17] ['100'] Alias [18-22] [json] Identifier(json) [18-22] [json] -- @@ -2429,53 +2782,101 @@ QueryStatement [0-693] [select DECIMAL...DECIMAL '-'] Select [0-693] [select DECIMAL...DECIMAL '-'] SelectList [7-693] [DECIMAL '1...DECIMAL '-'] SelectColumn [7-20] [DECIMAL '1.1'] - NumericLiteral('1.1') [7-20] [DECIMAL '1.1'] + NumericLiteral [7-20] [DECIMAL '1.1'] + StringLiteral [15-20] ['1.1'] + StringLiteralComponent('1.1') [15-20] ['1.1'] SelectColumn [29-44] [DECIMAL '0.123'] - NumericLiteral('0.123') [29-44] [DECIMAL '0.123'] + NumericLiteral [29-44] [DECIMAL '0.123'] + StringLiteral [37-44] ['0.123'] + StringLiteralComponent('0.123') [37-44] ['0.123'] SelectColumn [53-68] [DECIMAL '456.0'] - NumericLiteral('456.0') [53-68] [DECIMAL '456.0'] + NumericLiteral [53-68] [DECIMAL '456.0'] + StringLiteral [61-68] ['456.0'] + StringLiteralComponent('456.0') [61-68] ['456.0'] SelectColumn [77-90] [DECIMAL '123'] - NumericLiteral('123') [77-90] [DECIMAL '123'] + NumericLiteral [77-90] [DECIMAL '123'] + StringLiteral [85-90] ['123'] + StringLiteralComponent('123') [85-90] ['123'] SelectColumn [99-112] [DECIMAL "123"] - NumericLiteral("123") [99-112] [DECIMAL "123"] + NumericLiteral [99-112] [DECIMAL "123"] + StringLiteral [107-112] ["123"] + StringLiteralComponent("123") [107-112] ["123"] SelectColumn [121-132] [DECIMAL '0'] - NumericLiteral('0') [121-132] [DECIMAL '0'] + NumericLiteral [121-132] [DECIMAL '0'] + StringLiteral [129-132] ['0'] + StringLiteralComponent('0') [129-132] ['0'] SelectColumn [141-155] [DECIMAL '-1.1'] - NumericLiteral('-1.1') [141-155] [DECIMAL '-1.1'] + NumericLiteral [141-155] [DECIMAL '-1.1'] + StringLiteral [149-155] ['-1.1'] + StringLiteralComponent('-1.1') [149-155] ['-1.1'] SelectColumn [164-180] [DECIMAL '-0.123'] - NumericLiteral('-0.123') [164-180] [DECIMAL '-0.123'] + NumericLiteral [164-180] [DECIMAL '-0.123'] + StringLiteral [172-180] ['-0.123'] + StringLiteralComponent('-0.123') [172-180] ['-0.123'] 
SelectColumn [189-205] [DECIMAL '-456.0'] - NumericLiteral('-456.0') [189-205] [DECIMAL '-456.0'] + NumericLiteral [189-205] [DECIMAL '-456.0'] + StringLiteral [197-205] ['-456.0'] + StringLiteralComponent('-456.0') [197-205] ['-456.0'] SelectColumn [214-228] [DECIMAL '-123'] - NumericLiteral('-123') [214-228] [DECIMAL '-123'] + NumericLiteral [214-228] [DECIMAL '-123'] + StringLiteral [222-228] ['-123'] + StringLiteralComponent('-123') [222-228] ['-123'] SelectColumn [237-251] [DECIMAL "-123"] - NumericLiteral("-123") [237-251] [DECIMAL "-123"] + NumericLiteral [237-251] [DECIMAL "-123"] + StringLiteral [245-251] ["-123"] + StringLiteralComponent("-123") [245-251] ["-123"] SelectColumn [260-281] [DECIMAL '0.999999999'] - NumericLiteral('0.999999999') [260-281] [DECIMAL '0.999999999'] + NumericLiteral [260-281] [DECIMAL '0.999999999'] + StringLiteral [268-281] ['0.999999999'] + StringLiteralComponent('0.999999999') [268-281] ['0.999999999'] SelectColumn [290-339] [DECIMAL '99999999...999999999'] - NumericLiteral('99999999999999999999999999999.999999999') [290-339] [DECIMAL '99999999...999999999'] + NumericLiteral [290-339] [DECIMAL '99999999...999999999'] + StringLiteral [298-339] ['9999999999999999...999999999'] + StringLiteralComponent('99999999999999999999999999999.999999999') [298-339] ['9999999999999999...999999999'] SelectColumn [348-370] [DECIMAL '-0.999999999'] - NumericLiteral('-0.999999999') [348-370] [DECIMAL '-0.999999999'] + NumericLiteral [348-370] [DECIMAL '-0.999999999'] + StringLiteral [356-370] ['-0.999999999'] + StringLiteralComponent('-0.999999999') [356-370] ['-0.999999999'] SelectColumn [379-429] [DECIMAL '-...999999999'] - NumericLiteral('-99999999999999999999999999999.999999999') [379-429] [DECIMAL '-...999999999'] + NumericLiteral [379-429] [DECIMAL '-...999999999'] + StringLiteral [387-429] ['-999999999999999...999999999'] + StringLiteralComponent('-99999999999999999999999999999.999999999') [387-429] ['-999999999999999...999999999'] 
SelectColumn [438-450] [DECIMAL '-0'] - NumericLiteral('-0') [438-450] [DECIMAL '-0'] + NumericLiteral [438-450] [DECIMAL '-0'] + StringLiteral [446-450] ['-0'] + StringLiteralComponent('-0') [446-450] ['-0'] SelectColumn [459-471] [DECIMAL '+5'] - NumericLiteral('+5') [459-471] [DECIMAL '+5'] + NumericLiteral [459-471] [DECIMAL '+5'] + StringLiteral [467-471] ['+5'] + StringLiteralComponent('+5') [467-471] ['+5'] SelectColumn [480-496] [DECIMAL '+5.123'] - NumericLiteral('+5.123') [480-496] [DECIMAL '+5.123'] + NumericLiteral [480-496] [DECIMAL '+5.123'] + StringLiteral [488-496] ['+5.123'] + StringLiteralComponent('+5.123') [488-496] ['+5.123'] SelectColumn [505-515] [DECIMAL ''] - NumericLiteral('') [505-515] [DECIMAL ''] + NumericLiteral [505-515] [DECIMAL ''] + StringLiteral [513-515] [''] + StringLiteralComponent('') [513-515] [''] SelectColumn [524-537] [DECIMAL 'abc'] - NumericLiteral('abc') [524-537] [DECIMAL 'abc'] + NumericLiteral [524-537] [DECIMAL 'abc'] + StringLiteral [532-537] ['abc'] + StringLiteralComponent('abc') [532-537] ['abc'] SelectColumn [546-594] [DECIMAL '99999999...999999999'] - NumericLiteral('99999999999999999999999999999999999999') [546-594] [DECIMAL '99999999...999999999'] + NumericLiteral [546-594] [DECIMAL '99999999...999999999'] + StringLiteral [554-594] ['9999999999999999...999999999'] + StringLiteralComponent('99999999999999999999999999999999999999') [554-594] ['9999999999999999...999999999'] SelectColumn [603-653] [DECIMAL '0...9999999999999999'] - NumericLiteral('0.99999999999999999999999999999999999999') [603-653] [DECIMAL '0...9999999999999999'] + NumericLiteral [603-653] [DECIMAL '0...9999999999999999'] + StringLiteral [611-653] ['0.99999999999999...999999999'] + StringLiteralComponent('0.99999999999999999999999999999999999999') [611-653] ['0.99999999999999...999999999'] SelectColumn [662-673] [DECIMAL '+'] - NumericLiteral('+') [662-673] [DECIMAL '+'] + NumericLiteral [662-673] [DECIMAL '+'] + StringLiteral [670-673] ['+'] 
+ StringLiteralComponent('+') [670-673] ['+'] SelectColumn [682-693] [DECIMAL '-'] - NumericLiteral('-') [682-693] [DECIMAL '-'] + NumericLiteral [682-693] [DECIMAL '-'] + StringLiteral [690-693] ['-'] + StringLiteralComponent('-') [690-693] ['-'] -- SELECT NUMERIC '1.1', @@ -2535,53 +2936,101 @@ QueryStatement [0-941] [select BIGDECIMAL...ECIMAL '-'] Select [0-941] [select BIGDECIMAL...ECIMAL '-'] SelectList [7-941] [BIGDECIMAL...BIGDECIMAL '-'] SelectColumn [7-23] [BIGDECIMAL '1.1'] - BigNumericLiteral('1.1') [7-23] [BIGDECIMAL '1.1'] + BigNumericLiteral [7-23] [BIGDECIMAL '1.1'] + StringLiteral [18-23] ['1.1'] + StringLiteralComponent('1.1') [18-23] ['1.1'] SelectColumn [32-50] [BIGDECIMAL '0.123'] - BigNumericLiteral('0.123') [32-50] [BIGDECIMAL '0.123'] + BigNumericLiteral [32-50] [BIGDECIMAL '0.123'] + StringLiteral [43-50] ['0.123'] + StringLiteralComponent('0.123') [43-50] ['0.123'] SelectColumn [59-77] [BIGDECIMAL '456.0'] - BigNumericLiteral('456.0') [59-77] [BIGDECIMAL '456.0'] + BigNumericLiteral [59-77] [BIGDECIMAL '456.0'] + StringLiteral [70-77] ['456.0'] + StringLiteralComponent('456.0') [70-77] ['456.0'] SelectColumn [86-102] [BIGDECIMAL '123'] - BigNumericLiteral('123') [86-102] [BIGDECIMAL '123'] + BigNumericLiteral [86-102] [BIGDECIMAL '123'] + StringLiteral [97-102] ['123'] + StringLiteralComponent('123') [97-102] ['123'] SelectColumn [111-127] [BIGDECIMAL "123"] - BigNumericLiteral("123") [111-127] [BIGDECIMAL "123"] + BigNumericLiteral [111-127] [BIGDECIMAL "123"] + StringLiteral [122-127] ["123"] + StringLiteralComponent("123") [122-127] ["123"] SelectColumn [136-150] [BIGDECIMAL '0'] - BigNumericLiteral('0') [136-150] [BIGDECIMAL '0'] + BigNumericLiteral [136-150] [BIGDECIMAL '0'] + StringLiteral [147-150] ['0'] + StringLiteralComponent('0') [147-150] ['0'] SelectColumn [159-176] [BIGDECIMAL '-1.1'] - BigNumericLiteral('-1.1') [159-176] [BIGDECIMAL '-1.1'] + BigNumericLiteral [159-176] [BIGDECIMAL '-1.1'] + StringLiteral [170-176] ['-1.1'] 
+ StringLiteralComponent('-1.1') [170-176] ['-1.1'] SelectColumn [185-204] [BIGDECIMAL '-0.123'] - BigNumericLiteral('-0.123') [185-204] [BIGDECIMAL '-0.123'] + BigNumericLiteral [185-204] [BIGDECIMAL '-0.123'] + StringLiteral [196-204] ['-0.123'] + StringLiteralComponent('-0.123') [196-204] ['-0.123'] SelectColumn [213-232] [BIGDECIMAL '-456.0'] - BigNumericLiteral('-456.0') [213-232] [BIGDECIMAL '-456.0'] + BigNumericLiteral [213-232] [BIGDECIMAL '-456.0'] + StringLiteral [224-232] ['-456.0'] + StringLiteralComponent('-456.0') [224-232] ['-456.0'] SelectColumn [241-258] [BIGDECIMAL '-123'] - BigNumericLiteral('-123') [241-258] [BIGDECIMAL '-123'] + BigNumericLiteral [241-258] [BIGDECIMAL '-123'] + StringLiteral [252-258] ['-123'] + StringLiteralComponent('-123') [252-258] ['-123'] SelectColumn [267-284] [BIGDECIMAL "-123"] - BigNumericLiteral("-123") [267-284] [BIGDECIMAL "-123"] + BigNumericLiteral [267-284] [BIGDECIMAL "-123"] + StringLiteral [278-284] ["-123"] + StringLiteralComponent("-123") [278-284] ["-123"] SelectColumn [293-346] [BIGDECIMAL...9999999999999999'] - BigNumericLiteral('0.99999999999999999999999999999999999999') [293-346] [BIGDECIMAL...9999999999999999'] + BigNumericLiteral [293-346] [BIGDECIMAL...9999999999999999'] + StringLiteral [304-346] ['0.99999999999999...999999999'] + StringLiteralComponent('0.99999999999999999999999999999999999999') [304-346] ['0.99999999999999...999999999'] SelectColumn [355-446] [BIGDECIMAL...2003956564819967'] - BigNumericLiteral('578960446186580977117854925043439539266.34992332820282019728792003956564819967') [355-446] [BIGDECIMAL...2003956564819967'] + BigNumericLiteral [355-446] [BIGDECIMAL...2003956564819967'] + StringLiteral [366-446] ['5789604461865809...564819967'] + StringLiteralComponent('578960446186580977117854925043439539266.34992332820282019728792003956564819967') [366-446] ['5789604461865809...564819967'] SelectColumn [455-509] [BIGDECIMAL...9999999999999999'] - 
BigNumericLiteral('-0.99999999999999999999999999999999999999') [455-509] [BIGDECIMAL...9999999999999999'] + BigNumericLiteral [455-509] [BIGDECIMAL...9999999999999999'] + StringLiteral [466-509] ['-0.9999999999999...999999999'] + StringLiteralComponent('-0.99999999999999999999999999999999999999') [466-509] ['-0.9999999999999...999999999'] SelectColumn [518-610] [BIGDECIMAL...2003956564819968'] - BigNumericLiteral('-578960446186580977117854925043439539266.34992332820282019728792003956564819968') [518-610] [BIGDECIMAL...2003956564819968'] + BigNumericLiteral [518-610] [BIGDECIMAL...2003956564819968'] + StringLiteral [529-610] ['-578960446186580...564819968'] + StringLiteralComponent('-578960446186580977117854925043439539266.34992332820282019728792003956564819968') [529-610] ['-578960446186580...564819968'] SelectColumn [619-634] [BIGDECIMAL '-0'] - BigNumericLiteral('-0') [619-634] [BIGDECIMAL '-0'] + BigNumericLiteral [619-634] [BIGDECIMAL '-0'] + StringLiteral [630-634] ['-0'] + StringLiteralComponent('-0') [630-634] ['-0'] SelectColumn [643-658] [BIGDECIMAL '+5'] - BigNumericLiteral('+5') [643-658] [BIGDECIMAL '+5'] + BigNumericLiteral [643-658] [BIGDECIMAL '+5'] + StringLiteral [654-658] ['+5'] + StringLiteralComponent('+5') [654-658] ['+5'] SelectColumn [667-686] [BIGDECIMAL '+5.123'] - BigNumericLiteral('+5.123') [667-686] [BIGDECIMAL '+5.123'] + BigNumericLiteral [667-686] [BIGDECIMAL '+5.123'] + StringLiteral [678-686] ['+5.123'] + StringLiteralComponent('+5.123') [678-686] ['+5.123'] SelectColumn [695-708] [BIGDECIMAL ''] - BigNumericLiteral('') [695-708] [BIGDECIMAL ''] + BigNumericLiteral [695-708] [BIGDECIMAL ''] + StringLiteral [706-708] [''] + StringLiteralComponent('') [706-708] [''] SelectColumn [717-733] [BIGDECIMAL 'abc'] - BigNumericLiteral('abc') [717-733] [BIGDECIMAL 'abc'] + BigNumericLiteral [717-733] [BIGDECIMAL 'abc'] + StringLiteral [728-733] ['abc'] + StringLiteralComponent('abc') [728-733] ['abc'] SelectColumn [742-814] 
[BIGDECIMAL...9999999999999999'] - BigNumericLiteral('99999999999999999999999999999999999999999999999999999999999') [742-814] [BIGDECIMAL...9999999999999999'] + BigNumericLiteral [742-814] [BIGDECIMAL...9999999999999999'] + StringLiteral [753-814] ['9999999999999999...999999999'] + StringLiteralComponent('99999999999999999999999999999999999999999999999999999999999') [753-814] ['9999999999999999...999999999'] SelectColumn [823-895] [BIGDECIMAL...9999999999999999'] - BigNumericLiteral('0.999999999999999999999999999999999999999999999999999999999') [823-895] [BIGDECIMAL...9999999999999999'] + BigNumericLiteral [823-895] [BIGDECIMAL...9999999999999999'] + StringLiteral [834-895] ['0.99999999999999...999999999'] + StringLiteralComponent('0.999999999999999999999999999999999999999999999999999999999') [834-895] ['0.99999999999999...999999999'] SelectColumn [904-918] [BIGDECIMAL '+'] - BigNumericLiteral('+') [904-918] [BIGDECIMAL '+'] + BigNumericLiteral [904-918] [BIGDECIMAL '+'] + StringLiteral [915-918] ['+'] + StringLiteralComponent('+') [915-918] ['+'] SelectColumn [927-941] [BIGDECIMAL '-'] - BigNumericLiteral('-') [927-941] [BIGDECIMAL '-'] + BigNumericLiteral [927-941] [BIGDECIMAL '-'] + StringLiteral [938-941] ['-'] + StringLiteralComponent('-') [938-941] ['-'] -- SELECT BIGNUMERIC '1.1', diff --git a/zetasql/parser/testdata/literals_structurally_invalid.test b/zetasql/parser/testdata/literals_structurally_invalid.test index 449489811..4195710f4 100644 --- a/zetasql/parser/testdata/literals_structurally_invalid.test +++ b/zetasql/parser/testdata/literals_structurally_invalid.test @@ -44,7 +44,8 @@ QueryStatement [0-13] Select [0-13] SelectList [7-13] SelectColumn [7-13] - BytesLiteral(b'|Â|') [7-13] + BytesLiteral [7-13] + BytesLiteralComponent(b'|Â|') [7-13] -- SELECT b'|Â|' diff --git a/zetasql/parser/testdata/loop.test b/zetasql/parser/testdata/loop.test index 3f520b040..ae02fab8b 100644 --- a/zetasql/parser/testdata/loop.test +++ 
b/zetasql/parser/testdata/loop.test @@ -19,7 +19,7 @@ END LOOP Script [0-25] [LOOP SELECT 5; END LOOP] StatementList [0-25] [LOOP SELECT 5; END LOOP] WhileStatement [0-25] [LOOP SELECT 5; END LOOP] - StatementList [7-17] [SELECT 5;] + StatementList [7-16] [SELECT 5;] QueryStatement [7-15] [SELECT 5] Query [7-15] [SELECT 5] Select [7-15] [SELECT 5] @@ -39,10 +39,10 @@ LOOP SELECT 9; END LOOP; -- -Script [0-39] [LOOP SELECT...END LOOP;] - StatementList [0-39] [LOOP SELECT...END LOOP;] +Script [0-38] [LOOP SELECT...END LOOP;] + StatementList [0-38] [LOOP SELECT...END LOOP;] WhileStatement [0-37] [LOOP SELECT...; END LOOP] - StatementList [7-29] [SELECT 8; SELECT 9;] + StatementList [7-28] [SELECT 8; SELECT 9;] QueryStatement [7-15] [SELECT 8] Query [7-15] [SELECT 8] Select [7-15] [SELECT 8] @@ -73,10 +73,10 @@ LOOP END LOOP; END LOOP; -- -Script [0-60] [LOOP SELECT...END LOOP;] - StatementList [0-60] [LOOP SELECT...END LOOP;] +Script [0-59] [LOOP SELECT...END LOOP;] + StatementList [0-59] [LOOP SELECT...END LOOP;] WhileStatement [0-58] [LOOP SELECT...; END LOOP] - StatementList [7-50] [SELECT 5;...END LOOP;] + StatementList [7-49] [SELECT 5;...END LOOP;] QueryStatement [7-15] [SELECT 5] Query [7-15] [SELECT 5] Select [7-15] [SELECT 5] @@ -84,7 +84,7 @@ Script [0-60] [LOOP SELECT...END LOOP;] SelectColumn [14-15] [5] IntLiteral(5) [14-15] [5] WhileStatement [19-48] [LOOP SELECT 6; END LOOP] - StatementList [28-40] [SELECT 6;] + StatementList [28-37] [SELECT 6;] QueryStatement [28-36] [SELECT 6] Query [28-36] [SELECT 6] Select [28-36] [SELECT 6] @@ -111,7 +111,7 @@ Script [0-38] [WHILE LOOP...END WHILE] WhileStatement [0-38] [WHILE LOOP...END WHILE] PathExpression [6-10] [LOOP] Identifier(LOOP) [6-10] [LOOP] - StatementList [14-29] [LOOP END LOOP;] + StatementList [14-28] [LOOP END LOOP;] WhileStatement [14-27] [LOOP END LOOP] StatementList [18-18] [] -- @@ -126,11 +126,11 @@ LOOP WHILE LOOP DO LOOP END LOOP; END WHILE; END LOOP Script [0-53] [LOOP WHILE...; END LOOP] 
StatementList [0-53] [LOOP WHILE...; END LOOP] WhileStatement [0-53] [LOOP WHILE...; END LOOP] - StatementList [5-45] [WHILE LOOP...END WHILE;] + StatementList [5-44] [WHILE LOOP...END WHILE;] WhileStatement [5-43] [WHILE LOOP...END WHILE] PathExpression [11-15] [LOOP] Identifier(LOOP) [11-15] [LOOP] - StatementList [19-34] [LOOP END LOOP;] + StatementList [19-33] [LOOP END LOOP;] WhileStatement [19-32] [LOOP END LOOP] StatementList [23-23] [] -- @@ -144,8 +144,8 @@ END LOOP ; WHILE WHILE DO END WHILE; -- -Script [0-26] [WHILE WHILE DO END WHILE;] - StatementList [0-26] [WHILE WHILE DO END WHILE;] +Script [0-25] [WHILE WHILE DO END WHILE;] + StatementList [0-25] [WHILE WHILE DO END WHILE;] WhileStatement [0-24] [WHILE WHILE DO END WHILE] PathExpression [6-11] [WHILE] Identifier(WHILE) [6-11] [WHILE] @@ -157,8 +157,8 @@ END WHILE ; WHILE DO DO END WHILE; -- -Script [0-23] [WHILE DO DO END WHILE;] - StatementList [0-23] [WHILE DO DO END WHILE;] +Script [0-22] [WHILE DO DO END WHILE;] + StatementList [0-22] [WHILE DO DO END WHILE;] WhileStatement [0-21] [WHILE DO DO END WHILE] PathExpression [6-8] [DO] Identifier(DO) [6-8] [DO] @@ -225,7 +225,7 @@ Script [0-47] [WHILE TRUE...END WHILE] StatementList [0-47] [WHILE TRUE...END WHILE] WhileStatement [0-47] [WHILE TRUE...END WHILE] BooleanLiteral(TRUE) [6-10] [TRUE] - StatementList [16-38] [SELECT 5; SELECT 6;] + StatementList [16-37] [SELECT 5; SELECT 6;] QueryStatement [16-24] [SELECT 5] Query [16-24] [SELECT 5] Select [16-24] [SELECT 5] @@ -260,24 +260,24 @@ LOOP END WHILE; END LOOP; -- -Script [0-129] [LOOP WHILE...END LOOP;] - StatementList [0-129] [LOOP WHILE...END LOOP;] +Script [0-128] [LOOP WHILE...END LOOP;] + StatementList [0-128] [LOOP WHILE...END LOOP;] WhileStatement [0-127] [LOOP WHILE...; END LOOP] - StatementList [7-119] [WHILE x >=...END WHILE;] + StatementList [7-118] [WHILE x >=...END WHILE;] WhileStatement [7-117] [WHILE x >=...END WHILE] BinaryExpression(>=) [13-19] [x >= 3] PathExpression [13-14] [x] 
Identifier(x) [13-14] [x] IntLiteral(3) [18-19] [3] - StatementList [27-108] [WHILE y >=...END WHILE;] + StatementList [27-105] [WHILE y >=...END WHILE;] WhileStatement [27-104] [WHILE y >=...END WHILE] BinaryExpression(>=) [33-39] [y >= 4] PathExpression [33-34] [y] Identifier(y) [33-34] [y] IntLiteral(4) [38-39] [4] - StatementList [49-95] [LOOP...END LOOP;] + StatementList [49-90] [LOOP...END LOOP;] WhileStatement [49-89] [LOOP...END LOOP] - StatementList [62-81] [SELECT x, y;] + StatementList [62-74] [SELECT x, y;] QueryStatement [62-73] [SELECT x, y] Query [62-73] [SELECT x, y] Select [62-73] [SELECT x, y] @@ -308,14 +308,14 @@ WHILE x = 3 DO END LOOP; END WHILE; -- -Script [0-45] [WHILE x =...END WHILE;] - StatementList [0-45] [WHILE x =...END WHILE;] +Script [0-44] [WHILE x =...END WHILE;] + StatementList [0-44] [WHILE x =...END WHILE;] WhileStatement [0-43] [WHILE x =...END WHILE] BinaryExpression(=) [6-11] [x = 3] PathExpression [6-7] [x] Identifier(x) [6-7] [x] IntLiteral(3) [10-11] [3] - StatementList [17-34] [LOOP END LOOP;] + StatementList [17-33] [LOOP END LOOP;] WhileStatement [17-32] [LOOP END LOOP] StatementList [21-21] [] -- @@ -332,16 +332,16 @@ LOOP END WHILE; END LOOP; -- -Script [0-68] [LOOP WHILE...END LOOP;] - StatementList [0-68] [LOOP WHILE...END LOOP;] +Script [0-67] [LOOP WHILE...END LOOP;] + StatementList [0-67] [LOOP WHILE...END LOOP;] WhileStatement [0-66] [LOOP WHILE...; END LOOP] - StatementList [7-58] [WHILE x =...END WHILE;] + StatementList [7-57] [WHILE x =...END WHILE;] WhileStatement [7-56] [WHILE x =...END WHILE] BinaryExpression(=) [13-18] [x = 3] PathExpression [13-14] [x] Identifier(x) [13-14] [x] IntLiteral(3) [17-18] [3] - StatementList [26-47] [LOOP END LOOP;] + StatementList [26-44] [LOOP END LOOP;] WhileStatement [26-43] [LOOP END LOOP] StatementList [30-30] [] -- @@ -441,7 +441,7 @@ END REPEAT Script [0-52] [REPEAT SELECT...END REPEAT] StatementList [0-52] [REPEAT SELECT...END REPEAT] RepeatStatement [0-52] [REPEAT 
SELECT...END REPEAT] - StatementList [9-31] [SELECT 3; SELECT 5;] + StatementList [9-30] [SELECT 3; SELECT 5;] QueryStatement [9-17] [SELECT 3] Query [9-17] [SELECT 3] Select [9-17] [SELECT 3] @@ -476,12 +476,12 @@ REPEAT UNTIL TRUE END REPEAT; -- -Script [0-82] [REPEAT REPEAT...END REPEAT;] - StatementList [0-82] [REPEAT REPEAT...END REPEAT;] +Script [0-81] [REPEAT REPEAT...END REPEAT;] + StatementList [0-81] [REPEAT REPEAT...END REPEAT;] RepeatStatement [0-80] [REPEAT REPEAT...END REPEAT] - StatementList [9-59] [REPEAT...END REPEAT;] + StatementList [9-58] [REPEAT...END REPEAT;] RepeatStatement [9-57] [REPEAT...END REPEAT] - StatementList [20-32] [SELECT x;] + StatementList [20-29] [SELECT x;] QueryStatement [20-28] [SELECT x] Query [20-28] [SELECT x] Select [20-28] [SELECT x] @@ -516,12 +516,12 @@ LOOP END REPEAT; END LOOP; -- -Script [0-80] [LOOP REPEAT...END LOOP;] - StatementList [0-80] [LOOP REPEAT...END LOOP;] +Script [0-79] [LOOP REPEAT...END LOOP;] + StatementList [0-79] [LOOP REPEAT...END LOOP;] WhileStatement [0-78] [LOOP REPEAT...; END LOOP] - StatementList [7-70] [REPEAT...END REPEAT;] + StatementList [7-69] [REPEAT...END REPEAT;] RepeatStatement [7-68] [REPEAT...END REPEAT] - StatementList [18-44] [SELECT 5; SELECT 6;] + StatementList [18-41] [SELECT 5; SELECT 6;] QueryStatement [18-26] [SELECT 5] Query [18-26] [SELECT 5] Select [18-26] [SELECT 5] @@ -559,18 +559,18 @@ REPEAT UNTIL FALSE END REPEAT; -- -Script [0-101] [REPEAT LOOP...END REPEAT;] - StatementList [0-101] [REPEAT LOOP...END REPEAT;] +Script [0-100] [REPEAT LOOP...END REPEAT;] + StatementList [0-100] [REPEAT LOOP...END REPEAT;] RepeatStatement [0-99] [REPEAT LOOP...END REPEAT] - StatementList [9-77] [LOOP WHILE...END LOOP;] + StatementList [9-76] [LOOP WHILE...END LOOP;] WhileStatement [9-75] [LOOP WHILE...END LOOP] - StatementList [18-67] [WHILE x <=...END WHILE;] + StatementList [18-64] [WHILE x <=...END WHILE;] WhileStatement [18-63] [WHILE x <=...END WHILE] BinaryExpression(<=) 
[24-30] [x <= 3] PathExpression [24-25] [x] Identifier(x) [24-25] [x] IntLiteral(3) [29-30] [3] - StatementList [40-54] [SELECT x;] + StatementList [40-49] [SELECT x;] QueryStatement [40-48] [SELECT x] Query [40-48] [SELECT x] Select [40-48] [SELECT x] @@ -595,8 +595,8 @@ END REPEAT ; SELECT Key AS REPEAT FROM KeyValue AS UNTIL; -- -Script [0-45] [SELECT Key...AS UNTIL;] - StatementList [0-45] [SELECT Key...AS UNTIL;] +Script [0-44] [SELECT Key...AS UNTIL;] + StatementList [0-44] [SELECT Key...AS UNTIL;] QueryStatement [0-43] [SELECT Key...KeyValue AS UNTIL] Query [0-43] [SELECT Key...KeyValue AS UNTIL] Select [0-43] [SELECT Key...KeyValue AS UNTIL] @@ -622,10 +622,10 @@ FROM repeat select x; select y; until false end repeat; -- -Script [0-51] [repeat select...end repeat;] - StatementList [0-51] [repeat select...end repeat;] +Script [0-50] [repeat select...end repeat;] + StatementList [0-50] [repeat select...end repeat;] RepeatStatement [0-49] [repeat select...end repeat] - StatementList [7-27] [select x; select y;] + StatementList [7-26] [select x; select y;] QueryStatement [7-15] [select x] Query [7-15] [select x] Select [7-15] [select x] @@ -724,8 +724,8 @@ FOR x IN (SELECT 1) DO END FOR; -- -Script [0-32] [FOR x IN (...DO END FOR;] - StatementList [0-32] [FOR x IN (...DO END FOR;] +Script [0-31] [FOR x IN (...DO END FOR;] + StatementList [0-31] [FOR x IN (...DO END FOR;] ForInStatement [0-30] [FOR x IN (SELECT 1) DO END FOR] Identifier(x) [4-5] [x] Query [10-18] [SELECT 1] @@ -749,8 +749,8 @@ DO SELECT x; END FOR; -- -Script [0-44] [FOR x IN (...; END FOR;] - StatementList [0-44] [FOR x IN (...; END FOR;] +Script [0-43] [FOR x IN (...; END FOR;] + StatementList [0-43] [FOR x IN (...; END FOR;] ForInStatement [0-42] [FOR x IN (...x; END FOR] Identifier(x) [4-5] [x] Query [10-18] [SELECT 1] @@ -758,7 +758,7 @@ Script [0-44] [FOR x IN (...; END FOR;] SelectList [17-18] [1] SelectColumn [17-18] [1] IntLiteral(1) [17-18] [1] - StatementList [25-35] [SELECT x;] + 
StatementList [25-34] [SELECT x;] QueryStatement [25-33] [SELECT x] Query [25-33] [SELECT x] Select [25-33] [SELECT x] @@ -786,8 +786,8 @@ DO SELECT 1; END FOR; -- -Script [0-74] [FOR x IN (...; END FOR;] - StatementList [0-74] [FOR x IN (...; END FOR;] +Script [0-73] [FOR x IN (...; END FOR;] + StatementList [0-73] [FOR x IN (...; END FOR;] ForInStatement [0-72] [FOR x IN (...1; END FOR] Identifier(x) [4-5] [x] Query [10-24] [SELECT 1, 2, 3] @@ -799,7 +799,7 @@ Script [0-74] [FOR x IN (...; END FOR;] IntLiteral(2) [20-21] [2] SelectColumn [23-24] [3] IntLiteral(3) [23-24] [3] - StatementList [31-65] [SELECT x;...SELECT 1;] + StatementList [31-64] [SELECT x;...SELECT 1;] QueryStatement [31-39] [SELECT x] Query [31-39] [SELECT x] Select [31-39] [SELECT x] @@ -846,8 +846,8 @@ DO SELECT x; END FOR; -- -Script [0-72] [FOR x IN (...; END FOR;] - StatementList [0-72] [FOR x IN (...; END FOR;] +Script [0-71] [FOR x IN (...; END FOR;] + StatementList [0-71] [FOR x IN (...; END FOR;] ForInStatement [0-70] [FOR x IN (...x; END FOR] Identifier(x) [4-5] [x] Query [10-46] [SELECT col1...tables.table1] @@ -864,7 +864,7 @@ Script [0-72] [FOR x IN (...; END FOR;] PathExpression [33-46] [tables.table1] Identifier(tables) [33-39] [tables] Identifier(table1) [40-46] [table1] - StatementList [53-63] [SELECT x;] + StatementList [53-62] [SELECT x;] QueryStatement [53-61] [SELECT x] Query [53-61] [SELECT x] Select [53-61] [SELECT x] @@ -894,8 +894,8 @@ DO SELECT x.breed; END FOR; -- -Script [0-129] [FOR x IN (...; END FOR;] - StatementList [0-129] [FOR x IN (...; END FOR;] +Script [0-128] [FOR x IN (...; END FOR;] + StatementList [0-128] [FOR x IN (...; END FOR;] ForInStatement [0-127] [FOR x IN (...breed; END FOR] Identifier(x) [4-5] [x] Query [10-80] [WITH dogs...FROM dogs] @@ -924,7 +924,7 @@ Script [0-129] [FOR x IN (...; END FOR;] TablePathExpression [76-80] [dogs] PathExpression [76-80] [dogs] Identifier(dogs) [76-80] [dogs] - StatementList [87-120] [SELECT x.name...ELECT x.breed;] 
+ StatementList [87-119] [SELECT x.name...ELECT x.breed;] QueryStatement [87-100] [SELECT x.name] Query [87-100] [SELECT x.name] Select [87-100] [SELECT x.name] @@ -971,8 +971,8 @@ DO SELECT x; END FOR; -- -Script [0-89] [FOR x IN (...; END FOR;] - StatementList [0-89] [FOR x IN (...; END FOR;] +Script [0-88] [FOR x IN (...; END FOR;] + StatementList [0-88] [FOR x IN (...; END FOR;] ForInStatement [0-87] [FOR x IN (...x; END FOR] Identifier(x) [4-5] [x] Query [10-63] [SELECT * FROM...IN (0, 1))] @@ -1005,7 +1005,7 @@ Script [0-89] [FOR x IN (...; END FOR;] IntLiteral(0) [57-58] [0] PivotValue [60-61] [1] IntLiteral(1) [60-61] [1] - StatementList [70-80] [SELECT x;] + StatementList [70-79] [SELECT x;] QueryStatement [70-78] [SELECT x] Query [70-78] [SELECT x] Select [70-78] [SELECT x] @@ -1033,8 +1033,8 @@ DO SELECT x; END FOR; -- -Script [0-105] [FOR x IN (...; END FOR;] - StatementList [0-105] [FOR x IN (...; END FOR;] +Script [0-104] [FOR x IN (...; END FOR;] + StatementList [0-104] [FOR x IN (...; END FOR;] ForInStatement [0-103] [FOR x IN (...x; END FOR] Identifier(x) [4-5] [x] Query [10-79] [SELECT uId...REPEATABLE(10)] @@ -1054,7 +1054,7 @@ Script [0-105] [FOR x IN (...; END FOR;] SampleSuffix [65-79] [REPEATABLE(10)] RepeatableClause [65-79] [REPEATABLE(10)] IntLiteral(10) [76-78] [10] - StatementList [86-96] [SELECT x;] + StatementList [86-95] [SELECT x;] QueryStatement [86-94] [SELECT x] Query [86-94] [SELECT x] Select [86-94] [SELECT x] @@ -1086,8 +1086,8 @@ DO END FOR; END FOR; -- -Script [0-107] [FOR x IN (...; END FOR;] - StatementList [0-107] [FOR x IN (...; END FOR;] +Script [0-106] [FOR x IN (...; END FOR;] + StatementList [0-106] [FOR x IN (...; END FOR;] ForInStatement [0-105] [FOR x IN (...FOR; END FOR] Identifier(x) [4-5] [x] Query [10-24] [SELECT 1, 2, 3] @@ -1099,7 +1099,7 @@ Script [0-107] [FOR x IN (...; END FOR;] IntLiteral(2) [20-21] [2] SelectColumn [23-24] [3] IntLiteral(3) [23-24] [3] - StatementList [31-98] [FOR y IN (...END FOR;] + 
StatementList [31-97] [FOR y IN (...END FOR;] ForInStatement [31-96] [FOR y IN (...END FOR] Identifier(y) [35-36] [y] Query [41-52] [SELECT 4, 5] @@ -1109,7 +1109,7 @@ Script [0-107] [FOR x IN (...; END FOR;] IntLiteral(4) [48-49] [4] SelectColumn [51-52] [5] IntLiteral(5) [51-52] [5] - StatementList [63-89] [SELECT x; SELECT y;] + StatementList [63-86] [SELECT x; SELECT y;] QueryStatement [63-71] [SELECT x] Query [63-71] [SELECT x] Select [63-71] [SELECT x] @@ -1156,10 +1156,10 @@ LOOP END FOR; END LOOP; -- -Script [0-70] [LOOP FOR...END LOOP;] - StatementList [0-70] [LOOP FOR...END LOOP;] +Script [0-69] [LOOP FOR...END LOOP;] + StatementList [0-69] [LOOP FOR...END LOOP;] WhileStatement [0-68] [LOOP FOR...; END LOOP] - StatementList [7-60] [FOR y IN (...END FOR;] + StatementList [7-59] [FOR y IN (...END FOR;] ForInStatement [7-58] [FOR y IN (...END FOR] Identifier(y) [11-12] [y] Query [17-28] [SELECT 1, 2] @@ -1169,7 +1169,7 @@ Script [0-70] [LOOP FOR...END LOOP;] IntLiteral(1) [24-25] [1] SelectColumn [27-28] [2] IntLiteral(2) [27-28] [2] - StatementList [39-51] [SELECT y;] + StatementList [39-48] [SELECT y;] QueryStatement [39-47] [SELECT y] Query [39-47] [SELECT y] Select [39-47] [SELECT y] @@ -1203,11 +1203,11 @@ WHILE TRUE DO END FOR; END WHILE; -- -Script [0-128] [WHILE TRUE...END WHILE;] - StatementList [0-128] [WHILE TRUE...END WHILE;] +Script [0-127] [WHILE TRUE...END WHILE;] + StatementList [0-127] [WHILE TRUE...END WHILE;] WhileStatement [0-126] [WHILE TRUE...END WHILE] BooleanLiteral(TRUE) [6-10] [TRUE] - StatementList [16-117] [FOR y IN (...END FOR;] + StatementList [16-116] [FOR y IN (...END FOR;] ForInStatement [16-115] [FOR y IN (...END FOR] Identifier(y) [20-21] [y] Query [26-37] [SELECT 1, 2] @@ -1217,9 +1217,9 @@ Script [0-128] [WHILE TRUE...END WHILE;] IntLiteral(1) [33-34] [1] SelectColumn [36-37] [2] IntLiteral(2) [36-37] [2] - StatementList [48-108] [REPEAT...END REPEAT;] + StatementList [48-105] [REPEAT...END REPEAT;] RepeatStatement 
[48-104] [REPEAT...END REPEAT] - StatementList [61-78] [SELECT 3, 4;] + StatementList [61-73] [SELECT 3, 4;] QueryStatement [61-72] [SELECT 3, 4] Query [61-72] [SELECT 3, 4] Select [61-72] [SELECT 3, 4] @@ -1252,8 +1252,8 @@ END WHILE ; # FOR...IN in lowercase for x in (select 1) do select x; select 2; end for; -- -Script [0-52] [for x in (...; end for;] - StatementList [0-52] [for x in (...; end for;] +Script [0-51] [for x in (...; end for;] + StatementList [0-51] [for x in (...; end for;] ForInStatement [0-50] [for x in (...2; end for] Identifier(x) [4-5] [x] Query [10-18] [select 1] @@ -1261,7 +1261,7 @@ Script [0-52] [for x in (...; end for;] SelectList [17-18] [1] SelectColumn [17-18] [1] IntLiteral(1) [17-18] [1] - StatementList [23-43] [select x; select 2;] + StatementList [23-42] [select x; select 2;] QueryStatement [23-31] [select x] Query [23-31] [select x] Select [23-31] [select x] diff --git a/zetasql/parser/testdata/macros.test b/zetasql/parser/testdata/macros.test index 82ba08acb..928aab00c 100644 --- a/zetasql/parser/testdata/macros.test +++ b/zetasql/parser/testdata/macros.test @@ -51,7 +51,8 @@ DEFINE MACRO m1 /* unfinished comment ^ == -# Invalid tokens are still invalid in macros +# Invalid tokens are still invalid in macros in strict mode +[language_features=V_1_4_SQL_MACROS,V_1_4_ENFORCE_STRICT_MACROS] DEFINE MACRO m1 3m -- ERROR: Syntax error: Missing whitespace between literal and alias [at 1:18] @@ -59,6 +60,17 @@ DEFINE MACRO m1 3m ^ == +# Invalid tokens allowed in lenient mode +[language_features=V_1_4_SQL_MACROS] +DEFINE MACRO m1 3m +-- +DefineMacroStatement [0-18] [DEFINE MACRO m1 3m] + Identifier(m1) [13-15] [m1] + MacroBody(3m) [16-18] [3m] +-- +DEFINE MACRO m1 3m +== + # Tokens do now splice, so * and / are acceptable as 2 tokens here DEFINE MACRO m1 */ -- diff --git a/zetasql/parser/testdata/modules.test b/zetasql/parser/testdata/modules.test index 9355a31a9..c251b87ac 100644 --- a/zetasql/parser/testdata/modules.test +++ 
b/zetasql/parser/testdata/modules.test @@ -464,7 +464,8 @@ ModuleStatement [0-42] [module foo...b', c=3.0)] IntLiteral(1) [26-27] [1] OptionsEntry [29-34] [b='b'] Identifier(b) [29-30] [b] - StringLiteral('b') [31-34] ['b'] + StringLiteral [31-34] ['b'] + StringLiteralComponent('b') [31-34] ['b'] OptionsEntry [36-41] [c=3.0] Identifier(c) [36-37] [c] FloatLiteral(3.0) [38-41] [3.0] @@ -615,7 +616,8 @@ create or replace module foo; import module "file.proto"; -- ImportStatement [0-26] [import module "file.proto"] - StringLiteral("file.proto") [14-26] ["file.proto"] + StringLiteral [14-26] ["file.proto"] + StringLiteralComponent("file.proto") [14-26] ["file.proto"] -- IMPORT MODULE "file.proto" == @@ -637,7 +639,8 @@ IMPORT MODULE name.path INTO a IMPORT MODULE 'name/path' INTO a; -- ImportStatement [0-32] [IMPORT MODULE...path' INTO a] - StringLiteral('name/path') [14-25] ['name/path'] + StringLiteral [14-25] ['name/path'] + StringLiteralComponent('name/path') [14-25] ['name/path'] IntoAlias [26-32] [INTO a] Identifier(a) [31-32] [a] -- diff --git a/zetasql/parser/testdata/named_arguments.test b/zetasql/parser/testdata/named_arguments.test index d08854b8b..a25b89e86 100644 --- a/zetasql/parser/testdata/named_arguments.test +++ b/zetasql/parser/testdata/named_arguments.test @@ -15,10 +15,12 @@ QueryStatement [0-67] [select parse_date...12/25/08")] Identifier(parse_date) [7-17] [parse_date] NamedArgument [18-39] [format_string => "%x"] Identifier(format_string) [18-31] [format_string] - StringLiteral("%x") [35-39] ["%x"] + StringLiteral [35-39] ["%x"] + StringLiteralComponent("%x") [35-39] ["%x"] NamedArgument [41-66] [date_string => "12/25/08"] Identifier(date_string) [41-52] [date_string] - StringLiteral("12/25/08") [56-66] ["12/25/08"] + StringLiteral [56-66] ["12/25/08"] + StringLiteralComponent("12/25/08") [56-66] ["12/25/08"] -- SELECT parse_date(format_string => "%x", date_string => "12/25/08") @@ -38,10 +40,12 @@ QueryStatement [0-67] [select parse_date...g => 
"%x")] Identifier(parse_date) [7-17] [parse_date] NamedArgument [18-43] [date_string => "12/25/08"] Identifier(date_string) [18-29] [date_string] - StringLiteral("12/25/08") [33-43] ["12/25/08"] + StringLiteral [33-43] ["12/25/08"] + StringLiteralComponent("12/25/08") [33-43] ["12/25/08"] NamedArgument [45-66] [format_string => "%x"] Identifier(format_string) [45-58] [format_string] - StringLiteral("%x") [62-66] ["%x"] + StringLiteral [62-66] ["%x"] + StringLiteralComponent("%x") [62-66] ["%x"] -- SELECT parse_date(date_string => "12/25/08", format_string => "%x") @@ -65,8 +69,10 @@ QueryStatement [0-118] [select parse_date..."/25/08"))] FunctionCall [40-56] [concat("%", "x")] PathExpression [40-46] [concat] Identifier(concat) [40-46] [concat] - StringLiteral("%") [47-50] ["%"] - StringLiteral("x") [52-55] ["x"] + StringLiteral [47-50] ["%"] + StringLiteralComponent("%") [47-50] ["%"] + StringLiteral [52-55] ["x"] + StringLiteralComponent("x") [52-55] ["x"] NamedArgument [62-117] [date_string..."/25/08")] Identifier(date_string) [62-73] [date_string] FunctionCall [77-117] [concat(cast..."/25/08")] @@ -79,7 +85,8 @@ QueryStatement [0-118] [select parse_date..."/25/08"))] SimpleType [99-105] [string] PathExpression [99-105] [string] Identifier(string) [99-105] [string] - StringLiteral("/25/08") [108-116] ["/25/08"] + StringLiteral [108-116] ["/25/08"] + StringLiteralComponent("/25/08") [108-116] ["/25/08"] -- SELECT parse_date(format_string => concat("%", "x"), date_string => concat(CAST(10 + 2 AS string), "/25/08")) @@ -96,10 +103,12 @@ QueryStatement [0-50] [select parse_date...12/25/08")] FunctionCall [7-50] [parse_date...12/25/08")] PathExpression [7-17] [parse_date] Identifier(parse_date) [7-17] [parse_date] - StringLiteral("%x") [18-22] ["%x"] + StringLiteral [18-22] ["%x"] + StringLiteralComponent("%x") [18-22] ["%x"] NamedArgument [24-49] [date_string => "12/25/08"] Identifier(date_string) [24-35] [date_string] - StringLiteral("12/25/08") [39-49] ["12/25/08"] 
+ StringLiteral [39-49] ["12/25/08"] + StringLiteralComponent("12/25/08") [39-49] ["12/25/08"] -- SELECT parse_date("%x", date_string => "12/25/08") @@ -121,11 +130,13 @@ QueryStatement [0-69] [select * from...tring => "%x")] TVFArgument [20-45] [date_string => "12/25/08"] NamedArgument [20-45] [date_string => "12/25/08"] Identifier(date_string) [20-31] [date_string] - StringLiteral("12/25/08") [35-45] ["12/25/08"] + StringLiteral [35-45] ["12/25/08"] + StringLiteralComponent("12/25/08") [35-45] ["12/25/08"] TVFArgument [47-68] [format_string => "%x"] NamedArgument [47-68] [format_string => "%x"] Identifier(format_string) [47-60] [format_string] - StringLiteral("%x") [64-68] ["%x"] + StringLiteral [64-68] ["%x"] + StringLiteralComponent("%x") [64-68] ["%x"] -- SELECT * @@ -153,7 +164,8 @@ QueryStatement [0-118] [select * from...keyvalue))] TVFArgument [25-70] [date_string...value '''] NamedArgument [25-70] [date_string...value '''] Identifier(date_string) [25-36] [date_string] - StringLiteral(''' + StringLiteral [40-70] [''' field: value '''] + StringLiteralComponent(''' field: value ''') [40-70] [''' field: value '''] TVFArgument [76-117] [format_string...keyvalue)] @@ -197,7 +209,8 @@ QueryStatement [0-65] [select * from...as input))] PathExpression [14-19] [mytvf] Identifier(mytvf) [14-19] [mytvf] TVFArgument [20-24] ["%x"] - StringLiteral("%x") [20-24] ["%x"] + StringLiteral [20-24] ["%x"] + StringLiteralComponent("%x") [20-24] ["%x"] TVFArgument [26-64] [date_string...as input)] NamedArgument [26-64] [date_string...as input)] Identifier(date_string) [26-37] [date_string] @@ -206,7 +219,8 @@ QueryStatement [0-65] [select * from...as input))] Select [42-63] [select 'abc' as input] SelectList [49-63] ['abc' as input] SelectColumn [49-63] ['abc' as input] - StringLiteral('abc') [49-54] ['abc'] + StringLiteral [49-54] ['abc'] + StringLiteralComponent('abc') [49-54] ['abc'] Alias [55-63] [as input] Identifier(input) [58-63] [input] -- @@ -264,8 +278,10 @@ 
QueryStatement [0-52] [select parse_date...12/25/08")] Identifier(parse_date) [7-17] [parse_date] NamedArgument [18-39] [format_string => "%x"] Identifier(format_string) [18-31] [format_string] - StringLiteral("%x") [35-39] ["%x"] - StringLiteral("12/25/08") [41-51] ["12/25/08"] + StringLiteral [35-39] ["%x"] + StringLiteralComponent("%x") [35-39] ["%x"] + StringLiteral [41-51] ["12/25/08"] + StringLiteralComponent("12/25/08") [41-51] ["12/25/08"] -- SELECT parse_date(format_string => "%x", "12/25/08") @@ -284,7 +300,8 @@ QueryStatement [0-40] [select parse_date...g => "%x")] Identifier(parse_date) [7-17] [parse_date] NamedArgument [18-39] [format_string => "%x"] Identifier(format_string) [18-31] [format_string] - StringLiteral("%x") [35-39] ["%x"] + StringLiteral [35-39] ["%x"] + StringLiteralComponent("%x") [35-39] ["%x"] -- SELECT parse_date(format_string => "%x") @@ -301,10 +318,12 @@ QueryStatement [0-50] [select parse_date...12/25/08")] FunctionCall [7-50] [parse_date...12/25/08")] PathExpression [7-17] [parse_date] Identifier(parse_date) [7-17] [parse_date] - StringLiteral("%x") [18-22] ["%x"] + StringLiteral [18-22] ["%x"] + StringLiteralComponent("%x") [18-22] ["%x"] NamedArgument [24-49] [date_string => "12/25/08"] Identifier(date_string) [24-35] [date_string] - StringLiteral("12/25/08") [39-49] ["12/25/08"] + StringLiteral [39-49] ["12/25/08"] + StringLiteralComponent("12/25/08") [39-49] ["12/25/08"] -- SELECT parse_date("%x", date_string => "12/25/08") diff --git a/zetasql/parser/testdata/normalize.test b/zetasql/parser/testdata/normalize.test index c3b64138a..8c3117221 100644 --- a/zetasql/parser/testdata/normalize.test +++ b/zetasql/parser/testdata/normalize.test @@ -12,33 +12,38 @@ QueryStatement [0-165] [SELECT NORMALIZE...efg', NFKD)] FunctionCall [7-27] [NORMALIZE('abcdefg')] PathExpression [7-16] [NORMALIZE] Identifier(NORMALIZE) [7-16] [NORMALIZE] - StringLiteral('abcdefg') [17-26] ['abcdefg'] + StringLiteral [17-26] ['abcdefg'] + 
StringLiteralComponent('abcdefg') [17-26] ['abcdefg'] SelectColumn [36-61] [NORMALIZE('abcdefg', NFC)] FunctionCall [36-61] [NORMALIZE('abcdefg', NFC)] PathExpression [36-45] [NORMALIZE] Identifier(NORMALIZE) [36-45] [NORMALIZE] - StringLiteral('abcdefg') [46-55] ['abcdefg'] + StringLiteral [46-55] ['abcdefg'] + StringLiteralComponent('abcdefg') [46-55] ['abcdefg'] PathExpression [57-60] [NFC] Identifier(NFC) [57-60] [NFC] SelectColumn [70-96] [NORMALIZE('abcdefg', NFKC)] FunctionCall [70-96] [NORMALIZE('abcdefg', NFKC)] PathExpression [70-79] [NORMALIZE] Identifier(NORMALIZE) [70-79] [NORMALIZE] - StringLiteral('abcdefg') [80-89] ['abcdefg'] + StringLiteral [80-89] ['abcdefg'] + StringLiteralComponent('abcdefg') [80-89] ['abcdefg'] PathExpression [91-95] [NFKC] Identifier(NFKC) [91-95] [NFKC] SelectColumn [105-130] [NORMALIZE('abcdefg', NFD)] FunctionCall [105-130] [NORMALIZE('abcdefg', NFD)] PathExpression [105-114] [NORMALIZE] Identifier(NORMALIZE) [105-114] [NORMALIZE] - StringLiteral('abcdefg') [115-124] ['abcdefg'] + StringLiteral [115-124] ['abcdefg'] + StringLiteralComponent('abcdefg') [115-124] ['abcdefg'] PathExpression [126-129] [NFD] Identifier(NFD) [126-129] [NFD] SelectColumn [139-165] [NORMALIZE('abcdefg', NFKD)] FunctionCall [139-165] [NORMALIZE('abcdefg', NFKD)] PathExpression [139-148] [NORMALIZE] Identifier(NORMALIZE) [139-148] [NORMALIZE] - StringLiteral('abcdefg') [149-158] ['abcdefg'] + StringLiteral [149-158] ['abcdefg'] + StringLiteralComponent('abcdefg') [149-158] ['abcdefg'] PathExpression [160-164] [NFKD] Identifier(NFKD) [160-164] [NFKD] -- @@ -63,7 +68,8 @@ QueryStatement [0-32] [SELECT NORMALIZE...defg', XYZ)] FunctionCall [7-32] [NORMALIZE('abcdefg', XYZ)] PathExpression [7-16] [NORMALIZE] Identifier(NORMALIZE) [7-16] [NORMALIZE] - StringLiteral('abcdefg') [17-26] ['abcdefg'] + StringLiteral [17-26] ['abcdefg'] + StringLiteralComponent('abcdefg') [17-26] ['abcdefg'] PathExpression [28-31] [XYZ] Identifier(XYZ) [28-31] [XYZ] -- @@ 
-85,33 +91,38 @@ QueryStatement [0-230] [SELECT NORMALIZE_...fg', NFKD)] FunctionCall [7-40] [NORMALIZE_AND_CAS...'abcdefg')] PathExpression [7-29] [NORMALIZE_AND_CASEFOLD] Identifier(NORMALIZE_AND_CASEFOLD) [7-29] [NORMALIZE_AND_CASEFOLD] - StringLiteral('abcdefg') [30-39] ['abcdefg'] + StringLiteral [30-39] ['abcdefg'] + StringLiteralComponent('abcdefg') [30-39] ['abcdefg'] SelectColumn [49-87] [NORMALIZE_AND_CAS...efg', NFC)] FunctionCall [49-87] [NORMALIZE_AND_CAS...efg', NFC)] PathExpression [49-71] [NORMALIZE_AND_CASEFOLD] Identifier(NORMALIZE_AND_CASEFOLD) [49-71] [NORMALIZE_AND_CASEFOLD] - StringLiteral('abcdefg') [72-81] ['abcdefg'] + StringLiteral [72-81] ['abcdefg'] + StringLiteralComponent('abcdefg') [72-81] ['abcdefg'] PathExpression [83-86] [NFC] Identifier(NFC) [83-86] [NFC] SelectColumn [96-135] [NORMALIZE_AND_CAS...fg', NFKC)] FunctionCall [96-135] [NORMALIZE_AND_CAS...fg', NFKC)] PathExpression [96-118] [NORMALIZE_AND_CASEFOLD] Identifier(NORMALIZE_AND_CASEFOLD) [96-118] [NORMALIZE_AND_CASEFOLD] - StringLiteral('abcdefg') [119-128] ['abcdefg'] + StringLiteral [119-128] ['abcdefg'] + StringLiteralComponent('abcdefg') [119-128] ['abcdefg'] PathExpression [130-134] [NFKC] Identifier(NFKC) [130-134] [NFKC] SelectColumn [144-182] [NORMALIZE_AND_CAS...efg', NFD)] FunctionCall [144-182] [NORMALIZE_AND_CAS...efg', NFD)] PathExpression [144-166] [NORMALIZE_AND_CASEFOLD] Identifier(NORMALIZE_AND_CASEFOLD) [144-166] [NORMALIZE_AND_CASEFOLD] - StringLiteral('abcdefg') [167-176] ['abcdefg'] + StringLiteral [167-176] ['abcdefg'] + StringLiteralComponent('abcdefg') [167-176] ['abcdefg'] PathExpression [178-181] [NFD] Identifier(NFD) [178-181] [NFD] SelectColumn [191-230] [NORMALIZE_AND_CAS...fg', NFKD)] FunctionCall [191-230] [NORMALIZE_AND_CAS...fg', NFKD)] PathExpression [191-213] [NORMALIZE_AND_CASEFOLD] Identifier(NORMALIZE_AND_CASEFOLD) [191-213] [NORMALIZE_AND_CASEFOLD] - StringLiteral('abcdefg') [214-223] ['abcdefg'] + StringLiteral [214-223] ['abcdefg'] 
+ StringLiteralComponent('abcdefg') [214-223] ['abcdefg'] PathExpression [225-229] [NFKD] Identifier(NFKD) [225-229] [NFKD] -- @@ -176,15 +187,19 @@ QueryStatement [0-63] [SELECT NORMALIZE..."NFC", 3)] FunctionCall [7-33] [NORMALIZE("abc", "NFC", 3)] PathExpression [7-16] [NORMALIZE] Identifier(NORMALIZE) [7-16] [NORMALIZE] - StringLiteral("abc") [17-22] ["abc"] - StringLiteral("NFC") [24-29] ["NFC"] + StringLiteral [17-22] ["abc"] + StringLiteralComponent("abc") [17-22] ["abc"] + StringLiteral [24-29] ["NFC"] + StringLiteralComponent("NFC") [24-29] ["NFC"] IntLiteral(3) [31-32] [3] SelectColumn [35-63] [`NORMALIZE`("abc", "NFC", 3)] FunctionCall [35-63] [`NORMALIZE`("abc", "NFC", 3)] PathExpression [35-46] [`NORMALIZE`] Identifier(NORMALIZE) [35-46] [`NORMALIZE`] - StringLiteral("abc") [47-52] ["abc"] - StringLiteral("NFC") [54-59] ["NFC"] + StringLiteral [47-52] ["abc"] + StringLiteralComponent("abc") [47-52] ["abc"] + StringLiteral [54-59] ["NFC"] + StringLiteralComponent("NFC") [54-59] ["NFC"] IntLiteral(3) [61-62] [3] -- SELECT @@ -205,16 +220,20 @@ QueryStatement [0-100] [SELECT NORMALIZE_...NFC", def)] FunctionCall [7-48] [NORMALIZE_AND_CAS...NFC", def)] PathExpression [7-29] [NORMALIZE_AND_CASEFOLD] Identifier(NORMALIZE_AND_CASEFOLD) [7-29] [NORMALIZE_AND_CASEFOLD] - StringLiteral("abc") [30-35] ["abc"] - StringLiteral("NFC") [37-42] ["NFC"] + StringLiteral [30-35] ["abc"] + StringLiteralComponent("abc") [30-35] ["abc"] + StringLiteral [37-42] ["NFC"] + StringLiteralComponent("NFC") [37-42] ["NFC"] PathExpression [44-47] [def] Identifier(def) [44-47] [def] SelectColumn [57-100] [`NORMALIZE_AND_CA...NFC", def)] FunctionCall [57-100] [`NORMALIZE_AND_CA...NFC", def)] PathExpression [57-81] [`NORMALIZE_AND_CASEFOLD`] Identifier(NORMALIZE_AND_CASEFOLD) [57-81] [`NORMALIZE_AND_CASEFOLD`] - StringLiteral("abc") [82-87] ["abc"] - StringLiteral("NFC") [89-94] ["NFC"] + StringLiteral [82-87] ["abc"] + StringLiteralComponent("abc") [82-87] ["abc"] + StringLiteral 
[89-94] ["NFC"] + StringLiteralComponent("NFC") [89-94] ["NFC"] PathExpression [96-99] [def] Identifier(def) [96-99] [def] -- diff --git a/zetasql/parser/testdata/operator_precedence.test b/zetasql/parser/testdata/operator_precedence.test index 7dfb5b089..6f440288b 100644 --- a/zetasql/parser/testdata/operator_precedence.test +++ b/zetasql/parser/testdata/operator_precedence.test @@ -1188,7 +1188,8 @@ QueryStatement [0-16] [SELECT "bar".foo] SelectList [7-16] ["bar".foo] SelectColumn [7-16] ["bar".foo] DotIdentifier [7-16] ["bar".foo] - StringLiteral("bar") [7-12] ["bar"] + StringLiteral [7-12] ["bar"] + StringLiteralComponent("bar") [7-12] ["bar"] Identifier(foo) [13-16] [foo] -- SELECT @@ -1202,7 +1203,8 @@ QueryStatement [0-17] [SELECT b"bar".foo] SelectList [7-17] [b"bar".foo] SelectColumn [7-17] [b"bar".foo] DotIdentifier [7-17] [b"bar".foo] - BytesLiteral(b"bar") [7-13] [b"bar"] + BytesLiteral [7-13] [b"bar"] + BytesLiteralComponent(b"bar") [7-13] [b"bar"] Identifier(foo) [14-17] [foo] -- SELECT @@ -1347,7 +1349,8 @@ QueryStatement [0-28] [SELECT DATE "2016-01-01".foo] SelectColumn [7-28] [DATE "2016-01-01".foo] DotIdentifier [7-28] [DATE "2016-01-01".foo] DateOrTimeLiteral(TYPE_DATE) [7-24] [DATE "2016-01-01"] - StringLiteral("2016-01-01") [12-24] ["2016-01-01"] + StringLiteral [12-24] ["2016-01-01"] + StringLiteralComponent("2016-01-01") [12-24] ["2016-01-01"] Identifier(foo) [25-28] [foo] -- SELECT @@ -1362,7 +1365,8 @@ QueryStatement [0-42] [SELECT TIMESTAMP...00:01".foo] SelectColumn [7-42] [TIMESTAMP...00:01".foo] DotIdentifier [7-42] [TIMESTAMP...00:01".foo] DateOrTimeLiteral(TYPE_TIMESTAMP) [7-38] [TIMESTAMP...00:00:01"] - StringLiteral("2016-01-01 00:00:01") [17-38] ["2016-01-01 00:00:01"] + StringLiteral [17-38] ["2016-01-01 00:00:01"] + StringLiteralComponent("2016-01-01 00:00:01") [17-38] ["2016-01-01 00:00:01"] Identifier(foo) [39-42] [foo] -- SELECT diff --git a/zetasql/parser/testdata/options.test b/zetasql/parser/testdata/options.test index 
d83433552..7d296379e 100644 --- a/zetasql/parser/testdata/options.test +++ b/zetasql/parser/testdata/options.test @@ -41,7 +41,8 @@ ModuleStatement [0-44] [MODULE foo...= 2.0+57)] IntLiteral(1) [21-22] [1] OptionsEntry [24-31] [b = 'b'] Identifier(b) [24-25] [b] - StringLiteral('b') [28-31] ['b'] + StringLiteral [28-31] ['b'] + StringLiteralComponent('b') [28-31] ['b'] OptionsEntry [33-43] [c = 2.0+57] Identifier(c) [33-34] [c] BinaryExpression(+) [37-43] [2.0+57] diff --git a/zetasql/parser/testdata/order_by_collate.test b/zetasql/parser/testdata/order_by_collate.test index 9c32d7c37..309da70ac 100644 --- a/zetasql/parser/testdata/order_by_collate.test +++ b/zetasql/parser/testdata/order_by_collate.test @@ -18,7 +18,8 @@ QueryStatement [0-48] [select * from...en_US" ASC] PathExpression [25-28] [col] Identifier(col) [25-28] [col] Collate [29-44] [COLLATE "en_US"] - StringLiteral("en_US") [37-44] ["en_US"] + StringLiteral [37-44] ["en_US"] + StringLiteralComponent("en_US") [37-44] ["en_US"] -- SELECT * @@ -77,7 +78,8 @@ QueryStatement [0-111] [select * from...OLLATE @c4 ASC] PathExpression [44-48] [col2] Identifier(col2) [44-48] [col2] Collate [49-61] [COLLATE "c2"] - StringLiteral("c2") [57-61] ["c2"] + StringLiteral [57-61] ["c2"] + StringLiteralComponent("c2") [57-61] ["c2"] OrderingExpression(DESC) [77-83] [3 DESC] IntLiteral(3) [77-78] [3] OrderingExpression(ASC EXPLICITLY) [94-111] [4 COLLATE @c4 ASC] @@ -110,7 +112,8 @@ QueryStatement [0-48] [select f()...en") from T] PathExpression [26-27] [a] Identifier(a) [26-27] [a] Collate [28-40] [COLLATE "en"] - StringLiteral("en") [36-40] ["en"] + StringLiteral [36-40] ["en"] + StringLiteralComponent("en") [36-40] ["en"] FromClause [42-48] [from T] TablePathExpression [47-48] [T] PathExpression [47-48] [T] @@ -150,7 +153,8 @@ QueryStatement [0-62] [select f()...en") from T] PathExpression [40-41] [a] Identifier(a) [40-41] [a] Collate [42-54] [COLLATE "en"] - StringLiteral("en") [50-54] ["en"] + StringLiteral [50-54] 
["en"] + StringLiteralComponent("en") [50-54] ["en"] FromClause [56-62] [from T] TablePathExpression [61-62] [T] PathExpression [61-62] [T] @@ -183,12 +187,14 @@ QueryStatement [0-183] [select f()...@c4 DESC)] PathExpression [26-27] [a] Identifier(a) [26-27] [a] Collate [28-40] [COLLATE "ca"] - StringLiteral("ca") [36-40] ["ca"] + StringLiteral [36-40] ["ca"] + StringLiteralComponent("ca") [36-40] ["ca"] OrderingExpression(DESC) [72-91] [b COLLATE "cb" DESC] PathExpression [72-73] [b] Identifier(b) [72-73] [b] Collate [74-86] [COLLATE "cb"] - StringLiteral("cb") [82-86] ["cb"] + StringLiteral [82-86] ["cb"] + StringLiteralComponent("cb") [82-86] ["cb"] OrderingExpression(ASC EXPLICITLY) [119-136] [3 COLLATE @c3 ASC] IntLiteral(3) [119-120] [3] Collate [121-132] [COLLATE @c3] @@ -224,7 +230,8 @@ QueryStatement [0-37] [select * from...COLLATE ""] OrderingExpression(ASC) [25-37] [1 COLLATE ""] IntLiteral(1) [25-26] [1] Collate [27-37] [COLLATE ""] - StringLiteral("") [35-37] [""] + StringLiteral [35-37] [""] + StringLiteralComponent("") [35-37] [""] -- SELECT * @@ -254,7 +261,8 @@ QueryStatement [0-41] [select * from...COLLATE "NULL"] OrderingExpression(ASC) [25-41] [1 COLLATE "NULL"] IntLiteral(1) [25-26] [1] Collate [27-41] [COLLATE "NULL"] - StringLiteral("NULL") [35-41] ["NULL"] + StringLiteral [35-41] ["NULL"] + StringLiteralComponent("NULL") [35-41] ["NULL"] -- SELECT * @@ -358,7 +366,8 @@ QueryStatement [0-53] [select * from..."en_US")=0] Identifier(expr1) [29-34] [expr1] PathExpression [36-41] [expr2] Identifier(expr2) [36-41] [expr2] - StringLiteral("en_US") [43-50] ["en_US"] + StringLiteral [43-50] ["en_US"] + StringLiteralComponent("en_US") [43-50] ["en_US"] IntLiteral(0) [52-53] [0] -- SELECT @@ -392,7 +401,8 @@ QueryStatement [0-52] [select * from...en_US")!=0] Identifier(str1) [29-33] [str1] PathExpression [35-39] [str2] Identifier(str2) [35-39] [str2] - StringLiteral("en_US") [41-48] ["en_US"] + StringLiteral [41-48] ["en_US"] + 
StringLiteralComponent("en_US") [41-48] ["en_US"] IntLiteral(0) [51-52] [0] -- SELECT diff --git a/zetasql/parser/testdata/orderby.test b/zetasql/parser/testdata/orderby.test index f2e4f5cab..4fea63182 100644 --- a/zetasql/parser/testdata/orderby.test +++ b/zetasql/parser/testdata/orderby.test @@ -630,3 +630,12 @@ SELECT FROM T ORDER BY 1 NULLS LAST, first DESC NULLS FIRST, last ASC NULLS LAST +== + +# Trailing commas don't work. +select a from t +order by 1, +-- +ERROR: Syntax error: Unexpected end of statement [at 2:12] +order by 1, + ^ diff --git a/zetasql/parser/testdata/parameters.test b/zetasql/parser/testdata/parameters.test index e07e7f552..204d67b41 100644 --- a/zetasql/parser/testdata/parameters.test +++ b/zetasql/parser/testdata/parameters.test @@ -223,6 +223,93 @@ WHERE @@aId = @@bId == +# This test is here to document current behavior and notice regressions. +select {{@|@@}} aId {{AS|FROM|3d}} t; +-- + +ALTERNATION GROUP: @,AS +-- +QueryStatement [0-17] [select @ aId AS t] + Query [0-17] [select @ aId AS t] + Select [0-17] [select @ aId AS t] + SelectList [7-17] [@ aId AS t] + SelectColumn [7-17] [@ aId AS t] + ParameterExpr [7-12] [@ aId] + Identifier(aId) [9-12] [aId] + Alias [13-17] [AS t] + Identifier(t) [16-17] [t] +-- +SELECT + @aId AS t +-- +ALTERNATION GROUP: @,FROM +-- +QueryStatement [0-19] [select @ aId FROM t] + Query [0-19] [select @ aId FROM t] + Select [0-19] [select @ aId FROM t] + SelectList [7-12] [@ aId] + SelectColumn [7-12] [@ aId] + ParameterExpr [7-12] [@ aId] + Identifier(aId) [9-12] [aId] + FromClause [13-19] [FROM t] + TablePathExpression [18-19] [t] + PathExpression [18-19] [t] + Identifier(t) [18-19] [t] +-- +SELECT + @aId +FROM + t +-- +ALTERNATION GROUP: @,3d +-- +ERROR: Syntax error: Missing whitespace between literal and alias [at 1:15] +select @ aId 3d t; + ^ +-- +ALTERNATION GROUP: @@,AS +-- +QueryStatement [0-18] [select @@ aId AS t] + Query [0-18] [select @@ aId AS t] + Select [0-18] [select @@ aId AS t] + SelectList 
[7-18] [@@ aId AS t] + SelectColumn [7-18] [@@ aId AS t] + SystemVariableExpr [7-13] [@@ aId] + PathExpression [10-13] [aId] + Identifier(aId) [10-13] [aId] + Alias [14-18] [AS t] + Identifier(t) [17-18] [t] +-- +SELECT + @@aId AS t +-- +ALTERNATION GROUP: @@,FROM +-- +QueryStatement [0-20] [select @@ aId FROM t] + Query [0-20] [select @@ aId FROM t] + Select [0-20] [select @@ aId FROM t] + SelectList [7-13] [@@ aId] + SelectColumn [7-13] [@@ aId] + SystemVariableExpr [7-13] [@@ aId] + PathExpression [10-13] [aId] + Identifier(aId) [10-13] [aId] + FromClause [14-20] [FROM t] + TablePathExpression [19-20] [t] + PathExpression [19-20] [t] + Identifier(t) [19-20] [t] +-- +SELECT + @@aId +FROM + t +-- +ALTERNATION GROUP: @@,3d +-- +ERROR: Syntax error: Missing whitespace between literal and alias [at 1:16] +select @@ aId 3d t; + ^ +== + # This shows an error that is an unfortunate consequence of the allowed # whitespace separation following @ or @@ select * from T where @{{|@}} diff --git a/zetasql/parser/testdata/parser.test b/zetasql/parser/testdata/parser.test index 0130cc175..6044cf936 100644 --- a/zetasql/parser/testdata/parser.test +++ b/zetasql/parser/testdata/parser.test @@ -244,11 +244,14 @@ QueryStatement [0-67] [select "abc...def */ ghi"] Select [0-67] [select "abc...def */ ghi"] SelectList [7-67] ["abc -- def...def */ ghi"] SelectColumn [7-19] ["abc -- def"] - StringLiteral("abc -- def") [7-19] ["abc -- def"] + StringLiteral [7-19] ["abc -- def"] + StringLiteralComponent("abc -- def") [7-19] ["abc -- def"] SelectColumn [28-39] ["abc # def"] - StringLiteral("abc # def") [28-39] ["abc # def"] + StringLiteral [28-39] ["abc # def"] + StringLiteralComponent("abc # def") [28-39] ["abc # def"] SelectColumn [48-67] ["abc /* def */ ghi"] - StringLiteral("abc /* def */ ghi") [48-67] ["abc /* def */ ghi"] + StringLiteral [48-67] ["abc /* def */ ghi"] + StringLiteralComponent("abc /* def */ ghi") [48-67] ["abc /* def */ ghi"] -- SELECT "abc -- def", @@ -392,7 +395,8 @@ 
QueryStatement [0-40] [select f1,...f1, "b" f2)] Alias [29-31] [f1] Identifier(f1) [29-31] [f1] SelectColumn [33-39] ["b" f2] - StringLiteral("b") [33-36] ["b"] + StringLiteral [33-36] ["b"] + StringLiteralComponent("b") [33-36] ["b"] Alias [37-39] [f2] Identifier(f2) [37-39] [f2] -- @@ -722,7 +726,8 @@ QueryStatement [0-45] [select f(1...bar from T] IntLiteral(1) [9-10] [1] PathExpression [12-13] [x] Identifier(x) [12-13] [x] - StringLiteral("a") [15-18] ["a"] + StringLiteral [15-18] ["a"] + StringLiteralComponent("a") [15-18] ["a"] BooleanLiteral(true) [20-24] [true] FunctionCall [26-30] [g(y)] PathExpression [26-27] [g] @@ -819,7 +824,8 @@ QueryStatement [0-125] [select 1 as...as `\\x54`] Alias [89-105] [as `\\U00045678`] Identifier(`\\U00045678`) [92-105] [`\\U00045678`] SelectColumn [107-125] ['\\x53' as `\\x54`] - StringLiteral('\\x53') [107-114] ['\\x53'] + StringLiteral [107-114] ['\\x53'] + StringLiteralComponent('\\x53') [107-114] ['\\x53'] Alias [115-125] [as `\\x54`] Identifier(`\\x54`) [118-125] [`\\x54`] -- @@ -840,11 +846,13 @@ QueryStatement [0-61] [select '\\...U00012345`] Select [0-61] [select '\\...U00012345`] SelectList [7-61] ['\\u1235'...U00012345`] SelectColumn [7-29] ['\\u1235' as `\\u1234`] - StringLiteral('\\u1235') [7-16] ['\\u1235'] + StringLiteral [7-16] ['\\u1235'] + StringLiteralComponent('\\u1235') [7-16] ['\\u1235'] Alias [17-29] [as `\\u1234`] Identifier(`\\u1234`) [20-29] [`\\u1234`] SelectColumn [31-61] ["\\U00012346" as `\\U00012345`] - StringLiteral("\\U00012346") [31-44] ["\\U00012346"] + StringLiteral [31-44] ["\\U00012346"] + StringLiteralComponent("\\U00012346") [31-44] ["\\U00012346"] Alias [45-61] [as `\\U00012345`] Identifier(`\\U00012345`) [48-61] [`\\U00012345`] -- @@ -909,7 +917,8 @@ QueryStatement [0-30] [select safe_cast("1" as int32)] SelectList [7-30] [safe_cast("1" as int32)] SelectColumn [7-30] [safe_cast("1" as int32)] CastExpression(return_null_on_error=true) [7-30] [safe_cast("1" as int32)] - 
StringLiteral("1") [17-20] ["1"] + StringLiteral [17-20] ["1"] + StringLiteralComponent("1") [17-20] ["1"] SimpleType [24-29] [int32] PathExpression [24-29] [int32] Identifier(int32) [24-29] [int32] @@ -925,7 +934,8 @@ QueryStatement [0-25] [select cast("1" as int32)] SelectList [7-25] [cast("1" as int32)] SelectColumn [7-25] [cast("1" as int32)] CastExpression [7-25] [cast("1" as int32)] - StringLiteral("1") [12-15] ["1"] + StringLiteral [12-15] ["1"] + StringLiteralComponent("1") [12-15] ["1"] SimpleType [19-24] [int32] PathExpression [19-24] [int32] Identifier(int32) [19-24] [int32] @@ -968,7 +978,8 @@ QueryStatement [0-41] [select cast...'und:ci')] PathExpression [17-23] [string] Identifier(string) [17-23] [string] Collate [24-40] [collate 'und:ci'] - StringLiteral('und:ci') [32-40] ['und:ci'] + StringLiteral [32-40] ['und:ci'] + StringLiteralComponent('und:ci') [32-40] ['und:ci'] -- SELECT CAST(x AS string COLLATE 'und:ci') @@ -988,7 +999,8 @@ QueryStatement [0-41] [select cast...'und:ci')] PathExpression [17-23] [double] Identifier(double) [17-23] [double] Collate [24-40] [collate 'und:ci'] - StringLiteral('und:ci') [32-40] ['und:ci'] + StringLiteral [32-40] ['und:ci'] + StringLiteralComponent('und:ci') [32-40] ['und:ci'] -- SELECT CAST(x AS double COLLATE 'und:ci') @@ -1011,7 +1023,8 @@ QueryStatement [0-51] [select cast...'und:ci'>)] PathExpression [26-32] [string] Identifier(string) [26-32] [string] Collate [33-49] [collate 'und:ci'] - StringLiteral('und:ci') [41-49] ['und:ci'] + StringLiteral [41-49] ['und:ci'] + StringLiteralComponent('und:ci') [41-49] ['und:ci'] -- SELECT CAST(x AS STRUCT< x string COLLATE 'und:ci' >) @@ -1034,7 +1047,8 @@ QueryStatement [0-51] [select cast...'und:ci')] PathExpression [26-32] [string] Identifier(string) [26-32] [string] Collate [34-50] [collate 'und:ci'] - StringLiteral('und:ci') [42-50] ['und:ci'] + StringLiteral [42-50] ['und:ci'] + StringLiteralComponent('und:ci') [42-50] ['und:ci'] -- SELECT CAST(x AS STRUCT< x 
string > COLLATE 'und:ci') @@ -1055,7 +1069,8 @@ QueryStatement [0-48] [select cast...'und:ci'>)] PathExpression [23-29] [string] Identifier(string) [23-29] [string] Collate [30-46] [collate 'und:ci'] - StringLiteral('und:ci') [38-46] ['und:ci'] + StringLiteral [38-46] ['und:ci'] + StringLiteralComponent('und:ci') [38-46] ['und:ci'] -- SELECT CAST(x AS ARRAY< string COLLATE 'und:ci' >) @@ -1076,7 +1091,8 @@ QueryStatement [0-48] [select cast...'und:ci')] PathExpression [23-29] [string] Identifier(string) [23-29] [string] Collate [31-47] [collate 'und:ci'] - StringLiteral('und:ci') [39-47] ['und:ci'] + StringLiteral [39-47] ['und:ci'] + StringLiteralComponent('und:ci') [39-47] ['und:ci'] -- SELECT CAST(x AS ARRAY< string > COLLATE 'und:ci') @@ -1189,7 +1205,8 @@ QueryStatement [0-272] [select cast...myenum`) from T] SelectList [7-265] [cast("1" as...package.myenum`)] SelectColumn [7-25] [cast("1" as int32)] CastExpression [7-25] [cast("1" as int32)] - StringLiteral("1") [12-15] ["1"] + StringLiteral [12-15] ["1"] + StringLiteralComponent("1") [12-15] ["1"] SimpleType [19-24] [int32] PathExpression [19-24] [int32] Identifier(int32) [19-24] [int32] @@ -1275,7 +1292,8 @@ QueryStatement [0-319] [select safe_cast...um`) from T] SelectList [7-312] [safe_cast(...ypackage.myenum`)] SelectColumn [7-30] [safe_cast("1" as int32)] CastExpression(return_null_on_error=true) [7-30] [safe_cast("1" as int32)] - StringLiteral("1") [17-20] ["1"] + StringLiteral [17-20] ["1"] + StringLiteralComponent("1") [17-20] ["1"] SimpleType [24-29] [int32] PathExpression [24-29] [int32] Identifier(int32) [24-29] [int32] @@ -1385,7 +1403,8 @@ QueryStatement [0-306] [select cast...bytes>) from T] Identifier(timestamp_seconds) [67-84] [timestamp_seconds] SelectColumn [94-133] [cast("1" as...cast_1_as_int64] CastExpression [94-114] [cast("1" as `int64`)] - StringLiteral("1") [99-102] ["1"] + StringLiteral [99-102] ["1"] + StringLiteralComponent("1") [99-102] ["1"] SimpleType [106-113] [`int64`] 
PathExpression [106-113] [`int64`] Identifier(int64) [106-113] [`int64`] @@ -1509,7 +1528,8 @@ QueryStatement [0-351] [select safe_cast...es>) from T] Identifier(timestamp_seconds) [82-99] [timestamp_seconds] SelectColumn [109-158] [safe_cast(...e_cast_1_as_int64] CastExpression(return_null_on_error=true) [109-134] [safe_cast("1" as `int64`)] - StringLiteral("1") [119-122] ["1"] + StringLiteral [119-122] ["1"] + StringLiteralComponent("1") [119-122] ["1"] SimpleType [126-133] [`int64`] PathExpression [126-133] [`int64`] Identifier(int64) [126-133] [`int64`] @@ -1660,7 +1680,8 @@ QueryStatement [0-59] [select cast...t_string) from t] SelectList [7-52] [cast('literal...format_string)] SelectColumn [7-52] [cast('literal...format_string)] CastExpression [7-52] [cast('literal...format_string)] - StringLiteral('literal') [12-21] ['literal'] + StringLiteral [12-21] ['literal'] + StringLiteralComponent('literal') [12-21] ['literal'] SimpleType [25-30] [int64] PathExpression [25-30] [int64] Identifier(int64) [25-30] [int64] @@ -1685,7 +1706,8 @@ QueryStatement [0-64] [select safe_cast...ing) from t] SelectList [7-57] [safe_cast(...format_string)] SelectColumn [7-57] [safe_cast(...format_string)] CastExpression(return_null_on_error=true) [7-57] [safe_cast(...format_string)] - StringLiteral('literal') [17-26] ['literal'] + StringLiteral [17-26] ['literal'] + StringLiteralComponent('literal') [17-26] ['literal'] SimpleType [30-35] [int64] PathExpression [30-35] [int64] Identifier(int64) [30-35] [int64] @@ -1726,7 +1748,8 @@ QueryStatement [0-88] [select cast...minute) from t] BinaryExpression(||) [59-70] [hour || ':'] PathExpression [59-63] [hour] Identifier(hour) [59-63] [hour] - StringLiteral(':') [67-70] [':'] + StringLiteral [67-70] [':'] + StringLiteralComponent(':') [67-70] [':'] PathExpression [74-80] [minute] Identifier(minute) [74-80] [minute] FromClause [82-88] [from t] @@ -1759,7 +1782,8 @@ QueryStatement [0-93] [select safe_cast...ute) from t] BinaryExpression(||) 
[64-75] [hour || ':'] PathExpression [64-68] [hour] Identifier(hour) [64-68] [hour] - StringLiteral(':') [72-75] [':'] + StringLiteral [72-75] [':'] + StringLiteralComponent(':') [72-75] [':'] PathExpression [79-85] [minute] Identifier(minute) [79-85] [minute] FromClause [87-93] [from t] @@ -1817,7 +1841,8 @@ QueryStatement [0-125] [select count...|') from T] Identifier(group_concat) [89-101] [group_concat] PathExpression [111-112] [x] Identifier(x) [111-112] [x] - StringLiteral('|') [114-117] ['|'] + StringLiteral [114-117] ['|'] + StringLiteralComponent('|') [114-117] ['|'] FromClause [119-125] [from T] TablePathExpression [124-125] [T] PathExpression [124-125] [T] @@ -1933,7 +1958,8 @@ QueryStatement [0-43] [select f(column...'pattern%'] PathExpression [9-26] [column.field_name] Identifier(column) [9-15] [column] Identifier(field_name) [16-26] [field_name] - StringLiteral('pattern%') [33-43] ['pattern%'] + StringLiteral [33-43] ['pattern%'] + StringLiteralComponent('pattern%') [33-43] ['pattern%'] -- SELECT f(column.field_name) LIKE 'pattern%' diff --git a/zetasql/parser/testdata/parser_errors.test b/zetasql/parser/testdata/parser_errors.test index 705b21002..f9c4c5be7 100644 --- a/zetasql/parser/testdata/parser_errors.test +++ b/zetasql/parser/testdata/parser_errors.test @@ -534,16 +534,16 @@ select count(1, *); select count(T.*); -- -ERROR: Syntax error: Expected ")" but got "." [at 1:15] +ERROR: Syntax error: Unexpected "*" [at 1:16] select count(T.*); - ^ + ^ == select count(1, T.*); -- -ERROR: Syntax error: Expected ")" but got "." 
[at 1:18] +ERROR: Syntax error: Unexpected "*" [at 1:19] select count(1, T.*); - ^ + ^ == select array_concat([1, 2], *); diff --git a/zetasql/parser/testdata/pivot.test b/zetasql/parser/testdata/pivot.test index 24280c19c..bc210d1a6 100644 --- a/zetasql/parser/testdata/pivot.test +++ b/zetasql/parser/testdata/pivot.test @@ -921,7 +921,8 @@ QueryStatement [0-111] [SELECT * FROM...zero, 1))] PathExpression [29-30] [t] Identifier(t) [29-30] [t] ForSystemTime [31-65] [FOR SYSTEM...2018-01-01'] - StringLiteral('2018-01-01') [53-65] ['2018-01-01'] + StringLiteral [53-65] ['2018-01-01'] + StringLiteralComponent('2018-01-01') [53-65] ['2018-01-01'] PivotClause [67-111] [PIVOT(SUM(...zero, 1))] PivotExpressionList [73-86] [SUM(a) AS sum] PivotExpression [73-86] [SUM(a) AS sum] diff --git a/zetasql/parser/testdata/proto_braced_constructors.test b/zetasql/parser/testdata/proto_braced_constructors.test new file mode 100644 index 000000000..ee57af1f2 --- /dev/null +++ b/zetasql/parser/testdata/proto_braced_constructors.test @@ -0,0 +1,1153 @@ +[default language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS] +[language_features=] +# Braced constructors disabled +SELECT NEW x {} +-- +ERROR: Braced constructors are not supported [at 2:14] +SELECT NEW x {} + ^ +== + +# No field constructor. +SELECT NEW x {} +-- +QueryStatement [0-15] [SELECT NEW x {}] + Query [0-15] [SELECT NEW x {}] + Select [0-15] [SELECT NEW x {}] + SelectList [7-15] [NEW x {}] + SelectColumn [7-15] [NEW x {}] + BracedNewConstructor [7-15] [NEW x {}] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-15] [{}] +-- +SELECT + NEW x { } +== + +# No type/field constructor. +SELECT {} +-- +QueryStatement [0-9] [SELECT {}] + Query [0-9] [SELECT {}] + Select [0-9] [SELECT {}] + SelectList [7-9] [{}] + SelectColumn [7-9] [{}] + BracedConstructor [7-9] [{}] +-- +SELECT + { } +== + +# No type/field constructor with trailing comma. 
+SELECT {,} +-- +ERROR: Syntax error: Unexpected "," [at 1:9] +SELECT {,} + ^ +== + +# One field with trailing comma. +SELECT {bar:3,} +-- +QueryStatement [0-15] [SELECT {bar:3,}] + Query [0-15] [SELECT {bar:3,}] + Select [0-15] [SELECT {bar:3,}] + SelectList [7-15] [{bar:3,}] + SelectColumn [7-15] [{bar:3,}] + BracedConstructor [7-15] [{bar:3,}] + BracedConstructorField [8-13] [bar:3] + Identifier(bar) [8-11] [bar] + BracedConstructorFieldValue [11-13] [:3] + IntLiteral(3) [12-13] [3] +-- +SELECT + { bar : 3 } +== + +# One field with multiple trailing commas. +SELECT {bar:3,,} +-- +ERROR: Syntax error: Unexpected "," [at 1:15] +SELECT {bar:3,,} + ^ +== + +# Simple constructor. +SELECT NEW x { foo: "blah" bar: 3 } +-- +QueryStatement [0-35] [SELECT NEW..." bar: 3 }] + Query [0-35] [SELECT NEW..." bar: 3 }] + Select [0-35] [SELECT NEW..." bar: 3 }] + SelectList [7-35] [NEW x { foo: "blah" bar: 3 }] + SelectColumn [7-35] [NEW x { foo: "blah" bar: 3 }] + BracedNewConstructor [7-35] [NEW x { foo: "blah" bar: 3 }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-35] [{ foo: "blah" bar: 3 }] + BracedConstructorField [15-26] [foo: "blah"] + Identifier(foo) [15-18] [foo] + BracedConstructorFieldValue [18-26] [: "blah"] + StringLiteral [20-26] ["blah"] + StringLiteralComponent("blah") [20-26] ["blah"] + BracedConstructorField [27-33] [bar: 3] + Identifier(bar) [27-30] [bar] + BracedConstructorFieldValue [30-33] [: 3] + IntLiteral(3) [32-33] [3] +-- +SELECT + NEW x { foo : "blah" bar : 3 } +== + +# Simple constructor with trailing comma. 
+SELECT NEW x { foo: "blah" bar: 3, } +-- +QueryStatement [0-36] [SELECT NEW...bar: 3, }] + Query [0-36] [SELECT NEW...bar: 3, }] + Select [0-36] [SELECT NEW...bar: 3, }] + SelectList [7-36] [NEW x { foo: "blah" bar: 3, }] + SelectColumn [7-36] [NEW x { foo: "blah" bar: 3, }] + BracedNewConstructor [7-36] [NEW x { foo: "blah" bar: 3, }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-36] [{ foo: "blah" bar: 3, }] + BracedConstructorField [15-26] [foo: "blah"] + Identifier(foo) [15-18] [foo] + BracedConstructorFieldValue [18-26] [: "blah"] + StringLiteral [20-26] ["blah"] + StringLiteralComponent("blah") [20-26] ["blah"] + BracedConstructorField [27-33] [bar: 3] + Identifier(bar) [27-30] [bar] + BracedConstructorFieldValue [30-33] [: 3] + IntLiteral(3) [32-33] [3] +-- +SELECT + NEW x { foo : "blah" bar : 3 } +== + +# Simple constructor with multiple commas. +SELECT NEW x { foo: "blah" bar: 3,, } +-- +ERROR: Syntax error: Unexpected "," [at 1:35] +SELECT NEW x { foo: "blah" bar: 3,, } + ^ +== + +# Simple constructor with comma. 
+SELECT NEW x { foo: "blah", bar: 3 } +-- +QueryStatement [0-36] [SELECT NEW..., bar: 3 }] + Query [0-36] [SELECT NEW..., bar: 3 }] + Select [0-36] [SELECT NEW..., bar: 3 }] + SelectList [7-36] [NEW x { foo: "blah", bar: 3 }] + SelectColumn [7-36] [NEW x { foo: "blah", bar: 3 }] + BracedNewConstructor [7-36] [NEW x { foo: "blah", bar: 3 }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-36] [{ foo: "blah", bar: 3 }] + BracedConstructorField [15-26] [foo: "blah"] + Identifier(foo) [15-18] [foo] + BracedConstructorFieldValue [18-26] [: "blah"] + StringLiteral [20-26] ["blah"] + StringLiteralComponent("blah") [20-26] ["blah"] + BracedConstructorField [28-34] [bar: 3] + Identifier(bar) [28-31] [bar] + BracedConstructorFieldValue [31-34] [: 3] + IntLiteral(3) [33-34] [3] +-- +SELECT + NEW x { foo : "blah", bar : 3 } +== + +# Simple constructor with trailing comma. +SELECT NEW x { foo: "blah", bar: 3, } +-- +QueryStatement [0-37] [SELECT NEW...bar: 3, }] + Query [0-37] [SELECT NEW...bar: 3, }] + Select [0-37] [SELECT NEW...bar: 3, }] + SelectList [7-37] [NEW x { foo: "blah", bar: 3, }] + SelectColumn [7-37] [NEW x { foo: "blah", bar: 3, }] + BracedNewConstructor [7-37] [NEW x { foo: "blah", bar: 3, }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-37] [{ foo: "blah", bar: 3, }] + BracedConstructorField [15-26] [foo: "blah"] + Identifier(foo) [15-18] [foo] + BracedConstructorFieldValue [18-26] [: "blah"] + StringLiteral [20-26] ["blah"] + StringLiteralComponent("blah") [20-26] ["blah"] + BracedConstructorField [28-34] [bar: 3] + Identifier(bar) [28-31] [bar] + BracedConstructorFieldValue [31-34] [: 3] + IntLiteral(3) [33-34] [3] +-- +SELECT + NEW x { foo : "blah", bar : 3 } +== + +# Simple constructor with multiple commas. 
+SELECT NEW x { foo: "blah", bar: 3,, } +-- +ERROR: Syntax error: Unexpected "," [at 1:36] +SELECT NEW x { foo: "blah", bar: 3,, } + ^ +== + +# Leading comma is disallowed. +SELECT NEW x { , foo: "blah", bar: 3 } +-- +ERROR: Syntax error: Unexpected "," [at 1:16] +SELECT NEW x { , foo: "blah", bar: 3 } + ^ +== + +# Nested message and array field. +SELECT NEW x { + foo { + monkey: "blah" + } + bar: 3 + int_array: [1,2,3] +} +-- +QueryStatement [0-77] [SELECT NEW...[1,2,3] }] + Query [0-77] [SELECT NEW...[1,2,3] }] + Select [0-77] [SELECT NEW...[1,2,3] }] + SelectList [7-77] [NEW x {...[1,2,3] }] + SelectColumn [7-77] [NEW x {...[1,2,3] }] + BracedNewConstructor [7-77] [NEW x {...[1,2,3] }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-77] [{ foo {...[1,2,3] }] + BracedConstructorField [17-45] [foo { monkey: "blah" }] + Identifier(foo) [17-20] [foo] + BracedConstructorFieldValue [21-45] [{ monkey: "blah" }] + BracedConstructor [21-45] [{ monkey: "blah" }] + BracedConstructorField [27-41] [monkey: "blah"] + Identifier(monkey) [27-33] [monkey] + BracedConstructorFieldValue [33-41] [: "blah"] + StringLiteral [35-41] ["blah"] + StringLiteralComponent("blah") [35-41] ["blah"] + BracedConstructorField [48-54] [bar: 3] + Identifier(bar) [48-51] [bar] + BracedConstructorFieldValue [51-54] [: 3] + IntLiteral(3) [53-54] [3] + BracedConstructorField [57-75] [int_array: [1,2,3]] + Identifier(int_array) [57-66] [int_array] + BracedConstructorFieldValue [66-75] [: [1,2,3]] + ArrayConstructor [68-75] [[1,2,3]] + IntLiteral(1) [69-70] [1] + IntLiteral(2) [71-72] [2] + IntLiteral(3) [73-74] [3] +-- +SELECT + NEW x { foo { monkey : "blah" } bar : 3 int_array : ARRAY[1, 2, 3] } +== + +# Nested message with trailing comma. 
+SELECT NEW x { + foo { + monkey: "blah", + }, +} +-- +QueryStatement [0-49] [SELECT NEW...blah", }, }] + Query [0-49] [SELECT NEW...blah", }, }] + Select [0-49] [SELECT NEW...blah", }, }] + SelectList [7-49] [NEW x {...blah", }, }] + SelectColumn [7-49] [NEW x {...blah", }, }] + BracedNewConstructor [7-49] [NEW x {...blah", }, }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-49] [{ foo {...blah", }, }] + BracedConstructorField [17-46] [foo { monkey: "blah", }] + Identifier(foo) [17-20] [foo] + BracedConstructorFieldValue [21-46] [{ monkey: "blah", }] + BracedConstructor [21-46] [{ monkey: "blah", }] + BracedConstructorField [27-41] [monkey: "blah"] + Identifier(monkey) [27-33] [monkey] + BracedConstructorFieldValue [33-41] [: "blah"] + StringLiteral [35-41] ["blah"] + StringLiteralComponent("blah") [35-41] ["blah"] +-- +SELECT + NEW x { foo { monkey : "blah" } } +== + +# Nested message (with colon) and array field. +SELECT NEW x { + foo: { + monkey: "blah" + } + bar: 3 + int_array: [1,2,3] +} +-- +QueryStatement [0-78] [SELECT NEW...[1,2,3] }] + Query [0-78] [SELECT NEW...[1,2,3] }] + Select [0-78] [SELECT NEW...[1,2,3] }] + SelectList [7-78] [NEW x {...[1,2,3] }] + SelectColumn [7-78] [NEW x {...[1,2,3] }] + BracedNewConstructor [7-78] [NEW x {...[1,2,3] }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-78] [{ foo: {...[1,2,3] }] + BracedConstructorField [17-46] [foo: { monkey: "blah" }] + Identifier(foo) [17-20] [foo] + BracedConstructorFieldValue [20-46] [: { monkey: "blah" }] + BracedConstructor [22-46] [{ monkey: "blah" }] + BracedConstructorField [28-42] [monkey: "blah"] + Identifier(monkey) [28-34] [monkey] + BracedConstructorFieldValue [34-42] [: "blah"] + StringLiteral [36-42] ["blah"] + StringLiteralComponent("blah") [36-42] ["blah"] + BracedConstructorField [49-55] [bar: 3] + Identifier(bar) [49-52] [bar] + BracedConstructorFieldValue 
[52-55] [: 3] + IntLiteral(3) [54-55] [3] + BracedConstructorField [58-76] [int_array: [1,2,3]] + Identifier(int_array) [58-67] [int_array] + BracedConstructorFieldValue [67-76] [: [1,2,3]] + ArrayConstructor [69-76] [[1,2,3]] + IntLiteral(1) [70-71] [1] + IntLiteral(2) [72-73] [2] + IntLiteral(3) [74-75] [3] +-- +SELECT + NEW x { foo : { monkey : "blah" } bar : 3 int_array : ARRAY[1, 2, 3] } +== + +# Sub-message array. +SELECT NEW x { + int_field: 1 + submessage_array: [{ + monkey: "blah" + }, { + baz: "abc" + }] +} +-- +QueryStatement [0-100] [SELECT NEW...abc" }] }] + Query [0-100] [SELECT NEW...abc" }] }] + Select [0-100] [SELECT NEW...abc" }] }] + SelectList [7-100] [NEW x {...abc" }] }] + SelectColumn [7-100] [NEW x {...abc" }] }] + BracedNewConstructor [7-100] [NEW x {...abc" }] }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-100] [{ int_field...abc" }] }] + BracedConstructorField [17-29] [int_field: 1] + Identifier(int_field) [17-26] [int_field] + BracedConstructorFieldValue [26-29] [: 1] + IntLiteral(1) [28-29] [1] + BracedConstructorField [32-98] [submessage_array..."abc" }]] + Identifier(submessage_array) [32-48] [submessage_array] + BracedConstructorFieldValue [48-98] [: [{ monkey..."abc" }]] + ArrayConstructor [50-98] [[{ monkey..."abc" }]] + BracedConstructor [51-75] [{ monkey: "blah" }] + BracedConstructorField [57-71] [monkey: "blah"] + Identifier(monkey) [57-63] [monkey] + BracedConstructorFieldValue [63-71] [: "blah"] + StringLiteral [65-71] ["blah"] + StringLiteralComponent("blah") [65-71] ["blah"] + BracedConstructor [77-97] [{ baz: "abc" }] + BracedConstructorField [83-93] [baz: "abc"] + Identifier(baz) [83-86] [baz] + BracedConstructorFieldValue [86-93] [: "abc"] + StringLiteral [88-93] ["abc"] + StringLiteralComponent("abc") [88-93] ["abc"] +-- +SELECT + NEW x { int_field : 1 submessage_array : ARRAY[{ monkey : "blah" }, { baz : "abc" }] } +== + +# At parse-time map fields are just 
like repeated sub-message fields. +SELECT NEW x { + int_field: 1 + map_field: [{ + key: "blah" + value: 1 + }, { + key: "abc" + value: 2 + }] +} +-- +QueryStatement [0-116] [SELECT NEW...: 2 }] }] + Query [0-116] [SELECT NEW...: 2 }] }] + Select [0-116] [SELECT NEW...: 2 }] }] + SelectList [7-116] [NEW x {...: 2 }] }] + SelectColumn [7-116] [NEW x {...: 2 }] }] + BracedNewConstructor [7-116] [NEW x {...: 2 }] }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-116] [{ int_field...: 2 }] }] + BracedConstructorField [17-29] [int_field: 1] + Identifier(int_field) [17-26] [int_field] + BracedConstructorFieldValue [26-29] [: 1] + IntLiteral(1) [28-29] [1] + BracedConstructorField [32-114] [map_field:...value: 2 }]] + Identifier(map_field) [32-41] [map_field] + BracedConstructorFieldValue [41-114] [: [{ key...value: 2 }]] + ArrayConstructor [43-114] [[{ key...value: 2 }]] + BracedConstructor [44-78] [{ key:...value: 1 }] + BracedConstructorField [50-61] [key: "blah"] + Identifier(key) [50-53] [key] + BracedConstructorFieldValue [53-61] [: "blah"] + StringLiteral [55-61] ["blah"] + StringLiteralComponent("blah") [55-61] ["blah"] + BracedConstructorField [66-74] [value: 1] + Identifier(value) [66-71] [value] + BracedConstructorFieldValue [71-74] [: 1] + IntLiteral(1) [73-74] [1] + BracedConstructor [80-113] [{ key:...value: 2 }] + BracedConstructorField [86-96] [key: "abc"] + Identifier(key) [86-89] [key] + BracedConstructorFieldValue [89-96] [: "abc"] + StringLiteral [91-96] ["abc"] + StringLiteralComponent("abc") [91-96] ["abc"] + BracedConstructorField [101-109] [value: 2] + Identifier(value) [101-106] [value] + BracedConstructorFieldValue [106-109] [: 2] + IntLiteral(2) [108-109] [2] +-- +SELECT + NEW x { int_field : 1 map_field : ARRAY[{ key : "blah" value : 1 }, { key : "abc" value : 2 }] } +== + +# Plain extension. 
+SELECT NEW x { + (path.to.extension) { + value: 1 + } +} +-- +QueryStatement [0-57] [SELECT NEW...value: 1 } }] + Query [0-57] [SELECT NEW...value: 1 } }] + Select [0-57] [SELECT NEW...value: 1 } }] + SelectList [7-57] [NEW x {...value: 1 } }] + SelectColumn [7-57] [NEW x {...value: 1 } }] + BracedNewConstructor [7-57] [NEW x {...value: 1 } }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-57] [{ (path....value: 1 } }] + BracedConstructorField [17-55] [(path.to.extensio...lue: 1 }] + PathExpression [18-35] [path.to.extension] + Identifier(path) [18-22] [path] + Identifier(`to`) [23-25] [to] + Identifier(extension) [26-35] [extension] + BracedConstructorFieldValue [37-55] [{ value: 1 }] + BracedConstructor [37-55] [{ value: 1 }] + BracedConstructorField [43-51] [value: 1] + Identifier(value) [43-48] [value] + BracedConstructorFieldValue [48-51] [: 1] + IntLiteral(1) [50-51] [1] +-- +SELECT + NEW x {(path.`to`.extension) { value : 1 } } +== + +# Extension without parenthesis is an error. +SELECT NEW x { path.to.extension: 1 } +-- +ERROR: Syntax error: Expected ":" or "{" but got "." [at 1:20] +SELECT NEW x { path.to.extension: 1 } + ^ +== + +# Extension with fields. 
+SELECT NEW x { + foo: "bar", + (path.to.extension) { + value: 1 + } + baz: 1 +} +-- +QueryStatement [0-80] [SELECT NEW...baz: 1 }] + Query [0-80] [SELECT NEW...baz: 1 }] + Select [0-80] [SELECT NEW...baz: 1 }] + SelectList [7-80] [NEW x {...baz: 1 }] + SelectColumn [7-80] [NEW x {...baz: 1 }] + BracedNewConstructor [7-80] [NEW x {...baz: 1 }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-80] [{ foo: "...baz: 1 }] + BracedConstructorField [17-27] [foo: "bar"] + Identifier(foo) [17-20] [foo] + BracedConstructorFieldValue [20-27] [: "bar"] + StringLiteral [22-27] ["bar"] + StringLiteralComponent("bar") [22-27] ["bar"] + BracedConstructorField [31-69] [(path.to.extensio...lue: 1 }] + PathExpression [32-49] [path.to.extension] + Identifier(path) [32-36] [path] + Identifier(`to`) [37-39] [to] + Identifier(extension) [40-49] [extension] + BracedConstructorFieldValue [51-69] [{ value: 1 }] + BracedConstructor [51-69] [{ value: 1 }] + BracedConstructorField [57-65] [value: 1] + Identifier(value) [57-62] [value] + BracedConstructorFieldValue [62-65] [: 1] + IntLiteral(1) [64-65] [1] + BracedConstructorField [72-78] [baz: 1] + Identifier(baz) [72-75] [baz] + BracedConstructorFieldValue [75-78] [: 1] + IntLiteral(1) [77-78] [1] +-- +SELECT + NEW x { foo : "bar", (path.`to`.extension) { value : 1 } baz : 1 } +== + +# If an extension (not the first field) does not have a preceding comma it +# is an error. +# TODO: Try to make this error message more intuitive. +SELECT NEW x { + foo: "bar" + (path.to.extension) { + value: 1 + } + baz: 1 +} +-- +ERROR: Syntax error: Function call cannot be applied to this expression. Function calls require a path, e.g. a.b.c() [at 3:3] + (path.to.extension) { + ^ +== + +# If there is an expression, there is a different error. +# TODO: Try to make this error message more intuitive.
+SELECT NEW x { + foo: column + (path.to.extension) { + value: 1 + } + baz: 1 +} +-- +ERROR: Syntax error: Unexpected "{" [at 3:23] + (path.to.extension) { + ^ +== + +# This is a weird case where a user is trying to build a proto but forgot the +# value for the extension. This parses out to be valid. +SELECT NEW x { + foo: column + (path.to.extension) +} +-- +QueryStatement [0-52] [SELECT NEW...extension) }] + Query [0-52] [SELECT NEW...extension) }] + Select [0-52] [SELECT NEW...extension) }] + SelectList [7-52] [NEW x {...extension) }] + SelectColumn [7-52] [NEW x {...extension) }] + BracedNewConstructor [7-52] [NEW x {...extension) }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-52] [{ foo: column...extension) }] + BracedConstructorField [17-50] [foo: column...extension)] + Identifier(foo) [17-20] [foo] + BracedConstructorFieldValue [20-50] [: column (path.to.extension)] + FunctionCall [22-50] [column (path.to.extension)] + PathExpression [22-28] [column] + Identifier(column) [22-28] [column] + PathExpression [32-49] [path.to.extension] + Identifier(path) [32-36] [path] + Identifier(`to`) [37-39] [to] + Identifier(extension) [40-49] [extension] +-- +SELECT + NEW x { foo : column(path.`to`.extension) } +== + +# Mixed constructor where the inner sub-message's constructor is typed. 
+SELECT NEW x { + foo: NEW y { + monkey: "blah" + } + bar: 3 +} +-- +QueryStatement [0-63] [SELECT NEW...bar: 3 }] + Query [0-63] [SELECT NEW...bar: 3 }] + Select [0-63] [SELECT NEW...bar: 3 }] + SelectList [7-63] [NEW x {...bar: 3 }] + SelectColumn [7-63] [NEW x {...bar: 3 }] + BracedNewConstructor [7-63] [NEW x {...bar: 3 }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-63] [{ foo: NEW...bar: 3 }] + BracedConstructorField [17-52] [foo: NEW y..."blah" }] + Identifier(foo) [17-20] [foo] + BracedConstructorFieldValue [20-52] [: NEW y {..."blah" }] + BracedNewConstructor [22-52] [NEW y { monkey: "blah" }] + SimpleType [26-27] [y] + PathExpression [26-27] [y] + Identifier(y) [26-27] [y] + BracedConstructor [28-52] [{ monkey: "blah" }] + BracedConstructorField [34-48] [monkey: "blah"] + Identifier(monkey) [34-40] [monkey] + BracedConstructorFieldValue [40-48] [: "blah"] + StringLiteral [42-48] ["blah"] + StringLiteralComponent("blah") [42-48] ["blah"] + BracedConstructorField [55-61] [bar: 3] + Identifier(bar) [55-58] [bar] + BracedConstructorFieldValue [58-61] [: 3] + IntLiteral(3) [60-61] [3] +-- +SELECT + NEW x { foo : NEW y { monkey : "blah" } bar : 3 } +== + +# Simple expressions +SELECT NEW x { + foo: (3 + 5), + (bar.baz) { + monkey: "blah" + } +} +-- +QueryStatement [0-69] [SELECT NEW...blah" } }] + Query [0-69] [SELECT NEW...blah" } }] + Select [0-69] [SELECT NEW...blah" } }] + SelectList [7-69] [NEW x {...blah" } }] + SelectColumn [7-69] [NEW x {...blah" } }] + BracedNewConstructor [7-69] [NEW x {...blah" } }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-69] [{ foo: (...blah" } }] + BracedConstructorField [17-29] [foo: (3 + 5)] + Identifier(foo) [17-20] [foo] + BracedConstructorFieldValue [20-29] [: (3 + 5)] + BinaryExpression(+) [23-28] [3 + 5] + IntLiteral(3) [23-24] [3] + IntLiteral(5) [27-28] [5] + BracedConstructorField [33-67] 
[(bar.baz)..."blah" }] + PathExpression [34-41] [bar.baz] + Identifier(bar) [34-37] [bar] + Identifier(baz) [38-41] [baz] + BracedConstructorFieldValue [43-67] [{ monkey: "blah" }] + BracedConstructor [43-67] [{ monkey: "blah" }] + BracedConstructorField [49-63] [monkey: "blah"] + Identifier(monkey) [49-55] [monkey] + BracedConstructorFieldValue [55-63] [: "blah"] + StringLiteral [57-63] ["blah"] + StringLiteralComponent("blah") [57-63] ["blah"] +-- +SELECT + NEW x { foo : (3 + 5), (bar.baz) { monkey : "blah" } } +== + +SELECT NEW x { + foo: (SELECT t.* FROM t WHERE t.a = 1), + (bar.baz) { + monkey: "blah" + } +} +-- +QueryStatement [0-95] [SELECT NEW...blah" } }] + Query [0-95] [SELECT NEW...blah" } }] + Select [0-95] [SELECT NEW...blah" } }] + SelectList [7-95] [NEW x {...blah" } }] + SelectColumn [7-95] [NEW x {...blah" } }] + BracedNewConstructor [7-95] [NEW x {...blah" } }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-95] [{ foo: (...blah" } }] + BracedConstructorField [17-55] [foo: (SELECT...WHERE t.a = 1)] + Identifier(foo) [17-20] [foo] + BracedConstructorFieldValue [20-55] [: (SELECT...WHERE t.a = 1)] + ExpressionSubquery [22-55] [(SELECT t....WHERE t.a = 1)] + Query [23-54] [SELECT t.*...WHERE t.a = 1] + Select [23-54] [SELECT t.*...WHERE t.a = 1] + SelectList [30-33] [t.*] + SelectColumn [30-33] [t.*] + DotStar [30-33] [t.*] + PathExpression [30-31] [t] + Identifier(t) [30-31] [t] + FromClause [34-40] [FROM t] + TablePathExpression [39-40] [t] + PathExpression [39-40] [t] + Identifier(t) [39-40] [t] + WhereClause [41-54] [WHERE t.a = 1] + BinaryExpression(=) [47-54] [t.a = 1] + PathExpression [47-50] [t.a] + Identifier(t) [47-48] [t] + Identifier(a) [49-50] [a] + IntLiteral(1) [53-54] [1] + BracedConstructorField [59-93] [(bar.baz)..."blah" }] + PathExpression [60-67] [bar.baz] + Identifier(bar) [60-63] [bar] + Identifier(baz) [64-67] [baz] + BracedConstructorFieldValue [69-93] [{ monkey: 
"blah" }] + BracedConstructor [69-93] [{ monkey: "blah" }] + BracedConstructorField [75-89] [monkey: "blah"] + Identifier(monkey) [75-81] [monkey] + BracedConstructorFieldValue [81-89] [: "blah"] + StringLiteral [83-89] ["blah"] + StringLiteralComponent("blah") [83-89] ["blah"] +-- +SELECT + NEW x { foo : ( + SELECT + t.* + FROM + t + WHERE + t.a = 1 + ), (bar.baz) { monkey : "blah" } } +== + +SELECT NEW x { + foo: 3 + 5, + (bar.baz) { + monkey: "blah" + } +} +-- +QueryStatement [0-67] [SELECT NEW...blah" } }] + Query [0-67] [SELECT NEW...blah" } }] + Select [0-67] [SELECT NEW...blah" } }] + SelectList [7-67] [NEW x {...blah" } }] + SelectColumn [7-67] [NEW x {...blah" } }] + BracedNewConstructor [7-67] [NEW x {...blah" } }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-67] [{ foo: 3...blah" } }] + BracedConstructorField [17-27] [foo: 3 + 5] + Identifier(foo) [17-20] [foo] + BracedConstructorFieldValue [20-27] [: 3 + 5] + BinaryExpression(+) [22-27] [3 + 5] + IntLiteral(3) [22-23] [3] + IntLiteral(5) [26-27] [5] + BracedConstructorField [31-65] [(bar.baz)..."blah" }] + PathExpression [32-39] [bar.baz] + Identifier(bar) [32-35] [bar] + Identifier(baz) [36-39] [baz] + BracedConstructorFieldValue [41-65] [{ monkey: "blah" }] + BracedConstructor [41-65] [{ monkey: "blah" }] + BracedConstructorField [47-61] [monkey: "blah"] + Identifier(monkey) [47-53] [monkey] + BracedConstructorFieldValue [53-61] [: "blah"] + StringLiteral [55-61] ["blah"] + StringLiteralComponent("blah") [55-61] ["blah"] +-- +SELECT + NEW x { foo : 3 + 5, (bar.baz) { monkey : "blah" } } +== + +# Function expression. 
+SELECT NEW x { + foo: CONCAT("foo", "bar"), + (bar.baz) { + monkey: "blah" + } +} +-- +QueryStatement [0-82] [SELECT NEW...blah" } }] + Query [0-82] [SELECT NEW...blah" } }] + Select [0-82] [SELECT NEW...blah" } }] + SelectList [7-82] [NEW x {...blah" } }] + SelectColumn [7-82] [NEW x {...blah" } }] + BracedNewConstructor [7-82] [NEW x {...blah" } }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-82] [{ foo: CONCAT...blah" } }] + BracedConstructorField [17-42] [foo: CONCAT("foo", "bar")] + Identifier(foo) [17-20] [foo] + BracedConstructorFieldValue [20-42] [: CONCAT("foo", "bar")] + FunctionCall [22-42] [CONCAT("foo", "bar")] + PathExpression [22-28] [CONCAT] + Identifier(CONCAT) [22-28] [CONCAT] + StringLiteral [29-34] ["foo"] + StringLiteralComponent("foo") [29-34] ["foo"] + StringLiteral [36-41] ["bar"] + StringLiteralComponent("bar") [36-41] ["bar"] + BracedConstructorField [46-80] [(bar.baz)..."blah" }] + PathExpression [47-54] [bar.baz] + Identifier(bar) [47-50] [bar] + Identifier(baz) [51-54] [baz] + BracedConstructorFieldValue [56-80] [{ monkey: "blah" }] + BracedConstructor [56-80] [{ monkey: "blah" }] + BracedConstructorField [62-76] [monkey: "blah"] + Identifier(monkey) [62-68] [monkey] + BracedConstructorFieldValue [68-76] [: "blah"] + StringLiteral [70-76] ["blah"] + StringLiteralComponent("blah") [70-76] ["blah"] +-- +SELECT + NEW x { foo : CONCAT("foo", "bar"), (bar.baz) { monkey : "blah" } } +== + +# Aggregation expression. 
+SELECT NEW x { + foo: (SELECT count(*) FROM table_foo) +} +-- +QueryStatement [0-56] [SELECT NEW...table_foo) }] + Query [0-56] [SELECT NEW...table_foo) }] + Select [0-56] [SELECT NEW...table_foo) }] + SelectList [7-56] [NEW x {...table_foo) }] + SelectColumn [7-56] [NEW x {...table_foo) }] + BracedNewConstructor [7-56] [NEW x {...table_foo) }] + SimpleType [11-12] [x] + PathExpression [11-12] [x] + Identifier(x) [11-12] [x] + BracedConstructor [13-56] [{ foo: (...table_foo) }] + BracedConstructorField [17-54] [foo: (SELECT...table_foo)] + Identifier(foo) [17-20] [foo] + BracedConstructorFieldValue [20-54] [: (SELECT...table_foo)] + ExpressionSubquery [22-54] [(SELECT count...table_foo)] + Query [23-53] [SELECT count(*) FROM table_foo] + Select [23-53] [SELECT count(*) FROM table_foo] + SelectList [30-38] [count(*)] + SelectColumn [30-38] [count(*)] + FunctionCall [30-38] [count(*)] + PathExpression [30-35] [count] + Identifier(count) [30-35] [count] + Star(*) [36-37] [*] + FromClause [39-53] [FROM table_foo] + TablePathExpression [44-53] [table_foo] + PathExpression [44-53] [table_foo] + Identifier(table_foo) [44-53] [table_foo] +-- +SELECT + NEW x { foo : ( + SELECT + count(*) + FROM + table_foo + ) } +== + +# Untyped constructor. +UPDATE t SET x = { foo: "blah" bar: 3 } +-- +UpdateStatement [0-39] [UPDATE t SET..." 
bar: 3 }] + PathExpression [7-8] [t] + Identifier(t) [7-8] [t] + UpdateItemList [13-39] [x = { foo: "blah" bar: 3 }] + UpdateItem [13-39] [x = { foo: "blah" bar: 3 }] + UpdateSetValue [13-39] [x = { foo: "blah" bar: 3 }] + PathExpression [13-14] [x] + Identifier(x) [13-14] [x] + BracedConstructor [17-39] [{ foo: "blah" bar: 3 }] + BracedConstructorField [19-30] [foo: "blah"] + Identifier(foo) [19-22] [foo] + BracedConstructorFieldValue [22-30] [: "blah"] + StringLiteral [24-30] ["blah"] + StringLiteralComponent("blah") [24-30] ["blah"] + BracedConstructorField [31-37] [bar: 3] + Identifier(bar) [31-34] [bar] + BracedConstructorFieldValue [34-37] [: 3] + IntLiteral(3) [36-37] [3] +-- +UPDATE t +SET + x = { foo : "blah" bar : 3 } +== + +# Array constructor. +UPDATE t SET arr = ARRAY<ProtoType>[{ value: "bar" }, { value: "baz" }] +-- +UpdateStatement [0-71] [UPDATE t SET...: "baz" }]] + PathExpression [7-8] [t] + Identifier(t) [7-8] [t] + UpdateItemList [13-71] [arr = ARRAY...: "baz" }]] + UpdateItem [13-71] [arr = ARRAY...: "baz" }]] + UpdateSetValue [13-71] [arr = ARRAY...: "baz" }]] + PathExpression [13-16] [arr] + Identifier(arr) [13-16] [arr] + ArrayConstructor [19-71] [ARRAY] + SimpleType [25-34] [ProtoType] + PathExpression [25-34] [ProtoType] + Identifier(ProtoType) [25-34] [ProtoType] + BracedConstructor [36-52] [{ value: "bar" }] + BracedConstructorField [38-50] [value: "bar"] + Identifier(value) [38-43] [value] + BracedConstructorFieldValue [43-50] [: "bar"] + StringLiteral [45-50] ["bar"] + StringLiteralComponent("bar") [45-50] ["bar"] + BracedConstructor [54-70] [{ value: "baz" }] + BracedConstructorField [56-68] [value: "baz"] + Identifier(value) [56-61] [value] + BracedConstructorFieldValue [61-68] [: "baz"] + StringLiteral [63-68] ["baz"] + StringLiteralComponent("baz") [63-68] ["baz"] +-- +UPDATE t +SET + arr = ARRAY< ProtoType >[{ value : "bar" }, { value : "baz" }] +== + +# Untyped array constructor.
+UPDATE t SET arr = [{ value: "bar" }, { value: "baz" }] +-- +UpdateStatement [0-55] [UPDATE t SET...: "baz" }]] + PathExpression [7-8] [t] + Identifier(t) [7-8] [t] + UpdateItemList [13-55] [arr = [{ value...: "baz" }]] + UpdateItem [13-55] [arr = [{ value...: "baz" }]] + UpdateSetValue [13-55] [arr = [{ value...: "baz" }]] + PathExpression [13-16] [arr] + Identifier(arr) [13-16] [arr] + ArrayConstructor [19-55] [[{ value:...: "baz" }]] + BracedConstructor [20-36] [{ value: "bar" }] + BracedConstructorField [22-34] [value: "bar"] + Identifier(value) [22-27] [value] + BracedConstructorFieldValue [27-34] [: "bar"] + StringLiteral [29-34] ["bar"] + StringLiteralComponent("bar") [29-34] ["bar"] + BracedConstructor [38-54] [{ value: "baz" }] + BracedConstructorField [40-52] [value: "baz"] + Identifier(value) [40-45] [value] + BracedConstructorFieldValue [45-52] [: "baz"] + StringLiteral [47-52] ["baz"] + StringLiteralComponent("baz") [47-52] ["baz"] +-- +UPDATE t +SET + arr = ARRAY[{ value : "bar" }, { value : "baz" }] +== + +# Struct constructor. 
+UPDATE t SET str = STRUCT<ProtoType, INT64>({ value: "bar" }, 1) +-- +UpdateStatement [0-64] [UPDATE t SET...bar" }, 1)] + PathExpression [7-8] [t] + Identifier(t) [7-8] [t] + UpdateItemList [13-64] [str = STRUCT...bar" }, 1)] + UpdateItem [13-64] [str = STRUCT...bar" }, 1)] + UpdateSetValue [13-64] [str = STRUCT...bar" }, 1)] + PathExpression [13-16] [str] + Identifier(str) [13-16] [str] + StructConstructorWithKeyword [19-64] [STRUCT] + StructField [26-35] [ProtoType] + SimpleType [26-35] [ProtoType] + PathExpression [26-35] [ProtoType] + Identifier(ProtoType) [26-35] [ProtoType] + StructField [37-42] [INT64] + SimpleType [37-42] [INT64] + PathExpression [37-42] [INT64] + Identifier(INT64) [37-42] [INT64] + StructConstructorArg [44-60] [{ value: "bar" }] + BracedConstructor [44-60] [{ value: "bar" }] + BracedConstructorField [46-58] [value: "bar"] + Identifier(value) [46-51] [value] + BracedConstructorFieldValue [51-58] [: "bar"] + StringLiteral [53-58] ["bar"] + StringLiteralComponent("bar") [53-58] ["bar"] + StructConstructorArg [62-63] [1] + IntLiteral(1) [62-63] [1] +-- +UPDATE t +SET + str = STRUCT< ProtoType, INT64 > ({ value : "bar" }, 1) +== + +# Untyped struct constructor.
+UPDATE t SET str = ({ value: "bar" }, 1) +-- +UpdateStatement [0-40] [UPDATE t SET...bar" }, 1)] + PathExpression [7-8] [t] + Identifier(t) [7-8] [t] + UpdateItemList [13-40] [str = ({ value: "bar" }, 1)] + UpdateItem [13-40] [str = ({ value: "bar" }, 1)] + UpdateSetValue [13-40] [str = ({ value: "bar" }, 1)] + PathExpression [13-16] [str] + Identifier(str) [13-16] [str] + StructConstructorWithParens [19-40] [({ value: "bar" }, 1)] + BracedConstructor [20-36] [{ value: "bar" }] + BracedConstructorField [22-34] [value: "bar"] + Identifier(value) [22-27] [value] + BracedConstructorFieldValue [27-34] [: "bar"] + StringLiteral [29-34] ["bar"] + StringLiteralComponent("bar") [29-34] ["bar"] + IntLiteral(1) [38-39] [1] +-- +UPDATE t +SET + str = ({ value : "bar" }, 1) +== + +# Nested struct constructor (inner uses the keyword syntax since it is a +# single expression). +UPDATE t SET str = STRUCT<ProtoType, STRUCT<ProtoType>>({ value: "bar" }, STRUCT({value: "foo"})) +-- +UpdateStatement [0-97] [UPDATE t SET...: "foo"}))] + PathExpression [7-8] [t] + Identifier(t) [7-8] [t] + UpdateItemList [13-97] [str = STRUCT...: "foo"}))] + UpdateItem [13-97] [str = STRUCT...: "foo"}))] + UpdateSetValue [13-97] [str = STRUCT...: "foo"}))] + PathExpression [13-16] [str] + Identifier(str) [13-16] [str] + StructConstructorWithKeyword [19-97] [STRUCT>] + StructField [26-35] [ProtoType] + SimpleType [26-35] [ProtoType] + PathExpression [26-35] [ProtoType] + Identifier(ProtoType) [26-35] [ProtoType] + StructField [37-54] [STRUCT<ProtoType>] + StructType [37-54] [STRUCT<ProtoType>] + StructField [44-53] [ProtoType] + SimpleType [44-53] [ProtoType] + PathExpression [44-53] [ProtoType] + Identifier(ProtoType) [44-53] [ProtoType] + StructConstructorArg [56-72] [{ value: "bar" }] + BracedConstructor [56-72] [{ value: "bar" }] + BracedConstructorField [58-70] [value: "bar"] + Identifier(value) [58-63] [value] + BracedConstructorFieldValue [63-70] [: "bar"] + StringLiteral [65-70] ["bar"] + StringLiteralComponent("bar") [65-70] ["bar"] +
StructConstructorArg [74-96] [STRUCT({value: "foo"})] + StructConstructorWithKeyword [74-96] [STRUCT({value: "foo"})] + StructConstructorArg [81-95] [{value: "foo"}] + BracedConstructor [81-95] [{value: "foo"}] + BracedConstructorField [82-94] [value: "foo"] + Identifier(value) [82-87] [value] + BracedConstructorFieldValue [87-94] [: "foo"] + StringLiteral [89-94] ["foo"] + StringLiteralComponent("foo") [89-94] ["foo"] +-- +UPDATE t +SET + str = STRUCT< ProtoType, STRUCT< ProtoType > > ({ value : "bar" }, STRUCT({ value : "foo" })) + +== + +# Nested struct constructor (inner constructor is tuple syntax). +UPDATE t SET str = STRUCT<ProtoType, STRUCT<ProtoType, INT64>>({ value: "bar" }, ({value: "foo"}, 1)) +-- +UpdateStatement [0-101] [UPDATE t SET...foo"}, 1))] + PathExpression [7-8] [t] + Identifier(t) [7-8] [t] + UpdateItemList [13-101] [str = STRUCT...foo"}, 1))] + UpdateItem [13-101] [str = STRUCT...foo"}, 1))] + UpdateSetValue [13-101] [str = STRUCT...foo"}, 1))] + PathExpression [13-16] [str] + Identifier(str) [13-16] [str] + StructConstructorWithKeyword [19-101] [STRUCT>] + StructField [26-35] [ProtoType] + SimpleType [26-35] [ProtoType] + PathExpression [26-35] [ProtoType] + Identifier(ProtoType) [26-35] [ProtoType] + StructField [37-61] [STRUCT<ProtoType, INT64>] + StructType [37-61] [STRUCT<ProtoType, INT64>] + StructField [44-53] [ProtoType] + SimpleType [44-53] [ProtoType] + PathExpression [44-53] [ProtoType] + Identifier(ProtoType) [44-53] [ProtoType] + StructField [55-60] [INT64] + SimpleType [55-60] [INT64] + PathExpression [55-60] [INT64] + Identifier(INT64) [55-60] [INT64] + StructConstructorArg [63-79] [{ value: "bar" }] + BracedConstructor [63-79] [{ value: "bar" }] + BracedConstructorField [65-77] [value: "bar"] + Identifier(value) [65-70] [value] + BracedConstructorFieldValue [70-77] [: "bar"] + StringLiteral [72-77] ["bar"] + StringLiteralComponent("bar") [72-77] ["bar"] + StructConstructorArg [81-100] [({value: "foo"}, 1)] + StructConstructorWithParens [81-100] [({value: "foo"}, 1)] + BracedConstructor [82-96]
[{value: "foo"}] + BracedConstructorField [83-95] [value: "foo"] + Identifier(value) [83-88] [value] + BracedConstructorFieldValue [88-95] [: "foo"] + StringLiteral [90-95] ["foo"] + StringLiteralComponent("foo") [90-95] ["foo"] + IntLiteral(1) [98-99] [1] +-- +UPDATE t +SET + str = STRUCT< ProtoType, STRUCT< ProtoType, INT64 > > ({ value : "bar" }, ({ value : "foo" }, 1)) + +== + +# No trailing "}". +SELECT NEW ProtoType {foobar: 5 +-- +ERROR: Syntax error: Unexpected end of statement [at 1:32] +SELECT NEW ProtoType {foobar: 5 + ^ +== + +# ARRAY keyword not allowed. +SELECT NEW ARRAY{} +-- +ERROR: Syntax error: Unexpected keyword ARRAY [at 1:12] +SELECT NEW ARRAY{} + ^ +== + +# STRUCT keyword not allowed. +SELECT NEW struct {1, 2} +-- +ERROR: Syntax error: Unexpected keyword STRUCT [at 1:12] +SELECT NEW struct {1, 2} + ^ +== + +# No field name. +SELECT NEW ProtoType {1} +-- +ERROR: Syntax error: Unexpected integer literal "1" [at 1:23] +SELECT NEW ProtoType {1} + ^ +== + +# No field value. +SELECT NEW ProtoType {foobar} +-- +ERROR: Syntax error: Expected ":" or "{" but got "}" [at 1:29] +SELECT NEW ProtoType {foobar} + ^ +== + +# b/262795394 - Value starting with label token keyword 'FOR' parses correctly. 
+SELECT NEW package.Message { field: FORK() } +-- +QueryStatement [0-44] [SELECT NEW...: FORK() }] + Query [0-44] [SELECT NEW...: FORK() }] + Select [0-44] [SELECT NEW...: FORK() }] + SelectList [7-44] [NEW package...: FORK() }] + SelectColumn [7-44] [NEW package...: FORK() }] + BracedNewConstructor [7-44] [NEW package...: FORK() }] + SimpleType [11-26] [package.Message] + PathExpression [11-26] [package.Message] + Identifier(package) [11-18] [package] + Identifier(Message) [19-26] [Message] + BracedConstructor [27-44] [{ field: FORK() }] + BracedConstructorField [29-42] [field: FORK()] + Identifier(field) [29-34] [field] + BracedConstructorFieldValue [34-42] [: FORK()] + FunctionCall [36-42] [FORK()] + PathExpression [36-40] [FORK] + Identifier(FORK) [36-40] [FORK] +-- +SELECT + NEW package.Message { field : FORK() } diff --git a/zetasql/parser/testdata/raise.test b/zetasql/parser/testdata/raise.test index 9f1e3bece..b06a0fca0 100644 --- a/zetasql/parser/testdata/raise.test +++ b/zetasql/parser/testdata/raise.test @@ -6,13 +6,13 @@ END; -- ALTERNATION GROUP: raise -- -Script [0-44] [BEGIN EXCEPTION...raise; END;] - StatementList [0-44] [BEGIN EXCEPTION...raise; END;] +Script [0-43] [BEGIN EXCEPTION...raise; END;] + StatementList [0-43] [BEGIN EXCEPTION...raise; END;] BeginEndBlock [0-42] [BEGIN EXCEPTION...raise; END] StatementList [5-5] [] - ExceptionHandlerList [6-39] [EXCEPTION...THEN raise;] - ExceptionHandler [16-39] [WHEN ERROR THEN raise;] - StatementList [32-39] [raise;] + ExceptionHandlerList [6-38] [EXCEPTION...THEN raise;] + ExceptionHandler [16-38] [WHEN ERROR THEN raise;] + StatementList [32-38] [raise;] RaiseStatement [32-37] [raise] -- BEGIN @@ -23,13 +23,13 @@ END -- ALTERNATION GROUP: RAISE -- -Script [0-44] [BEGIN EXCEPTION...RAISE; END;] - StatementList [0-44] [BEGIN EXCEPTION...RAISE; END;] +Script [0-43] [BEGIN EXCEPTION...RAISE; END;] + StatementList [0-43] [BEGIN EXCEPTION...RAISE; END;] BeginEndBlock [0-42] [BEGIN EXCEPTION...RAISE; END] 
StatementList [5-5] [] - ExceptionHandlerList [6-39] [EXCEPTION...THEN RAISE;] - ExceptionHandler [16-39] [WHEN ERROR THEN RAISE;] - StatementList [32-39] [RAISE;] + ExceptionHandlerList [6-38] [EXCEPTION...THEN RAISE;] + ExceptionHandler [16-38] [WHEN ERROR THEN RAISE;] + StatementList [32-38] [RAISE;] RaiseStatement [32-37] [RAISE] -- BEGIN @@ -44,8 +44,8 @@ RAISE{{;|}} -- ALTERNATION GROUP: ; -- -Script [0-7] [RAISE;] - StatementList [0-7] [RAISE;] +Script [0-6] [RAISE;] + StatementList [0-6] [RAISE;] RaiseStatement [0-5] [RAISE] -- RAISE ; @@ -63,8 +63,8 @@ RAISE ; BEGIN EXCEPTION WHEN ERROR THEN END; RAISE; -- -Script [0-44] [BEGIN EXCEPTION...END; RAISE;] - StatementList [0-44] [BEGIN EXCEPTION...END; RAISE;] +Script [0-43] [BEGIN EXCEPTION...END; RAISE;] + StatementList [0-43] [BEGIN EXCEPTION...END; RAISE;] BeginEndBlock [0-35] [BEGIN EXCEPTION...ROR THEN END] StatementList [5-5] [] ExceptionHandlerList [6-31] [EXCEPTION WHEN ERROR THEN] @@ -88,18 +88,18 @@ BEGIN EXCEPTION WHEN ERROR THEN RAISE; END; -- -Script [0-98] [BEGIN EXCEPTION...RAISE; END;] - StatementList [0-98] [BEGIN EXCEPTION...RAISE; END;] +Script [0-97] [BEGIN EXCEPTION...RAISE; END;] + StatementList [0-97] [BEGIN EXCEPTION...RAISE; END;] BeginEndBlock [0-96] [BEGIN EXCEPTION...RAISE; END] StatementList [5-5] [] - ExceptionHandlerList [6-93] [EXCEPTION...; RAISE;] - ExceptionHandler [16-93] [WHEN ERROR...; RAISE;] - StatementList [34-93] [BEGIN EXCEPTION...; RAISE;] + ExceptionHandlerList [6-92] [EXCEPTION...; RAISE;] + ExceptionHandler [16-92] [WHEN ERROR...; RAISE;] + StatementList [34-92] [BEGIN EXCEPTION...; RAISE;] BeginEndBlock [34-82] [BEGIN EXCEPTION...RAISE; END] StatementList [39-39] [] - ExceptionHandlerList [40-79] [EXCEPTION...RAISE;] - ExceptionHandler [50-79] [WHEN ERROR THEN RAISE;] - StatementList [70-79] [RAISE;] + ExceptionHandlerList [40-76] [EXCEPTION...RAISE;] + ExceptionHandler [50-76] [WHEN ERROR THEN RAISE;] + StatementList [70-76] [RAISE;] RaiseStatement [70-75] 
[RAISE] RaiseStatement [86-91] [RAISE] -- @@ -120,10 +120,11 @@ END -- ALTERNATION GROUP: raise,using,message,; -- -Script [0-30] [raise using message = 'test';] - StatementList [0-30] [raise using message = 'test';] +Script [0-29] [raise using message = 'test';] + StatementList [0-29] [raise using message = 'test';] RaiseStatement [0-28] [raise using message = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; -- @@ -132,16 +133,18 @@ ALTERNATION GROUP: raise,using,message, Script [0-28] [raise using message = 'test'] StatementList [0-28] [raise using message = 'test'] RaiseStatement [0-28] [raise using message = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; -- ALTERNATION GROUP: raise,using,MESSAGE,; -- -Script [0-30] [raise using MESSAGE = 'test';] - StatementList [0-30] [raise using MESSAGE = 'test';] +Script [0-29] [raise using MESSAGE = 'test';] + StatementList [0-29] [raise using MESSAGE = 'test';] RaiseStatement [0-28] [raise using MESSAGE = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; -- @@ -150,16 +153,18 @@ ALTERNATION GROUP: raise,using,MESSAGE, Script [0-28] [raise using MESSAGE = 'test'] StatementList [0-28] [raise using MESSAGE = 'test'] RaiseStatement [0-28] [raise using MESSAGE = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; -- ALTERNATION GROUP: raise,USING,message,; -- -Script [0-30] [raise USING message = 'test';] - StatementList [0-30] [raise USING message = 'test';] +Script [0-29] [raise USING message = 'test';] + StatementList [0-29] [raise USING message = 'test';] 
RaiseStatement [0-28] [raise USING message = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; -- @@ -168,16 +173,18 @@ ALTERNATION GROUP: raise,USING,message, Script [0-28] [raise USING message = 'test'] StatementList [0-28] [raise USING message = 'test'] RaiseStatement [0-28] [raise USING message = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; -- ALTERNATION GROUP: raise,USING,MESSAGE,; -- -Script [0-30] [raise USING MESSAGE = 'test';] - StatementList [0-30] [raise USING MESSAGE = 'test';] +Script [0-29] [raise USING MESSAGE = 'test';] + StatementList [0-29] [raise USING MESSAGE = 'test';] RaiseStatement [0-28] [raise USING MESSAGE = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; -- @@ -186,16 +193,18 @@ ALTERNATION GROUP: raise,USING,MESSAGE, Script [0-28] [raise USING MESSAGE = 'test'] StatementList [0-28] [raise USING MESSAGE = 'test'] RaiseStatement [0-28] [raise USING MESSAGE = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; -- ALTERNATION GROUP: RAISE,using,message,; -- -Script [0-30] [RAISE using message = 'test';] - StatementList [0-30] [RAISE using message = 'test';] +Script [0-29] [RAISE using message = 'test';] + StatementList [0-29] [RAISE using message = 'test';] RaiseStatement [0-28] [RAISE using message = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; -- @@ -204,16 +213,18 @@ ALTERNATION GROUP: RAISE,using,message, Script [0-28] [RAISE using message = 'test'] 
StatementList [0-28] [RAISE using message = 'test'] RaiseStatement [0-28] [RAISE using message = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; -- ALTERNATION GROUP: RAISE,using,MESSAGE,; -- -Script [0-30] [RAISE using MESSAGE = 'test';] - StatementList [0-30] [RAISE using MESSAGE = 'test';] +Script [0-29] [RAISE using MESSAGE = 'test';] + StatementList [0-29] [RAISE using MESSAGE = 'test';] RaiseStatement [0-28] [RAISE using MESSAGE = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; -- @@ -222,16 +233,18 @@ ALTERNATION GROUP: RAISE,using,MESSAGE, Script [0-28] [RAISE using MESSAGE = 'test'] StatementList [0-28] [RAISE using MESSAGE = 'test'] RaiseStatement [0-28] [RAISE using MESSAGE = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; -- ALTERNATION GROUP: RAISE,USING,message,; -- -Script [0-30] [RAISE USING message = 'test';] - StatementList [0-30] [RAISE USING message = 'test';] +Script [0-29] [RAISE USING message = 'test';] + StatementList [0-29] [RAISE USING message = 'test';] RaiseStatement [0-28] [RAISE USING message = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; -- @@ -240,16 +253,18 @@ ALTERNATION GROUP: RAISE,USING,message, Script [0-28] [RAISE USING message = 'test'] StatementList [0-28] [RAISE USING message = 'test'] RaiseStatement [0-28] [RAISE USING message = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; -- ALTERNATION GROUP: RAISE,USING,MESSAGE,; -- -Script [0-30] 
[RAISE USING MESSAGE = 'test';] - StatementList [0-30] [RAISE USING MESSAGE = 'test';] +Script [0-29] [RAISE USING MESSAGE = 'test';] + StatementList [0-29] [RAISE USING MESSAGE = 'test';] RaiseStatement [0-28] [RAISE USING MESSAGE = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; -- @@ -258,7 +273,8 @@ ALTERNATION GROUP: RAISE,USING,MESSAGE, Script [0-28] [RAISE USING MESSAGE = 'test'] StatementList [0-28] [RAISE USING MESSAGE = 'test'] RaiseStatement [0-28] [RAISE USING MESSAGE = 'test'] - StringLiteral('test') [22-28] ['test'] + StringLiteral [22-28] ['test'] + StringLiteralComponent('test') [22-28] ['test'] -- RAISE USING MESSAGE = 'test' ; == @@ -266,15 +282,18 @@ RAISE USING MESSAGE = 'test' ; # raise with nonliteral message raise using message = CONCAT('test', '_', 'foo'); -- -Script [0-50] [raise using...', 'foo');] - StatementList [0-50] [raise using...', 'foo');] +Script [0-49] [raise using...', 'foo');] + StatementList [0-49] [raise using...', 'foo');] RaiseStatement [0-48] [raise using..._', 'foo')] FunctionCall [22-48] [CONCAT('test', '_', 'foo')] PathExpression [22-28] [CONCAT] Identifier(CONCAT) [22-28] [CONCAT] - StringLiteral('test') [29-35] ['test'] - StringLiteral('_') [37-40] ['_'] - StringLiteral('foo') [42-47] ['foo'] + StringLiteral [29-35] ['test'] + StringLiteralComponent('test') [29-35] ['test'] + StringLiteral [37-40] ['_'] + StringLiteralComponent('_') [37-40] ['_'] + StringLiteral [42-47] ['foo'] + StringLiteralComponent('foo') [42-47] ['foo'] -- RAISE USING MESSAGE = CONCAT('test', '_', 'foo') ; == @@ -284,8 +303,8 @@ raise using message = {{a|@a|@@a}}; -- ALTERNATION GROUP: a -- -Script [0-25] [raise using message = a;] - StatementList [0-25] [raise using message = a;] +Script [0-24] [raise using message = a;] + StatementList [0-24] [raise using message = a;] RaiseStatement [0-23] [raise using message = a] 
PathExpression [22-23] [a] Identifier(a) [22-23] [a] @@ -294,8 +313,8 @@ RAISE USING MESSAGE = a ; -- ALTERNATION GROUP: @a -- -Script [0-26] [raise using message = @a;] - StatementList [0-26] [raise using message = @a;] +Script [0-25] [raise using message = @a;] + StatementList [0-25] [raise using message = @a;] RaiseStatement [0-24] [raise using message = @a] ParameterExpr [22-24] [@a] Identifier(a) [23-24] [a] @@ -304,8 +323,8 @@ RAISE USING MESSAGE = @a ; -- ALTERNATION GROUP: @@a -- -Script [0-27] [raise using message = @@a;] - StatementList [0-27] [raise using message = @@a;] +Script [0-26] [raise using message = @@a;] + StatementList [0-26] [raise using message = @@a;] RaiseStatement [0-25] [raise using message = @@a] SystemVariableExpr [22-25] [@@a] PathExpression [24-25] [a] @@ -319,16 +338,17 @@ IF x < 0 THEN RAISE using message = 'x is negative'; END IF; -- -Script [0-63] [IF x < 0 THEN...'; END IF;] - StatementList [0-63] [IF x < 0 THEN...'; END IF;] +Script [0-62] [IF x < 0 THEN...'; END IF;] + StatementList [0-62] [IF x < 0 THEN...'; END IF;] IfStatement [0-61] [IF x < 0 THEN...ative'; END IF] BinaryExpression(<) [3-8] [x < 0] PathExpression [3-4] [x] Identifier(x) [3-4] [x] IntLiteral(0) [7-8] [0] - StatementList [16-55] [RAISE using...negative';] + StatementList [16-54] [RAISE using...negative';] RaiseStatement [16-53] [RAISE using...negative'] - StringLiteral('x is negative') [38-53] ['x is negative'] + StringLiteral [38-53] ['x is negative'] + StringLiteralComponent('x is negative') [38-53] ['x is negative'] -- IF x < 0 THEN RAISE USING MESSAGE = 'x is negative' ; @@ -338,8 +358,8 @@ END IF ; # raise keywords used as identifiers (except for USING, which is reserved). 
SELECT RAISE, MESSAGE FROM t; -- -Script [0-30] [SELECT RAISE, MESSAGE FROM t;] - StatementList [0-30] [SELECT RAISE, MESSAGE FROM t;] +Script [0-29] [SELECT RAISE, MESSAGE FROM t;] + StatementList [0-29] [SELECT RAISE, MESSAGE FROM t;] QueryStatement [0-28] [SELECT RAISE, MESSAGE FROM t] Query [0-28] [SELECT RAISE, MESSAGE FROM t] Select [0-28] [SELECT RAISE, MESSAGE FROM t] diff --git a/zetasql/parser/testdata/regression1.test b/zetasql/parser/testdata/regression1.test index 9bca31cf5..ec821e947 100644 --- a/zetasql/parser/testdata/regression1.test +++ b/zetasql/parser/testdata/regression1.test @@ -30,7 +30,8 @@ QueryStatement [0-334] [SELECT SAFE_CAST...>, gtype>)] PathExpression [33-38] [gtype] Identifier(gtype) [33-38] [gtype] CastExpression [40-66] [CAST("TESTENUM1" AS gtype)] - StringLiteral("TESTENUM1") [45-56] ["TESTENUM1"] + StringLiteral [45-56] ["TESTENUM1"] + StringLiteralComponent("TESTENUM1") [45-56] ["TESTENUM1"] SimpleType [60-65] [gtype] PathExpression [60-65] [gtype] Identifier(gtype) [60-65] [gtype] @@ -51,7 +52,8 @@ QueryStatement [0-334] [SELECT SAFE_CAST...>, gtype>)] PathExpression [121-126] [gtype] Identifier(gtype) [121-126] [gtype] CastExpression [128-145] [CAST("" AS gtype)] - StringLiteral("") [133-135] [""] + StringLiteral [133-135] [""] + StringLiteralComponent("") [133-135] [""] SimpleType [139-144] [gtype] PathExpression [139-144] [gtype] Identifier(gtype) [139-144] [gtype] @@ -90,13 +92,15 @@ QueryStatement [0-334] [SELECT SAFE_CAST...>, gtype>)] PathExpression [241-246] [gtype] Identifier(gtype) [241-246] [gtype] CastExpression [248-265] [CAST("" AS gtype)] - StringLiteral("") [253-255] [""] + StringLiteral [253-255] [""] + StringLiteralComponent("") [253-255] [""] SimpleType [259-264] [gtype] PathExpression [259-264] [gtype] Identifier(gtype) [259-264] [gtype] StructConstructorArg [268-294] [CAST("TESTENUM1" AS gtype)] CastExpression [268-294] [CAST("TESTENUM1" AS gtype)] - StringLiteral("TESTENUM1") [273-284] ["TESTENUM1"] + 
StringLiteral [273-284] ["TESTENUM1"] + StringLiteralComponent("TESTENUM1") [273-284] ["TESTENUM1"] SimpleType [288-293] [gtype] PathExpression [288-293] [gtype] Identifier(gtype) [288-293] [gtype] @@ -122,4 +126,3 @@ SELECT BOOL, ARRAY< gtype >, gtype >) == -== diff --git a/zetasql/parser/testdata/replace_fields.test b/zetasql/parser/testdata/replace_fields.test index d2139e19c..922f7bdec 100644 --- a/zetasql/parser/testdata/replace_fields.test +++ b/zetasql/parser/testdata/replace_fields.test @@ -142,7 +142,8 @@ QueryStatement [0-77] [select replace_fi...ing_field)] PathExpression [39-50] [int32_field] Identifier(int32_field) [39-50] [int32_field] ReplaceFieldsArg [52-76] ["string" as string_field] - StringLiteral("string") [52-60] ["string"] + StringLiteral [52-60] ["string"] + StringLiteralComponent("string") [52-60] ["string"] PathExpression [64-76] [string_field] Identifier(string_field) [64-76] [string_field] -- @@ -166,7 +167,8 @@ QueryStatement [0-86] [select replace_...xtension))] PathExpression [41-52] [int32_field] Identifier(int32_field) [41-52] [int32_field] ReplaceFieldsArg [54-85] ["string" as...extension)] - StringLiteral("string") [54-62] ["string"] + StringLiteral [54-62] ["string"] + StringLiteralComponent("string") [54-62] ["string"] PathExpression [67-84] [path.to.extension] Identifier(path) [67-71] [path] Identifier(`to`) [72-74] [to] @@ -194,7 +196,8 @@ QueryStatement [0-108] [select replace_..._field )] PathExpression [46-57] [int32_field] Identifier(int32_field) [46-57] [int32_field] ReplaceFieldsArg [59-83] ["string" as string_field] - StringLiteral("string") [59-67] ["string"] + StringLiteral [59-67] ["string"] + StringLiteralComponent("string") [59-67] ["string"] PathExpression [71-83] [string_field] Identifier(string_field) [71-83] [string_field] ReplaceFieldsArg [85-104] [2.5 as double_field] diff --git a/zetasql/parser/testdata/rollup.test b/zetasql/parser/testdata/rollup.test index 4db47a197..e7f710526 100644 --- 
a/zetasql/parser/testdata/rollup.test +++ b/zetasql/parser/testdata/rollup.test @@ -64,8 +64,10 @@ QueryStatement [0-92] [SELECT x,...ROLLUP(x, y)] PathExpression [42-43] [y] Identifier(y) [42-43] [y] IntLiteral(0) [47-48] [0] - StringLiteral("foo") [50-55] ["foo"] - StringLiteral("bar") [57-62] ["bar"] + StringLiteral [50-55] ["foo"] + StringLiteralComponent("foo") [50-55] ["foo"] + StringLiteral [57-62] ["bar"] + StringLiteralComponent("bar") [57-62] ["bar"] FromClause [64-70] [FROM T] TablePathExpression [69-70] [T] PathExpression [69-70] [T] @@ -171,7 +173,8 @@ QueryStatement [0-102] [select x,...ROLLUP(y, x )] BinaryExpression(=) [68-77] [y = 'foo'] PathExpression [68-69] [y] Identifier(y) [68-69] [y] - StringLiteral('foo') [72-77] ['foo'] + StringLiteral [72-77] ['foo'] + StringLiteralComponent('foo') [72-77] ['foo'] PathExpression [79-80] [z] Identifier(z) [79-80] [z] GroupingItem [83-84] [z] diff --git a/zetasql/parser/testdata/script.test b/zetasql/parser/testdata/script.test index fc21eb312..960c894f1 100644 --- a/zetasql/parser/testdata/script.test +++ b/zetasql/parser/testdata/script.test @@ -5,8 +5,8 @@ SELECT 3{{;|}} -- ALTERNATION GROUP: ; -- -Script [0-10] [SELECT 3;] - StatementList [0-10] [SELECT 3;] +Script [0-9] [SELECT 3;] + StatementList [0-9] [SELECT 3;] QueryStatement [0-8] [SELECT 3] Query [0-8] [SELECT 3] Select [0-8] [SELECT 3] @@ -48,8 +48,8 @@ SELECT 3; SELECT 4{{;|}} -- ALTERNATION GROUP: ; -- -Script [0-20] [SELECT 3; SELECT 4;] - StatementList [0-20] [SELECT 3; SELECT 4;] +Script [0-19] [SELECT 3; SELECT 4;] + StatementList [0-19] [SELECT 3; SELECT 4;] QueryStatement [0-8] [SELECT 3] Query [0-8] [SELECT 3] Select [0-8] [SELECT 3] @@ -115,7 +115,7 @@ Script [0-55] [IF true then...SELECT 11] StatementList [0-55] [IF true then...SELECT 11] IfStatement [0-44] [IF true then...10; end if] BooleanLiteral(true) [3-7] [true] - StatementList [15-38] [select 9; select 10;] + StatementList [15-37] [select 9; select 10;] QueryStatement [15-23] 
[select 9] Query [15-23] [select 9] Select [15-23] [select 9] @@ -159,8 +159,8 @@ ERROR: Syntax error: Unexpected integer literal "7" [at 1:1] SELECT 1; EXECUTE IMMEDIATE x; -- -Script [0-31] [SELECT 1; EXECUTE IMMEDIATE x;] - StatementList [0-31] [SELECT 1; EXECUTE IMMEDIATE x;] +Script [0-30] [SELECT 1; EXECUTE IMMEDIATE x;] + StatementList [0-30] [SELECT 1; EXECUTE IMMEDIATE x;] QueryStatement [0-8] [SELECT 1] Query [0-8] [SELECT 1] Select [0-8] [SELECT 1] @@ -182,8 +182,8 @@ INSERT UPDATE UPDATE (a) VALUES (1); INSERT UPDATE VALUE (a) VALUES (1); SELECT 1; -- -Script [0-93] [SELECT 1;...SELECT 1;] - StatementList [0-93] [SELECT 1;...SELECT 1;] +Script [0-92] [SELECT 1;...SELECT 1;] + StatementList [0-92] [SELECT 1;...SELECT 1;] QueryStatement [0-8] [SELECT 1] Query [0-8] [SELECT 1] Select [0-8] [SELECT 1] @@ -236,10 +236,10 @@ INSERT OR REPLACE INTO table_collated_types(struct_with_string_ci_val, string_ci VALUES (1, 2), (5, 6); END; -- -Script [0-255] [BEGIN INSERT..., 6); END;] - StatementList [0-255] [BEGIN INSERT..., 6); END;] +Script [0-254] [BEGIN INSERT..., 6); END;] + StatementList [0-254] [BEGIN INSERT..., 6); END;] BeginEndBlock [0-253] [BEGIN INSERT...5, 6); END] - StatementList [6-115] [INSERT OR...), (3, 4);] + StatementList [6-114] [INSERT OR...), (3, 4);] InsertStatement(insert_mode=REPLACE) [6-113] [INSERT OR...2), (3, 4)] PathExpression [29-49] [table_collated_types] Identifier(table_collated_types) [29-49] [table_collated_types] @@ -253,9 +253,9 @@ Script [0-255] [BEGIN INSERT..., 6); END;] InsertValuesRow [107-113] [(3, 4)] IntLiteral(3) [108-109] [3] IntLiteral(4) [111-112] [4] - ExceptionHandlerList [115-250] [EXCEPTION...), (5, 6);] - ExceptionHandler [125-250] [WHEN ERROR...), (5, 6);] - StatementList [141-250] [INSERT OR...), (5, 6);] + ExceptionHandlerList [115-249] [EXCEPTION...), (5, 6);] + ExceptionHandler [125-249] [WHEN ERROR...), (5, 6);] + StatementList [141-249] [INSERT OR...), (5, 6);] InsertStatement(insert_mode=REPLACE) 
[141-248] [INSERT OR...2), (5, 6)] PathExpression [164-184] [table_collated_types] Identifier(table_collated_types) [164-184] [table_collated_types] @@ -290,10 +290,10 @@ INSERT OR REPLACE INTO table_collated_types(struct_with_string_ci_val, string_ci VALUES (1, 2), (3, 4); END LOOP; -- -Script [0-124] [LOOP INSERT...END LOOP;] - StatementList [0-124] [LOOP INSERT...END LOOP;] +Script [0-123] [LOOP INSERT...END LOOP;] + StatementList [0-123] [LOOP INSERT...END LOOP;] WhileStatement [0-122] [LOOP INSERT...; END LOOP] - StatementList [5-114] [INSERT OR...), (3, 4);] + StatementList [5-113] [INSERT OR...), (3, 4);] InsertStatement(insert_mode=REPLACE) [5-112] [INSERT OR...2), (3, 4)] PathExpression [28-48] [table_collated_types] Identifier(table_collated_types) [28-48] [table_collated_types] @@ -322,10 +322,10 @@ VALUES (1, 2), (3, 4); UNTIL FALSE END REPEAT; -- -Script [0-140] [REPEAT INSERT...END REPEAT;] - StatementList [0-140] [REPEAT INSERT...END REPEAT;] +Script [0-139] [REPEAT INSERT...END REPEAT;] + StatementList [0-139] [REPEAT INSERT...END REPEAT;] RepeatStatement [0-138] [REPEAT INSERT...END REPEAT] - StatementList [7-116] [INSERT OR...), (3, 4);] + StatementList [7-115] [INSERT OR...), (3, 4);] InsertStatement(insert_mode=REPLACE) [7-114] [INSERT OR...2), (3, 4)] PathExpression [30-50] [table_collated_types] Identifier(table_collated_types) [30-50] [table_collated_types] @@ -356,11 +356,11 @@ INSERT OR REPLACE INTO table_collated_types(struct_with_string_ci_val, string_ci VALUES (1, 2), (3, 4); END WHILE; -- -Script [0-134] [WHILE TRUE...END WHILE;] - StatementList [0-134] [WHILE TRUE...END WHILE;] +Script [0-133] [WHILE TRUE...END WHILE;] + StatementList [0-133] [WHILE TRUE...END WHILE;] WhileStatement [0-132] [WHILE TRUE...END WHILE] BooleanLiteral(TRUE) [6-10] [TRUE] - StatementList [14-123] [INSERT OR...), (3, 4);] + StatementList [14-122] [INSERT OR...), (3, 4);] InsertStatement(insert_mode=REPLACE) [14-121] [INSERT OR...2), (3, 4)] PathExpression 
[37-57] [table_collated_types] Identifier(table_collated_types) [37-57] [table_collated_types] @@ -389,8 +389,8 @@ INSERT OR REPLACE INTO table_collated_types(struct_with_string_ci_val, string_ci VALUES (1, 2), (3, 4); END FOR; -- -Script [0-148] [FOR t IN (...; END FOR;] - StatementList [0-148] [FOR t IN (...; END FOR;] +Script [0-147] [FOR t IN (...; END FOR;] + StatementList [0-147] [FOR t IN (...; END FOR;] ForInStatement [0-146] [FOR t IN (...); END FOR] Identifier(t) [4-5] [t] Query [10-25] [SELECT * FROM T] @@ -402,7 +402,7 @@ Script [0-148] [FOR t IN (...; END FOR;] TablePathExpression [24-25] [T] PathExpression [24-25] [T] Identifier(T) [24-25] [T] - StatementList [30-139] [INSERT OR...), (3, 4);] + StatementList [30-138] [INSERT OR...), (3, 4);] InsertStatement(insert_mode=REPLACE) [30-137] [INSERT OR...2), (3, 4)] PathExpression [53-73] [table_collated_types] Identifier(table_collated_types) [53-73] [table_collated_types] diff --git a/zetasql/parser/testdata/select_as_distinct_all.test b/zetasql/parser/testdata/select_as_distinct_all.test index 15e9cbb70..b8eb9433e 100644 --- a/zetasql/parser/testdata/select_as_distinct_all.test +++ b/zetasql/parser/testdata/select_as_distinct_all.test @@ -104,7 +104,8 @@ QueryStatement [0-29] [select as struct 'abc' from T] SelectAs(as_mode=STRUCT) [7-16] [as struct] SelectList [17-22] ['abc'] SelectColumn [17-22] ['abc'] - StringLiteral('abc') [17-22] ['abc'] + StringLiteral [17-22] ['abc'] + StringLiteralComponent('abc') [17-22] ['abc'] FromClause [23-29] [from T] TablePathExpression [28-29] [T] PathExpression [28-29] [T] @@ -324,7 +325,7 @@ select as struct select struct -- -ERROR: Syntax error: Expected "(" or "<" but got end of statement [at 1:14] +ERROR: Syntax error: Expected "(" or "<" or "{" but got end of statement [at 1:14] select struct ^ == diff --git a/zetasql/parser/testdata/set.test b/zetasql/parser/testdata/set.test index aa0183aba..9aba68779 100644 --- a/zetasql/parser/testdata/set.test +++ 
b/zetasql/parser/testdata/set.test @@ -111,7 +111,8 @@ AssignmentFromStruct [0-26] [SET (a,b) = (1 + 3, 'foo')] BinaryExpression(+) [13-18] [1 + 3] IntLiteral(1) [13-14] [1] IntLiteral(3) [17-18] [3] - StringLiteral('foo') [20-25] ['foo'] + StringLiteral [20-25] ['foo'] + StringLiteralComponent('foo') [20-25] ['foo'] -- SET(a, b) = (1 + 3, 'foo') == @@ -251,7 +252,8 @@ ALTERNATION GROUP: @ ParameterAssignment [0-22] [SET @value="something"] ParameterExpr [4-10] [@value] Identifier(value) [5-10] [value] - StringLiteral("something") [11-22] ["something"] + StringLiteral [11-22] ["something"] + StringLiteralComponent("something") [11-22] ["something"] -- SET @value = "something" -- @@ -261,7 +263,8 @@ SystemVariableAssignment [0-23] [SET @@value="something"] SystemVariableExpr [4-11] [@@value] PathExpression [6-11] [value] Identifier(value) [6-11] [value] - StringLiteral("something") [12-23] ["something"] + StringLiteral [12-23] ["something"] + StringLiteralComponent("something") [12-23] ["something"] -- SET @@value = "something" == @@ -274,7 +277,8 @@ ALTERNATION GROUP: @ ParameterAssignment [0-20] [SET @AND="something"] ParameterExpr [4-8] [@AND] Identifier(`AND`) [5-8] [AND] - StringLiteral("something") [9-20] ["something"] + StringLiteral [9-20] ["something"] + StringLiteralComponent("something") [9-20] ["something"] -- SET @`AND` = "something" -- @@ -284,7 +288,8 @@ SystemVariableAssignment [0-21] [SET @@AND="something"] SystemVariableExpr [4-9] [@@AND] PathExpression [6-9] [AND] Identifier(`AND`) [6-9] [AND] - StringLiteral("something") [10-21] ["something"] + StringLiteral [10-21] ["something"] + StringLiteralComponent("something") [10-21] ["something"] -- SET @@`AND` = "something" == @@ -297,7 +302,8 @@ ALTERNATION GROUP: @ ParameterAssignment [0-22] [SET @ABORT="something"] ParameterExpr [4-10] [@ABORT] Identifier(ABORT) [5-10] [ABORT] - StringLiteral("something") [11-22] ["something"] + StringLiteral [11-22] ["something"] + 
StringLiteralComponent("something") [11-22] ["something"] -- SET @ABORT = "something" -- @@ -307,7 +313,8 @@ SystemVariableAssignment [0-23] [SET @@ABORT="something"] SystemVariableExpr [4-11] [@@ABORT] PathExpression [6-11] [ABORT] Identifier(ABORT) [6-11] [ABORT] - StringLiteral("something") [12-23] ["something"] + StringLiteral [12-23] ["something"] + StringLiteralComponent("something") [12-23] ["something"] -- SET @@ABORT = "something" == diff --git a/zetasql/parser/testdata/show.test b/zetasql/parser/testdata/show.test index f078d7ad7..58923d8e5 100644 --- a/zetasql/parser/testdata/show.test +++ b/zetasql/parser/testdata/show.test @@ -66,7 +66,8 @@ show TABLES like 'KitchenSync%'; -- ShowStatement [0-31] [show TABLES...KitchenSync%'] Identifier(TABLES) [5-11] [TABLES] - StringLiteral('KitchenSync%') [17-31] ['KitchenSync%'] + StringLiteral [17-31] ['KitchenSync%'] + StringLiteralComponent('KitchenSync%') [17-31] ['KitchenSync%'] -- SHOW TABLES LIKE 'KitchenSync%' == @@ -104,7 +105,8 @@ show VARIABLES like 'server_name'; -- ShowStatement [0-33] [show VARIABLES...server_name'] Identifier(VARIABLES) [5-14] [VARIABLES] - StringLiteral('server_name') [20-33] ['server_name'] + StringLiteral [20-33] ['server_name'] + StringLiteralComponent('server_name') [20-33] ['server_name'] -- SHOW VARIABLES LIKE 'server_name' == @@ -130,7 +132,8 @@ ShowStatement [0-52] [show TABLES...KitchenSync%'] PathExpression [17-32] [catalog.adwords] Identifier(catalog) [17-24] [catalog] Identifier(adwords) [25-32] [adwords] - StringLiteral('KitchenSync%') [38-52] ['KitchenSync%'] + StringLiteral [38-52] ['KitchenSync%'] + StringLiteralComponent('KitchenSync%') [38-52] ['KitchenSync%'] -- SHOW TABLES FROM catalog.adwords LIKE 'KitchenSync%' == @@ -155,7 +158,8 @@ SHOW MATERIALIZED VIEWS like 'KitchenSync%'; -- ShowStatement [0-43] [SHOW MATERIALIZED...chenSync%'] Identifier(`MATERIALIZED VIEWS`) [5-23] [MATERIALIZED VIEWS] - StringLiteral('KitchenSync%') [29-43] ['KitchenSync%'] + 
StringLiteral [29-43] ['KitchenSync%'] + StringLiteralComponent('KitchenSync%') [29-43] ['KitchenSync%'] -- SHOW `MATERIALIZED VIEWS` LIKE 'KitchenSync%' == @@ -176,7 +180,8 @@ ShowStatement [0-53] [SHOW MATERIALIZED...ke '%foo%'] Identifier(`MATERIALIZED VIEWS`) [5-23] [MATERIALIZED VIEWS] PathExpression [29-40] [KitchenSync] Identifier(KitchenSync) [29-40] [KitchenSync] - StringLiteral('%foo%') [46-53] ['%foo%'] + StringLiteral [46-53] ['%foo%'] + StringLiteralComponent('%foo%') [46-53] ['%foo%'] -- SHOW `MATERIALIZED VIEWS` FROM KitchenSync LIKE '%foo%' == diff --git a/zetasql/parser/testdata/standalone_expression.test b/zetasql/parser/testdata/standalone_expression.test index 33250cce4..a9e314ae2 100644 --- a/zetasql/parser/testdata/standalone_expression.test +++ b/zetasql/parser/testdata/standalone_expression.test @@ -9,7 +9,8 @@ IntLiteral(1) [0-1] [1] 'abc' -- -StringLiteral('abc') [0-5] ['abc'] +StringLiteral [0-5] ['abc'] + StringLiteralComponent('abc') [0-5] ['abc'] -- 'abc' == @@ -147,9 +148,9 @@ a.b.c.d # Dot star is not allowed in a standalone expression. abc.* -- -ERROR: Syntax error: Expected end of input but got "." [at 1:4] +ERROR: Syntax error: Unexpected "*" [at 1:5] abc.* - ^ + ^ == # Parse fails on extra stuff after the expression. 
diff --git a/zetasql/parser/testdata/standalone_type.test b/zetasql/parser/testdata/standalone_type.test index be6db2680..b994e497d 100644 --- a/zetasql/parser/testdata/standalone_type.test +++ b/zetasql/parser/testdata/standalone_type.test @@ -287,7 +287,8 @@ SimpleType [0-14] [numeric("MAX")] PathExpression [0-7] [numeric] Identifier(numeric) [0-7] [numeric] TypeParameterList [7-13] [("MAX"] - StringLiteral("MAX") [8-13] ["MAX"] + StringLiteral [8-13] ["MAX"] + StringLiteralComponent("MAX") [8-13] ["MAX"] -- numeric("MAX") == @@ -299,7 +300,8 @@ SimpleType [0-14] [string(B"abc")] PathExpression [0-6] [string] Identifier(string) [0-6] [string] TypeParameterList [6-13] [(B"abc"] - BytesLiteral(B"abc") [7-13] [B"abc"] + BytesLiteral [7-13] [B"abc"] + BytesLiteralComponent(B"abc") [7-13] [B"abc"] -- string(B"abc") == @@ -325,7 +327,8 @@ SimpleType [0-19] [money(20, 6, 'USD')] TypeParameterList [5-18] [(20, 6, 'USD'] IntLiteral(20) [6-8] [20] IntLiteral(6) [10-11] [6] - StringLiteral('USD') [13-18] ['USD'] + StringLiteral [13-18] ['USD'] + StringLiteralComponent('USD') [13-18] ['USD'] -- money(20, 6, 'USD') == @@ -362,7 +365,8 @@ SimpleType [0-20] [string collate 'ABC'] PathExpression [0-6] [string] Identifier(string) [0-6] [string] Collate [7-20] [collate 'ABC'] - StringLiteral('ABC') [15-20] ['ABC'] + StringLiteral [15-20] ['ABC'] + StringLiteralComponent('ABC') [15-20] ['ABC'] -- string COLLATE 'ABC' == @@ -375,7 +379,8 @@ SimpleType [0-25] [string(100) collate 'ABC'] TypeParameterList [6-10] [(100] IntLiteral(100) [7-10] [100] Collate [12-25] [collate 'ABC'] - StringLiteral('ABC') [20-25] ['ABC'] + StringLiteral [20-25] ['ABC'] + StringLiteralComponent('ABC') [20-25] ['ABC'] -- string(100) COLLATE 'ABC' == @@ -389,7 +394,8 @@ StructType [0-30] [struct] PathExpression [9-15] [string] Identifier(string) [9-15] [string] Collate [16-29] [collate 'ABC'] - StringLiteral('ABC') [24-29] ['ABC'] + StringLiteral [24-29] ['ABC'] + StringLiteralComponent('ABC') [24-29] 
['ABC'] -- STRUCT< a string COLLATE 'ABC' > == @@ -403,7 +409,8 @@ StructType [0-30] [struct collate 'ABC'] PathExpression [9-15] [string] Identifier(string) [9-15] [string] Collate [17-30] [collate 'ABC'] - StringLiteral('ABC') [25-30] ['ABC'] + StringLiteral [25-30] ['ABC'] + StringLiteralComponent('ABC') [25-30] ['ABC'] -- STRUCT< a string > COLLATE 'ABC' == @@ -419,7 +426,8 @@ StructType [0-35] [struct] TypeParameterList [15-19] [(100] IntLiteral(100) [16-19] [100] Collate [21-34] [collate 'ABC'] - StringLiteral('ABC') [29-34] ['ABC'] + StringLiteral [29-34] ['ABC'] + StringLiteralComponent('ABC') [29-34] ['ABC'] -- STRUCT< a string(100) COLLATE 'ABC' > == @@ -431,7 +439,8 @@ ArrayType [0-27] [array] PathExpression [6-12] [string] Identifier(string) [6-12] [string] Collate [13-26] [collate 'ABC'] - StringLiteral('ABC') [21-26] ['ABC'] + StringLiteral [21-26] ['ABC'] + StringLiteralComponent('ABC') [21-26] ['ABC'] -- ARRAY< string COLLATE 'ABC' > == @@ -443,7 +452,8 @@ ArrayType [0-27] [array collate 'ABC'] PathExpression [6-12] [string] Identifier(string) [6-12] [string] Collate [14-27] [collate 'ABC'] - StringLiteral('ABC') [22-27] ['ABC'] + StringLiteral [22-27] ['ABC'] + StringLiteralComponent('ABC') [22-27] ['ABC'] -- ARRAY< string > COLLATE 'ABC' == @@ -457,7 +467,8 @@ ArrayType [0-32] [array] TypeParameterList [12-16] [(100] IntLiteral(100) [13-16] [100] Collate [18-31] [collate 'ABC'] - StringLiteral('ABC') [26-31] ['ABC'] + StringLiteral [26-31] ['ABC'] + StringLiteralComponent('ABC') [26-31] ['ABC'] -- ARRAY< string(100) COLLATE 'ABC' > == @@ -471,7 +482,8 @@ ArrayType [0-37] [array(200)] TypeParameterList [12-16] [(100] IntLiteral(100) [13-16] [100] Collate [18-31] [collate 'ABC'] - StringLiteral('ABC') [26-31] ['ABC'] + StringLiteral [26-31] ['ABC'] + StringLiteralComponent('ABC') [26-31] ['ABC'] TypeParameterList [32-36] [(200] IntLiteral(200) [33-36] [200] -- @@ -491,7 +503,8 @@ StructType [0-78] [struct>] TypeParameterList [21-25] [(100] 
IntLiteral(100) [22-25] [100] Collate [27-40] [collate 'ABC'] - StringLiteral('ABC') [35-40] ['ABC'] + StringLiteral [35-40] ['ABC'] + StringLiteralComponent('ABC') [35-40] ['ABC'] StructField [43-77] [b struct] Identifier(b) [43-44] [b] StructType [45-77] [struct] @@ -502,7 +515,8 @@ StructType [0-78] [struct>] TypeParameterList [58-61] [(50] IntLiteral(50) [59-61] [50] Collate [63-76] [collate 'CDE'] - StringLiteral('CDE') [71-76] ['CDE'] + StringLiteral [71-76] ['CDE'] + StringLiteralComponent('CDE') [71-76] ['CDE'] -- STRUCT< a ARRAY< string(100) COLLATE 'ABC' >, b STRUCT< string(50) COLLATE 'CDE' > > == @@ -529,7 +543,8 @@ StructType [0-69] [struct] PathExpression [9-15] [string] Identifier(string) [9-15] [string] Collate [16-29] [collate 'ABC'] - StringLiteral('ABC') [24-29] ['ABC'] + StringLiteral [24-29] ['ABC'] + StringLiteralComponent('ABC') [24-29] ['ABC'] StructField [31-68] [b string(100...collation_name] Identifier(b) [31-32] [b] SimpleType [33-68] [string(100...collation_name] @@ -720,7 +735,8 @@ FunctionType [0-38] [FUNCTIONSTRING>] PathExpression [9-15] [STRING] Identifier(STRING) [9-15] [STRING] Collate [16-29] [collate 'abc'] - StringLiteral('abc') [24-29] ['abc'] + StringLiteral [24-29] ['abc'] + StringLiteralComponent('abc') [24-29] ['abc'] SimpleType [31-37] [STRING] PathExpression [31-37] [STRING] Identifier(STRING) [31-37] [STRING] @@ -740,14 +756,16 @@ FunctionType [0-58] [FUNCTION] TypeParameterList [15-17] [(8] IntLiteral(8) [16-17] [8] Collate [19-32] [collate 'abc'] - StringLiteral('abc') [27-32] ['abc'] + StringLiteral [27-32] ['abc'] + StringLiteralComponent('abc') [27-32] ['abc'] SimpleType [34-57] [STRING(8) collate 'abc'] PathExpression [34-40] [STRING] Identifier(STRING) [34-40] [STRING] TypeParameterList [40-42] [(8] IntLiteral(8) [41-42] [8] Collate [44-57] [collate 'abc'] - StringLiteral('abc') [52-57] ['abc'] + StringLiteral [52-57] ['abc'] + StringLiteralComponent('abc') [52-57] ['abc'] -- FUNCTION<(STRING(8) COLLATE 'abc') 
-> STRING(8) COLLATE 'abc' > == @@ -764,14 +782,16 @@ FunctionType [0-60] [FUNCTION<(...collate 'abc'>] TypeParameterList [16-18] [(8] IntLiteral(8) [17-18] [8] Collate [20-33] [collate 'abc'] - StringLiteral('abc') [28-33] ['abc'] + StringLiteral [28-33] ['abc'] + StringLiteralComponent('abc') [28-33] ['abc'] SimpleType [36-59] [STRING(8) collate 'abc'] PathExpression [36-42] [STRING] Identifier(STRING) [36-42] [STRING] TypeParameterList [42-44] [(8] IntLiteral(8) [43-44] [8] Collate [46-59] [collate 'abc'] - StringLiteral('abc') [54-59] ['abc'] + StringLiteral [54-59] ['abc'] + StringLiteralComponent('abc') [54-59] ['abc'] -- FUNCTION<(STRING(8) COLLATE 'abc') -> STRING(8) COLLATE 'abc' > == @@ -785,7 +805,8 @@ FunctionType [0-38] [FUNCTIONSTRING>] PathExpression [9-15] [STRING] Identifier(STRING) [9-15] [STRING] Collate [16-29] [collate 'abc'] - StringLiteral('abc') [24-29] ['abc'] + StringLiteral [24-29] ['abc'] + StringLiteralComponent('abc') [24-29] ['abc'] SimpleType [31-37] [STRING] PathExpression [31-37] [STRING] Identifier(STRING) [31-37] [STRING] @@ -802,7 +823,8 @@ FunctionType [0-40] [FUNCTION<(...)->STRING>] PathExpression [10-16] [STRING] Identifier(STRING) [10-16] [STRING] Collate [17-30] [collate 'abc'] - StringLiteral('abc') [25-30] ['abc'] + StringLiteral [25-30] ['abc'] + StringLiteralComponent('abc') [25-30] ['abc'] SimpleType [33-39] [STRING] PathExpression [33-39] [STRING] Identifier(STRING) [33-39] [STRING] @@ -834,7 +856,8 @@ FunctionType [0-31] [FUNCTION<(...collate 'c'] PathExpression [13-18] [INT64] Identifier(INT64) [13-18] [INT64] Collate [20-31] [collate 'c'] - StringLiteral('c') [28-31] ['c'] + StringLiteral [28-31] ['c'] + StringLiteralComponent('c') [28-31] ['c'] -- FUNCTION<() -> INT64 > COLLATE 'c' -- @@ -848,7 +871,8 @@ FunctionType [0-34] [FUNCTION<(...collate 'c'] TypeParameterList [19-21] [(0] IntLiteral(0) [20-21] [0] Collate [23-34] [collate 'c'] - StringLiteral('c') [31-34] ['c'] + StringLiteral [31-34] ['c'] + 
StringLiteralComponent('c') [31-34] ['c'] -- FUNCTION<() -> INT64 > (0) COLLATE 'c' == diff --git a/zetasql/parser/testdata/star.test b/zetasql/parser/testdata/star.test index 5dc491cb1..2bb651db1 100644 --- a/zetasql/parser/testdata/star.test +++ b/zetasql/parser/testdata/star.test @@ -530,23 +530,23 @@ FROM # DotStar is only supported as select column expression. select a from a.*; -- -ERROR: Syntax error: Expected end of input but got "." [at 1:16] +ERROR: Syntax error: Unexpected "*" [at 1:17] select a from a.*; - ^ + ^ == select count(a.*); -- -ERROR: Syntax error: Expected ")" but got "." [at 1:15] +ERROR: Syntax error: Unexpected "*" [at 1:16] select count(a.*); - ^ + ^ == select anon_count(a.*, 0, 1); -- -ERROR: Syntax error: Expected ")" but got "." [at 1:20] +ERROR: Syntax error: Unexpected "*" [at 1:21] select anon_count(a.*, 0, 1); - ^ + ^ == # Star and DotStar in the select list cannot have an alias. @@ -608,26 +608,9 @@ SELECT SELECT a+b.* FROM Table -- -QueryStatement [0-23] [SELECT a+b.* FROM Table] - Query [0-23] [SELECT a+b.* FROM Table] - Select [0-23] [SELECT a+b.* FROM Table] - SelectList [7-12] [a+b.*] - SelectColumn [7-12] [a+b.*] - DotStar [7-12] [a+b.*] - BinaryExpression(+) [7-10] [a+b] - PathExpression [7-8] [a] - Identifier(a) [7-8] [a] - PathExpression [9-10] [b] - Identifier(b) [9-10] [b] - FromClause [13-23] [FROM Table] - TablePathExpression [18-23] [Table] - PathExpression [18-23] [Table] - Identifier(Table) [18-23] [Table] --- -SELECT - a + b.* -FROM - Table +ERROR: Syntax error: Unexpected "*" [at 1:12] +SELECT a+b.* FROM Table + ^ == SELECT (a+b).* FROM Table diff --git a/zetasql/parser/testdata/string_literal_errors.test b/zetasql/parser/testdata/string_literal_errors.test index 5898b8308..c5bb02d09 100644 --- a/zetasql/parser/testdata/string_literal_errors.test +++ b/zetasql/parser/testdata/string_literal_errors.test @@ -1226,7 +1226,7 @@ select br"""a""\e # Some more cases of invalid quoting. 
select -{{"|'|'''|"""|''''|""""|'''''|"""""|'''''''|"""""""|''''''''|""""""""|'''''''''|"""""""""}} +{{"|'|'''|"""|''''|""""|'''''|"""""|'''''''|"""""""|'''''''''|"""""""""}} -- ALTERNATION GROUP: " -- @@ -1288,18 +1288,6 @@ ERROR: Syntax error: Unclosed string literal [at 2:7] """"""" ^ -- -ALTERNATION GROUP: '''''''' --- -ERROR: Syntax error: Expected end of input but got string literal '' [at 2:7] -'''''''' - ^ --- -ALTERNATION GROUP: """""""" --- -ERROR: Syntax error: Expected end of input but got string literal "" [at 2:7] -"""""""" - ^ --- ALTERNATION GROUP: ''''''''' -- ERROR: Syntax error: Unclosed triple-quoted string literal [at 2:7] @@ -1738,7 +1726,8 @@ QueryStatement [0-14] [SELECT ''' '''] Select [0-14] [SELECT ''' '''] SelectList [7-14] [''' '''] SelectColumn [7-14] [''' '''] - StringLiteral(''' + StringLiteral [7-14] [''' '''] + StringLiteralComponent(''' ''') [7-14] [''' '''] -- SELECT @@ -1773,7 +1762,8 @@ QueryStatement [0-15] [SELECT b''' '''] Select [0-15] [SELECT b''' '''] SelectList [7-15] [b''' '''] SelectColumn [7-15] [b''' '''] - BytesLiteral(b''' + BytesLiteral [7-15] [b''' '''] + BytesLiteralComponent(b''' ''') [7-15] [b''' '''] -- SELECT @@ -1808,7 +1798,8 @@ QueryStatement [0-14] [SELECT """ """] Select [0-14] [SELECT """ """] SelectList [7-14] [""" """] SelectColumn [7-14] [""" """] - StringLiteral(""" + StringLiteral [7-14] [""" """] + StringLiteralComponent(""" """) [7-14] [""" """] -- SELECT @@ -1843,7 +1834,8 @@ QueryStatement [0-15] [SELECT b""" """] Select [0-15] [SELECT b""" """] SelectList [7-15] [b""" """] SelectColumn [7-15] [b""" """] - BytesLiteral(b""" + BytesLiteral [7-15] [b""" """] + BytesLiteralComponent(b""" """) [7-15] [b""" """] -- SELECT @@ -1896,7 +1888,8 @@ QueryStatement [0-16] [SELECT rb'abc\ '] Select [0-16] [SELECT rb'abc\ '] SelectList [7-16] [rb'abc\ '] SelectColumn [7-16] [rb'abc\ '] - BytesLiteral(rb'abc\ + BytesLiteral [7-16] [rb'abc\ '] + BytesLiteralComponent(rb'abc\ ') [7-16] [rb'abc\ '] -- SELECT @@ 
-1958,7 +1951,8 @@ QueryStatement [0-16] [SELECT rb"abc\ "] Select [0-16] [SELECT rb"abc\ "] SelectList [7-16] [rb"abc\ "] SelectColumn [7-16] [rb"abc\ "] - BytesLiteral(rb"abc\ + BytesLiteral [7-16] [rb"abc\ "] + BytesLiteralComponent(rb"abc\ ") [7-16] [rb"abc\ "] -- SELECT diff --git a/zetasql/parser/testdata/struct.test b/zetasql/parser/testdata/struct.test index 3f91f5c95..22dbd1853 100644 --- a/zetasql/parser/testdata/struct.test +++ b/zetasql/parser/testdata/struct.test @@ -35,7 +35,8 @@ QueryStatement [0-17] [select (1, 'abc')] SelectColumn [7-17] [(1, 'abc')] StructConstructorWithParens [7-17] [(1, 'abc')] IntLiteral(1) [8-9] [1] - StringLiteral('abc') [11-16] ['abc'] + StringLiteral [11-16] ['abc'] + StringLiteralComponent('abc') [11-16] ['abc'] -- SELECT (1, 'abc') @@ -113,10 +114,12 @@ QueryStatement [0-92] [select key...key, value))] InList [55-91] [(1, 'abc')...key, value)] StructConstructorWithParens [55-65] [(1, 'abc')] IntLiteral(1) [56-57] [1] - StringLiteral('abc') [59-64] ['abc'] + StringLiteral [59-64] ['abc'] + StringLiteralComponent('abc') [59-64] ['abc'] StructConstructorWithParens [67-77] [(2, 'abc')] IntLiteral(2) [68-69] [2] - StringLiteral('abc') [71-76] ['abc'] + StringLiteral [71-76] ['abc'] + StringLiteralComponent('abc') [71-76] ['abc'] StructConstructorWithParens [79-91] [(key, value)] PathExpression [80-83] [key] Identifier(key) [80-83] [key] @@ -144,8 +147,10 @@ QueryStatement [0-31] [select ((1...', 'def'))] IntLiteral(1) [9-10] [1] IntLiteral(2) [12-13] [2] StructConstructorWithParens [16-30] [('abc', 'def')] - StringLiteral('abc') [17-22] ['abc'] - StringLiteral('def') [24-29] ['def'] + StringLiteral [17-22] ['abc'] + StringLiteralComponent('abc') [17-22] ['abc'] + StringLiteral [24-29] ['def'] + StringLiteralComponent('def') [24-29] ['def'] -- SELECT @@ -515,7 +520,7 @@ SELECT select struct<> -- -ERROR: Syntax error: Expected "(" but got end of statement [at 1:16] +ERROR: Syntax error: Expected "(" or "{" but got end of 
statement [at 1:16] select struct<> ^ == diff --git a/zetasql/parser/testdata/struct_braced_constructors.test b/zetasql/parser/testdata/struct_braced_constructors.test new file mode 100644 index 000000000..841dd3f59 --- /dev/null +++ b/zetasql/parser/testdata/struct_braced_constructors.test @@ -0,0 +1,607 @@ +[default language_features=V_1_3_BRACED_PROTO_CONSTRUCTORS,V_1_4_STRUCT_BRACED_CONSTRUCTORS] + +# No field constructor, plain STRUCT. +SELECT STRUCT {} +-- +QueryStatement [38-54] [SELECT STRUCT {}] + Query [38-54] [SELECT STRUCT {}] + Select [38-54] [SELECT STRUCT {}] + SelectList [45-54] [STRUCT {}] + SelectColumn [45-54] [STRUCT {}] + StructBracedConstructor [45-54] [STRUCT {}] + BracedConstructor [52-54] [{}] +-- +SELECT + STRUCT { } +== + +# Simple constructor with comma. +SELECT STRUCT { foo: "blah", bar: 3 } +-- +QueryStatement [0-37] [SELECT STRUCT..., bar: 3 }] + Query [0-37] [SELECT STRUCT..., bar: 3 }] + Select [0-37] [SELECT STRUCT..., bar: 3 }] + SelectList [7-37] [STRUCT { foo: "blah", bar: 3 }] + SelectColumn [7-37] [STRUCT { foo: "blah", bar: 3 }] + StructBracedConstructor [7-37] [STRUCT { foo: "blah", bar: 3 }] + BracedConstructor [14-37] [{ foo: "blah", bar: 3 }] + BracedConstructorField [16-27] [foo: "blah"] + Identifier(foo) [16-19] [foo] + BracedConstructorFieldValue [19-27] [: "blah"] + StringLiteral [21-27] ["blah"] + StringLiteralComponent("blah") [21-27] ["blah"] + BracedConstructorField [29-35] [bar: 3] + Identifier(bar) [29-32] [bar] + BracedConstructorFieldValue [32-35] [: 3] + IntLiteral(3) [34-35] [3] +-- +SELECT + STRUCT { foo : "blah", bar : 3 } +== + +# Leading comma is syntax error +SELECT STRUCT { , foo: "blah", bar: 3 } +-- +ERROR: Syntax error: Unexpected "," [at 1:17] +SELECT STRUCT { , foo: "blah", bar: 3 } + ^ +== + +# Trailing comma is fine +SELECT STRUCT {foo: "blah", bar: 3, } +-- +QueryStatement [0-37] [SELECT STRUCT...bar: 3, }] + Query [0-37] [SELECT STRUCT...bar: 3, }] + Select [0-37] [SELECT STRUCT...bar: 3, }] + 
SelectList [7-37] [STRUCT {foo: "blah", bar: 3, }] + SelectColumn [7-37] [STRUCT {foo: "blah", bar: 3, }] + StructBracedConstructor [7-37] [STRUCT {foo: "blah", bar: 3, }] + BracedConstructor [14-37] [{foo: "blah", bar: 3, }] + BracedConstructorField [15-26] [foo: "blah"] + Identifier(foo) [15-18] [foo] + BracedConstructorFieldValue [18-26] [: "blah"] + StringLiteral [20-26] ["blah"] + StringLiteralComponent("blah") [20-26] ["blah"] + BracedConstructorField [28-34] [bar: 3] + Identifier(bar) [28-31] [bar] + BracedConstructorFieldValue [31-34] [: 3] + IntLiteral(3) [33-34] [3] +-- +SELECT + STRUCT { foo : "blah", bar : 3 } +== + +# Error syntax to use whitespace instead of comma to separate fields. +# Report error at the analyzer stage. +SELECT STRUCT {a:1 b:2} +-- +QueryStatement [0-23] [SELECT STRUCT {a:1 b:2}] + Query [0-23] [SELECT STRUCT {a:1 b:2}] + Select [0-23] [SELECT STRUCT {a:1 b:2}] + SelectList [7-23] [STRUCT {a:1 b:2}] + SelectColumn [7-23] [STRUCT {a:1 b:2}] + StructBracedConstructor [7-23] [STRUCT {a:1 b:2}] + BracedConstructor [14-23] [{a:1 b:2}] + BracedConstructorField [15-18] [a:1] + Identifier(a) [15-16] [a] + BracedConstructorFieldValue [16-18] [:1] + IntLiteral(1) [17-18] [1] + BracedConstructorField [19-22] [b:2] + Identifier(b) [19-20] [b] + BracedConstructorFieldValue [20-22] [:2] + IntLiteral(2) [21-22] [2] +-- +SELECT + STRUCT { a : 1 b : 2 } +== + +# Error syntax to use both whitespace and comma separate fields. +# Report error at the analyzer stage. 
+SELECT STRUCT {a:1 b:2, c:3} +-- +QueryStatement [0-28] [SELECT STRUCT {a:1 b:2, c:3}] + Query [0-28] [SELECT STRUCT {a:1 b:2, c:3}] + Select [0-28] [SELECT STRUCT {a:1 b:2, c:3}] + SelectList [7-28] [STRUCT {a:1 b:2, c:3}] + SelectColumn [7-28] [STRUCT {a:1 b:2, c:3}] + StructBracedConstructor [7-28] [STRUCT {a:1 b:2, c:3}] + BracedConstructor [14-28] [{a:1 b:2, c:3}] + BracedConstructorField [15-18] [a:1] + Identifier(a) [15-16] [a] + BracedConstructorFieldValue [16-18] [:1] + IntLiteral(1) [17-18] [1] + BracedConstructorField [19-22] [b:2] + Identifier(b) [19-20] [b] + BracedConstructorFieldValue [20-22] [:2] + IntLiteral(2) [21-22] [2] + BracedConstructorField [24-27] [c:3] + Identifier(c) [24-25] [c] + BracedConstructorFieldValue [25-27] [:3] + IntLiteral(3) [26-27] [3] +-- +SELECT + STRUCT { a : 1 b : 2, c : 3 } +== + +# Error syntax to use whitespace instead of colon to separate field and value for submessage. +# Report error at the analyzer stage. +SELECT STRUCT { + a { + b: 1 + } +} +-- +QueryStatement [0-38] [SELECT STRUCT...: 1 } }] + Query [0-38] [SELECT STRUCT...: 1 } }] + Select [0-38] [SELECT STRUCT...: 1 } }] + SelectList [7-38] [STRUCT {...: 1 } }] + SelectColumn [7-38] [STRUCT {...: 1 } }] + StructBracedConstructor [7-38] [STRUCT {...: 1 } }] + BracedConstructor [14-38] [{ a { b: 1 } }] + BracedConstructorField [19-36] [a { b: 1 }] + Identifier(a) [19-20] [a] + BracedConstructorFieldValue [21-36] [{ b: 1 }] + BracedConstructor [21-36] [{ b: 1 }] + BracedConstructorField [27-31] [b: 1] + Identifier(b) [27-28] [b] + BracedConstructorFieldValue [28-31] [: 1] + IntLiteral(1) [30-31] [1] +-- +SELECT + STRUCT { a { b : 1 } } +== + +# Error syntax to use both whitespace and colon to separate field and value for submessage. +# Report error at the analyzer stage. 
+SELECT STRUCT { + a { + b: 1 + }, + c: { + d: 1 + } +} +-- +QueryStatement [0-59] [SELECT STRUCT...: 1 } }] + Query [0-59] [SELECT STRUCT...: 1 } }] + Select [0-59] [SELECT STRUCT...: 1 } }] + SelectList [7-59] [STRUCT {...: 1 } }] + SelectColumn [7-59] [STRUCT {...: 1 } }] + StructBracedConstructor [7-59] [STRUCT {...: 1 } }] + BracedConstructor [14-59] [{ a {...: 1 } }] + BracedConstructorField [19-36] [a { b: 1 }] + Identifier(a) [19-20] [a] + BracedConstructorFieldValue [21-36] [{ b: 1 }] + BracedConstructor [21-36] [{ b: 1 }] + BracedConstructorField [27-31] [b: 1] + Identifier(b) [27-28] [b] + BracedConstructorFieldValue [28-31] [: 1] + IntLiteral(1) [30-31] [1] + BracedConstructorField [40-57] [c: { d: 1 }] + Identifier(c) [40-41] [c] + BracedConstructorFieldValue [41-57] [: { d: 1 }] + BracedConstructor [43-57] [{ d: 1 }] + BracedConstructorField [48-52] [d: 1] + Identifier(d) [48-49] [d] + BracedConstructorFieldValue [49-52] [: 1] + IntLiteral(1) [51-52] [1] +-- +SELECT + STRUCT { a { b : 1 }, c : { d : 1 } } +== + +# Nested message (with colon) and array field. 
+SELECT STRUCT { + foo: { + monkey: "blah" + } + bar: 3 + int_array: [1,2,3] +} +-- +QueryStatement [0-79] [SELECT STRUCT...[1,2,3] }] + Query [0-79] [SELECT STRUCT...[1,2,3] }] + Select [0-79] [SELECT STRUCT...[1,2,3] }] + SelectList [7-79] [STRUCT {...[1,2,3] }] + SelectColumn [7-79] [STRUCT {...[1,2,3] }] + StructBracedConstructor [7-79] [STRUCT {...[1,2,3] }] + BracedConstructor [14-79] [{ foo: {...[1,2,3] }] + BracedConstructorField [18-47] [foo: { monkey: "blah" }] + Identifier(foo) [18-21] [foo] + BracedConstructorFieldValue [21-47] [: { monkey: "blah" }] + BracedConstructor [23-47] [{ monkey: "blah" }] + BracedConstructorField [29-43] [monkey: "blah"] + Identifier(monkey) [29-35] [monkey] + BracedConstructorFieldValue [35-43] [: "blah"] + StringLiteral [37-43] ["blah"] + StringLiteralComponent("blah") [37-43] ["blah"] + BracedConstructorField [50-56] [bar: 3] + Identifier(bar) [50-53] [bar] + BracedConstructorFieldValue [53-56] [: 3] + IntLiteral(3) [55-56] [3] + BracedConstructorField [59-77] [int_array: [1,2,3]] + Identifier(int_array) [59-68] [int_array] + BracedConstructorFieldValue [68-77] [: [1,2,3]] + ArrayConstructor [70-77] [[1,2,3]] + IntLiteral(1) [71-72] [1] + IntLiteral(2) [73-74] [2] + IntLiteral(3) [75-76] [3] +-- +SELECT + STRUCT { foo : { monkey : "blah" } bar : 3 int_array : ARRAY[1, 2, 3] } +== + +# Sub-message array. 
+SELECT STRUCT { + int_field: 1 + submessage_array: [{ + monkey: "blah" + }, { + baz: "abc" + }] +} +-- +QueryStatement [0-101] [SELECT STRUCT...abc" }] }] + Query [0-101] [SELECT STRUCT...abc" }] }] + Select [0-101] [SELECT STRUCT...abc" }] }] + SelectList [7-101] [STRUCT {...abc" }] }] + SelectColumn [7-101] [STRUCT {...abc" }] }] + StructBracedConstructor [7-101] [STRUCT {...abc" }] }] + BracedConstructor [14-101] [{ int_field...abc" }] }] + BracedConstructorField [18-30] [int_field: 1] + Identifier(int_field) [18-27] [int_field] + BracedConstructorFieldValue [27-30] [: 1] + IntLiteral(1) [29-30] [1] + BracedConstructorField [33-99] [submessage_array..."abc" }]] + Identifier(submessage_array) [33-49] [submessage_array] + BracedConstructorFieldValue [49-99] [: [{ monkey..."abc" }]] + ArrayConstructor [51-99] [[{ monkey..."abc" }]] + BracedConstructor [52-76] [{ monkey: "blah" }] + BracedConstructorField [58-72] [monkey: "blah"] + Identifier(monkey) [58-64] [monkey] + BracedConstructorFieldValue [64-72] [: "blah"] + StringLiteral [66-72] ["blah"] + StringLiteralComponent("blah") [66-72] ["blah"] + BracedConstructor [78-98] [{ baz: "abc" }] + BracedConstructorField [84-94] [baz: "abc"] + Identifier(baz) [84-87] [baz] + BracedConstructorFieldValue [87-94] [: "abc"] + StringLiteral [89-94] ["abc"] + StringLiteralComponent("abc") [89-94] ["abc"] +-- +SELECT + STRUCT { int_field : 1 submessage_array : ARRAY[{ monkey : "blah" }, { baz : "abc" }] } +== + +# At parse-time map fields are just like repeated sub-message fields. 
+SELECT STRUCT { + int_field: 1 + map_field: [{ + key: "blah" + value: 1 + }, { + key: "abc" + value: 2 + }] +} +-- +QueryStatement [0-117] [SELECT STRUCT...: 2 }] }] + Query [0-117] [SELECT STRUCT...: 2 }] }] + Select [0-117] [SELECT STRUCT...: 2 }] }] + SelectList [7-117] [STRUCT {...: 2 }] }] + SelectColumn [7-117] [STRUCT {...: 2 }] }] + StructBracedConstructor [7-117] [STRUCT {...: 2 }] }] + BracedConstructor [14-117] [{ int_field...: 2 }] }] + BracedConstructorField [18-30] [int_field: 1] + Identifier(int_field) [18-27] [int_field] + BracedConstructorFieldValue [27-30] [: 1] + IntLiteral(1) [29-30] [1] + BracedConstructorField [33-115] [map_field:...value: 2 }]] + Identifier(map_field) [33-42] [map_field] + BracedConstructorFieldValue [42-115] [: [{ key...value: 2 }]] + ArrayConstructor [44-115] [[{ key...value: 2 }]] + BracedConstructor [45-79] [{ key:...value: 1 }] + BracedConstructorField [51-62] [key: "blah"] + Identifier(key) [51-54] [key] + BracedConstructorFieldValue [54-62] [: "blah"] + StringLiteral [56-62] ["blah"] + StringLiteralComponent("blah") [56-62] ["blah"] + BracedConstructorField [67-75] [value: 1] + Identifier(value) [67-72] [value] + BracedConstructorFieldValue [72-75] [: 1] + IntLiteral(1) [74-75] [1] + BracedConstructor [81-114] [{ key:...value: 2 }] + BracedConstructorField [87-97] [key: "abc"] + Identifier(key) [87-90] [key] + BracedConstructorFieldValue [90-97] [: "abc"] + StringLiteral [92-97] ["abc"] + StringLiteralComponent("abc") [92-97] ["abc"] + BracedConstructorField [102-110] [value: 2] + Identifier(value) [102-107] [value] + BracedConstructorFieldValue [107-110] [: 2] + IntLiteral(2) [109-110] [2] +-- +SELECT + STRUCT { int_field : 1 map_field : ARRAY[{ key : "blah" value : 1 }, { key : "abc" value : 2 }] } +== + +# Nested STRUCT constructor +SELECT STRUCT { + foo: STRUCT { + monkey: "blah" + } + bar: 3 +} +-- +QueryStatement [0-65] [SELECT STRUCT...bar: 3 }] + Query [0-65] [SELECT STRUCT...bar: 3 }] + Select [0-65] [SELECT 
STRUCT...bar: 3 }] + SelectList [7-65] [STRUCT {...bar: 3 }] + SelectColumn [7-65] [STRUCT {...bar: 3 }] + StructBracedConstructor [7-65] [STRUCT {...bar: 3 }] + BracedConstructor [14-65] [{ foo: STRUCT...bar: 3 }] + BracedConstructorField [18-54] [foo: STRUCT..."blah" }] + Identifier(foo) [18-21] [foo] + BracedConstructorFieldValue [21-54] [: STRUCT {..."blah" }] + StructBracedConstructor [23-54] [STRUCT {..."blah" }] + BracedConstructor [30-54] [{ monkey: "blah" }] + BracedConstructorField [36-50] [monkey: "blah"] + Identifier(monkey) [36-42] [monkey] + BracedConstructorFieldValue [42-50] [: "blah"] + StringLiteral [44-50] ["blah"] + StringLiteralComponent("blah") [44-50] ["blah"] + BracedConstructorField [57-63] [bar: 3] + Identifier(bar) [57-60] [bar] + BracedConstructorFieldValue [60-63] [: 3] + IntLiteral(3) [62-63] [3] +-- +SELECT + STRUCT { foo : STRUCT { monkey : "blah" } bar : 3 } +== + +SELECT STRUCT { + foo: (SELECT t.* FROM t WHERE t.a = 1) +} +-- +QueryStatement [0-58] [SELECT STRUCT...t.a = 1) }] + Query [0-58] [SELECT STRUCT...t.a = 1) }] + Select [0-58] [SELECT STRUCT...t.a = 1) }] + SelectList [7-58] [STRUCT {...t.a = 1) }] + SelectColumn [7-58] [STRUCT {...t.a = 1) }] + StructBracedConstructor [7-58] [STRUCT {...t.a = 1) }] + BracedConstructor [14-58] [{ foo: (...t.a = 1) }] + BracedConstructorField [18-56] [foo: (SELECT...WHERE t.a = 1)] + Identifier(foo) [18-21] [foo] + BracedConstructorFieldValue [21-56] [: (SELECT...WHERE t.a = 1)] + ExpressionSubquery [23-56] [(SELECT t....WHERE t.a = 1)] + Query [24-55] [SELECT t.*...WHERE t.a = 1] + Select [24-55] [SELECT t.*...WHERE t.a = 1] + SelectList [31-34] [t.*] + SelectColumn [31-34] [t.*] + DotStar [31-34] [t.*] + PathExpression [31-32] [t] + Identifier(t) [31-32] [t] + FromClause [35-41] [FROM t] + TablePathExpression [40-41] [t] + PathExpression [40-41] [t] + Identifier(t) [40-41] [t] + WhereClause [42-55] [WHERE t.a = 1] + BinaryExpression(=) [48-55] [t.a = 1] + PathExpression [48-51] [t.a] + 
Identifier(t) [48-49] [t] + Identifier(a) [50-51] [a] + IntLiteral(1) [54-55] [1] +-- +SELECT + STRUCT { foo : ( + SELECT + t.* + FROM + t + WHERE + t.a = 1 + ) } +== + +SELECT STRUCT { + foo: 3 + 5 +} +-- +QueryStatement [0-30] [SELECT STRUCT { foo: 3 + 5 }] + Query [0-30] [SELECT STRUCT { foo: 3 + 5 }] + Select [0-30] [SELECT STRUCT { foo: 3 + 5 }] + SelectList [7-30] [STRUCT { foo: 3 + 5 }] + SelectColumn [7-30] [STRUCT { foo: 3 + 5 }] + StructBracedConstructor [7-30] [STRUCT { foo: 3 + 5 }] + BracedConstructor [14-30] [{ foo: 3 + 5 }] + BracedConstructorField [18-28] [foo: 3 + 5] + Identifier(foo) [18-21] [foo] + BracedConstructorFieldValue [21-28] [: 3 + 5] + BinaryExpression(+) [23-28] [3 + 5] + IntLiteral(3) [23-24] [3] + IntLiteral(5) [27-28] [5] +-- +SELECT + STRUCT { foo : 3 + 5 } +== + +# Error syntax to use extension. +# Errors will be reported at the resolver stage. +SELECT STRUCT { + (path.to.extension) { + value: 1 + } +} +-- +QueryStatement [0-58] [SELECT STRUCT...value: 1 } }] + Query [0-58] [SELECT STRUCT...value: 1 } }] + Select [0-58] [SELECT STRUCT...value: 1 } }] + SelectList [7-58] [STRUCT {...value: 1 } }] + SelectColumn [7-58] [STRUCT {...value: 1 } }] + StructBracedConstructor [7-58] [STRUCT {...value: 1 } }] + BracedConstructor [14-58] [{ (path....value: 1 } }] + BracedConstructorField [18-56] [(path.to.extensio...lue: 1 }] + PathExpression [19-36] [path.to.extension] + Identifier(path) [19-23] [path] + Identifier(`to`) [24-26] [to] + Identifier(extension) [27-36] [extension] + BracedConstructorFieldValue [38-56] [{ value: 1 }] + BracedConstructor [38-56] [{ value: 1 }] + BracedConstructorField [44-52] [value: 1] + Identifier(value) [44-49] [value] + BracedConstructorFieldValue [49-52] [: 1] + IntLiteral(1) [51-52] [1] +-- +SELECT + STRUCT {(path.`to`.extension) { value : 1 } } +== + +# Function expression. 
+SELECT STRUCT { + foo: CONCAT("foo", "bar"), +} +-- +QueryStatement [0-46] [SELECT STRUCT..."bar"), }] + Query [0-46] [SELECT STRUCT..."bar"), }] + Select [0-46] [SELECT STRUCT..."bar"), }] + SelectList [7-46] [STRUCT {..."bar"), }] + SelectColumn [7-46] [STRUCT {..."bar"), }] + StructBracedConstructor [7-46] [STRUCT {..."bar"), }] + BracedConstructor [14-46] [{ foo: CONCAT..."bar"), }] + BracedConstructorField [18-43] [foo: CONCAT("foo", "bar")] + Identifier(foo) [18-21] [foo] + BracedConstructorFieldValue [21-43] [: CONCAT("foo", "bar")] + FunctionCall [23-43] [CONCAT("foo", "bar")] + PathExpression [23-29] [CONCAT] + Identifier(CONCAT) [23-29] [CONCAT] + StringLiteral [30-35] ["foo"] + StringLiteralComponent("foo") [30-35] ["foo"] + StringLiteral [37-42] ["bar"] + StringLiteralComponent("bar") [37-42] ["bar"] +-- +SELECT + STRUCT { foo : CONCAT("foo", "bar") } +== + +# Aggregation expression. +SELECT STRUCT { + foo: (SELECT count(*) FROM table_foo) +} +-- +QueryStatement [0-57] [SELECT STRUCT...table_foo) }] + Query [0-57] [SELECT STRUCT...table_foo) }] + Select [0-57] [SELECT STRUCT...table_foo) }] + SelectList [7-57] [STRUCT {...table_foo) }] + SelectColumn [7-57] [STRUCT {...table_foo) }] + StructBracedConstructor [7-57] [STRUCT {...table_foo) }] + BracedConstructor [14-57] [{ foo: (...table_foo) }] + BracedConstructorField [18-55] [foo: (SELECT...table_foo)] + Identifier(foo) [18-21] [foo] + BracedConstructorFieldValue [21-55] [: (SELECT...table_foo)] + ExpressionSubquery [23-55] [(SELECT count...table_foo)] + Query [24-54] [SELECT count(*) FROM table_foo] + Select [24-54] [SELECT count(*) FROM table_foo] + SelectList [31-39] [count(*)] + SelectColumn [31-39] [count(*)] + FunctionCall [31-39] [count(*)] + PathExpression [31-36] [count] + Identifier(count) [31-36] [count] + Star(*) [37-38] [*] + FromClause [40-54] [FROM table_foo] + TablePathExpression [45-54] [table_foo] + PathExpression [45-54] [table_foo] + Identifier(table_foo) [45-54] [table_foo] +-- 
+SELECT + STRUCT { foo : ( + SELECT + count(*) + FROM + table_foo + ) } +== + +# No trailing "}". +SELECT STRUCT {foobar: 5 +-- +ERROR: Syntax error: Unexpected end of statement [at 1:25] +SELECT STRUCT {foobar: 5 + ^ +== + +# ARRAY keyword not allowed. +SELECT STRUCT ARRAY{} +-- +ERROR: Syntax error: Expected "(" or "<" or "{" but got keyword ARRAY [at 1:15] +SELECT STRUCT ARRAY{} + ^ +== + +# NEW keyword not allowed. +SELECT NEW STRUCT {foo:1} +-- +ERROR: Syntax error: Unexpected keyword STRUCT [at 1:12] +SELECT NEW STRUCT {foo:1} + ^ +== + +# No field name. +SELECT STRUCT {1} +-- +ERROR: Syntax error: Unexpected integer literal "1" [at 1:16] +SELECT STRUCT {1} + ^ +== + +# No field value. +SELECT STRUCT {foobar} +-- +ERROR: Syntax error: Expected ":" or "{" but got "}" [at 1:22] +SELECT STRUCT {foobar} + ^ +== + +# b/262795394 - Value starting with label token keyword 'FOR' parses correctly. +SELECT STRUCT { field: FORK() } +-- +QueryStatement [0-31] [SELECT STRUCT...: FORK() }] + Query [0-31] [SELECT STRUCT...: FORK() }] + Select [0-31] [SELECT STRUCT...: FORK() }] + SelectList [7-31] [STRUCT { field: FORK() }] + SelectColumn [7-31] [STRUCT { field: FORK() }] + StructBracedConstructor [7-31] [STRUCT { field: FORK() }] + BracedConstructor [14-31] [{ field: FORK() }] + BracedConstructorField [16-29] [field: FORK()] + Identifier(field) [16-21] [field] + BracedConstructorFieldValue [21-29] [: FORK()] + FunctionCall [23-29] [FORK()] + PathExpression [23-27] [FORK] + Identifier(FORK) [23-27] [FORK] +-- +SELECT + STRUCT { field : FORK() } +== + +# Explicit type +SELECT STRUCT<a INT64> {a: 1} +-- +QueryStatement [0-29] [SELECT STRUCT<a INT64> {a: 1}] + Query [0-29] [SELECT STRUCT<a INT64> {a: 1}] + Select [0-29] [SELECT STRUCT<a INT64> {a: 1}] + SelectList [7-29] [STRUCT<a INT64> {a: 1}] + SelectColumn [7-29] [STRUCT<a INT64> {a: 1}] + StructBracedConstructor [7-29] [STRUCT<a INT64> {a: 1}] + StructType [7-22] [STRUCT<a INT64>] + StructField [14-21] [a INT64] + Identifier(a) [14-15] [a] + SimpleType [16-21] [INT64] + PathExpression [16-21] 
[INT64] + Identifier(INT64) [16-21] [INT64] + BracedConstructor [23-29] [{a: 1}] + BracedConstructorField [24-28] [a: 1] + Identifier(a) [24-25] [a] + BracedConstructorFieldValue [25-28] [: 1] + IntLiteral(1) [27-28] [1] +-- +SELECT + STRUCT< a INT64 > { a : 1 } diff --git a/zetasql/parser/testdata/time_travel.test b/zetasql/parser/testdata/time_travel.test index 3aa49d05d..3554ba0bd 100644 --- a/zetasql/parser/testdata/time_travel.test +++ b/zetasql/parser/testdata/time_travel.test @@ -12,7 +12,8 @@ QueryStatement [0-69] [SELECT * FROM...12:20:20'] Identifier(t) [14-15] [t] ForSystemTime [16-69] [FOR SYSTEM_TIME...12:20:20'] DateOrTimeLiteral(TYPE_TIMESTAMP) [38-69] [TIMESTAMP...12:20:20'] - StringLiteral('2011-01-01 12:20:20') [48-69] ['2011-01-01 12:20:20'] + StringLiteral [48-69] ['2011-01-01 12:20:20'] + StringLiteralComponent('2011-01-01 12:20:20') [48-69] ['2011-01-01 12:20:20'] -- SELECT * @@ -35,7 +36,8 @@ QueryStatement [0-69] [SELECT * FROM...12:20:20'] Identifier(t) [14-15] [t] ForSystemTime [16-69] [FOR SYSTEM...12:20:20'] DateOrTimeLiteral(TYPE_TIMESTAMP) [38-69] [TIMESTAMP...12:20:20'] - StringLiteral('2011-01-01 12:20:20') [48-69] ['2011-01-01 12:20:20'] + StringLiteral [48-69] ['2011-01-01 12:20:20'] + StringLiteralComponent('2011-01-01 12:20:20') [48-69] ['2011-01-01 12:20:20'] -- SELECT * diff --git a/zetasql/parser/testdata/tvf.test b/zetasql/parser/testdata/tvf.test index e9ac82896..7a9028e25 100644 --- a/zetasql/parser/testdata/tvf.test +++ b/zetasql/parser/testdata/tvf.test @@ -2004,7 +2004,8 @@ QueryStatement [0-79] [select * from..., [true]))] TVFArgument [61-62] [1] IntLiteral(1) [61-62] [1] TVFArgument [64-69] ["abc"] - StringLiteral("abc") [64-69] ["abc"] + StringLiteral [64-69] ["abc"] + StringLiteralComponent("abc") [64-69] ["abc"] TVFArgument [71-77] [[true]] ArrayConstructor [71-77] [[true]] BooleanLiteral(true) [72-76] [true] diff --git a/zetasql/parser/testdata/unpivot.test b/zetasql/parser/testdata/unpivot.test index 
7e67d0a0c..c8bf482fc 100644 --- a/zetasql/parser/testdata/unpivot.test +++ b/zetasql/parser/testdata/unpivot.test @@ -237,19 +237,22 @@ QueryStatement [0-67] [SELECT * FROM...', y "3"))] PathExpression [41-42] [w] Identifier(w) [41-42] [w] UnpivotInItemLabel [43-49] [AS '1'] - StringLiteral('1') [46-49] ['1'] + StringLiteral [46-49] ['1'] + StringLiteralComponent('1') [46-49] ['1'] UnpivotInItem [51-58] [(x) '2'] PathExpressionList [52-53] [x] PathExpression [52-53] [x] Identifier(x) [52-53] [x] UnpivotInItemLabel [54-58] ['2'] - StringLiteral('2') [55-58] ['2'] + StringLiteral [55-58] ['2'] + StringLiteralComponent('2') [55-58] ['2'] UnpivotInItem [60-65] [y "3"] PathExpressionList [60-61] [y] PathExpression [60-61] [y] Identifier(y) [60-61] [y] UnpivotInItemLabel [61-65] ["3"] - StringLiteral("3") [62-65] ["3"] + StringLiteral [62-65] ["3"] + StringLiteralComponent("3") [62-65] ["3"] -- SELECT * @@ -795,7 +798,8 @@ QueryStatement [0-90] [SELECT * FROM...b in (c))] PathExpression [29-30] [t] Identifier(t) [29-30] [t] ForSystemTime [31-65] [FOR SYSTEM...2018-01-01'] - StringLiteral('2018-01-01') [53-65] ['2018-01-01'] + StringLiteral [53-65] ['2018-01-01'] + StringLiteralComponent('2018-01-01') [53-65] ['2018-01-01'] UnpivotClause [67-90] [UNPIVOT(a for b in (c))] PathExpressionList [75-76] [a] PathExpression [75-76] [a] diff --git a/zetasql/parser/testdata/variable_declarations.test b/zetasql/parser/testdata/variable_declarations.test index 98e2ab6e6..c02345202 100644 --- a/zetasql/parser/testdata/variable_declarations.test +++ b/zetasql/parser/testdata/variable_declarations.test @@ -4,8 +4,8 @@ DECLARE x STRING{{;|}} -- ALTERNATION GROUP: ; -- -Script [0-18] [DECLARE x STRING;] - StatementList [0-18] [DECLARE x STRING;] +Script [0-17] [DECLARE x STRING;] + StatementList [0-17] [DECLARE x STRING;] VariableDeclaration [0-16] [DECLARE x STRING] IdentifierList [8-9] [x] Identifier(x) [8-9] [x] @@ -35,8 +35,8 @@ DECLARE y INT32{{;|}} -- ALTERNATION GROUP: ; -- -Script 
[0-35] [DECLARE x...DECLARE y INT32;] - StatementList [0-35] [DECLARE x...DECLARE y INT32;] +Script [0-34] [DECLARE x...DECLARE y INT32;] + StatementList [0-34] [DECLARE x...DECLARE y INT32;] VariableDeclaration [0-16] [DECLARE x STRING] IdentifierList [8-9] [x] Identifier(x) [8-9] [x] @@ -77,8 +77,8 @@ DECLARE y INT32 ; # Declaration of multiple variables in one statement. DECLARE x, y INT32; -- -Script [0-20] [DECLARE x, y INT32;] - StatementList [0-20] [DECLARE x, y INT32;] +Script [0-19] [DECLARE x, y INT32;] + StatementList [0-19] [DECLARE x, y INT32;] VariableDeclaration [0-18] [DECLARE x, y INT32] IdentifierList [8-12] [x, y] Identifier(x) [8-9] [x] @@ -93,8 +93,8 @@ DECLARE x, y INT32 ; # Single variable with default value DECLARE x INT32 DEFAULT 5; -- -Script [0-27] [DECLARE x INT32 DEFAULT 5;] - StatementList [0-27] [DECLARE x INT32 DEFAULT 5;] +Script [0-26] [DECLARE x INT32 DEFAULT 5;] + StatementList [0-26] [DECLARE x INT32 DEFAULT 5;] VariableDeclaration [0-25] [DECLARE x INT32 DEFAULT 5] IdentifierList [8-9] [x] Identifier(x) [8-9] [x] @@ -110,8 +110,8 @@ DECLARE x INT32 DEFAULT 5 ; DECLARE x INT32 DEFAULT 5; DECLARE y INT32 DEFAULT x + 23; -- -Script [0-59] [DECLARE x...DEFAULT x + 23;] - StatementList [0-59] [DECLARE x...DEFAULT x + 23;] +Script [0-58] [DECLARE x...DEFAULT x + 23;] + StatementList [0-58] [DECLARE x...DEFAULT x + 23;] VariableDeclaration [0-25] [DECLARE x INT32 DEFAULT 5] IdentifierList [8-9] [x] Identifier(x) [8-9] [x] @@ -138,8 +138,8 @@ DECLARE y INT32 DEFAULT x + 23 ; DECLARE x INT32; SELECT x; -- -Script [0-27] [DECLARE x INT32; SELECT x;] - StatementList [0-27] [DECLARE x INT32; SELECT x;] +Script [0-26] [DECLARE x INT32; SELECT x;] + StatementList [0-26] [DECLARE x INT32; SELECT x;] VariableDeclaration [0-15] [DECLARE x INT32] IdentifierList [8-9] [x] Identifier(x) [8-9] [x] @@ -169,7 +169,7 @@ END Script [0-40] [BEGIN DECLARE...ELECT x; END] StatementList [0-40] [BEGIN DECLARE...ELECT x; END] BeginEndBlock [0-40] [BEGIN 
DECLARE...ELECT x; END] - StatementList [8-37] [DECLARE x INT32; SELECT x;] + StatementList [8-36] [DECLARE x INT32; SELECT x;] VariableDeclaration [8-23] [DECLARE x INT32] IdentifierList [16-17] [x] Identifier(x) [16-17] [x] @@ -219,8 +219,8 @@ FROM # 'DECLARE' used both as a keyword and identifier in the same statement. DECLARE declare declare; -- -Script [0-25] [DECLARE declare declare;] - StatementList [0-25] [DECLARE declare declare;] +Script [0-24] [DECLARE declare declare;] + StatementList [0-24] [DECLARE declare declare;] VariableDeclaration [0-23] [DECLARE declare declare] IdentifierList [8-15] [declare] Identifier(declare) [8-15] [declare] @@ -233,8 +233,8 @@ DECLARE declare declare ; DECLARE declare declare DEFAULT declare; -- -Script [0-41] [DECLARE declare...ULT declare;] - StatementList [0-41] [DECLARE declare...ULT declare;] +Script [0-40] [DECLARE declare...ULT declare;] + StatementList [0-40] [DECLARE declare...ULT declare;] VariableDeclaration [0-39] [DECLARE declare...AULT declare] IdentifierList [8-15] [declare] Identifier(declare) [8-15] [declare] @@ -249,8 +249,8 @@ DECLARE declare declare DEFAULT declare ; DECLARE x INT64 DEFAULT IF(a,b,c); -- -Script [0-35] [DECLARE x...IF(a,b,c);] - StatementList [0-35] [DECLARE x...IF(a,b,c);] +Script [0-34] [DECLARE x...IF(a,b,c);] + StatementList [0-34] [DECLARE x...IF(a,b,c);] VariableDeclaration [0-33] [DECLARE x...IF(a,b,c)] IdentifierList [8-9] [x] Identifier(x) [8-9] [x] @@ -272,8 +272,8 @@ DECLARE x INT64 DEFAULT `IF`(a, b, c) ; DECLARE x INT64 DEFAULT (SELECT MIN(a) FROM t); -- -Script [0-48] [DECLARE x...) FROM t);] - StatementList [0-48] [DECLARE x...) FROM t);] +Script [0-47] [DECLARE x...) FROM t);] + StatementList [0-47] [DECLARE x...) 
FROM t);] VariableDeclaration [0-46] [DECLARE x...a) FROM t)] IdentifierList [8-9] [x] Identifier(x) [8-9] [x] @@ -332,11 +332,11 @@ IF true THEN DECLARE x INT32; END IF; -- -Script [0-40] [IF true THEN...INT32; END IF;] - StatementList [0-40] [IF true THEN...INT32; END IF;] +Script [0-39] [IF true THEN...INT32; END IF;] + StatementList [0-39] [IF true THEN...INT32; END IF;] IfStatement [0-38] [IF true THEN...INT32; END IF] BooleanLiteral(true) [3-7] [true] - StatementList [15-32] [DECLARE x INT32;] + StatementList [15-31] [DECLARE x INT32;] VariableDeclaration [15-30] [DECLARE x INT32] IdentifierList [23-24] [x] Identifier(x) [23-24] [x] @@ -353,12 +353,12 @@ IF declare then DECLARE x INT64; END IF; -- -Script [0-43] [IF declare...INT64; END IF;] - StatementList [0-43] [IF declare...INT64; END IF;] +Script [0-42] [IF declare...INT64; END IF;] + StatementList [0-42] [IF declare...INT64; END IF;] IfStatement [0-41] [IF declare...INT64; END IF] PathExpression [3-10] [declare] Identifier(declare) [3-10] [declare] - StatementList [18-35] [DECLARE x INT64;] + StatementList [18-34] [DECLARE x INT64;] VariableDeclaration [18-33] [DECLARE x INT64] IdentifierList [26-27] [x] Identifier(x) [26-27] [x] @@ -375,8 +375,8 @@ END IF ; SELECT 5; DECLARE x INT32; -- -Script [0-27] [SELECT 5; DECLARE x INT32;] - StatementList [0-27] [SELECT 5; DECLARE x INT32;] +Script [0-26] [SELECT 5; DECLARE x INT32;] + StatementList [0-26] [SELECT 5; DECLARE x INT32;] QueryStatement [0-8] [SELECT 5] Query [0-8] [SELECT 5] Select [0-8] [SELECT 5] @@ -406,7 +406,7 @@ END Script [0-40] [BEGIN SELECT...INT32; END] StatementList [0-40] [BEGIN SELECT...INT32; END] BeginEndBlock [0-40] [BEGIN SELECT...INT32; END] - StatementList [8-37] [SELECT 5; DECLARE x INT32;] + StatementList [8-36] [SELECT 5; DECLARE x INT32;] QueryStatement [8-16] [SELECT 5] Query [8-16] [SELECT 5] Select [8-16] [SELECT 5] @@ -432,8 +432,8 @@ END # Error: Declaring the same variable twice in the same statement DECLARE x,x INT32; 
-- -Script [0-19] [DECLARE x,x INT32;] - StatementList [0-19] [DECLARE x,x INT32;] +Script [0-18] [DECLARE x,x INT32;] + StatementList [0-18] [DECLARE x,x INT32;] VariableDeclaration [0-17] [DECLARE x,x INT32] IdentifierList [8-11] [x,x] Identifier(x) [8-9] [x] @@ -449,8 +449,8 @@ DECLARE x, x INT32 ; # (variable names differ only in case). DECLARE X,x INT32; -- -Script [0-19] [DECLARE X,x INT32;] - StatementList [0-19] [DECLARE X,x INT32;] +Script [0-18] [DECLARE X,x INT32;] + StatementList [0-18] [DECLARE X,x INT32;] VariableDeclaration [0-17] [DECLARE X,x INT32] IdentifierList [8-11] [X,x] Identifier(X) [8-9] [X] @@ -466,8 +466,8 @@ DECLARE X, x INT32 ; DECLARE x INT32; DECLARE x INT64; -- -Script [0-34] [DECLARE x...DECLARE x INT64;] - StatementList [0-34] [DECLARE x...DECLARE x INT64;] +Script [0-33] [DECLARE x...DECLARE x INT64;] + StatementList [0-33] [DECLARE x...DECLARE x INT64;] VariableDeclaration [0-15] [DECLARE x INT32] IdentifierList [8-9] [x] Identifier(x) [8-9] [x] @@ -490,8 +490,8 @@ DECLARE x INT64 ; DECLARE X INT32; DECLARE x INT64; -- -Script [0-34] [DECLARE X...DECLARE x INT64;] - StatementList [0-34] [DECLARE X...DECLARE x INT64;] +Script [0-33] [DECLARE X...DECLARE x INT64;] + StatementList [0-33] [DECLARE X...DECLARE x INT64;] VariableDeclaration [0-15] [DECLARE X INT32] IdentifierList [8-9] [X] Identifier(X) [8-9] [X] @@ -515,8 +515,8 @@ BEGIN DECLARE x INT64; END; -- -Script [0-47] [DECLARE x...INT64; END;] - StatementList [0-47] [DECLARE x...INT64; END;] +Script [0-46] [DECLARE x...INT64; END;] + StatementList [0-46] [DECLARE x...INT64; END;] VariableDeclaration [0-15] [DECLARE x INT32] IdentifierList [8-9] [x] Identifier(x) [8-9] [x] @@ -524,7 +524,7 @@ Script [0-47] [DECLARE x...INT64; END;] PathExpression [10-15] [INT32] Identifier(INT32) [10-15] [INT32] BeginEndBlock [17-45] [BEGIN DECLARE x INT64; END] - StatementList [25-42] [DECLARE x INT64;] + StatementList [25-41] [DECLARE x INT64;] VariableDeclaration [25-40] [DECLARE x INT64] 
IdentifierList [33-34] [x] Identifier(x) [33-34] [x] @@ -546,8 +546,8 @@ BEGIN DECLARE x INT64; END; -- -Script [0-47] [DECLARE X...INT64; END;] - StatementList [0-47] [DECLARE X...INT64; END;] +Script [0-46] [DECLARE X...INT64; END;] + StatementList [0-46] [DECLARE X...INT64; END;] VariableDeclaration [0-15] [DECLARE X INT32] IdentifierList [8-9] [X] Identifier(X) [8-9] [X] @@ -555,7 +555,7 @@ Script [0-47] [DECLARE X...INT64; END;] PathExpression [10-15] [INT32] Identifier(INT32) [10-15] [INT32] BeginEndBlock [17-45] [BEGIN DECLARE x INT64; END] - StatementList [25-42] [DECLARE x INT64;] + StatementList [25-41] [DECLARE x INT64;] VariableDeclaration [25-40] [DECLARE x INT64] IdentifierList [33-34] [x] Identifier(x) [33-34] [x] @@ -578,8 +578,8 @@ BEGIN END; END; -- -Script [0-64] [DECLARE x...END; END;] - StatementList [0-64] [DECLARE x...END; END;] +Script [0-63] [DECLARE x...END; END;] + StatementList [0-63] [DECLARE x...END; END;] VariableDeclaration [0-15] [DECLARE x INT32] IdentifierList [8-9] [x] Identifier(x) [8-9] [x] @@ -587,9 +587,9 @@ Script [0-64] [DECLARE x...END; END;] PathExpression [10-15] [INT32] Identifier(INT32) [10-15] [INT32] BeginEndBlock [17-62] [BEGIN BEGIN...END; END] - StatementList [25-59] [BEGIN...INT64; END;] + StatementList [25-58] [BEGIN...INT64; END;] BeginEndBlock [25-57] [BEGIN...INT64; END] - StatementList [35-54] [DECLARE x INT64;] + StatementList [35-51] [DECLARE x INT64;] VariableDeclaration [35-50] [DECLARE x INT64] IdentifierList [43-44] [x] Identifier(x) [43-44] [x] @@ -616,8 +616,8 @@ BEGIN END; END; -- -Script [0-64] [DECLARE X...END; END;] - StatementList [0-64] [DECLARE X...END; END;] +Script [0-63] [DECLARE X...END; END;] + StatementList [0-63] [DECLARE X...END; END;] VariableDeclaration [0-15] [DECLARE X INT32] IdentifierList [8-9] [X] Identifier(X) [8-9] [X] @@ -625,9 +625,9 @@ Script [0-64] [DECLARE X...END; END;] PathExpression [10-15] [INT32] Identifier(INT32) [10-15] [INT32] BeginEndBlock [17-62] [BEGIN 
BEGIN...END; END] - StatementList [25-59] [BEGIN...INT64; END;] + StatementList [25-58] [BEGIN...INT64; END;] BeginEndBlock [25-57] [BEGIN...INT64; END] - StatementList [35-54] [DECLARE x INT64;] + StatementList [35-51] [DECLARE x INT64;] VariableDeclaration [35-50] [DECLARE x INT64] IdentifierList [43-44] [x] Identifier(x) [43-44] [x] @@ -651,8 +651,8 @@ BEGIN DECLARE y INT64; END; -- -Script [0-47] [DECLARE x...INT64; END;] - StatementList [0-47] [DECLARE x...INT64; END;] +Script [0-46] [DECLARE x...INT64; END;] + StatementList [0-46] [DECLARE x...INT64; END;] VariableDeclaration [0-15] [DECLARE x INT32] IdentifierList [8-9] [x] Identifier(x) [8-9] [x] @@ -660,7 +660,7 @@ Script [0-47] [DECLARE x...INT64; END;] PathExpression [10-15] [INT32] Identifier(INT32) [10-15] [INT32] BeginEndBlock [17-45] [BEGIN DECLARE y INT64; END] - StatementList [25-42] [DECLARE y INT64;] + StatementList [25-41] [DECLARE y INT64;] VariableDeclaration [25-40] [DECLARE y INT64] IdentifierList [33-34] [y] Identifier(y) [33-34] [y] @@ -683,10 +683,10 @@ BEGIN DECLARE x INT32; END; -- -Script [0-60] [BEGIN DECLARE...INT32; END;] - StatementList [0-60] [BEGIN DECLARE...INT32; END;] +Script [0-59] [BEGIN DECLARE...INT32; END;] + StatementList [0-59] [BEGIN DECLARE...INT32; END;] BeginEndBlock [0-28] [BEGIN DECLARE x INT32; END] - StatementList [8-25] [DECLARE x INT32;] + StatementList [8-24] [DECLARE x INT32;] VariableDeclaration [8-23] [DECLARE x INT32] IdentifierList [16-17] [x] Identifier(x) [16-17] [x] @@ -694,7 +694,7 @@ Script [0-60] [BEGIN DECLARE...INT32; END;] PathExpression [18-23] [INT32] Identifier(INT32) [18-23] [INT32] BeginEndBlock [30-58] [BEGIN DECLARE x INT32; END] - StatementList [38-55] [DECLARE x INT32;] + StatementList [38-54] [DECLARE x INT32;] VariableDeclaration [38-53] [DECLARE x INT32] IdentifierList [46-47] [x] Identifier(x) [46-47] [x] @@ -719,10 +719,10 @@ BEGIN END; DECLARE y INT64; -- -Script [0-47] [BEGIN DECLARE...ARE y INT64;] - StatementList [0-47] [BEGIN 
DECLARE...ARE y INT64;] +Script [0-46] [BEGIN DECLARE...ARE y INT64;] + StatementList [0-46] [BEGIN DECLARE...ARE y INT64;] BeginEndBlock [0-28] [BEGIN DECLARE x INT32; END] - StatementList [8-25] [DECLARE x INT32;] + StatementList [8-24] [DECLARE x INT32;] VariableDeclaration [8-23] [DECLARE x INT32] IdentifierList [16-17] [x] Identifier(x) [16-17] [x] @@ -754,8 +754,8 @@ DECLARE @@sysvar INT32; # Can default to a system variable. DECLARE x INT32 DEFAULT @@sysvar; -- -Script [0-34] [DECLARE x...@@sysvar;] - StatementList [0-34] [DECLARE x...@@sysvar;] +Script [0-33] [DECLARE x...@@sysvar;] + StatementList [0-33] [DECLARE x...@@sysvar;] VariableDeclaration [0-32] [DECLARE x...DEFAULT @@sysvar] IdentifierList [8-9] [x] Identifier(x) [8-9] [x] @@ -773,8 +773,8 @@ DECLARE x INT32 DEFAULT @@sysvar ; # Automatically inferred variable type DECLARE x DEFAULT 5; -- -Script [0-21] [DECLARE x DEFAULT 5;] - StatementList [0-21] [DECLARE x DEFAULT 5;] +Script [0-20] [DECLARE x DEFAULT 5;] + StatementList [0-20] [DECLARE x DEFAULT 5;] VariableDeclaration [0-19] [DECLARE x DEFAULT 5] IdentifierList [8-9] [x] Identifier(x) [8-9] [x] @@ -786,8 +786,8 @@ DECLARE x DEFAULT 5 ; # Automatically inferred variable type, multiple variables DECLARE x, y, z DEFAULT 5; -- -Script [0-27] [DECLARE x, y, z DEFAULT 5;] - StatementList [0-27] [DECLARE x, y, z DEFAULT 5;] +Script [0-26] [DECLARE x, y, z DEFAULT 5;] + StatementList [0-26] [DECLARE x, y, z DEFAULT 5;] VariableDeclaration [0-25] [DECLARE x, y, z DEFAULT 5] IdentifierList [8-15] [x, y, z] Identifier(x) [8-9] [x] @@ -817,8 +817,8 @@ DECLARE x; # Expression as default value, inferred type. 
DECLARE y DEFAULT x + 1; -- -Script [0-25] [DECLARE y DEFAULT x + 1;] - StatementList [0-25] [DECLARE y DEFAULT x + 1;] +Script [0-24] [DECLARE y DEFAULT x + 1;] + StatementList [0-24] [DECLARE y DEFAULT x + 1;] VariableDeclaration [0-23] [DECLARE y DEFAULT x + 1] IdentifierList [8-9] [y] Identifier(y) [8-9] [y] @@ -833,8 +833,8 @@ DECLARE y DEFAULT x + 1 ; # Subquery as default value, inferred type. DECLARE y DEFAULT (SELECT x FROM MyTable); -- -Script [0-43] [DECLARE y...MyTable);] - StatementList [0-43] [DECLARE y...MyTable);] +Script [0-42] [DECLARE y...MyTable);] + StatementList [0-42] [DECLARE y...MyTable);] VariableDeclaration [0-41] [DECLARE y...FROM MyTable)] IdentifierList [8-9] [y] Identifier(y) [8-9] [y] @@ -862,8 +862,8 @@ DECLARE y DEFAULT( # Declare variable with parameterized type DECLARE y string(100); -- -Script [0-23] [DECLARE y string(100);] - StatementList [0-23] [DECLARE y string(100);] +Script [0-22] [DECLARE y string(100);] + StatementList [0-22] [DECLARE y string(100);] VariableDeclaration [0-21] [DECLARE y string(100)] IdentifierList [8-9] [y] Identifier(y) [8-9] [y] diff --git a/zetasql/parser/testdata/with.test b/zetasql/parser/testdata/with.test index 2505aae4a..77db37c1b 100644 --- a/zetasql/parser/testdata/with.test +++ b/zetasql/parser/testdata/with.test @@ -869,6 +869,386 @@ WITH q1 as (select 1 x) WITH q2 as (select 2 y) select * from q2 -- -ERROR: Syntax error: Unexpected keyword WITH [at 2:1] +ERROR: Syntax error: Expected keyword DEPTH but got identifier "q2" [at 2:6] WITH q2 as (select 2 y) -^ + ^ + +== + +# NOTE: even when the AS alias is unspecified, there would be column with +# default `depth` alias visible, just like UNNEST() WITH OFFSET with no alias. +with recursive test AS ( + select 1 as n + union all + select n+1 + from test +) with depth {{|as depth}} {{|max 10|between ? 
and @p}} +select n from test +-- +ALTERNATION GROUP: +-- +QueryStatement [0-111] [with recursive...from test] + Query [0-111] [with recursive...from test] + WithClause (recursive) [0-90] [with recursive...with depth] + AliasedQuery [15-90] [test AS (...with depth] + Identifier(test) [15-19] [test] + Query [27-77] [select 1 as...from test] + SetOperation(UNION ALL) [27-77] [select 1 as...from test] + SetOperationMetadataList [40-52] [union all] + SetOperationMetadata [40-52] [union all] + SetOperationType [43-48] [union] + SetOperationAllOrDistinct [49-52] [all] + Select [27-40] [select 1 as n] + SelectList [34-40] [1 as n] + SelectColumn [34-40] [1 as n] + IntLiteral(1) [34-35] [1] + Alias [36-40] [as n] + Identifier(n) [39-40] [n] + Select [55-77] [select n+1 from test] + SelectList [62-65] [n+1] + SelectColumn [62-65] [n+1] + BinaryExpression(+) [62-65] [n+1] + PathExpression [62-63] [n] + Identifier(n) [62-63] [n] + IntLiteral(1) [64-65] [1] + FromClause [68-77] [from test] + TablePathExpression [73-77] [test] + PathExpression [73-77] [test] + Identifier(test) [73-77] [test] + AliasedQueryModifiers [80-90] [with depth] + RecursionDepthModifier [80-90] [with depth] + IntOrUnbounded [90-90] [] + IntOrUnbounded [90-90] [] + Select [93-111] [select n from test] + SelectList [100-101] [n] + SelectColumn [100-101] [n] + PathExpression [100-101] [n] + Identifier(n) [100-101] [n] + FromClause [102-111] [from test] + TablePathExpression [107-111] [test] + PathExpression [107-111] [test] + Identifier(test) [107-111] [test] +-- +WITH RECURSIVE + test AS ( + SELECT + 1 AS n + UNION ALL + SELECT + n + 1 + FROM + test + ) WITH DEPTH BETWEEN UNBOUNDED AND UNBOUNDED +SELECT + n +FROM + test +-- +ALTERNATION GROUP: max 10 +-- +QueryStatement [0-117] [with recursive...from test] + Query [0-117] [with recursive...from test] + WithClause (recursive) [0-98] [with recursive...depth max 10] + AliasedQuery [15-98] [test AS (...depth max 10] + Identifier(test) [15-19] [test] + Query 
[27-77] [select 1 as...from test] + SetOperation(UNION ALL) [27-77] [select 1 as...from test] + SetOperationMetadataList [40-52] [union all] + SetOperationMetadata [40-52] [union all] + SetOperationType [43-48] [union] + SetOperationAllOrDistinct [49-52] [all] + Select [27-40] [select 1 as n] + SelectList [34-40] [1 as n] + SelectColumn [34-40] [1 as n] + IntLiteral(1) [34-35] [1] + Alias [36-40] [as n] + Identifier(n) [39-40] [n] + Select [55-77] [select n+1 from test] + SelectList [62-65] [n+1] + SelectColumn [62-65] [n+1] + BinaryExpression(+) [62-65] [n+1] + PathExpression [62-63] [n] + Identifier(n) [62-63] [n] + IntLiteral(1) [64-65] [1] + FromClause [68-77] [from test] + TablePathExpression [73-77] [test] + PathExpression [73-77] [test] + Identifier(test) [73-77] [test] + AliasedQueryModifiers [80-98] [with depth max 10] + RecursionDepthModifier [80-98] [with depth max 10] + IntOrUnbounded [90-90] [] + IntOrUnbounded [96-98] [10] + IntLiteral(10) [96-98] [10] + Select [99-117] [select n from test] + SelectList [106-107] [n] + SelectColumn [106-107] [n] + PathExpression [106-107] [n] + Identifier(n) [106-107] [n] + FromClause [108-117] [from test] + TablePathExpression [113-117] [test] + PathExpression [113-117] [test] + Identifier(test) [113-117] [test] +-- +WITH RECURSIVE + test AS ( + SELECT + 1 AS n + UNION ALL + SELECT + n + 1 + FROM + test + ) WITH DEPTH BETWEEN UNBOUNDED AND 10 +SELECT + n +FROM + test +-- +ALTERNATION GROUP: between ? and @p +-- +QueryStatement [0-127] [with recursive...from test] + Query [0-127] [with recursive...from test] + WithClause (recursive) [0-108] [with recursive...ween ? and @p] + AliasedQuery [15-108] [test AS (...between ? 
and @p] + Identifier(test) [15-19] [test] + Query [27-77] [select 1 as...from test] + SetOperation(UNION ALL) [27-77] [select 1 as...from test] + SetOperationMetadataList [40-52] [union all] + SetOperationMetadata [40-52] [union all] + SetOperationType [43-48] [union] + SetOperationAllOrDistinct [49-52] [all] + Select [27-40] [select 1 as n] + SelectList [34-40] [1 as n] + SelectColumn [34-40] [1 as n] + IntLiteral(1) [34-35] [1] + Alias [36-40] [as n] + Identifier(n) [39-40] [n] + Select [55-77] [select n+1 from test] + SelectList [62-65] [n+1] + SelectColumn [62-65] [n+1] + BinaryExpression(+) [62-65] [n+1] + PathExpression [62-63] [n] + Identifier(n) [62-63] [n] + IntLiteral(1) [64-65] [1] + FromClause [68-77] [from test] + TablePathExpression [73-77] [test] + PathExpression [73-77] [test] + Identifier(test) [73-77] [test] + AliasedQueryModifiers [80-108] [with depth between ? and @p] + RecursionDepthModifier [80-108] [with depth between ? and @p] + IntOrUnbounded [100-101] [?] + ParameterExpr(1) [100-101] [?] + IntOrUnbounded [106-108] [@p] + ParameterExpr [106-108] [@p] + Identifier(p) [107-108] [p] + Select [109-127] [select n from test] + SelectList [116-117] [n] + SelectColumn [116-117] [n] + PathExpression [116-117] [n] + Identifier(n) [116-117] [n] + FromClause [118-127] [from test] + TablePathExpression [123-127] [test] + PathExpression [123-127] [test] + Identifier(test) [123-127] [test] +-- +WITH RECURSIVE + test AS ( + SELECT + 1 AS n + UNION ALL + SELECT + n + 1 + FROM + test + ) WITH DEPTH BETWEEN ? 
AND @p +SELECT + n +FROM + test +-- +ALTERNATION GROUP: as depth, +-- +QueryStatement [0-119] [with recursive...from test] + Query [0-119] [with recursive...from test] + WithClause (recursive) [0-99] [with recursive...epth as depth] + AliasedQuery [15-99] [test AS (...depth as depth] + Identifier(test) [15-19] [test] + Query [27-77] [select 1 as...from test] + SetOperation(UNION ALL) [27-77] [select 1 as...from test] + SetOperationMetadataList [40-52] [union all] + SetOperationMetadata [40-52] [union all] + SetOperationType [43-48] [union] + SetOperationAllOrDistinct [49-52] [all] + Select [27-40] [select 1 as n] + SelectList [34-40] [1 as n] + SelectColumn [34-40] [1 as n] + IntLiteral(1) [34-35] [1] + Alias [36-40] [as n] + Identifier(n) [39-40] [n] + Select [55-77] [select n+1 from test] + SelectList [62-65] [n+1] + SelectColumn [62-65] [n+1] + BinaryExpression(+) [62-65] [n+1] + PathExpression [62-63] [n] + Identifier(n) [62-63] [n] + IntLiteral(1) [64-65] [1] + FromClause [68-77] [from test] + TablePathExpression [73-77] [test] + PathExpression [73-77] [test] + Identifier(test) [73-77] [test] + AliasedQueryModifiers [80-99] [with depth as depth] + RecursionDepthModifier [80-99] [with depth as depth] + Alias [91-99] [as depth] + Identifier(depth) [94-99] [depth] + IntOrUnbounded [99-99] [] + IntOrUnbounded [99-99] [] + Select [101-119] [select n from test] + SelectList [108-109] [n] + SelectColumn [108-109] [n] + PathExpression [108-109] [n] + Identifier(n) [108-109] [n] + FromClause [110-119] [from test] + TablePathExpression [115-119] [test] + PathExpression [115-119] [test] + Identifier(test) [115-119] [test] +-- +WITH RECURSIVE + test AS ( + SELECT + 1 AS n + UNION ALL + SELECT + n + 1 + FROM + test + ) WITH DEPTH AS depth BETWEEN UNBOUNDED AND UNBOUNDED +SELECT + n +FROM + test +-- +ALTERNATION GROUP: as depth,max 10 +-- +QueryStatement [0-125] [with recursive...from test] + Query [0-125] [with recursive...from test] + WithClause (recursive) [0-106] [with 
recursive...depth max 10] + AliasedQuery [15-106] [test AS (...depth max 10] + Identifier(test) [15-19] [test] + Query [27-77] [select 1 as...from test] + SetOperation(UNION ALL) [27-77] [select 1 as...from test] + SetOperationMetadataList [40-52] [union all] + SetOperationMetadata [40-52] [union all] + SetOperationType [43-48] [union] + SetOperationAllOrDistinct [49-52] [all] + Select [27-40] [select 1 as n] + SelectList [34-40] [1 as n] + SelectColumn [34-40] [1 as n] + IntLiteral(1) [34-35] [1] + Alias [36-40] [as n] + Identifier(n) [39-40] [n] + Select [55-77] [select n+1 from test] + SelectList [62-65] [n+1] + SelectColumn [62-65] [n+1] + BinaryExpression(+) [62-65] [n+1] + PathExpression [62-63] [n] + Identifier(n) [62-63] [n] + IntLiteral(1) [64-65] [1] + FromClause [68-77] [from test] + TablePathExpression [73-77] [test] + PathExpression [73-77] [test] + Identifier(test) [73-77] [test] + AliasedQueryModifiers [80-106] [with depth as depth max 10] + RecursionDepthModifier [80-106] [with depth as depth max 10] + Alias [91-99] [as depth] + Identifier(depth) [94-99] [depth] + IntOrUnbounded [99-99] [] + IntOrUnbounded [104-106] [10] + IntLiteral(10) [104-106] [10] + Select [107-125] [select n from test] + SelectList [114-115] [n] + SelectColumn [114-115] [n] + PathExpression [114-115] [n] + Identifier(n) [114-115] [n] + FromClause [116-125] [from test] + TablePathExpression [121-125] [test] + PathExpression [121-125] [test] + Identifier(test) [121-125] [test] +-- +WITH RECURSIVE + test AS ( + SELECT + 1 AS n + UNION ALL + SELECT + n + 1 + FROM + test + ) WITH DEPTH AS depth BETWEEN UNBOUNDED AND 10 +SELECT + n +FROM + test +-- +ALTERNATION GROUP: as depth,between ? and @p +-- +QueryStatement [0-135] [with recursive...from test] + Query [0-135] [with recursive...from test] + WithClause (recursive) [0-116] [with recursive...ween ? and @p] + AliasedQuery [15-116] [test AS (...between ? 
and @p] + Identifier(test) [15-19] [test] + Query [27-77] [select 1 as...from test] + SetOperation(UNION ALL) [27-77] [select 1 as...from test] + SetOperationMetadataList [40-52] [union all] + SetOperationMetadata [40-52] [union all] + SetOperationType [43-48] [union] + SetOperationAllOrDistinct [49-52] [all] + Select [27-40] [select 1 as n] + SelectList [34-40] [1 as n] + SelectColumn [34-40] [1 as n] + IntLiteral(1) [34-35] [1] + Alias [36-40] [as n] + Identifier(n) [39-40] [n] + Select [55-77] [select n+1 from test] + SelectList [62-65] [n+1] + SelectColumn [62-65] [n+1] + BinaryExpression(+) [62-65] [n+1] + PathExpression [62-63] [n] + Identifier(n) [62-63] [n] + IntLiteral(1) [64-65] [1] + FromClause [68-77] [from test] + TablePathExpression [73-77] [test] + PathExpression [73-77] [test] + Identifier(test) [73-77] [test] + AliasedQueryModifiers [80-116] [with depth...between ? and @p] + RecursionDepthModifier [80-116] [with depth...between ? and @p] + Alias [91-99] [as depth] + Identifier(depth) [94-99] [depth] + IntOrUnbounded [108-109] [?] + ParameterExpr(1) [108-109] [?] + IntOrUnbounded [114-116] [@p] + ParameterExpr [114-116] [@p] + Identifier(p) [115-116] [p] + Select [117-135] [select n from test] + SelectList [124-125] [n] + SelectColumn [124-125] [n] + PathExpression [124-125] [n] + Identifier(n) [124-125] [n] + FromClause [126-135] [from test] + TablePathExpression [131-135] [test] + PathExpression [131-135] [test] + Identifier(test) [131-135] [test] +-- +WITH RECURSIVE + test AS ( + SELECT + 1 AS n + UNION ALL + SELECT + n + 1 + FROM + test + ) WITH DEPTH AS depth BETWEEN ? 
AND @p +SELECT + n +FROM + test diff --git a/zetasql/parser/testdata/with_offset.test b/zetasql/parser/testdata/with_offset.test index ad00e8c79..e79c2e19b 100644 --- a/zetasql/parser/testdata/with_offset.test +++ b/zetasql/parser/testdata/with_offset.test @@ -159,7 +159,8 @@ QueryStatement [0-135] [select pos...key='Mykey'] BinaryExpression(=) [124-135] [key='Mykey'] PathExpression [124-127] [key] Identifier(key) [124-127] [key] - StringLiteral('Mykey') [128-135] ['Mykey'] + StringLiteral [128-135] ['Mykey'] + StringLiteralComponent('Mykey') [128-135] ['Mykey'] -- SELECT pos, @@ -275,7 +276,8 @@ QueryStatement [0-121] [select pos1...2007-01-02'] Alias [80-84] [pos2] Identifier(pos2) [80-84] [pos2] ForSystemTime [87-121] [FOR SYSTEM...2007-01-02'] - StringLiteral('2007-01-02') [109-121] ['2007-01-02'] + StringLiteral [109-121] ['2007-01-02'] + StringLiteralComponent('2007-01-02') [109-121] ['2007-01-02'] -- SELECT pos1, diff --git a/zetasql/parser/token_disambiguator.cc b/zetasql/parser/token_disambiguator.cc new file mode 100644 index 000000000..e0d933830 --- /dev/null +++ b/zetasql/parser/token_disambiguator.cc @@ -0,0 +1,466 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// + +#include "zetasql/parser/token_disambiguator.h" + +#include <cstdint> +#include <memory> +#include <optional> +#include <utility> + +#include "zetasql/base/arena.h" +#include "zetasql/common/errors.h" +#include "zetasql/parser/bison_parser_mode.h" +#include "zetasql/parser/bison_token_codes.h" +#include "zetasql/parser/flex_tokenizer.h" +#include "zetasql/parser/macros/flex_token_provider.h" +#include "zetasql/parser/macros/macro_catalog.h" +#include "zetasql/parser/macros/macro_expander.h" +#include "zetasql/parser/macros/token_with_location.h" +#include "zetasql/public/error_helpers.h" +#include "zetasql/public/language_options.h" +#include "zetasql/public/parse_location.h" +#include "zetasql/base/check.h" +#include "absl/log/log.h" +#include "absl/memory/memory.h" +#include "absl/status/status.h" +#include "absl/strings/string_view.h" +#include "zetasql/base/ret_check.h" +#include "zetasql/base/status_macros.h" + +namespace zetasql { + +// Implementation of the wrapper calls forward-declared in parser_internal.h. +// This workaround is to avoid creating an interface and incurring a v-table +// lookup on every token. +namespace parser_internal { +using zetasql::parser::BisonParserMode; +using zetasql::parser::DisambiguatorLexer; + +void SetForceTerminate(DisambiguatorLexer* disambiguator, int* end_offset) { + return disambiguator->SetForceTerminate(end_offset); +} +void PushBisonParserMode(DisambiguatorLexer* disambiguator, + BisonParserMode mode) { + return disambiguator->PushBisonParserMode(mode); +} +void PopBisonParserMode(DisambiguatorLexer* disambiguator) { + return disambiguator->PopBisonParserMode(); +} +int GetNextToken(DisambiguatorLexer* disambiguator, absl::string_view* text, + ParseLocationRange* location) { + return disambiguator->GetNextToken(text, location); +} +} // namespace parser_internal + +namespace parser { +// Include the helpful type aliases in the namespace within the C++ file so +// that they are useful for free helper functions as well as class member +// functions.
+using Token = TokenKinds; +using TokenKind = int; +using Location = ParseLocationRange; +using TokenWithLocation = macros::TokenWithLocation; +using FlexTokenProvider = macros::FlexTokenProvider; +using MacroCatalog = macros::MacroCatalog; +using MacroExpander = macros::MacroExpander; +using MacroExpanderBase = macros::MacroExpanderBase; + +static bool IsReservedKeywordToken(TokenKind token) { + // We need to add sentinels before and after each block of keywords to make + // this safe. + return token > Token::SENTINEL_RESERVED_KW_START && + token < Token::SENTINEL_RESERVED_KW_END; +} + +static bool IsNonreservedKeywordToken(TokenKind token) { + // We need to add sentinels before and after each block of keywords to make + // this safe. + return token > Token::SENTINEL_NONRESERVED_KW_START && + token < Token::SENTINEL_NONRESERVED_KW_END; +} + +static absl::Status MakeError(absl::string_view error_message, + const Location& yylloc) { + return MakeSqlErrorAtPoint(yylloc.start()) << error_message; +} + +static TokenKind DisambiguateModeStartToken(BisonParserMode mode) { + switch (mode) { + case BisonParserMode::kStatement: + return Token::MODE_STATEMENT; + case BisonParserMode::kScript: + return Token::MODE_SCRIPT; + case BisonParserMode::kNextStatement: + return Token::MODE_NEXT_STATEMENT; + case BisonParserMode::kNextScriptStatement: + return Token::MODE_NEXT_SCRIPT_STATEMENT; + case BisonParserMode::kNextStatementKind: + return Token::MODE_NEXT_STATEMENT_KIND; + case BisonParserMode::kExpression: + return Token::MODE_EXPRESSION; + case BisonParserMode::kType: + return Token::MODE_TYPE; + + case BisonParserMode::kTokenizer: + case BisonParserMode::kTokenizerPreserveComments: + case BisonParserMode::kMacroBody: + ABSL_LOG(FATAL) << "BisonParserMode: " << static_cast<int>(mode) + << " should not disambiguate CUSTOM_START_TOKEN"; + return -1; + + default: + ABSL_LOG(FATAL) << "Unknown BisonParserMode: " << 
static_cast<int>(mode); + return -1; + } +} + +// The token disambiguation rules are allowed to see a fixed-length sequence of +// tokens produced by the lexical rules in flex_tokenizer.h and may change the +// kind of `token` based on the kinds of the other tokens in the window. +// +// For now, the window available is: +// [token, Lookahead1()] +// +// `token` is the token that is about to be dispensed to the consuming +// component. +// `Lookahead1()` is the next token that will be disambiguated on the subsequent +// call to GetNextToken. +// +// USE WITH CAUTION: +// For any given sequence of tokens, there may be many different shift/reduce +// sequences in the parser that "accept" that token sequence. It's critical +// when adding a token disambiguation rule that all parts of the grammar that +// accept the sequence of tokens are identified to verify that changing the kind +// of `token` does not break any unanticipated cases where that sequence would +// currently be accepted. +TokenKind DisambiguatorLexer::ApplyTokenDisambiguation( + TokenKind token, const Location& location) { + switch (mode_) { + case BisonParserMode::kTokenizer: + case BisonParserMode::kTokenizerPreserveComments: + // Tokenizer modes are used to extract tokens for error messages among + // other things. The rules below are mostly intended to support the bison + // parser, and aren't necessary in tokenizer mode. + // For keywords that have context-dependent variations, return the + // "standard" one.
+ switch (token) { + case Token::KW_AND_FOR_BETWEEN: + return Token::KW_AND; + case Token::KW_FULL_IN_SET_OP: + return Token::KW_FULL; + case Token::KW_LEFT_IN_SET_OP: + return Token::KW_LEFT; + case Token::KW_DEFINE_FOR_MACROS: + return Token::KW_DEFINE; + case Token::KW_OPEN_HINT: + case Token::KW_OPEN_INTEGER_HINT: + return '@'; + default: + return token; + } + case BisonParserMode::kMacroBody: + switch (token) { + case ';': + case Token::YYEOF: + return token; + default: + return Token::MACRO_BODY_TOKEN; + } + default: + break; + } + + switch (token) { + case Token::CUSTOM_MODE_START: { + return DisambiguateModeStartToken(mode_); + } + case Token::KW_NOT: { + // This returns a different token because returning KW_NOT would confuse + // the operator precedence parsing. Boolean NOT has a different + // precedence than NOT BETWEEN/IN/LIKE/DISTINCT. + switch (Lookahead1(location)) { + case Token::KW_BETWEEN: + case Token::KW_IN: + case Token::KW_LIKE: + case Token::KW_DISTINCT: + return Token::KW_NOT_SPECIAL; + default: + break; + } + break; + } + case Token::KW_WITH: { + // The WITH expression uses a function-call like syntax and is followed by + // the open parenthesis. + if (Lookahead1(location) == '(') { + return Token::KW_WITH_STARTING_WITH_EXPRESSION; + } + break; + } + case Token::KW_EXCEPT: { + // EXCEPT is used in two locations of the language. And when the parser is + // exploding the rules it detects that two rules can be used for the same + // syntax. + // + // This rule generates a special token for an EXCEPT that is followed by a + // hint, ALL or DISTINCT which is distinctly the set operator use. + switch (Lookahead1(location)) { + case '(': + // This is the SELECT * EXCEPT (column...) case. + return Token::KW_EXCEPT; + case Token::KW_ALL: + case Token::KW_DISTINCT: + case Token::KW_OPEN_HINT: + case Token::KW_OPEN_INTEGER_HINT: + // This is the {query} EXCEPT {opt_hint} ALL|DISTINCT {query} case. 
+ return Token::KW_EXCEPT_IN_SET_OP; + default: + return SetOverrideErrorAndReturnEof( + "EXCEPT must be followed by ALL, DISTINCT, or \"(\"", location); + } + break; + } + case Token::KW_FULL: { + // If FULL is used in set operations, return KW_FULL_IN_SET_OP instead. + TokenKind lookahead = Lookahead1(location) == Token::KW_OUTER + ? Lookahead2(location) + : Lookahead1(location); + switch (lookahead) { + case Token::KW_UNION: + case Token::KW_INTERSECT: + case Token::KW_EXCEPT: + return Token::KW_FULL_IN_SET_OP; + } + break; + } + case Token::KW_QUALIFY_NONRESERVED: { + if (IsReservedKeyword("QUALIFY")) { + return Token::KW_QUALIFY_RESERVED; + } + break; + } + default: { + break; + } + } + + return token; +} + +TokenKind DisambiguatorLexer::SetOverrideErrorAndReturnEof( + absl::string_view error_message, const Location& error_location) { + override_error_ = MakeError(error_message, error_location); + return Token::YYEOF; +} + +namespace { +class NoOpExpander : public MacroExpanderBase { + public: + explicit NoOpExpander(std::unique_ptr<FlexTokenProvider> token_provider) + : token_provider_(std::move(token_provider)) {} + absl::StatusOr<TokenWithLocation> GetNextToken() override { + return token_provider_->ConsumeNextToken(); + } + int num_unexpanded_tokens_consumed() const override { + return token_provider_->num_consumed_tokens(); + } + + private: + std::unique_ptr<FlexTokenProvider> token_provider_; +}; +} // namespace + +// Retrieve the next token, and split absl::StatusOr<TokenWithLocation> into +// separate variables for compatibility with Bison. If the status is not ok(), +// the token is set to YYEOF and location is not updated.
+static void ConsumeNextToken(MacroExpanderBase& expander, + TokenWithLocation& token_with_location, + absl::Status& status) { + absl::StatusOr<TokenWithLocation> next_token = expander.GetNextToken(); + if (next_token.ok()) { + token_with_location = *next_token; + } else { + token_with_location.kind = Token::YYEOF; + status = std::move(next_token.status()); + } +} + +int DisambiguatorLexer::Lookahead1(const Location& current_token_location) { + if (lookahead_1_.has_value()) { + return lookahead_1_->kind; + } + + lookahead_1_ = {.kind = Token::YYEOF, + .location = current_token_location, + .text = "", + .preceding_whitespaces = ""}; + if (force_terminate_) { + return lookahead_1_->kind; + } + + ConsumeNextToken(*macro_expander_, *lookahead_1_, override_error_); + return lookahead_1_->kind; +} + +int DisambiguatorLexer::Lookahead2(const Location& current_token_location) { + if (lookahead_2_.has_value()) { + // If `force_terminate_` is true, the token kind stored in `lookahead_2_` + // has been overwritten as YYEOF. + return lookahead_2_->kind; + } + if (force_terminate_) { + return Token::YYEOF; + } + if (!lookahead_1_.has_value()) { + Lookahead1(current_token_location); + } + lookahead_2_ = {.kind = Token::YYEOF, + .location = lookahead_1_->location, + .text = "", + .preceding_whitespaces = ""}; + ConsumeNextToken(*macro_expander_, *lookahead_2_, override_error_); + return lookahead_2_->kind; +} + +// Returns the next token id, returning its location in 'yylloc'. On input, +// 'yylloc' must be the location of the previous token that was returned. +TokenKind DisambiguatorLexer::GetNextToken(absl::string_view* text, + Location* yylloc) { + if (lookahead_1_.has_value()) { + if (prev_token_.has_value()) { + ABSL_DCHECK_EQ(prev_token_->location, *yylloc); + } + // Get the next token from the lookahead buffer and advance the buffer. If + // force_terminate_ was set, we still need the location from the buffer, + // with Token::YYEOF as the token.
Calling SetForceTerminate() should have + updated the lookahead. + prev_token_.swap(lookahead_1_); + lookahead_1_.swap(lookahead_2_); + lookahead_2_.reset(); + } else if (force_terminate_) { + return Token::YYEOF; + } else { + // The lookahead buffer is empty, so get a token from the underlying lexer. + ConsumeNextToken(*macro_expander_, prev_token_.emplace(), override_error_); + } + *yylloc = prev_token_->location; + *text = prev_token_->text; + return ApplyTokenDisambiguation(prev_token_->kind, *yylloc); +} + +static const MacroCatalog* empty_macro_catalog() { + static MacroCatalog* empty_macro_catalog = new MacroCatalog(); + return empty_macro_catalog; +} + +absl::StatusOr<std::unique_ptr<DisambiguatorLexer>> DisambiguatorLexer::Create( + BisonParserMode mode, absl::string_view filename, absl::string_view input, + int start_offset, const LanguageOptions& language_options, + const macros::MacroCatalog* macro_catalog, zetasql_base::UnsafeArena* arena) { + auto token_provider = std::make_unique<FlexTokenProvider>( + mode, filename, input, start_offset, language_options); + + std::unique_ptr<MacroExpanderBase> macro_expander; + if (language_options.LanguageFeatureEnabled(FEATURE_V_1_4_SQL_MACROS)) { + if (macro_catalog == nullptr) { + macro_catalog = empty_macro_catalog(); + } + macro_expander = std::make_unique<MacroExpander>( + std::move(token_provider), *macro_catalog, arena, ErrorMessageOptions{}, + /*parent_location=*/nullptr); + } else { + ZETASQL_RET_CHECK(macro_catalog == nullptr); + macro_expander = std::make_unique<NoOpExpander>(std::move(token_provider)); + } + return absl::WrapUnique(new DisambiguatorLexer(mode, language_options, + std::move(macro_expander))); +} + +DisambiguatorLexer::DisambiguatorLexer( + BisonParserMode mode, const LanguageOptions& language_options, + std::unique_ptr<MacroExpanderBase> expander) + : mode_(mode), + language_options_(language_options), + macro_expander_(std::move(expander)) {} + +int64_t DisambiguatorLexer::num_lexical_tokens() const { + return macro_expander_->num_unexpanded_tokens_consumed(); +} + +// TODO: this overload should also 
be updated to return the image, and +// all callers should be updated. In fact, all callers should simply use +// TokenWithLocation, and maybe have the image attached there. +absl::Status DisambiguatorLexer::GetNextToken(ParseLocationRange* location, + TokenKind* token) { + absl::string_view image; + *token = GetNextToken(&image, location); + return GetOverrideError(); +} + +void DisambiguatorLexer::SetForceTerminate(int* end_byte_offset) { + // Ensure that the lookahead buffers immediately reflect the termination, + // but only after making sure to seek to the end offset if all that's left + // is whitespace. + bool lookahead_ok = false; + Location last_returned_location = + prev_token_.has_value() ? prev_token_->location : Location(); + if (!lookahead_1_.has_value()) { + // Quick fix for b/325081404 to unblock TAP. Further investigation of the + // invariant is needed. + // TODO: investigate why that case is calling + // SetForceTerminate() when it already hit an error. + absl::Status original_status; + std::swap(override_error_, original_status); + + Lookahead1(last_returned_location); + lookahead_ok = override_error_.ok(); + + // Restore the old error. + // If we hit an error on the lookahead, this is a problem on the next + // statement, not the current one. We should not let it take effect. + std::swap(override_error_, original_status); + } + if (end_byte_offset != nullptr) { + // If the next token is YYEOF, update the end_byte_offset to the end. + *end_byte_offset = lookahead_1_->kind == Token::YYEOF && lookahead_ok + ? lookahead_1_->location.end().GetByteOffset() + : last_returned_location.end().GetByteOffset(); + } + force_terminate_ = true; + // Ensure that the lookahead buffers immediately reflect the termination. 
+ if (lookahead_1_.has_value()) { + lookahead_1_->kind = Token::YYEOF; + } + if (lookahead_2_.has_value()) { + lookahead_2_->kind = Token::YYEOF; + } +} + +void DisambiguatorLexer::PushBisonParserMode(BisonParserMode mode) { + restore_modes_.push(mode_); + mode_ = mode; +} + +void DisambiguatorLexer::PopBisonParserMode() { + ABSL_DCHECK(!restore_modes_.empty()); + mode_ = restore_modes_.top(); + restore_modes_.pop(); +} + +} // namespace parser +} // namespace zetasql diff --git a/zetasql/parser/token_disambiguator.h b/zetasql/parser/token_disambiguator.h new file mode 100644 index 000000000..a35774fa4 --- /dev/null +++ b/zetasql/parser/token_disambiguator.h @@ -0,0 +1,168 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// + +#ifndef ZETASQL_PARSER_TOKEN_DISAMBIGUATOR_H_ +#define ZETASQL_PARSER_TOKEN_DISAMBIGUATOR_H_ + +#include +#include +#include +#include +#include +#include + +#include "zetasql/base/arena.h" +#include "zetasql/parser/bison_parser_mode.h" +#include "zetasql/parser/flex_tokenizer.h" +#include "zetasql/parser/macros/macro_catalog.h" +#include "zetasql/parser/macros/macro_expander.h" +#include "zetasql/parser/macros/token_with_location.h" +#include "zetasql/public/language_options.h" +#include "zetasql/public/parse_location.h" +#include "absl/base/attributes.h" +#include "absl/status/status.h" +#include "absl/status/statusor.h" +#include "absl/strings/string_view.h" + +namespace zetasql { +namespace parser { + +class DisambiguatorLexer final { + public: + using TokenKind = ZetaSqlFlexTokenizer::TokenKind; + using Location = ZetaSqlFlexTokenizer::Location; + + static absl::StatusOr<std::unique_ptr<DisambiguatorLexer>> Create( + BisonParserMode mode, absl::string_view filename, absl::string_view input, + int start_offset, const LanguageOptions& language_options, + const macros::MacroCatalog* macro_catalog, zetasql_base::UnsafeArena* arena); + + // Returns the next token id, returning its location in 'yylloc' and image in + // 'text'. On input, 'yylloc' must be the location of the previous token that + // was returned. + TokenKind GetNextToken(absl::string_view* text, Location* yylloc); + + // This is the "nice" API for the tokenizer, to be used by GetParseTokens(). + // On input, 'location' must be the location of the previous token that was + // generated. Returns the Bison token id in 'token' and the ZetaSQL location + // in 'location'. Returns an error if the tokenizer sets override_error. + absl::Status GetNextToken(ParseLocationRange* location, int* token); + + // Returns a non-OK error status if the tokenizer encountered an error. This + // error takes priority over a parser error, because the parser error is + // always a consequence of the tokenizer error. 
+ absl::Status GetOverrideError() const { return override_error_; } + + // Ensures that the next token returned will be EOF, even if we're not at the + // end of the input. The current end offset is returned in `end_byte_offset` + // if it is not nullptr. + // Ensures that `lookahead_1_` is filled (and masked as YYEOF), and voids + // `lookahead_2_`. Lookahead1 is important for when the input ends in a + // semicolon, because in that case we need the returned `end_byte_offset` + // to point to the end of input, not just the semicolon. + void SetForceTerminate(int* end_byte_offset); + + // Some sorts of statements need to change the mode after the parser consumes + // the preamble of the statement. DEFINE MACRO is an example: it wants to + // consume the macro body as raw tokens. + void PushBisonParserMode(BisonParserMode mode); + // Restores the BisonParserMode to its value before the previous Push. + void PopBisonParserMode(); + + // Returns the number of lexical tokens returned by the underlying tokenizer. + int64_t num_lexical_tokens() const; + + private: + DisambiguatorLexer(BisonParserMode mode, + const LanguageOptions& language_options, + std::unique_ptr<macros::MacroExpanderBase> expander); + + DisambiguatorLexer(const DisambiguatorLexer&) = delete; + DisambiguatorLexer& operator=(const DisambiguatorLexer&) = delete; + + // This friend is used by the unit test to help test internals. + friend class TokenTestThief; + + // If the N+1 token is already buffered, we simply return the token value from + // the buffer. Otherwise, we read the next token from `GetNextTokenFlexImpl` + // and put it in the lookahead buffer before returning it. + int Lookahead1(const Location& current_token_location); + + // If the N+2 token is already buffered, we simply return the token value from + // the buffer. Otherwise, we ensure the N+1 lookahead is populated, then read + // the next token from `GetNextTokenFlexImpl` and put it in the lookahead + // buffer before returning it. 
`current_token_location` is used if we need + // to populate token N+1. + int Lookahead2(const Location& current_token_location); + + // Applies a set of rules based on previous and successive token kinds. If + // any rule matches, returns the token kind specified by the rule; otherwise, + // when no rule matches, returns `token`. `location` is used when requesting + // Lookahead tokens and also to generate error messages for + // `SetOverrideError`. + TokenKind ApplyTokenDisambiguation(TokenKind token, const Location& location); + + // Indicates whether macros are enabled. + bool AreMacrosEnabled() const; + + bool IsReservedKeyword(absl::string_view text) const { + return language_options_.IsReservedKeyword(text); + } + + // Sets the field `override_error_` and returns the token kind YYEOF. + ABSL_MUST_USE_RESULT TokenKind SetOverrideErrorAndReturnEof( + absl::string_view error_message, const Location& error_location); + + // This determines the first token returned to the bison parser, which + // determines the mode that we'll run in. + BisonParserMode mode_; + std::stack<BisonParserMode> restore_modes_; + + const LanguageOptions& language_options_; + + // The underlying macro expander which feeds tokens to this disambiguator. + std::unique_ptr<macros::MacroExpanderBase> macro_expander_; + + // If this is set to true, the next token returned will be EOF, even if we're + // not at the end of the input. + bool force_terminate_ = false; + + // The disambiguator may want to return an error directly. It does this by + // returning EOF to the bison parser, which then may or may not spew out its + // own error message. The BisonParser wrapper then grabs the error from here + // instead. + absl::Status override_error_; + + // The previous token returned by the tokenizer. + std::optional<macros::TokenWithLocation> prev_token_; + + // The lookahead_N_ fields implement the token lookahead buffer. 
There are a + // fixed number of fields here, each represented by an optional, rather than a + // deque or vector, because ideally we only do token disambiguation on small + // windows (e.g. no more than two or three lookaheads). + + // The lookahead buffer slot for token N+1. + std::optional<macros::TokenWithLocation> lookahead_1_; + + // The lookahead buffer slot for token N+2. + std::optional<macros::TokenWithLocation> lookahead_2_; +}; + +} // namespace parser +} // namespace zetasql + +#endif // ZETASQL_PARSER_TOKEN_DISAMBIGUATOR_H_ diff --git a/zetasql/parser/tokenizer.h b/zetasql/parser/tokenizer.h deleted file mode 100644 index aeeefb119..000000000 --- a/zetasql/parser/tokenizer.h +++ /dev/null @@ -1,56 +0,0 @@ -// -// Copyright 2019 Google LLC -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. -// - -#ifndef ZETASQL_PARSER_TOKENIZER_H_ -#define ZETASQL_PARSER_TOKENIZER_H_ - -#include - -#include "zetasql/parser/location.hh" -#include "absl/status/status.h" -#include "absl/strings/string_view.h" - -namespace zetasql { -namespace parser { - -// General interface for tokenizers. DOCUMENT -class Tokenizer { - public: - // Type aliases to improve readability of API. - using Location = zetasql_bison_parser::location; - using TokenKind = int; - virtual ~Tokenizer() = default; - - // Returns a fresh instance of this tokenizer, to run on potentially new - // input. However, language_options, mode, etc., are all the same. 
- virtual std::unique_ptr<Tokenizer> GetNewInstance( - absl::string_view filename, absl::string_view input) const = 0; - - // Returns the next lexical token kind. On input, 'location' must be the - // location of the previous token that was generated. Returns the Bison token - // id in 'token' and the ZetaSQL location in 'location'. Returns an error if - // the tokenizer sets override_error. - virtual TokenKind GetNextToken(Location* location) = 0; - - // Returns the override error, if set. Multiple custom rules in our lexer - // set the override error. - virtual absl::Status GetOverrideError() const = 0; -}; - -} // namespace parser -} // namespace zetasql - -#endif // ZETASQL_PARSER_TOKENIZER_H_ diff --git a/zetasql/parser/unparser.cc b/zetasql/parser/unparser.cc index 5c010803b..52c72f34b 100644 --- a/zetasql/parser/unparser.cc +++ b/zetasql/parser/unparser.cc @@ -204,7 +204,7 @@ void Unparser::PrintCloseParenIfNeeded(const ASTNode* node) { } } -void Unparser::UnparseLeafNode(const ASTLeaf* leaf_node) { +void Unparser::UnparseLeafNode(const ASTPrintableLeaf* leaf_node) { print(leaf_node->image()); } @@ -1263,6 +1263,11 @@ void Unparser::visitASTUndropStatement(const ASTUndropStatement* node, Formatter::Indenter indenter(&formatter_); node->for_system_time()->Accept(this, data); } + if (node->options_list() != nullptr) { + println(); + print("OPTIONS"); + node->options_list()->Accept(this, data); + } } void Unparser::visitASTDropStatement(const ASTDropStatement* node, void* data) { @@ -1584,6 +1589,9 @@ void Unparser::visitASTAliasedQuery(const ASTAliasedQuery* node, void* data) { } println(); print(")"); + if (node->modifiers() != nullptr) { + node->modifiers()->Accept(this, data); + } } void Unparser::visitASTAliasedQueryList(const ASTAliasedQueryList* node, @@ -1591,6 +1599,34 @@ void* data) { UnparseVectorWithSeparator(node->aliased_query_list(), data, ","); } +void Unparser::visitASTAliasedQueryModifiers( + 
const ASTAliasedQueryModifiers* node, void* data) { + if (node->recursion_depth_modifier() != nullptr) { + node->recursion_depth_modifier()->Accept(this, data); + } +} + +void Unparser::visitASTRecursionDepthModifier( + const ASTRecursionDepthModifier* node, void* data) { + print("WITH DEPTH"); + if (node->alias() != nullptr) { + node->alias()->Accept(this, data); + } + print("BETWEEN"); + node->lower_bound()->Accept(this, data); + print("AND"); + node->upper_bound()->Accept(this, data); +} + +void Unparser::visitASTIntOrUnbounded(const ASTIntOrUnbounded* node, + void* data) { + if (node->bound() != nullptr) { + node->bound()->Accept(this, data); + } else { + print("UNBOUNDED"); + } +} + void Unparser::visitASTIntoAlias(const ASTIntoAlias* node, void* data) { print(absl::StrCat("INTO ", ToIdentifierLiteral(node->identifier()->GetAsIdString()))); @@ -1955,6 +1991,7 @@ void Unparser::visitASTNewConstructor(const ASTNewConstructor* node, void Unparser::visitASTBracedConstructorFieldValue( const ASTBracedConstructorFieldValue* node, void* data) { + print(node->colon_prefixed() ? 
": " : " "); if (node->expression()) { node->expression()->Accept(this, data); } @@ -1970,21 +2007,14 @@ void Unparser::visitASTBracedConstructorField( node->parenthesized_path()->Accept(this, data); print(")"); } - print(": "); node->value()->Accept(this, data); } void Unparser::visitASTBracedConstructor(const ASTBracedConstructor* node, void* data) { print("{"); - bool first = true; for (auto* field : node->fields()) { - if (first) { - field->Accept(this, data); - first = false; - continue; - } - if (field->parenthesized_path()) { + if (field->comma_separated()) { print(","); } field->Accept(this, data); @@ -1999,6 +2029,16 @@ void Unparser::visitASTBracedNewConstructor(const ASTBracedNewConstructor* node, node->braced_constructor()->Accept(this, data); } +void Unparser::visitASTStructBracedConstructor( + const ASTStructBracedConstructor* node, void* data) { + if (node->type_name() != nullptr) { + node->type_name()->Accept(this, data); + } else { + print("STRUCT"); + } + node->braced_constructor()->Accept(this, data); +} + void Unparser::visitASTInferredTypeColumnSchema( const ASTInferredTypeColumnSchema* node, void* data) { UnparseColumnSchema(node, data); @@ -2053,18 +2093,18 @@ void Unparser::visitASTIntLiteral(const ASTIntLiteral* node, void* data) { void Unparser::visitASTNumericLiteral( const ASTNumericLiteral* node, void* data) { print("NUMERIC"); - UnparseLeafNode(node); + node->string_literal()->Accept(this, data); } void Unparser::visitASTBigNumericLiteral(const ASTBigNumericLiteral* node, void* data) { print("BIGNUMERIC"); - UnparseLeafNode(node); + node->string_literal()->Accept(this, data); } void Unparser::visitASTJSONLiteral(const ASTJSONLiteral* node, void* data) { print("JSON"); - UnparseLeafNode(node); + node->string_literal()->Accept(this, data); } void Unparser::visitASTFloatLiteral(const ASTFloatLiteral* node, void* data) { @@ -2072,10 +2112,20 @@ void Unparser::visitASTFloatLiteral(const ASTFloatLiteral* node, void* data) { } void 
Unparser::visitASTStringLiteral(const ASTStringLiteral* node, void* data) { + visitASTChildren(node, data); +} + +void Unparser::visitASTStringLiteralComponent( + const ASTStringLiteralComponent* node, void* data) { UnparseLeafNode(node); } void Unparser::visitASTBytesLiteral(const ASTBytesLiteral* node, void* data) { + visitASTChildren(node, data); +} + +void Unparser::visitASTBytesLiteralComponent( + const ASTBytesLiteralComponent* node, void* data) { UnparseLeafNode(node); } @@ -3473,6 +3523,12 @@ void Unparser::visitASTAlterSchemaStatement( VisitAlterStatementBase(node, data); } +void Unparser::visitASTAlterExternalSchemaStatement( + const ASTAlterExternalSchemaStatement* node, void* data) { + print("ALTER EXTERNAL SCHEMA"); + VisitAlterStatementBase(node, data); +} + void Unparser::visitASTAlterTableStatement(const ASTAlterTableStatement* node, void* data) { print("ALTER TABLE"); diff --git a/zetasql/parser/unparser.h b/zetasql/parser/unparser.h index 0f9607bf5..ab52a3982 100644 --- a/zetasql/parser/unparser.h +++ b/zetasql/parser/unparser.h @@ -304,6 +304,12 @@ class Unparser : public ParseTreeVisitor { void visitASTAliasedQuery(const ASTAliasedQuery* node, void* data) override; void visitASTAliasedQueryList(const ASTAliasedQueryList* node, void* data) override; + void visitASTAliasedQueryModifiers(const ASTAliasedQueryModifiers* node, + void* data) override; + void visitASTRecursionDepthModifier(const ASTRecursionDepthModifier* node, + void* data) override; + void visitASTIntOrUnbounded(const ASTIntOrUnbounded* node, + void* data) override; void visitASTIntoAlias(const ASTIntoAlias* node, void* data) override; void visitASTFromClause(const ASTFromClause* node, void* data) override; void visitASTTransformClause(const ASTTransformClause* node, @@ -378,6 +384,8 @@ class Unparser : public ParseTreeVisitor { void* data) override; void visitASTBracedNewConstructor(const ASTBracedNewConstructor* node, void* data) override; + void 
visitASTStructBracedConstructor(const ASTStructBracedConstructor* node, + void* data) override; void visitASTInferredTypeColumnSchema(const ASTInferredTypeColumnSchema* node, void* data) override; void visitASTArrayConstructor(const ASTArrayConstructor* node, @@ -402,7 +410,11 @@ class Unparser : public ParseTreeVisitor { void visitASTJSONLiteral(const ASTJSONLiteral* node, void* data) override; void visitASTFloatLiteral(const ASTFloatLiteral* node, void* data) override; void visitASTStringLiteral(const ASTStringLiteral* node, void* data) override; + void visitASTStringLiteralComponent(const ASTStringLiteralComponent* node, + void* data) override; void visitASTBytesLiteral(const ASTBytesLiteral* node, void* data) override; + void visitASTBytesLiteralComponent(const ASTBytesLiteralComponent* node, + void* data) override; void visitASTBooleanLiteral(const ASTBooleanLiteral* node, void* data) override; void visitASTNullLiteral(const ASTNullLiteral* node, void* data) override; @@ -636,6 +648,8 @@ class Unparser : public ParseTreeVisitor { void* data) override; void visitASTAlterSchemaStatement(const ASTAlterSchemaStatement* node, void* data) override; + void visitASTAlterExternalSchemaStatement( + const ASTAlterExternalSchemaStatement* node, void* data) override; void visitASTAlterTableStatement(const ASTAlterTableStatement* node, void* data) override; void visitASTAlterViewStatement(const ASTAlterViewStatement* node, @@ -850,7 +864,7 @@ class Unparser : public ParseTreeVisitor { void UnparseASTTableDataSource(const ASTTableDataSource* node, void* data); void VisitCheckConstraintSpec(const ASTCheckConstraint* node, void* data); void VisitForeignKeySpec(const ASTForeignKey* node, void* data); - void UnparseLeafNode(const ASTLeaf* leaf_node); + void UnparseLeafNode(const ASTPrintableLeaf* leaf_node); void UnparseColumnSchema(const ASTColumnSchema* node, void* data); void VisitAlterStatementBase(const ASTAlterStatementBase* node, void* data); void 
VisitASTDropIndexStatement(const ASTDropIndexStatement* node, diff --git a/zetasql/proto/function.proto b/zetasql/proto/function.proto index 4f8645d06..55dff7d8c 100644 --- a/zetasql/proto/function.proto +++ b/zetasql/proto/function.proto @@ -172,6 +172,7 @@ message FunctionOptionsProto { optional bool supports_having_modifier = 15 [default = true]; optional bool supports_clamped_between_modifier = 16 [default = false]; optional bool uses_upper_case_sql_name = 17 [default = true]; + optional bool may_suppress_side_effects = 18 [default = false]; } message FunctionProto { diff --git a/zetasql/public/BUILD b/zetasql/public/BUILD index 89916b071..e16112630 100644 --- a/zetasql/public/BUILD +++ b/zetasql/public/BUILD @@ -88,6 +88,7 @@ cc_library( "@com_google_absl//absl/container:flat_hash_map", "@com_google_absl//absl/container:node_hash_map", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", "@com_google_protobuf//:cc_wkt_protos", "@com_google_protobuf//:protobuf", ], @@ -367,7 +368,6 @@ cc_library( ":value", "//zetasql/base", "//zetasql/base:map_util", - "//zetasql/base:source_location", "//zetasql/base:status", "//zetasql/common:errors", "//zetasql/common:internal_value", @@ -390,9 +390,10 @@ cc_library( "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", "@com_google_absl//absl/strings:cord", + "@com_google_absl//absl/strings:str_format", "@com_google_absl//absl/time", "@com_google_absl//absl/types:optional", - "@com_google_protobuf//:protobuf", + "@com_google_absl//absl/types:span", ], ) @@ -460,6 +461,7 @@ cc_library( hdrs = ["numeric_value_test_utils.h"], deps = [ ":numeric_value", + "//zetasql/common:multiprecision_int", "@com_google_absl//absl/random", "@com_google_absl//absl/status:statusor", ], @@ -474,6 +476,7 @@ cc_test( "//zetasql/base/testing:status_matchers", "//zetasql/base/testing:zetasql_gtest_main", "@com_google_absl//absl/random", + "@com_google_absl//absl/strings", 
"@com_google_absl//absl/strings:str_format", ], ) @@ -539,6 +542,7 @@ cc_test( "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", ], ) @@ -575,6 +579,8 @@ cc_library( ":interval_value", "@com_google_absl//absl/random", "@com_google_absl//absl/random:distributions", + "@com_google_absl//absl/strings:str_format", + "@com_google_googletest//:gtest", ], ) @@ -591,6 +597,7 @@ cc_test( "@com_google_absl//absl/random", "@com_google_absl//absl/random:distributions", "@com_google_absl//absl/status", + "@com_google_absl//absl/strings", ], ) @@ -612,16 +619,20 @@ cc_library( ":language_options", ":numeric_value", ":options_cc_proto", + ":token_list", ":type", ":type_cc_proto", ":value_cc_proto", ":value_content", "//zetasql/base", + "//zetasql/base:check", "//zetasql/base:compact_reference_counted", "//zetasql/base:map_util", + "//zetasql/base:map_view", "//zetasql/base:ret_check", "//zetasql/base:source_location", "//zetasql/base:status", + "//zetasql/common:errors", "//zetasql/common:float_margin", "//zetasql/common:thread_stack", "//zetasql/public/functions:arithmetics", @@ -633,11 +644,13 @@ cc_library( "//zetasql/public/types:value_representations", "@com_google_absl//absl/base", "@com_google_absl//absl/base:core_headers", + "@com_google_absl//absl/container:btree", "@com_google_absl//absl/container:flat_hash_map", "@com_google_absl//absl/container:flat_hash_set", "@com_google_absl//absl/container:inlined_vector", "@com_google_absl//absl/flags:flag", "@com_google_absl//absl/hash", + "@com_google_absl//absl/log", "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", @@ -666,6 +679,7 @@ cc_test( ":numeric_value", ":options_cc_proto", ":simple_catalog", + ":token_list_util", ":type", ":value", "//zetasql/base", @@ -680,9 +694,11 @@ cc_test( "//zetasql/testdata:test_schema_cc_proto", "//zetasql/testing:test_value", 
"@com_google_absl//absl/algorithm:container", + "@com_google_absl//absl/container:flat_hash_map", "@com_google_absl//absl/container:flat_hash_set", "@com_google_absl//absl/hash", "@com_google_absl//absl/hash:hash_testing", + "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", "@com_google_absl//absl/strings:str_format", @@ -1109,13 +1125,13 @@ cc_test( "//zetasql/common/testing:proto_matchers", "//zetasql/base/testing:status_matchers", "//zetasql/public/proto:type_annotation_cc_proto", + ":token_list_util", "//zetasql/public/types", "//zetasql/resolved_ast", "//zetasql/resolved_ast:resolved_node_kind_cc_proto", # buildcleaner: keep (text format proto) "//zetasql/testdata:test_schema_cc_proto", "//zetasql/base/testing:zetasql_gtest_main", - "@com_google_absl//absl/flags:flag", "@com_google_absl//absl/functional:bind_front", "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", @@ -1243,7 +1259,6 @@ cc_library( "@com_google_absl//absl/strings", "@com_google_absl//absl/types:optional", "@com_google_absl//absl/types:span", - "@com_google_protobuf//:protobuf", ], ) @@ -1266,6 +1281,7 @@ cc_library( deps = [ ":catalog", ":deprecation_warning_cc_proto", + ":evaluator_table_iterator", ":function_cc_proto", ":function_headers", ":id_string", @@ -1279,6 +1295,7 @@ cc_library( ":type_cc_proto", ":value", "//zetasql/base", + "//zetasql/base:check", "//zetasql/base:map_util", "//zetasql/base:ret_check", "//zetasql/base:status", @@ -1329,6 +1346,7 @@ cc_library( "//zetasql/base:ret_check", "//zetasql/base:status", "//zetasql/resolved_ast", + "@com_google_absl//absl/strings", "@com_google_absl//absl/types:optional", ], ) @@ -1410,6 +1428,7 @@ cc_library( "//zetasql/resolved_ast", "@com_google_absl//absl/base:core_headers", "@com_google_absl//absl/status", + "@com_google_absl//absl/strings", ], ) @@ -1543,6 +1562,7 @@ cc_test( "//zetasql/testdata:test_schema_cc_proto", "//zetasql/testing:test_function", 
"//zetasql/testing:test_value", + "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", "@com_google_absl//absl/strings:cord", @@ -1556,6 +1576,7 @@ cc_test( srcs = ["function_test.cc"], deps = [ ":builtin_function", + ":builtin_function_options", ":deprecation_warning_cc_proto", ":error_location_cc_proto", ":function", @@ -1567,6 +1588,7 @@ cc_test( ":type", ":value", "//zetasql/base", + "//zetasql/base:check", "//zetasql/base/testing:status_matchers", "//zetasql/base/testing:zetasql_gtest_main", "//zetasql/common:function_signature_testutil", @@ -1575,6 +1597,8 @@ cc_test( "//zetasql/proto:function_cc_proto", "//zetasql/public/types", "//zetasql/resolved_ast", + "@com_google_absl//absl/container:flat_hash_map", + "@com_google_absl//absl/container:flat_hash_set", "@com_google_absl//absl/strings", ], ) @@ -1597,6 +1621,7 @@ cc_test( "//zetasql/public/types", "@com_google_absl//absl/status", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", ], ) @@ -1613,6 +1638,7 @@ cc_test( "//zetasql/testdata:test_schema_cc_proto", "@com_google_absl//absl/strings", "@com_google_absl//absl/strings:cord", + "@com_google_absl//absl/types:span", ], ) @@ -1720,6 +1746,7 @@ cc_library( ":strings", ":value", "//zetasql/base", + "//zetasql/base:arena", "//zetasql/base:ret_check", "//zetasql/base:status", "//zetasql/base:strings", @@ -1728,6 +1755,7 @@ cc_library( "//zetasql/parser:bison_parser_generated_lib", "//zetasql/parser:bison_parser_mode", "//zetasql/parser:keywords", + "//zetasql/parser:token_disambiguator", "//zetasql/public/functions:convert_string", "//zetasql/resolved_ast:resolved_node_kind_cc_proto", "@com_google_absl//absl/container:flat_hash_map", @@ -1748,7 +1776,6 @@ cc_test( "//zetasql/base:path", "//zetasql/base/testing:status_matchers", "//zetasql/base/testing:zetasql_gtest_main", - "@com_google_absl//absl/flags:flag", "@com_google_absl//absl/strings", ], ) @@ -1820,6 +1847,7 @@ cc_library( 
"@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", ], ) @@ -1852,6 +1880,7 @@ cc_library( deps = [ ":analyzer_options", ":analyzer_output_properties", + ":catalog", ":id_string", "//zetasql/base:arena", "//zetasql/base:enum_utils", @@ -1861,7 +1890,6 @@ cc_library( "//zetasql/public/proto:logging_cc_proto", "//zetasql/public/types", "//zetasql/resolved_ast", - "@com_google_absl//absl/base:core_headers", "@com_google_absl//absl/container:flat_hash_set", "@com_google_absl//absl/status", "@com_google_absl//absl/strings:str_format", @@ -2023,6 +2051,7 @@ cc_test( ":analyzer", ":analyzer_output", ":builtin_function_options", + ":catalog", ":civil_time", ":evaluator", ":evaluator_base", @@ -2039,13 +2068,12 @@ cc_test( "//zetasql/base", "//zetasql/base:check", "//zetasql/base:clock", + "//zetasql/base:map_util", "//zetasql/base:ret_check", "//zetasql/base:status", - "//zetasql/base:stl_util", "//zetasql/base/testing:status_matchers", "//zetasql/base/testing:zetasql_gtest_main", "//zetasql/common:evaluator_test_table", - "//zetasql/common/testing:testing_proto_util", "//zetasql/public/functions:date_time_util", "//zetasql/public/types", "//zetasql/reference_impl:evaluation", @@ -2066,7 +2094,6 @@ cc_test( "@com_google_absl//absl/time", "@com_google_absl//absl/types:span", "@com_google_protobuf//:cc_wkt_protos", - "@com_google_protobuf//:protobuf", ], ) @@ -2110,6 +2137,7 @@ cc_library( "@com_google_absl//absl/hash", "@com_google_absl//absl/strings", "@com_google_absl//absl/synchronization", + "@com_google_absl//absl/types:span", ], ) @@ -2128,6 +2156,7 @@ cc_test( "@com_google_absl//absl/flags:flag", "@com_google_absl//absl/hash", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", ], ) @@ -2261,3 +2290,49 @@ cc_test( "@com_google_absl//absl/container:btree", ], ) + +proto_library( + name = "simple_token_list_proto", + srcs = ["simple_token_list.proto"], 
+) + +cc_proto_library( + name = "simple_token_list_cc_proto", + deps = [":simple_token_list_proto"], +) + +java_proto_library( + name = "simple_token_list_java_proto", + deps = [":simple_token_list_proto"], +) + +cc_library( + name = "simple_token_list", + srcs = ["simple_token_list.cc"], + hdrs = ["simple_token_list.h"], + deps = [ + ":simple_token_list_cc_proto", + "@com_google_absl//absl/status", + "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/strings", + "@com_google_absl//absl/strings:str_format", + ], +) + +cc_library( + name = "token_list", + hdrs = ["token_list.h"], + deps = [ + "//zetasql/public:simple_token_list", + ], +) + +cc_library( + name = "token_list_util", + srcs = ["token_list_util.cc"], + hdrs = ["token_list_util.h"], + deps = [ + ":value", + "//zetasql/public:simple_token_list", + ], +) diff --git a/zetasql/public/analyzer.cc b/zetasql/public/analyzer.cc index 45c2d4341..f0d078cc7 100644 --- a/zetasql/public/analyzer.cc +++ b/zetasql/public/analyzer.cc @@ -29,10 +29,10 @@ #include "zetasql/analyzer/all_rewriters.h" #include "zetasql/analyzer/analyzer_impl.h" #include "zetasql/analyzer/analyzer_output_mutator.h" -#include "zetasql/analyzer/anonymization_rewriter.h" #include "zetasql/analyzer/function_resolver.h" #include "zetasql/analyzer/resolver.h" #include "zetasql/analyzer/rewrite_resolved_ast.h" +#include "zetasql/analyzer/rewriters/anonymization_rewriter.h" #include "zetasql/common/errors.h" #include "zetasql/common/internal_analyzer_options.h" #include "zetasql/common/status_payload_utils.h" @@ -284,7 +284,8 @@ static absl::Status AnalyzeStatementHelper( options.error_message_options(), sql, resolver.deprecation_warnings()), *type_assignments, resolver.undeclared_positional_parameters(), - resolver.max_column_id()); + resolver.max_column_id() + ); ZETASQL_RETURN_IF_ERROR( RewriteResolvedAstImpl(options, sql, catalog, type_factory, **output)); if (options.fields_accessed_mode() == @@ -702,7 +703,8 @@ absl::StatusOr> 
RewriteForAnonymization( /*parser_output=*/nullptr, analyzer_output.deprecation_warnings(), analyzer_output.undeclared_parameters(), analyzer_output.undeclared_positional_parameters(), - column_factory.max_column_id()); + column_factory.max_column_id() + ); if (analyzer_options.fields_accessed_mode() == AnalyzerOptions::FieldsAccessedMode::CLEAR_FIELDS) { AnalyzerOutputMutator(ret).resolved_node()->ClearFieldsAccessed(); diff --git a/zetasql/public/analyzer_options.cc b/zetasql/public/analyzer_options.cc index fabb69c17..ac7540c52 100644 --- a/zetasql/public/analyzer_options.cc +++ b/zetasql/public/analyzer_options.cc @@ -35,6 +35,7 @@ #include "zetasql/base/check.h" #include "absl/status/status.h" #include "absl/strings/string_view.h" +#include "absl/types/span.h" #include "zetasql/base/status_macros.h" ABSL_FLAG(bool, zetasql_validate_resolved_ast, true, @@ -712,7 +713,7 @@ void AnalyzerOptions::SetDdlPseudoColumns( // memory errors. data_->ddl_pseudo_columns_callback = [ddl_pseudo_columns]( - const std::vector& table_name, + absl::Span table_name, const std::vector& options, std::vector>* pseudo_columns) { *pseudo_columns = ddl_pseudo_columns; @@ -728,10 +729,14 @@ void AnalyzerOptions::enable_rewrite(ResolvedASTRewrite rewrite, bool enable) { if (enable) { data_->enabled_rewrites.insert(rewrite); } else { - data_->enabled_rewrites.erase(rewrite); + disable_rewrite(rewrite); } } +void AnalyzerOptions::disable_rewrite(ResolvedASTRewrite rewrite) { + data_->enabled_rewrites.erase(rewrite); +} + absl::Status AnalyzerOptions::set_default_anon_kappa_value(int64_t value) { // 0 is the default value means it has not been set. Otherwise, we check // the valid range here. diff --git a/zetasql/public/analyzer_options.h b/zetasql/public/analyzer_options.h index 76e83f8cd..70d30c644 100644 --- a/zetasql/public/analyzer_options.h +++ b/zetasql/public/analyzer_options.h @@ -320,6 +320,10 @@ class AnalyzerOptions { } // Enables or disables a particular rewrite. 
void enable_rewrite(ResolvedASTRewrite rewrite, bool enable = true); + + // Disables a particular rewrite. Identical to calling `enable_rewrite` with + // `enable` = false. + void disable_rewrite(ResolvedASTRewrite rewrite); // Returns if a given AST rewrite is enabled. ABSL_MUST_USE_RESULT bool rewrite_enabled(ResolvedASTRewrite rewrite) const { return data_->enabled_rewrites.contains(rewrite); } diff --git a/zetasql/public/analyzer_output.cc b/zetasql/public/analyzer_output.cc index ddf3b331d..dcccf684e 100644 --- a/zetasql/public/analyzer_output.cc +++ b/zetasql/public/analyzer_output.cc @@ -26,6 +26,7 @@ #include "zetasql/base/enum_utils.h" #include "zetasql/common/timer_util.h" #include "zetasql/public/proto/logging.pb.h" +#include "absl/container/flat_hash_set.h" #include "absl/strings/str_format.h" #include "absl/time/time.h" #include "zetasql/base/map_util.h" @@ -46,7 +47,8 @@ AnalyzerOutput::AnalyzerOutput( const std::vector<absl::Status>& deprecation_warnings, const QueryParametersMap& undeclared_parameters, const std::vector<const Type*>& undeclared_positional_parameters, - int max_column_id) + int max_column_id + ) : id_string_pool_(std::move(id_string_pool)), arena_(std::move(arena)), statement_(std::move(statement)), @@ -55,7 +57,8 @@ AnalyzerOutput::AnalyzerOutput( deprecation_warnings_(deprecation_warnings), undeclared_parameters_(undeclared_parameters), undeclared_positional_parameters_(undeclared_positional_parameters), - max_column_id_(max_column_id) {} + max_column_id_(max_column_id) +{} AnalyzerOutput::AnalyzerOutput( std::shared_ptr<IdStringPool> id_string_pool, std::shared_ptr<zetasql_base::UnsafeArena> arena, @@ -66,7 +69,8 @@ AnalyzerOutput::AnalyzerOutput( const std::vector<absl::Status>& deprecation_warnings, const QueryParametersMap& undeclared_parameters, const std::vector<const Type*>& undeclared_positional_parameters, - int max_column_id) + int max_column_id + ) : id_string_pool_(std::move(id_string_pool)), arena_(std::move(arena)), expr_(std::move(expr)), @@ -75,7 +79,8 @@ AnalyzerOutput::AnalyzerOutput( deprecation_warnings_(deprecation_warnings),
undeclared_parameters_(undeclared_parameters), undeclared_positional_parameters_(undeclared_positional_parameters), - max_column_id_(max_column_id) {} + max_column_id_(max_column_id) +{} AnalyzerOutput::~AnalyzerOutput() = default; diff --git a/zetasql/public/analyzer_output.h b/zetasql/public/analyzer_output.h index 7acf0a1b6..2e6574eb6 100644 --- a/zetasql/public/analyzer_output.h +++ b/zetasql/public/analyzer_output.h @@ -158,7 +158,8 @@ class AnalyzerOutput { const std::vector<absl::Status>& deprecation_warnings, const QueryParametersMap& undeclared_parameters, const std::vector<const Type*>& undeclared_positional_parameters, - int max_column_id); + int max_column_id + ); AnalyzerOutput( std::shared_ptr<IdStringPool> id_string_pool, std::shared_ptr<zetasql_base::UnsafeArena> arena, @@ -168,7 +169,8 @@ class AnalyzerOutput { const std::vector<absl::Status>& deprecation_warnings, const QueryParametersMap& undeclared_parameters, const std::vector<const Type*>& undeclared_positional_parameters, - int max_column_id); + int max_column_id + ); AnalyzerOutput(const AnalyzerOutput&) = delete; AnalyzerOutput& operator=(const AnalyzerOutput&) = delete; ~AnalyzerOutput(); diff --git a/zetasql/public/anon_function.cc b/zetasql/public/anon_function.cc index 73734049d..3a29dfbb6 100644 --- a/zetasql/public/anon_function.cc +++ b/zetasql/public/anon_function.cc @@ -19,6 +19,8 @@ #include #include +#include "zetasql/public/function.h" +#include "zetasql/public/function_signature.h" #include "zetasql/public/language_options.h" #include "absl/functional/bind_front.h" #include "absl/strings/ascii.h" @@ -49,63 +51,72 @@ static std::string AnonFunctionSQL(absl::string_view display_name, } } +static std::string SignatureTextForAnonFunction( + const std::string& function_name, const LanguageOptions& language_options, + const Function& function, const FunctionSignature& signature) { + std::string upper_case_function_name = absl::AsciiStrToUpper(function_name); + std::string percentile_or_quantiles = ""; + bool is_function_name_percentile_or_quantiles = false; + bool
is_function_name_quantiles = + upper_case_function_name == "ANON_QUANTILES" || + upper_case_function_name == "$ANON_QUANTILES_WITH_REPORT_JSON" || + upper_case_function_name == "$ANON_QUANTILES_WITH_REPORT_PROTO"; + if (upper_case_function_name == "ANON_PERCENTILE_CONT" || + is_function_name_quantiles) { + // TODO: Support inputs.size() == 2 once the DP Library's + // Quantiles supports automatic/implicit bounds. + // The expected signatures of ANON_PERCENTILE_CONT and ANON_QUANTILES are + // that they have two input arguments along with two required clamped + // bounds arguments (in that order). + ABSL_DCHECK_EQ(signature.arguments().size(), 4) + << signature.DebugString(function_name, /*verbose=*/true); + is_function_name_percentile_or_quantiles = true; + percentile_or_quantiles = absl::StrCat( + ", ", + signature.argument(1).UserFacingName(language_options.product_mode())); + } else { + // The expected invariant for the current list of the anonymized aggregate + // functions other than ANON_PERCENTILE_CONT or ANON_QUANTILES is that + // they have one input argument along with two optional clamped bounds + // arguments (in that order). + ABSL_DCHECK_EQ(signature.arguments().size(), 3) + << "upper_case_function_name = " << upper_case_function_name << "\n" + << signature.DebugString(function_name, /*verbose=*/true); + } + const std::string base_argument_type = + signature.argument(0).UserFacingName(language_options.product_mode()); + const std::string lower_bound_type = + signature.argument(is_function_name_percentile_or_quantiles ? 2 : 1) + .UserFacingName(language_options.product_mode()); + const std::string upper_bound_type = + signature.argument(is_function_name_percentile_or_quantiles ? 
3 : 2) + .UserFacingName(language_options.product_mode()); + // TODO: Once the DP Library's Quantiles supports + // automatic/implicit bounds and ZetaSQL is ready to support them, + // remove the is_quantiles conditionals below when CLAMPED BETWEEN is + // optional for ANON_QUANTILES. + return absl::StrCat( + absl::AsciiStrToUpper(function_name), "(", base_argument_type, + percentile_or_quantiles, " ", (is_function_name_quantiles ? "" : "["), + "CLAMPED BETWEEN ", lower_bound_type, " AND ", upper_bound_type, + (is_function_name_quantiles ? "" : "]"), ")"); +} + static std::string SupportedSignaturesForAnonFunction( const std::string& function_name, const LanguageOptions& language_options, const Function& function) { std::string upper_case_function_name = absl::AsciiStrToUpper(function_name); std::string supported_signatures; for (const FunctionSignature& signature : function.signatures()) { - std::string percentile_or_quantiles = ""; - bool is_function_name_percentile_or_quantiles = false; - bool is_function_name_quantiles = - upper_case_function_name == "ANON_QUANTILES" || - upper_case_function_name == "$ANON_QUANTILES_WITH_REPORT_JSON" || - upper_case_function_name == "$ANON_QUANTILES_WITH_REPORT_PROTO"; - if (upper_case_function_name == "ANON_PERCENTILE_CONT" || - is_function_name_quantiles) { - // TODO: Support inputs.size() == 2 once the DP Library's - // Quantiles supports automatic/implicit bounds. - // The expected signatures of ANON_PERCENTILE_CONT and ANON_QUANTILES are - // that they have two input arguments along with two required clamped - // bounds arguments (in that order). 
- ABSL_DCHECK_EQ(signature.arguments().size(), 4) - << signature.DebugString(function_name, /*verbose=*/true); - is_function_name_percentile_or_quantiles = true; - percentile_or_quantiles = - absl::StrCat(", ", signature.argument(1).UserFacingName( - language_options.product_mode())); - } else { - // The expected invariant for the current list of the anonymized aggregate - // functions other than ANON_PERCENTILE_CONT or ANON_QUANTILES is that - // they have one input argument along with two optional clamped bounds - // arguments (in that order). - ABSL_DCHECK_EQ(signature.arguments().size(), 3) - << "upper_case_function_name = " << upper_case_function_name << "\n" - << signature.DebugString(function_name, /*verbose=*/true); - } if (signature.IsInternal()) { continue; } - const std::string base_argument_type = - signature.argument(0).UserFacingName(language_options.product_mode()); - const std::string lower_bound_type = - signature.argument(is_function_name_percentile_or_quantiles ? 2 : 1) - .UserFacingName(language_options.product_mode()); - const std::string upper_bound_type = - signature.argument(is_function_name_percentile_or_quantiles ? 3 : 2) - .UserFacingName(language_options.product_mode()); if (!supported_signatures.empty()) { absl::StrAppend(&supported_signatures, ", "); } - // TODO: Once the DP Library's Quantiles supports - // automatic/implicit bounds and ZetaSQL is ready to support them, - // remove the is_quantiles conditionals below when CLAMPED BETWEEN is - // optional for ANON_QUANTILES. - absl::StrAppend(&supported_signatures, absl::AsciiStrToUpper(function_name), - "(", base_argument_type, percentile_or_quantiles, " ", - (is_function_name_quantiles ? "" : "["), "CLAMPED BETWEEN ", - lower_bound_type, " AND ", upper_bound_type, - (is_function_name_quantiles ? 
"" : "]"), ")"); + absl::StrAppend(&supported_signatures, + SignatureTextForAnonFunction( + function_name, language_options, function, signature)); } return supported_signatures; } @@ -166,6 +177,11 @@ static const FunctionOptions AddDefaultFunctionOptions( options.set_supported_signatures_callback(absl::bind_front( &SupportedSignaturesForAnonFunction, std::string(name))); } + options.set_hide_supported_signatures(options.hide_supported_signatures); + if (options.signature_text_callback == nullptr) { + options.set_signature_text_callback( + absl::bind_front(&SignatureTextForAnonFunction, std::string(name))); + } if (options.bad_argument_error_prefix_callback == nullptr) { options.set_bad_argument_error_prefix_callback(absl::bind_front( AnonFunctionBadArgumentErrorPrefix, std::string(name))); @@ -174,7 +190,7 @@ static const FunctionOptions AddDefaultFunctionOptions( } AnonFunction::AnonFunction( - absl::string_view name, const std::string& group, + absl::string_view name, absl::string_view group, const std::vector& function_signatures, const FunctionOptions& function_options, const std::string& partial_aggregate_name) diff --git a/zetasql/public/anon_function.h b/zetasql/public/anon_function.h index 1ee6d5dfc..e539f9584 100644 --- a/zetasql/public/anon_function.h +++ b/zetasql/public/anon_function.h @@ -42,7 +42,7 @@ class AnonFunction : public Function { // three arguments with the second two being the CLAMPED BETWEEN clause. The // appropriate error message callbacks for rendering the function call with // CLAMPED BETWEEN syntax are automatically added. 
- AnonFunction(absl::string_view name, const std::string& group, + AnonFunction(absl::string_view name, absl::string_view group, const std::vector<FunctionSignature>& function_signatures, const FunctionOptions& function_options, const std::string& partial_aggregate_name); diff --git a/zetasql/public/builtin_function.cc b/zetasql/public/builtin_function.cc index afbaaaaaa..a75682688 100644 --- a/zetasql/public/builtin_function.cc +++ b/zetasql/public/builtin_function.cc @@ -162,6 +162,7 @@ absl::Status GetBuiltinFunctionsAndTypes(const BuiltinFunctionOptions& options, GetKllQuantilesFunctions(&type_factory, options, &functions); ZETASQL_RETURN_IF_ERROR( GetProto3ConversionFunctions(&type_factory, options, &functions)); + // TODO: Move language feature checks to function declarations. if (options.language_options.LanguageFeatureEnabled( FEATURE_ANALYTIC_FUNCTIONS)) { GetAnalyticFunctions(&type_factory, options, &functions); @@ -191,6 +192,7 @@ absl::Status GetBuiltinFunctionsAndTypes(const BuiltinFunctionOptions& options, GetArrayFilteringFunctions(&type_factory, options, &functions); GetArrayTransformFunctions(&type_factory, options, &functions); GetArrayIncludesFunctions(&type_factory, options, &functions); + GetElementWiseAggregationFunctions(&type_factory, options, &functions); if (options.language_options.LanguageFeatureEnabled( FEATURE_V_1_4_ARRAY_FIND_FUNCTIONS)) { ZETASQL_RETURN_IF_ERROR( @@ -201,6 +203,9 @@ absl::Status GetBuiltinFunctionsAndTypes(const BuiltinFunctionOptions& options, ZETASQL_RETURN_IF_ERROR( GetArrayZipFunctions(&type_factory, options, &functions, &types)); } + ZETASQL_RETURN_IF_ERROR( + GetStandaloneBuiltinEnumTypes(&type_factory, options, &types)); + GetMapCoreFunctions(&type_factory, options, &functions); return absl::OkStatus(); } diff --git a/zetasql/public/builtin_function.proto b/zetasql/public/builtin_function.proto index 9556390a3..0c62be6ae 100644 --- a/zetasql/public/builtin_function.proto +++ b/zetasql/public/builtin_function.proto @@ -39,9 +39,9
@@ enum FunctionSignatureId { // other special functions (CASE). FunctionSignatureIds are assigned // in ranges: // - // 0002-0999 Non-standard function calls (NextId: 302) + // 0002-0999 Non-standard function calls (NextId: 310) // 1000-1099 String functions (NextId: 1083) - // 1100-1199 Control flow functions (NextId: 1107) + // 1100-1199 Control flow functions (NextId: 1108) // 1200-1299 Time functions (Fully used) // 1300-1399 Math functions (Fully used) // 1400-1499 Aggregate functions I (Fully used) @@ -50,14 +50,15 @@ enum FunctionSignatureId { // 1700-1799 Net functions (NextId: 1716) // 1800-1899 More time functions (NextId: 1864) // 1900-1999 Hashing/encryption functions (NextId: 1938) - // 2000-2199 Geography functions (NextId: 2092) + // 2000-2199 Geography functions (NextId: 2095) // 2200-2231 Anonymization functions (NextId: 2232) - // 2300-2499 Numeric, more math & distances (NextId: 2355) + // 2300-2499 Numeric, more math & distances (NextId: 2381) // 2500-2599 Array and proto map functions (NextId: 2561) - // 2600-2699 More misc functions. (NextId: 2622) - // 2700-2799 Aggregate functions II (NextId: 2746) + // 2600-2699 More misc functions. 
(NextId: 2627) + // 2700-2799 Aggregate functions II (NextId: 2764) // 2800-2899 Differential Privacy functions (NextId: 2832) // 2900-2999 Range functions (NextId: 2921) + // 3000-3100 Map functions (NextId: 3001) // enum value // Related function name // ---------- // --------------------- @@ -107,6 +108,14 @@ enum FunctionSignatureId { FN_BYTE_LIKE = 98; // $like FN_BYTE_LIKE_ANY = 297; // $like_any FN_BYTE_LIKE_ALL = 298; // $like_all + FN_STRING_NOT_LIKE_ANY = 302; // $not_like_any + FN_BYTE_NOT_LIKE_ANY = 303; // $not_like_any + FN_STRING_ARRAY_NOT_LIKE_ANY = 304; // $not_like_any_array + FN_BYTE_ARRAY_NOT_LIKE_ANY = 305; // $not_like_any_array + FN_STRING_NOT_LIKE_ALL = 306; // $not_like_all + FN_BYTE_NOT_LIKE_ALL = 307; // $not_like_all + FN_STRING_ARRAY_NOT_LIKE_ALL = 308; // $not_like_all_array + FN_BYTE_ARRAY_NOT_LIKE_ALL = 309; // $not_like_all_array FN_IN = 100; // $in FN_IN_ARRAY = 219; // $in_array FN_BETWEEN = 110; // $between @@ -324,6 +333,19 @@ enum FunctionSignatureId { FN_NULLIFERROR = 1105; // nulliferror FN_ISERROR = 1106; // iserror + // Internal function, enabled only when + // FEATURE_V_1_4_ENFORCE_CONDITIONAL_EVALUATION is turned on. + // + // This is a special operator that accepts an expression and a BYTES value + // representing a deferred side-effect such as an error, or NULL if no such + // side effect was generated. The object is engine-defined. For example, it + // can be a serialized status. If there is such a pending side-effect, this + // operator exposes it immediately (e.g. the error is produced). Otherwise, + // the expression is returned. Calls to this operator are where deferred side + // effects generated in a ResolvedDeferredComputedColumn are finally consumed. + // See (broken link). 
+ FN_WITH_SIDE_EFFECTS = 1107; // $with_side_effects + // Time functions FN_CURRENT_DATE = 1200; // current_date FN_CURRENT_DATETIME = 1804; // current_datetime @@ -687,6 +709,27 @@ enum FunctionSignatureId { FN_VAR_SAMP_NUMERIC = 1477; // var_samp FN_VAR_SAMP_BIGNUMERIC = 1485; // var_samp + // Vector aggregate functions + FN_ELEMENTWISE_SUM_INT32 = 2746; // elementwise_sum + FN_ELEMENTWISE_SUM_INT64 = 2747; // elementwise_sum + FN_ELEMENTWISE_SUM_UINT32 = 2748; // elementwise_sum + FN_ELEMENTWISE_SUM_UINT64 = 2749; // elementwise_sum + FN_ELEMENTWISE_SUM_FLOAT = 2750; // elementwise_sum + FN_ELEMENTWISE_SUM_DOUBLE = 2751; // elementwise_sum + FN_ELEMENTWISE_SUM_NUMERIC = 2752; // elementwise_sum + FN_ELEMENTWISE_SUM_BIGNUMERIC = 2753; // elementwise_sum + FN_ELEMENTWISE_SUM_INTERVAL = 2754; // elementwise_sum + + FN_ELEMENTWISE_AVG_INT32 = 2755; // elementwise_avg + FN_ELEMENTWISE_AVG_INT64 = 2756; // elementwise_avg + FN_ELEMENTWISE_AVG_UINT32 = 2757; // elementwise_avg + FN_ELEMENTWISE_AVG_UINT64 = 2758; // elementwise_avg + FN_ELEMENTWISE_AVG_FLOAT = 2759; // elementwise_avg + FN_ELEMENTWISE_AVG_DOUBLE = 2760; // elementwise_avg + FN_ELEMENTWISE_AVG_NUMERIC = 2761; // elementwise_avg + FN_ELEMENTWISE_AVG_BIGNUMERIC = 2762; // elementwise_avg + FN_ELEMENTWISE_AVG_INTERVAL = 2763; // elementwise_avg + FN_COUNTIF = 1443; // countif // Approximate quantiles functions that produce or consume intermediate @@ -1007,6 +1050,7 @@ enum FunctionSignatureId { FN_ST_EXTERIORRING = 2078; FN_ST_INTERIORRINGS = 2079; FN_ST_LINE_SUBSTRING = 2091; + FN_ST_LINE_INTERPOLATE_POINT = 2092; // Predicates FN_ST_EQUALS = 2020; @@ -1054,6 +1098,8 @@ enum FunctionSignatureId { FN_ST_GEOG_FROM_GEO_JSON_EXT = 2068; FN_ST_GEOG_FROM_WKB = 2056; FN_ST_GEOG_FROM_WKB_HEX = 2072; + FN_ST_GEOG_FROM_WKB_EXT = 2093; + FN_ST_GEOG_FROM_WKB_HEX_EXT = 2094; FN_ST_GEOG_FROM_STRING = 2073; FN_ST_GEOG_FROM_BYTES = 2074; FN_ST_AS_TEXT = 2053; @@ -1324,6 +1370,15 @@ enum FunctionSignatureId { // DOUBLE>>) 
-> DOUBLE FN_COSINE_DISTANCE_SPARSE_STRING = 2347; + // approx_cosine_distance(array<double>, array<double>) -> DOUBLE + FN_APPROX_COSINE_DISTANCE_DOUBLE = 2355; + // approx_cosine_distance(array<float>, array<float>) -> DOUBLE + FN_APPROX_COSINE_DISTANCE_FLOAT = 2356; + // approx_cosine_distance(array<double>, array<double>, json) -> DOUBLE + FN_APPROX_COSINE_DISTANCE_DOUBLE_WITH_OPTIONS = 2357; + // approx_cosine_distance(array<float>, array<float>, json) -> DOUBLE + FN_APPROX_COSINE_DISTANCE_FLOAT_WITH_OPTIONS = 2358; + // euclidean_distance(array<double>, array<double>) -> DOUBLE FN_EUCLIDEAN_DISTANCE_DENSE_DOUBLE = 2348; // euclidean_distance(array<float>, array<float>) -> DOUBLE @@ -1334,8 +1389,111 @@ enum FunctionSignatureId { // euclidean_distance(array<struct<string, double>>, array<struct<string, double>>) -> DOUBLE FN_EUCLIDEAN_DISTANCE_SPARSE_STRING = 2350; + + // approx_euclidean_distance(array<double>, array<double>) -> DOUBLE + FN_APPROX_EUCLIDEAN_DISTANCE_DOUBLE = 2359; + // approx_euclidean_distance(array<float>, array<float>) -> DOUBLE + FN_APPROX_EUCLIDEAN_DISTANCE_FLOAT = 2360; + // approx_euclidean_distance(array<double>, array<double>, json) -> DOUBLE + FN_APPROX_EUCLIDEAN_DISTANCE_DOUBLE_WITH_OPTIONS = 2361; + // approx_euclidean_distance(array<float>, array<float>, json) -> DOUBLE + FN_APPROX_EUCLIDEAN_DISTANCE_FLOAT_WITH_OPTIONS = 2362; + // edit_distance(STRING, STRING[, INT64]) -> INT64 FN_EDIT_DISTANCE = 2351; // edit_distance(BYTES, BYTES[, INT64]) -> INT64 FN_EDIT_DISTANCE_BYTES = 2352; + + // dot_product(array<int64>, array<int64>) -> DOUBLE + FN_DOT_PRODUCT_INT64 = 2363; + // dot_product(array<float>, array<float>) -> DOUBLE + FN_DOT_PRODUCT_FLOAT = 2364; + // dot_product(array<double>, array<double>) -> DOUBLE + FN_DOT_PRODUCT_DOUBLE = 2365; + + // approx_dot_product(array<int64>, array<int64>) -> DOUBLE + FN_APPROX_DOT_PRODUCT_INT64 = 2366; + // approx_dot_product(array<float>, array<float>) -> DOUBLE + FN_APPROX_DOT_PRODUCT_FLOAT = 2367; + // approx_dot_product(array<double>, array<double>) -> DOUBLE + FN_APPROX_DOT_PRODUCT_DOUBLE = 2368; + // approx_dot_product(array<int64>, array<int64>, json) -> DOUBLE + FN_APPROX_DOT_PRODUCT_INT64_WITH_OPTIONS = 2369; + // approx_dot_product(array<float>, array<float>, json) -> DOUBLE +
FN_APPROX_DOT_PRODUCT_FLOAT_WITH_OPTIONS = 2370; + // approx_dot_product(array<double>, array<double>, json) -> DOUBLE + FN_APPROX_DOT_PRODUCT_DOUBLE_WITH_OPTIONS = 2371; + + // manhattan_distance(array<int64>, array<int64>) -> DOUBLE + FN_MANHATTAN_DISTANCE_INT64 = 2372; + // manhattan_distance(array<float>, array<float>) -> DOUBLE + FN_MANHATTAN_DISTANCE_FLOAT = 2373; + // manhattan_distance(array<double>, array<double>) -> DOUBLE + FN_MANHATTAN_DISTANCE_DOUBLE = 2374; + + // l1_norm(array<int64>) -> DOUBLE + FN_L1_NORM_INT64 = 2375; + // l1_norm(array<float>) -> DOUBLE + FN_L1_NORM_FLOAT = 2376; + // l1_norm(array<double>) -> DOUBLE + FN_L1_NORM_DOUBLE = 2377; + + // l2_norm(array<int64>) -> DOUBLE + FN_L2_NORM_INT64 = 2378; + // l2_norm(array<float>) -> DOUBLE + FN_L2_NORM_FLOAT = 2379; + // l2_norm(array<double>) -> DOUBLE + FN_L2_NORM_DOUBLE = 2380; + + // bool_array(JSON) -> array<bool> + FN_JSON_TO_BOOL_ARRAY = 2627; + // lax_bool_array(JSON) -> array<bool> + FN_JSON_LAX_TO_BOOL_ARRAY = 2628; + // float64_array(JSON) -> array<float64> + FN_JSON_TO_FLOAT64_ARRAY = 2629; + // lax_float64_array(JSON) -> array<float64> + FN_JSON_LAX_TO_FLOAT64_ARRAY = 2630; + // int64_array(JSON) -> array<int64> + FN_JSON_TO_INT64_ARRAY = 2631; + // lax_int64_array(JSON) -> array<int64> + FN_JSON_LAX_TO_INT64_ARRAY = 2632; + // string_array(JSON) -> array<string> + FN_JSON_TO_STRING_ARRAY = 2633; + // lax_string_array(JSON) -> array<string> + FN_JSON_LAX_TO_STRING_ARRAY = 2634; + // float32(JSON) -> FLOAT32 + FN_JSON_TO_FLOAT32 = 2635; + // lax_float32(JSON) -> FLOAT32 + FN_JSON_LAX_TO_FLOAT32 = 2636; + // float32_array(JSON) -> array<float32> + FN_JSON_TO_FLOAT32_ARRAY = 2637; + // lax_float32_array(JSON) -> array<float32> + FN_JSON_LAX_TO_FLOAT32_ARRAY = 2638; + // int32(JSON) -> INT32 + FN_JSON_TO_INT32 = 2639; + // lax_int32(JSON) -> INT32 + FN_JSON_LAX_TO_INT32 = 2640; + // int32_array(JSON) -> array<int32> + FN_JSON_TO_INT32_ARRAY = 2641; + // lax_int32_array(JSON) -> array<int32> + FN_JSON_LAX_TO_INT32_ARRAY = 2642; + // uint32(JSON) -> UINT32 + FN_JSON_TO_UINT32 = 2643; + // lax_uint32(JSON) -> UINT32 + FN_JSON_LAX_TO_UINT32 = 2644; + // uint32_array(JSON) ->
array<uint32> + FN_JSON_TO_UINT32_ARRAY = 2645; + // lax_uint32_array(JSON) -> array<uint32> + FN_JSON_LAX_TO_UINT32_ARRAY = 2646; + // uint64(JSON) -> UINT64 + FN_JSON_TO_UINT64 = 2647; + // lax_uint64(JSON) -> UINT64 + FN_JSON_LAX_TO_UINT64 = 2648; + // uint64_array(JSON) -> array<uint64> + FN_JSON_TO_UINT64_ARRAY = 2649; + // lax_uint64_array(JSON) -> array<uint64> + FN_JSON_LAX_TO_UINT64_ARRAY = 2650; + + // Map functions + FN_MAP_FROM_ARRAY = 3000; // map_from_array(array<struct<K, V>>) -> map<K, V> } diff --git a/zetasql/public/cast.cc b/zetasql/public/cast.cc index 86091a975..f20d381a2 100644 --- a/zetasql/public/cast.cc +++ b/zetasql/public/cast.cc @@ -16,20 +16,14 @@ #include "zetasql/public/cast.h" -#include #include -#include #include #include #include -#include #include #include #include "zetasql/base/logging.h" -#include "google/protobuf/arena.h" -#include "google/protobuf/dynamic_message.h" -#include "google/protobuf/message.h" #include "zetasql/common/errors.h" #include "zetasql/common/internal_value.h" #include "zetasql/common/utf_util.h" @@ -53,13 +47,15 @@ #include "zetasql/public/strings.h" #include "zetasql/public/type.pb.h" #include "zetasql/public/value.h" +#include "absl/algorithm/container.h" #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/cord.h" +#include "absl/strings/match.h" +#include "absl/strings/str_format.h" +#include "absl/strings/string_view.h" #include "absl/time/time.h" #include "zetasql/base/map_util.h" -#include "zetasql/base/source_location.h" -#include "zetasql/base/status.h" #include "zetasql/base/status_macros.h" namespace zetasql { @@ -218,6 +214,7 @@ const CastHashMap* InitializeZetaSQLCasts() { ADD_TO_MAP(BYTES, BYTES, IMPLICIT); ADD_TO_MAP(BYTES, STRING, EXPLICIT); ADD_TO_MAP(BYTES, PROTO, EXPLICIT_OR_LITERAL_OR_PARAMETER); + ADD_TO_MAP(BYTES, TOKENLIST, EXPLICIT); ADD_TO_MAP(DATE, DATE, IMPLICIT); ADD_TO_MAP(DATE, DATETIME, IMPLICIT); @@ -248,6 +245,9 @@ const CastHashMap* InitializeZetaSQLCasts() { ADD_TO_MAP(JSON, JSON,
IMPLICIT); + ADD_TO_MAP(TOKENLIST, TOKENLIST, IMPLICIT); + ADD_TO_MAP(TOKENLIST, BYTES, EXPLICIT); + ADD_TO_MAP(ENUM, STRING, EXPLICIT); ADD_TO_MAP(ENUM, INT32, EXPLICIT); @@ -268,6 +268,7 @@ const CastHashMap* InitializeZetaSQLCasts() { ADD_TO_MAP(ARRAY, ARRAY, IMPLICIT); ADD_TO_MAP(STRUCT, STRUCT, IMPLICIT); ADD_TO_MAP(RANGE, RANGE, IMPLICIT); + ADD_TO_MAP(MAP, MAP, IMPLICIT); // clang-format on return map; @@ -400,6 +401,10 @@ absl::StatusOr<Value> DoMapEntryCast(const Value& from_value, return Value::Proto(to_proto_type, bytes); } +static std::string ValueOrUnbounded(std::optional<std::string> value) { + return value ? *value : "UNBOUNDED"; +} + } // namespace bool SupportsImplicitCoercion(CastFunctionType type) { @@ -980,6 +985,13 @@ absl::StatusOr<Value> CastContext::CastValue( // Opaque proto support does not affect this implementation, which does // no validation. return Value::Proto(to_type->AsProto(), absl::Cord(v.bytes_value())); + case FCT(TYPE_BYTES, TYPE_TOKENLIST): { + auto tokenlist = tokens::TokenList::FromBytes(v.bytes_value()); + if (!tokenlist.IsValid()) { + return MakeEvalError() << "Invalid tokenlist encoding"; + } + return Value::TokenList(std::move(tokenlist)); + } case FCT(TYPE_DATE, TYPE_STRING): { std::string date; if (format.has_value()) { @@ -1126,6 +1138,9 @@ absl::StatusOr<Value> CastContext::CastValue( return Value::Interval(interval); } + case FCT(TYPE_TOKENLIST, TYPE_BYTES): + return Value::Bytes(v.tokenlist_value().GetBytes()); + case FCT(TYPE_STRUCT, TYPE_STRUCT): { const StructType* v_type = v.type()->AsStruct(); std::vector<Value> casted_field_values(v_type->num_fields()); @@ -1272,6 +1287,24 @@ absl::StatusOr<Value> CastContext::CastValue( } return Value::MakeRange(start, end); } + case FCT(TYPE_RANGE, TYPE_STRING): { + if (v.is_null()) { + return Value::NullString(); + } + std::optional<std::string> start = std::nullopt; + if (!v.start().is_null()) { + ZETASQL_ASSIGN_OR_RETURN(Value start_str, + CastValue(v.start(), to_type, format)); + start = start_str.string_value(); + }
std::optional<std::string> end = std::nullopt; + if (!v.end().is_null()) { + ZETASQL_ASSIGN_OR_RETURN(Value end_str, CastValue(v.end(), to_type, format)); + end = end_str.string_value(); + } + return Value::String(absl::StrFormat("[%s, %s)", ValueOrUnbounded(start), + ValueOrUnbounded(end))); + } default: return ::zetasql_base::UnimplementedErrorBuilder() << "Unimplemented cast from " diff --git a/zetasql/public/cast_test.cc b/zetasql/public/cast_test.cc index 2e2020024..17631bd89 100644 --- a/zetasql/public/cast_test.cc +++ b/zetasql/public/cast_test.cc @@ -36,6 +36,7 @@ #include "zetasql/testing/using_test_value.cc" #include "gmock/gmock.h" #include "gtest/gtest.h" +#include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/cord.h" #include "absl/strings/str_cat.h" @@ -48,8 +49,10 @@ MATCHER_P(StringValueMatches, matcher, "") { return ExplainMatchResult(matcher, arg.string_value(), result_listener); } +using testing::HasSubstr; using testing::StrCaseEq; using zetasql_base::testing::IsOkAndHolds; +using zetasql_base::testing::StatusIs; static TypeFactory* type_factory = new TypeFactory(); @@ -375,4 +378,7 @@ INSTANTIATE_TEST_SUITE_P( CastNumericString, CastTemplateTest, testing::ValuesIn(GetFunctionTestsCastNumericString())); +INSTANTIATE_TEST_SUITE_P(CastTokenList, CastTemplateTest, + testing::ValuesIn(GetFunctionTestsCastTokenList())); + } // namespace zetasql diff --git a/zetasql/public/catalog.h b/zetasql/public/catalog.h index dd0c04f8e..9e4c18f66 100644 --- a/zetasql/public/catalog.h +++ b/zetasql/public/catalog.h @@ -622,6 +622,11 @@ class EnumerableCatalog : public Catalog { absl::flat_hash_set<const Type*>* output) const = 0; virtual absl::Status GetFunctions( absl::flat_hash_set<const Function*>* output) const = 0; + virtual absl::Status GetTableValuedFunctions( + absl::flat_hash_set<const TableValuedFunction*>* output) const { + return absl::NotFoundError( + "TableValuedFunctions are not supported in this EnumerableCatalog"); + } virtual absl::Status GetConversions( absl::flat_hash_set<const Conversion*>*
output) const { return absl::NotFoundError( diff --git a/zetasql/public/coercer.cc b/zetasql/public/coercer.cc index 2c8b2150d..e38039b20 100644 --- a/zetasql/public/coercer.cc +++ b/zetasql/public/coercer.cc @@ -42,6 +42,7 @@ #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/match.h" +#include "absl/types/span.h" #include "zetasql/base/map_util.h" #include "zetasql/base/status.h" #include "zetasql/base/status_builder.h" @@ -193,7 +194,7 @@ class TypeGlobalOrderChecker { return checker.Check(); } - static absl::Status Check(const std::vector<TypeSuperTypes>& supertypes_list, + static absl::Status Check(absl::Span<const TypeSuperTypes> supertypes_list, Catalog* catalog = nullptr) { std::vector<const Type*> types; for (const TypeSuperTypes& supertypes : supertypes_list) { @@ -304,7 +305,7 @@ class TypeGlobalOrderChecker { }; absl::Status CheckSupertypesGlobalOrderForCoercer( - const std::vector<TypeSuperTypes>& supertypes_list, Catalog* catalog) { + absl::Span<const TypeSuperTypes> supertypes_list, Catalog* catalog) { if (!std::any_of( supertypes_list.begin(), supertypes_list.end(), [](const auto& st) { return st.type()->IsExtendedType(); })) { diff --git a/zetasql/public/collator_test.cc b/zetasql/public/collator_test.cc index 76d1c62a3..935c305ea 100644 --- a/zetasql/public/collator_test.cc +++ b/zetasql/public/collator_test.cc @@ -36,7 +36,7 @@ enum class CompareType { kSortKey, kCompare }; class CollatorTest : public ::testing::TestWithParam<CompareType> { protected: - void TestEquals(absl::string_view s1, const std::string& s2, + void TestEquals(absl::string_view s1, absl::string_view s2, const ZetaSqlCollator* collator) { switch (GetParam()) { case CompareType::kCompare: { @@ -57,7 +57,7 @@ class CollatorTest : public ::testing::TestWithParam<CompareType> { } } - void TestLessThan(absl::string_view s1, const std::string& s2, + void TestLessThan(absl::string_view s1, absl::string_view s2, const ZetaSqlCollator* collator) { switch (GetParam()) { case CompareType::kCompare: { diff --git a/zetasql/public/convert_type_to_proto.cc
b/zetasql/public/convert_type_to_proto.cc index d9dba403a..cd67b1436 100644 --- a/zetasql/public/convert_type_to_proto.cc +++ b/zetasql/public/convert_type_to_proto.cc @@ -35,6 +35,7 @@ #include "absl/strings/str_cat.h" #include "absl/strings/str_split.h" #include "absl/strings/string_view.h" +#include "absl/types/span.h" #include "zetasql/base/map_util.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status.h" @@ -95,7 +96,7 @@ class TypeToProtoConverter { // Make a proto to represent <struct_type> in <struct_proto>, which is // assumed to be an empty message. absl::Status MakeStructProto(const StructType* struct_type, - const std::string& name, + absl::string_view name, google::protobuf::DescriptorProto* struct_proto); // Make a proto to represent <array_type> in <array_proto>, which is @@ -228,6 +229,12 @@ absl::Status TypeToProtoConverter::MakeFieldDescriptor( zetasql::format, FieldFormat::JSON); break; } + case TYPE_TOKENLIST: { + proto_field->set_type(google::protobuf::FieldDescriptorProto::TYPE_BYTES); + proto_field->mutable_options()->SetExtension(zetasql::format, + FieldFormat::TOKENLIST); + break; + } case TYPE_RANGE: { proto_field->set_type(google::protobuf::FieldDescriptorProto::TYPE_BYTES); const RangeType* range_type = field_type->AsRange(); @@ -318,6 +325,10 @@ absl::Status TypeToProtoConverter::MakeFieldDescriptor( // TODO: fix by moving this logic into Type class. return absl::UnimplementedError( "Extended types are not fully implemented"); + case TYPE_MAP: + // TODO: Implement proto type conversion for Map. + return absl::UnimplementedError( + "Proto type conversion for Map is not yet implemented."); case __TypeKind__switch_must_have_a_default__: case TYPE_UNKNOWN: break; // Error generated below.
@@ -349,7 +360,7 @@ static bool IsValidFieldName(const absl::string_view name) { } absl::Status TypeToProtoConverter::MakeStructProto( - const StructType* struct_type, const std::string& name, + const StructType* struct_type, absl::string_view name, google::protobuf::DescriptorProto* struct_proto) { ZETASQL_RET_CHECK_EQ(struct_proto->field_size(), 0); @@ -599,7 +610,7 @@ absl::Status ConvertArrayToProto( } absl::Status ConvertTableToProto( - const std::vector<std::pair<std::string, const Type*>>& columns, + absl::Span<const std::pair<std::string, const Type*>> columns, bool is_value_table, google::protobuf::FileDescriptorProto* file, const ConvertTypeToProtoOptions& options) { TypeFactory type_factory; diff --git a/zetasql/public/convert_type_to_proto.h b/zetasql/public/convert_type_to_proto.h index c9e7c9dba..d8990cc00 100644 --- a/zetasql/public/convert_type_to_proto.h +++ b/zetasql/public/convert_type_to_proto.h @@ -29,6 +29,7 @@ #include "zetasql/public/type.pb.h" #include "absl/container/flat_hash_map.h" #include "absl/container/node_hash_map.h" +#include "absl/types/span.h" #include "zetasql/base/status.h" namespace zetasql { @@ -236,7 +237,7 @@ absl::Status ConvertTableToProto( // SQL tables to structs. For value tables, <columns> must have // exactly one column. absl::Status ConvertTableToProto( - const std::vector<std::pair<std::string, const Type*>>& columns, + absl::Span<const std::pair<std::string, const Type*>> columns, bool is_value_table, google::protobuf::FileDescriptorProto* file, const ConvertTypeToProtoOptions& options = ConvertTypeToProtoOptions()); diff --git a/zetasql/public/error_helpers.cc b/zetasql/public/error_helpers.cc index a8e0b7c39..d05cf883f 100644 --- a/zetasql/public/error_helpers.cc +++ b/zetasql/public/error_helpers.cc @@ -176,7 +176,7 @@ void ClearErrorLocation(absl::Status* status) { static bool IsWordChar(char c) { return isalnum(c) || c == '_'; } // Return true if <column> (0-based) in <str> starts a word.
-static bool IsWordStart(const std::string& str, int column) { +static bool IsWordStart(absl::string_view str, int column) { ABSL_DCHECK_LT(column, str.size()); if (column == 0 || column >= str.size()) return true; return !IsWordChar(str[column - 1]) && IsWordChar(str[column]); @@ -251,7 +251,7 @@ static void GetTruncatedInputStringInfo(absl::string_view input, // Helper function to return an error string from an error line and column. static std::string GetErrorStringFromErrorLineAndColumn( - const std::string& error_line, const int error_column) { + absl::string_view error_line, const int error_column) { return absl::StrFormat("%s\n%*s^", error_line, error_column, ""); } @@ -398,7 +398,7 @@ absl::Status MaybeUpdateErrorFromPayload(ErrorMessageMode mode, } absl::Status UpdateErrorLocationPayloadWithFilenameIfNotPresent( - const absl::Status& status, const std::string& filename) { + const absl::Status& status, absl::string_view filename) { ErrorLocation error_location; if (filename.empty() || !GetErrorLocation(status, &error_location)) { return status; diff --git a/zetasql/public/error_helpers.h b/zetasql/public/error_helpers.h index a52c1cd05..a1987e0ad 100644 --- a/zetasql/public/error_helpers.h +++ b/zetasql/public/error_helpers.h @@ -121,7 +121,7 @@ absl::Status MaybeUpdateErrorFromPayload(ErrorMessageMode mode, // payload to set the and returns an updated Status with the // updated ErrorLocation. Otherwise, just returns . absl::Status UpdateErrorLocationPayloadWithFilenameIfNotPresent( - const absl::Status& status, const std::string& filename); + const absl::Status& status, absl::string_view filename); // If is OK or if it does not have a InternalErrorLocation payload, // returns . 
Otherwise, replaces the InternalErrorLocation payload by an diff --git a/zetasql/public/error_helpers_test.cc b/zetasql/public/error_helpers_test.cc index b87adb4af..dda3415c6 100644 --- a/zetasql/public/error_helpers_test.cc +++ b/zetasql/public/error_helpers_test.cc @@ -36,6 +36,7 @@ #include "gtest/gtest.h" #include "absl/status/status.h" #include "absl/strings/str_cat.h" +#include "absl/strings/string_view.h" #include "zetasql/base/map_util.h" #include "zetasql/base/source_location.h" #include "zetasql/base/status.h" @@ -204,9 +205,8 @@ TEST(ErrorHelpersTest, ErrorLocationHelpers) { ClearErrorLocation(&status3); } -static void TestGetCaret(const std::string& query, - const ErrorLocation& location, - const std::string& expected_output) { +static void TestGetCaret(absl::string_view query, const ErrorLocation& location, + absl::string_view expected_output) { EXPECT_EQ(expected_output, GetErrorStringWithCaret(query, location)); } diff --git a/zetasql/public/evaluator_test.cc b/zetasql/public/evaluator_test.cc index e6537f143..77419fea4 100644 --- a/zetasql/public/evaluator_test.cc +++ b/zetasql/public/evaluator_test.cc @@ -25,15 +25,12 @@ #include "zetasql/base/logging.h" #include "google/protobuf/descriptor.pb.h" -#include "google/protobuf/dynamic_message.h" -#include "google/protobuf/message.h" -#include "google/protobuf/text_format.h" #include "zetasql/common/evaluator_test_table.h" #include "zetasql/base/testing/status_matchers.h" -#include "zetasql/common/testing/testing_proto_util.h" #include "zetasql/public/analyzer.h" #include "zetasql/public/analyzer_output.h" #include "zetasql/public/builtin_function_options.h" +#include "zetasql/public/catalog.h" #include "zetasql/public/civil_time.h" #include "zetasql/public/evaluator_base.h" #include "zetasql/public/evaluator_table_iterator.h" @@ -57,7 +54,6 @@ #include "zetasql/testdata/populate_sample_tables.h" #include "zetasql/testdata/sample_catalog.h" #include "zetasql/testdata/test_schema.pb.h" -#include 
"zetasql/testing/test_value.h" #include "zetasql/testing/using_test_value.cc" #include "gmock/gmock.h" #include "gtest/gtest.h" @@ -69,10 +65,11 @@ #include "absl/strings/cord.h" #include "absl/strings/str_cat.h" #include "absl/strings/str_format.h" -#include "absl/strings/str_join.h" +#include "absl/strings/string_view.h" +#include "absl/time/civil_time.h" #include "absl/time/time.h" #include "absl/types/span.h" -#include "zetasql/base/stl_util.h" +#include "zetasql/base/map_util.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status_builder.h" #include "zetasql/base/status_macros.h" @@ -1275,7 +1272,7 @@ TEST(EvaluatorTest, GetReferencedColumnsUsingCallback) { PreparedExpression expr("col"); AnalyzerOptions options; options.SetLookupExpressionColumnCallback( - [](const std::string& column_name, const Type** column_type) { + [](absl::string_view column_name, const Type** column_type) { if (column_name == "col") { *column_type = types::Int64Type(); } @@ -1295,7 +1292,7 @@ TEST(EvaluatorTest, GetReferencedColumnsUsingCallbackInMixedCases) { PreparedExpression expr("cOl"); AnalyzerOptions options; options.SetLookupExpressionColumnCallback( - [](const std::string& column_name, const Type** column_type) { + [](absl::string_view column_name, const Type** column_type) { if (column_name == "col") { *column_type = types::Int64Type(); } @@ -1729,7 +1726,7 @@ struct PreparedExpressionFromAST { std::unique_ptr expression; }; PreparedExpressionFromAST ParseToASTAndPrepareOrDie( - const std::string& sql, const AnalyzerOptions& analyzer_options, + absl::string_view sql, const AnalyzerOptions& analyzer_options, TypeFactory* type_factory) { PreparedExpressionFromAST prepared_from_ast; prepared_from_ast.catalog = std::make_unique("foo"); @@ -3069,12 +3066,199 @@ TEST(PreparedQuery, ZETASQL_ASSERT_OK(query.Prepare(analyzer_options, &catalog)); ZETASQL_ASSERT_OK_AND_ASSIGN(int count, query.GetPositionalParameterCount()); EXPECT_EQ(2, count); - 
EXPECT_THAT(query.ExecuteWithPositionalParams({Value::Int64(6), - Value::StringValue("foo")}), - StatusIs( - absl::StatusCode::kInternal, - HasSubstr("Mismatch in number of analyzer parameters versus " - "algebrizer parameters"))); + EXPECT_THAT( + query.ExecuteWithPositionalParams( + {Value::Int64(6), Value::StringValue("foo")}), + StatusIs(absl::StatusCode::kInternal, + HasSubstr("Mismatch in number of analyzer parameters versus " + "algebrizer parameters"))); +} + +class PreparedTvfQuery : public testing::Test { + protected: + void SetUp() override { + analyzer_options_.mutable_language()->EnableLanguageFeature( + FEATURE_TABLE_VALUED_FUNCTIONS); + analyzer_options_.mutable_language()->EnableLanguageFeature( + FEATURE_NAMED_ARGUMENTS); + } + + absl::StatusOr> Execute( + absl::string_view query) { + std::unique_ptr analyzer_output; + ZETASQL_RETURN_IF_ERROR(AnalyzeStatement(query, analyzer_options_, + catalog_.catalog(), &type_factory_, + &analyzer_output)); + prepared_query_ = std::make_unique( + analyzer_output->resolved_statement()->GetAs(), + evaluator_options_); + ZETASQL_RETURN_IF_ERROR(prepared_query_->Prepare(analyzer_options_)); + return prepared_query_->Execute(); + } + + SampleCatalog catalog_; + AnalyzerOptions analyzer_options_; + TypeFactory type_factory_; + EvaluatorOptions evaluator_options_; + std::unique_ptr prepared_query_; +}; + +TEST_F(PreparedTvfQuery, TVF_OptionalRelation_Absent) { + ZETASQL_ASSERT_OK_AND_ASSIGN(auto iter, Execute(R"( + SELECT * FROM tvf_optional_relation(); + )")); + EXPECT_EQ(iter->NumColumns(), 1); + EXPECT_EQ(iter->GetColumnName(0), ""); + ASSERT_TRUE(iter->NextRow()) << iter->Status(); + EXPECT_EQ(iter->GetValue(0), Int64(0)); + EXPECT_FALSE(iter->NextRow()); + ZETASQL_EXPECT_OK(iter->Status()); +} + +TEST_F(PreparedTvfQuery, TVF_OptionalRelation_Present) { + ZETASQL_ASSERT_OK_AND_ASSIGN(auto iter, Execute(R"( + SELECT * FROM tvf_optional_relation((select key as foo from KeyValue)); + )")); + 
EXPECT_EQ(iter->NumColumns(), 1); + EXPECT_EQ(iter->GetColumnName(0), ""); + ASSERT_TRUE(iter->NextRow()) << iter->Status(); + EXPECT_EQ(iter->GetValue(0), Int64(2)); + ASSERT_TRUE(iter->NextRow()) << iter->Status(); + EXPECT_EQ(iter->GetValue(0), Int64(4)); + EXPECT_FALSE(iter->NextRow()); + ZETASQL_EXPECT_OK(iter->Status()); +} + +TEST_F(PreparedTvfQuery, TVF_OptionalArguments_Absent) { + ZETASQL_ASSERT_OK_AND_ASSIGN(auto iter, Execute(R"( + SELECT * FROM tvf_optional_arguments(); + )")); + EXPECT_EQ(iter->NumColumns(), 1); + EXPECT_EQ(iter->GetColumnName(0), "y"); + EXPECT_TRUE(iter->NextRow()); + EXPECT_EQ(iter->GetValue(0), Double(5)); // 1 * 2 + 3 + EXPECT_FALSE(iter->NextRow()); + ZETASQL_EXPECT_OK(iter->Status()); +} + +TEST_F(PreparedTvfQuery, TVF_OptionalArguments_Present) { + ZETASQL_ASSERT_OK_AND_ASSIGN(auto iter, Execute(R"( + SELECT * FROM tvf_optional_arguments(3.0, 2, 1); + )")); + EXPECT_EQ(iter->NumColumns(), 1); + EXPECT_EQ(iter->GetColumnName(0), "y"); + EXPECT_TRUE(iter->NextRow()); + EXPECT_EQ(iter->GetValue(0), Double(7)); // 3 * 2 + 1 + EXPECT_FALSE(iter->NextRow()); + ZETASQL_EXPECT_OK(iter->Status()); +} + +TEST_F(PreparedTvfQuery, TVF_OptionalArguments_NamedOnly) { + ZETASQL_ASSERT_OK_AND_ASSIGN(auto iter, Execute(R"( + SELECT * FROM tvf_optional_arguments(steps=>2, dx=>-1); + )")); + EXPECT_EQ(iter->NumColumns(), 1); + EXPECT_EQ(iter->GetColumnName(0), "y"); + EXPECT_TRUE(iter->NextRow()); + EXPECT_EQ(iter->GetValue(0), Double(5)); // 1 * 2 + 3 + EXPECT_TRUE(iter->NextRow()); + EXPECT_EQ(iter->GetValue(0), Double(3)); // 0 * 2 + 3 + EXPECT_FALSE(iter->NextRow()); + ZETASQL_EXPECT_OK(iter->Status()); +} + +TEST_F(PreparedTvfQuery, TVF_OptionalArguments_All) { + ZETASQL_ASSERT_OK_AND_ASSIGN(auto iter, Execute(R"( + SELECT * FROM tvf_optional_arguments(3.0, 2, 1, steps=>2, dx=>-1); + )")); + EXPECT_EQ(iter->NumColumns(), 1); + EXPECT_EQ(iter->GetColumnName(0), "y"); + EXPECT_TRUE(iter->NextRow()); + EXPECT_EQ(iter->GetValue(0), Double(7)); 
// 3 * 2 + 1 + EXPECT_TRUE(iter->NextRow()); + EXPECT_EQ(iter->GetValue(0), Double(5)); // 2 * 2 + 1 + EXPECT_FALSE(iter->NextRow()); + ZETASQL_EXPECT_OK(iter->Status()); +} + +TEST_F(PreparedTvfQuery, TVF_RepeatedArguments_Absent) { + ZETASQL_ASSERT_OK_AND_ASSIGN(auto iter, Execute(R"( + SELECT * FROM tvf_repeated_arguments(); + )")); + EXPECT_EQ(iter->NumColumns(), 2); + EXPECT_EQ(iter->GetColumnName(0), "key"); + EXPECT_EQ(iter->GetColumnName(1), "value"); + EXPECT_FALSE(iter->NextRow()); + ZETASQL_EXPECT_OK(iter->Status()); +} + +TEST_F(PreparedTvfQuery, TVF_RepeatedArguments_Present) { + ZETASQL_ASSERT_OK_AND_ASSIGN(auto iter, Execute(R"( + SELECT * FROM tvf_repeated_arguments("a", 1, "b", 2, "c", 3); + )")); + EXPECT_EQ(iter->NumColumns(), 2); + EXPECT_EQ(iter->GetColumnName(0), "key"); + EXPECT_EQ(iter->GetColumnName(1), "value"); + EXPECT_TRUE(iter->NextRow()) << iter->Status(); + EXPECT_EQ(iter->GetValue(0), String("a")); + EXPECT_EQ(iter->GetValue(1), Int64(1)); + EXPECT_TRUE(iter->NextRow()) << iter->Status(); + EXPECT_EQ(iter->GetValue(0), String("b")); + EXPECT_EQ(iter->GetValue(1), Int64(2)); + EXPECT_TRUE(iter->NextRow()) << iter->Status(); + EXPECT_EQ(iter->GetValue(0), String("c")); + EXPECT_EQ(iter->GetValue(1), Int64(3)); + EXPECT_FALSE(iter->NextRow()); + ZETASQL_EXPECT_OK(iter->Status()); +} + +TEST_F(PreparedTvfQuery, TVF_DefaultValues) { + ZETASQL_ASSERT_OK_AND_ASSIGN(auto iter, Execute(R"( + SELECT key, value FROM tvf_increment_by(TABLE KeyValue); + )")); + EXPECT_EQ(iter->NumColumns(), 2); + ASSERT_TRUE(iter->NextRow()) << iter->Status(); + EXPECT_EQ(iter->GetValue(0), Int64(2)); + EXPECT_EQ(iter->GetValue(1), String("a")); + ASSERT_TRUE(iter->NextRow()) << iter->Status(); + EXPECT_EQ(iter->GetValue(0), Int64(3)); + EXPECT_EQ(iter->GetValue(1), String("b")); + EXPECT_FALSE(iter->NextRow()); + ZETASQL_EXPECT_OK(iter->Status()); +} + +TEST_F(PreparedTvfQuery, TVF_OutputColumnsProjectionAndOrder) { + ZETASQL_ASSERT_OK_AND_ASSIGN(auto iter, 
Execute(R"( + SELECT baz, bar, foo + FROM tvf_increment_by( + (SELECT value as foo, key AS bar, false as baz, key as qux FROM KeyValue), + 16); + )")); + EXPECT_EQ(iter->NumColumns(), 3); + ASSERT_TRUE(iter->NextRow()) << iter->Status(); + EXPECT_EQ(iter->GetValue(0), Bool(false)); + EXPECT_EQ(iter->GetValue(1), Int64(17)); + EXPECT_EQ(iter->GetValue(2), String("a")); + ASSERT_TRUE(iter->NextRow()) << iter->Status(); + EXPECT_EQ(iter->GetValue(0), Bool(false)); + EXPECT_EQ(iter->GetValue(1), Int64(18)); + EXPECT_EQ(iter->GetValue(2), String("b")); + EXPECT_FALSE(iter->NextRow()); + ZETASQL_EXPECT_OK(iter->Status()); +} + +TEST_F(PreparedTvfQuery, TVF_FixedSchema) { + ZETASQL_ASSERT_OK_AND_ASSIGN(auto iter, Execute(R"( + SELECT sum + FROM tvf_sum_diff((SELECT key as a, key AS b FROM KeyValue)); + )")); + EXPECT_EQ(iter->NumColumns(), 1); + ASSERT_TRUE(iter->NextRow()) << iter->Status(); + EXPECT_EQ(iter->GetValue(0), Int64(2)); + ASSERT_TRUE(iter->NextRow()) << iter->Status(); + EXPECT_EQ(iter->GetValue(0), Int64(4)); + EXPECT_FALSE(iter->NextRow()); + ZETASQL_EXPECT_OK(iter->Status()); } TEST(PreparedQuery, ResolvedQueryValidatedWithCorrectLanguageOptions) { diff --git a/zetasql/public/function.cc b/zetasql/public/function.cc index e773fa388..1bbfafc19 100644 --- a/zetasql/public/function.cc +++ b/zetasql/public/function.cc @@ -39,6 +39,7 @@ #include "absl/strings/str_cat.h" #include "absl/strings/str_join.h" #include "absl/strings/str_replace.h" +#include "absl/strings/string_view.h" #include "absl/types/span.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status_macros.h" @@ -80,6 +81,7 @@ absl::Status FunctionOptions::Deserialize( options->set_supports_clamped_between_modifier( proto.supports_clamped_between_modifier()); options->set_uses_upper_case_sql_name(proto.uses_upper_case_sql_name()); + options->set_may_suppress_side_effects(proto.may_suppress_side_effects()); *result = std::move(options); return absl::OkStatus(); @@ -108,6 +110,7 @@ void 
FunctionOptions::Serialize(FunctionOptionsProto* proto) const { } proto->set_supports_limit(supports_limit); proto->set_supports_null_handling_modifier(supports_null_handling_modifier); + proto->set_may_suppress_side_effects(may_suppress_side_effects); } FunctionOptions& FunctionOptions::set_evaluator( @@ -527,7 +530,7 @@ absl::Status Function::CheckPostResolutionArgumentConstraints( // static const std::string Function::GetGenericNoMatchingFunctionSignatureErrorMessage( - const std::string& qualified_function_name, + absl::string_view qualified_function_name, const std::vector& arguments, ProductMode product_mode, absl::Span argument_names, bool argument_types_on_new_line) { @@ -563,6 +566,9 @@ std::string Function::GetSupportedSignaturesUserFacingText( bool print_template_details) const { // Make a good guess *num_signatures = NumSignatures(); + if (HideSupportedSignatures()) { + return ""; + } if (GetSupportedSignaturesCallback() != nullptr) { return GetSupportedSignaturesCallback()(language_options, *this); } @@ -577,11 +583,17 @@ std::string Function::GetSupportedSignaturesUserFacingText( if (!supported_signatures.empty()) { absl::StrAppend(&supported_signatures, "; "); } - std::vector argument_texts = - signature.GetArgumentsUserFacingTextWithCardinality( - language_options, print_style, print_template_details); - (*num_signatures)++; - absl::StrAppend(&supported_signatures, GetSQL(argument_texts)); + if (HasSignatureTextCallback()) { + absl::StrAppend( + &supported_signatures, + GetSignatureTextCallback()(language_options, *this, signature)); + } else { + std::vector argument_texts = + signature.GetArgumentsUserFacingTextWithCardinality( + language_options, print_style, print_template_details); + (*num_signatures)++; + absl::StrAppend(&supported_signatures, GetSQL(argument_texts)); + } } return supported_signatures; } @@ -614,6 +626,18 @@ const SupportedSignaturesCallback& Function::GetSupportedSignaturesCallback() return 
function_options_.supported_signatures_callback; } +bool Function::HideSupportedSignatures() const { + return function_options_.hide_supported_signatures; +} + +bool Function::HasSignatureTextCallback() const { + return GetSignatureTextCallback() != nullptr; +} + +const SignatureTextCallback& Function::GetSignatureTextCallback() const { + return function_options_.signature_text_callback; +} + const BadArgumentErrorPrefixCallback& Function::GetBadArgumentErrorPrefixCallback() const { return function_options_.bad_argument_error_prefix_callback; diff --git a/zetasql/public/function.h b/zetasql/public/function.h index 74761e104..392040797 100644 --- a/zetasql/public/function.h +++ b/zetasql/public/function.h @@ -25,7 +25,6 @@ #include -#include "google/protobuf/descriptor.h" #include "zetasql/public/function.pb.h" #include "zetasql/public/function_signature.h" #include "zetasql/public/input_argument_type.h" @@ -41,7 +40,6 @@ #include "absl/strings/string_view.h" #include "absl/types/span.h" #include "zetasql/base/map_util.h" -#include "zetasql/base/status.h" // ZetaSQL's Catalog interface class allows an implementing query engine // to define the functions available in the engine. The caller initially @@ -86,6 +84,7 @@ namespace zetasql { class AnalyzerOptions; class Catalog; class CycleDetector; +class EvaluationContext; class Function; class FunctionOptionsProto; class FunctionProto; @@ -141,6 +140,7 @@ using NoMatchingSignatureCallback = std::function&, ProductMode)>; +// TODO: remove. // This callback produces text containing supported function signatures. This // text is used in the user facing messages, i.e. in errors. Example of // text returned by such callback for LIKE operator: @@ -148,6 +148,13 @@ using NoMatchingSignatureCallback = std::function; +// This callback produces text for the supplied signature. This text is +// interleaved with mismatch reason in detailed no matching signature error +// message. 
+// When this is set, ignore SupportedSignaturesCallback, which will be removed. +using SignatureTextCallback = std::function; + // This callback produces a prefix for bad arguments. // An example of the standard prefix is "Argument 1 to FUNC". using BadArgumentErrorPrefixCallback = @@ -168,6 +175,9 @@ class AggregateFunctionEvaluator { public: virtual ~AggregateFunctionEvaluator() = default; + // Sets an evaluation context. + virtual void SetEvaluationContext(EvaluationContext* context) {}; + // Resets the accumulation. This method will be called before any value // accumulation and between groups, and should restore any state variables // needed to keep track of the accumulated result to their initial values. @@ -218,7 +228,7 @@ struct FunctionOptions { static constexpr WindowOrderSupport ORDER_REQUIRED = FunctionEnums::ORDER_REQUIRED; - FunctionOptions() {} + FunctionOptions() = default; // Construct FunctionOptions with support for an OVER clause. FunctionOptions(WindowOrderSupport window_ordering_support_in, @@ -300,6 +310,16 @@ struct FunctionOptions { return *this; } + FunctionOptions& set_hide_supported_signatures(bool value) { + hide_supported_signatures = value; + return *this; + } + + FunctionOptions& set_signature_text_callback(SignatureTextCallback callback) { + signature_text_callback = std::move(callback); + return *this; + } + FunctionOptions& set_bad_argument_error_prefix_callback( BadArgumentErrorPrefixCallback callback) { bad_argument_error_prefix_callback = std::move(callback); @@ -377,6 +397,10 @@ struct FunctionOptions { uses_upper_case_sql_name = value; return *this; } + FunctionOptions& set_may_suppress_side_effects(bool value) { + may_suppress_side_effects = value; + return *this; + } // Add a LanguageFeature that must be enabled for this function to be enabled. // This is used only on built-in functions, and determines whether they will @@ -452,6 +476,10 @@ struct FunctionOptions { // list of signatures supported by the function. 
SupportedSignaturesCallback supported_signatures_callback = nullptr; + // If not nullptr, this callback is used to construct per signature mismatch + // error message. + SignatureTextCallback signature_text_callback = nullptr; + // If not nullptr, this callback is used to construct a custom prefix to the // argument error message when certain argument conditions are not met. BadArgumentErrorPrefixCallback bad_argument_error_prefix_callback = nullptr; @@ -535,10 +563,47 @@ struct FunctionOptions { // the upper-case version of . bool uses_upper_case_sql_name = true; + // In "No matching signature" error message, when set, the list of supported + // signatures and potentially mismatch reasons for each signature are not + // printed. + bool hide_supported_signatures = false; + // A set of LanguageFeatures that need to be enabled for the function to be // loaded in GetBuiltinFunctionsAndTypes. std::set required_language_features; + // Indicates whether this function might suppress deferred side effects, + // usually due to short-circuiting of computations or effects of some of its + // arguments. For example: + // 1. IF(a, b(), c()) ensures that either b() or c() is evaluated, not both. + // 2. ISERROR(f(x)) absorbs computation errors from f(x). + // + // When FEATURE_V_1_4_ENFORCE_CONDITIONAL_EVALUATION is enabled, functions + // with this bit on handle deferred side effects and may suppress them. + // The resolver wraps each argument that has potentially suppressed side + // effects (e.g. an aggregation computed on a different scan) into an + // invocation of the internal $with_side_effects() function. 
+ // + // For example, the SQL expression ISERROR(agg(x)) will be resolved like the + // following, and computed in a ResolvedProjectScan: + // ISERROR($with_side_effects($col1, $err1)) + // + // after the ResolvedAggregateScan computes the aggregate as: + // $col1: = ResolvedDeferredComputedColumn + // expr: agg(x), + // side_effect_column: $err1 (payload representing any deferred + // error) + // + // Note that if the FEATURE_V_1_4_ENFORCE_CONDITIONAL_EVALUATION is off, or + // the computation does not need to be split across scans (i.e., does not get + // refactored out and referenced as a ColumnRef, such as when there are no + // aggregations involved), resolution proceeds normally without any side- + // effect payload columns or $with_side_effects() invocations. + // + // See ResolvedDeferredComputedColumn + // and (broken link) for more details. + bool may_suppress_side_effects = false; + // Copyable. }; @@ -735,6 +800,8 @@ class Function { const NoMatchingSignatureCallback& GetNoMatchingSignatureCallback() const; + bool HideSupportedSignatures() const; + // Returns a relevant (customizable) error message for the no matching // function signature error condition. // @@ -750,12 +817,14 @@ class Function { // Returns a generic error message for the no matching function signature // error condition. static const std::string GetGenericNoMatchingFunctionSignatureErrorMessage( - const std::string& qualified_function_name, + absl::string_view qualified_function_name, const std::vector& arguments, ProductMode product_mode, absl::Span argument_names = {}, bool argument_types_on_new_line = false); const SupportedSignaturesCallback& GetSupportedSignaturesCallback() const; + bool HasSignatureTextCallback() const; + const SignatureTextCallback& GetSignatureTextCallback() const; // Returns a relevant (customizable) user facing text (to be used in error // message) listing supported function signatures.
For example: @@ -835,6 +904,12 @@ class Function { // Must only be true for differential privacy functions. bool SupportsClampedBetweenModifier() const; + // Returns true if this function may suppress deferred side effects. + // See the full comment on the field's definition. + bool MaySuppressSideEffects() const { + return function_options_.may_suppress_side_effects; + } + bool IsDeprecated() const { return function_options_.is_deprecated; } diff --git a/zetasql/public/function_signature.cc b/zetasql/public/function_signature.cc index ca93af36d..887adaf40 100644 --- a/zetasql/public/function_signature.cc +++ b/zetasql/public/function_signature.cc @@ -48,6 +48,7 @@ #include "absl/strings/string_view.h" #include "absl/strings/substitute.h" #include "absl/types/optional.h" +#include "absl/types/span.h" #include "zetasql/base/map_util.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status.h" @@ -1236,8 +1237,8 @@ std::string FunctionSignature::DebugString(absl::string_view function_name, } std::string FunctionSignature::SignaturesToString( - const std::vector& signatures, bool verbose, - absl::string_view prefix, const std::string& separator) { + absl::Span signatures, bool verbose, + absl::string_view prefix, absl::string_view separator) { std::string out; for (const FunctionSignature& signature : signatures) { absl::StrAppend(&out, (out.empty() ? "" : separator), prefix, @@ -1294,7 +1295,7 @@ absl::StatusOr ShouldHaveReturnsClauseInSQLDeclaration( } // namespace std::string FunctionSignature::GetSQLDeclaration( - const std::vector& argument_names, + absl::Span argument_names, ProductMode product_mode) const { std::string out = "("; for (int i = 0; i < arguments_.size(); ++i) { @@ -1633,8 +1634,8 @@ int FunctionSignature::ComputeNumOptionalArguments() const { } void FunctionSignature::SetConcreteResultType(const Type* type) { - result_type_ = type; - result_type_.set_num_occurrences(1); // Make concrete. 
+ result_type_ = + FunctionArgumentType(type, result_type_.options(), /*num_occurrences=*/1); // Recompute since it now may have changed by setting a // concrete result type. is_concrete_ = ComputeIsConcrete(); @@ -1648,7 +1649,7 @@ bool FunctionSignature::HasEnabledRewriteImplementation() const { bool FunctionSignature::HideInSupportedSignatureList( const LanguageOptions& language_options) const { - return IsDeprecated() || IsInternal() || + return IsDeprecated() || IsInternal() || IsHidden() || HasUnsupportedType(language_options) || !options().check_all_required_features_are_enabled( language_options.GetEnabledLanguageFeatures()); diff --git a/zetasql/public/function_signature.h b/zetasql/public/function_signature.h index 456a42ca1..dfc8d1ff8 100644 --- a/zetasql/public/function_signature.h +++ b/zetasql/public/function_signature.h @@ -42,6 +42,7 @@ #include "absl/status/statusor.h" #include "absl/strings/string_view.h" #include "absl/types/optional.h" +#include "absl/types/span.h" #include "zetasql/base/map_util.h" #include "zetasql/base/status.h" @@ -1026,6 +1027,13 @@ class FunctionSignatureOptions { } bool is_internal() const { return is_internal_; } + // Setter/getter for whether or not this signature is hidden from the user. + FunctionSignatureOptions& set_is_hidden(bool value) { + is_hidden_ = value; + return *this; + } + bool is_hidden() const { return is_hidden_; } + // Setters/getters for additional deprecation warnings associated with // this function signature. These have DeprecationWarning protos attached. The // analyzer will propagate these warnings to any statement that invokes this @@ -1184,6 +1192,8 @@ class FunctionSignatureOptions { bool is_internal_ = false; + bool is_hidden_ = false; + // When true, the signature uses the same signature (context) id as another // signature with different function name, and this signature's function name // is an alias. 
This flag is useful when trying to resolve signature id to @@ -1388,8 +1398,8 @@ class FunctionSignature { // signatures. Each signature string is prefixed with , and // appears between each signature string. static std::string SignaturesToString( - const std::vector& signatures, bool verbose = false, - absl::string_view prefix = " ", const std::string& separator = "\n"); + absl::Span signatures, bool verbose = false, + absl::string_view prefix = " ", absl::string_view separator = "\n"); // Get the SQL declaration for this signature, including all options. // For each argument in the signature, the name will be taken from the @@ -1397,13 +1407,15 @@ class FunctionSignature { // will result in a signature with just type names. // The result is formatted as "(arg_name type, ...) RETURNS type", which // is valid to use in CREATE FUNCTION, DROP FUNCTION, etc, if possible. - std::string GetSQLDeclaration(const std::vector& argument_names, + std::string GetSQLDeclaration(absl::Span argument_names, ProductMode product_mode) const; bool IsDeprecated() const { return options_.is_deprecated(); } bool IsInternal() const { return options_.is_internal(); } + bool IsHidden() const { return options_.is_hidden(); } + void SetIsDeprecated(bool value) { options_.set_is_deprecated(value); } @@ -1456,8 +1468,8 @@ class FunctionSignature { // Returns true if this signature should be hidden in the supported signature // list in signature mismatch error message. - // Signatures are hidden if they are internal, deprecated, or unsupported - // according to LanguageOptions. + // Signatures are hidden if they are internal, deprecated, explicitly hidden or + // unsupported according to LanguageOptions.
bool HideInSupportedSignatureList( const LanguageOptions& language_options) const; diff --git a/zetasql/public/function_signature_test.cc b/zetasql/public/function_signature_test.cc index 9f65f6de4..a9ba56822 100644 --- a/zetasql/public/function_signature_test.cc +++ b/zetasql/public/function_signature_test.cc @@ -38,6 +38,7 @@ #include "gtest/gtest.h" #include "absl/status/status.h" #include "absl/strings/str_cat.h" +#include "absl/types/span.h" #include "zetasql/base/status.h" namespace zetasql { @@ -1909,7 +1910,7 @@ TEST(FunctionSignatureTests, TestArgumentConstraints) { auto nonnull_constraints_callback = [](const FunctionSignature& signature, - const std::vector& arguments) -> std::string { + absl::Span arguments) -> std::string { if (signature.NumConcreteArguments() != arguments.size()) { return absl::StrCat("Expecting ", signature.NumConcreteArguments(), " arguments, but got ", arguments.size()); @@ -2149,4 +2150,18 @@ TEST(FunctionSignatureTests, SignatureSupportsArgumentAlias) { /*context_id=*/-1); EXPECT_FALSE(SignatureSupportsArgumentAliases(unsupport_alias)); } + +TEST(FunctionSignatureTests, SetConcreteResultTypePreservesArgumentOptions) { + FunctionSignature result_arg_has_options( + FunctionArgumentType( + ARG_TYPE_ANY_1, + FunctionArgumentTypeOptions().set_uses_array_element_for_collation()), + {ARG_TYPE_ANY_1, ARG_TYPE_ANY_1}, + /*context_id=*/-1); + result_arg_has_options.SetConcreteResultType(types::Int64Type()); + EXPECT_TRUE(result_arg_has_options.result_type() + .options() + .uses_array_element_for_collation()); +} + } // namespace zetasql diff --git a/zetasql/public/function_test.cc b/zetasql/public/function_test.cc index 5f91d622b..2cfd253e1 100644 --- a/zetasql/public/function_test.cc +++ b/zetasql/public/function_test.cc @@ -16,7 +16,6 @@ #include "zetasql/public/function.h" -#include #include #include #include @@ -30,6 +29,7 @@ #include "zetasql/common/testing/testing_proto_util.h" #include "zetasql/proto/function.pb.h" #include 
"zetasql/public/builtin_function.h"
+#include "zetasql/public/builtin_function_options.h"
 #include "zetasql/public/deprecation_warning.pb.h"
 #include "zetasql/public/error_location.pb.h"
 #include "zetasql/public/function.pb.h"
@@ -45,7 +45,11 @@
 #include "zetasql/resolved_ast/resolved_ast.h"
 #include "gmock/gmock.h"
 #include "gtest/gtest.h"
+#include "absl/container/flat_hash_map.h"
+#include "absl/container/flat_hash_set.h"
+#include "zetasql/base/check.h"
 #include "absl/strings/str_join.h"
+#include "absl/strings/string_view.h"

 // Note - test coverage for the 'Function' class interface is primarily
 // provided by builtin_function_test.cc which instantiates the concrete
@@ -213,6 +217,49 @@ TEST(SimpleFunctionTests, WindowSupportTests) {
   EXPECT_FALSE(analytic_function.RequiresWindowOrdering());
 }

+static void AddFunctionToSet(
+    const absl::flat_hash_map<std::string, std::unique_ptr<Function>>&
+        functions,
+    absl::string_view name,
+    absl::flat_hash_set<const Function*>& scoping_functions) {
+  auto it = functions.find(name);
+  ASSERT_TRUE(it != functions.end()) << "Function not found: " << name;
+  scoping_functions.insert(it->second.get());
+}
+
+TEST(ConditionalEvaluationFunctionsTest,
+     VerifyListOfBuiltinFunctionsScopingSideEffects) {
+  TypeFactory type_factory;
+  LanguageOptions language_options;
+  // Even functions that are "in_development" should serialize properly.
+  language_options.EnableMaximumLanguageFeaturesForDevelopment();
+
+  absl::flat_hash_map<std::string, std::unique_ptr<Function>> functions;
+  absl::flat_hash_map<std::string, const Type*> types_ignored;
+  ZETASQL_ASSERT_OK(
+      GetBuiltinFunctionsAndTypes(BuiltinFunctionOptions(language_options),
+                                  type_factory, functions, types_ignored));
+
+  absl::flat_hash_set<const Function*> scoping_functions;
+  AddFunctionToSet(functions, "if", scoping_functions);
+  AddFunctionToSet(functions, "ifnull", scoping_functions);
+  AddFunctionToSet(functions, "$case_with_value", scoping_functions);
+  AddFunctionToSet(functions, "$case_no_value", scoping_functions);
+  AddFunctionToSet(functions, "coalesce", scoping_functions);
+  AddFunctionToSet(functions, "iferror", scoping_functions);
+  AddFunctionToSet(functions, "iserror", scoping_functions);
+  AddFunctionToSet(functions, "nulliferror", scoping_functions);
+
+  EXPECT_EQ(scoping_functions.size(), 8);
+  for (const auto& [_, function] : functions) {
+    if (scoping_functions.contains(function)) {
+      EXPECT_TRUE(function->MaySuppressSideEffects());
+    } else {
+      EXPECT_FALSE(function->MaySuppressSideEffects());
+    }
+  }
+}
+
 class AnyAndRelatedTypeSimpleFunctionTests
     : public ::testing::TestWithParam {};
@@ -450,7 +497,7 @@ TEST_F(FunctionSerializationTests, BuiltinFunctions) {
   // Test a function with a signature that triggers a deprecation warning.
ASSERT_FALSE(functions.empty()); Function* function = functions.begin()->second.get(); - ASSERT_GT(function->NumSignatures(), 0); + ASSERT_GT(function->NumSignatures(), 0) << function->Name(); FunctionSignature new_signature = *function->GetSignature(0); new_signature.SetAdditionalDeprecationWarnings({CreateDeprecationWarning()}); CheckSerializationAndDeserialization(*function); diff --git a/zetasql/public/functions/BUILD b/zetasql/public/functions/BUILD index 073c0c3d3..230ce3c8c 100644 --- a/zetasql/public/functions/BUILD +++ b/zetasql/public/functions/BUILD @@ -192,35 +192,6 @@ cc_test( ], ) -cc_library( - name = "like", - srcs = ["like.cc"], - hdrs = ["like.h"], - deps = [ - "//zetasql/base", - "//zetasql/base:status", - "//zetasql/public:type_cc_proto", - "@com_google_absl//absl/base:core_headers", - "@com_google_absl//absl/memory", - "@com_google_absl//absl/status:statusor", - "@com_google_absl//absl/strings", - "@com_googlesource_code_re2//:re2", - ], -) - -cc_test( - name = "like_test", - size = "small", - srcs = ["like_test.cc"], - deps = [ - ":like", - "//zetasql/base:status", - "//zetasql/base/testing:zetasql_gtest_main", - "@com_google_absl//absl/strings", - "@com_googlesource_code_re2//:re2", - ], -) - cc_library( name = "format_max_output_width", srcs = ["format_max_output_width.cc"], @@ -282,6 +253,7 @@ cc_test( "//zetasql/base/testing:zetasql_gtest_main", "//zetasql/compliance:functions_testlib", "//zetasql/public:civil_time", + "//zetasql/public:value", "//zetasql/testdata:test_schema_cc_proto", "//zetasql/testing:test_function", "//zetasql/testing:test_value", @@ -292,6 +264,7 @@ cc_test( "@com_google_absl//absl/random", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", ], ) @@ -307,6 +280,7 @@ cc_test( "//zetasql/common/testing:testing_proto_util", "//zetasql/testdata:test_schema_cc_proto", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", ], ) @@ -545,6 
+519,7 @@ cc_library( "@com_google_absl//absl/strings", "@com_google_absl//absl/strings:str_format", "@com_google_absl//absl/time", + "@com_google_absl//absl/types:span", "@icu//:headers", ], ) @@ -771,6 +746,7 @@ cc_library( "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", "@com_google_absl//absl/types:optional", + "@com_google_absl//absl/types:span", "@com_googlesource_code_re2//:re2", "@icu//:headers", ], @@ -914,6 +890,7 @@ cc_library( srcs = ["json_internal.cc"], hdrs = ["json_internal.h"], deps = [ + "//zetasql/base:case", "//zetasql/base:check", "//zetasql/base:status", "//zetasql/base:strings", @@ -939,17 +916,22 @@ cc_library( ":json_format", ":json_internal", ":to_json", + "//zetasql/base:lossless_convert", "//zetasql/base:ret_check", "//zetasql/base:status", "//zetasql/base:strings", "//zetasql/common:errors", - "//zetasql/common:int_ops_util", "//zetasql/public:json_value", + "//zetasql/public:language_options", + "//zetasql/public:numeric_value", + "//zetasql/public:value", "@com_google_absl//absl/base:core_headers", + "@com_google_absl//absl/functional:function_ref", "@com_google_absl//absl/memory", "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", "@com_googlesource_code_re2//:re2", ], ) @@ -977,6 +959,7 @@ cc_test( "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", "@com_google_absl//absl/strings:str_format", + "@com_google_absl//absl/types:span", ], ) @@ -988,6 +971,7 @@ cc_library( ":arithmetics", ":math", "//zetasql/base:edit_distance", + "//zetasql/base:ret_check", "//zetasql/base:status", "//zetasql/public:value", "//zetasql/public/types", @@ -997,6 +981,7 @@ cc_library( "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", "@icu//:headers", ], ) @@ -1017,6 +1002,7 @@ cc_test( 
"@com_google_absl//absl/random:distributions", "@com_google_absl//absl/status", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", ], ) @@ -1065,6 +1051,7 @@ cc_library( "//zetasql/common:utf_util", "//zetasql/public:civil_time", "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/strings", "@com_google_absl//absl/strings:cord", "@com_google_googleapis//google/type:latlng_cc_proto", "@com_google_googleapis//google/type:timeofday_cc_proto", @@ -1255,6 +1242,9 @@ cc_library( "//zetasql/common:errors", "//zetasql/public:civil_time", "//zetasql/public:interval_value", + "//zetasql/public:type_cc_proto", + "//zetasql/public:value", + "//zetasql/public/types", "@com_google_absl//absl/base:core_headers", "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", @@ -1275,6 +1265,7 @@ cc_test( "//zetasql/base/testing:status_matchers", "//zetasql/base/testing:zetasql_gtest_main", "//zetasql/compliance:functions_testlib", + "//zetasql/public:civil_time", "//zetasql/public:interval_value", "//zetasql/public:options_cc_proto", "//zetasql/public:type", @@ -1289,3 +1280,32 @@ cc_test( "@com_google_absl//absl/time", ], ) + +cc_library( + name = "like", + srcs = ["like.cc"], + hdrs = ["like.h"], + deps = [ + "//zetasql/base", + "//zetasql/base:status", + "//zetasql/public:type_cc_proto", + "@com_google_absl//absl/base:core_headers", + "@com_google_absl//absl/memory", + "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/strings", + "@com_googlesource_code_re2//:re2", + ], +) + +cc_test( + name = "like_test", + size = "small", + srcs = ["like_test.cc"], + deps = [ + ":like", + "//zetasql/base:status", + "//zetasql/base/testing:zetasql_gtest_main", + "@com_google_absl//absl/strings", + "@com_googlesource_code_re2//:re2", + ], +) diff --git a/zetasql/public/functions/cast_date_time.cc b/zetasql/public/functions/cast_date_time.cc index c2f2c6d15..c05700bb2 100644 --- a/zetasql/public/functions/cast_date_time.cc +++ 
b/zetasql/public/functions/cast_date_time.cc
@@ -52,6 +52,7 @@
 #include "absl/time/civil_time.h"
 #include "absl/time/clock.h"
 #include "absl/time/time.h"
+#include "absl/types/span.h"
 #include "unicode/uchar.h"
 #include "unicode/utf8.h"
 #include "zetasql/base/general_trie.h"
@@ -1228,8 +1229,8 @@ absl::Status ConductBasicFormatStringChecks(absl::string_view format_string) {
 // Validates the elements in <format_elements> with specific rules, and also
 // makes sure they are not of any category in <invalid_categories>.
 absl::Status ValidateDateTimeFormatElements(
-    const std::vector<DateTimeFormatElement>& format_elements,
-    const std::vector<FormatElementCategory>& invalid_categories,
+    absl::Span<const DateTimeFormatElement> format_elements,
+    absl::Span<const FormatElementCategory> invalid_categories,
     absl::string_view output_type_name) {
   CategoryToElementsMap category_to_elements_map;
   TypeToElementMap type_to_element_map;
@@ -1329,12 +1330,12 @@ absl::Status ParseTimeWithFormatElements(
 }

 absl::Status ValidateDateTimeFormatElementsForTimestampType(
-    const std::vector<DateTimeFormatElement>& format_elements) {
+    absl::Span<const DateTimeFormatElement> format_elements) {
   return ValidateDateTimeFormatElements(format_elements, {}, "TIMESTAMP");
 }

 absl::Status ValidateDateTimeFormatElementsForDateType(
-    const std::vector<DateTimeFormatElement>& format_elements) {
+    absl::Span<const DateTimeFormatElement> format_elements) {
   return ValidateDateTimeFormatElements(
       format_elements,
       {FormatElementCategory::kHour, FormatElementCategory::kMinute,
@@ -1356,7 +1357,7 @@ absl::Status ValidateDateTimeFormatElementsForTimeType(
 }

 absl::Status ValidateDateTimeFormatElementsForDatetimeType(
-    const std::vector<DateTimeFormatElement>& format_elements) {
+    absl::Span<const DateTimeFormatElement> format_elements) {
   return ValidateDateTimeFormatElements(
       format_elements, {FormatElementCategory::kTimeZone}, "DATETIME");
 }
diff --git a/zetasql/public/functions/cast_date_time_test.cc b/zetasql/public/functions/cast_date_time_test.cc
index b8c49745b..f5e38bb29 100644
--- a/zetasql/public/functions/cast_date_time_test.cc
+++ b/zetasql/public/functions/cast_date_time_test.cc
@@ -36,8 +36,10 @@
 #include "absl/functional/bind_front.h"
 #include "absl/strings/match.h"
 #include
"absl/strings/str_cat.h"
+#include "absl/strings/string_view.h"
 #include "absl/strings/substitute.h"
 #include "absl/time/time.h"
+#include "absl/types/span.h"
 #include "zetasql/base/map_util.h"

 namespace zetasql {
@@ -55,7 +57,7 @@ using cast_date_time_internal::GetDateTimeFormatElements;

 static void ExecuteDateTimeFormatElementParsingTest(
     absl::string_view format_str,
-    const std::vector<DateTimeFormatElement>& expected_format_elements,
+    absl::Span<const DateTimeFormatElement> expected_format_elements,
     std::string error_message) {
   std::string upper_format_str_temp = absl::AsciiStrToUpper(format_str);
@@ -680,7 +682,7 @@ static void TestCastStringToDatetime(const FunctionTestCall& test) {
     };
   };
   auto CastStringToDatetimeResultValidator =
-      [](const Value& expected_result, const std::string& actual_string) {
+      [](const Value& expected_result, absl::string_view actual_string) {
         return expected_result.type_kind() == TYPE_DATETIME &&
                expected_result.DebugString() == actual_string;
       };
diff --git a/zetasql/public/functions/common_proto.cc b/zetasql/public/functions/common_proto.cc
index 41c95003e..98c543864 100644
--- a/zetasql/public/functions/common_proto.cc
+++ b/zetasql/public/functions/common_proto.cc
@@ -23,6 +23,7 @@
 #include "zetasql/common/errors.h"
 #include "zetasql/public/civil_time.h"
 #include "zetasql/public/functions/date_time_util.h"
+#include "absl/strings/str_cat.h"
 #include "zetasql/base/status.h"
 #include "zetasql/base/status_builder.h"
@@ -71,7 +72,7 @@ absl::Status ConvertProto3TimeOfDayToTime(const google::type::TimeOfDay& input,
                                           TimeValue* output) {
   if (!IsValidProto3TimeOfDay(input)) {
     return MakeEvalError() << "Invalid Proto3 TimeOfDay input: "
-                           << input.DebugString();
+                           << absl::StrCat(input);
   }
   if (scale == kMicroseconds) {
     *output = TimeValue::FromHMSAndMicros(input.hours(), input.minutes(),
diff --git a/zetasql/public/functions/common_proto.h b/zetasql/public/functions/common_proto.h
index abd59ed6a..7e387b1e3 100644
--- a/zetasql/public/functions/common_proto.h
+++
b/zetasql/public/functions/common_proto.h @@ -35,6 +35,7 @@ #include "zetasql/public/functions/date_time_util.h" #include "absl/status/statusor.h" #include "absl/strings/cord.h" +#include "absl/strings/str_cat.h" #include "zetasql/base/status.h" #include "zetasql/base/status_builder.h" @@ -87,7 +88,7 @@ inline absl::Status ConvertProto3WrapperToType( if (!IsWellFormedUTF8(input.value())) { return MakeEvalError() << "Invalid conversion: ZetaSQL strings must be UTF8 encoded" - << input.DebugString(); + << absl::StrCat(input); } *output = input.value(); return absl::OkStatus(); diff --git a/zetasql/public/functions/date_time_util.cc b/zetasql/public/functions/date_time_util.cc index b579d10aa..ee1d42d05 100644 --- a/zetasql/public/functions/date_time_util.cc +++ b/zetasql/public/functions/date_time_util.cc @@ -2936,7 +2936,7 @@ absl::Status ConvertProto3TimestampToTimestamp( ZETASQL_RETURN_IF_ERROR(ConvertProto3TimestampToTimestamp(input_timestamp, &time)); if (!FromTime(time, output_scale, output)) { return MakeEvalError() << "Invalid Proto3 Timestamp input: " - << input_timestamp.DebugString(); + << absl::StrCat(input_timestamp); } return absl::OkStatus(); } @@ -2945,7 +2945,8 @@ absl::Status ConvertProto3TimestampToTimestamp( const google::protobuf::Timestamp& input_timestamp, absl::Time* output) { auto result_or = zetasql_base::DecodeGoogleApiProto(input_timestamp); if (!result_or.ok()) { - return MakeEvalError() << "Invalid Proto3 Timestamp input: " + return MakeEvalError() + << "Invalid Proto3 Timestamp input: " << input_timestamp.DebugString(); } *output = result_or.value(); diff --git a/zetasql/public/functions/distance.cc b/zetasql/public/functions/distance.cc index 2284b6168..a765f3650 100644 --- a/zetasql/public/functions/distance.cc +++ b/zetasql/public/functions/distance.cc @@ -35,11 +35,14 @@ #include "absl/functional/function_ref.h" #include "absl/status/status.h" #include "absl/status/statusor.h" +#include "absl/strings/str_cat.h" #include 
"absl/strings/string_view.h" #include "absl/strings/substitute.h" +#include "absl/types/span.h" #include "unicode/umachine.h" #include "unicode/utf8.h" #include "zetasql/base/edit_distance.h" +#include "zetasql/base/ret_check.h" #include "zetasql/base/status_macros.h" namespace zetasql { @@ -76,7 +79,7 @@ absl::Status Apply( // Populates a sparse vector content into its representation as a map and a list // of all non-zero indices. template -absl::Status PopulateSparseInput(const std::vector& input_array, +absl::Status PopulateSparseInput(absl::Span input_array, absl::flat_hash_map& map, absl::btree_set& all_indices) { map.reserve(input_array.size()); @@ -97,7 +100,7 @@ absl::Status PopulateSparseInput(const std::vector& input_array, double value = element.field(1).Get(); if (!map.emplace(index, value).second) { return absl::InvalidArgumentError(absl::Substitute( - "Duplicate index $0 found in the input array.", index)); + "Duplicate index $0 found in the input array", index)); } all_indices.emplace(index); } @@ -146,7 +149,7 @@ absl::StatusOr ComputeCosineDistance( if (len_a == 0 || len_b == 0) { return absl::InvalidArgumentError( - "Cannot compute cosine distance against zero vector."); + "Cannot compute cosine distance against zero vector"); } double sqrt_len_a; @@ -242,9 +245,124 @@ absl::StatusOr ComputeEuclideanDistanceFunctionSparse( return ComputeEuclideanDistance(array_elements_supplier); } +template >> +absl::StatusOr ComputeDotProduct( + absl::FunctionRef>>()> + array_elements_supplier) { + double result = 0; + while (true) { + std::optional> paired_elements; + ZETASQL_ASSIGN_OR_RETURN(paired_elements, array_elements_supplier()); + if (!paired_elements.has_value()) { + break; + } + double left_magnitude = (double)paired_elements.value().first; + double right_magnitude = (double)paired_elements.value().second; + + double magnitude_product; + ZETASQL_RETURN_IF_ERROR(Apply(Multiply, left_magnitude, right_magnitude, + &magnitude_product)); + 
ZETASQL_RETURN_IF_ERROR(Apply(Add, result, magnitude_product, &result)); + } + + return Value::Double(result); +} + +template >> +absl::StatusOr ComputeManhattanDistance( + absl::FunctionRef>>()> + array_elements_supplier) { + double result = 0; + while (true) { + std::optional> paired_elements; + ZETASQL_ASSIGN_OR_RETURN(paired_elements, array_elements_supplier()); + if (!paired_elements.has_value()) { + break; + } + double left_magnitude = (double)paired_elements.value().first; + double right_magnitude = (double)paired_elements.value().second; + + double magnitude_difference; + ZETASQL_RETURN_IF_ERROR(Apply(Subtract, left_magnitude, right_magnitude, + &magnitude_difference)); + ZETASQL_RETURN_IF_ERROR( + Apply(Abs, magnitude_difference, &magnitude_difference)); + + ZETASQL_RETURN_IF_ERROR(Apply(Add, result, magnitude_difference, &result)); + } + + return Value::Double(result); +} + +template >> +absl::StatusOr ComputeL1Norm( + absl::FunctionRef>()> + array_elements_supplier) { + double result = 0; + while (true) { + std::optional element; + ZETASQL_ASSIGN_OR_RETURN(element, array_elements_supplier()); + if (!element.has_value()) { + break; + } + double element_value = (double)element.value(); + + ZETASQL_RETURN_IF_ERROR(Apply(Abs, element_value, &element_value)); + + ZETASQL_RETURN_IF_ERROR(Apply(Add, result, element_value, &result)); + } + + return Value::Double(result); +} + +template >> +absl::StatusOr ComputeL2Norm( + absl::FunctionRef>()> + array_elements_supplier) { + double result = 0; + while (true) { + std::optional element; + ZETASQL_ASSIGN_OR_RETURN(element, array_elements_supplier()); + if (!element.has_value()) { + break; + } + double element_value = (double)element.value(); + + ZETASQL_RETURN_IF_ERROR( + Apply(Multiply, element_value, element_value, &element_value)); + + ZETASQL_RETURN_IF_ERROR(Apply(Add, result, element_value, &result)); + } + + ZETASQL_RETURN_IF_ERROR(Apply(Sqrt, result, &result)); + + return Value::Double(result); +} + } // 
namespace -template >> +template || + std::is_integral_v>> +std::function>()> +MakeZippedArrayElementsSupplier(const std::vector& vector) { + return [v1_it = vector.begin(), + v1_end = vector.end()]() mutable -> absl::StatusOr> { + if (v1_it == v1_end) { + return std::nullopt; + } + + if (v1_it->is_null()) { + return absl::OutOfRangeError("NULL array element"); + } + + auto element = v1_it->Get(); + v1_it++; + return element; + }; +} + +template || + std::is_integral_v>> std::function>>()> MakeZippedArrayElementsSupplier(const std::vector& vector1, const std::vector& vector2) { @@ -256,7 +374,9 @@ MakeZippedArrayElementsSupplier(const std::vector& vector1, } if (v1_it->is_null() || v2_it->is_null()) { - return absl::InvalidArgumentError("NULL array element"); + return absl::OutOfRangeError( + absl::StrCat("NULL array element in ", + v1_it->is_null() ? "first" : "second", " argument")); } auto pair = std::make_pair(v1_it->Get(), v2_it->Get()); @@ -269,7 +389,7 @@ MakeZippedArrayElementsSupplier(const std::vector& vector1, absl::StatusOr CosineDistanceDense(Value vector1, Value vector2) { if (vector1.num_elements() != vector2.num_elements()) { return absl::InvalidArgumentError( - absl::Substitute("Array length mismatch: $0 and $1.", + absl::Substitute("Array length mismatch: $0 and $1", vector1.num_elements(), vector2.num_elements())); } if (vector1.type()->AsArray()->element_type() == types::DoubleType()) { @@ -295,7 +415,7 @@ absl::StatusOr CosineDistanceSparseStringKey(Value vector1, absl::StatusOr EuclideanDistanceDense(Value vector1, Value vector2) { if (vector1.num_elements() != vector2.num_elements()) { return absl::InvalidArgumentError( - absl::Substitute("Array length mismatch: $0 and $1.", + absl::Substitute("Array length mismatch: $0 and $1", vector1.num_elements(), vector2.num_elements())); } @@ -320,6 +440,93 @@ absl::StatusOr EuclideanDistanceSparseStringKey(Value vector1, return ComputeEuclideanDistanceFunctionSparse(vector1, vector2); } 
+absl::StatusOr DotProduct(Value vector1, Value vector2) { + if (vector1.num_elements() != vector2.num_elements()) { + return absl::OutOfRangeError( + absl::Substitute("Array length mismatch: $0 and $1", + vector1.num_elements(), vector2.num_elements())); + } + + const Type* element_type = vector1.type()->AsArray()->element_type(); + + if (element_type->IsInt64()) { + return ComputeDotProduct(MakeZippedArrayElementsSupplier( + vector1.elements(), vector2.elements())); + } else if (element_type->IsFloat()) { + return ComputeDotProduct(MakeZippedArrayElementsSupplier( + vector1.elements(), vector2.elements())); + } else if (element_type->IsDouble()) { + return ComputeDotProduct(MakeZippedArrayElementsSupplier( + vector1.elements(), vector2.elements())); + } else { + ZETASQL_RET_CHECK_FAIL() << "Unexpected array element type: " + << element_type->DebugString(); + } +} + +absl::StatusOr ManhattanDistance(Value vector1, Value vector2) { + if (vector1.num_elements() != vector2.num_elements()) { + return absl::OutOfRangeError( + absl::Substitute("Array length mismatch: $0 and $1", + vector1.num_elements(), vector2.num_elements())); + } + + const Type* element_type = vector1.type()->AsArray()->element_type(); + + if (element_type->IsInt64()) { + return ComputeManhattanDistance( + MakeZippedArrayElementsSupplier(vector1.elements(), + vector2.elements())); + } else if (element_type->IsFloat()) { + return ComputeManhattanDistance( + MakeZippedArrayElementsSupplier(vector1.elements(), + vector2.elements())); + } else if (element_type->IsDouble()) { + return ComputeManhattanDistance( + MakeZippedArrayElementsSupplier(vector1.elements(), + vector2.elements())); + } else { + ZETASQL_RET_CHECK_FAIL() << "Unexpected array element type: " + << element_type->DebugString(); + } +} + +absl::StatusOr L1Norm(Value vector) { + const Type* element_type = vector.type()->AsArray()->element_type(); + + if (element_type->IsInt64()) { + return ComputeL1Norm( + 
MakeZippedArrayElementsSupplier(vector.elements())); + } else if (element_type == types::FloatType()) { + return ComputeL1Norm( + MakeZippedArrayElementsSupplier(vector.elements())); + } else if (element_type == types::DoubleType()) { + return ComputeL1Norm( + MakeZippedArrayElementsSupplier(vector.elements())); + } else { + ZETASQL_RET_CHECK_FAIL() << "Unexpected array element type: " + << element_type->DebugString(); + } +} + +absl::StatusOr L2Norm(Value vector) { + const Type* element_type = vector.type()->AsArray()->element_type(); + + if (element_type->IsInt64()) { + return ComputeL2Norm( + MakeZippedArrayElementsSupplier(vector.elements())); + } else if (element_type == types::FloatType()) { + return ComputeL2Norm( + MakeZippedArrayElementsSupplier(vector.elements())); + } else if (element_type == types::DoubleType()) { + return ComputeL2Norm( + MakeZippedArrayElementsSupplier(vector.elements())); + } else { + ZETASQL_RET_CHECK_FAIL() << "Unexpected array element type: " + << element_type->DebugString(); + } +} + absl::StatusOr> GetUtf8CodePoints(absl::string_view s) { std::vector result; int32_t offset = 0; @@ -327,7 +534,7 @@ absl::StatusOr> GetUtf8CodePoints(absl::string_view s) { UChar32 character; U8_NEXT(s, offset, s.size(), character); if (character < 0) { - return absl::InvalidArgumentError("invalid UTF8 string"); + return absl::InvalidArgumentError("Invalid UTF8 string"); } result.push_back(character); } @@ -342,7 +549,7 @@ absl::StatusOr EditDistance( ? max_distance_value.value() : std::max(s0.size(), s1.size()); if (max_distance < 0) { - return absl::InvalidArgumentError("max_distance must be non-negative"); + return absl::InvalidArgumentError("Max distance must be non-negative"); } const int64_t max_possible_distance = std::max(s0.size(), s1.size()); if (max_distance > max_possible_distance) { @@ -367,7 +574,7 @@ absl::StatusOr EditDistanceBytes( ? 
max_distance_value.value()
                               : std::max(s0.size(), s1.size());
   if (max_distance < 0) {
-    return absl::InvalidArgumentError("max_distance must be non-negative");
+    return absl::InvalidArgumentError("Max distance must be non-negative");
   }
   const int64_t max_possible_distance = std::max(s0.size(), s1.size());
   if (max_distance > max_possible_distance) {
diff --git a/zetasql/public/functions/distance.h b/zetasql/public/functions/distance.h
index 01fb3000a..57bd7844f 100644
--- a/zetasql/public/functions/distance.h
+++ b/zetasql/public/functions/distance.h
@@ -17,7 +17,7 @@
 #ifndef ZETASQL_PUBLIC_FUNCTIONS_DISTANCE_H_
 #define ZETASQL_PUBLIC_FUNCTIONS_DISTANCE_H_

-// This code contains implementations for the *_DISTANCE functions.
+// This code contains implementations for distance functions.

 #include

@@ -45,22 +45,46 @@ absl::StatusOr<Value> CosineDistanceSparseStringKey(Value vector1,
                                                     Value vector2);

 // Implementation of:
-// * COSINE_DISTANCE(ARRAY<FLOAT>, ARRAY<FLOAT>) -> DOUBLE
-// * COSINE_DISTANCE(ARRAY<DOUBLE>, ARRAY<DOUBLE>) -> DOUBLE
+// * EUCLIDEAN_DISTANCE(ARRAY<FLOAT>, ARRAY<FLOAT>) -> DOUBLE
+// * EUCLIDEAN_DISTANCE(ARRAY<DOUBLE>, ARRAY<DOUBLE>) -> DOUBLE
 absl::StatusOr<Value> EuclideanDistanceDense(Value vector1, Value vector2);

 // Implementation of:
-// COSINE_DISTANCE(ARRAY>, ARRAY>, ARRAY>) -> DOUBLE
 absl::StatusOr<Value> EuclideanDistanceSparseInt64Key(Value vector1,
                                                       Value vector2);

 // Implementation of:
-// COSINE_DISTANCE(ARRAY>, ARRAY>, ARRAY>) -> DOUBLE
 absl::StatusOr<Value> EuclideanDistanceSparseStringKey(Value vector1,
                                                        Value vector2);

+// Implementation of:
+// * DOT_PRODUCT(ARRAY<INT64>, ARRAY<INT64>) -> DOUBLE
+// * DOT_PRODUCT(ARRAY<FLOAT>, ARRAY<FLOAT>) -> DOUBLE
+// * DOT_PRODUCT(ARRAY<DOUBLE>, ARRAY<DOUBLE>) -> DOUBLE
+absl::StatusOr<Value> DotProduct(Value vector1, Value vector2);
+
+// Implementation of:
+// * MANHATTAN_DISTANCE(ARRAY<INT64>, ARRAY<INT64>) -> DOUBLE
+// * MANHATTAN_DISTANCE(ARRAY<FLOAT>, ARRAY<FLOAT>) -> DOUBLE
+// * MANHATTAN_DISTANCE(ARRAY<DOUBLE>, ARRAY<DOUBLE>) -> DOUBLE
+absl::StatusOr<Value> ManhattanDistance(Value vector1, Value vector2);
+
+// Implementation of:
+// * L1_NORM(ARRAY<INT64>) -> DOUBLE
+// * L1_NORM(ARRAY<FLOAT>) -> DOUBLE
+// * L1_NORM(ARRAY<DOUBLE>) -> DOUBLE
+absl::StatusOr<Value> L1Norm(Value vector);
+
+// Implementation of:
+// * L2_NORM(ARRAY<INT64>) -> DOUBLE
+// * L2_NORM(ARRAY<FLOAT>) -> DOUBLE
+// * L2_NORM(ARRAY<DOUBLE>) -> DOUBLE
+absl::StatusOr<Value> L2Norm(Value vector);
+
 // Implementation of:
 // EDIT_DISTANCE(STRING, STRING) -> INT64
 absl::StatusOr<int64_t> EditDistance(absl::string_view s0, absl::string_view s1,
diff --git a/zetasql/public/functions/distance_test.cc b/zetasql/public/functions/distance_test.cc
index ab4b69358..04120adb0 100644
--- a/zetasql/public/functions/distance_test.cc
+++ b/zetasql/public/functions/distance_test.cc
@@ -38,6 +38,7 @@
 #include "absl/random/random.h"
 #include "absl/status/status.h"
 #include "absl/strings/string_view.h"
+#include "absl/types/span.h"

 namespace zetasql {
 namespace functions {
@@ -61,6 +62,26 @@ T ValueOrDie(absl::StatusOr<T>& s) {
   return s.value();
 }

+Value MakeArray(std::vector<int64_t> arr) {
+  std::vector<Value> values;
+  values.reserve(arr.size());
+  for (const auto& v : arr) {
+    values.push_back(Value::Int64(v));
+  }
+  auto status = Value::MakeArray(zetasql::types::Int64ArrayType(), values);
+  return ValueOrDie(status);
+}
+
+Value MakeArray(std::vector<float> arr) {
+  std::vector<Value> values;
+  values.reserve(arr.size());
+  for (const auto& v : arr) {
+    values.push_back(Value::Float(v));
+  }
+  auto status = Value::MakeArray(zetasql::types::FloatArrayType(), values);
+  return ValueOrDie(status);
+}
+
 Value MakeArray(std::vector<double> arr) {
   std::vector<Value> values;
   values.reserve(arr.size());
@@ -133,14 +154,14 @@ TEST(CosineDistanceTest, DenseArrayLengthMismatch) {
   std::vector<Value> args = CreateArrayPair({1.0, 2.0}, {3.0});
   EXPECT_THAT(CosineDistanceDense(args[0], args[1]),
               StatusIs(absl::StatusCode::kInvalidArgument,
-                       "Array length mismatch: 2 and 1."));
+                       "Array length mismatch: 2 and 1"));
 }

 TEST(CosineDistanceTest, DenseZeroArray) {
   std::vector<Value> args = CreateArrayPair({1.0, 2.0}, {0.0, 0.0});
   EXPECT_THAT(CosineDistanceDense(args[0], args[1]),
StatusIs(absl::StatusCode::kInvalidArgument, - "Cannot compute cosine distance against zero vector.")); + "Cannot compute cosine distance against zero vector")); } TEST(CosineDistanceTest, SparseArrayDuplicateInt64Key) { @@ -148,7 +169,7 @@ TEST(CosineDistanceTest, SparseArrayDuplicateInt64Key) { CreateArrayPair({{1, 1.0}, {1, 2.0}}, {{2, 3.0}, {3, 4.0}}); EXPECT_THAT(CosineDistanceSparseInt64Key(args[0], args[1]), StatusIs(absl::StatusCode::kInvalidArgument, - "Duplicate index 1 found in the input array.")); + "Duplicate index 1 found in the input array")); } TEST(CosineDistanceTest, SparseArrayDuplicateStringKey) { @@ -156,14 +177,14 @@ TEST(CosineDistanceTest, SparseArrayDuplicateStringKey) { {{"a", 1.0}, {"a", 2.0}}, {{"a", 3.0}, {"b", 4.0}}); EXPECT_THAT(CosineDistanceSparseStringKey(args[0], args[1]), StatusIs(absl::StatusCode::kInvalidArgument, - "Duplicate index a found in the input array.")); + "Duplicate index a found in the input array")); } TEST(EuclideanDistanceTest, DenseArrayLengthMismatch) { std::vector args = CreateArrayPair({1.0, 2.0}, {0.0}); EXPECT_THAT(EuclideanDistanceDense(args[0], args[1]), StatusIs(absl::StatusCode::kInvalidArgument, - "Array length mismatch: 2 and 1.")); + "Array length mismatch: 2 and 1")); } TEST(EuclideanDistanceTest, SparseArrayDuplicateInt64Key) { @@ -171,7 +192,7 @@ TEST(EuclideanDistanceTest, SparseArrayDuplicateInt64Key) { CreateArrayPair({{1, 1.0}, {1, 2.0}}, {{2, 3.0}, {3, 4.0}}); EXPECT_THAT(EuclideanDistanceSparseInt64Key(args[0], args[1]), StatusIs(absl::StatusCode::kInvalidArgument, - "Duplicate index 1 found in the input array.")); + "Duplicate index 1 found in the input array")); } TEST(EuclideanDistanceTest, SparseArrayDuplicateStringKey) { @@ -179,7 +200,451 @@ TEST(EuclideanDistanceTest, SparseArrayDuplicateStringKey) { {{"a", 1.0}, {"a", 2.0}}, {{"a", 3.0}, {"b", 4.0}}); EXPECT_THAT(EuclideanDistanceSparseStringKey(args[0], args[1]), StatusIs(absl::StatusCode::kInvalidArgument, - "Duplicate index a 
found in the input array.")); + "Duplicate index a found in the input array")); +} + +TEST(DotProductTest, ArrayLengthMismatch) { + std::vector args = CreateArrayPair({1.0, 2.0}, {3.0}); + EXPECT_THAT(DotProduct(args[0], args[1]), + StatusIs(absl::StatusCode::kOutOfRange, + "Array length mismatch: 2 and 1")); +} + +TEST(DotProductTest, Int64TypePositive) { + std::vector vector1 = {1, 2}; + std::vector vector2 = {3, 4}; + Value args1 = MakeArray(vector1); + Value args2 = MakeArray(vector2); + EXPECT_NEAR(DotProduct(args1, args2).value().double_value(), 11, 1e-6); +} + +TEST(DotProductTest, Int64TypeNegative) { + std::vector vector1 = {-1, -2}; + std::vector vector2 = {-3, -4}; + Value args1 = MakeArray(vector1); + Value args2 = MakeArray(vector2); + EXPECT_NEAR(DotProduct(args1, args2).value().double_value(), 11, 1e-6); +} + +TEST(DotProductTest, Int64TypeMixedSign) { + std::vector vector1 = {-1, 2}; + std::vector vector2 = {3, -4}; + Value args1 = MakeArray(vector1); + Value args2 = MakeArray(vector2); + EXPECT_NEAR(DotProduct(args1, args2).value().double_value(), -11, 1e-6); +} + +TEST(DotProductTest, Int64TypeAllZeroInput) { + std::vector vector1 = {0, 0}; + std::vector vector2 = {0, 0}; + Value args1 = MakeArray(vector1); + Value args2 = MakeArray(vector2); + EXPECT_NEAR(DotProduct(args1, args2).value().double_value(), 0, 1e-6); +} + +TEST(DotProductTest, Int64TypeSomeZeroInput) { + std::vector vector1 = {1, 0, -3}; + std::vector vector2 = {0, 5, 6}; + Value args1 = MakeArray(vector1); + Value args2 = MakeArray(vector2); + EXPECT_NEAR(DotProduct(args1, args2).value().double_value(), -18, 1e-6); +} + +TEST(DotProductTest, FloatTypePositive) { + std::vector vector1 = {1.0, 2.0}; + std::vector vector2 = {3.0, 4.0}; + Value args1 = MakeArray(vector1); + Value args2 = MakeArray(vector2); + EXPECT_NEAR(DotProduct(args1, args2).value().double_value(), 11.0, 1e-6); +} + +TEST(DotProductTest, FloatTypeNegative) { + std::vector vector1 = {-1.0, -2.0}; + std::vector 
vector2 = {-3.0, -4.0}; + Value args1 = MakeArray(vector1); + Value args2 = MakeArray(vector2); + EXPECT_NEAR(DotProduct(args1, args2).value().double_value(), 11.0, 1e-6); +} + +TEST(DotProductTest, FloatTypeMixedSign) { + std::vector vector1 = {-1.0, 2.0}; + std::vector vector2 = {3.0, -4.0}; + Value args1 = MakeArray(vector1); + Value args2 = MakeArray(vector2); + EXPECT_NEAR(DotProduct(args1, args2).value().double_value(), -11.0, 1e-6); +} + +TEST(DotProductTest, FloatTypeAllZeroInput) { + std::vector vector1 = {0.0, 0.0}; + std::vector vector2 = {0.0, 0.0}; + Value args1 = MakeArray(vector1); + Value args2 = MakeArray(vector2); + EXPECT_NEAR(DotProduct(args1, args2).value().double_value(), 0.0, 1e-6); +} + +TEST(DotProductTest, FloatTypeSomeZeroInput) { + std::vector vector1 = {1, 0, -3}; + std::vector vector2 = {0, 5, 6}; + Value args1 = MakeArray(vector1); + Value args2 = MakeArray(vector2); + EXPECT_NEAR(DotProduct(args1, args2).value().double_value(), -18.0, 1e-6); +} + +TEST(DotProductTest, DoubleTypePositive) { + std::vector vector1 = {1.0, 2.0}; + std::vector vector2 = {3.0, 4.0}; + Value args1 = MakeArray(vector1); + Value args2 = MakeArray(vector2); + EXPECT_NEAR(DotProduct(args1, args2).value().double_value(), 11.0, 1e-6); +} + +TEST(DotProductTest, DoubleTypeNegative) { + std::vector vector1 = {-1.0, -2.0}; + std::vector vector2 = {-3.0, -4.0}; + Value args1 = MakeArray(vector1); + Value args2 = MakeArray(vector2); + EXPECT_NEAR(DotProduct(args1, args2).value().double_value(), 11.0, 1e-6); +} + +TEST(DotProductTest, DoubleTypeMixedSign) { + std::vector vector1 = {-1.0, 2.0}; + std::vector vector2 = {3.0, -4.0}; + Value args1 = MakeArray(vector1); + Value args2 = MakeArray(vector2); + EXPECT_NEAR(DotProduct(args1, args2).value().double_value(), -11.0, 1e-6); +} + +TEST(DotProductTest, DoubleTypeAllZeroInput) { + std::vector vector1 = {0.0, 0.0}; + std::vector vector2 = {0.0, 0.0}; + Value args1 = MakeArray(vector1); + Value args2 = MakeArray(vector2); 
+  EXPECT_NEAR(DotProduct(args1, args2).value().double_value(), 0.0, 1e-6);
+}
+
+TEST(DotProductTest, DoubleTypeSomeZeroInput) {
+  std::vector<double> vector1 = {1, 0, -3};
+  std::vector<double> vector2 = {0, 5, 6};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(DotProduct(args1, args2).value().double_value(), -18.0, 1e-6);
+}
+
+TEST(ManhattanDistanceTest, ArrayLengthMismatch) {
+  std::vector<Value> args = CreateArrayPair({1.0, 2.0}, {3.0});
+  EXPECT_THAT(ManhattanDistance(args[0], args[1]),
+              StatusIs(absl::StatusCode::kOutOfRange,
+                       "Array length mismatch: 2 and 1"));
+}
+
+TEST(ManhattanDistanceTest, Int64TypePositive) {
+  std::vector<int64_t> vector1 = {1, 2};
+  std::vector<int64_t> vector2 = {3, 4};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(ManhattanDistance(args1, args2).value().double_value(), 4, 1e-6);
+}
+
+TEST(ManhattanDistanceTest, Int64TypeNegative) {
+  std::vector<int64_t> vector1 = {-1, -2};
+  std::vector<int64_t> vector2 = {-3, -4};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(ManhattanDistance(args1, args2).value().double_value(), 4, 1e-6);
+}
+
+TEST(ManhattanDistanceTest, Int64TypeMixedSign) {
+  std::vector<int64_t> vector1 = {-1, 2};
+  std::vector<int64_t> vector2 = {3, -4};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(ManhattanDistance(args1, args2).value().double_value(), 10, 1e-6);
+}
+
+TEST(ManhattanDistanceTest, Int64TypeAllZeroInput) {
+  std::vector<int64_t> vector1 = {0, 0};
+  std::vector<int64_t> vector2 = {0, 0};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(ManhattanDistance(args1, args2).value().double_value(), 0, 1e-6);
+}
+
+TEST(ManhattanDistanceTest, Int64TypeSomeZeroInput) {
+  std::vector<int64_t> vector1 = {1, 0, -3};
+  std::vector<int64_t> vector2 = {0, 5, 6};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(ManhattanDistance(args1, args2).value().double_value(), 15, 1e-6);
+}
+
+TEST(ManhattanDistanceTest, FloatTypePositive) {
+  std::vector<float> vector1 = {1.0, 2.0};
+  std::vector<float> vector2 = {3.0, 4.0};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(ManhattanDistance(args1, args2).value().double_value(), 4.0,
+              1e-6);
+}
+
+TEST(ManhattanDistanceTest, FloatTypeNegative) {
+  std::vector<float> vector1 = {-1.0, -2.0};
+  std::vector<float> vector2 = {-3.0, -4.0};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(ManhattanDistance(args1, args2).value().double_value(), 4.0,
+              1e-6);
+}
+
+TEST(ManhattanDistanceTest, FloatTypeMixedSign) {
+  std::vector<float> vector1 = {-1.0, 2.0};
+  std::vector<float> vector2 = {3.0, -4.0};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(ManhattanDistance(args1, args2).value().double_value(), 10.0,
+              1e-6);
+}
+
+TEST(ManhattanDistanceTest, FloatTypeAllZeroInput) {
+  std::vector<float> vector1 = {0.0, 0.0};
+  std::vector<float> vector2 = {0.0, 0.0};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(ManhattanDistance(args1, args2).value().double_value(), 0.0,
+              1e-6);
+}
+
+TEST(ManhattanDistanceTest, FloatTypeSomeZeroInput) {
+  std::vector<float> vector1 = {1, 0, -3};
+  std::vector<float> vector2 = {0, 5, 6};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(ManhattanDistance(args1, args2).value().double_value(), 15.0,
+              1e-6);
+}
+
+TEST(ManhattanDistanceTest, DoubleTypePositive) {
+  std::vector<double> vector1 = {1.0, 2.0};
+  std::vector<double> vector2 = {3.0, 4.0};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(ManhattanDistance(args1, args2).value().double_value(), 4.0,
+              1e-6);
+}
+
+TEST(ManhattanDistanceTest, DoubleTypeNegative) {
+  std::vector<double> vector1 = {-1.0, -2.0};
+  std::vector<double> vector2 = {-3.0, -4.0};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(ManhattanDistance(args1, args2).value().double_value(), 4.0,
+              1e-6);
+}
+
+TEST(ManhattanDistanceTest, DoubleTypeMixedSign) {
+  std::vector<double> vector1 = {-1.0, 2.0};
+  std::vector<double> vector2 = {3.0, -4.0};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(ManhattanDistance(args1, args2).value().double_value(), 10.0,
+              1e-6);
+}
+
+TEST(ManhattanDistanceTest, DoubleTypeAllZeroInput) {
+  std::vector<double> vector1 = {0.0, 0.0};
+  std::vector<double> vector2 = {0.0, 0.0};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(ManhattanDistance(args1, args2).value().double_value(), 0.0,
+              1e-6);
+}
+
+TEST(ManhattanDistanceTest, DoubleTypeSomeZeroInput) {
+  std::vector<double> vector1 = {1, 0, -3};
+  std::vector<double> vector2 = {0, 5, 6};
+  Value args1 = MakeArray(vector1);
+  Value args2 = MakeArray(vector2);
+  EXPECT_NEAR(ManhattanDistance(args1, args2).value().double_value(), 15.0,
+              1e-6);
+}
+
+TEST(L1NormTest, Int64TypePositive) {
+  std::vector<int64_t> vector = {1, 2, 3, 4};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L1Norm(args).value().double_value(), 10, 1e-6);
+}
+
+TEST(L1NormTest, Int64TypeNegative) {
+  std::vector<int64_t> vector = {-1, -2, -3, -4};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L1Norm(args).value().double_value(), 10, 1e-6);
+}
+
+TEST(L1NormTest, Int64TypeMixedSign) {
+  std::vector<int64_t> vector = {-1, 2, -3, 4};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L1Norm(args).value().double_value(), 10, 1e-6);
+}
+
+TEST(L1NormTest, Int64TypeAllZeroInput) {
+  std::vector<int64_t> vector = {0, 0, 0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L1Norm(args).value().double_value(), 0, 1e-6);
+}
+
+TEST(L1NormTest, Int64TypeSomeZeroInput) {
+  std::vector<int64_t> vector = {0, 2, 0, 0, -5};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L1Norm(args).value().double_value(), 7, 1e-6);
+}
+
+TEST(L1NormTest, FloatTypePositive) {
+  std::vector<float> vector = {1.0, 2.0, 3.0, 4.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L1Norm(args).value().double_value(), 10.0, 1e-6);
+}
+
+TEST(L1NormTest, FloatTypeNegative) {
+  std::vector<float> vector = {-1.0, -2.0, -3.0, -4.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L1Norm(args).value().double_value(), 10.0, 1e-6);
+}
+
+TEST(L1NormTest, FloatTypeMixedSign) {
+  std::vector<float> vector = {-1.0, 2.0, -3.0, 4.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L1Norm(args).value().double_value(), 10.0, 1e-6);
+}
+
+TEST(L1NormTest, FloatTypeAllZeroInput) {
+  std::vector<float> vector = {0.0, 0.0, 0.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L1Norm(args).value().double_value(), 0.0, 1e-6);
+}
+
+TEST(L1NormTest, FloatTypeSomeZeroInput) {
+  std::vector<float> vector = {0.0, 2.0, 0.0, 0.0, -5.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L1Norm(args).value().double_value(), 7.0, 1e-6);
+}
+
+TEST(L1NormTest, DoubleTypePositive) {
+  std::vector<double> vector = {1.0, 2.0, 3.0, 4.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L1Norm(args).value().double_value(), 10.0, 1e-6);
+}
+
+TEST(L1NormTest, DoubleTypeNegative) {
+  std::vector<double> vector = {-1.0, -2.0, -3.0, -4.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L1Norm(args).value().double_value(), 10.0, 1e-6);
+}
+
+TEST(L1NormTest, DoubleTypeMixedSign) {
+  std::vector<double> vector = {-1.0, 2.0, -3.0, 4.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L1Norm(args).value().double_value(), 10.0, 1e-6);
+}
+
+TEST(L1NormTest, DoubleTypeAllZeroInput) {
+  std::vector<double> vector = {0.0, 0.0, 0.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L1Norm(args).value().double_value(), 0.0, 1e-6);
+}
+
+TEST(L1NormTest, DoubleTypeSomeZeroInput) {
+  std::vector<double> vector = {0.0, 2.0, 0.0, 0.0, -5.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L1Norm(args).value().double_value(), 7.0, 1e-6);
+}
+
+TEST(L2NormTest, Int64TypePositive) {
+  std::vector<int64_t> vector = {1, 2, 3, 4};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L2Norm(args).value().double_value(), std::sqrt(30), 1e-6);
+}
+
+TEST(L2NormTest, Int64TypeNegative) {
+  std::vector<int64_t> vector = {-1, -2, -3, -4};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L2Norm(args).value().double_value(), std::sqrt(30), 1e-6);
+}
+
+TEST(L2NormTest, Int64TypeMixedSign) {
+  std::vector<int64_t> vector = {-1, 2, -3, 4};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L2Norm(args).value().double_value(), std::sqrt(30), 1e-6);
+}
+
+TEST(L2NormTest, Int64TypeAllZeroInput) {
+  std::vector<int64_t> vector = {0, 0, 0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L2Norm(args).value().double_value(), 0, 1e-6);
+}
+
+TEST(L2NormTest, Int64TypeSomeZeroInput) {
+  std::vector<int64_t> vector = {0, 2, 0, 0, -5};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L2Norm(args).value().double_value(), std::sqrt(29), 1e-6);
+}
+
+TEST(L2NormTest, FloatTypePositive) {
+  std::vector<float> vector = {1.0, 2.0, 3.0, 4.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L2Norm(args).value().double_value(), std::sqrt(30), 1e-6);
+}
+
+TEST(L2NormTest, FloatTypeNegative) {
+  std::vector<float> vector = {-1.0, -2.0, -3.0, -4.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L2Norm(args).value().double_value(), std::sqrt(30), 1e-6);
+}
+
+TEST(L2NormTest, FloatTypeMixedSign) {
+  std::vector<float> vector = {-1.0, 2.0, -3.0, 4.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L2Norm(args).value().double_value(), std::sqrt(30), 1e-6);
+}
+
+TEST(L2NormTest, FloatTypeAllZeroInput) {
+  std::vector<float> vector = {0.0, 0.0, 0.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L2Norm(args).value().double_value(), 0.0, 1e-6);
+}
+
+TEST(L2NormTest, FloatTypeSomeZeroInput) {
+  std::vector<float> vector = {0.0, 2.0, 0.0, 0.0, -5.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L2Norm(args).value().double_value(), std::sqrt(29), 1e-6);
+}
+
+TEST(L2NormTest, DoubleTypePositive) {
+  std::vector<double> vector = {1.0, 2.0, 3.0, 4.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L2Norm(args).value().double_value(), std::sqrt(30), 1e-6);
+}
+
+TEST(L2NormTest, DoubleTypeNegative) {
+  std::vector<double> vector = {-1.0, -2.0, -3.0, -4.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L2Norm(args).value().double_value(), std::sqrt(30), 1e-6);
+}
+
+TEST(L2NormTest, DoubleTypeMixedSign) {
+  std::vector<double> vector = {-1.0, 2.0, -3.0, 4.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L2Norm(args).value().double_value(), std::sqrt(30), 1e-6);
+}
+
+TEST(L2NormTest, DoubleTypeAllZeroInput) {
+  std::vector<double> vector = {0.0, 0.0, 0.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L2Norm(args).value().double_value(), 0.0, 1e-6);
+}
+
+TEST(L2NormTest, DoubleTypeSomeZeroInput) {
+  std::vector<double> vector = {0.0, 2.0, 0.0, 0.0, -5.0};
+  Value args = MakeArray(vector);
+  EXPECT_NEAR(L2Norm(args).value().double_value(), std::sqrt(29), 1e-6);
+}
 
 enum class DistanceFunctionVectorTypeEnum {
diff --git a/zetasql/public/functions/format_test.cc b/zetasql/public/functions/format_test.cc
index 4e762f43b..ddbbee725 100644
--- a/zetasql/public/functions/format_test.cc
+++ b/zetasql/public/functions/format_test.cc
@@ -28,6 +28,7 @@
 #include "zetasql/public/civil_time.h"
 #include "zetasql/public/functions/format_max_output_width.h"
 #include "zetasql/public/functions/string_format.h"
+#include "zetasql/public/value.h"
 #include "zetasql/testdata/test_schema.pb.h"
 #include "zetasql/testing/test_function.h"
 #include "zetasql/testing/test_value.h"
@@ -53,20 +54,22 @@ using ::zetasql_base::testing::StatusIs;
 
 using FormatF = std::function<absl::Status(
     absl::string_view format, absl::Span<const Value> values,
    ProductMode product_mode, std::string* output, bool* is_null,
-    bool canonicalize_zero)>;
+    bool canonicalize_zero, bool use_external_float32)>;
 
 // Call `FormatFunction` and abstract error and null results into a
 // unified string representation. Also ensures error codes are of an
 // allowed type.
 static std::string TestFormatFunction(const FormatF& FormatFunction,
+                                      ProductMode product_mode,
                                       bool canonicalize_zero,
+                                      bool use_external_float32,
                                       absl::string_view format,
-                                      const std::vector<Value>& values) {
+                                      absl::Span<const Value> values) {
   std::string output;
   bool is_null;
   const absl::Status status =
-      FormatFunction(format, values, ProductMode::PRODUCT_INTERNAL, &output,
-                     &is_null, canonicalize_zero);
+      FormatFunction(format, values, product_mode, &output, &is_null,
+                     canonicalize_zero, use_external_float32);
   if (status.ok()) {
     return is_null ? "" : output;
   } else {
@@ -88,9 +91,10 @@ struct FormatFunctionParam {
   FormatF FormatFunction;
 
   // Binds `FormatFunction` to TestFormatFunction.
-  TestFormatF WrapTestFormat() const {
-    return absl::bind_front(TestFormatFunction, FormatFunction,
-                            /*canonicalize_zero=*/true);
+  TestFormatF WrapTestFormat(ProductMode product_mode,
+                             bool use_external_float32) const {
+    return absl::bind_front(TestFormatFunction, FormatFunction, product_mode,
+                            /*canonicalize_zero=*/true, use_external_float32);
   }
 
   // String suitable for use in SCOPE-TRACE to aid in debugging failing tests.
   std::string ScopeLabel() const {
@@ -109,7 +113,8 @@ class FormatFunctionTests
 
 TEST_P(FormatFunctionTests, Test) {
   SCOPED_TRACE(GetParam().ScopeLabel());
-  TestFormatF TestFormat = GetParam().WrapTestFormat();
+  TestFormatF TestFormat = GetParam().WrapTestFormat(
+      PRODUCT_INTERNAL, /*use_external_float32=*/false);
 
   using values::Int32;
   using values::Int64;
@@ -820,9 +825,9 @@ str_value: "bar"
   EXPECT_EQ("\"" + long_string_3 + "\"",
             TestFormat("%T", {values::String(long_string_3)}));
 
-  TestFormatF TestFormatLegacy =
-      absl::bind_front(TestFormatFunction, GetParam().FormatFunction,
-                       /*canonicalize_zero=*/false);
+  TestFormatF TestFormatLegacy = absl::bind_front(
+      TestFormatFunction, GetParam().FormatFunction, PRODUCT_INTERNAL,
+      /*canonicalize_zero=*/false, /*use_external_float32=*/false);
   EXPECT_EQ("-0.000000", TestFormatLegacy("%+f", {Double(-0.0)}));
   EXPECT_EQ("-0.000000", TestFormatLegacy("%+f", {Float(-0.0)}));
   EXPECT_EQ("-0.000000", TestFormatLegacy("%f", {Double(-0.0)}));
@@ -839,6 +844,118 @@ str_value: "bar"
   EXPECT_EQ("-0", TestFormatLegacy("%g", {Float(-0.0)}));
 }
 
+TEST_P(FormatFunctionTests, FloatingPointInInternalModeWithFloat32) {
+  SCOPED_TRACE(GetParam().ScopeLabel());
+  TestFormatF TestFormat = GetParam().WrapTestFormat(
+      PRODUCT_INTERNAL, /*use_external_float32=*/true);
+
+  // Even when the product mode is PRODUCT_INTERNAL, if `use_external_float32`
+  // is true, it will return FLOAT32.
+  // Reason for this behavior: Format functions are currently calling
+  // GetSQLLiteral with PRODUCT_EXTERNAL, which we cannot change without
+  // breaking backward compatibility.
+  EXPECT_EQ(
+      "CAST(\"nan\" AS FLOAT32)",
+      TestFormat("%T",
+                 {values::Float(std::numeric_limits<float>::quiet_NaN())}));
+
+  const float kNegativeFloatNan = -std::numeric_limits<float>::quiet_NaN();
+  EXPECT_EQ("CAST(\"nan\" AS FLOAT32)",
+            TestFormat("%T", {values::Float(kNegativeFloatNan)}));
+}
+
+TEST_P(FormatFunctionTests, FloatingPointInExternalModeWithoutFloat32) {
+  using values::Double;
+  using values::DoubleArray;
+  using values::Float;
+  using values::FloatArray;
+  using values::Int64;
+
+  SCOPED_TRACE(GetParam().ScopeLabel());
+  TestFormatF TestFormat = GetParam().WrapTestFormat(
+      PRODUCT_EXTERNAL, /*use_external_float32=*/false);
+
+  EXPECT_EQ(
+      "ERROR: Invalid type for argument 2 to FORMAT; "
+      "Expected FLOAT64; Got INT64",
+      TestFormat("%f", {Int64(1)}));
+
+  EXPECT_EQ("CAST(\"nan\" AS FLOAT)",
+            TestFormat("%T", {Float(std::numeric_limits<float>::quiet_NaN())}));
+  EXPECT_EQ(
+      "[4.0, -2.5, CAST(\"nan\" AS FLOAT)]",
+      TestFormat(
+          "%T",
+          {FloatArray({4, -2.5, std::numeric_limits<float>::quiet_NaN()})}));
+  EXPECT_EQ("nan",
+            TestFormat("%t", {Float(std::numeric_limits<float>::quiet_NaN())}));
+
+  const float kNegativeFloatNan = -std::numeric_limits<float>::quiet_NaN();
+  EXPECT_EQ("CAST(\"nan\" AS FLOAT)",
+            TestFormat("%T", {Float(kNegativeFloatNan)}));
+  EXPECT_EQ(
+      "[-4.0, CAST(\"-inf\" AS FLOAT64)]",
+      TestFormat(
+          "%T", {DoubleArray({-4, -std::numeric_limits<double>::infinity()})}));
+  EXPECT_EQ("nan", TestFormat("%t", {Float(kNegativeFloatNan)}));
+
+  EXPECT_EQ(
+      "CAST(\"nan\" AS FLOAT64)",
+      TestFormat("%T", {Double(std::numeric_limits<double>::quiet_NaN())}));
+  EXPECT_EQ(
+      "nan",
+      TestFormat("%t", {Double(std::numeric_limits<double>::quiet_NaN())}));
+
+  const double kNegativeDoubleNan = -std::numeric_limits<double>::quiet_NaN();
+  EXPECT_EQ("CAST(\"nan\" AS FLOAT64)",
+            TestFormat("%T", {Double(kNegativeDoubleNan)}));
+  EXPECT_EQ("nan", TestFormat("%t", {Double(kNegativeDoubleNan)}));
+}
+
+TEST_P(FormatFunctionTests, FloatingPointInExternalModeWithFloat32) {
+  using values::Double;
+  using values::Float;
+  using values::Int64;
+
+  SCOPED_TRACE(GetParam().ScopeLabel());
+  TestFormatF TestFormat = GetParam().WrapTestFormat(
+      PRODUCT_EXTERNAL, /*use_external_float32=*/true);
+
+  EXPECT_EQ(
+      "ERROR: Invalid type for argument 2 to FORMAT; "
+      "Expected FLOAT64; Got INT64",
+      TestFormat("%f", {Int64(1)}));
+
+  // Tests to check that positive and negative NaNs are formatted as "nan".
+  EXPECT_EQ(
+      "CAST(\"nan\" AS FLOAT32)",
+      TestFormat("%T",
+                 {values::Float(std::numeric_limits<float>::quiet_NaN())}));
+  EXPECT_EQ(
+      "nan",
+      TestFormat("%t",
+                 {values::Float(std::numeric_limits<float>::quiet_NaN())}));
+
+  const float kNegativeFloatNan = -std::numeric_limits<float>::quiet_NaN();
+  EXPECT_EQ("CAST(\"nan\" AS FLOAT32)",
+            TestFormat("%T", {values::Float(kNegativeFloatNan)}));
+  EXPECT_EQ("nan", TestFormat("%t", {values::Float(kNegativeFloatNan)}));
+
+  EXPECT_EQ(
+      "CAST(\"nan\" AS FLOAT64)",
+      TestFormat("%T",
+                 {values::Double(std::numeric_limits<double>::quiet_NaN())}));
+  EXPECT_EQ(
+      "nan",
+      TestFormat("%t",
+                 {values::Double(std::numeric_limits<double>::quiet_NaN())}));
+
+  const double kNegativeDoubleNan = -std::numeric_limits<double>::quiet_NaN();
+  EXPECT_EQ("CAST(\"nan\" AS FLOAT64)",
+            TestFormat("%T", {values::Double(kNegativeDoubleNan)}));
+  EXPECT_EQ("nan", TestFormat("%t", {values::Double(kNegativeDoubleNan)}));
+}
+
 static absl::Status TestCheckFormat(absl::string_view format_string,
                                     const std::vector<const Type*>& types) {
   const absl::Status status = CheckStringFormatUtf8ArgumentTypes(
@@ -881,7 +998,8 @@ static Value BigNumericFromString(absl::string_view str) {
 
 // Normal cases are tested in TEST_P(FormatComplianceTests, Test).
 TEST_P(FormatFunctionTests, NumericFormat_Errors) {
   SCOPED_TRACE(GetParam().ScopeLabel());
-  TestFormatF TestFormat = GetParam().WrapTestFormat();
+  TestFormatF TestFormat = GetParam().WrapTestFormat(
+      PRODUCT_INTERNAL, /*use_external_float32=*/false);
 
   using values::Int32;
   using values::Numeric;
@@ -913,7 +1031,8 @@ TEST_P(FormatFunctionTests, NumericFormat_Errors) {
 
 TEST_P(FormatFunctionTests, BigNumericFormat_Errors) {
   SCOPED_TRACE(GetParam().ScopeLabel());
-  TestFormatF TestFormat = GetParam().WrapTestFormat();
+  TestFormatF TestFormat = GetParam().WrapTestFormat(
+      PRODUCT_INTERNAL, /*use_external_float32=*/false);
 
   using values::BigNumeric;
   using values::Int32;
@@ -1000,8 +1119,10 @@ void CompareFormattedStrings(const std::string& formatted_double,
 template <typename T>  // T must be NumericValue or BigNumericValue.
 void TestFormatNumericWithRandomData(FormatF FormatFunction) {
-  auto TestFormat = absl::bind_front(TestFormatFunction, FormatFunction,
-                                     /*canonicalize_zero=*/true);
+  auto TestFormat =
+      absl::bind_front(TestFormatFunction, FormatFunction, PRODUCT_INTERNAL,
+                       /*canonicalize_zero=*/true,
+                       /*use_external_float32=*/false);
   using values::Int32;
   absl::BitGen random;
   for (int i = 0; i < 20000; ++i) {
@@ -1106,9 +1227,9 @@ TEST_P(FormatComplianceTests, Test) {
 
   std::string actual;
   bool is_null = false;
-  const absl::Status status =
-      FormatFunction(pattern, args, ProductMode::PRODUCT_INTERNAL, &actual,
-                     &is_null, /*canonicalize_zero=*/true);
+  const absl::Status status = FormatFunction(
+      pattern, args, ProductMode::PRODUCT_INTERNAL, &actual, &is_null,
+      /*canonicalize_zero=*/true, /*use_external_float32=*/false);
 
   if (using_any_civil_time_values &&
       !zetasql_base::ContainsKey(test.params.required_features(),
diff --git a/zetasql/public/functions/json.cc b/zetasql/public/functions/json.cc
index aa8a39fb5..e9c914d24 100644
--- a/zetasql/public/functions/json.cc
+++ b/zetasql/public/functions/json.cc
@@ -15,29 +15,38 @@
 
 #include "zetasql/public/functions/json.h"
+#include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
-#include
 #include
 #include
 
 #include "zetasql/common/errors.h"
-#include "zetasql/common/int_ops_util.h"
 #include "zetasql/public/functions/convert.h"
 #include "zetasql/public/functions/convert_string.h"
+#include "zetasql/public/functions/json_format.h"
 #include "zetasql/public/functions/json_internal.h"
 #include "zetasql/public/functions/to_json.h"
 #include "zetasql/public/json_value.h"
+#include "zetasql/public/language_options.h"
+#include "zetasql/public/numeric_value.h"
+#include "zetasql/public/value.h"
 #include "absl/base/optimization.h"
+#include "absl/functional/function_ref.h"
 #include "absl/memory/memory.h"
 #include "absl/status/status.h"
 #include "absl/status/statusor.h"
 #include "absl/strings/match.h"
+#include "absl/strings/numbers.h"
+#include "absl/strings/str_cat.h"
 #include "absl/strings/string_view.h"
+#include "absl/types/span.h"
+#include "zetasql/base/lossless_convert.h"
 #include "re2/re2.h"
 #include "zetasql/base/ret_check.h"
 #include "zetasql/base/status_macros.h"
@@ -337,21 +346,84 @@ absl::StatusOr<std::string> MergeJSONPathsIntoSqlStandardMode(
 }
 
 absl::StatusOr<int64_t> ConvertJsonToInt64(JSONValueConstRef input) {
+  int64_t output;
   if (input.IsInt64()) {
     return input.GetInt64();
   }
+  if (input.IsDouble()) {
+    if (zetasql_base::LosslessConvert(input.GetDouble(), &output)) {
+      return output;
+    }
+    return MakeEvalError() << "The provided JSON number: " << input.GetDouble()
+                           << " cannot be converted to an int64";
+  }
+  return MakeEvalError() << "The provided JSON input is not an integer";
+}
+
+absl::StatusOr<int32_t> ConvertJsonToInt32(JSONValueConstRef input) {
+  int32_t output;
+  if (input.IsInt64()) {
+    if (zetasql_base::LosslessConvert(input.GetInt64(), &output)) {
+      return output;
+    }
+    return MakeEvalError() << "The provided JSON number: " << input.GetInt64()
+                           << " cannot be converted to an int32";
+  }
+  if (input.IsDouble()) {
+    if (zetasql_base::LosslessConvert(input.GetDouble(), &output)) {
+      return output;
+    }
+    return MakeEvalError() << "The provided JSON number: " << input.GetDouble()
+                           << " cannot be converted to an int32";
+  }
+  return MakeEvalError() << "The provided JSON input is not an integer";
+}
 
-  // There must be no fractional part if provided double as input
+absl::StatusOr<uint32_t> ConvertJsonToUint32(JSONValueConstRef input) {
+  uint32_t output;
+  if (input.IsUInt64()) {
+    if (zetasql_base::LosslessConvert(input.GetUInt64(), &output)) {
+      return output;
+    }
+    return MakeEvalError() << "The provided JSON number: " << input.GetUInt64()
+                           << " cannot be converted to an uint32";
+  }
+  if (input.IsInt64()) {
+    if (zetasql_base::LosslessConvert(input.GetInt64(), &output)) {
+      return output;
+    }
+    return MakeEvalError() << "The provided JSON number: " << input.GetInt64()
+                           << " cannot be converted to an uint32";
+  }
   if (input.IsDouble()) {
-    double input_as_double = input.GetDouble();
-    int64_t output;
-    if (LossLessConvertDoubleToInt64(input_as_double, &output)) {
+    if (zetasql_base::LosslessConvert(input.GetDouble(), &output)) {
       return output;
     }
-    return MakeEvalError() << "The provided JSON number: " << input_as_double
-                           << " cannot be converted to an integer";
+    return MakeEvalError() << "The provided JSON number: " << input.GetDouble()
+                           << " cannot be converted to an uint32";
   }
+  return MakeEvalError() << "The provided JSON input is not an integer";
+}
 
+absl::StatusOr<uint64_t> ConvertJsonToUint64(JSONValueConstRef input) {
+  uint64_t output;
+  if (input.IsUInt64()) {
+    return input.GetUInt64();
+  }
+  if (input.IsInt64()) {
+    if (zetasql_base::LosslessConvert(input.GetInt64(), &output)) {
+      return output;
+    }
+    return MakeEvalError() << "The provided JSON number: " << input.GetInt64()
+                           << " cannot be converted to an uint64";
+  }
+  if (input.IsDouble()) {
+    if (zetasql_base::LosslessConvert(input.GetDouble(), &output)) {
+      return output;
+    }
+    return MakeEvalError() << "The provided JSON number: " << input.GetDouble()
+                           << " cannot be converted to an uint64";
+  }
   return MakeEvalError() << "The provided JSON input is not an integer";
 }
@@ -372,6 +444,9 @@ absl::StatusOr<std::string> ConvertJsonToString(JSONValueConstRef input) {
 
 absl::StatusOr<double> ConvertJsonToDouble(JSONValueConstRef input,
                                            WideNumberMode wide_number_mode,
                                            ProductMode product_mode) {
+  std::string type_name =
+      product_mode == PRODUCT_EXTERNAL ? "FLOAT64" : "DOUBLE";
+
   if (input.IsDouble()) {
     return input.GetDouble();
   }
@@ -381,11 +456,9 @@ absl::StatusOr<double> ConvertJsonToDouble(JSONValueConstRef input,
     if (wide_number_mode == functions::WideNumberMode::kExact &&
         (value < kMinLosslessInt64ValueForJson ||
          value > kMaxLosslessInt64ValueForJson)) {
-      std::string function_name =
-          product_mode == PRODUCT_EXTERNAL ? "FLOAT64" : "DOUBLE";
       return MakeEvalError()
              << "JSON number: " << value << " cannot be converted to "
-             << function_name << " without loss of precision";
+             << type_name << " without loss of precision";
     }
     return double{static_cast<double>(value)};
   }
@@ -393,11 +466,9 @@ absl::StatusOr<double> ConvertJsonToDouble(JSONValueConstRef input,
     uint64_t value = input.GetUInt64();
     if (wide_number_mode == functions::WideNumberMode::kExact &&
        value > static_cast<uint64_t>(kMaxLosslessInt64ValueForJson)) {
-      std::string function_name =
-          product_mode == PRODUCT_EXTERNAL ? "FLOAT64" : "DOUBLE";
       return MakeEvalError()
             << "JSON number: " << value << " cannot be converted to "
-             << function_name << " without loss of precision";
+             << type_name << " without loss of precision";
     }
     return double{static_cast<double>(value)};
   }
@@ -405,6 +476,122 @@ absl::StatusOr<double> ConvertJsonToDouble(JSONValueConstRef input,
   return MakeEvalError() << "The provided JSON input is not a number";
 }
 
+absl::StatusOr<float> ConvertJsonToFloat(JSONValueConstRef input,
+                                         WideNumberMode wide_number_mode,
+                                         ProductMode product_mode) {
+  std::string type_name =
+      product_mode == PRODUCT_EXTERNAL ? "FLOAT32" : "FLOAT";
+
+  float output;
+  if (input.IsDouble()) {
+    double value = input.GetDouble();
+    if (zetasql_base::LosslessConvert(value, &output)) {
+      return output;
+    }
+    if (wide_number_mode == functions::WideNumberMode::kExact) {
+      return MakeEvalError()
+             << "JSON number: " << value << " cannot be converted to "
+             << type_name << " without loss of precision";
+    }
+    // A static_cast can result in +/-INF, so check if input is in range.
+    absl::Status status;
+    if (!Convert(value, &output, &status)) {
+      return MakeEvalError() << "JSON number: " << value
+                             << " cannot be converted to " << type_name;
+    }
+    return output;
+  }
+  if (input.IsInt64()) {
+    int64_t value = input.GetInt64();
+    if (zetasql_base::LosslessConvert(value, &output)) {
+      return output;
+    }
+    if (wide_number_mode == functions::WideNumberMode::kExact) {
+      return MakeEvalError()
+             << "JSON number: " << value << " cannot be converted to "
+             << type_name << " without loss of precision";
+    }
+    return static_cast<float>(value);
+  }
+  if (input.IsUInt64()) {
+    uint64_t value = input.GetUInt64();
+    if (zetasql_base::LosslessConvert(value, &output)) {
+      return output;
+    }
+    if (wide_number_mode == functions::WideNumberMode::kExact) {
+      return MakeEvalError()
+             << "JSON number: " << value << " cannot be converted to "
+             << type_name << " without loss of precision";
+    }
+    return static_cast<float>(value);
+  }
+
+  return MakeEvalError() << "The provided JSON input is not a number";
+}
+
+template <typename T>
+absl::StatusOr<std::vector<T>> ConvertJsonToArray(
+    JSONValueConstRef input,
+    absl::FunctionRef<absl::StatusOr<T>(JSONValueConstRef)> converter) {
+  if (!input.IsArray()) {
+    return MakeEvalError() << "The provided JSON input is not an array";
+  }
+
+  std::vector<T> result;
+  result.reserve(input.GetArraySize());
+  for (int i = 0; i < input.GetArraySize(); ++i) {
+    ZETASQL_ASSIGN_OR_RETURN(*std::back_inserter(result),
+                             converter(input.GetArrayElement(i)));
+  }
+  return result;
+}
+
+absl::StatusOr<std::vector<int64_t>> ConvertJsonToInt64Array(
+    JSONValueConstRef input) {
+  return ConvertJsonToArray<int64_t>(input, ConvertJsonToInt64);
+}
+
+absl::StatusOr<std::vector<int32_t>> ConvertJsonToInt32Array(
+    JSONValueConstRef input) {
+  return ConvertJsonToArray<int32_t>(input, ConvertJsonToInt32);
+}
+
+absl::StatusOr<std::vector<uint64_t>> ConvertJsonToUint64Array(
+    JSONValueConstRef input) {
+  return ConvertJsonToArray<uint64_t>(input, ConvertJsonToUint64);
+}
+
+absl::StatusOr<std::vector<uint32_t>> ConvertJsonToUint32Array(
+    JSONValueConstRef input) {
+  return ConvertJsonToArray<uint32_t>(input, ConvertJsonToUint32);
+}
+
+absl::StatusOr<std::vector<bool>> ConvertJsonToBoolArray(
+    JSONValueConstRef input) {
+  return ConvertJsonToArray<bool>(input, ConvertJsonToBool);
+}
+
+absl::StatusOr<std::vector<std::string>> ConvertJsonToStringArray(
+    JSONValueConstRef input) {
+  return ConvertJsonToArray<std::string>(input, ConvertJsonToString);
+}
+
+absl::StatusOr<std::vector<double>> ConvertJsonToDoubleArray(
+    JSONValueConstRef input, WideNumberMode mode, ProductMode product_mode) {
+  return ConvertJsonToArray<double>(
+      input, [mode, product_mode](JSONValueConstRef input) {
+        return ConvertJsonToDouble(input, mode, product_mode);
+      });
+}
+
+absl::StatusOr<std::vector<float>> ConvertJsonToFloatArray(
+    JSONValueConstRef input, WideNumberMode mode, ProductMode product_mode) {
+  return ConvertJsonToArray<float>(
+      input, [mode, product_mode](JSONValueConstRef input) {
+        return ConvertJsonToFloat(input, mode, product_mode);
+      });
+}
+
 absl::StatusOr<std::string> GetJsonType(JSONValueConstRef input) {
   if (input.IsNumber()) {
     return "number";
@@ -474,20 +661,20 @@ absl::StatusOr<std::optional<bool>> LaxConvertJsonToBool(
   return std::nullopt;
 }
 
-absl::StatusOr<std::optional<int64_t>> LaxConvertJsonToInt64(
-    JSONValueConstRef input) {
+template <typename T>
+absl::StatusOr<std::optional<T>> LaxConvertJsonToInt(JSONValueConstRef input) {
   if (input.IsBoolean()) {
     return input.GetBoolean() ? 1 : 0;
   } else if (input.IsInt64()) {
-    return input.GetInt64();
+    return ConvertNumericToNumeric<int64_t, T>(input.GetInt64());
   } else if (input.IsUInt64()) {
-    return ConvertNumericToNumeric<uint64_t, int64_t>(input.GetUInt64());
+    return ConvertNumericToNumeric<uint64_t, T>(input.GetUInt64());
  } else if (input.IsDouble()) {
-    return ConvertNumericToNumeric<double, int64_t>(input.GetDouble());
+    return ConvertNumericToNumeric<double, T>(input.GetDouble());
  } else if (input.IsString()) {
     BigNumericValue big_numeric_value;
     absl::Status status;
-    int64_t out;
+    T out;
     if (!StringToNumeric(input.GetString(), &big_numeric_value, &status) ||
         !Convert(big_numeric_value, &out, &status)) {
       return std::nullopt;
@@ -497,20 +684,52 @@ absl::StatusOr<std::optional<int64_t>> LaxConvertJsonToInt64(
   return std::nullopt;
 }
 
-absl::StatusOr<std::optional<double>> LaxConvertJsonToFloat64(
+absl::StatusOr<std::optional<int64_t>> LaxConvertJsonToInt64(
+    JSONValueConstRef input) {
+  return LaxConvertJsonToInt<int64_t>(input);
+}
+
+absl::StatusOr<std::optional<int32_t>> LaxConvertJsonToInt32(
+    JSONValueConstRef input) {
+  return LaxConvertJsonToInt<int32_t>(input);
+}
+
+absl::StatusOr<std::optional<uint64_t>> LaxConvertJsonToUint64(
+    JSONValueConstRef input) {
+  return LaxConvertJsonToInt<uint64_t>(input);
+}
+
+absl::StatusOr<std::optional<uint32_t>> LaxConvertJsonToUint32(
+    JSONValueConstRef input) {
+  return LaxConvertJsonToInt<uint32_t>(input);
+}
+
+template <typename T>
+absl::StatusOr<std::optional<T>> LaxConvertJsonToFloat(
     JSONValueConstRef input) {
+  // Note that unlike integers, we don't convert booleans into floats.
   if (input.IsInt64()) {
-    return ConvertNumericToNumeric<int64_t, double>(input.GetInt64());
+    return ConvertNumericToNumeric<int64_t, T>(input.GetInt64());
   } else if (input.IsUInt64()) {
-    return ConvertNumericToNumeric<uint64_t, double>(input.GetUInt64());
+    return ConvertNumericToNumeric<uint64_t, T>(input.GetUInt64());
   } else if (input.IsDouble()) {
-    return input.GetDouble();
+    return ConvertNumericToNumeric<double, T>(input.GetDouble());
   } else if (input.IsString()) {
-    return ConvertStringToNumeric<double>(input.GetString());
+    return ConvertStringToNumeric<T>(input.GetString());
   }
   return std::nullopt;
 }
 
+absl::StatusOr<std::optional<double>> LaxConvertJsonToFloat64(
+    JSONValueConstRef input) {
+  return LaxConvertJsonToFloat<double>(input);
+}
+
+absl::StatusOr<std::optional<float>> LaxConvertJsonToFloat32(
+    JSONValueConstRef input) {
+  return LaxConvertJsonToFloat<float>(input);
+}
+
 absl::StatusOr<std::optional<std::string>> LaxConvertJsonToString(
     JSONValueConstRef input) {
   if (input.IsBoolean()) {
@@ -527,6 +746,65 @@ absl::StatusOr<std::optional<std::string>> LaxConvertJsonToString(
   return std::nullopt;
 }
 
+template <typename T>
+absl::StatusOr<std::optional<std::vector<std::optional<T>>>>
+LaxConvertJsonToArray(
+    JSONValueConstRef input,
+    absl::FunctionRef<absl::StatusOr<std::optional<T>>(JSONValueConstRef)>
+        converter) {
+  if (!input.IsArray()) {
+    return std::nullopt;
+  }
+
+  std::vector<std::optional<T>> result;
+  result.reserve(input.GetArraySize());
+  for (int i = 0; i < input.GetArraySize(); ++i) {
+    ZETASQL_ASSIGN_OR_RETURN(*std::back_inserter(result),
+                             converter(input.GetArrayElement(i)));
+  }
+  return result;
+}
+
+absl::StatusOr<std::optional<std::vector<std::optional<bool>>>>
+LaxConvertJsonToBoolArray(JSONValueConstRef input) {
+  return LaxConvertJsonToArray<bool>(input, LaxConvertJsonToBool);
+}
+
+absl::StatusOr<std::optional<std::vector<std::optional<int64_t>>>>
+LaxConvertJsonToInt64Array(JSONValueConstRef input) {
+  return LaxConvertJsonToArray<int64_t>(input, LaxConvertJsonToInt64);
+}
+
+absl::StatusOr<std::optional<std::vector<std::optional<int32_t>>>>
+LaxConvertJsonToInt32Array(JSONValueConstRef input) {
+  return LaxConvertJsonToArray<int32_t>(input, LaxConvertJsonToInt32);
+}
+
+absl::StatusOr<std::optional<std::vector<std::optional<uint64_t>>>>
+LaxConvertJsonToUint64Array(JSONValueConstRef input) {
+  return LaxConvertJsonToArray<uint64_t>(input, LaxConvertJsonToUint64);
+}
+
+absl::StatusOr<std::optional<std::vector<std::optional<uint32_t>>>>
+LaxConvertJsonToUint32Array(JSONValueConstRef input) {
+  return LaxConvertJsonToArray<uint32_t>(input, LaxConvertJsonToUint32);
+}
+
+absl::StatusOr<std::optional<std::vector<std::optional<double>>>>
+LaxConvertJsonToFloat64Array(JSONValueConstRef input) {
+  return LaxConvertJsonToArray<double>(input, LaxConvertJsonToFloat64);
+}
+
+absl::StatusOr<std::optional<std::vector<std::optional<float>>>>
+LaxConvertJsonToFloat32Array(JSONValueConstRef input) {
+  return LaxConvertJsonToArray<float>(input, LaxConvertJsonToFloat32);
+}
+
+absl::StatusOr<std::optional<std::vector<std::optional<std::string>>>>
+LaxConvertJsonToStringArray(JSONValueConstRef input) {
+  return LaxConvertJsonToArray<std::string>(input, LaxConvertJsonToString);
+}
+
 absl::StatusOr<JSONValue> JsonArray(absl::Span<const Value> args,
                                     const LanguageOptions& language_options,
                                     bool canonicalize_zero) {
@@ -631,8 +909,8 @@ absl::StatusOr<bool> JsonRemove(JSONValueRef input,
         continue;
       }
     }
-    // Nonexistent member, invalid array index or type mismatch. Do nothing and
-    // exit.
+    // Nonexistent member, invalid array index or type mismatch. Do nothing
+    // and exit.
     return false;
   }
 
@@ -665,9 +943,9 @@ absl::Status JsonAddArrayElement(JSONValueRef input,
       // If the value is a NULL ARRAY ignore the operation.
      return absl::OkStatus();
     }
-    // If the value to be inserted is an array and add_each_element is true, the
-    // function adds each element separately instead of a single JSON array
-    // value.
+    // If the value to be inserted is an array and add_each_element is true,
+    // the function adds each element separately instead of a single JSON
+    // array value.
    elements_to_insert.reserve(value.num_elements());
    for (const Value& element : value.elements()) {
      ZETASQL_ASSIGN_OR_RETURN(
@@ -818,8 +1096,8 @@ absl::Status JsonSet(JSONValueRef input, StrictJSONPathIterator& path_iterator,
  //   1) If token in path exists in current JSON element, continue processing
  //      the JSON subtree with the next path token.
  //   2) If the member doesn't exist or the array index is out of bounds or
-  //      the current JSON element is null, then exit the loop. Auto-creation will
-  //      happen next.
+  //      the current JSON element is null, then exit the loop. Auto-creation
+  //      will happen next.
  //   3) If there is a type mismatch, this is not a valid Set operation so
  //      ignore operation and return early.
  for (; !path_iterator.End(); ++path_iterator) {
@@ -862,7 +1140,8 @@ absl::Status JsonSet(JSONValueRef input, StrictJSONPathIterator& path_iterator,
  }
 
  if (!path_iterator.End()) {
-    // Auto-creation will happen. Make sure it won't create an oversized array.
+    // Auto-creation will happen. Make sure it won't create an oversized
+    // array.
    size_t path_position = path_iterator.Depth() - 1;
    for (; !path_iterator.End(); ++path_iterator) {
      const StrictJSONPathToken& token = *path_iterator;
@@ -907,14 +1186,6 @@ absl::Status JsonSet(JSONValueRef input, StrictJSONPathIterator& path_iterator,
  return absl::OkStatus();
 }
 
-absl::Status JsonSet(JSONValueRef input, StrictJSONPathIterator& path_iterator,
-                     const Value& value,
-                     const LanguageOptions& language_options,
-                     bool canonicalize_zero) {
-  return JsonSet(input, path_iterator, value, /*create_if_missing=*/true,
-                 language_options, canonicalize_zero);
-}
-
 namespace {
 
 struct JsonValueNode {
@@ -1034,5 +1305,189 @@ absl::Status JsonStripNulls(JSONValueRef input,
  return StripNullsImpl(input, include_arrays, options);
 }
 
+namespace {
+
+class JsonTreeWalker {
+ public:
+  JsonTreeWalker(JSONValueConstRef root, StrictJSONPathIterator* path_iterator);
+
+  // Processes matches for the provided JSONPath given the JSON tree.
+  absl::Status Process();
+
+  // Fetch the computed result. For the result to be valid, Process() must
+  // have been called first.
+  //
+  // Result only valid on first call.
+  JSONValue ConsumeResult() { return std::move(output_); }
+
+ private:
+  // Handles a matched JSONValue.
+  absl::Status HandleMatch(JSONValueConstRef json_value);
+
+  struct StackElement {
+   public:
+    // The current index of the JSONPathToken to process in the
+    // StrictJSONPathIterator. If `path_token_index` == path_iterator.size(),
+    // this indicates we have hit the end of the JSONPath and there are no
+    // tokens left to process.
+    int path_token_index;
+    // Matched subtree for a processed JSONPath prefix.
+    JSONValueConstRef subtree;
+  };
+
+  // If the path_token referenced by `stack_element` contains an object
+  // key (string type), performs a lax match operation in `subtree`, updates
+  // `path_element_stack_`, and returns true. Returns false if the path_token
+  // is not an object key, which indicates no operation was performed.
+  bool MaybeProcessObjectKey(const StackElement& stack_element);
+  // If the path_token referenced by `stack_element` contains an array
+  // index (int type), performs a lax match operation in `subtree`, updates
+  // `path_element_stack_`, and returns true. Returns false if the path_token
+  // is not an array index, which indicates no operation was performed.
+  bool MaybeProcessArrayIndex(const StackElement& stack_element);
+
+  // Represents the tuples of <path_token_index, subtree> to process.
+  std::vector<StackElement> path_element_stack_;
+  // The results.
+  JSONValue output_;
+  StrictJSONPathIterator* path_iterator_;
+};
+
+JsonTreeWalker::JsonTreeWalker(JSONValueConstRef root,
+                               StrictJSONPathIterator* path_iterator)
+    : path_iterator_(path_iterator) {
+  // Push the first JSONPathToken and root JSON to process. The 0th index of
+  // `path_iterator` is a no-op '$' token and we simply skip this.
+  path_element_stack_.push_back({.path_token_index = 1, .subtree = root});
+  output_.GetRef().SetToEmptyArray();
+}
+
+absl::Status JsonTreeWalker::HandleMatch(JSONValueConstRef json_value) {
+  // Add the matched JSON to the output JSON array.
+  ZETASQL_RETURN_IF_ERROR(
+      output_.GetRef().AppendArrayElement(JSONValue::CopyFrom(json_value)));
+  return absl::OkStatus();
+}
+
+absl::Status JsonTreeWalker::Process() {
+  while (!path_element_stack_.empty()) {
+    StackElement stack_element = path_element_stack_.back();
+    path_element_stack_.pop_back();
+
+    if (stack_element.path_token_index == path_iterator_->Size()) {
+      // We have finished processing the entire JSONPath. Add the matched
+      // element to the output.
+ ZETASQL_RETURN_IF_ERROR(HandleMatch(stack_element.subtree)); + continue; + } + + // The current path token expects an object. + // + // If failed to process, something has gone really wrong here. The + // JSONPathToken is guaranteed to be an object key or an array index. + // Reasoning: The beginning no-op token('$') is already processed. + ZETASQL_RET_CHECK(MaybeProcessObjectKey(stack_element) || + MaybeProcessArrayIndex(stack_element)) + << "Unexpected JSONPathToken type encountered during " + "JSON_QUERY_LAX matching."; + } + return absl::OkStatus(); +} + +bool JsonTreeWalker::MaybeProcessObjectKey(const StackElement& stack_element) { + const std::string* key = + path_iterator_->GetToken(stack_element.path_token_index) + .MaybeGetObjectKey(); + if (key == nullptr) { + // The JSONPathToken is not an object key. + return false; + } + JSONValueConstRef subtree = stack_element.subtree; + if (subtree.IsObject()) { + // Path token and JSON subtree match OBJECT types. + auto maybe_member = subtree.GetMemberIfExists(*key); + if (!maybe_member.has_value()) { + // The JSON OBJECT does not contain the path token. This branch is not + // a match. + return true; + } + // Match the next path token. + path_element_stack_.push_back( + {.path_token_index = stack_element.path_token_index + 1, + .subtree = *maybe_member}); + } else if (subtree.IsArray()) { + // The JSON subtree is an array and path token expects an object. We add + // elements in reverse order to stack as we need to add matched JSON + // path results in DFS in-order traversal. + const auto& elements = subtree.GetArrayElements(); + for (auto it = elements.rbegin(); it != elements.rend(); ++it) { + JSONValueConstRef element = *it; + if (element.IsObject()) { + auto maybe_member = element.GetMemberIfExists(*key); + if (!maybe_member.has_value()) { + // The object doesn't contain the JSONPathToken. No match for this + // branch. Continue processing rest of array elements. 
+ continue; + } + path_element_stack_.push_back( + {.path_token_index = stack_element.path_token_index + 1, + .subtree = *maybe_member}); + } else if (path_iterator_->GetJsonPathOptions().recursive && + element.IsArray()) { + // This is a nested array element. Only process if `recursive` is set + // to true. Else, this branch is not a match. + path_element_stack_.push_back( + {.path_token_index = stack_element.path_token_index, + .subtree = element}); + } + // Scalar value. No match for this branch. + } + } + // If `subtree` is not an object or an array this indicates it's a scalar + // value. A scalar value indicates no match for this branch so no further + // processing is required. + return true; +} + +bool JsonTreeWalker::MaybeProcessArrayIndex(const StackElement& stack_element) { + const int64_t* array_index = + path_iterator_->GetToken(stack_element.path_token_index) + .MaybeGetArrayIndex(); + if (array_index == nullptr) { + // The JSONPathToken is not an array index. + return false; + } + JSONValueConstRef subtree = stack_element.subtree; + if (subtree.IsArray()) { + // The subtree matches expected JSONPathToken array type. + if (*array_index < subtree.GetArraySize()) { + JSONValueConstRef matched_element = subtree.GetArrayElement(*array_index); + path_element_stack_.push_back( + {.path_token_index = stack_element.path_token_index + 1, + .subtree = matched_element}); + } + // The JSONPathToken array_index is larger than the size of the array. + // This branch is not a match. + } else if (*array_index == 0) { + // The subtree is not an array. If the `array_index` = 0 implicitly + // autowrap and access the 0th element. Essentially an `array_index` = 0 + // is a no-op JSONPathToken. 
+    path_element_stack_.push_back(
+        {.path_token_index = stack_element.path_token_index + 1,
+         .subtree = subtree});
+  }
+  return true;
+}
+
+}  // namespace
+
+absl::StatusOr<JSONValue> JsonQueryLax(JSONValueConstRef input,
+                                       StrictJSONPathIterator& path_iterator) {
+  ZETASQL_RET_CHECK(path_iterator.GetJsonPathOptions().lax);
+  JsonTreeWalker walker(input, &path_iterator);
+  ZETASQL_RETURN_IF_ERROR(walker.Process());
+  return walker.ConsumeResult();
+}
+
 }  // namespace functions
 }  // namespace zetasql
diff --git a/zetasql/public/functions/json.h b/zetasql/public/functions/json.h
index 1619c9ee3..dcedc19c9 100644
--- a/zetasql/public/functions/json.h
+++ b/zetasql/public/functions/json.h
@@ -280,10 +280,35 @@ absl::StatusOr<std::string> MergeJSONPathsIntoSqlStandardMode(
 // fractional part or is not within the INT64 range).
 absl::StatusOr<int64_t> ConvertJsonToInt64(JSONValueConstRef input);
 
-// Converts 'input' into a Boolean.
+// Converts 'input' into an INT32.
+// Returns an error if:
+// - 'input' does not contain a number.
+// - 'input' is not within the INT32 value domain (meaning the number has a
+//   fractional part or is not within the INT32 range).
+absl::StatusOr<int32_t> ConvertJsonToInt32(JSONValueConstRef input);
+
+// Converts 'input' into a UINT64.
+// Returns an error if:
+// - 'input' does not contain a number.
+// - 'input' is not within the UINT64 value domain (meaning the number has a
+//   fractional part, is negative or is not within the UINT64 range).
+absl::StatusOr<uint64_t> ConvertJsonToUint64(JSONValueConstRef input);
+
+// Converts 'input' into a UINT32.
+// Returns an error if:
+// - 'input' does not contain a number.
+// - 'input' is not within the UINT32 value domain (meaning the number has a
+//   fractional part, is negative or is not within the UINT32 range).
+absl::StatusOr<uint32_t> ConvertJsonToUint32(JSONValueConstRef input);
+
+// Converts 'input' into a BOOL.
+// Returns an error if:
+// - 'input' does not contain a boolean.
 absl::StatusOr<bool> ConvertJsonToBool(JSONValueConstRef input);
 
-// Converts 'input' into a String.
+// Converts 'input' into a STRING.
+// Returns an error if:
+// - 'input' does not contain a string.
 absl::StatusOr<std::string> ConvertJsonToString(JSONValueConstRef input);
 
 // Mode to determine how to handle numbers that cannot be round-tripped.
@@ -298,6 +323,71 @@ absl::StatusOr<double> ConvertJsonToDouble(JSONValueConstRef input,
                                            WideNumberMode mode,
                                            ProductMode product_mode);
 
+// Converts 'input' into a FLOAT.
+// 'mode': defines what happens with a number that cannot be converted to
+// FLOAT without loss of precision:
+// - 'exact': the function fails if the result cannot be round-tripped
+//   through FLOAT.
+// - 'round': the numeric value stored in JSON will be rounded to FLOAT.
+absl::StatusOr<float> ConvertJsonToFloat(JSONValueConstRef input,
+                                         WideNumberMode mode,
+                                         ProductMode product_mode);
+
+// Converts 'input' into an ARRAY<INT64>.
+// Returns an error if:
+// - 'input' is not a JSON array.
+// - `ConvertJsonToInt64` fails for any element of 'input'.
+absl::StatusOr<std::vector<int64_t>> ConvertJsonToInt64Array(
+    JSONValueConstRef input);
+
+// Converts 'input' into an ARRAY<INT32>.
+// Returns an error if:
+// - 'input' is not a JSON array.
+// - `ConvertJsonToInt32` fails for any element of 'input'.
+absl::StatusOr<std::vector<int32_t>> ConvertJsonToInt32Array(
+    JSONValueConstRef input);
+
+// Converts 'input' into an ARRAY<UINT64>.
+// Returns an error if:
+// - 'input' is not a JSON array.
+// - `ConvertJsonToUint64` fails for any element of 'input'.
+absl::StatusOr<std::vector<uint64_t>> ConvertJsonToUint64Array(
+    JSONValueConstRef input);
+
+// Converts 'input' into an ARRAY<UINT32>.
+// Returns an error if:
+// - 'input' is not a JSON array.
+// - `ConvertJsonToUint32` fails for any element of 'input'.
+absl::StatusOr<std::vector<uint32_t>> ConvertJsonToUint32Array(
+    JSONValueConstRef input);
+
+// Converts 'input' into an ARRAY<BOOL>.
+// Returns an error if:
+// - 'input' is not a JSON array.
+// - `ConvertJsonToBool` fails for any element of 'input'.
+absl::StatusOr<std::vector<bool>> ConvertJsonToBoolArray(
+    JSONValueConstRef input);
+
+// Converts 'input' into an ARRAY<STRING>.
+// Returns an error if:
+// - 'input' is not a JSON array.
+// - `ConvertJsonToString` fails for any element of 'input'.
+absl::StatusOr<std::vector<std::string>> ConvertJsonToStringArray(
+    JSONValueConstRef input);
+
+// Converts 'input' into an ARRAY<DOUBLE>.
+// Returns an error if:
+// - 'input' is not a JSON array.
+// - `ConvertJsonToDouble` fails for any element of 'input'.
+absl::StatusOr<std::vector<double>> ConvertJsonToDoubleArray(
+    JSONValueConstRef input, WideNumberMode mode, ProductMode product_mode);
+
+// Converts 'input' into an ARRAY<FLOAT>.
+// Returns an error if:
+// - 'input' is not a JSON array.
+// - `ConvertJsonToFloat` fails for any element of 'input'.
+absl::StatusOr<std::vector<float>> ConvertJsonToFloatArray(
+    JSONValueConstRef input, WideNumberMode mode, ProductMode product_mode);
+
 // Returns the type of the outermost JSON value as a text string.
 absl::StatusOr<std::string> GetJsonType(JSONValueConstRef input);
 
@@ -314,14 +404,80 @@ absl::StatusOr<std::optional<bool>> LaxConvertJsonToBool(
     JSONValueConstRef input);
 
 absl::StatusOr<std::optional<int64_t>> LaxConvertJsonToInt64(
     JSONValueConstRef input);
 
-// Similar to the above function except converts json 'input' into Float.
+// Similar to the above function except converts json 'input' into INT32.
+// Floating point numbers are rounded when converted to INT32.
+// Integers outside the INT32 range yield nullopt.
+absl::StatusOr<std::optional<int32_t>> LaxConvertJsonToInt32(
+    JSONValueConstRef input);
+
+// Similar to the above function except converts json 'input' into UINT64.
+// Floating point numbers are rounded when converted to UINT64.
+// Integers outside the UINT64 range yield nullopt.
+absl::StatusOr<std::optional<uint64_t>> LaxConvertJsonToUint64(
+    JSONValueConstRef input);
+
+// Similar to the above function except converts json 'input' into UINT32.
+// Floating point numbers are rounded when converted to UINT32.
+// Integers outside the UINT32 range yield nullopt.
+absl::StatusOr<std::optional<uint32_t>> LaxConvertJsonToUint32(
+    JSONValueConstRef input);
+
+// Similar to the above function except converts json 'input' into DOUBLE.
 absl::StatusOr<std::optional<double>> LaxConvertJsonToFloat64(
     JSONValueConstRef input);
 
-// Similar to the above function except converts json 'input' into String.
+// Similar to the above function except converts json 'input' into FLOAT.
+absl::StatusOr<std::optional<float>> LaxConvertJsonToFloat32(
+    JSONValueConstRef input);
+
+// Similar to the above function except converts json 'input' into STRING.
 absl::StatusOr<std::optional<std::string>> LaxConvertJsonToString(
     JSONValueConstRef input);
 
+// Converts a JSON 'input' into an ARRAY<BOOL>.
+// Upon success the function returns the converted value. If the value is not
+// an array, returns nullopt. If an element of the input array fails lax
+// conversion, that element will be set to nullopt in the resulting vector.
+// Returns a non-ok status if there's an internal error during execution.
+// For more details on the conversion rules
+// see (broken link).
+absl::StatusOr<std::optional<std::vector<std::optional<bool>>>>
+LaxConvertJsonToBoolArray(JSONValueConstRef input);
+
+// Similar to the above function except converts JSON 'input' into INT64 array.
+// Floating point numbers are rounded when converted to INT64.
+absl::StatusOr<std::optional<std::vector<std::optional<int64_t>>>>
+LaxConvertJsonToInt64Array(JSONValueConstRef input);
+
+// Similar to the above function except converts JSON 'input' into INT32 array.
+// Floating point numbers are rounded when converted to INT32.
+absl::StatusOr<std::optional<std::vector<std::optional<int32_t>>>>
+LaxConvertJsonToInt32Array(JSONValueConstRef input);
+
+// Similar to the above function except converts JSON 'input' into UINT64 array.
+// Floating point numbers are rounded when converted to UINT64.
+absl::StatusOr<std::optional<std::vector<std::optional<uint64_t>>>>
+LaxConvertJsonToUint64Array(JSONValueConstRef input);
+
+// Similar to the above function except converts JSON 'input' into UINT32 array.
+// Floating point numbers are rounded when converted to UINT32.
+absl::StatusOr<std::optional<std::vector<std::optional<uint32_t>>>>
+LaxConvertJsonToUint32Array(JSONValueConstRef input);
+
+// Similar to the above function except converts JSON 'input' into FLOAT64
+// array.
+absl::StatusOr<std::optional<std::vector<std::optional<double>>>>
+LaxConvertJsonToFloat64Array(JSONValueConstRef input);
+
+// Similar to the above function except converts JSON 'input' into FLOAT32
+// array.
+absl::StatusOr<std::optional<std::vector<std::optional<float>>>>
+LaxConvertJsonToFloat32Array(JSONValueConstRef input);
+
+// Similar to the above function except converts JSON 'input' into STRING array.
+absl::StatusOr<std::optional<std::vector<std::optional<std::string>>>>
+LaxConvertJsonToStringArray(JSONValueConstRef input);
+
 // Converts a variadic number of arguments into a JSON array of these arguments.
 // If 'canonicalize_zero' is true, the sign on a signed zero is removed when
 // converting a numeric type to JSON.
@@ -522,13 +678,6 @@ absl::Status JsonSet(JSONValueRef input,
                      const LanguageOptions& language_options,
                      bool canonicalize_zero);
 
-ABSL_DEPRECATED("Use above version of JsonSet() instead")
-absl::Status JsonSet(JSONValueRef input,
-                     json_internal::StrictJSONPathIterator& path_iterator,
-                     const Value& value,
-                     const LanguageOptions& language_options,
-                     bool canonicalize_zero);
-
 // Cleans up `input` by removing JSON 'null' and optionally empty containers
 // from the JSON subtree pointed to by `path_iterator`. If `path_iterator`
 // points to a nonexistent path does nothing.
@@ -600,6 +749,70 @@ absl::Status JsonStripNulls(
     JSONValueRef input, json_internal::StrictJSONPathIterator& path_iterator,
     bool include_arrays, bool remove_empty);
 
+// Extracts values from `input` matched by JSONPath `path_iterator`. This
+// version of the JSON_QUERY implementation extracts data using lax semantics.
+// Lax semantics implicitly adapt the matching JSONPath to the structure of
+// `input`.
+//
+// If `path_iterator`.GetJsonPathOptions().recursive is set to true: when a
+// path token expects an object (.member) and the JSON element is an array,
+// each element in the array is recursively unwrapped until a non-array type
+// is encountered prior to matching the member.
+// If set to false: only a single level of array is unwrapped prior to
+// matching the member.
+//
+// Matched results are wrapped in a JSON array. If `path_iterator` matches no
+// data, an empty JSON array is returned.
+//
+// Example 1:
+// JsonQueryLax(JSON '[{"a":1}, [{"a":[2]}, {"b":3}, 4]]', '$.a')
+// `path_iterator`.GetJsonPathOptions().recursive = true
+// Result: JSON '[1, [2]]'
+// Reasoning: Because `recursive` is set to true, we recursively unwrap before
+// matching token `.a`.
+//
+// Example 2:
+// JsonQueryLax(JSON '[{"a":1}, [{"a":[2]}, {"b":3}, 4]]', '$.a')
+// `path_iterator`.GetJsonPathOptions().recursive = false
+// Result: JSON '[1]'
+// Reasoning: Because `recursive` is set to false, token `.a` does not match
+// the JSON element '[{"a":[2]}, {"b":3}, 4]'.
+//
+// Example 3:
+// JsonQueryLax(JSON '[[{"a":[]}]]', '$.a')
+// `path_iterator`.GetJsonPathOptions().recursive = true
+// Result: JSON '[[]]'
+// Reasoning: Because `recursive` is set to true, we recursively unwrap before
+// matching token `.a`.
+//
+// Example 4:
+// JsonQueryLax(JSON '[[{"a":[]}]]', '$.a')
+// `path_iterator`.GetJsonPathOptions().recursive = false
+// Result: JSON '[]'
+// Reasoning: Because `recursive` is set to false, no data is matched.
+//
+// Example 5:
+// JSON_QUERY_LAX(JSON '1', '$[0]')
+// `path_iterator`.GetJsonPathOptions().recursive = {false or true}
+// Result: JSON '[1]'
+// Reasoning: Path '[0]' forces autowrap of the JSON element '1' -> '[1]' before
+// matching the token '[0]'. `recursive` has no effect on autowrap of arrays.
+// +// Example 6: +// JSON_QUERY_LAX(JSON '1', '$[1]') +// `path_iterator`.GetJsonPathOptions().recursive = {false or true} +// Result: JSON '[]' (No matching results) +// Reasoning: Path '[1]' forces autowrap of json doc element '1' -> '[1]' before +// matching the token '[1]'. However, index 1 is larger than the array size. +// `recursive` has no effect on autowrap of arrays. +// +// Invariant: path_iterator->JsonPathOptions->lax must be 'true', otherwise the +// function will return an error. This is because non-lax JSON_QUERY does not +// follow StrictPath semantics. +absl::StatusOr JsonQueryLax( + JSONValueConstRef input, + json_internal::StrictJSONPathIterator& path_iterator); + } // namespace functions } // namespace zetasql #endif // ZETASQL_PUBLIC_FUNCTIONS_JSON_H_ diff --git a/zetasql/public/functions/json_internal.cc b/zetasql/public/functions/json_internal.cc index b5a89c3bb..fb5e4c414 100644 --- a/zetasql/public/functions/json_internal.cc +++ b/zetasql/public/functions/json_internal.cc @@ -27,9 +27,11 @@ #include "absl/memory/memory.h" #include "absl/status/status.h" #include "absl/status/statusor.h" +#include "absl/strings/match.h" #include "absl/strings/numbers.h" #include "absl/strings/str_cat.h" #include "absl/strings/string_view.h" +#include "zetasql/base/case.h" #include "re2/re2.h" #include "zetasql/base/status_macros.h" @@ -65,16 +67,31 @@ static LazyRE2 kEscKeyRegexStandard = { static LazyRE2 kUnSupportedLexer = {"(\\*|\\.\\.|@)"}; static LazyRE2 kBeginRegex = {"\\$"}; +// This is a regex that supports lax notation. It matches keywords +// {lax, recursive} case agnostically. The keywords are specified before the +// `kBeginRegex`. 
+static LazyRE2 kLaxOptionsKeywordRegex = { + R"regexp((?i)(lax|recursive)\s+)regexp"}; +constexpr absl::string_view kLaxKeyword = "lax"; +constexpr absl::string_view kRecursiveKeyword = "recursive"; + constexpr char kStandardEscapeChar = '"'; constexpr char kLegacyEscapeChar = '\''; constexpr absl::string_view kBeginToken = ""; +// We have separate error messages because engines currently validate against +// this message. Engines should not be verifying ZetaSQL internals but that is +// the current state. +constexpr absl::string_view kPrefixErrorMsg = "JSONPath must start with '$'"; +constexpr absl::string_view kLaxPrefixErrorMsg = + "JSONPath must start with zero or more unique modifiers followed by '$'"; + } // namespace // Checks if the given JSON path is supported and valid. absl::Status IsValidJSONPath(absl::string_view text, bool sql_standard_mode) { if (!RE2::Consume(&text, *kBeginRegex)) { - return absl::OutOfRangeError("JSONPath must start with '$'"); + return absl::OutOfRangeError(kPrefixErrorMsg); } const RE2* esc_key_regex = kEscKeyRegex.get(); @@ -103,11 +120,15 @@ absl::Status IsValidJSONPath(absl::string_view text, bool sql_standard_mode) { return absl::OkStatus(); } -absl::Status IsValidJSONPathStrict(absl::string_view text) { +static absl::string_view GetPrefixErrorMessage(bool enable_lax_mode) { + return enable_lax_mode ? 
kLaxPrefixErrorMsg : kPrefixErrorMsg; +} + +static absl::Status ValidateAndConsumeStrictPathAfterKeywords( + absl::string_view& text, bool enable_lax_mode) { if (!RE2::Consume(&text, *kBeginRegex)) { - return absl::OutOfRangeError("JSONPath must start with '$'"); + return absl::OutOfRangeError(GetPrefixErrorMessage(enable_lax_mode)); } - while (!text.empty()) { if (!RE2::Consume(&text, *kKeyRegex) && !RE2::Consume(&text, *kEscKeyRegexStandard)) { @@ -126,6 +147,62 @@ absl::Status IsValidJSONPathStrict(absl::string_view text) { return absl::OkStatus(); } +// Validates and consumes the lax keywords(specified before the `kBeginRegex`) +// and returns the specified options. +// Returns an error if there are invalid keyword combinations. +static absl::StatusOr +GetOptionsAndConsumeKeywords(absl::string_view& text) { + StrictJSONPathIterator::JsonPathOptions lax_options; + + // The valid combinations are {'lax recursive', 'recursive lax', 'lax'} case + // agnostic. + absl::string_view matched_keyword; + while (RE2::Consume(&text, *kLaxOptionsKeywordRegex, &matched_keyword)) { + // Verify keywords are not repeated. + if (!lax_options.lax && + zetasql_base::CaseEqual(matched_keyword, kLaxKeyword)) { + lax_options.lax = true; + } else if (!lax_options.recursive && + zetasql_base::CaseEqual(matched_keyword, kRecursiveKeyword)) { + lax_options.recursive = true; + } else { + return absl::OutOfRangeError(kLaxPrefixErrorMsg); + } + } + + // 'recursive' without keyword 'lax' is invalid. + if (lax_options.recursive && !lax_options.lax) { + return absl::OutOfRangeError( + "JSONPath has an invalid combination of modifiers. The 'lax' modifier " + "must be included if 'recursive' is specified."); + } + return lax_options; +} + +absl::Status IsValidJSONPathStrict(absl::string_view text, + bool enable_lax_mode) { + if (enable_lax_mode) { + // For validity checks we can ignore the returned JsonPathOptions. 
+ ZETASQL_RETURN_IF_ERROR(GetOptionsAndConsumeKeywords(text).status()); + } + return ValidateAndConsumeStrictPathAfterKeywords(text, enable_lax_mode); +} + +absl::StatusOr IsValidAndLaxJSONPath(absl::string_view text) { + ZETASQL_ASSIGN_OR_RETURN(StrictJSONPathIterator::JsonPathOptions options, + GetOptionsAndConsumeKeywords(text)); + if (options.lax) { + ZETASQL_RETURN_IF_ERROR(ValidateAndConsumeStrictPathAfterKeywords( + text, /*enable_lax_mode=*/true)); + return true; + } + // This is a non-lax path. Therefore, doesn't need to have strict path + // semantics. + ZETASQL_RETURN_IF_ERROR(IsValidJSONPath(text, /*sql_standard_mode=*/true)); + // This is a valid JSONPath but does not follow lax mode. + return false; +} + void RemoveBackSlashFollowedByChar(std::string* token, char esc_chr) { if (token && !token->empty()) { std::string::const_iterator ritr = token->cbegin(); @@ -149,7 +226,7 @@ void RemoveBackSlashFollowedByChar(std::string* token, char esc_chr) { absl::StatusOr> ValidateAndInitializePathTokens( absl::string_view path, bool sql_standard_mode) { if (!RE2::Consume(&path, *kBeginRegex)) { - return absl::OutOfRangeError("JSONPath must start with '$'"); + return absl::OutOfRangeError(kPrefixErrorMsg); } std::vector tokens; tokens.push_back(std::string(kBeginToken)); @@ -201,12 +278,16 @@ ValidJSONPathIterator::Create(absl::string_view js_path, // Validates and initializes a json path. During parsing, initializes all tokens // that can be re-used by `StrictJSONPathIterator`. This avoids duplicate // intensive regex matching. 
-absl::StatusOr<std::vector<StrictJSONPathToken>>
-ValidateAndInitializeStrictPathTokens(absl::string_view path) {
-  std::vector<StrictJSONPathToken> tokens;
+absl::StatusOr<std::unique_ptr<StrictJSONPathIterator>>
+StrictJSONPathIterator::Create(absl::string_view path, bool enable_lax_mode) {
+  JsonPathOptions path_options;
+  if (enable_lax_mode) {
+    ZETASQL_ASSIGN_OR_RETURN(path_options, GetOptionsAndConsumeKeywords(path));
+  }
   if (!RE2::Consume(&path, *kBeginRegex)) {
-    return absl::OutOfRangeError("JSONPath must start with '$'");
+    return absl::OutOfRangeError(GetPrefixErrorMessage(enable_lax_mode));
   }
+  std::vector<StrictJSONPathToken> tokens;
   tokens.push_back(StrictJSONPathToken(std::monostate()));
   while (!path.empty()) {
     std::string parsed_string;
@@ -228,14 +309,8 @@ ValidateAndInitializeStrictPathTokens(absl::string_view path) {
           absl::StrCat("Invalid token in JSONPath at: ", path));
     }
   }
-  return tokens;
-}
-
-absl::StatusOr<std::unique_ptr<StrictJSONPathIterator>>
-StrictJSONPathIterator::Create(absl::string_view path) {
-  ZETASQL_ASSIGN_OR_RETURN(std::vector<StrictJSONPathToken> tokens,
-                           ValidateAndInitializeStrictPathTokens(path));
-  return absl::WrapUnique(new StrictJSONPathIterator(std::move(tokens)));
+  return absl::WrapUnique(
+      new StrictJSONPathIterator(std::move(tokens), path_options));
 }
 
 }  // namespace json_internal
diff --git a/zetasql/public/functions/json_internal.h b/zetasql/public/functions/json_internal.h
index d81c96489..f04a3e852 100644
--- a/zetasql/public/functions/json_internal.h
+++ b/zetasql/public/functions/json_internal.h
@@ -48,8 +52,12 @@ namespace json_internal {
 
 void RemoveBackSlashFollowedByChar(std::string* token, char esc_chr);
 
-// Bi-directed iterator over JSON path tokens. Functions in this class are
-// inlined due to performance reasons.
+// This class acts as both an STL container and an iterator over JSON path
+// tokens. The input JSONPath is parsed, and the underlying memory for the
+// tokens is owned by this class. This is a read-only interface: as part of
+// the contract, JSONPath tokens cannot be modified after initialization.
+// +// Functions in this class are inlined due to performance reasons. template class JSONPathIterator { public: @@ -77,6 +81,12 @@ class JSONPathIterator { return tokens_[depth_ - 1]; } + // Undefined behavior if `i` is out of bounds. + inline const Token& GetToken(int i) const { + ABSL_DCHECK(i >= 0 && i < tokens_.size()); + return tokens_[i]; + } + inline bool NoSuffixToken() { return depth_ == tokens_.size(); } inline size_t Size() { return tokens_.size(); } @@ -130,7 +140,17 @@ class ValidJSONPathIterator final : public JSONPathIterator { // // Strict notation only allows '[]' to refer to an array location and .property // to refer to an object member. -absl::Status IsValidJSONPathStrict(absl::string_view text); +// +// If `enable_lax_mode` is set to true, enables lax path notation. +absl::Status IsValidJSONPathStrict(absl::string_view text, + bool enable_lax_mode = false); + +// Returns whether the given JSON path is valid and contains lax modifier. +// Returns: +// 1) True - The path is valid and in lax mode. +// 2) False - The path is valid but not in lax mode. +// 3) Error - Invalid path. +absl::StatusOr IsValidAndLaxJSONPath(absl::string_view text); // Represents a strict parsed token. class StrictJSONPathToken { @@ -161,13 +181,30 @@ class StrictJSONPathToken { class StrictJSONPathIterator final : public JSONPathIterator { public: - // JSON path much be in standard SQL. + // Specified path options for input JSONPath. The option fields can only be + // set to true if `enable_lax_mode`= true. + struct JsonPathOptions { + bool lax = false; + // Invariant: `recursive`= true only if `lax`= true. This is verified + // during path parsing. + bool recursive = false; + }; + + // JSON path must be in standard SQL. If `enable_lax_mode` is set to true, + // enables lax path notation. static absl::StatusOr> Create( - absl::string_view json_path); + absl::string_view json_path, bool enable_lax_mode = false); + + // Returns the path options specified by `json_path`. 
+ JsonPathOptions GetJsonPathOptions() const { return json_path_options_; } private: - explicit StrictJSONPathIterator(std::vector tokens) - : JSONPathIterator(std::move(tokens)) {} + StrictJSONPathIterator(std::vector tokens, + JsonPathOptions json_path_options) + : JSONPathIterator(std::move(tokens)), + json_path_options_(std::move(json_path_options)) {} + + JsonPathOptions json_path_options_; }; // @@ -295,6 +332,7 @@ class JSONPathExtractor : public zetasql::JSONParser { // Stack Usage Invariant: !accept_ && match_ if (!accept_ && extend_match_) { stack_.pop(); + has_index_token_ = false; } MaintainInvariantMovingUp(); return !stop_on_first_match_; @@ -482,6 +520,15 @@ class JSONPathExtractor : public zetasql::JSONParser { // To report all matches remove this variable. bool stop_on_first_match_ = false; bool parsed_null_result_ = false; + // `has_index_token_` is set when: + // - The start of an array is seen, AND + // - Tokens match till curr_depth_-1 (i.e. `extend_match_`= true ), AND + // - The sub-tree is not accepted (i.e. `accept_` = false), AND + // - The current token given by `path_iterator` is a valid index. + // `has_index_token_` must be reset when: + // - The end of an array is seen, AND + // - Tokens match till curr_depth_-1 (i.e. `extend_match_`= true ), AND + // - The sub-tree is not accepted (i.e. `accept_` = false). bool has_index_token_ = false; unsigned int index_token_; // Whether to escape special JSON characters (e.g. newlines). 
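The modifier handling added in json_internal.cc (`GetOptionsAndConsumeKeywords`) accepts an optional, case-insensitive `lax` / `recursive` prefix before the `$`, rejects repeated keywords, and rejects `recursive` without `lax`. A minimal standalone sketch of those rules, as a hedged illustration only — the names below and the word-based scanner are assumptions for this sketch, not the ZetaSQL implementation, which uses RE2:

```cpp
#include <cctype>
#include <optional>
#include <string_view>

struct LaxOptions {
  bool lax = false;
  bool recursive = false;
};

// Case-insensitive comparison of two ASCII words.
static bool EqualsIgnoreCase(std::string_view a, std::string_view b) {
  if (a.size() != b.size()) return false;
  for (size_t i = 0; i < a.size(); ++i) {
    if (std::tolower(static_cast<unsigned char>(a[i])) !=
        std::tolower(static_cast<unsigned char>(b[i]))) {
      return false;
    }
  }
  return true;
}

// Consumes leading "lax"/"recursive" modifiers (each followed by whitespace)
// from `path`, mirroring the rules described in the patch:
//   - modifiers may appear in either order but must not repeat;
//   - "recursive" is only valid together with "lax".
// Returns std::nullopt on an invalid combination; otherwise `path` is left
// pointing at the remainder (expected to start with '$').
static std::optional<LaxOptions> ParseLaxModifiers(std::string_view& path) {
  LaxOptions opts;
  while (true) {
    size_t end = path.find_first_of(" \t");
    if (end == std::string_view::npos) break;  // no whitespace => no modifier
    std::string_view word = path.substr(0, end);
    if (EqualsIgnoreCase(word, "lax") && !opts.lax) {
      opts.lax = true;
    } else if (EqualsIgnoreCase(word, "recursive") && !opts.recursive) {
      opts.recursive = true;
    } else if (EqualsIgnoreCase(word, "lax") ||
               EqualsIgnoreCase(word, "recursive")) {
      return std::nullopt;  // repeated modifier
    } else {
      break;  // not a modifier; leave the rest of the path untouched
    }
    path.remove_prefix(end);
    size_t next = path.find_first_not_of(" \t");
    path.remove_prefix(next == std::string_view::npos ? path.size() : next);
  }
  if (opts.recursive && !opts.lax) return std::nullopt;  // needs 'lax'
  return opts;
}
```

With this sketch, `"lax recursive $.a"` yields both flags set and leaves the path at `"$.a"`, while `"recursive $.a"` is rejected, matching the invalid-combination error the patch adds for `recursive` without `lax`.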
diff --git a/zetasql/public/functions/json_test.cc b/zetasql/public/functions/json_test.cc index c157f5fc4..1ca00efeb 100644 --- a/zetasql/public/functions/json_test.cc +++ b/zetasql/public/functions/json_test.cc @@ -58,6 +58,7 @@ #include "absl/strings/str_join.h" #include "absl/strings/string_view.h" #include "absl/strings/substitute.h" +#include "absl/types/span.h" #include "zetasql/base/status_macros.h" namespace zetasql { @@ -66,6 +67,8 @@ namespace { using ::testing::ElementsAreArray; using ::testing::HasSubstr; +using ::testing::IsNan; +using ::testing::Optional; using ::zetasql_base::testing::IsOkAndHolds; using ::zetasql_base::testing::StatusIs; @@ -951,6 +954,88 @@ TEST(JsonPathExtractorTest, SimpleValidPath) { EXPECT_THAT(tokens, ElementsAreArray(gold)); } +TEST(JsonPathExtractorTest, ValidValueAccessEmptyPath) { + ZETASQL_ASSERT_OK_AND_ASSIGN( + std::unique_ptr iptr, + ValidJSONPathIterator::Create("$", /*sql_standard_mode=*/true)); + EXPECT_EQ(iptr->Size(), 1); + EXPECT_EQ(iptr->GetToken(0), ""); +} + +TEST(JsonPathExtractorTest, ValidValueAccessObjectTokens) { + ZETASQL_ASSERT_OK_AND_ASSIGN( + std::unique_ptr iptr, + ValidJSONPathIterator::Create("$.a.b.c", /*sql_standard_mode=*/true)); + EXPECT_EQ(iptr->Size(), 4); + EXPECT_EQ(iptr->GetToken(0), ""); + EXPECT_EQ(iptr->GetToken(1), "a"); + EXPECT_EQ(iptr->GetToken(2), "b"); + EXPECT_EQ(iptr->GetToken(3), "c"); +} + +TEST(JsonPathExtractorTest, ValidValueAccessArrayAndObjectTokens) { + ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr iptr, + ValidJSONPathIterator::Create( + "$[0].a.b[1].c", /*sql_standard_mode=*/true)); + EXPECT_EQ(iptr->Size(), 6); + EXPECT_EQ(iptr->GetToken(0), ""); + EXPECT_EQ(iptr->GetToken(1), "0"); + EXPECT_EQ(iptr->GetToken(2), "a"); + EXPECT_EQ(iptr->GetToken(3), "b"); + EXPECT_EQ(iptr->GetToken(4), "1"); + EXPECT_EQ(iptr->GetToken(5), "c"); +} + +TEST(JsonPathExtractorTest, ValidValueAccessNonStandardSql) { + std::string path = "$['b.c'][d].e['f.g'][3]"; + 
ZETASQL_ASSERT_OK_AND_ASSIGN( + std::unique_ptr iptr, + ValidJSONPathIterator::Create(path, /*sql_standard_mode=*/false)); + EXPECT_EQ(iptr->Size(), 6); + EXPECT_EQ(iptr->GetToken(0), ""); + EXPECT_EQ(iptr->GetToken(1), "b.c"); + EXPECT_EQ(iptr->GetToken(2), "d"); + EXPECT_EQ(iptr->GetToken(3), "e"); + EXPECT_EQ(iptr->GetToken(4), "f.g"); + EXPECT_EQ(iptr->GetToken(5), "3"); +} + +class JSONPathIteratorTest + : public ::testing::TestWithParam< + std::tuple> { + protected: + absl::string_view GetPath() const { return std::get<0>(GetParam()); } + bool IsSqlStandardMode() const { return std::get<1>(GetParam()); } +}; + +INSTANTIATE_TEST_SUITE_P( + JSONPathIteratorSQLModeAgnosticTests, JSONPathIteratorTest, + ::testing::Combine(::testing::Values("$", "$.a.b", "$[0][12]", "$[0].a", + "$.a.b.c.d.e.f.g.h.i.j"), + ::testing::Bool())); + +INSTANTIATE_TEST_SUITE_P( + JSONPathIteratorStandardSQLModeTests, JSONPathIteratorTest, + ::testing::Combine(::testing::Values(R"($.a."b.c".d.0.e)", R"($."a['b']")"), + ::testing::Values(true))); + +INSTANTIATE_TEST_SUITE_P( + JSONPathIteratorNonStandardSQLModeTests, JSONPathIteratorTest, + ::testing::Combine(::testing::Values(R"($.a['\'\'\\s '].g[1])", + "$['b.c'][d].e['f.g'][3]"), + ::testing::Values(false))); + +TEST_P(JSONPathIteratorTest, ValidValueAccess) { + ZETASQL_ASSERT_OK_AND_ASSIGN( + std::unique_ptr iptr, + ValidJSONPathIterator::Create(GetPath(), + /*sql_standard_mode=*/IsSqlStandardMode())); + ValidJSONPathIterator& itr = *(iptr); + for (int i = 0; i < itr.Size(); ++i, ++itr) { + EXPECT_EQ(itr.GetToken(i), *itr); + } +} + TEST(JsonPathExtractorTest, BackAndForthIteration) { const char* const input = "$.a.b"; ZETASQL_ASSERT_OK_AND_ASSIGN( @@ -1208,30 +1293,202 @@ TEST(IsValidJSONPathTest, StrictBasicTests) { TEST(IsValidJSONPathTest, InvalidPathStrictTests) { // Invalid cases. 
- std::vector invalid_paths = {"$[0-]", - "$[0_]", - "$[-1]" - "[0]" - "$[a]", - "$['a']", - "$.a.b.c['efgh'].e", - "$.", - ".a", - "$[9223372036854775807990]"}; + std::vector invalid_paths = { + "$[0-]", "$[0_]", "$[-1]", "[0]", "$[a]", "$['a']", "$.a.b.c['efgh'].e", + "$.", ".a", "$[9223372036854775807990]", + // Lax mode not enabled. Doesn't support lax keywords. + "lax $", "lax $.a", "lax recursive $", "recursive lax $.a"}; for (const std::string& invalid_path : invalid_paths) { EXPECT_THAT(IsValidJSONPathStrict(invalid_path), StatusIs(absl::StatusCode::kOutOfRange)); EXPECT_FALSE(StrictJSONPathIterator::Create(invalid_path).ok()); } + + // Verify error message for compatibility. + EXPECT_THAT( + IsValidJSONPathStrict("a"), + StatusIs(absl::StatusCode::kOutOfRange, "JSONPath must start with '$'")); + EXPECT_THAT( + StrictJSONPathIterator::Create("a"), + StatusIs(absl::StatusCode::kOutOfRange, "JSONPath must start with '$'")); +} + +TEST(IsValidJSONPathTest, ValidLaxJSONPaths) { + // No keywords. + ZETASQL_EXPECT_OK(IsValidJSONPathStrict("$", /*enable_lax_mode=*/true)); + ZETASQL_EXPECT_OK(IsValidJSONPathStrict("$.a", /*enable_lax_mode=*/true)); + // lax is part of the path not a keyword. + ZETASQL_EXPECT_OK(IsValidJSONPathStrict("$.a lax", /*enable_lax_mode=*/true)); + ZETASQL_EXPECT_OK(IsValidJSONPathStrict("$.a.recursive", /*enable_lax_mode=*/true)); + ZETASQL_EXPECT_OK(IsValidJSONPathStrict("$.a recursive", /*enable_lax_mode=*/true)); + ZETASQL_EXPECT_OK( + IsValidJSONPathStrict("$.a lax.recursive", /*enable_lax_mode=*/true)); + // Includes keywords. 
+ ZETASQL_EXPECT_OK(IsValidJSONPathStrict("lax $", /*enable_lax_mode=*/true)); + ZETASQL_EXPECT_OK(IsValidJSONPathStrict("lax $.a", /*enable_lax_mode=*/true)); + ZETASQL_EXPECT_OK( + IsValidJSONPathStrict("lax recursive $", /*enable_lax_mode=*/true)); + ZETASQL_EXPECT_OK(IsValidJSONPathStrict("lax recursive $.a.lax", + /*enable_lax_mode=*/true)); + ZETASQL_EXPECT_OK( + IsValidJSONPathStrict("recursive lax $", /*enable_lax_mode=*/true)); + ZETASQL_EXPECT_OK( + IsValidJSONPathStrict("recursive lax $.a", /*enable_lax_mode=*/true)); + // Mixed-case keywords. + ZETASQL_EXPECT_OK(IsValidJSONPathStrict("Recursive LAX $", /*enable_lax_mode=*/true)); + ZETASQL_EXPECT_OK( + IsValidJSONPathStrict("LAx Recursive $.a", /*enable_lax_mode=*/true)); +} + +TEST(IsValidJSONPathTest, InvalidLaxJSONPaths) { + // Invalid keyword combinations. + EXPECT_FALSE( + IsValidJSONPathStrict("lax lax $", /*enable_lax_mode=*/true).ok()); + EXPECT_FALSE( + IsValidJSONPathStrict("lax recursive lax $", /*enable_lax_mode=*/true) + .ok()); + // Case insensitive matching. + EXPECT_FALSE( + IsValidJSONPathStrict("lax recursive LAX $", /*enable_lax_mode=*/true) + .ok()); + EXPECT_FALSE( + IsValidJSONPathStrict("recursive $", /*enable_lax_mode=*/true).ok()); + EXPECT_FALSE(IsValidJSONPathStrict("recursive lax recursive $", + /*enable_lax_mode=*/true) + .ok()); + // Case insensitive matching. + EXPECT_FALSE(IsValidJSONPathStrict("recursive lax RECURSIVE $", + /*enable_lax_mode=*/true) + .ok()); + EXPECT_FALSE( + IsValidJSONPathStrict("invalid $.a", /*enable_lax_mode=*/true).ok()); + EXPECT_FALSE( + IsValidJSONPathStrict("lax invalid $.a", /*enable_lax_mode=*/true).ok()); + EXPECT_FALSE( + IsValidJSONPathStrict("invalid lax $", /*enable_lax_mode=*/true).ok()); + // Invalid whitespace. 
+ EXPECT_FALSE(IsValidJSONPathStrict("lax$", /*enable_lax_mode=*/true).ok()); + EXPECT_FALSE( + IsValidJSONPathStrict("laxrecursive $", /*enable_lax_mode=*/true).ok()); + EXPECT_FALSE( + IsValidJSONPathStrict("recursivelax $.a", /*enable_lax_mode=*/true).ok()); + // Doesn't contain "$". + EXPECT_FALSE(IsValidJSONPathStrict("lax", /*enable_lax_mode=*/true).ok()); + EXPECT_FALSE(IsValidJSONPathStrict("lax a", /*enable_lax_mode=*/true).ok()); + // No whitespace allowed before first keyword. + EXPECT_FALSE(IsValidJSONPathStrict(" $", /*enable_lax_mode=*/true).ok()); + EXPECT_FALSE(IsValidJSONPathStrict(" $.a", /*enable_lax_mode=*/true).ok()); + EXPECT_FALSE(IsValidJSONPathStrict(" lax $", /*enable_lax_mode=*/true).ok()); + EXPECT_FALSE( + IsValidJSONPathStrict(" lax recursive $", /*enable_lax_mode=*/true).ok()); + EXPECT_FALSE( + IsValidJSONPathStrict(" recursive lax $.a", /*enable_lax_mode=*/true) + .ok()); + + // Verify error message for compatibility. + EXPECT_THAT(IsValidJSONPathStrict("a", /*enable_lax_mode=*/true), + StatusIs(absl::StatusCode::kOutOfRange, + "JSONPath must start with zero or more unique modifiers " + "followed by '$'")); +} + +TEST(JSONPathTest, StrictPathOptionsTests) { + auto verify_fn = [](absl::string_view path_str, bool lax, bool recursive) { + ZETASQL_ASSERT_OK_AND_ASSIGN(const std::unique_ptr path_itr, + StrictJSONPathIterator::Create(path_str, true)); + EXPECT_EQ(path_itr->GetJsonPathOptions().lax, lax); + EXPECT_EQ(path_itr->GetJsonPathOptions().recursive, recursive); + }; + verify_fn("$", /*lax=*/false, /*recursive=*/false); + verify_fn("$.a", /*lax=*/false, /*recursive=*/false); + verify_fn("lax $", /*lax=*/true, /*recursive=*/false); + verify_fn("lax $.a", /*lax=*/true, /*recursive=*/false); + verify_fn("lax recursive $", /*lax=*/true, /*recursive=*/true); + verify_fn("lax recursive $.a", /*lax=*/true, /*recursive=*/true); + verify_fn("recursive lax $", /*lax=*/true, /*recursive=*/true); + verify_fn("recursive lax $.a", /*lax=*/true, 
/*recursive=*/true); + // lax is part of the path not a keyword. + verify_fn("$.a lax", /*lax=*/false, /*recursive=*/false); + verify_fn("$.a recursive", /*lax=*/false, /*recursive=*/false); + verify_fn("$.a recursive.lax", /*lax=*/false, /*recursive=*/false); + // recursive is part of the path not a keyword. + verify_fn("lax $.recursive", /*lax=*/true, /*recursive=*/false); +} + +TEST(JSONPathTest, IsValidAndLaxJSONPath) { + EXPECT_THAT(IsValidAndLaxJSONPath("lax $"), IsOkAndHolds(true)); + EXPECT_THAT(IsValidAndLaxJSONPath("Recursive lax $.a"), IsOkAndHolds(true)); + EXPECT_THAT(IsValidAndLaxJSONPath("$"), IsOkAndHolds(false)); + EXPECT_THAT(IsValidAndLaxJSONPath("$.lax"), IsOkAndHolds(false)); + // Invalid JSONPaths. + EXPECT_FALSE(IsValidAndLaxJSONPath(" lax $").ok()); + EXPECT_FALSE(IsValidAndLaxJSONPath("recursive $").ok()); + EXPECT_FALSE(IsValidAndLaxJSONPath("invalid $").ok()); + EXPECT_FALSE(IsValidAndLaxJSONPath("lax invalid $").ok()); + EXPECT_FALSE(IsValidAndLaxJSONPath("invalid lax $").ok()); + EXPECT_FALSE(IsValidAndLaxJSONPath("lax$").ok()); + EXPECT_FALSE(IsValidAndLaxJSONPath("laxrecursive $").ok()); + EXPECT_FALSE(IsValidAndLaxJSONPath("recursivelax $").ok()); + // Non-standard sql paths are invalid for StrictJSONPathIterator. + EXPECT_FALSE(IsValidAndLaxJSONPath("$.").ok()); + EXPECT_FALSE(IsValidAndLaxJSONPath("$$").ok()); +} + +TEST(JSONPathTest, InvalidLaxJSONPathIterator) { + // Invalid keyword combinations. 
+ EXPECT_FALSE( + StrictJSONPathIterator::Create(" lax $", /*enable_lax_mode=*/true).ok()); + EXPECT_FALSE( + StrictJSONPathIterator::Create("lax a", /*enable_lax_mode=*/true).ok()); + EXPECT_FALSE( + StrictJSONPathIterator::Create("recursive $.a", /*enable_lax_mode=*/true) + .ok()); + EXPECT_FALSE( + StrictJSONPathIterator::Create("lax invalid $", /*enable_lax_mode=*/true) + .ok()); + EXPECT_FALSE( + StrictJSONPathIterator::Create("lax$", /*enable_lax_mode=*/true).ok()); + EXPECT_FALSE( + StrictJSONPathIterator::Create("laxrecursive $", /*enable_lax_mode=*/true) + .ok()); + EXPECT_FALSE(StrictJSONPathIterator::Create("recursivelax $.a", + /*enable_lax_mode=*/true) + .ok()); + // Non-standard sql path. + EXPECT_FALSE( + StrictJSONPathIterator::Create("$.", /*enable_lax_mode=*/true).ok()); + EXPECT_FALSE( + StrictJSONPathIterator::Create("$$", /*enable_lax_mode=*/true).ok()); + + // Verify error message for compatibility. + EXPECT_THAT(StrictJSONPathIterator::Create("a", /*enable_lax_mode=*/true), + StatusIs(absl::StatusCode::kOutOfRange, + "JSONPath must start with zero or more unique modifiers " + "followed by '$'")); } -TEST(JsonPathTest, StrictPathTests) { +class StrictJsonPathIteratorTest + : public ::testing::TestWithParam< + std::tuple> { + protected: + bool EnableLaxMode() const { return std::get<0>(GetParam()); } + absl::string_view PathPrefix() const { return std::get<1>(GetParam()); } +}; + +INSTANTIATE_TEST_SUITE_P( + StrictJsonPathIteratorTests, StrictJsonPathIteratorTest, + ::testing::Values(std::make_tuple(false, ""), std::make_tuple(true, "lax "), + std::make_tuple(true, "lax recursive "), + std::make_tuple(true, "Recursive LAX "))); + +TEST_P(StrictJsonPathIteratorTest, Valid) { // Test all functions for iterating through a JSON path using // StrictJSONPathIterator. 
{ + std::string path = absl::StrCat(PathPrefix(), "$"); ZETASQL_ASSERT_OK_AND_ASSIGN(const std::unique_ptr path_itr, - StrictJSONPathIterator::Create("$")); + StrictJSONPathIterator::Create(path, EnableLaxMode())); EXPECT_EQ((**path_itr).MaybeGetArrayIndex(), nullptr); EXPECT_EQ((**path_itr).MaybeGetObjectKey(), nullptr); EXPECT_FALSE(++(*path_itr)); @@ -1243,8 +1500,9 @@ TEST(JsonPathTest, StrictPathTests) { EXPECT_FALSE(path_itr->End()); } { + std::string path = absl::StrCat(PathPrefix(), "$.1 "); ZETASQL_ASSERT_OK_AND_ASSIGN(const std::unique_ptr path_itr, - StrictJSONPathIterator::Create("$.1 ")); + StrictJSONPathIterator::Create(path, EnableLaxMode())); // Skip first token. EXPECT_TRUE(++(*path_itr)); EXPECT_EQ(*(**path_itr).MaybeGetObjectKey(), "1 "); @@ -1253,8 +1511,9 @@ TEST(JsonPathTest, StrictPathTests) { EXPECT_TRUE(path_itr->End()); } { + std::string path = absl::StrCat(PathPrefix(), "$[ 0 ]"); ZETASQL_ASSERT_OK_AND_ASSIGN(const std::unique_ptr path_itr, - StrictJSONPathIterator::Create("$[ 0 ]")); + StrictJSONPathIterator::Create(path, EnableLaxMode())); // Skip first token. EXPECT_TRUE(++(*path_itr)); EXPECT_EQ(*(**path_itr).MaybeGetArrayIndex(), 0); @@ -1263,8 +1522,9 @@ TEST(JsonPathTest, StrictPathTests) { EXPECT_TRUE(path_itr->End()); } { + std::string path = absl::StrCat(PathPrefix(), R"($."a")"); ZETASQL_ASSERT_OK_AND_ASSIGN(const std::unique_ptr path_itr, - StrictJSONPathIterator::Create(R"($."a")")); + StrictJSONPathIterator::Create(path, EnableLaxMode())); // Skip first token. EXPECT_TRUE(++(*path_itr)); EXPECT_EQ(*(**path_itr).MaybeGetObjectKey(), "a"); @@ -1274,8 +1534,9 @@ TEST(JsonPathTest, StrictPathTests) { } { // Path escaping. 
+ std::string path = absl::StrCat(PathPrefix(), R"($."a\"b")"); ZETASQL_ASSERT_OK_AND_ASSIGN(const std::unique_ptr path_itr, - StrictJSONPathIterator::Create(R"($."a\"b")")); + StrictJSONPathIterator::Create(path, EnableLaxMode())); EXPECT_TRUE(++(*path_itr)); EXPECT_EQ(*(**path_itr).MaybeGetObjectKey(), R"(a\"b)"); EXPECT_TRUE(path_itr->NoSuffixToken()); @@ -1284,8 +1545,9 @@ TEST(JsonPathTest, StrictPathTests) { } { // Test iterating and rewind. + std::string path = absl::StrCat(PathPrefix(), "$.\"b.c.d\"[1].e"); ZETASQL_ASSERT_OK_AND_ASSIGN(const std::unique_ptr path_itr, - StrictJSONPathIterator::Create("$.\"b.c.d\"[1].e")); + StrictJSONPathIterator::Create(path, EnableLaxMode())); // Skip first token. EXPECT_TRUE(++(*path_itr)); EXPECT_EQ(*(**path_itr).MaybeGetObjectKey(), "b.c.d"); @@ -1785,6 +2047,106 @@ TEST(JSONPathExtractorTest, TestReuseOfPathIterator) { } } +template +void ExpectJsonExtractionYieldsNull(absl::string_view json_str, + absl::string_view json_path, + bool sql_standard_mode) { + ZETASQL_ASSERT_OK_AND_ASSIGN( + std::unique_ptr path_itr, + ValidJSONPathIterator::Create(json_path, sql_standard_mode)); + JsonExtractor extractor(json_str, path_itr.get()); + bool is_null = false; + if constexpr (std::is_same_v || + std::is_same_v) { + std::string result; + EXPECT_TRUE(extractor.Extract(&result, &is_null)); + EXPECT_TRUE(is_null); + } else if constexpr (std::is_same_v) { + std::vector result; + EXPECT_TRUE(extractor.ExtractArray(&result, &is_null)); + } else { + std::vector> result; + EXPECT_TRUE(extractor.ExtractStringArray(&result, &is_null)); + } + EXPECT_TRUE(is_null); +} + +class JsonPathExtractionTest : public ::testing::TestWithParam { + protected: + bool IsSqlStandardMode() const { return GetParam(); } +}; + +INSTANTIATE_TEST_SUITE_P(JSONPathExtractionSQLModeAgnosticTests, + JsonPathExtractionTest, + ::testing::Values(false, true)); + +// Test case for b/326281185, b/326974631. 
+TEST_P(JsonPathExtractionTest, TestIndexTokenCorruption) { + // clang-format off + std::string json_str = R"( + { + "a" : [ + { + "b" : [ + { + "c": 1, + "d": 2 + } + ] + }, + { + "b" : [ + { + "c" : 3, + "d" : 4 + }, + { + "c" : 5, + "d" : 6 + } + ] + } + ] + } + )"; + std::string json_str_for_arrays = R"( + { + "a" : [ + { + "b" : [ + { + "c": [1,2], + "d": [3,4] + } + ] + }, + { + "b" : [ + { + "c" : [5,6], + "d" : [7,8] + }, + { + "c" : [9,10], + "d" : [10,11] + } + ] + } + ] + } + )"; + // clang-format on + std::string json_path = "$.a[0].b[1].c"; + ExpectJsonExtractionYieldsNull(json_str, json_path, + IsSqlStandardMode()); + ExpectJsonExtractionYieldsNull(json_str, json_path, + IsSqlStandardMode()); + ExpectJsonExtractionYieldsNull( + json_str_for_arrays, json_path, IsSqlStandardMode()); + ExpectJsonExtractionYieldsNull( + json_str_for_arrays, json_path, IsSqlStandardMode()); +} + TEST(JSONPathArrayExtractorTest, BasicParsing) { std::string input = "[ {\"l00\" : { \"l01\" : \"a10\", \"l11\" : \"test\" }}, {\"l10\" : { " @@ -2372,7 +2734,7 @@ void ExtractArrayOrStringArray( } template -void ComplianceJSONExtractArrayTest(const std::vector& tests, +void ComplianceJSONExtractArrayTest(absl::Span tests, bool sql_standard_mode) { for (const FunctionTestCall& test : tests) { if (test.params.params()[0].is_null() || @@ -2642,16 +3004,31 @@ TEST(JsonPathEvaluatorTest, DeeplyNestedObjectCausesFailure) { TEST(JsonConversionTest, ConvertJsonToInt64) { std::vector>> inputs_and_expected_outputs; + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{0}), 0); inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{1}), 1); inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{-1}), -1); + inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), 1); inputs_and_expected_outputs.emplace_back(JSONValue(10.0), 10); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::min()}), + std::numeric_limits::min()); + 
inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::max()}), + std::numeric_limits::max()); inputs_and_expected_outputs.emplace_back( JSONValue(std::numeric_limits::min()), std::numeric_limits::min()); inputs_and_expected_outputs.emplace_back( JSONValue(std::numeric_limits::max()), std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(uint64_t{std::numeric_limits::max()}), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::max()), std::nullopt); // Other types should return an error + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::max()), std::nullopt); inputs_and_expected_outputs.emplace_back(JSONValue(1e100), std::nullopt); inputs_and_expected_outputs.emplace_back(JSONValue(1.5), std::nullopt); inputs_and_expected_outputs.emplace_back(JSONValue(true), std::nullopt); @@ -2678,6 +3055,149 @@ TEST(JsonConversionTest, ConvertJsonToInt64) { } } +TEST(JsonConversionTest, ConvertJsonToInt32) { + std::vector>> + inputs_and_expected_outputs; + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{0}), 0); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{1}), 1); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{-1}), -1); + inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), 1); + inputs_and_expected_outputs.emplace_back(JSONValue(10.0), 10); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::min()}), + std::numeric_limits::min()); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::max()}), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::min()), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::max()), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(uint64_t{std::numeric_limits::max()}), 
std::nullopt); + // Other types should return an error + inputs_and_expected_outputs.emplace_back(JSONValue(1e100), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(1.5), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(true), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"10"}), + std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString(R"({"a": 1})").value(), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString(R"([10, 20])").value(), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString("null").value(), std::nullopt); + + for (const auto& [input, expected_output] : inputs_and_expected_outputs) { + SCOPED_TRACE( + absl::Substitute("INT32('$0')", input.GetConstRef().ToString())); + + absl::StatusOr output = ConvertJsonToInt32(input.GetConstRef()); + EXPECT_EQ(output.ok(), expected_output.has_value()); + if (output.ok() && expected_output.has_value()) { + EXPECT_EQ(*output, *expected_output); + } + } +} + +TEST(JsonConversionTest, ConvertJsonToUint64) { + std::vector>> + inputs_and_expected_outputs; + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{0}), 0); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{1}), 1); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{-1}), + std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), 1); + inputs_and_expected_outputs.emplace_back(JSONValue(10.0), 10); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::min()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::max()}), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::min()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::max()}), + 
std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(uint64_t{std::numeric_limits::max()}), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(uint64_t{std::numeric_limits::max()}), + std::numeric_limits::max()); + // Other types should return an error + inputs_and_expected_outputs.emplace_back(JSONValue(1e100), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(1.5), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(true), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"10"}), + std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString(R"({"a": 1})").value(), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString(R"([10, 20])").value(), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString("null").value(), std::nullopt); + + for (const auto& [input, expected_output] : inputs_and_expected_outputs) { + SCOPED_TRACE( + absl::Substitute("UINT64('$0')", input.GetConstRef().ToString())); + + absl::StatusOr output = ConvertJsonToUint64(input.GetConstRef()); + EXPECT_EQ(output.ok(), expected_output.has_value()); + if (output.ok() && expected_output.has_value()) { + EXPECT_EQ(*output, *expected_output); + } + } +} + +TEST(JsonConversionTest, ConvertJsonToUint32) { + std::vector>> + inputs_and_expected_outputs; + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{0}), 0); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{1}), 1); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{-1}), + std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), 1); + inputs_and_expected_outputs.emplace_back(JSONValue(10.0), 10); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::min()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + 
JSONValue(int64_t{std::numeric_limits::max()}), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::min()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::max()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(uint64_t{std::numeric_limits::max()}), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(uint64_t{std::numeric_limits::max()}), std::nullopt); + // Other types should return an error + inputs_and_expected_outputs.emplace_back(JSONValue(1e100), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(1.5), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(true), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"10"}), + std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString(R"({"a": 1})").value(), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString(R"([10, 20])").value(), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString("null").value(), std::nullopt); + + for (const auto& [input, expected_output] : inputs_and_expected_outputs) { + SCOPED_TRACE( + absl::Substitute("UINT32('$0')", input.GetConstRef().ToString())); + + absl::StatusOr output = ConvertJsonToUint32(input.GetConstRef()); + EXPECT_EQ(output.ok(), expected_output.has_value()); + if (output.ok() && expected_output.has_value()) { + EXPECT_EQ(*output, *expected_output); + } + } +} + TEST(JsonConversionTest, ConvertJsonToBool) { std::vector>> inputs_and_expected_outputs; @@ -2685,6 +3205,8 @@ TEST(JsonConversionTest, ConvertJsonToBool) { inputs_and_expected_outputs.emplace_back(JSONValue(true), true); // Other types should return an error inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{1}), std::nullopt); + 
inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), + std::nullopt); inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"10"}), std::nullopt); inputs_and_expected_outputs.emplace_back( @@ -2720,6 +3242,8 @@ TEST(JsonConversionTest, ConvertJsonToString) { "12¿©?Æ"); // Other types should return an error inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{1}), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), + std::nullopt); inputs_and_expected_outputs.emplace_back(JSONValue(true), std::nullopt); inputs_and_expected_outputs.emplace_back( JSONValue::ParseJSONString(R"({"a": 1})").value(), std::nullopt); @@ -2746,12 +3270,6 @@ TEST(JsonConversionTest, ConvertJsonToDouble) { // Behavior the same when wide_number_mode is "round" and "exact" inputs_and_expected_outputs.emplace_back(JSONValue(1.0), 1.0); inputs_and_expected_outputs.emplace_back(JSONValue(-1.0), -1.0); - inputs_and_expected_outputs.emplace_back( - JSONValue(std::numeric_limits::min()), - std::numeric_limits::min()); - inputs_and_expected_outputs.emplace_back( - JSONValue(std::numeric_limits::max()), - std::numeric_limits::max()); inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{1}), double{1}); inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{-1}), double{-1}); inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), double{1}); @@ -2759,6 +3277,25 @@ TEST(JsonConversionTest, ConvertJsonToDouble) { JSONValue(int64_t{-9007199254740992}), double{-9007199254740992}); inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{9007199254740992}), double{9007199254740992}); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::min()), + std::numeric_limits::min()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::lowest()), + std::numeric_limits::lowest()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::max()), + std::numeric_limits::max()); 
+ inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::min()), + std::numeric_limits::min()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::lowest()), + std::numeric_limits::lowest()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::max()), + std::numeric_limits::max()); + // Other types should return an error inputs_and_expected_outputs.emplace_back(JSONValue(true), std::nullopt); inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"10"}), @@ -2790,7 +3327,7 @@ TEST(JsonConversionTest, ConvertJsonToDouble) { } } -TEST(JsonConversionTest, ConvertJsonToDoubleFailInExactOnly) { +TEST(JsonConversionTest, ConvertJsonToDoubleFailInExactMode) { std::vector> inputs_and_expected_outputs; // Number too large to round trip inputs_and_expected_outputs.emplace_back( @@ -2803,16 +3340,18 @@ TEST(JsonConversionTest, ConvertJsonToDoubleFailInExactOnly) { for (const auto& [input, expected_output] : inputs_and_expected_outputs) { SCOPED_TRACE(absl::Substitute("DOUBLE('$0', 'round')", input.GetConstRef().ToString())); - absl::StatusOr output = ConvertJsonToDouble( - input.GetConstRef(), WideNumberMode::kRound, PRODUCT_INTERNAL); - - EXPECT_TRUE(output.ok()); - EXPECT_EQ(*output, expected_output); + EXPECT_THAT(ConvertJsonToDouble(input.GetConstRef(), WideNumberMode::kRound, + PRODUCT_INTERNAL), + IsOkAndHolds(expected_output)); SCOPED_TRACE(absl::Substitute("DOUBLE('$0', 'exact')", input.GetConstRef().ToString())); - output = ConvertJsonToDouble(input.GetConstRef(), WideNumberMode::kExact, - PRODUCT_INTERNAL); - EXPECT_FALSE(output.ok()); + EXPECT_THAT( + ConvertJsonToDouble(input.GetConstRef(), WideNumberMode::kExact, + PRODUCT_INTERNAL), + StatusIs( + absl::StatusCode::kOutOfRange, + HasSubstr( + "cannot be converted to DOUBLE without loss of precision"))); } } @@ -2821,42 +3360,170 @@ TEST(JsonConversionTest, ConvertJsonToDoubleErrorMessage) { // Internal mode uses DOUBLE in error message 
SCOPED_TRACE(absl::Substitute("DOUBLE('$0', 'exact')", input.GetConstRef().ToString())); - absl::StatusOr output = ConvertJsonToDouble( - input.GetConstRef(), WideNumberMode::kExact, PRODUCT_INTERNAL); - EXPECT_FALSE(output.ok()); - EXPECT_EQ(output.status().message(), - "JSON number: 18446744073709551615 cannot be converted to DOUBLE " - "without loss of precision"); + EXPECT_THAT( + ConvertJsonToDouble(input.GetConstRef(), WideNumberMode::kExact, + PRODUCT_INTERNAL), + StatusIs(absl::StatusCode::kOutOfRange, + HasSubstr("JSON number: 18446744073709551615 cannot be " + "converted to DOUBLE without loss of precision"))); // External mode uses FLOAT64 in error message SCOPED_TRACE(absl::Substitute("FLOAT64('$0', 'exact')", input.GetConstRef().ToString())); - output = ConvertJsonToDouble(input.GetConstRef(), WideNumberMode::kExact, - PRODUCT_EXTERNAL); - EXPECT_FALSE(output.ok()); - EXPECT_EQ(output.status().message(), - "JSON number: 18446744073709551615 cannot be converted to FLOAT64 " - "without loss of precision"); + EXPECT_THAT( + ConvertJsonToDouble(input.GetConstRef(), WideNumberMode::kExact, + PRODUCT_EXTERNAL), + StatusIs(absl::StatusCode::kOutOfRange, + HasSubstr("JSON number: 18446744073709551615 cannot be " + "converted to FLOAT64 without loss of precision"))); } -TEST(JsonConversionTest, GetJsonType) { - std::vector>> +TEST(JsonConversionTest, ConvertJsonToFloat) { + std::vector>> inputs_and_expected_outputs; - inputs_and_expected_outputs.emplace_back(JSONValue(2.0), "number"); - inputs_and_expected_outputs.emplace_back(JSONValue(-1.0), "number"); - inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{1}), "number"); - inputs_and_expected_outputs.emplace_back(JSONValue(true), "boolean"); + // Behavior the same when wide_number_mode is "round" and "exact" + inputs_and_expected_outputs.emplace_back(JSONValue(1.0), 1.0f); + inputs_and_expected_outputs.emplace_back(JSONValue(-1.0), -1.0f); + 
inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{1}), 1.0f); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{-1}), -1.0f); + inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), 1.0f); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{-16777216}), + float{-16777216}); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{16777216}), + float{16777216}); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::min()), + std::numeric_limits::min()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::lowest()), + std::numeric_limits::lowest()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::max()), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::lowest()), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::max()), std::nullopt); + // Other types should return an error + inputs_and_expected_outputs.emplace_back(JSONValue(true), std::nullopt); inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"10"}), - "string"); + std::nullopt); inputs_and_expected_outputs.emplace_back( - JSONValue::ParseJSONString(R"({"a": 1})").value(), "object"); + JSONValue::ParseJSONString(R"({"a": 1})").value(), std::nullopt); inputs_and_expected_outputs.emplace_back( - JSONValue::ParseJSONString(R"([10, 20])").value(), "array"); + JSONValue::ParseJSONString(R"([10, 20])").value(), std::nullopt); inputs_and_expected_outputs.emplace_back( - JSONValue::ParseJSONString("null").value(), "null"); + JSONValue::ParseJSONString("null").value(), std::nullopt); + + for (const auto& [input, expected_output] : inputs_and_expected_outputs) { + { + SCOPED_TRACE(absl::Substitute("FLOAT('$0', 'exact')", + input.GetConstRef().ToString())); + absl::StatusOr output = ConvertJsonToFloat( + input.GetConstRef(), WideNumberMode::kExact, PRODUCT_INTERNAL); + EXPECT_EQ(output.ok(), 
expected_output.has_value()); + if (output.ok() && expected_output.has_value()) { + EXPECT_EQ(*output, *expected_output); + } + } + { + SCOPED_TRACE(absl::Substitute("FLOAT('$0', 'round')", + input.GetConstRef().ToString())); + absl::StatusOr output = ConvertJsonToFloat( + input.GetConstRef(), WideNumberMode::kRound, PRODUCT_INTERNAL); + EXPECT_EQ(output.ok(), expected_output.has_value()); + if (output.ok() && expected_output.has_value()) { + EXPECT_EQ(*output, *expected_output); + } + } + } +} + +TEST(JsonConversionTest, ConvertJsonToFloatFailInExactMode) { + std::vector> inputs_and_expected_outputs; + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::min()), 0); + inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{16777217}), + float{16777216}); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{-16777217}), + float{-16777216}); + for (const auto& [input, expected_output] : inputs_and_expected_outputs) { + SCOPED_TRACE(absl::Substitute("FLOAT('$0', 'round')", + input.GetConstRef().ToString())); + EXPECT_THAT(ConvertJsonToFloat(input.GetConstRef(), WideNumberMode::kRound, + PRODUCT_INTERNAL), + IsOkAndHolds(expected_output)); + SCOPED_TRACE(absl::Substitute("FLOAT('$0', 'exact')", + input.GetConstRef().ToString())); + EXPECT_THAT( + ConvertJsonToFloat(input.GetConstRef(), WideNumberMode::kExact, + PRODUCT_INTERNAL), + StatusIs( + absl::StatusCode::kOutOfRange, + HasSubstr( + "cannot be converted to FLOAT without loss of precision"))); + } +} + +TEST(JsonConversionTest, ConvertJsonToFloatErrorMessage) { + JSONValue input = JSONValue(uint64_t{16777217}); + // Internal mode uses FLOAT in error message + SCOPED_TRACE( + absl::Substitute("FLOAT('$0', 'exact')", input.GetConstRef().ToString())); + EXPECT_THAT( + ConvertJsonToFloat(input.GetConstRef(), WideNumberMode::kExact, + PRODUCT_INTERNAL), + StatusIs(absl::StatusCode::kOutOfRange, + HasSubstr("JSON number: 16777217 cannot be converted to FLOAT " + "without loss of 
precision"))); + + // External mode uses FLOAT32 in error message + SCOPED_TRACE(absl::Substitute("FLOAT64('$0', 'exact')", + input.GetConstRef().ToString())); + EXPECT_THAT( + ConvertJsonToFloat(input.GetConstRef(), WideNumberMode::kExact, + PRODUCT_EXTERNAL), + StatusIs(absl::StatusCode::kOutOfRange, + HasSubstr("JSON number: 16777217 cannot be converted to FLOAT32 " + "without loss of precision"))); +} + +TEST(JsonConversionTest, ConvertJsonToInt64Array) { + std::vector>>> + cases; + cases.emplace_back(R"([])", std::vector{}); + cases.emplace_back(R"([1])", std::vector{1}); + cases.emplace_back(R"([-1])", std::vector{-1}); + cases.emplace_back(R"([10.0])", std::vector{10}); + cases.emplace_back(R"([1, -1, 10.0])", std::vector{1, -1, 10}); + cases.emplace_back(R"([18446744073709551615])", std::nullopt); + cases.emplace_back( + R"([4294967295])", + std::vector{std::numeric_limits::max()}); + cases.emplace_back(R"([-9223372036854775808])", + std::vector{std::numeric_limits::min()}); + cases.emplace_back(R"([9223372036854775807])", + std::vector{std::numeric_limits::max()}); + cases.emplace_back(R"([-2147483648])", + std::vector{std::numeric_limits::min()}); + cases.emplace_back(R"([2147483647])", + std::vector{std::numeric_limits::max()}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"([null])", std::nullopt); + cases.emplace_back(R"(["null"])", std::nullopt); + cases.emplace_back(R"(1)", std::nullopt); + cases.emplace_back(R"([true])", std::nullopt); + cases.emplace_back(R"([1.5])", std::nullopt); + cases.emplace_back(R"([[1]])", std::nullopt); + cases.emplace_back(R"({"a": 1})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); SCOPED_TRACE( - absl::Substitute("TYPE('$0')", input.GetConstRef().ToString())); - absl::StatusOr output = GetJsonType(input.GetConstRef()); + absl::Substitute("INT64_ARRAY('$0')", 
input.GetConstRef().ToString())); + absl::StatusOr> output = + ConvertJsonToInt64Array(input.GetConstRef()); EXPECT_EQ(output.ok(), expected_output.has_value()); if (output.ok() && expected_output.has_value()) { EXPECT_EQ(*output, *expected_output); @@ -2864,26 +3531,382 @@ TEST(JsonConversionTest, GetJsonType) { } } -TEST(JsonLaxConversionTest, Bool) { - std::vector>> - inputs_and_expected_outputs; - // Bools - inputs_and_expected_outputs.emplace_back(JSONValue(true), true); - inputs_and_expected_outputs.emplace_back(JSONValue(false), false); - // Strings - inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"true"}), - true); - inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"false"}), - false); +TEST(JsonConversionTest, ConvertJsonToInt32Array) { + std::vector>>> + cases; + cases.emplace_back(R"([])", std::vector{}); + cases.emplace_back(R"([1])", std::vector{1}); + cases.emplace_back(R"([-1])", std::vector{-1}); + cases.emplace_back(R"([10.0])", std::vector{10}); + cases.emplace_back(R"([1, -1, 10.0])", std::vector{1, -1, 10}); + cases.emplace_back(R"([18446744073709551615])", std::nullopt); + cases.emplace_back(R"([4294967295])", std::nullopt); + cases.emplace_back(R"([-9223372036854775808])", std::nullopt); + cases.emplace_back(R"([9223372036854775807])", std::nullopt); + cases.emplace_back(R"([-2147483648])", + std::vector{std::numeric_limits::min()}); + cases.emplace_back(R"([2147483647])", + std::vector{std::numeric_limits::max()}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"([null])", std::nullopt); + cases.emplace_back(R"(["null"])", std::nullopt); + cases.emplace_back(R"(1)", std::nullopt); + cases.emplace_back(R"([true])", std::nullopt); + cases.emplace_back(R"([1.5])", std::nullopt); + cases.emplace_back(R"([[1]])", std::nullopt); + cases.emplace_back(R"({"a": 1})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + 
JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE( + absl::Substitute("INT32_ARRAY('$0')", input.GetConstRef().ToString())); + absl::StatusOr> output = + ConvertJsonToInt32Array(input.GetConstRef()); + EXPECT_EQ(output.ok(), expected_output.has_value()); + if (output.ok() && expected_output.has_value()) { + EXPECT_EQ(*output, *expected_output); + } + } +} + +TEST(JsonConversionTest, ConvertJsonToUint64Array) { + std::vector>>> + cases; + cases.emplace_back(R"([])", std::vector{}); + cases.emplace_back(R"([1])", std::vector{1}); + cases.emplace_back(R"([10.0])", std::vector{10}); + cases.emplace_back(R"([1, 10.0])", std::vector{1, 10}); + cases.emplace_back(R"([-1])", std::nullopt); + cases.emplace_back(R"([1, -1, 10.0])", std::nullopt); + cases.emplace_back( + R"([18446744073709551615])", + std::vector{std::numeric_limits::max()}); + cases.emplace_back( + R"([4294967295])", + std::vector{std::numeric_limits::max()}); + cases.emplace_back(R"([-9223372036854775808])", std::nullopt); + cases.emplace_back( + R"([9223372036854775807])", + std::vector{std::numeric_limits::max()}); + cases.emplace_back(R"([-2147483648])", std::nullopt); + cases.emplace_back( + R"([2147483647])", + std::vector{std::numeric_limits::max()}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"([null])", std::nullopt); + cases.emplace_back(R"(["null"])", std::nullopt); + cases.emplace_back(R"(1)", std::nullopt); + cases.emplace_back(R"([true])", std::nullopt); + cases.emplace_back(R"([1.5])", std::nullopt); + cases.emplace_back(R"([[1]])", std::nullopt); + cases.emplace_back(R"({"a": 1})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE( + absl::Substitute("UINT64_ARRAY('$0')", input.GetConstRef().ToString())); + absl::StatusOr> output = + ConvertJsonToUint64Array(input.GetConstRef()); + EXPECT_EQ(output.ok(), expected_output.has_value()); 
+ if (output.ok() && expected_output.has_value()) { + EXPECT_EQ(*output, *expected_output); + } + } +} + +TEST(JsonConversionTest, ConvertJsonToUint32Array) { + std::vector>>> + cases; + cases.emplace_back(R"([])", std::vector{}); + cases.emplace_back(R"([1])", std::vector{1}); + cases.emplace_back(R"([10.0])", std::vector{10}); + cases.emplace_back(R"([1, 10.0])", std::vector{1, 10}); + cases.emplace_back(R"([-1])", std::nullopt); + cases.emplace_back(R"([1, -1, 10.0])", std::nullopt); + cases.emplace_back(R"([18446744073709551615])", std::nullopt); + cases.emplace_back( + R"([4294967295])", + std::vector{std::numeric_limits::max()}); + cases.emplace_back(R"([-9223372036854775808])", std::nullopt); + cases.emplace_back(R"([9223372036854775807])", std::nullopt); + cases.emplace_back(R"([-2147483648])", std::nullopt); + cases.emplace_back( + R"([2147483647])", + std::vector{std::numeric_limits::max()}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"([null])", std::nullopt); + cases.emplace_back(R"(["null"])", std::nullopt); + cases.emplace_back(R"(1)", std::nullopt); + cases.emplace_back(R"([true])", std::nullopt); + cases.emplace_back(R"([1.5])", std::nullopt); + cases.emplace_back(R"([[1]])", std::nullopt); + cases.emplace_back(R"({"a": 1})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE( + absl::Substitute("UINT32_ARRAY('$0')", input.GetConstRef().ToString())); + absl::StatusOr> output = + ConvertJsonToUint32Array(input.GetConstRef()); + EXPECT_EQ(output.ok(), expected_output.has_value()); + if (output.ok() && expected_output.has_value()) { + EXPECT_EQ(*output, *expected_output); + } + } +} + +TEST(JsonConversionTest, ConvertJsonToBoolArray) { + std::vector>>> cases; + cases.emplace_back(R"([])", std::vector{}); + cases.emplace_back(R"([false])", std::vector{false}); + cases.emplace_back(R"([true])", 
std::vector{true}); + cases.emplace_back(R"([false, true])", std::vector{false, true}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"([null])", std::nullopt); + cases.emplace_back(R"(["null"])", std::nullopt); + cases.emplace_back(R"(false)", std::nullopt); + cases.emplace_back(R"(true)", std::nullopt); + cases.emplace_back(R"([1])", std::nullopt); + cases.emplace_back(R"([[false]])", std::nullopt); + cases.emplace_back(R"({"a": false})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE( + absl::Substitute("BOOL_ARRAY('$0')", input.GetConstRef().ToString())); + absl::StatusOr> output = + ConvertJsonToBoolArray(input.GetConstRef()); + EXPECT_EQ(output.ok(), expected_output.has_value()); + if (output.ok() && expected_output.has_value()) { + EXPECT_EQ(*output, *expected_output); + } + } +} + +TEST(JsonConversionTest, ConvertJsonToStringArray) { + std::vector>>> + cases; + cases.emplace_back(R"([])", std::vector{}); + cases.emplace_back(R"([""])", std::vector{""}); + cases.emplace_back(R"(["a"])", std::vector{"a"}); + cases.emplace_back(R"(["", "a", "null"])", + std::vector{"", "a", "null"}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"([null])", std::nullopt); + cases.emplace_back(R"(["null"])", std::vector{"null"}); + cases.emplace_back(R"(false)", std::nullopt); + cases.emplace_back(R"(true)", std::nullopt); + cases.emplace_back(R"(1)", std::nullopt); + cases.emplace_back(R"("a")", std::nullopt); + cases.emplace_back(R"([1])", std::nullopt); + cases.emplace_back(R"([["a"]])", std::nullopt); + cases.emplace_back(R"({"a": "b"})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE( + absl::Substitute("STRING_ARRAY('$0')", input.GetConstRef().ToString())); + 
absl::StatusOr> output = + ConvertJsonToStringArray(input.GetConstRef()); + EXPECT_EQ(output.ok(), expected_output.has_value()); + if (output.ok() && expected_output.has_value()) { + EXPECT_EQ(*output, *expected_output); + } + } +} + +TEST(JsonConversionTest, ConvertJsonToDoubleArray) { + std::vector>>> cases; + cases.emplace_back(R"([])", std::vector{}); + cases.emplace_back(R"([1.0])", std::vector{1.0}); + cases.emplace_back(R"([-1.0])", std::vector{-1.0}); + cases.emplace_back(R"([1.0, -1.0])", std::vector{1.0, -1.0}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"([null])", std::nullopt); + cases.emplace_back(R"(["null"])", std::nullopt); + cases.emplace_back(R"(false)", std::nullopt); + cases.emplace_back(R"(true)", std::nullopt); + cases.emplace_back(R"(1.0)", std::nullopt); + cases.emplace_back(R"("1.0")", std::nullopt); + cases.emplace_back(R"([false])", std::nullopt); + cases.emplace_back(R"([[1.0]])", std::nullopt); + cases.emplace_back(R"({"a": 1.0})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE(absl::Substitute("DOUBLE_ARRAY('$0', 'exact')", + input.GetConstRef().ToString())); + absl::StatusOr> output = ConvertJsonToDoubleArray( + input.GetConstRef(), WideNumberMode::kExact, PRODUCT_INTERNAL); + EXPECT_EQ(output.ok(), expected_output.has_value()); + if (output.ok() && expected_output.has_value()) { + EXPECT_EQ(*output, *expected_output); + } + SCOPED_TRACE(absl::Substitute("DOUBLE_ARRAY('$0', 'round')", + input.GetConstRef().ToString())); + output = ConvertJsonToDoubleArray(input.GetConstRef(), + WideNumberMode::kRound, PRODUCT_INTERNAL); + EXPECT_EQ(output.ok(), expected_output.has_value()); + if (output.ok() && expected_output.has_value()) { + EXPECT_EQ(*output, *expected_output); + } + } +} + +TEST(JsonConversionTest, ConvertJsonToDoubleArrayFailInExactOnly) { + std::vector>> cases; + // 
Number too large to round trip + cases.emplace_back(R"([18446744073709551615])", + std::vector{1.8446744073709552e+19}); + // Number too small to round trip + cases.emplace_back(R"([-9007199254740993])", + std::vector{-9007199254740992}); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE(absl::Substitute("DOUBLE_ARRAY('$0', 'round')", + input.GetConstRef().ToString())); + EXPECT_THAT( + ConvertJsonToDoubleArray(input.GetConstRef(), WideNumberMode::kRound, + PRODUCT_INTERNAL), + IsOkAndHolds(expected_output)); + SCOPED_TRACE(absl::Substitute("DOUBLE_ARRAY('$0', 'exact')", + input.GetConstRef().ToString())); + EXPECT_THAT( + ConvertJsonToDoubleArray(input.GetConstRef(), WideNumberMode::kExact, + PRODUCT_INTERNAL), + StatusIs( + absl::StatusCode::kOutOfRange, + HasSubstr( + R"(cannot be converted to DOUBLE without loss of precision)"))); + } +} + +TEST(JsonConversionTest, ConvertJsonToFloatArray) { + std::vector>>> cases; + cases.emplace_back(R"([])", std::vector{}); + cases.emplace_back(R"([1.0])", std::vector{1.0}); + cases.emplace_back(R"([-1.0])", std::vector{-1.0}); + cases.emplace_back(R"([1.0, -1.0])", std::vector{1.0, -1.0}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"([null])", std::nullopt); + cases.emplace_back(R"(["null"])", std::nullopt); + cases.emplace_back(R"(false)", std::nullopt); + cases.emplace_back(R"(true)", std::nullopt); + cases.emplace_back(R"(1.0)", std::nullopt); + cases.emplace_back(R"("1.0")", std::nullopt); + cases.emplace_back(R"([false])", std::nullopt); + cases.emplace_back(R"([[1.0]])", std::nullopt); + cases.emplace_back(R"({"a": 1.0})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE(absl::Substitute("FLOAT_ARRAY('$0', 'exact')", + 
input.GetConstRef().ToString())); + absl::StatusOr> output = ConvertJsonToFloatArray( + input.GetConstRef(), WideNumberMode::kExact, PRODUCT_INTERNAL); + EXPECT_EQ(output.ok(), expected_output.has_value()); + if (output.ok() && expected_output.has_value()) { + EXPECT_EQ(*output, *expected_output); + } + SCOPED_TRACE(absl::Substitute("FLOAT_ARRAY('$0', 'round')", + input.GetConstRef().ToString())); + output = ConvertJsonToFloatArray(input.GetConstRef(), + WideNumberMode::kRound, PRODUCT_INTERNAL); + EXPECT_EQ(output.ok(), expected_output.has_value()); + if (output.ok() && expected_output.has_value()) { + EXPECT_EQ(*output, *expected_output); + } + } +} + +TEST(JsonConversionTest, ConvertJsonToFloatArrayFailInExactOnly) { + std::vector>> cases; + // Number too large to round trip + cases.emplace_back(R"([16777217])", std::vector{16777216}); + // Number too small to round trip + cases.emplace_back(R"([-16777217])", std::vector{-16777216}); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE(absl::Substitute("FLOAT_ARRAY('$0', 'round')", + input.GetConstRef().ToString())); + EXPECT_THAT( + ConvertJsonToFloatArray(input.GetConstRef(), WideNumberMode::kRound, + PRODUCT_INTERNAL), + IsOkAndHolds(expected_output)); + + SCOPED_TRACE(absl::Substitute("FLOAT_ARRAY('$0', 'exact')", + input.GetConstRef().ToString())); + + EXPECT_THAT( + ConvertJsonToFloatArray(input.GetConstRef(), WideNumberMode::kExact, + PRODUCT_INTERNAL), + StatusIs( + absl::StatusCode::kOutOfRange, + HasSubstr( + R"(cannot be converted to FLOAT without loss of precision)"))); + } +} + +TEST(JsonConversionTest, GetJsonType) { + std::vector>> + inputs_and_expected_outputs; + inputs_and_expected_outputs.emplace_back(JSONValue(2.0), "number"); + inputs_and_expected_outputs.emplace_back(JSONValue(-1.0), "number"); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{1}), "number"); + 
+  inputs_and_expected_outputs.emplace_back(JSONValue(true), "boolean");
+  inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"10"}),
+                                           "string");
+  inputs_and_expected_outputs.emplace_back(
+      JSONValue::ParseJSONString(R"({"a": 1})").value(), "object");
+  inputs_and_expected_outputs.emplace_back(
+      JSONValue::ParseJSONString(R"([10, 20])").value(), "array");
+  inputs_and_expected_outputs.emplace_back(
+      JSONValue::ParseJSONString("null").value(), "null");
+  for (const auto& [input, expected_output] : inputs_and_expected_outputs) {
+    SCOPED_TRACE(
+        absl::Substitute("TYPE('$0')", input.GetConstRef().ToString()));
+    absl::StatusOr<std::string> output = GetJsonType(input.GetConstRef());
+    EXPECT_EQ(output.ok(), expected_output.has_value());
+    if (output.ok() && expected_output.has_value()) {
+      EXPECT_EQ(*output, *expected_output);
+    }
+  }
+}
+
+TEST(JsonLaxConversionTest, Bool) {
+  std::vector<std::pair<JSONValue, std::optional<bool>>>
+      inputs_and_expected_outputs;
+  // Bools
+  inputs_and_expected_outputs.emplace_back(JSONValue(true), true);
+  inputs_and_expected_outputs.emplace_back(JSONValue(false), false);
+  // Strings
+  inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"true"}),
+                                           true);
+  inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"false"}),
+                                           false);
   inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"TRue"}),
                                            true);
   inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"FaLse"}),
                                            false);
   inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"foo"}),
                                            std::nullopt);
+  inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"null"}),
+                                           std::nullopt);
   // Numbers. Note that -inf, inf, and NaN are not valid JSON numeric values.
inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{0}), false); inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{10}), true); + inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), true); inputs_and_expected_outputs.emplace_back( JSONValue(int64_t{std::numeric_limits::min()}), true); inputs_and_expected_outputs.emplace_back( @@ -2912,10 +3935,8 @@ TEST(JsonLaxConversionTest, Bool) { for (const auto& [input, expected_output] : inputs_and_expected_outputs) { SCOPED_TRACE( absl::Substitute("LAX_BOOL($0)", input.GetConstRef().ToString())); - absl::StatusOr> result = - LaxConvertJsonToBool(input.GetConstRef()); - ZETASQL_ASSERT_OK(result); - EXPECT_EQ(*result, expected_output); + EXPECT_THAT(LaxConvertJsonToBool(input.GetConstRef()), + IsOkAndHolds(expected_output)); } } @@ -2937,16 +3958,30 @@ TEST(JsonLaxConversionTest, Int64) { std::nullopt); inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"foo"}), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"null"}), + std::nullopt); // Numbers. Note that -inf, inf, and NaN are not valid JSON numeric values. 
+ inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{0}), 0); inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{10}), 10); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{-1}), -1); + inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), 1); inputs_and_expected_outputs.emplace_back( - JSONValue(int64_t{std::numeric_limits::min()}), + JSONValue(int64_t{std::numeric_limits::min()}), + std::numeric_limits::min()); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::max()}), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::min()), std::numeric_limits::min()); inputs_and_expected_outputs.emplace_back( - JSONValue(int64_t{std::numeric_limits::max()}), + JSONValue(std::numeric_limits::max()), std::numeric_limits::max()); inputs_and_expected_outputs.emplace_back( - JSONValue(uint64_t{std::numeric_limits::max()}), std::nullopt); + JSONValue(uint64_t{std::numeric_limits::max()}), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::max()), std::nullopt); inputs_and_expected_outputs.emplace_back(JSONValue(double{1.1}), 1); inputs_and_expected_outputs.emplace_back(JSONValue(double{3.5}), 4); inputs_and_expected_outputs.emplace_back(JSONValue(double{1.1e2}), 110); @@ -2969,14 +4004,215 @@ TEST(JsonLaxConversionTest, Int64) { for (const auto& [input, expected_output] : inputs_and_expected_outputs) { SCOPED_TRACE( absl::Substitute("LAX_INT64('$0')", input.GetConstRef().ToString())); - absl::StatusOr> result = - LaxConvertJsonToInt64(input.GetConstRef()); - ZETASQL_ASSERT_OK(result); - EXPECT_EQ(*result, expected_output); + EXPECT_THAT(LaxConvertJsonToInt64(input.GetConstRef()), + IsOkAndHolds(expected_output)); } } -TEST(JsonLaxConversionTest, Float) { +TEST(JsonLaxConversionTest, Int32) { + std::vector>> + inputs_and_expected_outputs; + // Bools + 
inputs_and_expected_outputs.emplace_back(JSONValue(true), 1); + inputs_and_expected_outputs.emplace_back(JSONValue(false), 0); + // Strings + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"10"}), 10); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"1.1"}), 1); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"1.1e2"}), + 110); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"+1.5"}), 2); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::string{"123456789012345678.0"}), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"1e100"}), + std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"foo"}), + std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"null"}), + std::nullopt); + // Numbers. Note that -inf, inf, and NaN are not valid JSON numeric values. + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{0}), 0); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{10}), 10); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{-1}), -1); + inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), 1); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::min()}), + std::numeric_limits::min()); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::max()}), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::min()), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::max()), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(uint64_t{std::numeric_limits::max()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::max()), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(double{1.1}), 1); + 
inputs_and_expected_outputs.emplace_back(JSONValue(double{3.5}), 4); + inputs_and_expected_outputs.emplace_back(JSONValue(double{1.1e2}), 110); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{123456789012345678.0}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{std::numeric_limits::min()}), 0); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{std::numeric_limits::lowest()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{std::numeric_limits::max()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString("1e100").value(), std::nullopt); + // Object/Array/Null + inputs_and_expected_outputs.emplace_back(JSONValue(), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString(R"({"a": 1})").value(), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString("[1]").value(), std::nullopt); + for (const auto& [input, expected_output] : inputs_and_expected_outputs) { + SCOPED_TRACE( + absl::Substitute("LAX_INT32('$0')", input.GetConstRef().ToString())); + EXPECT_THAT(LaxConvertJsonToInt32(input.GetConstRef()), + IsOkAndHolds(expected_output)); + } +} + +TEST(JsonLaxConversionTest, Uint64) { + std::vector>> + inputs_and_expected_outputs; + // Bools + inputs_and_expected_outputs.emplace_back(JSONValue(true), 1); + inputs_and_expected_outputs.emplace_back(JSONValue(false), 0); + // Strings + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"10"}), 10); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"1.1"}), 1); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"1.1e2"}), + 110); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"+1.5"}), 2); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::string{"123456789012345678.0"}), 123456789012345678); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"1e100"}), + 
std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"foo"}), + std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"null"}), + std::nullopt); + // Numbers. Note that -inf, inf, and NaN are not valid JSON numeric values. + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{0}), 0); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{10}), 10); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{-1}), + std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), 1); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::min()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::max()}), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::min()), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::max()), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(uint64_t{std::numeric_limits::max()}), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::max()), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back(JSONValue(double{1.1}), 1); + inputs_and_expected_outputs.emplace_back(JSONValue(double{3.5}), 4); + inputs_and_expected_outputs.emplace_back(JSONValue(double{1.1e2}), 110); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{123456789012345678.0}), 123456789012345680); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{std::numeric_limits::min()}), 0); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{std::numeric_limits::lowest()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{std::numeric_limits::max()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString("1e100").value(), 
std::nullopt); + // Object/Array/Null + inputs_and_expected_outputs.emplace_back(JSONValue(), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString(R"({"a": 1})").value(), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString("[1]").value(), std::nullopt); + for (const auto& [input, expected_output] : inputs_and_expected_outputs) { + SCOPED_TRACE( + absl::Substitute("LAX_UINT64('$0')", input.GetConstRef().ToString())); + EXPECT_THAT(LaxConvertJsonToUint64(input.GetConstRef()), + IsOkAndHolds(expected_output)); + } +} + +TEST(JsonLaxConversionTest, Uint32) { + std::vector>> + inputs_and_expected_outputs; + // Bools + inputs_and_expected_outputs.emplace_back(JSONValue(true), 1); + inputs_and_expected_outputs.emplace_back(JSONValue(false), 0); + // Strings + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"10"}), 10); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"1.1"}), 1); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"1.1e2"}), + 110); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"+1.5"}), 2); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::string{"123456789012345678.0"}), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"1e100"}), + std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"foo"}), + std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"null"}), + std::nullopt); + // Numbers. Note that -inf, inf, and NaN are not valid JSON numeric values. 
+ inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{0}), 0); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{10}), 10); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{-1}), + std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), 1); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::min()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits::max()}), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::min()), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::max()), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(uint64_t{std::numeric_limits::max()}), + std::numeric_limits::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::numeric_limits::max()), std::nullopt); + + inputs_and_expected_outputs.emplace_back(JSONValue(double{1.1}), 1); + inputs_and_expected_outputs.emplace_back(JSONValue(double{3.5}), 4); + inputs_and_expected_outputs.emplace_back(JSONValue(double{1.1e2}), 110); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{123456789012345678.0}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{std::numeric_limits::min()}), 0); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{std::numeric_limits::lowest()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{std::numeric_limits::max()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString("1e100").value(), std::nullopt); + // Object/Array/Null + inputs_and_expected_outputs.emplace_back(JSONValue(), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString(R"({"a": 1})").value(), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString("[1]").value(), 
std::nullopt); + for (const auto& [input, expected_output] : inputs_and_expected_outputs) { + SCOPED_TRACE( + absl::Substitute("LAX_UINT32('$0')", input.GetConstRef().ToString())); + EXPECT_THAT(LaxConvertJsonToUint32(input.GetConstRef()), + IsOkAndHolds(expected_output)); + } +} + +TEST(JsonLaxConversionTest, Double) { std::vector<std::pair<JSONValue, std::optional<double>>> inputs_and_expected_outputs; // Bools @@ -2994,10 +4230,13 @@ TEST(JsonLaxConversionTest, Float) { inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"+1.5"}), 1.5); inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"foo"}), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"null"}), + std::nullopt); // Numbers. Note that -inf, inf, and NaN are not valid JSON numeric values. inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{-10}), -10); inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{9007199254740993}), 9007199254740992); + inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), 1); inputs_and_expected_outputs.emplace_back( JSONValue(int64_t{std::numeric_limits<int64_t>::min()}), static_cast<double>(std::numeric_limits<int64_t>::min())); @@ -3031,28 +4270,100 @@ TEST(JsonLaxConversionTest, Float) { for (const auto& [input, expected_output] : inputs_and_expected_outputs) { SCOPED_TRACE( absl::Substitute("LAX_FLOAT64('$0')", input.GetConstRef().ToString())); - absl::StatusOr<std::optional<double>> result = - LaxConvertJsonToFloat64(input.GetConstRef()); - ZETASQL_ASSERT_OK(result); - EXPECT_EQ(*result, expected_output); + EXPECT_THAT(LaxConvertJsonToFloat64(input.GetConstRef()), + IsOkAndHolds(expected_output)); } // Special cases.
- ZETASQL_ASSERT_OK_AND_ASSIGN( - std::optional<double> result, - LaxConvertJsonToFloat64(JSONValue(std::string{"NaN"}).GetConstRef())); - ASSERT_TRUE(result.has_value()); - EXPECT_TRUE(std::isnan(*result)); - ZETASQL_ASSERT_OK_AND_ASSIGN( - result, - LaxConvertJsonToFloat64(JSONValue(std::string{"Inf"}).GetConstRef())); - ASSERT_TRUE(result.has_value()); - EXPECT_TRUE(std::isinf(*result)); - ZETASQL_ASSERT_OK_AND_ASSIGN(result, - LaxConvertJsonToFloat64( - JSONValue(std::string{"-InfiNiTY"}).GetConstRef())); - ASSERT_TRUE(result.has_value()); - EXPECT_TRUE(std::isinf(*result)); + EXPECT_THAT( + LaxConvertJsonToFloat64(JSONValue(std::string{"NaN"}).GetConstRef()), + IsOkAndHolds(Optional(IsNan()))); + EXPECT_THAT( + LaxConvertJsonToFloat64(JSONValue(std::string{"Inf"}).GetConstRef()), + IsOkAndHolds(std::numeric_limits<double>::infinity())); + EXPECT_THAT(LaxConvertJsonToFloat64( + JSONValue(std::string{"-InfiNiTY"}).GetConstRef()), + IsOkAndHolds(-std::numeric_limits<double>::infinity())); +} + +TEST(JsonLaxConversionTest, Float) { + std::vector<std::pair<JSONValue, std::optional<float>>> + inputs_and_expected_outputs; + // Bools + inputs_and_expected_outputs.emplace_back(JSONValue(true), std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(false), std::nullopt); + // Strings + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"10"}), 10.0); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"-10"}), + -10.0); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"1.1"}), 1.1); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"1.1e2"}), + 110.0); + inputs_and_expected_outputs.emplace_back( + JSONValue(std::string{"9007199254740993"}), 9007199254740992.0); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"+1.5"}), 1.5); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"foo"}), + std::nullopt); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"null"}), + std::nullopt); + // Numbers.
Note that -inf, inf, and NaN are not valid JSON numeric values. + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{-10}), -10); + inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{9007199254740993}), + 9007199254740992); + inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), 1); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits<int64_t>::min()}), + static_cast<float>(std::numeric_limits<int64_t>::min())); + inputs_and_expected_outputs.emplace_back( + JSONValue(int64_t{std::numeric_limits<int64_t>::max()}), + static_cast<float>(std::numeric_limits<int64_t>::max())); + inputs_and_expected_outputs.emplace_back( + JSONValue(uint64_t{std::numeric_limits<uint64_t>::max()}), + static_cast<float>((std::numeric_limits<uint64_t>::max()))); + inputs_and_expected_outputs.emplace_back(JSONValue(double{1.1}), 1.1); + inputs_and_expected_outputs.emplace_back(JSONValue(double{3.5}), 3.5); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString("1.1e2").value(), 110); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{std::numeric_limits<float>::min()}), + std::numeric_limits<float>::min()); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{std::numeric_limits<float>::lowest()}), + std::numeric_limits<float>::lowest()); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{std::numeric_limits<float>::max()}), + std::numeric_limits<float>::max()); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{std::numeric_limits<double>::min()}), 0.0); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{std::numeric_limits<double>::lowest()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue(double{std::numeric_limits<double>::max()}), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString("1e100").value(), std::nullopt); + // Object/Array/Null + inputs_and_expected_outputs.emplace_back(JSONValue(), std::nullopt); + inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString(R"({"a": 1})").value(), std::nullopt); +
inputs_and_expected_outputs.emplace_back( + JSONValue::ParseJSONString("[1]").value(), std::nullopt); + for (const auto& [input, expected_output] : inputs_and_expected_outputs) { + SCOPED_TRACE( + absl::Substitute("LAX_FLOAT32('$0')", input.GetConstRef().ToString())); + EXPECT_THAT(LaxConvertJsonToFloat32(input.GetConstRef()), + IsOkAndHolds(expected_output)); + } + + // Special cases. + EXPECT_THAT( + LaxConvertJsonToFloat32(JSONValue(std::string{"NaN"}).GetConstRef()), + IsOkAndHolds(Optional(IsNan()))); + EXPECT_THAT( + LaxConvertJsonToFloat32(JSONValue(std::string{"Inf"}).GetConstRef()), + IsOkAndHolds(std::numeric_limits<float>::infinity())); + EXPECT_THAT(LaxConvertJsonToFloat32( + JSONValue(std::string{"-InfiNiTY"}).GetConstRef()), + IsOkAndHolds(-std::numeric_limits<float>::infinity())); } TEST(JsonLaxConversionTest, String) { @@ -3065,8 +4376,11 @@ TEST(JsonLaxConversionTest, String) { inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"foo"}), "foo"); inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"10"}), "10"); + inputs_and_expected_outputs.emplace_back(JSONValue(std::string{"null"}), + "null"); // Numbers. Note that -inf, inf, and NaN are not valid JSON numeric values.
inputs_and_expected_outputs.emplace_back(JSONValue(int64_t{-10}), "-10"); + inputs_and_expected_outputs.emplace_back(JSONValue(uint64_t{1}), "1"); inputs_and_expected_outputs.emplace_back( JSONValue(int64_t{std::numeric_limits<int64_t>::min()}), absl::StrCat(std::numeric_limits<int64_t>::min())); @@ -3098,10 +4412,471 @@ TEST(JsonLaxConversionTest, String) { for (const auto& [input, expected_output] : inputs_and_expected_outputs) { SCOPED_TRACE( absl::Substitute("LAX_STRING('$0')", input.GetConstRef().ToString())); - absl::StatusOr<std::optional<std::string>> result = - LaxConvertJsonToString(input.GetConstRef()); - ZETASQL_ASSERT_OK(result); - EXPECT_EQ(*result, expected_output); + EXPECT_THAT(LaxConvertJsonToString(input.GetConstRef()), + IsOkAndHolds(expected_output)); + } +} + +TEST(JsonConversionTest, LaxConvertJsonToBoolArray) { + std::vector< + std::pair<std::string, std::optional<std::vector<std::optional<bool>>>>> + cases; + cases.emplace_back(R"([])", std::vector<std::optional<bool>>{}); + cases.emplace_back(R"([null])", + std::vector<std::optional<bool>>{std::nullopt}); + cases.emplace_back(R"([false, true])", + std::vector<std::optional<bool>>{false, true}); + cases.emplace_back( + R"(["TRue", "FaLse", "foo", "null", ""])", + std::vector<std::optional<bool>>{true, false, std::nullopt, std::nullopt, + std::nullopt}); + cases.emplace_back( + R"(["0", "0.0", "-0.0", "10", "-10", "1.1", "-1.1", "+1.5", "1.1e2"])", + std::vector<std::optional<bool>>{ + std::nullopt, std::nullopt, std::nullopt, std::nullopt, std::nullopt, + std::nullopt, std::nullopt, std::nullopt, std::nullopt}); + cases.emplace_back( + R"([0, 0.0, -0.0, 10, -10, 1.1, -1.1, 1.5, 1.1e2])", + std::vector<std::optional<bool>>{false, false, false, true, true, true, + true, true, true}); + // int32_t:min, int32_t:max, int64_t:min, int64_t:max + cases.emplace_back( + R"([-2147483648, 2147483647, -9223372036854775808, 9223372036854775807])", + std::vector<std::optional<bool>>{true, true, true, true}); + // uint32_t:max, uint64_t:max, extremely large number + cases.emplace_back(R"([4294967295, 18446744073709551615, 1e100])", + std::vector<std::optional<bool>>{true, true, true}); + // float:lowest, float:min, float:max, double:lowest, double:min, double:max +
cases.emplace_back( + R"([-3.40282e+38, 1.17549e-38, 3.40282e+38, -1.79769e+308, 2.22507e-308, 1.79769e+308])", + std::vector<std::optional<bool>>{true, true, true, true, true, true}); + cases.emplace_back(R"([[false]])", + std::vector<std::optional<bool>>{std::nullopt}); + cases.emplace_back(R"([{"a": false}])", + std::vector<std::optional<bool>>{std::nullopt}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"(false)", std::nullopt); + cases.emplace_back(R"(true)", std::nullopt); + cases.emplace_back(R"(1)", std::nullopt); + cases.emplace_back(R"(1.0)", std::nullopt); + cases.emplace_back(R"("foo")", std::nullopt); + cases.emplace_back(R"({"a": false})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE(absl::Substitute("LAX_BOOL_ARRAY('$0')", + input.GetConstRef().ToString())); + EXPECT_THAT(LaxConvertJsonToBoolArray(input.GetConstRef()), + IsOkAndHolds(expected_output)); + } +} + +TEST(JsonConversionTest, LaxConvertJsonToInt64Array) { + std::vector<std::pair<std::string, std::optional<std::vector<std::optional<int64_t>>>>> + cases; + cases.emplace_back(R"([])", std::vector<std::optional<int64_t>>{}); + cases.emplace_back(R"([null])", + std::vector<std::optional<int64_t>>{std::nullopt}); + cases.emplace_back(R"([false, true])", + std::vector<std::optional<int64_t>>{0, 1}); + cases.emplace_back(R"(["TRue", "FaLse", "foo", "null", ""])", + std::vector<std::optional<int64_t>>{ + std::nullopt, std::nullopt, std::nullopt, std::nullopt, + std::nullopt}); + cases.emplace_back( + R"(["0", "0.0", "-0.0", "10", "-10", "1.1", "-1.1", "+1.5", "1.1e2"])", + std::vector<std::optional<int64_t>>{0, 0, 0, 10, -10, 1, -1, 2, 110}); + cases.emplace_back( + R"([0, 0.0, -0.0, 10, -10, 1.1, -1.1, 1.5, 1.1e2])", + std::vector<std::optional<int64_t>>{0, 0, 0, 10, -10, 1, -1, 2, 110}); + // int32_t:min, int32_t:max, int64_t:min, int64_t:max + cases.emplace_back( + R"([-2147483648, 2147483647, -9223372036854775808, 9223372036854775807])", + std::vector<std::optional<int64_t>>{std::numeric_limits<int32_t>::min(), + std::numeric_limits<int32_t>::max(), + std::numeric_limits<int64_t>::min(), + std::numeric_limits<int64_t>::max()}); + // uint32_t:max, uint64_t:max,
extremely large number + cases.emplace_back( + R"([4294967295, 18446744073709551615, 1e100])", + std::vector<std::optional<int64_t>>{std::numeric_limits<uint32_t>::max(), + std::nullopt, std::nullopt}); + // float:lowest, float:min, float:max, double:lowest, double:min, double:max + cases.emplace_back( + R"([-3.40282e+38, 1.17549e-38, 3.40282e+38, -1.79769e+308, 2.22507e-308, 1.79769e+308])", + std::vector<std::optional<int64_t>>{std::nullopt, 0, std::nullopt, + std::nullopt, 0, std::nullopt}); + cases.emplace_back(R"([[false]])", + std::vector<std::optional<int64_t>>{std::nullopt}); + cases.emplace_back(R"([{"a": false}])", + std::vector<std::optional<int64_t>>{std::nullopt}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"(false)", std::nullopt); + cases.emplace_back(R"(true)", std::nullopt); + cases.emplace_back(R"(1)", std::nullopt); + cases.emplace_back(R"(1.0)", std::nullopt); + cases.emplace_back(R"("foo")", std::nullopt); + cases.emplace_back(R"({"a": false})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE(absl::Substitute("LAX_INT64_ARRAY('$0')", + input.GetConstRef().ToString())); + EXPECT_THAT(LaxConvertJsonToInt64Array(input.GetConstRef()), + IsOkAndHolds(expected_output)); + } +} + +TEST(JsonConversionTest, LaxConvertJsonToInt32Array) { + std::vector<std::pair<std::string, std::optional<std::vector<std::optional<int32_t>>>>> + cases; + cases.emplace_back(R"([])", std::vector<std::optional<int32_t>>{}); + cases.emplace_back(R"([null])", + std::vector<std::optional<int32_t>>{std::nullopt}); + cases.emplace_back(R"([false, true])", + std::vector<std::optional<int32_t>>{0, 1}); + cases.emplace_back(R"(["TRue", "FaLse", "foo", "null", ""])", + std::vector<std::optional<int32_t>>{ + std::nullopt, std::nullopt, std::nullopt, std::nullopt, + std::nullopt}); + cases.emplace_back( + R"(["0", "0.0", "-0.0", "10", "-10", "1.1", "-1.1", "+1.5", "1.1e2"])", + std::vector<std::optional<int32_t>>{0, 0, 0, 10, -10, 1, -1, 2, 110}); + cases.emplace_back( + R"([0, 0.0, -0.0, 10, -10, 1.1, -1.1, 1.5, 1.1e2])", + std::vector<std::optional<int32_t>>{0, 0, 0, 10, -10, 1, -1, 2, 110}); + // int32_t:min, int32_t:max,
int64_t:min, int64_t:max + cases.emplace_back( + R"([-2147483648, 2147483647, -9223372036854775808, 9223372036854775807])", + std::vector<std::optional<int32_t>>{std::numeric_limits<int32_t>::min(), + std::numeric_limits<int32_t>::max(), + std::nullopt, std::nullopt}); + // uint32_t:max, uint64_t:max, extremely large number + cases.emplace_back(R"([4294967295, 18446744073709551615, 1e100])", + std::vector<std::optional<int32_t>>{ + std::nullopt, std::nullopt, std::nullopt}); + // float:lowest, float:min, float:max, double:lowest, double:min, double:max + cases.emplace_back( + R"([-3.40282e+38, 1.17549e-38, 3.40282e+38, -1.79769e+308, 2.22507e-308, 1.79769e+308])", + std::vector<std::optional<int32_t>>{std::nullopt, 0, std::nullopt, + std::nullopt, 0, std::nullopt}); + cases.emplace_back(R"([[false]])", + std::vector<std::optional<int32_t>>{std::nullopt}); + cases.emplace_back(R"([{"a": false}])", + std::vector<std::optional<int32_t>>{std::nullopt}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"(false)", std::nullopt); + cases.emplace_back(R"(true)", std::nullopt); + cases.emplace_back(R"(1)", std::nullopt); + cases.emplace_back(R"(1.0)", std::nullopt); + cases.emplace_back(R"("foo")", std::nullopt); + cases.emplace_back(R"({"a": false})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE(absl::Substitute("LAX_INT32_ARRAY('$0')", + input.GetConstRef().ToString())); + EXPECT_THAT(LaxConvertJsonToInt32Array(input.GetConstRef()), + IsOkAndHolds(expected_output)); + } +} + +TEST(JsonConversionTest, LaxConvertJsonToUint64Array) { + std::vector<std::pair<std::string, std::optional<std::vector<std::optional<uint64_t>>>>> + cases; + cases.emplace_back(R"([])", std::vector<std::optional<uint64_t>>{}); + cases.emplace_back(R"([null])", + std::vector<std::optional<uint64_t>>{std::nullopt}); + cases.emplace_back(R"([false, true])", + std::vector<std::optional<uint64_t>>{0, 1}); + cases.emplace_back(R"(["TRue", "FaLse", "foo", "null", ""])", + std::vector<std::optional<uint64_t>>{ + std::nullopt, std::nullopt, std::nullopt, std::nullopt, + std::nullopt}); + cases.emplace_back( + R"(["0", "0.0", "-0.0", "10", "-10", "1.1",
"-1.1", "+1.5", "1.1e2"])", + std::vector<std::optional<uint64_t>>{0, 0, 0, 10, std::nullopt, 1, + std::nullopt, 2, 110}); + cases.emplace_back(R"([0, 0.0, -0.0, 10, -10, 1.1, -1.1, 1.5, 1.1e2])", + std::vector<std::optional<uint64_t>>{ + 0, 0, 0, 10, std::nullopt, 1, std::nullopt, 2, 110}); + // int32_t:min, int32_t:max, int64_t:min, int64_t:max + cases.emplace_back( + R"([-2147483648, 2147483647, -9223372036854775808, 9223372036854775807])", + std::vector<std::optional<uint64_t>>{ + std::nullopt, std::numeric_limits<int32_t>::max(), std::nullopt, + std::numeric_limits<int64_t>::max()}); + // uint32_t:max, uint64_t:max, extremely large number + cases.emplace_back(R"([4294967295, 18446744073709551615, 1e100])", + std::vector<std::optional<uint64_t>>{ + std::numeric_limits<uint32_t>::max(), + std::numeric_limits<uint64_t>::max(), std::nullopt}); + // float:lowest, float:min, float:max, double:lowest, double:min, double:max + cases.emplace_back( + R"([-3.40282e+38, 1.17549e-38, 3.40282e+38, -1.79769e+308, 2.22507e-308, 1.79769e+308])", + std::vector<std::optional<uint64_t>>{std::nullopt, 0, std::nullopt, + std::nullopt, 0, std::nullopt}); + cases.emplace_back(R"([[false]])", + std::vector<std::optional<uint64_t>>{std::nullopt}); + cases.emplace_back(R"([{"a": false}])", + std::vector<std::optional<uint64_t>>{std::nullopt}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"(false)", std::nullopt); + cases.emplace_back(R"(true)", std::nullopt); + cases.emplace_back(R"(1)", std::nullopt); + cases.emplace_back(R"(1.0)", std::nullopt); + cases.emplace_back(R"("foo")", std::nullopt); + cases.emplace_back(R"({"a": false})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE(absl::Substitute("LAX_UINT64_ARRAY('$0')", + input.GetConstRef().ToString())); + EXPECT_THAT(LaxConvertJsonToUint64Array(input.GetConstRef()), + IsOkAndHolds(expected_output)); + } +} + +TEST(JsonConversionTest, LaxConvertJsonToUint32Array) { + std::vector<std::pair<std::string, std::optional<std::vector<std::optional<uint32_t>>>>> + cases; + cases.emplace_back(R"([])", std::vector<std::optional<uint32_t>>{}); + cases.emplace_back(R"([null])", +
std::vector<std::optional<uint32_t>>{std::nullopt}); + cases.emplace_back(R"([false, true])", + std::vector<std::optional<uint32_t>>{0, 1}); + cases.emplace_back(R"(["TRue", "FaLse", "foo", "null", ""])", + std::vector<std::optional<uint32_t>>{ + std::nullopt, std::nullopt, std::nullopt, std::nullopt, + std::nullopt}); + cases.emplace_back( + R"(["0", "0.0", "-0.0", "10", "-10", "1.1", "-1.1", "+1.5", "1.1e2"])", + std::vector<std::optional<uint32_t>>{0, 0, 0, 10, std::nullopt, 1, + std::nullopt, 2, 110}); + cases.emplace_back(R"([0, 0.0, -0.0, 10, -10, 1.1, -1.1, 1.5, 1.1e2])", + std::vector<std::optional<uint32_t>>{ + 0, 0, 0, 10, std::nullopt, 1, std::nullopt, 2, 110}); + // int32_t:min, int32_t:max, int64_t:min, int64_t:max + cases.emplace_back( + R"([-2147483648, 2147483647, -9223372036854775808, 9223372036854775807])", + std::vector<std::optional<uint32_t>>{std::nullopt, + std::numeric_limits<int32_t>::max(), + std::nullopt, std::nullopt}); + // uint32_t:max, uint64_t:max, extremely large number + cases.emplace_back( + R"([4294967295, 18446744073709551615, 1e100])", + std::vector<std::optional<uint32_t>>{std::numeric_limits<uint32_t>::max(), + std::nullopt, std::nullopt}); + // float:lowest, float:min, float:max, double:lowest, double:min, double:max + cases.emplace_back( + R"([-3.40282e+38, 1.17549e-38, 3.40282e+38, -1.79769e+308, 2.22507e-308, 1.79769e+308])", + std::vector<std::optional<uint32_t>>{std::nullopt, 0, std::nullopt, + std::nullopt, 0, std::nullopt}); + cases.emplace_back(R"([[false]])", + std::vector<std::optional<uint32_t>>{std::nullopt}); + cases.emplace_back(R"([{"a": false}])", + std::vector<std::optional<uint32_t>>{std::nullopt}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"(false)", std::nullopt); + cases.emplace_back(R"(true)", std::nullopt); + cases.emplace_back(R"(1)", std::nullopt); + cases.emplace_back(R"(1.0)", std::nullopt); + cases.emplace_back(R"("foo")", std::nullopt); + cases.emplace_back(R"({"a": false})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE(absl::Substitute("LAX_UINT32_ARRAY('$0')", +
input.GetConstRef().ToString())); + EXPECT_THAT(LaxConvertJsonToUint32Array(input.GetConstRef()), + IsOkAndHolds(expected_output)); + } +} + +TEST(JsonConversionTest, LaxConvertJsonToFloat64Array) { + std::vector< + std::pair<std::string, std::optional<std::vector<std::optional<double>>>>> + cases; + cases.emplace_back(R"([])", std::vector<std::optional<double>>{}); + cases.emplace_back(R"([null])", + std::vector<std::optional<double>>{std::nullopt}); + cases.emplace_back(R"([false, true])", std::vector<std::optional<double>>{ + std::nullopt, std::nullopt}); + cases.emplace_back(R"(["TRue", "FaLse", "foo", "null", ""])", + std::vector<std::optional<double>>{ + std::nullopt, std::nullopt, std::nullopt, std::nullopt, + std::nullopt}); + cases.emplace_back( + R"(["0", "0.0", "-0.0", "10", "-10", "1.1", "-1.1", "+1.5", "1.1e2"])", + std::vector<std::optional<double>>{0, 0, 0, 10, -10, 1.1, -1.1, 1.5, + 110}); + cases.emplace_back(R"([0, 0.0, -0.0, 10, -10, 1.1, -1.1, 1.5, 1.1e2])", + std::vector<std::optional<double>>{0, 0, 0, 10, -10, 1.1, + -1.1, 1.5, 110}); + // int32_t:min, int32_t:max, int64_t:min, int64_t:max + cases.emplace_back( + R"([-2147483648, 2147483647, -9223372036854775808, 9223372036854775807])", + std::vector<std::optional<double>>{std::numeric_limits<int32_t>::min(), + std::numeric_limits<int32_t>::max(), + std::numeric_limits<int64_t>::min(), + std::numeric_limits<int64_t>::max()}); + // uint32_t:max, uint64_t:max, extremely large number + cases.emplace_back(R"([4294967295, 18446744073709551615, 1e100])", + std::vector<std::optional<double>>{ + std::numeric_limits<uint32_t>::max(), + std::numeric_limits<uint64_t>::max(), 1e100}); + // float:lowest, float:min, float:max, double:lowest, double:min, double:max + cases.emplace_back( + R"([-3.40282e+38, 1.17549e-38, 3.40282e+38, -1.79769e+308, 2.22507e-308, 1.79769e+308])", + std::vector<std::optional<double>>{-3.40282e+38, 1.17549e-38, 3.40282e+38, + -1.79769e+308, 2.22507e-308, + 1.79769e+308}); + cases.emplace_back(R"([[false]])", + std::vector<std::optional<double>>{std::nullopt}); + cases.emplace_back(R"([{"a": false}])", + std::vector<std::optional<double>>{std::nullopt}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"(false)", std::nullopt); + cases.emplace_back(R"(true)", std::nullopt); + cases.emplace_back(R"(1)",
std::nullopt); + cases.emplace_back(R"(1.0)", std::nullopt); + cases.emplace_back(R"("foo")", std::nullopt); + cases.emplace_back(R"({"a": false})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE(absl::Substitute("LAX_FLOAT64_ARRAY('$0')", + input.GetConstRef().ToString())); + EXPECT_THAT(LaxConvertJsonToFloat64Array(input.GetConstRef()), + IsOkAndHolds(expected_output)); + } +} + +TEST(JsonConversionTest, LaxConvertJsonToFloat32Array) { + std::vector< + std::pair<std::string, std::optional<std::vector<std::optional<float>>>>> + cases; + cases.emplace_back(R"([])", std::vector<std::optional<float>>{}); + cases.emplace_back(R"([null])", + std::vector<std::optional<float>>{std::nullopt}); + cases.emplace_back(R"([false, true])", std::vector<std::optional<float>>{ + std::nullopt, std::nullopt}); + cases.emplace_back(R"(["TRue", "FaLse", "foo", "null", ""])", + std::vector<std::optional<float>>{ + std::nullopt, std::nullopt, std::nullopt, std::nullopt, + std::nullopt}); + cases.emplace_back( + R"(["0", "0.0", "-0.0", "10", "-10", "1.1", "-1.1", "+1.5", "1.1e2"])", + std::vector<std::optional<float>>{0, 0, 0, 10, -10, 1.1, -1.1, 1.5, 110}); + cases.emplace_back( + R"([0, 0.0, -0.0, 10, -10, 1.1, -1.1, 1.5, 1.1e2])", + std::vector<std::optional<float>>{0, 0, 0, 10, -10, 1.1, -1.1, 1.5, 110}); + // int32_t:min, int32_t:max, int64_t:min, int64_t:max + cases.emplace_back( + R"([-2147483648, 2147483647, -9223372036854775808, 9223372036854775807])", + std::vector<std::optional<float>>{std::numeric_limits<int32_t>::min(), + std::numeric_limits<int32_t>::max(), + std::numeric_limits<int64_t>::min(), + std::numeric_limits<int64_t>::max()}); + // uint32_t:max, uint64_t:max, extremely large number + cases.emplace_back(R"([4294967295, 18446744073709551615, 1e100])", + std::vector<std::optional<float>>{ + std::numeric_limits<uint32_t>::max(), + std::numeric_limits<uint64_t>::max(), std::nullopt}); + // float:lowest, float:min, float:max, double:lowest, double:min, double:max + cases.emplace_back( + R"([-3.40282e+38, 1.17549e-38, 3.40282e+38, -1.79769e+308, 2.22507e-308, 1.79769e+308])", + std::vector<std::optional<float>>{-3.40282e+38, 1.17549e-38,
3.40282e+38, + std::nullopt, 2.22507e-308, + std::nullopt}); + + cases.emplace_back(R"([[false]])", + std::vector<std::optional<float>>{std::nullopt}); + cases.emplace_back(R"([{"a": false}])", + std::vector<std::optional<float>>{std::nullopt}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"(false)", std::nullopt); + cases.emplace_back(R"(true)", std::nullopt); + cases.emplace_back(R"(1)", std::nullopt); + cases.emplace_back(R"(1.0)", std::nullopt); + cases.emplace_back(R"("foo")", std::nullopt); + cases.emplace_back(R"({"a": false})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE(absl::Substitute("LAX_FLOAT32_ARRAY('$0')", + input.GetConstRef().ToString())); + EXPECT_THAT(LaxConvertJsonToFloat32Array(input.GetConstRef()), + IsOkAndHolds(expected_output)); + } +} + +TEST(JsonConversionTest, LaxConvertJsonToStringArray) { + std::vector<std::pair<std::string, std::optional<std::vector<std::optional<std::string>>>>> + cases; + cases.emplace_back(R"([])", std::vector<std::optional<std::string>>{}); + cases.emplace_back(R"([null])", + std::vector<std::optional<std::string>>{std::nullopt}); + cases.emplace_back(R"([false, true])", + std::vector<std::optional<std::string>>{"false", "true"}); + cases.emplace_back(R"(["TRue", "FaLse", "foo", "null", ""])", + std::vector<std::optional<std::string>>{ + "TRue", "FaLse", "foo", "null", ""}); + cases.emplace_back( + R"(["0", "0.0", "-0.0", "10", "-10", "1.1", "-1.1", "+1.5", "1.1e2"])", + std::vector<std::optional<std::string>>{"0", "0.0", "-0.0", "10", "-10", + "1.1", "-1.1", "+1.5", "1.1e2"}); + cases.emplace_back( + R"([0, 0.0, -0.0, 10, -10, 1.1, -1.1, 1.5, 1.1e2])", + std::vector<std::optional<std::string>>{"0", "0", "0", "10", "-10", "1.1", + "-1.1", "1.5", "110"}); + // int32_t:min, int32_t:max, int64_t:min, int64_t:max + cases.emplace_back( + R"([-2147483648, 2147483647, -9223372036854775808, 9223372036854775807])", + std::vector<std::optional<std::string>>{"-2147483648", "2147483647", + "-9223372036854775808", + "9223372036854775807"}); + // uint32_t:max, uint64_t:max, extremely large number + cases.emplace_back(R"([4294967295, 18446744073709551615])", +
std::vector<std::optional<std::string>>{ "4294967295", "18446744073709551615"}); + // float:lowest, float:min, float:max, double:lowest, double:min, double:max + cases.emplace_back( + R"([-3.40282e+38, 1.17549e-38, 3.40282e+38, -1.79769e+308, 2.22507e-308, 1.79769e+308])", + std::vector<std::optional<std::string>>{"-3.40282e+38", "1.17549e-38", + "3.40282e+38", "-1.79769e+308", + "2.22507e-308", "1.79769e+308"}); + cases.emplace_back(R"([[false]])", + std::vector<std::optional<std::string>>{std::nullopt}); + cases.emplace_back(R"([{"a": false}])", + std::vector<std::optional<std::string>>{std::nullopt}); + cases.emplace_back(R"(null)", std::nullopt); + cases.emplace_back(R"(false)", std::nullopt); + cases.emplace_back(R"(true)", std::nullopt); + cases.emplace_back(R"(1)", std::nullopt); + cases.emplace_back(R"(1.0)", std::nullopt); + cases.emplace_back(R"("foo")", std::nullopt); + cases.emplace_back(R"({"a": false})", std::nullopt); + + for (const auto& [json_text, expected_output] : cases) { + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue input, + JSONValue::ParseJSONString(json_text)); + SCOPED_TRACE(absl::Substitute("LAX_STRING_ARRAY('$0')", + input.GetConstRef().ToString())); + EXPECT_THAT(LaxConvertJsonToStringArray(input.GetConstRef()), + IsOkAndHolds(expected_output)); } } @@ -3423,8 +5198,9 @@ TEST(JsonObjectBuilderTest, StrictNumberParsing) { } } -std::unique_ptr<StrictJSONPathIterator> ParseJSONPath(absl::string_view path) { - return StrictJSONPathIterator::Create(path).value(); +std::unique_ptr<StrictJSONPathIterator> ParseJSONPath(absl::string_view path, + bool enable_lax = false) { + return StrictJSONPathIterator::Create(path, enable_lax).value(); } TEST(JsonRemoveTest, InvalidJSONPath) { @@ -3665,8 +5441,8 @@ TEST(JsonInsertArrayTest, FailedConversionComesFirst) { ref, JsonEq(JSONValue::ParseJSONString(kInitialValue)->GetConstRef())); } { - // Insertion in null would have created an array but failed conversion comes - // first. + // Insertion in null would have created an array but failed conversion + // comes first.
auto path_iter = ParseJSONPath("$[1][1]"); JSONValue value = JSONValue::ParseJSONString(kInitialValue).value(); JSONValueRef ref = value.GetRef(); @@ -3964,8 +5740,8 @@ TEST(JsonAppendArrayTest, FailedConversionComesFirst) { ref, JsonEq(JSONValue::ParseJSONString(kInitialValue)->GetConstRef())); } { - // Insertion in null would have created an array but failed conversion comes - // first. + // Insertion in null would have created an array but failed conversion + // comes first. auto path_iter = ParseJSONPath("$[1]"); JSONValue value = JSONValue::ParseJSONString(kInitialValue).value(); JSONValueRef ref = value.GetRef(); @@ -4099,8 +5875,8 @@ TEST(JsonSetTest, SingleJsonScalarCases) { JSONValueRef ref = json.GetRef(); std::unique_ptr<StrictJSONPathIterator> path_iterator = ParseJSONPath(path); - ZETASQL_ASSERT_OK(JsonSet(ref, *path_iterator, value, /*create_if_missing=*/true, - LanguageOptions(), + ZETASQL_ASSERT_OK(JsonSet(ref, *path_iterator, value, + /*create_if_missing=*/true, LanguageOptions(), /*canonicalize_zero=*/true)); if (expected_output.has_value()) { EXPECT_THAT( @@ -4175,7 +5951,8 @@ TEST(JsonSetTest, ValidTopLevelEmptyArray) { // Set past the end of the array. test_fn(/*path=*/"$[2]", /*expected_output=*/"[null, null, 999]"); // Recursive creation of nested arrays. - test_fn(/*path=*/"$[2][1]", /*expected_output=*/"[null, null, [null, 999]]"); + test_fn(/*path=*/"$[2][1]", + /*expected_output=*/"[null, null, [null, 999]]"); // Recursive creation of nested arrays and objects. test_fn(/*path=*/"$[2][1].a", /*expected_output=*/R"([null, null, [null, {"a":999}]])"); @@ -4204,7 +5981,8 @@ TEST(JsonSetTest, ComplexTests) { // Insert object into null. test_fn( /*path=*/"$.a.b.c", + /*expected_output=*/ + R"({"a":{"b":{"c":999}}, "b":{}, "c":[], "d":{"e":1}, "f":[2, [], {}, [3, 4]]})"); // Insert array into null.
test_fn( @@ -4335,10 +6113,10 @@ TEST(JsonSetTest, FailedConversionComesFirst) { LanguageOptions options; options.EnableLanguageFeature(FEATURE_JSON_STRICT_NUMBER_PARSING); - ASSERT_THAT( - JsonSet(ref, *path_iter, big_value, /*create_if_missing=*/true, options, - /*canonicalize_zero=*/true), - StatusIs(absl::StatusCode::kOutOfRange)); + ASSERT_THAT(JsonSet(ref, *path_iter, big_value, + /*create_if_missing=*/true, options, + /*canonicalize_zero=*/true), + StatusIs(absl::StatusCode::kOutOfRange)); EXPECT_THAT( ref, JsonEq(JSONValue::ParseJSONString(kInitialValue)->GetConstRef())); } @@ -4599,6 +6377,166 @@ TEST(JsonStripNullsTest, AllNullsOrEmptyArray) { kInitialValue); } +TEST(JsonQueryLax, InvalidPathInput) { + std::unique_ptr<StrictJSONPathIterator> path_iterator = + ParseJSONPath("$.a", /*enable_lax=*/true); + JSONValue value; + // The input is not a lax path. + EXPECT_FALSE(JsonQueryLax(value.GetRef(), *path_iterator).status().ok()); +} + +class JsonQueryLaxTest + : public ::testing::TestWithParam< + std::tuple<absl::string_view, absl::string_view, absl::string_view>> { + protected: + absl::string_view GetJSONDoc() const { return std::get<0>(GetParam()); } + absl::string_view GetPath() const { return std::get<1>(GetParam()); } + absl::string_view GetExpectedResult() const { + return std::get<2>(GetParam()); + } +}; + +INSTANTIATE_TEST_SUITE_P( + JsonQueryLaxSimpleTests, JsonQueryLaxTest, + ::testing::Values( + // We do not test order or capitalization of JSONPath keywords as + // these are already tested in StrictJSONPathIterator tests.
+ // + // Key doesn't exist + std::make_tuple(R"({"a":1})", "lax $.A", "[]"), + std::make_tuple(R"({"a":1})", "lax recursive $.A", "[]"), + std::make_tuple(R"({"a":{"b":1}})", "lax $.b", "[]"), + std::make_tuple(R"({"a":{"b":1}})", "lax recursive $.b", "[]"), + std::make_tuple(R"({"a":{"b":1}})", "lax $.a.c", "[]"), + std::make_tuple(R"({"a":{"b":1}})", "lax recursive $.a.c", "[]"), + std::make_tuple(R"({"a":{"b":1}})", "lax $[1]", "[]"), + std::make_tuple(R"({"a":{"b":1}})", "lax recursive $[1]", "[]"), + std::make_tuple(R"({"a":{"b":1}})", "lax $[0].b", "[]"), + std::make_tuple(R"({"a":{"b":1}})", "lax recursive $[0].b", "[]"), + // NULL JSON input. + std::make_tuple("null", "lax $", "[null]"), + std::make_tuple("null", "lax recursive $", "[null]"), + std::make_tuple("null", "lax $.a", "[]"), + std::make_tuple("null", "lax recursive $.a", "[]"), + std::make_tuple("null", "lax $[0]", "[null]"), + std::make_tuple("null", "lax recursive $[0]", "[null]"), + std::make_tuple("null", "lax $[1]", "[]"), + std::make_tuple("null", "lax recursive $[1]", "[]"), + // Normal object key match with matching types. + std::make_tuple(R"({"a":null})", "lax $", R"([{"a":null}])"), + std::make_tuple(R"({"a":null})", "lax recursive $", R"([{"a":null}])"), + std::make_tuple(R"({"a":null})", "lax $.a", "[null]"), + std::make_tuple(R"({"a":null})", "lax recursive $.a", "[null]"), + std::make_tuple(R"({"a":1, "b":2})", "lax $.b", "[2]"), + std::make_tuple(R"({"a":1, "b":2})", "lax recursive $.b", "[2]"), + // Normal array index match with matching types. + std::make_tuple(R"([[null]])", "lax $", "[[[null]]]"), + std::make_tuple(R"([[null]])", "lax recursive $", "[[[null]]]"), + std::make_tuple(R"([[null]])", "lax $[0]", "[[null]]"), + std::make_tuple(R"([[null]])", "lax recursive $[0]", "[[null]]"), + // Index larger than array size. + std::make_tuple(R"([[null], 1])", "lax $[2]", "[]"), + std::make_tuple(R"([[null], 1])", "lax recursive $[2]", "[]"), + // Single level of array. 
+ std::make_tuple(R"([{"a":1}])", "lax $.a", "[1]"), + std::make_tuple(R"([{"a":1}])", "lax recursive $.a", "[1]"), + // 2-level of arrays. + std::make_tuple(R"([[{"a":1}]])", "lax $.a", "[]"), + std::make_tuple(R"([{"a":1}])", "lax recursive $.a", "[1]"), + // Mix of 1,2,3 levels of arrays. + std::make_tuple( + R"([{"a":1}, {"b":2}, [[{"a":3}]],[[{"b":4}]],[[[{"a":5}]]]])", + "lax $.a", "[1]"), + std::make_tuple( + R"([{"a":1}, {"b":2}, [[{"a":3}]],[[{"b":4}]],[[[{"a":5}]]]])", + "lax recursive $.a", "[1,3,5]"), + std::make_tuple( + R"([{"a":1}, {"b":2}, [[{"a":3}]],[[{"b":4}]],[[[{"a":5}]]]])", + "lax $.b", "[2]"), + std::make_tuple( + R"([{"a":1}, {"b":2}, [[{"a":3}]],[[{"b":4}]],[[[{"a":5}]]]])", + "lax recursive $.b", "[2,4]"), + // Wrap non-array before match. + std::make_tuple(R"({"a":1})", "lax $[0].a", "[1]"), + std::make_tuple(R"({"a":1})", "lax recursive $[0].a", "[1]"), + std::make_tuple(R"({"a":[1]})", "lax $[0][0].a", "[[1]]"), + std::make_tuple(R"({"a":[1]})", "lax recursive $[0][0].a", "[[1]]"), + // Key 'b' doesn't exist in matched JSON subtree. + std::make_tuple(R"({"a":[1]})", "lax $[0][0].b", "[]"), + std::make_tuple(R"({"a":[1]})", "lax recursive $[0][0].b", "[]"), + // Wrap non-array before match. Index larger than array size. + std::make_tuple("1", "lax $[1]", "[]"), + std::make_tuple("1", "lax recursive $[1]", "[]"), + // Second index is larger than size of wrapped array. + std::make_tuple(R"({"a":1})", "lax $[0][1].a", "[]"), + std::make_tuple(R"({"a":1})", "lax recursive $[0][1].a", "[]"))); + +INSTANTIATE_TEST_SUITE_P( + JsonQueryLaxComplexTests, JsonQueryLaxTest, + ::testing::Values( + // 'b' is not included in every nested object. + std::make_tuple(R"({"a":[{"b":1}, {"c":2}, {"b":3}, [{"b":4}]]})", + "lax $.a.b", "[1,3]"), + std::make_tuple(R"({"a":[{"b":1}, {"c":2}, {"b":3}, [{"b":4}]]})", + "lax recursive $.a.b", "[1,3,4]"), + // 'c' is not included in every nested object. 
+ std::make_tuple( + R"({"a":[{"b":1}, {"c":2}, {"b":3}, [{"b":4}], null, + [[{"c":5}]]]})", + "lax $.a.c", "[2]"), + std::make_tuple( + R"({"a":[{"b":1}, {"c":2}, {"b":3}, [{"b":4}], null, + [[{"c":5}]]]})", + "lax recursive $.a.c", "[2,5]"), + // 'a.b' has different levels of nestedness + std::make_tuple(R"({"a":[1, {"b":2}, [{"b":3}, null, 4, + {"b":[5]}]]})", + "lax $.a.b", "[2]"), + std::make_tuple(R"({"a":[1, {"b":2}, [{"b":3}, 4, null, {"b":[5]}]]})", + "lax recursive $.a.b", "[2, 3, [5]]"), + // Both 'a' and 'a.b' have different levels of nestedness. + std::make_tuple( + R"([{"a":[1, {"b":2}, [{"b":3}, [{"b": 4}]]]}, + [{"a":{"b":5}}, null]])", + "lax $.a.b", "[2]"), + std::make_tuple( + R"([{"a":[1, {"b":2}, [{"b":3}, [{"b": 4}]]]}, + [{"a":{"b":5}}, null]])", + "lax recursive $.a.b", "[2, 3, 4, 5]"), + // Specific array indices and different levels of nestedness with + // autowrap. + std::make_tuple( + R"([{"a":[{"b":2}, [{"b":3}, [{"b": 4}]]]}, + [{"a":{"b":5}}, null]])", + "lax $.a[0].b", "[2]"), + std::make_tuple( + R"([{"a":[{"b":2}, [{"b":3}, [{"b": 4}]]]}, + [{"a":{"b":5}}, null]])", + "lax recursive $.a[0].b", "[2, 5]"), + std::make_tuple( + R"([{"a":[{"b":2}, [{"b":3}, [{"b": 4}]]]}, + [{"a":{"b":5}}, null]])", + "lax $[1].a[0].b", "[5]"), + std::make_tuple( + R"([{"a":[{"b":2}, [{"b":3}, [{"b": 4}]]]}, + [{"a":{"b":5}}, null]])", + "lax recursive $[1].a[0][0][0].b", "[5]"))); + +TEST_P(JsonQueryLaxTest, Success) { + std::unique_ptr path_iterator = + ParseJSONPath(GetPath(), /*enable_lax=*/true); + // Increment the path iterator to ensure it is correctly reset during + // execution. 
+ ++(*path_iterator); + JSONValue value = JSONValue::ParseJSONString(GetJSONDoc()).value(); + ZETASQL_ASSERT_OK_AND_ASSIGN(JSONValue result, + JsonQueryLax(value.GetRef(), *path_iterator)); + EXPECT_THAT( + result.GetConstRef(), + JsonEq(JSONValue::ParseJSONString(GetExpectedResult())->GetConstRef())); +} + } // namespace } // namespace json_internal diff --git a/zetasql/public/functions/parse_date_time_test.cc b/zetasql/public/functions/parse_date_time_test.cc index 4b12b4ac8..366be6fe3 100644 --- a/zetasql/public/functions/parse_date_time_test.cc +++ b/zetasql/public/functions/parse_date_time_test.cc @@ -41,6 +41,7 @@ #include "gmock/gmock.h" #include "gtest/gtest.h" #include "absl/strings/str_cat.h" +#include "absl/strings/string_view.h" #include "absl/strings/substitute.h" #include "absl/time/time.h" #include "zetasql/base/map_util.h" @@ -61,10 +62,10 @@ const FormatDateTimestampOptions kExpandQandJ = // the canonical form '2015-01-31 12:34:56.999999+0'). If <expected_result> // is empty then an error is expected and we validate that parsing the string // fails. This is a helper for TestParseSecondsSinceEpoch(). -static void TestParseStringToTimestamp(const std::string& format, - const std::string& timestamp_string, - const std::string& default_time_zone, - const std::string& expected_result) { +static void TestParseStringToTimestamp(absl::string_view format, + absl::string_view timestamp_string, + absl::string_view default_time_zone, + absl::string_view expected_result) { int64_t timestamp = std::numeric_limits<int64_t>::max(); if (expected_result.empty()) { // An error is expected.
@@ -302,7 +303,7 @@ static void TestParseTime(const FunctionTestCall& testcase) { }; }; auto ParseTimeResultValidator = [](const Value& expected_result, - const std::string& actual_string) { + absl::string_view actual_string) { return expected_result.type_kind() == TYPE_TIME && expected_result.DebugString() == actual_string; }; @@ -335,7 +336,7 @@ static void TestParseDatetime(const FunctionTestCall& testcase) { }; }; auto ParseDatetimeResultValidator = [](const Value& expected_result, - const std::string& actual_string) { + absl::string_view actual_string) { return expected_result.type_kind() == TYPE_DATETIME && expected_result.DebugString() == actual_string; }; @@ -1041,9 +1042,11 @@ struct ParseDatetimeTest { ParseDatetimeTest(const std::string& format_in, const std::string& input_string_in, const absl::CivilDay result_date_in, - const std::string& expected_dow_in) - : format(format_in), input_string(input_string_in), - result_date(result_date_in), expected_dow(expected_dow_in) {} + absl::string_view expected_dow_in) + : format(format_in), + input_string(input_string_in), + result_date(result_date_in), + expected_dow(expected_dow_in) {} // Constructor for tests that only apply to DATETIME and TIMESTAMP types // (when including format elements that specify hour/minute/second/etc.). @@ -1058,10 +1061,12 @@ struct ParseDatetimeTest { // hour/minute/second elements). 
ParseDatetimeTest(const std::string& format_in, const std::string& input_string_in, - const std::string& error_substr_in, + absl::string_view error_substr_in, bool excludes_date_in = false) - : format(format_in), input_string(input_string_in), - error_substr(error_substr_in), excludes_date(excludes_date_in) {} + : format(format_in), + input_string(input_string_in), + error_substr(error_substr_in), + excludes_date(excludes_date_in) {} std::string DebugString() const { return absl::StrCat( @@ -1096,7 +1101,7 @@ void CheckParseTimestampResultImpl(const ParseDatetimeTest& test, const absl::Status& parse_status, std::optional parsed_date, std::optional parsed_datetime, - const std::string& test_type) { + absl::string_view test_type) { if (test.result_date.has_value()) { ZETASQL_EXPECT_OK(parse_status) << " test: " << test.DebugString(); @@ -1175,7 +1180,7 @@ void CheckParseTimestampResult(const ParseDatetimeTest& test, void CheckParseTimestampResult(const ParseDatetimeTest& test, const absl::Status& parse_status, std::optional parsed_datetime, - const std::string& test_type) { + absl::string_view test_type) { CheckParseTimestampResultImpl(test, parse_status, std::nullopt, parsed_datetime, test_type); } diff --git a/zetasql/public/functions/range.cc b/zetasql/public/functions/range.cc index 3f54769d8..f64f680f2 100644 --- a/zetasql/public/functions/range.cc +++ b/zetasql/public/functions/range.cc @@ -16,6 +16,8 @@ #include "zetasql/public/functions/range.h" +#include +#include #include #include #include @@ -23,6 +25,8 @@ #include "zetasql/common/errors.h" #include "zetasql/public/functions/date_time_util.h" #include "zetasql/public/interval_value.h" +#include "zetasql/public/types/range_type.h" +#include "zetasql/public/value.h" #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/match.h" @@ -48,6 +52,53 @@ std::optional UnboundedOrValue( return {boundary_value}; } +static Value RangeBoundaryAsDate(std::optional boundary) { + return 
boundary ? Value::Date(*boundary) : Value::NullDate(); +} + +static Value RangeBoundaryAsDatetimeFromPacked64Micros( + std::optional<int64_t> boundary) { + return boundary.has_value() ? Value::DatetimeFromPacked64Micros(*boundary) + : Value::NullDatetime(); +} + +static Value RangeBoundaryAsTimestampFromUnixMicros( + std::optional<int64_t> boundary) { + return boundary.has_value() ? Value::TimestampFromUnixMicros(*boundary) + : Value::NullTimestamp(); +} + +// Creates a Value object of type 'range_type' from given 'boundaries'. +static absl::StatusOr<Value> MakeRange(const RangeType* range_type, + RangeBoundaries<int32_t> boundaries) { + if (!(range_type->element_type()->kind() == TypeKind::TYPE_DATE)) { + return MakeEvalError() << "MakeRange is not implemented for " + << range_type->element_type()->kind(); + } + return Value::MakeRange(RangeBoundaryAsDate(boundaries.start), + RangeBoundaryAsDate(boundaries.end)); +} + +// Creates a Value object of type 'range_type' from given 'boundaries'. +absl::StatusOr<Value> MakeRange(const RangeType* range_type, + RangeBoundaries<int64_t> boundaries) { + switch (range_type->element_type()->kind()) { + case TypeKind::TYPE_DATETIME: { + return Value::MakeRange( + RangeBoundaryAsDatetimeFromPacked64Micros(boundaries.start), + RangeBoundaryAsDatetimeFromPacked64Micros(boundaries.end)); + } + case TypeKind::TYPE_TIMESTAMP: { + return Value::MakeRange( + RangeBoundaryAsTimestampFromUnixMicros(boundaries.start), + RangeBoundaryAsTimestampFromUnixMicros(boundaries.end)); + } + default: + return MakeEvalError() << "MakeRange is not implemented for " + << range_type->element_type()->kind(); + } +} + } // namespace absl::StatusOr ParseRangeBoundaries( @@ -105,6 +156,31 @@ absl::StatusOr ParseRangeBoundaries( .end = UnboundedOrValue(end)}; } +absl::StatusOr<Value> DeserializeRangeValueFromBytes( + const RangeType* range_type, absl::string_view bytes, size_t* bytes_read) { + switch (range_type->element_type()->kind()) { + case TypeKind::TYPE_DATE: { + ZETASQL_ASSIGN_OR_RETURN(RangeBoundaries<int32_t> boundaries, + DeserializeRangeFromBytes<int32_t>(bytes, bytes_read)); + return MakeRange(range_type, boundaries); + } + case TypeKind::TYPE_DATETIME: { + ZETASQL_ASSIGN_OR_RETURN(RangeBoundaries<int64_t> boundaries, + DeserializeRangeFromBytes<int64_t>(bytes, bytes_read)); + return MakeRange(range_type, boundaries); + } + case TypeKind::TYPE_TIMESTAMP: { + ZETASQL_ASSIGN_OR_RETURN(RangeBoundaries<int64_t> boundaries, + DeserializeRangeFromBytes<int64_t>(bytes, bytes_read)); + return MakeRange(range_type, boundaries); + } + default: + return MakeEvalError() + << "DeserializeRangeFromBytes is not implemented for " + << range_type->element_type()->kind(); + } +} + namespace functions { absl::StatusOr diff --git a/zetasql/public/functions/range.h b/zetasql/public/functions/range.h index ded06a1e6..0d1977bdf 100644 --- a/zetasql/public/functions/range.h +++ b/zetasql/public/functions/range.h @@ -28,6 +28,10 @@ #include "zetasql/public/civil_time.h" #include "zetasql/public/functions/date_time_util.h" #include "zetasql/public/interval_value.h" +#include "zetasql/public/type.pb.h" +#include "zetasql/public/types/range_type.h" +#include "zetasql/public/types/type.h" +#include "zetasql/public/value.h" #include "absl/base/optimization.h" #include "absl/status/status.h" #include "absl/status/statusor.h" @@ -322,6 +326,13 @@ template <typename T> void SerializeRangeAndAppendToBytes(const RangeBoundaries<T>& range_boundaries, std::string* bytes); +template <typename T> +std::string SerializeRange(const RangeBoundaries<T>& range_boundaries) { + std::string bytes; + SerializeRangeAndAppendToBytes(range_boundaries, &bytes); + return bytes; +} + // Deserializes RANGE from bytes and returns range boundaries.
// The number of bytes would be written to "bytes_read" if it is not null // Size of "bytes" can be longer than the size of the encoded range value @@ -416,6 +427,10 @@ absl::StatusOr<RangeBoundaries<T>> DeserializeRangeFromBytes( return RangeBoundaries<T>{start, end}; } +absl::StatusOr<Value> DeserializeRangeValueFromBytes( + const RangeType* range_type, absl::string_view bytes, + size_t* bytes_read = nullptr); + template <typename T> absl::StatusOr<size_t> GetEncodedRangeSize(absl::string_view bytes) { if (ABSL_PREDICT_FALSE(bytes.size() < sizeof(uint8_t))) { diff --git a/zetasql/public/functions/range_test.cc b/zetasql/public/functions/range_test.cc index 507320ac0..784e9ad0e 100644 --- a/zetasql/public/functions/range_test.cc +++ b/zetasql/public/functions/range_test.cc @@ -18,6 +18,7 @@ #include #include +#include #include #include #include @@ -26,6 +27,7 @@ #include "zetasql/base/testing/status_matchers.h" #include "zetasql/compliance/functions_testlib.h" +#include "zetasql/public/civil_time.h" #include "zetasql/public/functions/date_time_util.h" #include "zetasql/public/interval_value.h" #include "zetasql/public/options.pb.h" @@ -212,8 +214,7 @@ void TestDeserializeRangeTooFewBytes( if (test_case.serialized_size == 1) return; RangeBoundaries range_boundaries = {test_case.start, test_case.end}; - std::string serialized_range; - SerializeRangeAndAppendToBytes(range_boundaries, &serialized_range); + std::string serialized_range = SerializeRange(range_boundaries); std::string serialized_range_header = serialized_range.substr(0, 1); EXPECT_THAT(DeserializeRangeFromBytes(serialized_range_header), zetasql_base::testing::StatusIs( @@ -679,6 +680,156 @@ TEST(DatetimeRangeArrayGeneratorGenerateTest, EmitterReturnsError) { EXPECT_EQ(num_emitter_calls, 1); } +template <typename T> +void TestRangeCreation(RangeBoundaries<T> range_boundaries, + size_t serialized_size, const RangeType* range_type, + std::function<Value(std::optional<T>)> boundary_ctor) { + std::string buffer = SerializeRange(range_boundaries); + size_t bytes_read; + 
ZETASQL_ASSERT_OK_AND_ASSIGN(Value range, DeserializeRangeValueFromBytes( + range_type, buffer, &bytes_read)); + ASSERT_EQ(bytes_read, serialized_size); + Value start = boundary_ctor(range_boundaries.start); + Value end = boundary_ctor(range_boundaries.end); + EXPECT_EQ(start, range.start()); + EXPECT_EQ(end, range.end()); +} + +template <typename T> +struct RangeCreationTestCase { + RangeBoundaries<T> range_boundaries; + size_t serialized_size; +}; + +std::vector<RangeCreationTestCase<int32_t>> GetDateRangeCreationTestCases() { + int32_t start = 1; + int32_t end = 2; + return { + { + .range_boundaries = RangeBoundaries<int32_t>{{start}, {end}}, + .serialized_size = 9, + }, + { + .range_boundaries = RangeBoundaries<int32_t>{{start}, {}}, + .serialized_size = 5, + }, + { + .range_boundaries = RangeBoundaries<int32_t>{{}, {end}}, + .serialized_size = 5, + }, + { + .range_boundaries = RangeBoundaries<int32_t>{{}, {}}, + .serialized_size = 1, + }, + }; +} + +class DateRangeCreationTest + : public ::testing::TestWithParam<RangeCreationTestCase<int32_t>> {}; + +INSTANTIATE_TEST_SUITE_P(DateRangeCreationTest, DateRangeCreationTest, + ::testing::ValuesIn(GetDateRangeCreationTestCases())); + +TEST_P(DateRangeCreationTest, SerializeDeserializeSucceeds) { + TestRangeCreation<int32_t>(GetParam().range_boundaries, + GetParam().serialized_size, types::DateRangeType(), + [](std::optional<int32_t> v) { + if (v.has_value()) { + return zetasql::values::Date(v.value()); + } else { + return zetasql::values::NullDate(); + } + }); +} + +std::vector<RangeCreationTestCase<int64_t>> +GetDatetimeRangeCreationTestCases() { + int64_t start = DatetimeValue::FromYMDHMSAndNanos(1, 2, 3, 4, 5, 6, 7) + .Packed64DatetimeMicros(); + int64_t end = DatetimeValue::FromYMDHMSAndNanos(2, 2, 3, 4, 5, 6, 7) + .Packed64DatetimeMicros(); + return { + { + .range_boundaries = RangeBoundaries<int64_t>{{start}, {end}}, + .serialized_size = 17, + }, + { + .range_boundaries = RangeBoundaries<int64_t>{{start}, {}}, + .serialized_size = 9, + }, + { + .range_boundaries = RangeBoundaries<int64_t>{{}, {end}}, + .serialized_size = 9, + }, + { + .range_boundaries = RangeBoundaries<int64_t>{{}, {}}, + .serialized_size = 1, + }, + }; +} + +class DatetimeRangeCreationTest + : public ::testing::TestWithParam<RangeCreationTestCase<int64_t>> {}; + +INSTANTIATE_TEST_SUITE_P( + DatetimeRangeCreationTest, DatetimeRangeCreationTest, + ::testing::ValuesIn(GetDatetimeRangeCreationTestCases())); + +TEST_P(DatetimeRangeCreationTest, SerializeDeserializeSucceeds) { + TestRangeCreation<int64_t>( + GetParam().range_boundaries, GetParam().serialized_size, + types::DatetimeRangeType(), [](std::optional<int64_t> v) { + if (v.has_value()) { + return Value::DatetimeFromPacked64Micros(v.value()); + } else { + return zetasql::values::NullDatetime(); + } + }); +} + +std::vector<RangeCreationTestCase<int64_t>> +GetTimestampRangeCreationTestCases() { + int64_t start = 1; + int64_t end = 2; + return { + { + .range_boundaries = RangeBoundaries<int64_t>{{start}, {end}}, + .serialized_size = 17, + }, + { + .range_boundaries = RangeBoundaries<int64_t>{{start}, {}}, + .serialized_size = 9, + }, + { + .range_boundaries = RangeBoundaries<int64_t>{{}, {end}}, + .serialized_size = 9, + }, + { + .range_boundaries = RangeBoundaries<int64_t>{{}, {}}, + .serialized_size = 1, + }, + }; +} + +class TimestampRangeCreationTest + : public ::testing::TestWithParam<RangeCreationTestCase<int64_t>> {}; + +INSTANTIATE_TEST_SUITE_P( + TimestampRangeCreationTest, TimestampRangeCreationTest, + ::testing::ValuesIn(GetTimestampRangeCreationTestCases())); + +TEST_P(TimestampRangeCreationTest, SerializeDeserializeSucceeds) { + TestRangeCreation<int64_t>( + GetParam().range_boundaries, GetParam().serialized_size, + types::TimestampRangeType(), [](std::optional<int64_t> v) { + if (v.has_value()) { + return Value::TimestampFromUnixMicros(v.value()); + } else { + return zetasql::values::NullTimestamp(); + } + }); +} + } // namespace } // namespace functions } // namespace zetasql diff --git a/zetasql/public/functions/regexp.cc b/zetasql/public/functions/regexp.cc index a5e7c1553..9e895af69 100644 --- a/zetasql/public/functions/regexp.cc +++ b/zetasql/public/functions/regexp.cc @@ -37,6 +37,7 @@ #include "absl/strings/str_cat.h" #include "absl/strings/substitute.h" #include 
"absl/types/optional.h" +#include "absl/types/span.h" #include "unicode/utf8.h" #include "zetasql/base/status.h" #include "zetasql/base/status_macros.h" @@ -371,7 +372,7 @@ bool RegExp::Replace(absl::string_view str, absl::string_view newsub, } bool RegExp::Rewrite(absl::string_view rewrite, - const std::vector& groups, + absl::Span groups, int32_t max_out_size, std::string* out, absl::Status* error) const { for (const char* s = rewrite.data(); s < rewrite.end(); ++s) { diff --git a/zetasql/public/functions/regexp.h b/zetasql/public/functions/regexp.h index 701b7af41..a24f68fd8 100644 --- a/zetasql/public/functions/regexp.h +++ b/zetasql/public/functions/regexp.h @@ -27,6 +27,7 @@ #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/string_view.h" +#include "absl/types/span.h" #include "re2/re2.h" namespace zetasql { @@ -254,9 +255,8 @@ class RegExp { // of logging it, and (2) enforces output string limit set by // SetMaxOutSize(). bool Rewrite(absl::string_view rewrite, - const std::vector& groups, - int32_t max_out_size, std::string* out, - absl::Status* error) const; + absl::Span groups, int32_t max_out_size, + std::string* out, absl::Status* error) const; // The compiled RE2 object. It is NULL if this has not been initialized yet. std::unique_ptr re_; diff --git a/zetasql/public/functions/string_format.cc b/zetasql/public/functions/string_format.cc index 952774b59..9b8bd66e2 100644 --- a/zetasql/public/functions/string_format.cc +++ b/zetasql/public/functions/string_format.cc @@ -45,6 +45,7 @@ #include "absl/strings/str_format.h" #include "absl/strings/string_view.h" #include "absl/time/time.h" +#include "absl/types/span.h" #include "unicode/utf8.h" #include "re2/re2.h" #include "zetasql/base/ret_check.h" @@ -155,7 +156,8 @@ bool StringFormatEvaluator::ValueAsString(const Value& value, // This adds .0 for integral values, which we want, but adds // CASTs on NaNs, etc, which we don't want. 
// We always use external typing for some reason. - cord_buffer_.Append(value.GetSQLLiteral(ProductMode::PRODUCT_EXTERNAL)); + cord_buffer_.Append(value.GetSQLLiteral(ProductMode::PRODUCT_EXTERNAL, + use_external_float32_)); } break; } @@ -193,8 +195,10 @@ bool StringFormatEvaluator::ValueAsString(const Value& value, } StringFormatEvaluator::StringFormatEvaluator(ProductMode product_mode, - bool canonicalize_zero) + bool canonicalize_zero, + bool use_external_float32) : product_mode_(product_mode), + use_external_float32_(use_external_float32), type_resolver_(nullptr), status_(), canonicalize_zero_(canonicalize_zero) {} @@ -292,8 +296,8 @@ absl::Status StringFormatEvaluator::Format(absl::Span values, } absl::Status StringFormatEvaluator::FormatString( - const std::vector& raw_parts, - const std::vector& format_parts, absl::Cord* out, + absl::Span raw_parts, + absl::Span format_parts, absl::Cord* out, bool* set_null) { ZETASQL_DCHECK_OK(status_); ABSL_DCHECK_GE(raw_parts.size(), format_parts.size()); @@ -414,7 +418,8 @@ bool StringFormatEvaluator::ValueLiteralSetter(const FormatPart& part, return false; } } - string_buffer_ = value_var->GetSQLLiteral(ProductMode::PRODUCT_EXTERNAL); + string_buffer_ = value_var->GetSQLLiteral(ProductMode::PRODUCT_EXTERNAL, + use_external_float32_); fmt_string_.view = string_buffer_; // Check for invalid UTF8. 
// This is necessary because while engines are required to validate utf-8, @@ -1673,7 +1678,8 @@ AbslFormatConvert(const FormatGsqlNumeric& value, absl::Status StringFormatUtf8(absl::string_view format_string, absl::Span values, ProductMode product_mode, std::string* output, - bool* is_null, bool canonicalize_zero) { + bool* is_null, bool canonicalize_zero, + bool use_external_float32) { bool maybe_need_proto_factory = false; std::vector types; for (const Value& value : values) { @@ -1694,8 +1700,8 @@ absl::Status StringFormatUtf8(absl::string_view format_string, if (maybe_need_proto_factory) { factory = std::make_unique(); } - string_format_internal::StringFormatEvaluator evaluator(product_mode, - canonicalize_zero); + string_format_internal::StringFormatEvaluator evaluator( + product_mode, canonicalize_zero, use_external_float32); ZETASQL_RETURN_IF_ERROR(evaluator.SetTypes(std::move(types), factory.get())); ZETASQL_RETURN_IF_ERROR(evaluator.SetPattern(format_string)); diff --git a/zetasql/public/functions/string_format.h b/zetasql/public/functions/string_format.h index c3c907385..6a3386665 100644 --- a/zetasql/public/functions/string_format.h +++ b/zetasql/public/functions/string_format.h @@ -119,7 +119,8 @@ struct FormatGsqlNumeric { class StringFormatEvaluator { public: explicit StringFormatEvaluator(ProductMode product_mode, - bool canonicalize_zero = false); + bool canonicalize_zero = false, + bool use_external_float32 = false); StringFormatEvaluator(const StringFormatEvaluator&) = delete; StringFormatEvaluator& operator=(const StringFormatEvaluator&) = delete; @@ -141,6 +142,7 @@ class StringFormatEvaluator { private: const ProductMode product_mode_; + const bool use_external_float32_; google::protobuf::DynamicMessageFactory* type_resolver_ = nullptr; std::string pattern_; @@ -200,8 +202,8 @@ class StringFormatEvaluator { int64_t provided_arg_count() { return arg_types_.size(); } // Bulk of the work. 
- absl::Status FormatString(const std::vector& raw_parts, - const std::vector& format_parts, + absl::Status FormatString(absl::Span raw_parts, + absl::Span format_parts, absl::Cord* out, bool* set_null); absl::Status TypeError(int64_t index, absl::string_view expected, @@ -325,10 +327,15 @@ class StringFormatEvaluator { // Shorthand for doing FORMAT in one call. `format_string` and STRING type // values are checked for invalid UTF-8 sequences and will result in an error. +// Setting the optional parameter `use_external_float32` to true will return +// FLOAT32 as the type name for TYPE_FLOAT. +// TODO: Remove `use_external_float32` once all engines are +// updated. absl::Status StringFormatUtf8(absl::string_view format_string, absl::Span values, ProductMode product_mode, std::string* output, - bool* is_null, bool canonicalize_zero = false); + bool* is_null, bool canonicalize_zero = false, + bool use_external_float32 = false); absl::Status CheckStringFormatUtf8ArgumentTypes(absl::string_view format_string, std::vector types, diff --git a/zetasql/public/functions/string_format_test.cc b/zetasql/public/functions/string_format_test.cc index 2ed9eb73f..394138d9d 100644 --- a/zetasql/public/functions/string_format_test.cc +++ b/zetasql/public/functions/string_format_test.cc @@ -28,6 +28,7 @@ #include "gmock/gmock.h" #include "gtest/gtest.h" #include "absl/strings/string_view.h" +#include "absl/types/span.h" namespace zetasql { namespace functions { @@ -35,7 +36,7 @@ namespace functions { using ::testing::HasSubstr; using ::zetasql_base::testing::StatusIs; -void TestBadPattern(absl::string_view pattern, const std::vector& values, +void TestBadPattern(absl::string_view pattern, absl::Span values, bool canonicalize_zero) { std::string output; bool is_null; @@ -148,19 +149,6 @@ TEST_P(StringFormatTest, TestBadUtf8Values) { Value::Array(array_of_struct_type, {bad_struct_value}); TestBadValue("%t", bad_array_of_struct_value, canonicalize_zero); TestBadValue("%T", 
bad_array_of_struct_value, canonicalize_zero); - - zetasql_test__::KitchenSinkPB proto; - proto.set_string_val("abc\xc1xyz"); - - const ProtoType* proto_type; - ZETASQL_ASSERT_OK(type_factory.MakeProtoType(proto.GetDescriptor(), &proto_type)); - absl::Cord bytes; - ABSL_CHECK(proto.SerializePartialToCord(&bytes)); - const Value bad_proto_value = Value::Proto(proto_type, bytes); - TestBadValue("%p", bad_proto_value, canonicalize_zero); - TestBadValue("%P", bad_proto_value, canonicalize_zero); - TestBadValue("%t", bad_proto_value, canonicalize_zero); - TestBadValue("%T", bad_proto_value, canonicalize_zero); } TEST_P(StringFormatTest, TestBadJsonValue) { diff --git a/zetasql/public/id_string.h b/zetasql/public/id_string.h index ccbd0eb3e..efa0d0589 100644 --- a/zetasql/public/id_string.h +++ b/zetasql/public/id_string.h @@ -47,6 +47,7 @@ #include "absl/strings/str_cat.h" #include "absl/strings/string_view.h" #include "absl/synchronization/mutex.h" +#include "absl/types/span.h" #include "zetasql/base/endian.h" namespace zetasql { @@ -374,8 +375,8 @@ struct IdStringEqualFunc { }; template -bool IdStringVectorHasPrefix(const std::vector& original_idstrings, - const std::vector& prefix_idstrings) { +bool IdStringVectorHasPrefix(absl::Span original_idstrings, + absl::Span prefix_idstrings) { if (prefix_idstrings.size() > original_idstrings.size()) { return false; } diff --git a/zetasql/public/id_string_test.cc b/zetasql/public/id_string_test.cc index 800224afa..dbb3f71f5 100644 --- a/zetasql/public/id_string_test.cc +++ b/zetasql/public/id_string_test.cc @@ -32,6 +32,8 @@ #include "absl/strings/ascii.h" #include "absl/strings/match.h" #include "absl/strings/str_join.h" +#include "absl/strings/string_view.h" +#include "absl/types/span.h" #include "zetasql/base/map_util.h" namespace zetasql { diff --git a/zetasql/public/input_argument_type.cc b/zetasql/public/input_argument_type.cc index e0d1ef8a6..5fe453bb9 100644 --- a/zetasql/public/input_argument_type.cc +++ 
b/zetasql/public/input_argument_type.cc @@ -27,6 +27,7 @@ #include "absl/memory/memory.h" #include "absl/strings/str_cat.h" #include "absl/strings/str_join.h" +#include "absl/types/span.h" #include "zetasql/base/map_util.h" namespace zetasql { @@ -183,7 +184,7 @@ std::string InputArgumentType::DebugString(bool verbose) const { // static std::string InputArgumentType::ArgumentsToString( - const std::vector& arguments, ProductMode product_mode, + absl::Span arguments, ProductMode product_mode, absl::Span argument_names) { constexpr int kMaxArgumentsStringLength = 1024; std::string arguments_string; diff --git a/zetasql/public/input_argument_type.h b/zetasql/public/input_argument_type.h index 981670974..ad9ae7514 100644 --- a/zetasql/public/input_argument_type.h +++ b/zetasql/public/input_argument_type.h @@ -31,6 +31,7 @@ #include "zetasql/public/value.h" #include "absl/container/flat_hash_set.h" #include "absl/types/optional.h" +#include "absl/types/span.h" namespace zetasql { @@ -163,7 +164,7 @@ class InputArgumentType { // TODO: Separate ArgumentsToString into a debug string function, // which is what many callers want, and a function for generating diagnostics. 
static std::string ArgumentsToString( - const std::vector& arguments, + absl::Span arguments, ProductMode product_mode = PRODUCT_INTERNAL, absl::Span argument_names = {}); diff --git a/zetasql/public/input_argument_type_test.cc b/zetasql/public/input_argument_type_test.cc index 77367dc65..2f3ba948d 100644 --- a/zetasql/public/input_argument_type_test.cc +++ b/zetasql/public/input_argument_type_test.cc @@ -29,6 +29,7 @@ #include "gtest/gtest.h" #include "absl/strings/cord.h" #include "absl/strings/str_join.h" +#include "absl/types/span.h" namespace zetasql { namespace { @@ -36,7 +37,7 @@ namespace { using ::testing::Optional; static std::string ArgumentDebugStrings( - const std::vector& arguments) { + absl::Span arguments) { std::vector argument_strings; for (const InputArgumentType& argument : arguments) { argument_strings.push_back(argument.DebugString(true /* verbose */)); diff --git a/zetasql/public/interval_value.cc b/zetasql/public/interval_value.cc index b6b2ec69e..8434e3f8a 100644 --- a/zetasql/public/interval_value.cc +++ b/zetasql/public/interval_value.cc @@ -593,6 +593,11 @@ absl::StatusOr IntervalValue::ParseFromString( return status; } return IntervalValue::FromMicros(value); + case functions::MILLISECOND: + ZETASQL_RETURN_IF_ERROR(internal::Multiply(kMicrosInMilli, value, &value)); + return IntervalValue::FromMicros(value); + case functions::MICROSECOND: + return IntervalValue::FromMicros(value); default: return ::zetasql_base::OutOfRangeErrorBuilder() << "Unsupported interval datetime field " @@ -1122,7 +1127,7 @@ absl::StatusOr IntervalValue::Parse(absl::string_view input, } absl::StatusOr IntervalValue::FromInteger( - int64_t value, functions::DateTimestampPart part) { + int64_t value, functions::DateTimestampPart part, bool allow_nanos) { switch (part) { case functions::YEAR: return IntervalValue::FromYMDHMS(value, 0, 0, 0, 0, 0); @@ -1136,6 +1141,20 @@ absl::StatusOr IntervalValue::FromInteger( return IntervalValue::FromYMDHMS(0, 0, 0, 0, value, 
0); case functions::SECOND: return IntervalValue::FromYMDHMS(0, 0, 0, 0, 0, value); + case functions::MILLISECOND: + if (absl::Status status = + internal::Multiply(kMicrosInMilli, value, &value); + !status.ok()) { + return status; + } + return IntervalValue::FromMicros(value); + case functions::MICROSECOND: + return IntervalValue::FromMicros(value); + case functions::NANOSECOND: + if (!allow_nanos) { + break; + } + return IntervalValue::FromNanos(value); case functions::QUARTER: { if (absl::Status status = internal::Multiply( IntervalValue::kMonthsInQuarter, value, &value); @@ -1153,10 +1172,11 @@ absl::StatusOr<IntervalValue> IntervalValue::FromInteger( return IntervalValue::FromYMDHMS(0, 0, value, 0, 0, 0); } default: - return ::zetasql_base::OutOfRangeErrorBuilder() - << "Invalid interval datetime field " - << functions::DateTimestampPart_Name(part); + break; } + return ::zetasql_base::OutOfRangeErrorBuilder() + << "Invalid interval datetime field " + << functions::DateTimestampPart_Name(part); } absl::StatusOr IntervalValue::Extract( @@ -1280,4 +1300,10 @@ absl::StatusOr<IntervalValue> JustifyInterval(const IntervalValue& v) { return IntervalValue::FromMonthsDaysNanos(months, days, nanos); } +bool IdenticalIntervals(const IntervalValue& v1, const IntervalValue& v2) { + return v1.get_months() == v2.get_months() && v1.get_days() == v2.get_days() && + v1.get_micros() == v2.get_micros() && + v1.get_nano_fractions() == v2.get_nano_fractions(); + } + } // namespace zetasql diff --git a/zetasql/public/interval_value.h b/zetasql/public/interval_value.h index 927d93087..54a753a9a 100644 --- a/zetasql/public/interval_value.h +++ b/zetasql/public/interval_value.h @@ -423,8 +423,9 @@ class IntervalValue final { } // Interval constructor from integer for given datetime part field. + // If allow_nanos=false, an error is returned when the NANOSECOND part is provided.
static absl::StatusOr FromInteger( - int64_t value, functions::DateTimestampPart part); + int64_t value, functions::DateTimestampPart part, bool allow_nanos); private: IntervalValue(int64_t months, int64_t days, int64_t micros = 0) { @@ -504,10 +505,17 @@ absl::StatusOr JustifyHours(const IntervalValue& v); // have the same sign. absl::StatusOr JustifyDays(const IntervalValue& v); -// Normalizes 24 hour time periods into full days, and after thatn 30 day time +// Normalizes 24 hour time periods into full days, and after than 30 day time // periods into full months. Adjusts all date parts to have the same sign. absl::StatusOr JustifyInterval(const IntervalValue& v); +// Checks if the two INTERVAL values are identical. +// This is different from the behavior of the equality operator, which treats +// some different INTERVAL values as equal (e.g. +// INTERVAL 1 MONTH == INTERVAL 30 DAY). This functions treats INTERVAL values +// as identical only when all their parts are equal. +bool IdenticalIntervals(const IntervalValue& v1, const IntervalValue& v2); + } // namespace zetasql #endif // ZETASQL_PUBLIC_INTERVAL_VALUE_H_ diff --git a/zetasql/public/interval_value_test.cc b/zetasql/public/interval_value_test.cc index 5be862547..3f35ca049 100644 --- a/zetasql/public/interval_value_test.cc +++ b/zetasql/public/interval_value_test.cc @@ -30,6 +30,7 @@ #include "absl/random/distributions.h" #include "absl/random/random.h" #include "absl/status/status.h" +#include "absl/strings/string_view.h" namespace zetasql { namespace { @@ -386,6 +387,7 @@ void ExpectLess(IntervalValue v1, IntervalValue v2) { EXPECT_NE(v2, v1); EXPECT_FALSE(v1 == v2); EXPECT_FALSE(v2 == v1); + EXPECT_FALSE(IdenticalIntervals(v1, v2)); } TEST(IntervalValueTest, Comparisons) { @@ -449,6 +451,90 @@ TEST(IntervalValueTest, Comparisons) { ExpectLess(MonthsDaysNanos(5, 15, 9999999), MonthsDaysNanos(5, 15, 10000000)); } +TEST(IntervalValueTest, IdenticalIntervals) { + // Zeros are identical. 
+ EXPECT_TRUE(IdenticalIntervals(Micros(0), Nanos(0))); + EXPECT_TRUE(IdenticalIntervals(Seconds(0), Micros(0))); + EXPECT_TRUE(IdenticalIntervals(Minutes(0), Seconds(0))); + EXPECT_TRUE(IdenticalIntervals(Hours(0), Minutes(0))); + EXPECT_TRUE(IdenticalIntervals(Days(0), Hours(0))); + EXPECT_TRUE(IdenticalIntervals(Months(0), Days(0))); + EXPECT_TRUE(IdenticalIntervals(Years(0), Months(0))); + + // Exact parts + EXPECT_TRUE(IdenticalIntervals(Nanos(12345), Nanos(12345))); + EXPECT_TRUE(IdenticalIntervals(Nanos(-12345), Nanos(-12345))); + EXPECT_TRUE(IdenticalIntervals(Micros(12345), Micros(12345))); + EXPECT_TRUE(IdenticalIntervals(Micros(-12345), Micros(-12345))); + EXPECT_TRUE(IdenticalIntervals(Seconds(12345), Seconds(12345))); + EXPECT_TRUE(IdenticalIntervals(Seconds(-12345), Seconds(-12345))); + EXPECT_TRUE(IdenticalIntervals(Minutes(12345), Minutes(12345))); + EXPECT_TRUE(IdenticalIntervals(Minutes(-12345), Minutes(-12345))); + EXPECT_TRUE(IdenticalIntervals(Hours(12345), Hours(12345))); + EXPECT_TRUE(IdenticalIntervals(Hours(-12345), Hours(-12345))); + EXPECT_TRUE(IdenticalIntervals(Days(12345), Days(12345))); + EXPECT_TRUE(IdenticalIntervals(Days(-12345), Days(-12345))); + EXPECT_TRUE(IdenticalIntervals(Months(12345), Months(12345))); + EXPECT_TRUE(IdenticalIntervals(Months(-12345), Months(-12345))); + EXPECT_TRUE(IdenticalIntervals(Years(1234), Years(1234))); + EXPECT_TRUE(IdenticalIntervals(Years(-1234), Years(-1234))); + + // Identical within micros part + EXPECT_TRUE( + IdenticalIntervals(Micros(-1), Nanos(-1 * IntervalValue::kNanosInMicro))); + EXPECT_TRUE( + IdenticalIntervals(Micros(3), Nanos(3 * IntervalValue::kNanosInMicro))); + EXPECT_TRUE(IdenticalIntervals(Seconds(-5), + Micros(-5 * IntervalValue::kMicrosInSecond))); + EXPECT_TRUE(IdenticalIntervals(Seconds(7), + Micros(7 * IntervalValue::kMicrosInSecond))); + EXPECT_TRUE(IdenticalIntervals( + Minutes(-11), Seconds(-11 * IntervalValue::kSecondsInMinute))); + EXPECT_TRUE(IdenticalIntervals( + 
Minutes(13), Seconds(13 * IntervalValue::kSecondsInMinute))); + EXPECT_TRUE(IdenticalIntervals(Hours(-17), + Minutes(-17 * IntervalValue::kMinutesInHour))); + EXPECT_TRUE(IdenticalIntervals(Hours(-31), + Minutes(-31 * IntervalValue::kMinutesInHour))); + EXPECT_TRUE(IdenticalIntervals(Years(-37), + Months(-37 * IntervalValue::kMonthsInYear))); + EXPECT_TRUE( + IdenticalIntervals(Years(41), Months(41 * IntervalValue::kMonthsInYear))); + // Not identical when mixing micros and days parts + EXPECT_FALSE( + IdenticalIntervals(Days(-43), Hours(-43 * IntervalValue::kHoursInDay))); + EXPECT_FALSE( + IdenticalIntervals(Days(47), Hours(47 * IntervalValue::kHoursInDay))); + // Not identical when mixing days and months parts + EXPECT_FALSE( + IdenticalIntervals(Months(-53), Days(-53 * IntervalValue::kDaysInMonth))); + EXPECT_FALSE( + IdenticalIntervals(Months(59), Days(59 * IntervalValue::kDaysInMonth))); + // Not identical when mixing micros and months parts + EXPECT_FALSE(IdenticalIntervals(Months(-61), + Micros(-61 * IntervalValue::kMicrosInMonth))); + EXPECT_FALSE(IdenticalIntervals(Months(67), + Micros(67 * IntervalValue::kMicrosInMonth))); + // Mixed parts + EXPECT_FALSE(IdenticalIntervals(MonthsDaysMicros(1, 1, 0), Days(31))); + EXPECT_FALSE(IdenticalIntervals(MonthsDaysMicros(1, -1, 0), Days(29))); + EXPECT_FALSE(IdenticalIntervals(MonthsDaysMicros(-1, 1, 0), Days(-29))); + EXPECT_FALSE(IdenticalIntervals(MonthsDaysMicros(-1, -1, 0), Days(-31))); + EXPECT_FALSE(IdenticalIntervals(MonthsDaysMicros(0, 1, 10), + Micros(IntervalValue::kMicrosInDay + 10))); + EXPECT_FALSE(IdenticalIntervals(MonthsDaysMicros(-1, 30, 1), Micros(1))); + EXPECT_FALSE(IdenticalIntervals(MonthsDaysMicros(2, -61, 0), Days(-1))); + EXPECT_FALSE(IdenticalIntervals(MonthsDaysMicros(-3, 92, -10), + MonthsDaysMicros(0, 2, -10))); + + EXPECT_TRUE(IdenticalIntervals(MonthsDaysMicros(1, 2, 3), + MonthsDaysNanos(1, 2, 3000))); + EXPECT_TRUE(IdenticalIntervals(MonthsDaysMicros(1, 2, -3), + 
MonthsDaysNanos(1, 2, -3000))); + EXPECT_FALSE(IdenticalIntervals(MonthsDaysMicros(10, -301, 9), + MonthsDaysNanos(0, -1, 9000))); +} + TEST(IntervalValueTest, UnaryMinus) { EXPECT_EQ(Days(0), -Days(0)); EXPECT_EQ(Nanos(IntervalValue::kMaxNanos), -Nanos(-IntervalValue::kMaxNanos)); @@ -1344,11 +1430,54 @@ TEST(IntervalValueTest, ParseFromString1) { ExpectParseError("9223372036854775808", SECOND, /*allow_nanos=*/false); ExpectParseError("-9223372036854775809", SECOND, /*allow_nanos=*/false); + EXPECT_EQ("0-0 0 0:0:0.123", + ParseToString("123", MILLISECOND, /*allow_nanos=*/false)); + EXPECT_EQ("0-0 0 -0:0:0.123", + ParseToString("-123", MILLISECOND, /*allow_nanos=*/false)); + EXPECT_EQ("0-0 0 5:7:36.123", + ParseToString("18456123", MILLISECOND, /*allow_nanos=*/false)); + EXPECT_EQ("0-0 0 -5:7:36.123", + ParseToString("-18456123", MILLISECOND, /*allow_nanos=*/false)); + EXPECT_EQ("0-0 0 87840000:0:0", ParseToString("316224000000000", MILLISECOND, + /*allow_nanos=*/false)); + EXPECT_EQ( + "0-0 0 -87840000:0:0", + ParseToString("-316224000000000", MILLISECOND, /*allow_nanos=*/false)); + + // exceeds max number of milliseconds + ExpectParseError("3162240000000001", MILLISECOND, /*allow_nanos=*/false); + ExpectParseError("-3162240000000001", MILLISECOND, /*allow_nanos=*/false); + // overflow fitting into int64_t at SimpleAtoi + ExpectParseError("9223372036854775808", MILLISECOND, /*allow_nanos=*/false); + ExpectParseError("-9223372036854775809", MILLISECOND, /*allow_nanos=*/false); + // Overflow in multiplication + ExpectParseError("9223372036854775807", MILLISECOND, /*allow_nanos=*/false); + ExpectParseError("-9223372036854775807", MILLISECOND, /*allow_nanos=*/false); + + EXPECT_EQ("0-0 0 0:0:0.123456", + ParseToString("123456", MICROSECOND, /*allow_nanos=*/false)); + EXPECT_EQ("0-0 0 -0:0:0.123456", + ParseToString("-123456", MICROSECOND, /*allow_nanos=*/false)); + EXPECT_EQ("0-0 0 5:7:36.123456", + ParseToString("18456123456", MICROSECOND, /*allow_nanos=*/false)); + 
EXPECT_EQ("0-0 0 -5:7:36.123456", + ParseToString("-18456123456", MICROSECOND, /*allow_nanos=*/false)); + EXPECT_EQ( + "0-0 0 87840000:0:0", + ParseToString("316224000000000000", MICROSECOND, /*allow_nanos=*/false)); + EXPECT_EQ( + "0-0 0 -87840000:0:0", + ParseToString("-316224000000000000", MICROSECOND, /*allow_nanos=*/false)); + // exceeds max number of microseconds + ExpectParseError("3162240000000000001", MICROSECOND, /*allow_nanos=*/false); + ExpectParseError("-3162240000000000001", MICROSECOND, /*allow_nanos=*/false); + // overflow fitting into int64_t at SimpleAtoi + ExpectParseError("9223372036854775808", MICROSECOND, /*allow_nanos=*/false); + ExpectParseError("-9223372036854775809", MICROSECOND, /*allow_nanos=*/false); + // Unsupported dateparts ExpectParseError("0", functions::DAYOFWEEK, /*allow_nanos=*/false); ExpectParseError("0", functions::DAYOFYEAR, /*allow_nanos=*/false); - ExpectParseError("0", functions::MILLISECOND, /*allow_nanos=*/false); - ExpectParseError("0", functions::MICROSECOND, /*allow_nanos=*/false); ExpectParseError("0", functions::NANOSECOND, /*allow_nanos=*/true); ExpectParseError("0", functions::DATE, /*allow_nanos=*/false); ExpectParseError("0", functions::DATETIME, /*allow_nanos=*/false); @@ -2363,7 +2492,7 @@ void ExpectFromISO8601(absl::string_view expected, absl::string_view input, } void ExpectFromISO8601Error(absl::string_view input, - const std::string& error_text, bool allow_nanos) { + absl::string_view error_text, bool allow_nanos) { EXPECT_THAT( IntervalValue::ParseFromISO8601(input, allow_nanos), StatusIs(absl::StatusCode::kOutOfRange, testing::HasSubstr(error_text))) @@ -2624,11 +2753,36 @@ TEST(IntervalValueTest, FixedBinaryRepresentation) { std::string FromIntegerToString(int64_t value, functions::DateTimestampPart part) { - return IntervalValue::FromInteger(value, part).value().ToString(); + // Verify that micros and nanos modes produce the same results. 
+ std::string micros_mode = + IntervalValue::FromInteger(value, part, /*allow_nanos=*/false) + .value() + .ToString(); + std::string nanos_mode = + IntervalValue::FromInteger(value, part, /*allow_nanos=*/true) + .value() + .ToString(); + EXPECT_EQ(micros_mode, nanos_mode); + return micros_mode; +} + +std::string FromIntegerToStringAllowNanos(int64_t value, + functions::DateTimestampPart part) { + return IntervalValue::FromInteger(value, part, /*allow_nanos=*/true) + .value() + .ToString(); } void ExpectFromIntegerError(int64_t value, functions::DateTimestampPart part) { - EXPECT_THAT(IntervalValue::FromInteger(value, part), + EXPECT_THAT(IntervalValue::FromInteger(value, part, /*allow_nanos=*/false), + StatusIs(absl::StatusCode::kOutOfRange)); + EXPECT_THAT(IntervalValue::FromInteger(value, part, /*allow_nanos=*/true), + StatusIs(absl::StatusCode::kOutOfRange)); +} + +void ExpectFromIntegerErrorDisallowNanos(int64_t value, + functions::DateTimestampPart part) { + EXPECT_THAT(IntervalValue::FromInteger(value, part, /*allow_nanos=*/false), StatusIs(absl::StatusCode::kOutOfRange)); } @@ -2650,11 +2804,40 @@ TEST(IntervalValueTest, FromInteger) { EXPECT_EQ("0-0 0 0:0:8", FromIntegerToString(8, SECOND)); EXPECT_EQ("0-0 0 -87840000:0:0", FromIntegerToString(-316224000000, SECOND)); + EXPECT_EQ("0-0 0 0:0:0.123", FromIntegerToString(123, MILLISECOND)); + EXPECT_EQ("0-0 0 -0:0:0.123", FromIntegerToString(-123, MILLISECOND)); + EXPECT_EQ("0-0 0 5:7:36.123", FromIntegerToString(18456123, MILLISECOND)); + EXPECT_EQ("0-0 0 -5:7:36.123", FromIntegerToString(-18456123, MILLISECOND)); + EXPECT_EQ("0-0 0 87840000:0:0", + FromIntegerToString(316224000000000, MILLISECOND)); + EXPECT_EQ("0-0 0 -87840000:0:0", + FromIntegerToString(-316224000000000, MILLISECOND)); + EXPECT_EQ("0-0 0 0:0:0.123456", FromIntegerToString(123456, MICROSECOND)); + EXPECT_EQ("0-0 0 -0:0:0.123456", FromIntegerToString(-123456, MICROSECOND)); + EXPECT_EQ("0-0 0 5:7:36.123456", + 
FromIntegerToString(18456123456, MICROSECOND)); + EXPECT_EQ("0-0 0 -5:7:36.123456", + FromIntegerToString(-18456123456, MICROSECOND)); + EXPECT_EQ("0-0 0 87840000:0:0", + FromIntegerToString(316224000000000000, MICROSECOND)); + EXPECT_EQ("0-0 0 -87840000:0:0", + FromIntegerToString(-316224000000000000, MICROSECOND)); + EXPECT_EQ("0-0 0 5:7:36.123456789", + FromIntegerToStringAllowNanos(18456123456789, NANOSECOND)); + EXPECT_EQ("0-0 0 -5:7:36.123456789", + FromIntegerToStringAllowNanos(-18456123456789, NANOSECOND)); + EXPECT_EQ("0-0 0 0:0:0", FromIntegerToString(0, YEAR)); EXPECT_EQ("0-0 0 0:0:0", FromIntegerToString(0, QUARTER)); EXPECT_EQ("0-0 0 0:0:0", FromIntegerToString(0, MONTH)); EXPECT_EQ("0-0 0 0:0:0", FromIntegerToString(0, WEEK)); EXPECT_EQ("0-0 0 0:0:0", FromIntegerToString(0, DAY)); + EXPECT_EQ("0-0 0 0:0:0", FromIntegerToString(0, HOUR)); + EXPECT_EQ("0-0 0 0:0:0", FromIntegerToString(0, MINUTE)); + EXPECT_EQ("0-0 0 0:0:0", FromIntegerToString(0, SECOND)); + EXPECT_EQ("0-0 0 0:0:0", FromIntegerToString(0, MILLISECOND)); + EXPECT_EQ("0-0 0 0:0:0", FromIntegerToString(0, MICROSECOND)); + EXPECT_EQ("0-0 0 0:0:0", FromIntegerToStringAllowNanos(0, NANOSECOND)); // Exceeds maximum allowed value ExpectFromIntegerError(10001, YEAR); @@ -2665,19 +2848,23 @@ TEST(IntervalValueTest, FromInteger) { ExpectFromIntegerError(87840001, HOUR); ExpectFromIntegerError(-5270400001, MINUTE); ExpectFromIntegerError(316224000001, SECOND); + ExpectFromIntegerError(3162240000000001, MILLISECOND); + ExpectFromIntegerError(-3162240000000001, MILLISECOND); + ExpectFromIntegerError(3162240000000000001, MICROSECOND); + ExpectFromIntegerError(-3162240000000000001, MICROSECOND); // Overflow in multiplication ExpectFromIntegerError(9223372036854775807, QUARTER); ExpectFromIntegerError(-9223372036854775807, QUARTER); ExpectFromIntegerError(9223372036854775807, WEEK); ExpectFromIntegerError(-9223372036854775807, WEEK); + ExpectFromIntegerError(9223372036854775807, MILLISECOND); + 
ExpectFromIntegerError(-9223372036854775807, MILLISECOND); // Invalid datetime part fields ExpectFromIntegerError(0, functions::DAYOFWEEK); ExpectFromIntegerError(0, functions::DAYOFYEAR); - ExpectFromIntegerError(0, functions::MILLISECOND); - ExpectFromIntegerError(0, functions::MICROSECOND); - ExpectFromIntegerError(0, functions::NANOSECOND); + ExpectFromIntegerErrorDisallowNanos(0, functions::NANOSECOND); ExpectFromIntegerError(0, functions::DATE); ExpectFromIntegerError(0, functions::DATETIME); ExpectFromIntegerError(0, functions::TIME); diff --git a/zetasql/public/interval_value_test_util.h b/zetasql/public/interval_value_test_util.h index ecef23de1..250977bb1 100644 --- a/zetasql/public/interval_value_test_util.h +++ b/zetasql/public/interval_value_test_util.h @@ -20,8 +20,10 @@ #include #include "zetasql/public/interval_value.h" +#include "gmock/gmock.h" #include "absl/random/distributions.h" #include "absl/random/random.h" +#include "absl/strings/str_format.h" namespace zetasql { @@ -101,6 +103,16 @@ constexpr absl::string_view kSerializedIntervalWithInvalidFields( "\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01", 16); static_assert(kSerializedIntervalWithInvalidFields.size() == 16); +// The following gmock matcher can be used to verify that two INTERVAL values +// are identical. This is different from the behavior of the default equality +// operator, which treats some different INTERVAL values as equal (e.g. +// INTERVAL 1 MONTH == INTERVAL 30 DAY). The matcher treats INTERVAL values as +// identical only when all their parts are equal. 
+MATCHER_P(IdenticalInterval, value, + absl::StrFormat("is identical to INTERVAL '%s'", value.ToString())) { + return IdenticalIntervals(arg, value); +} + } // namespace interval_testing } // namespace zetasql diff --git a/zetasql/public/json_value_test.cc b/zetasql/public/json_value_test.cc index c8b2ea46a..e7c15007a 100644 --- a/zetasql/public/json_value_test.cc +++ b/zetasql/public/json_value_test.cc @@ -38,6 +38,7 @@ #include "absl/status/statusor.h" #include "absl/strings/string_view.h" #include "absl/strings/substitute.h" +#include "absl/types/span.h" namespace { @@ -451,7 +452,7 @@ TEST(JSONValueTest, MoveFrom) { constexpr absl::string_view kInitialValue = R"({"a":{"b":{"c":1}}, "d":2, "e":[3, 4, [5, 6]]})"; using TokenValue = std::variant; - auto verify_func = [&](const std::vector& path_tokens, + auto verify_func = [&](absl::Span path_tokens, absl::string_view member_json_string, absl::string_view modified_original_json_string) { JSONValue original_value = diff --git a/zetasql/public/language_options.cc b/zetasql/public/language_options.cc index 4d6402944..3ed349d38 100644 --- a/zetasql/public/language_options.cc +++ b/zetasql/public/language_options.cc @@ -58,9 +58,19 @@ LanguageOptions::GetLanguageFeaturesForVersion(LanguageVersion version) { features.insert(FEATURE_V_1_4_NULLIFZERO_ZEROIFNULL); features.insert(FEATURE_V_1_4_ARRAY_FIND_FUNCTIONS); features.insert(FEATURE_V_1_4_PI_FUNCTIONS); - features.insert(FEATURE_V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE); - features.insert(FEATURE_V_1_4_CORRESPONDING_BY); + features.insert(FEATURE_V_1_4_CORRESPONDING_FULL); features.insert(FEATURE_V_1_4_GROUP_BY_ALL); + features.insert(FEATURE_V_1_4_CREATE_MODEL_WITH_ALIASED_QUERY_LIST); + features.insert(FEATURE_V_1_4_REMOTE_MODEL); + features.insert(FEATURE_V_1_4_LITERAL_CONCATENATION); + features.insert(FEATURE_V_1_4_ENABLE_FLOAT_DISTANCE_FUNCTIONS); + features.insert(FEATURE_V_1_4_DOT_PRODUCT); + features.insert(FEATURE_V_1_4_MANHATTAN_DISTANCE); + 
features.insert(FEATURE_V_1_4_L1_NORM); + features.insert(FEATURE_V_1_4_L2_NORM); + features.insert(FEATURE_V_1_4_ARRAY_ZIP); + features.insert(FEATURE_V_1_4_GROUPING_SETS); + features.insert(FEATURE_V_1_4_GROUPING_BUILTIN); ABSL_FALLTHROUGH_INTENDED; case VERSION_1_3: // NO CHANGES SHOULD HAPPEN INSIDE THE VERSIONS BELOW, which are diff --git a/zetasql/public/literal_remover.cc b/zetasql/public/literal_remover.cc index 41f9423c6..3adc8db61 100644 --- a/zetasql/public/literal_remover.cc +++ b/zetasql/public/literal_remover.cc @@ -40,6 +40,7 @@ #include "absl/status/status.h" #include "absl/strings/ascii.h" #include "absl/strings/str_cat.h" +#include "absl/strings/string_view.h" #include "zetasql/base/map_util.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status.h" @@ -218,7 +219,7 @@ absl::Status AddLimitOffsetLiteralsToIgnoringSet( } // namespace absl::Status ReplaceLiteralsByParameters( - const std::string& sql, + absl::string_view sql, const LiteralReplacementOptions& literal_replacement_options, const AnalyzerOptions& analyzer_options, const ResolvedStatement* stmt, LiteralReplacementMap* literal_map, diff --git a/zetasql/public/literal_remover.h b/zetasql/public/literal_remover.h index f2952c1f6..7c2476ef0 100644 --- a/zetasql/public/literal_remover.h +++ b/zetasql/public/literal_remover.h @@ -24,6 +24,7 @@ #include "absl/container/flat_hash_set.h" #include "absl/container/node_hash_set.h" #include "absl/status/status.h" +#include "absl/strings/string_view.h" namespace zetasql { @@ -52,7 +53,7 @@ struct LiteralReplacementOptions { // This can return errors that point at a location in the input. How this // location is reported is given by . 
absl::Status ReplaceLiteralsByParameters( - const std::string& sql, + absl::string_view sql, const LiteralReplacementOptions& literal_replacement_options, const AnalyzerOptions& analyzer_options, const ResolvedStatement* stmt, LiteralReplacementMap* literal_map, diff --git a/zetasql/public/non_sql_function.cc b/zetasql/public/non_sql_function.cc index 84fe2b091..c42cdedef 100644 --- a/zetasql/public/non_sql_function.cc +++ b/zetasql/public/non_sql_function.cc @@ -23,6 +23,7 @@ #include "zetasql/public/error_helpers.h" #include "zetasql/resolved_ast/resolved_ast.h" +#include "absl/strings/string_view.h" #include "zetasql/base/ret_check.h" namespace zetasql { @@ -31,7 +32,7 @@ namespace zetasql { const char NonSqlFunction::kNonSqlFunctionGroup[] = "Non_sql_function"; NonSqlFunction::NonSqlFunction( - const std::string& name, Mode mode, + absl::string_view name, Mode mode, const std::vector& function_signatures, const FunctionOptions& function_options, const ResolvedCreateFunctionStmt* resolved_create_function_statement, diff --git a/zetasql/public/non_sql_function.h b/zetasql/public/non_sql_function.h index e8cc3ef39..fe6d4dba9 100644 --- a/zetasql/public/non_sql_function.h +++ b/zetasql/public/non_sql_function.h @@ -25,6 +25,7 @@ #include "zetasql/public/function.h" #include "zetasql/public/parse_resume_location.h" #include "zetasql/resolved_ast/resolved_ast.h" +#include "absl/strings/string_view.h" #include "absl/types/optional.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status.h" @@ -79,7 +80,7 @@ class NonSqlFunction : public Function { private: // Constructor for valid functions. 
NonSqlFunction( - const std::string& name, Mode mode, + absl::string_view name, Mode mode, const std::vector& function_signatures, const FunctionOptions& function_options, const ResolvedCreateFunctionStmt* resolved_create_function_statement, diff --git a/zetasql/public/numeric_value.h b/zetasql/public/numeric_value.h index 4710a1a31..49cdac395 100644 --- a/zetasql/public/numeric_value.h +++ b/zetasql/public/numeric_value.h @@ -50,7 +50,7 @@ class NumericValue final { public: // Must use integral_constant to utilize the optimizations for integer // divisions with constant 64-bit divisors. - static constexpr std::integral_constant + static constexpr std::integral_constant kScalingFactor{}; static constexpr int kMaxIntegerDigits = 29; static constexpr int kMaxFractionalDigits = 9; diff --git a/zetasql/public/numeric_value_test_utils.h b/zetasql/public/numeric_value_test_utils.h index 6ced04537..6fc324692 100644 --- a/zetasql/public/numeric_value_test_utils.h +++ b/zetasql/public/numeric_value_test_utils.h @@ -18,9 +18,11 @@ #define ZETASQL_PUBLIC_NUMERIC_VALUE_TEST_UTILS_H_ #include +#include #include #include +#include "zetasql/common/multiprecision_int.h" #include "zetasql/public/numeric_value.h" #include "absl/random/random.h" #include "absl/status/statusor.h" @@ -30,9 +32,8 @@ namespace zetasql { // Generates a random valid numeric value, intended to cover all precisions // and scales when enough numbers are generated. Do not assume the result to // follow any specific distribution. 
-template -T MakeRandomNumericValue(absl::BitGen* random, - int* num_truncated_digits = nullptr) { +template +T MakeRandomNumericValue(URBG* random, int* num_truncated_digits = nullptr) { constexpr int kNumWords = sizeof(T) / sizeof(uint64_t); constexpr int kNumBits = sizeof(T) * 8; std::array value; @@ -70,8 +71,8 @@ T MakeRandomNumericValue(absl::BitGen* random, return result.Trunc(scale - digits_to_trunc); } -template -T MakeRandomNonZeroNumericValue(absl::BitGen* random) { +template +T MakeRandomNonZeroNumericValue(URBG* random) { T result = MakeRandomNumericValue(random); if (result == T()) { int64_t v = absl::Uniform(*random); @@ -80,17 +81,16 @@ T MakeRandomNonZeroNumericValue(absl::BitGen* random) { return result; } -template -T MakeRandomPositiveNumericValue(absl::BitGen* random) { +template +T MakeRandomPositiveNumericValue(URBG* random) { absl::StatusOr abs = MakeRandomNonZeroNumericValue(random).Abs(); // Abs fails only when result = T::MinValue(). return abs.ok() ? abs.value() : T::MaxValue(); } // Generate a random double value that can be losslessly converted to T. 
-template -double MakeLosslessRandomDoubleValue(uint max_integer_bits, - absl::BitGen* random) { +template +double MakeLosslessRandomDoubleValue(uint max_integer_bits, URBG* random) { uint max_mantissa_bits = std::min(53, T::kMaxFractionalDigits + max_integer_bits); int64_t mantissa = absl::Uniform( diff --git a/zetasql/public/numeric_value_test_utils_test.cc b/zetasql/public/numeric_value_test_utils_test.cc index 556c50e60..e8cf07c05 100644 --- a/zetasql/public/numeric_value_test_utils_test.cc +++ b/zetasql/public/numeric_value_test_utils_test.cc @@ -16,6 +16,7 @@ #include "zetasql/public/numeric_value_test_utils.h" +#include #include #include "zetasql/base/testing/status_matchers.h" @@ -23,6 +24,7 @@ #include "gtest/gtest.h" #include "absl/random/random.h" #include "absl/strings/str_format.h" +#include "absl/strings/string_view.h" namespace zetasql { @@ -85,7 +87,7 @@ void TestMakeRandomCoverage(T (*generator)(absl::BitGen*), (T::kMaxIntegerDigits + 1) * (T::kMaxFractionalDigits + 1) - 1; - // Should cover at least 90% of the possile combinations. + // Should cover at least 90% of the possible combinations. // In an experiment with 200 runs, the minimum coverage is 99.9% for // BigNumericValue and 100% for NumericValue. 
EXPECT_GE(num_covered, 9 * kNumPossibleCombinations / 10); @@ -109,27 +111,31 @@ TEST(BigNumericTest, MakeRandomNumericValue_Coverage) { } TEST(NumericTest, MakeRandomNonZeroNumericValue_Coverage) { - TestMakeRandomCoverage(&MakeRandomNonZeroNumericValue, - /* expect_zero_values = */ false, - /* expect_negative_values = */ true); + TestMakeRandomCoverage( + &MakeRandomNonZeroNumericValue, + /* expect_zero_values = */ false, + /* expect_negative_values = */ true); } TEST(BigNumericTest, MakeRandomNonZeroNumericValue_Coverage) { - TestMakeRandomCoverage(&MakeRandomNonZeroNumericValue, - /* expect_zero_values = */ false, - /* expect_negative_values = */ true); + TestMakeRandomCoverage( + &MakeRandomNonZeroNumericValue, + /* expect_zero_values = */ false, + /* expect_negative_values = */ true); } TEST(NumericTest, MakeRandomPositiveNumericValue_Coverage) { - TestMakeRandomCoverage(&MakeRandomPositiveNumericValue, - /* expect_zero_values = */ false, - /* expect_negative_values = */ false); + TestMakeRandomCoverage( + &MakeRandomPositiveNumericValue, + /* expect_zero_values = */ false, + /* expect_negative_values = */ false); } TEST(BigNumericTest, MakeRandomPositiveNumericValue_Coverage) { - TestMakeRandomCoverage(&MakeRandomPositiveNumericValue, - /* expect_zero_values = */ false, - /* expect_negative_values = */ false); + TestMakeRandomCoverage( + &MakeRandomPositiveNumericValue, + /* expect_zero_values = */ false, + /* expect_negative_values = */ false); } template diff --git a/zetasql/public/options.proto b/zetasql/public/options.proto index ed0fdab3d..b630989c9 100644 --- a/zetasql/public/options.proto +++ b/zetasql/public/options.proto @@ -108,7 +108,7 @@ message LanguageFeatureOptions { // All optional features are off by default. Some features have a negative // meaning, so turning them on will remove a feature or enable an error. 
enum LanguageFeature { - reserved 36 to 39, 44 to 46, 48, 55, 74, 99, 13046, 14001, 999003; + reserved 36 to 39, 44 to 46, 48, 55, 74, 99, 110, 13046, 14001, 14024, 999003; // CROSS-VERSION FEATURES // @@ -291,7 +291,7 @@ enum LanguageFeature { [(language_feature_options).in_development = true]; // Enables the group_selection_strategy=PUBLIC_GROUPS option. See - // (broken link). + // (broken link). Requires FEATURE_V_1_1_WITH_ON_SUBQUERY. FEATURE_DIFFERENTIAL_PRIVACY_PUBLIC_GROUPS = 100 [(language_feature_options).in_development = true]; @@ -433,6 +433,10 @@ enum LanguageFeature { // See (broken link) for details. FEATURE_JSON_MUTATOR_FUNCTIONS = 98; + // Enables JSON lax notation for JSON_QUERY function. + // See (broken link) for details. + FEATURE_JSON_QUERY_LAX = 115; + // Enables support for WITH PARTITION COLUMNS in CREATE EXTERNAL TABLE. // Example: // CREATE EXTERNAL TABLE t WITH PARTITION COLUMNS (x int64) @@ -441,7 +445,6 @@ enum LanguageFeature { [(language_feature_options).in_development = true]; // INTERVAL data type. (broken link) - // TODO: remove in_development and consider enabling it. FEATURE_INTERVAL_TYPE = 49 [(language_feature_options).in_development = true]; // Enables support for the ONEOF_CASE pseudo-accessor in the EXTRACT function. @@ -451,6 +454,11 @@ enum LanguageFeature { // ((broken link)) FEATURE_EXTRACT_ONEOF_CASE = 50; + // Enables tokenized search ((broken link)). + // TODO: Remove "in_development" tag. + FEATURE_TOKENIZED_SEARCH = 51 + [(language_feature_options).in_development = true]; + // Enables support for the following parameterized types. // - STRING(L) / BYTES(L) // - NUMERIC(P) / NUMERIC(P, S) @@ -606,9 +614,7 @@ enum LanguageFeature { // Enables support for WITH CONNECTION in CREATE TABLE. // More details in (broken link). 
- // TODO: Remove in_development tag - FEATURE_CREATE_TABLE_WITH_CONNECTION = 91 - [(language_feature_options).in_development = true]; + FEATURE_CREATE_TABLE_WITH_CONNECTION = 91; // This feature can be enabled to disable the OUTER JOIN of arrays. // RIGHT JOIN or FULL JOIN of independent arrays is producing a shape that @@ -679,6 +685,20 @@ enum LanguageFeature { // ); FEATURE_IDENTITY_COLUMNS = 109; + // Enables support for EXTERNAL SCHEMA DDL statements. + // Example: + // CREATE EXTERNAL SCHEMA external_ds WITH CONNECTION some_connection OPTIONS( + // external_source = "some_external_source") + FEATURE_EXTERNAL_SCHEMA_DDL = 111 + [(language_feature_options).in_development = true]; + + // When a non-templated argument to a templated SQL function is a NULL or + // empty array literal, it will be attached to the exact argument type in the + // signature, if the feature is enabled. Otherwise, the NULL or empty array + // argument will be passed as untyped. + // Background: b/259962379. + FEATURE_TEMPLATED_SQL_FUNCTION_RESOLVE_WITH_TYPED_ARGS = 114; + // -> Add more cross-version features here. // -> DO NOT add more versioned features into versions that are frozen. // New features should be added for the *next* version number. @@ -1047,8 +1067,7 @@ enum LanguageFeature { // Enable support for remote models (e.g. CREATE MODEL ... REMOTE ...) // (broken link) - FEATURE_V_1_4_REMOTE_MODEL = 14009 - [(language_feature_options).in_development = true]; + FEATURE_V_1_4_REMOTE_MODEL = 14009; // Allow struct field access like s[OFFSET(0)]. // See (broken link) @@ -1079,8 +1098,7 @@ enum LanguageFeature { // Enables CREATE MODEL statement to support one or more aliased queries as // input. // See (broken link). 
- FEATURE_V_1_4_CREATE_MODEL_WITH_ALIASED_QUERY_LIST = 14016 - [(language_feature_options).in_development = true]; + FEATURE_V_1_4_CREATE_MODEL_WITH_ALIASED_QUERY_LIST = 14016; // Enables load to temp tables FEATURE_V_1_4_LOAD_DATA_TEMP_TABLE = 14017; @@ -1096,15 +1114,11 @@ enum LanguageFeature { [(language_feature_options).in_development = true]; // Enables grouping builtin function for grouping sets ((broken link)) - // TODO: Remove "in_development" flag. - FEATURE_V_1_4_GROUPING_BUILTIN = 14020 - [(language_feature_options).in_development = true]; + FEATURE_V_1_4_GROUPING_BUILTIN = 14020; // Enables group by grouping sets/cube (see (broken link)) // This will also enable GROUP BY (). - // TODO: Remove "in_development" flag - FEATURE_V_1_4_GROUPING_SETS = 14021 - [(language_feature_options).in_development = true]; + FEATURE_V_1_4_GROUPING_SETS = 14021; // Preserve the original annotation when adding implicit CAST over the // expression inside scan. @@ -1112,13 +1126,10 @@ enum LanguageFeature { FEATURE_V_1_4_PRESERVE_ANNOTATION_IN_IMPLICIT_CAST_IN_SCAN = 14022 [(language_feature_options).in_development = true]; - // Enables outer and strict modes (FULL/LEFT/STRICT) for set operations. - // (broken link) - FEATURE_V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE = 14023; - - // Enables CORRESPONDING_BY for set operations. + // Enables all the features of CORRESPONDING, including outer and strict modes + // (FULL/LEFT/STRICT) and CORRESPONDING_BY. // (broken link) - FEATURE_V_1_4_CORRESPONDING_BY = 14024; + FEATURE_V_1_4_CORRESPONDING_FULL = 14023; // Supporting LIKE ANY/SOME/ALL with array of elements. // (broken link) @@ -1169,10 +1180,7 @@ enum LanguageFeature { // Enables the ARRAY_ZIP function, which has both non-lambda and lambda // signatures. // See (broken link). - // TODO: Remove the "in_development" flag when the feature - // implementation is done. 
- FEATURE_V_1_4_ARRAY_ZIP = 14032 - [(language_feature_options).in_development = true]; + FEATURE_V_1_4_ARRAY_ZIP = 14032; // Enables explicit UNNEST with multiple arguments in FROM clause. // See (broken link) @@ -1192,9 +1200,7 @@ enum LanguageFeature { // Enables the following float32 signatures: // * COSINE_DISTANCE(ARRAY, ARRAY) // * EUCLIDEAN_DISTANCE(ARRAY, ARRAY) - // TODO: Remove "in_development" flag. - FEATURE_V_1_4_ENABLE_FLOAT_DISTANCE_FUNCTIONS = 14037 - [(language_feature_options).in_development = true]; + FEATURE_V_1_4_ENABLE_FLOAT_DISTANCE_FUNCTIONS = 14037; // Enables MEASURE syntax. See (broken link). // This is an EXPERIMENTAL feature in a prototype state. @@ -1210,6 +1216,114 @@ enum LanguageFeature { // (broken link) FEATURE_V_1_4_GROUP_BY_ALL = 14039; + // REQUIRES: FEATURE_V_1_4_SQL_MACROS must be enabled + // Enforces strict macro expansions. For example: require exact number of + // arguments, always require parentheses on invocations, no implicit splicing. + // See (broken link) + FEATURE_V_1_4_ENFORCE_STRICT_MACROS = 14040 + [(language_feature_options).in_development = true]; + + // Allows general expression arguments to LIMIT and OFFSET. + FEATURE_V_1_4_LIMIT_OFFSET_EXPRESSIONS = 14042 + [(language_feature_options).in_development = true]; + + // MAP datatype. (broken link) + FEATURE_V_1_4_MAP_TYPE = 14043 + [(language_feature_options).in_development = true]; + + // Disables the FLOAT32 type. See (broken link). + // FLOAT32 type is expected to be enabled by default in all the engines, so + // this feature should only be used when the engine does not want to support + // the type. Hence this feature should NOT be `ideally_enabled`. + FEATURE_V_1_4_DISABLE_FLOAT32 = 14044 [ + (language_feature_options).in_development = true, + (language_feature_options).ideally_enabled = false + ]; + + // Enables string & bytes literal concatenation. 
+  // See (broken link)
+  FEATURE_V_1_4_LITERAL_CONCATENATION = 14045;
+
+  // Enables the following DOT_PRODUCT function signatures:
+  // * DOT_PRODUCT(ARRAY, ARRAY)
+  // * DOT_PRODUCT(ARRAY, ARRAY)
+  // * DOT_PRODUCT(ARRAY, ARRAY)
+  FEATURE_V_1_4_DOT_PRODUCT = 14046;
+
+  // Feature flag to opt in to the newer implementation of NOT LIKE
+  // ANY|SOME|ALL as described in (broken link).
+  // TODO: Remove "in_development" flag.
+  // TODO: Remove this flag after BQ rolls out the
+  // feature. ETA Q1 2024.
+  FEATURE_V_1_4_OPT_IN_NEW_BEHAVIOR_NOT_LIKE_ANY_SOME_ALL = 14047
+      [(language_feature_options).in_development = true];
+
+  // Enables the following MANHATTAN_DISTANCE function signatures:
+  // * MANHATTAN_DISTANCE(ARRAY, ARRAY)
+  // * MANHATTAN_DISTANCE(ARRAY, ARRAY)
+  // * MANHATTAN_DISTANCE(ARRAY, ARRAY)
+  FEATURE_V_1_4_MANHATTAN_DISTANCE = 14048;
+
+  // Enables the following L1_NORM function signatures:
+  // * L1_NORM(ARRAY)
+  // * L1_NORM(ARRAY)
+  // * L1_NORM(ARRAY)
+  FEATURE_V_1_4_L1_NORM = 14049;
+
+  // Enables the following L2_NORM function signatures:
+  // * L2_NORM(ARRAY)
+  // * L2_NORM(ARRAY)
+  // * L2_NORM(ARRAY)
+  FEATURE_V_1_4_L2_NORM = 14050;
+
+  // REQUIRES: FEATURE_V_1_3_BRACED_PROTO_CONSTRUCTORS must be enabled.
+  // Enables braced STRUCT constructor: STRUCT {a:1, b:2}.
+  // TODO: Remove "in_development" flag.
+  FEATURE_V_1_4_STRUCT_BRACED_CONSTRUCTORS = 14051
+      [(language_feature_options).in_development = true];
+
+  // Enables the WITH RECURSIVE DEPTH modifier.
+  // See (broken link):explicit-recursion-depth.
+  // TODO: Remove "in_development" flag.
+ FEATURE_V_1_4_WITH_RECURSIVE_DEPTH_MODIFIER = 14052 + [(language_feature_options).in_development = true]; + + // Enables the following JSON functions: + // * BOOL_ARRAY, FLOAT64_ARRAY, INT64_ARRAY, STRING_ARRAY + // + // If JSON_LAX_VALUE_EXTRACTION_FUNCTIONS is also enabled, + // then these functions are also enabled: + // * LAX_BOOL_ARRAY, LAX_FLOAT64_ARRAY, LAX_INT64_ARRAY, LAX_STRING_ARRAY + // + FEATURE_V_1_4_JSON_ARRAY_VALUE_EXTRACTION_FUNCTIONS = 14053 + [(language_feature_options).in_development = true]; + + // Ensures that computations properly defer side effects such as errors when + // they get separated from larger expressions with conditional operators, + // to fulfill the language spec on conditional evaluation. + // See (broken link) + FEATURE_V_1_4_ENFORCE_CONDITIONAL_EVALUATION = 14054 + [(language_feature_options).in_development = true]; + + // Enables the following JSON functions: + // * FLOAT32, INT32, UINT64, UINT32 + // + // If JSON_LAX_VALUE_EXTRACTION_FUNCTIONS is also enabled, + // then these functions are also enabled: + // * LAX_FLOAT32, LAX_INT32, LAX_UINT64, LAX_UINT32 + // + // If FEATURE_V_1_4_JSON_ARRAY_VALUE_EXTRACTION_FUNCTIONS is also enabled, + // then these functions are also enabled: + // * FLOAT32_ARRAY,INT32_ARRAY, UINT32_ARRAY, UINT64_ARRAY + // + // If FEATURE_V_1_4_JSON_ARRAY_VALUE_EXTRACTION_FUNCTIONS and + // JSON_LAX_VALUE_EXTRACTION_FUNCTIONS are also enabled, then these functions + // are also enabled: + // * LAX_FLOAT32_ARRAY, LAX_INT32_ARRAY, LAX_UINT32_ARRAY, LAX_UINT64_ARRAY + // + FEATURE_V_1_4_JSON_MORE_VALUE_EXTRACTION_FUNCTIONS = 14055 + [(language_feature_options).in_development = true]; + // EXPERIMENTAL FEATURES - // Do not add features in this section. Use the in_development annotation // instead. 
@@ -1253,9 +1367,9 @@ message ResolvedASTRewriteOptions {
 // We support these rewrites to allow syntactic improvements which generate new
 // node types to be quickly and easily available to engines without needing each
 // engine to implement support for the new node types.
-// Next ID: 24
+// Next ID: 27
 enum ResolvedASTRewrite {
-  reserved 13, 15;
+  reserved 13, 15, 21;
 
   // Make sure default value of 0 is not a resolved AST rewrite.
   REWRITE_INVALID_DO_NOT_USE = 0;
@@ -1384,9 +1498,6 @@ enum ResolvedASTRewrite {
     (rewrite_options).in_development = true
   ];
 
-  REWRITE_SET_OPERATION_CORRESPONDING = 21
-      [(rewrite_options).default_enabled = true];
-
   // Rewrites rollup and cube to grouping sets, more specifically rewrites
   // ResolvedRollup and ResolvedCube to ResolvedGroupingSet. This rewriter is
   // supposed to be enabled by default.
@@ -1394,6 +1505,19 @@ enum ResolvedASTRewrite {
     (rewrite_options).default_enabled = true,
     (rewrite_options).in_development = true
   ];
+
+  REWRITE_INSERT_DML_VALUES = 24 [(rewrite_options).default_enabled = false];
+
+  // Rewrite multiway UNNEST to joins against singleton UNNEST.
+  REWRITE_MULTIWAY_UNNEST = 25 [
+    (rewrite_options).default_enabled = false,
+    (rewrite_options).in_development = true
+  ];
+
+  // Supports rewriting ResolvedAggregationThresholdAggregateScan.
+  // See (broken link).
+  REWRITE_AGGREGATION_THRESHOLD = 26
+      [(rewrite_options).default_enabled = false];
 }
 
 // ZetaSQL rewriter options.
diff --git a/zetasql/public/parse_helpers.cc b/zetasql/public/parse_helpers.cc index fe9805c44..e60f9a19d 100644 --- a/zetasql/public/parse_helpers.cc +++ b/zetasql/public/parse_helpers.cc @@ -31,6 +31,7 @@ #include "zetasql/public/options.pb.h" #include "zetasql/public/parse_resume_location.h" #include "zetasql/resolved_ast/resolved_node_kind.pb.h" +#include "absl/strings/string_view.h" namespace zetasql { @@ -103,6 +104,8 @@ ResolvedNodeKind GetStatementKind(ASTNodeKind node_kind) { return RESOLVED_CREATE_APPROX_VIEW_STMT; case AST_CREATE_SCHEMA_STATEMENT: return RESOLVED_CREATE_SCHEMA_STMT; + case AST_CREATE_EXTERNAL_SCHEMA_STATEMENT: + return RESOLVED_CREATE_EXTERNAL_SCHEMA_STMT; case AST_CREATE_SNAPSHOT_TABLE_STATEMENT: return RESOLVED_CREATE_SNAPSHOT_TABLE_STMT; case AST_CREATE_TABLE_STATEMENT: @@ -174,6 +177,8 @@ ResolvedNodeKind GetStatementKind(ASTNodeKind node_kind) { return RESOLVED_ALTER_DATABASE_STMT; case AST_ALTER_SCHEMA_STATEMENT: return RESOLVED_ALTER_SCHEMA_STMT; + case AST_ALTER_EXTERNAL_SCHEMA_STATEMENT: + return RESOLVED_ALTER_EXTERNAL_SCHEMA_STMT; case AST_ALTER_TABLE_STATEMENT: return RESOLVED_ALTER_TABLE_STMT; case AST_ALTER_VIEW_STATEMENT: @@ -228,7 +233,7 @@ ResolvedNodeKind GetNextStatementKind( RESOLVED_CREATE_TABLE_AS_SELECT_STMT : GetStatementKind(node_kind); } -absl::Status GetStatementProperties(const std::string& input, +absl::Status GetStatementProperties(absl::string_view input, const LanguageOptions& language_options, StatementProperties* statement_properties) { return GetNextStatementProperties(ParseResumeLocation::FromStringView(input), @@ -242,7 +247,7 @@ absl::Status GetNextStatementProperties( // Parsing the next statement properties may return an AST for statement // level hints, so we create an arena here to own the AST nodes. 
ParserOptions parser_options; - parser_options.set_language_options(&language_options); + parser_options.set_language_options(language_options); parser_options.CreateDefaultArenasIfNotSet(); parser::ASTStatementProperties ast_statement_properties; @@ -278,6 +283,7 @@ absl::Status GetNextStatementProperties( case AST_ALTER_PRIVILEGE_RESTRICTION_STATEMENT: case AST_ALTER_ROW_ACCESS_POLICY_STATEMENT: case AST_ALTER_SCHEMA_STATEMENT: + case AST_ALTER_EXTERNAL_SCHEMA_STATEMENT: case AST_ALTER_TABLE_STATEMENT: case AST_ALTER_VIEW_STATEMENT: case AST_ALTER_MODEL_STATEMENT: @@ -294,6 +300,7 @@ absl::Status GetNextStatementProperties( case AST_CREATE_PRIVILEGE_RESTRICTION_STATEMENT: case AST_CREATE_ROW_ACCESS_POLICY_STATEMENT: case AST_CREATE_SCHEMA_STATEMENT: + case AST_CREATE_EXTERNAL_SCHEMA_STATEMENT: case AST_CREATE_SNAPSHOT_TABLE_STATEMENT: case AST_CREATE_TABLE_FUNCTION_STATEMENT: case AST_CREATE_TABLE_STATEMENT: @@ -313,6 +320,7 @@ absl::Status GetNextStatementProperties( case AST_UNDROP_STATEMENT: case AST_RENAME_STATEMENT: case AST_CREATE_SNAPSHOT_STATEMENT: + case AST_DEFINE_MACRO_STATEMENT: statement_properties->statement_category = StatementProperties::DDL; break; case AST_DELETE_STATEMENT: diff --git a/zetasql/public/parse_helpers.h b/zetasql/public/parse_helpers.h index c6829cecb..ccfa45e86 100644 --- a/zetasql/public/parse_helpers.h +++ b/zetasql/public/parse_helpers.h @@ -125,7 +125,7 @@ struct StatementProperties { // // Returns OK for invalid syntax, with an UNKNOWN statement node kind. Only // returns internal errors. 
-absl::Status GetStatementProperties(const std::string& input,
+absl::Status GetStatementProperties(absl::string_view input,
                                     const LanguageOptions& language_options,
                                     StatementProperties* statement_properties);
 
diff --git a/zetasql/public/parse_helpers_test.cc b/zetasql/public/parse_helpers_test.cc
index 6a659302a..666cda975 100644
--- a/zetasql/public/parse_helpers_test.cc
+++ b/zetasql/public/parse_helpers_test.cc
@@ -510,4 +510,16 @@ TEST(GetNextStatementPropertiesTest, BasicStatements) {
   }
 }
 
+// Tests the fix for the regression in b/266192857
+TEST(GetNextStatementKindTest, DefineMacroStmt) {
+  // We do not yet have ResolvedDefineMacroStatement
+  ParseResumeLocation parse_resume_location =
+      ParseResumeLocation::FromString("DEFINE MACRO m 1");
+
+  LanguageOptions language_options;
+  StatementProperties statement_properties;
+  ZETASQL_EXPECT_OK(GetNextStatementProperties(parse_resume_location, language_options,
+                                       &statement_properties));
+}
+
 }  // namespace zetasql
diff --git a/zetasql/public/parse_location.h b/zetasql/public/parse_location.h
index f18e6f411..b8f5f3c9c 100644
--- a/zetasql/public/parse_location.h
+++ b/zetasql/public/parse_location.h
@@ -65,6 +65,15 @@ class ParseLocationPoint {
   // a negative value for invalid ParseLocationPoints.
   int GetByteOffset() const { return byte_offset_; }
 
+  // Sets the byte offset without changing the filename_.
+  void SetByteOffset(int byte_offset) { byte_offset_ = byte_offset; }
+
+  // Increments the offset without changing the filename_.
+  void IncrementByteOffset(int increment) {
+    ABSL_DCHECK(IsValid());
+    byte_offset_ += increment;
+  }
+
   // Returns true if the byte offset is non-negative.
bool IsValid() const { return byte_offset_ >= 0; } @@ -123,6 +132,8 @@ class ParseLocationPoint { class ParseLocationRange { public: ParseLocationRange() = default; + ParseLocationRange(ParseLocationPoint start, ParseLocationPoint end) + : start_(start), end_(end) {} void set_start(ParseLocationPoint start) { start_ = start; } void set_end(ParseLocationPoint end) { end_ = end; } @@ -130,6 +141,9 @@ class ParseLocationRange { ParseLocationPoint start() const { return start_; } ParseLocationPoint end() const { return end_; } + ParseLocationPoint& mutable_start() { return start_; } + ParseLocationPoint& mutable_end() { return end_; } + absl::StatusOr ToProto() const { // The ParseLocationProto only has a single field for the filename, so it // cannot represent a ParseLocationRange where the start and end locations diff --git a/zetasql/public/parse_tokens.cc b/zetasql/public/parse_tokens.cc index 70772e6f4..8a1559f64 100644 --- a/zetasql/public/parse_tokens.cc +++ b/zetasql/public/parse_tokens.cc @@ -23,12 +23,12 @@ #include #include -#include "zetasql/base/logging.h" +#include "zetasql/base/arena.h" #include "zetasql/common/errors.h" #include "zetasql/parser/bison_parser.bison.h" #include "zetasql/parser/bison_parser_mode.h" -#include "zetasql/parser/flex_tokenizer.h" #include "zetasql/parser/keywords.h" +#include "zetasql/parser/token_disambiguator.h" #include "zetasql/public/functions/convert_string.h" #include "zetasql/public/parse_resume_location.h" #include "zetasql/public/strings.h" @@ -37,6 +37,7 @@ #include "absl/memory/memory.h" #include "absl/strings/ascii.h" #include "absl/strings/str_cat.h" +#include "absl/strings/string_view.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status.h" #include "zetasql/base/status_macros.h" @@ -55,7 +56,7 @@ enum class TokenizerState { kNone, kIdentifier, kIdentifierDot }; // offset 'error_offset' into 'location'. 
static absl::Status MakeSyntaxErrorAtLocationOffset( const ParseLocationRange& location, int error_offset, - const std::string& error_message) { + absl::string_view error_message) { absl::string_view filename = location.start().filename(); const int total_error_offset = location.start().GetByteOffset() + error_offset; @@ -86,34 +87,6 @@ static absl::Status ConvertBisonToken(int bison_token, break; } - case BisonParserImpl::token::KW_OPEN_HINT: { - // This is one token "@{" in Flex, but we want to return two tokens. - ParseLocationRange first_location = location; - first_location.set_end(ParseLocationPoint::FromByteOffset( - location.start().filename(), location.start().GetByteOffset() + 1)); - parse_tokens->emplace_back(first_location, "@", ParseToken::KEYWORD); - - ParseLocationRange second_location = location; - second_location.set_start(ParseLocationPoint::FromByteOffset( - location.end().filename(), location.end().GetByteOffset() - 1)); - parse_tokens->emplace_back(second_location, "{", ParseToken::KEYWORD); - break; - } - - case BisonParserImpl::token::KW_DOT_STAR: { - // This is one token ".*" in Flex, but we want to return two tokens. 
- ParseLocationRange first_location = location; - first_location.set_end(ParseLocationPoint::FromByteOffset( - location.start().filename(), location.start().GetByteOffset() + 1)); - parse_tokens->emplace_back(first_location, ".", ParseToken::KEYWORD); - - ParseLocationRange second_location = location; - second_location.set_start(ParseLocationPoint::FromByteOffset( - location.end().filename(), location.end().GetByteOffset() - 1)); - parse_tokens->emplace_back(second_location, "*", ParseToken::KEYWORD); - break; - } - case BisonParserImpl::token::STRING_LITERAL: { std::string parsed_value; int error_offset; @@ -251,9 +224,13 @@ absl::Status GetParseTokens(const ParseTokenOptions& options, mode = parser::BisonParserMode::kTokenizerPreserveComments; } - auto tokenizer = std::make_unique( - mode, resume_location->filename(), resume_location->input(), - resume_location->byte_position(), options.language_options); + auto arena = std::make_unique(/*block_size=*/4096); + ZETASQL_ASSIGN_OR_RETURN( + auto tokenizer, + parser::DisambiguatorLexer::Create( + mode, resume_location->filename(), resume_location->input(), + resume_location->byte_position(), options.language_options, + /*macro_catalog=*/nullptr, arena.get())); absl::Status status; ParseLocationRange location; diff --git a/zetasql/public/parse_tokens_test.cc b/zetasql/public/parse_tokens_test.cc index 4fbec6e2d..74f973a22 100644 --- a/zetasql/public/parse_tokens_test.cc +++ b/zetasql/public/parse_tokens_test.cc @@ -26,7 +26,6 @@ #include "zetasql/public/parse_resume_location.h" #include "gmock/gmock.h" #include "gtest/gtest.h" -#include "absl/flags/flag.h" #include "absl/strings/str_cat.h" #include "absl/strings/string_view.h" #include "absl/strings/strip.h" diff --git a/zetasql/public/proto/type_annotation.proto b/zetasql/public/proto/type_annotation.proto index 897a572ba..5a47c85a6 100644 --- a/zetasql/public/proto/type_annotation.proto +++ b/zetasql/public/proto/type_annotation.proto @@ -149,6 +149,10 @@ message 
FieldFormat { // Can be applied to bytes fields. INTERVAL = 14; + // A ZetaSQL TOKENLIST value. + // Can be applied to bytes fields. + TOKENLIST = 15; + // A ZetaSQL Range value encoded as bytes. // // Can be applied to bytes fields. diff --git a/zetasql/public/proto_util_test.cc b/zetasql/public/proto_util_test.cc index cc27ef590..bfda812e1 100644 --- a/zetasql/public/proto_util_test.cc +++ b/zetasql/public/proto_util_test.cc @@ -44,6 +44,7 @@ #include "absl/flags/flag.h" #include "absl/status/statusor.h" #include "absl/strings/cord.h" +#include "absl/strings/string_view.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status.h" #include "zetasql/base/status_macros.h" @@ -82,7 +83,7 @@ class ReadProtoFieldsTest : public ::testing::TestWithParam { } template - absl::StatusOr ReadField(const std::string& field_name, + absl::StatusOr ReadField(absl::string_view field_name, FieldFormat::Format format, const Type* type, const Value& default_value, const absl::Cord& bytes, @@ -97,7 +98,7 @@ class ReadProtoFieldsTest : public ::testing::TestWithParam { return value; } - absl::StatusOr ReadField(const std::string& field_name, + absl::StatusOr ReadField(absl::string_view field_name, FieldFormat::Format format, const Type* type, const Value& default_value, bool get_has_bit = false) { @@ -108,7 +109,7 @@ class ReadProtoFieldsTest : public ::testing::TestWithParam { get_has_bit); } - absl::StatusOr ReadHasBit(const std::string& name, + absl::StatusOr ReadHasBit(absl::string_view name, FieldFormat::Format format, const Type* type) { Value default_value; diff --git a/zetasql/public/proto_value_conversion.cc b/zetasql/public/proto_value_conversion.cc index 6358a3895..bb2c1ca14 100644 --- a/zetasql/public/proto_value_conversion.cc +++ b/zetasql/public/proto_value_conversion.cc @@ -158,6 +158,7 @@ static absl::Status CheckFieldFormat(const Value& value, case TYPE_NUMERIC: case TYPE_BIGNUMERIC: case TYPE_JSON: + case TYPE_TOKENLIST: case TYPE_RANGE: break; @@ -491,6 
+492,16 @@ absl::Status MergeValueToProtoField(const Value& value, } return absl::OkStatus(); } + case TYPE_TOKENLIST: { + ZETASQL_RET_CHECK_EQ(field->type(), google::protobuf::FieldDescriptor::TYPE_BYTES); + std::string bytes = value.tokenlist_value().GetBytes(); + if (field->is_repeated()) { + reflection->AddString(proto_out, field, std::move(bytes)); + } else { + reflection->SetString(proto_out, field, std::move(bytes)); + } + return absl::OkStatus(); + } case TYPE_RANGE: { ZETASQL_RET_CHECK_EQ(field->type(), google::protobuf::FieldDescriptor::TYPE_BYTES); std::string encoded_range; @@ -608,7 +619,8 @@ absl::Status ProtoFieldToValue(const google::protobuf::Message& proto, if (!type->IsDate() && !type->IsTimestamp() && !type->IsArray() && !type->IsTime() && !type->IsDatetime() && !type->IsGeography() && !type->IsNumericType() && !type->IsBigNumericType() && - !type->IsRange() && !type->IsInterval() && !type->IsJsonType()) { + !type->IsTokenList() && !type->IsRange() && !type->IsInterval() && + !type->IsJsonType()) { ZETASQL_RET_CHECK_EQ(FieldFormat::DEFAULT_FORMAT, field_format) << "Format " << FieldFormat::Format_Name(field_format) << " not supported for zetasql type " << type->DebugString(); @@ -954,6 +966,19 @@ absl::Status ProtoFieldToValue(const google::protobuf::Message& proto, *value_out = Value::Json(std::move(json_value)); return absl::OkStatus(); } + case TypeKind::TYPE_TOKENLIST: { + ZETASQL_RET_CHECK_EQ(google::protobuf::FieldDescriptor::CPPTYPE_STRING, field->cpp_type()) + << field->DebugString(); + ZETASQL_RET_CHECK_EQ(FieldFormat::TOKENLIST, field_format) + << FieldFormat::Format_Name(field_format); + std::string value = + field->is_repeated() + ? 
reflection->GetRepeatedString(proto, field, index) + : reflection->GetString(proto, field); + *value_out = + Value::TokenList(tokens::TokenList::FromBytes(std::move(value))); + return absl::OkStatus(); + } case TypeKind::TYPE_RANGE: { ZETASQL_RET_CHECK_EQ(google::protobuf::FieldDescriptor::CPPTYPE_STRING, field->cpp_type()) << field->DebugString(); @@ -965,41 +990,26 @@ absl::Status ProtoFieldToValue(const google::protobuf::Message& proto, field->is_repeated() ? reflection->GetRepeatedString(proto, field, index) : reflection->GetString(proto, field); - - Value start, end; switch (field_format) { case FieldFormat::RANGE_DATES_ENCODED: { - ZETASQL_ASSIGN_OR_RETURN(RangeBoundaries range, - DeserializeRangeFromBytes(value)); - start = range.start ? Value::Date(*range.start) : Value::NullDate(); - end = range.end ? Value::Date(*range.end) : Value::NullDate(); + ZETASQL_ASSIGN_OR_RETURN(*value_out, DeserializeRangeValueFromBytes( + types::DateRangeType(), value)); break; } case FieldFormat::RANGE_DATETIMES_ENCODED: { - ZETASQL_ASSIGN_OR_RETURN(RangeBoundaries range, - DeserializeRangeFromBytes(value)); - start = range.start ? Value::DatetimeFromPacked64Micros(*range.start) - : Value::NullDatetime(); - end = range.end ? Value::DatetimeFromPacked64Micros(*range.end) - : Value::NullDatetime(); + ZETASQL_ASSIGN_OR_RETURN(*value_out, DeserializeRangeValueFromBytes( + types::DatetimeRangeType(), value)); break; } case FieldFormat::RANGE_TIMESTAMPS_ENCODED: { - ZETASQL_ASSIGN_OR_RETURN(RangeBoundaries range, - DeserializeRangeFromBytes(value)); - start = range.start ? Value::TimestampFromUnixMicros(*range.start) - : Value::NullTimestamp(); - end = range.end ? 
Value::TimestampFromUnixMicros(*range.end) - : Value::NullTimestamp(); + ZETASQL_ASSIGN_OR_RETURN(*value_out, DeserializeRangeValueFromBytes( + types::TimestampRangeType(), value)); break; } default: ZETASQL_RET_CHECK_FAIL() << "Unsupported RANGE field format: " << FieldFormat::Format_Name(field_format); } - ZETASQL_ASSIGN_OR_RETURN(*value_out, - Value::MakeRange(std::move(start), std::move(end))); - return absl::OkStatus(); } default: diff --git a/zetasql/public/proto_value_conversion_test.cc b/zetasql/public/proto_value_conversion_test.cc index a8f4e07c1..1159f0ac4 100644 --- a/zetasql/public/proto_value_conversion_test.cc +++ b/zetasql/public/proto_value_conversion_test.cc @@ -40,6 +40,7 @@ #include "zetasql/public/options.pb.h" #include "zetasql/public/proto/type_annotation.pb.h" #include "zetasql/public/simple_catalog.h" +#include "zetasql/public/token_list_util.h" #include "zetasql/public/type.h" #include "zetasql/public/type.pb.h" #include "zetasql/public/types/struct_type.h" @@ -48,7 +49,6 @@ #include "zetasql/resolved_ast/resolved_node_kind.pb.h" #include "gmock/gmock.h" #include "gtest/gtest.h" -#include "absl/flags/flag.h" #include "absl/functional/bind_front.h" #include "absl/status/status.h" #include "absl/status/statusor.h" @@ -94,7 +94,7 @@ class ProtoValueConversionTest : public ::testing::Test { ~ProtoValueConversionTest() override = default; - absl::Status ParseLiteralExpression(const std::string& expression_sql, + absl::Status ParseLiteralExpression(absl::string_view expression_sql, Value* value_out) { std::unique_ptr output; LanguageOptions language_options; @@ -104,6 +104,7 @@ class ProtoValueConversionTest : public ::testing::Test { language_options.EnableLanguageFeature(FEATURE_BIGNUMERIC_TYPE); language_options.EnableLanguageFeature(FEATURE_JSON_TYPE); language_options.EnableLanguageFeature(FEATURE_INTERVAL_TYPE); + language_options.EnableLanguageFeature(FEATURE_TOKENIZED_SEARCH); language_options.EnableLanguageFeature(FEATURE_RANGE_TYPE); 
ZETASQL_RETURN_IF_ERROR(AnalyzeExpression(expression_sql, AnalyzerOptions(language_options), @@ -150,14 +151,14 @@ class ProtoValueConversionTest : public ::testing::Test { // Performs the round-trip test for the given SQL expression and options. // Returns true if no test error was encountered and false otherwise. - bool RoundTripTest(const std::string& expression_sql, + bool RoundTripTest(absl::string_view expression_sql, const ConvertTypeToProtoOptions& options) { DoRoundTripTest(expression_sql, options); return !::testing::Test::HasFailure(); } absl::StatusOr> LiteralValueToProto( - const std::string& expression_sql, + absl::string_view expression_sql, const ConvertTypeToProtoOptions& options) { Value value; ZETASQL_RETURN_IF_ERROR(ParseLiteralExpression(expression_sql, &value)); @@ -191,7 +192,7 @@ class ProtoValueConversionTest : public ::testing::Test { TypeFactory type_factory_; private: - void DoRoundTripTest(const std::string& expression_sql, + void DoRoundTripTest(absl::string_view expression_sql, const ConvertTypeToProtoOptions& options) { // Print out the current state in case of failure. 
SCOPED_TRACE(absl::Substitute( @@ -274,6 +275,7 @@ TEST_F(ProtoValueConversionTest, RoundTrip) { "AS BIGNUMERIC))", "STRUCT(CAST(NULL AS GEOGRAPHY))", "STRUCT(CAST(NULL AS JSON))", + "STRUCT(CAST(NULL AS TOKENLIST))", "STRUCT(RANGE '[2022-12-06, 2022-12-07)')", "STRUCT(CAST(NULL AS RANGE))", "STRUCT(RANGE '[2022-12-05 16:44:00.000007, 2022-12-05 " @@ -369,8 +371,8 @@ TEST_F(ProtoValueConversionTest, RoundTrip) { "[CAST(NULL AS STRUCT>)]", "[CAST(NULL AS zetasql.ProtoTypeProto)]", "[CAST(NULL AS NUMERIC)]", "[CAST(NULL AS BIGNUMERIC)]", "[CAST(NULL AS GEOGRAPHY)]", - "[CAST(NULL AS RANGE)]", "[CAST(NULL AS RANGE)]", - "[CAST(NULL AS RANGE)]", + "[CAST(NULL AS TOKENLIST)]", "[CAST(NULL AS RANGE)]", + "[CAST(NULL AS RANGE)]", "[CAST(NULL AS RANGE)]", "[CAST(NULL AS JSON)]"}; for (bool array_wrappers : {true, false}) { @@ -730,6 +732,25 @@ TEST_F(ProtoValueConversionTest, InvalidRange) { HasSubstr("Too few bytes to read RANGE"))); } +// Go through the motions on a non-NULL tokenlist. +TEST_F(ProtoValueConversionTest, TokenListRoundTrip) { + const Value t = TokenListFromStringArray({"tokenlist"}); + + const StructType* struct_type = nullptr; + ZETASQL_ASSERT_OK(type_factory_.MakeStructType({{"t", type_factory_.get_tokenlist()}}, + &struct_type)); + + const Value value = Value::Struct(struct_type, {&t, 1}); + const ConvertTypeToProtoOptions options; + std::unique_ptr proto; + ZETASQL_ASSERT_OK(ValueToProto(value, options, &proto)); + + Value result_value; + ZETASQL_ASSERT_OK(ConvertProtoMessageToStructOrArrayValue(*proto, value.type(), + &result_value)); + EXPECT_EQ(result_value.field(0), t); +} + // Verify MergeValueToProtoField using various combinations of destination proto // field type, label and ZetaSQL format annotation. 
class MergeValueToProtoFieldTypeCombinationsTest diff --git a/zetasql/public/signature_match_result.cc b/zetasql/public/signature_match_result.cc index 14d1ac7b1..edcbcd1f4 100644 --- a/zetasql/public/signature_match_result.cc +++ b/zetasql/public/signature_match_result.cc @@ -49,8 +49,7 @@ void SignatureMatchResult::UpdateFromResult( literals_coerced_ += other_result.literals_coerced_; literals_distance_ += other_result.literals_distance_; mismatch_message_ = other_result.mismatch_message(); - tvf_mismatch_message_ = other_result.tvf_mismatch_message(); - tvf_bad_argument_index_ = other_result.tvf_bad_argument_index(); + bad_argument_index_ = other_result.bad_argument_index(); tvf_relation_coercion_map_ = other_result.tvf_relation_coercion_map_; } diff --git a/zetasql/public/signature_match_result.h b/zetasql/public/signature_match_result.h index 1a393c1bf..29a4d3c80 100644 --- a/zetasql/public/signature_match_result.h +++ b/zetasql/public/signature_match_result.h @@ -82,22 +82,12 @@ class SignatureMatchResult { mismatch_message_ = message; } - // Error message for why TVF signature doesn't match the function call. - // If we use mismatch_message_ for tvf, the existing tvf error message will be - // changed to include detail about all mismatch cases even if we don't enable - // the detailed mismatch error behavior. - // TODO: merge the tvf code path with the general detailed - // mismatch path. - std::string tvf_mismatch_message() const { return tvf_mismatch_message_; } - void set_tvf_mismatch_message(absl::string_view message) { - ABSL_DCHECK(tvf_mismatch_message_.empty()) << tvf_mismatch_message_; - tvf_mismatch_message_ = message; - } - - int tvf_bad_argument_index() const { return tvf_bad_argument_index_; } - void set_tvf_bad_argument_index(int index) { - tvf_bad_argument_index_ = index; - } + // Index of argument that causes the signature to mismatch, or -1 if unknown. + // For now, it's only set when resolving TVFs. 
+ // TODO: set this for other mismatch cases for precise error + // cursor for functions with a single signature. + int bad_argument_index() const { return bad_argument_index_; } + void set_bad_argument_index(int index) { bad_argument_index_ = index; } struct ArgumentColumnPair { int argument_index = 0; @@ -142,14 +132,13 @@ class SignatureMatchResult { int literals_coerced_; // Number of literal coercions. int literals_distance_; // How far non-literals were coerced. - // If the TVF call was invalid because of a particular argument, this + // If the function call was invalid because of a particular argument, this // zero-based index is updated to indicate which argument was invalid. - int tvf_bad_argument_index_ = -1; + int bad_argument_index_ = -1; bool allow_mismatch_message_ = false; std::string mismatch_message_; - std::string tvf_mismatch_message_; // If the TVF call was valid, this stores type coercions necessary for // relation arguments. The key is (argument index, column index) where the diff --git a/zetasql/public/simple_catalog.cc b/zetasql/public/simple_catalog.cc index b58423ca6..a5dde806f 100644 --- a/zetasql/public/simple_catalog.cc +++ b/zetasql/public/simple_catalog.cc @@ -41,6 +41,7 @@ #include "zetasql/public/table_valued_function.h" #include "zetasql/public/types/annotation.h" #include "zetasql/public/types/type_deserializer.h" +#include "zetasql/public/types/type_factory.h" #include "zetasql/base/case.h" #include "absl/container/btree_map.h" #include "absl/container/flat_hash_map.h" @@ -51,6 +52,7 @@ #include "absl/strings/str_cat.h" #include "absl/strings/string_view.h" #include "absl/synchronization/mutex.h" +#include "absl/types/span.h" #include "zetasql/base/map_util.h" #include "zetasql/base/source_location.h" #include "zetasql/base/ret_check.h" @@ -326,7 +328,7 @@ void SimpleCatalog::AddModel(absl::string_view name, const Model* model) { zetasql_base::InsertOrDie(&models_, absl::AsciiStrToLower(name), model); } -void 
SimpleCatalog::AddConnection(const std::string& name, +void SimpleCatalog::AddConnection(absl::string_view name, const Connection* connection) { absl::MutexLock l(&mutex_); zetasql_base::InsertOrDie(&connections_, absl::AsciiStrToLower(name), connection); @@ -343,7 +345,7 @@ void SimpleCatalog::AddType(absl::string_view name, const Type* type) { ABSL_CHECK(types_.insert({absl::AsciiStrToLower(name), type}).second); } -void SimpleCatalog::AddCatalog(const std::string& name, Catalog* catalog) { +void SimpleCatalog::AddCatalog(absl::string_view name, Catalog* catalog) { absl::MutexLock l(&mutex_); AddCatalogLocked(name, catalog); } @@ -352,7 +354,7 @@ void SimpleCatalog::AddCatalogLocked(absl::string_view name, Catalog* catalog) { zetasql_base::InsertOrDie(&catalogs_, absl::AsciiStrToLower(name), catalog); } -void SimpleCatalog::AddFunctionLocked(const std::string& name, +void SimpleCatalog::AddFunctionLocked(absl::string_view name, const Function* function) { zetasql_base::InsertOrDie(&functions_, absl::AsciiStrToLower(name), function); if (!function->alias_name().empty() && @@ -369,13 +371,13 @@ void SimpleCatalog::AddFunction(const std::string& name, } void SimpleCatalog::AddTableValuedFunctionLocked( - const std::string& name, const TableValuedFunction* table_function) { + absl::string_view name, const TableValuedFunction* table_function) { zetasql_base::InsertOrDie(&table_valued_functions_, absl::AsciiStrToLower(name), table_function); } void SimpleCatalog::AddTableValuedFunction( - const std::string& name, const TableValuedFunction* function) { + absl::string_view name, const TableValuedFunction* function) { absl::MutexLock l(&mutex_); AddTableValuedFunctionLocked(name, function); } @@ -386,13 +388,13 @@ void SimpleCatalog::AddProcedure(absl::string_view name, zetasql_base::InsertOrDie(&procedures_, absl::AsciiStrToLower(name), procedure); } -void SimpleCatalog::AddConstant(const std::string& name, +void SimpleCatalog::AddConstant(absl::string_view name, const 
Constant* constant) { absl::MutexLock l(&mutex_); AddConstantLocked(name, constant); } -void SimpleCatalog::AddConstantLocked(const std::string& name, +void SimpleCatalog::AddConstantLocked(absl::string_view name, const Constant* constant) { zetasql_base::InsertOrDie(&constants_, absl::AsciiStrToLower(name), constant); } @@ -420,14 +422,14 @@ void SimpleCatalog::AddOwnedTable(absl::string_view name, const Table* table) { AddOwnedTable(name, absl::WrapUnique(table)); } -void SimpleCatalog::AddOwnedModel(const std::string& name, +void SimpleCatalog::AddOwnedModel(absl::string_view name, std::unique_ptr model) { AddModel(name, model.get()); absl::MutexLock l(&mutex_); owned_models_.emplace_back(std::move(model)); } -void SimpleCatalog::AddOwnedModel(const std::string& name, const Model* model) { +void SimpleCatalog::AddOwnedModel(absl::string_view name, const Model* model) { AddOwnedModel(name, absl::WrapUnique(model)); } @@ -473,14 +475,14 @@ void SimpleCatalog::AddOwnedTableValuedFunction( } void SimpleCatalog::AddOwnedTableValuedFunctionLocked( - const std::string& name, + absl::string_view name, std::unique_ptr table_function) { AddTableValuedFunctionLocked(name, table_function.get()); owned_table_valued_functions_.emplace_back(std::move(table_function)); } void SimpleCatalog::AddOwnedProcedure( - const std::string& name, std::unique_ptr procedure) { + absl::string_view name, std::unique_ptr procedure) { AddProcedure(name, procedure.get()); absl::MutexLock l(&mutex_); owned_procedures_.push_back(std::move(procedure)); @@ -498,7 +500,7 @@ bool SimpleCatalog::AddOwnedProcedureIfNotPresent( return true; } -void SimpleCatalog::AddOwnedProcedure(const std::string& name, +void SimpleCatalog::AddOwnedProcedure(absl::string_view name, const Procedure* procedure) { AddOwnedProcedure(name, absl::WrapUnique(procedure)); } @@ -593,14 +595,14 @@ void SimpleCatalog::AddOwnedCatalog(Catalog* catalog) { AddOwnedCatalog(absl::WrapUnique(catalog)); } -void 
SimpleCatalog::AddOwnedCatalogLocked(const std::string& name, +void SimpleCatalog::AddOwnedCatalogLocked(absl::string_view name, std::unique_ptr catalog) { AddCatalogLocked(name, catalog.get()); owned_catalogs_.emplace_back(std::move(catalog)); } bool SimpleCatalog::AddOwnedCatalogIfNotPresent( - const std::string& name, std::unique_ptr catalog) { + absl::string_view name, std::unique_ptr catalog) { absl::MutexLock l(&mutex_); if (catalogs_.contains(absl::AsciiStrToLower(name))) { return false; @@ -656,7 +658,7 @@ void SimpleCatalog::AddOwnedTableValuedFunction( } bool SimpleCatalog::AddOwnedTableValuedFunctionIfNotPresent( - const std::string& name, + absl::string_view name, std::unique_ptr* table_function) { absl::MutexLock l(&mutex_); // If the table function name exists, return false. @@ -713,7 +715,7 @@ void SimpleCatalog::AddOwnedConstant(const Constant* constant) { } void SimpleCatalog::AddOwnedConnection( - const std::string& name, std::unique_ptr connection) { + absl::string_view name, std::unique_ptr connection) { AddConnection(name, connection.get()); absl::MutexLock l(&mutex_); owned_connections_.push_back(std::move(connection)); @@ -1115,8 +1117,17 @@ absl::StatusOr> SimpleCatalog::Deserialize( const absl::Span pools, const ExtendedTypeDeserializer* extended_type_deserializer) { // Create a top level catalog that owns the TypeFactory. 
- std::unique_ptr catalog(new SimpleCatalog(proto.name())); + return Deserialize(proto, pools, /*type_factory=*/nullptr, + extended_type_deserializer); +} +absl::StatusOr> SimpleCatalog::Deserialize( + const SimpleCatalogProto& proto, + const absl::Span pools, + zetasql::TypeFactory* type_factory, + const ExtendedTypeDeserializer* extended_type_deserializer) { + std::unique_ptr catalog( + new SimpleCatalog(proto.name(), type_factory)); ZETASQL_RETURN_IF_ERROR( catalog->DeserializeImpl(proto, TypeDeserializer(catalog->type_factory(), pools, @@ -1332,6 +1343,15 @@ absl::Status SimpleCatalog::GetFunctions( return absl::OkStatus(); } +absl::Status SimpleCatalog::GetTableValuedFunctions( + absl::flat_hash_set* output) const { + ZETASQL_RET_CHECK_NE(output, nullptr); + ZETASQL_RET_CHECK(output->empty()); + absl::MutexLock lock(&mutex_); + InsertValuesFromMap(table_valued_functions_, output); + return absl::OkStatus(); +} + std::vector SimpleCatalog::table_names() const { absl::MutexLock l(&mutex_); std::vector table_names; @@ -1457,7 +1477,7 @@ SimpleTable::SimpleTable(absl::string_view name, } SimpleTable::SimpleTable(absl::string_view name, - const std::vector& columns, + absl::Span columns, const int64_t serialization_id) : name_(name), id_(serialization_id) { for (const NameAndAnnotatedType& name_and_annotated_type : columns) { @@ -1562,7 +1582,7 @@ absl::Status SimpleTable::InsertColumnToColumnMap(const Column* column) { return absl::OkStatus(); } -void SimpleTable::SetContents(const std::vector>& rows) { +void SimpleTable::SetContents(absl::Span> rows) { column_major_contents_.clear(); column_major_contents_.resize(NumColumns()); for (int i = 0; i < NumColumns(); ++i) { @@ -1835,8 +1855,8 @@ absl::StatusOr> SimpleConstant::Deserialize( } SimpleModel::SimpleModel(const std::string& name, - const std::vector& inputs, - const std::vector& outputs, + absl::Span inputs, + absl::Span outputs, const int64_t id) : name_(name), id_(id) { for (const NameAndType& 
name_and_type : inputs) { diff --git a/zetasql/public/simple_catalog.h b/zetasql/public/simple_catalog.h index 859358836..2f0fd747f 100644 --- a/zetasql/public/simple_catalog.h +++ b/zetasql/public/simple_catalog.h @@ -38,6 +38,7 @@ #include "zetasql/public/type.h" #include "zetasql/public/types/annotation.h" #include "zetasql/public/types/type_deserializer.h" +#include "zetasql/public/types/type_factory.h" #include "zetasql/public/value.h" #include "absl/base/macros.h" #include "absl/base/thread_annotations.h" @@ -174,21 +175,20 @@ class SimpleCatalog : public EnumerableCatalog { void AddModel(absl::string_view name, const Model* model) ABSL_LOCKS_EXCLUDED(mutex_); void AddModel(const Model* model) ABSL_LOCKS_EXCLUDED(mutex_); - void AddOwnedModel(const std::string& name, - std::unique_ptr model) + void AddOwnedModel(absl::string_view name, std::unique_ptr model) ABSL_LOCKS_EXCLUDED(mutex_); void AddOwnedModel(std::unique_ptr model) ABSL_LOCKS_EXCLUDED(mutex_); - void AddOwnedModel(const std::string& name, const Model* model); + void AddOwnedModel(absl::string_view name, const Model* model); void AddOwnedModel(const Model* model) ABSL_LOCKS_EXCLUDED(mutex_); bool AddOwnedModelIfNotPresent(std::unique_ptr model) ABSL_LOCKS_EXCLUDED(mutex_); // Connections - void AddConnection(const std::string& name, const Connection* connection) + void AddConnection(absl::string_view name, const Connection* connection) ABSL_LOCKS_EXCLUDED(mutex_); void AddConnection(const Connection* connection) ABSL_LOCKS_EXCLUDED(mutex_); - void AddOwnedConnection(const std::string& name, + void AddOwnedConnection(absl::string_view name, std::unique_ptr connection) ABSL_LOCKS_EXCLUDED(mutex_); void AddOwnedConnection(std::unique_ptr connection) @@ -208,7 +208,7 @@ class SimpleCatalog : public EnumerableCatalog { ABSL_LOCKS_EXCLUDED(mutex_); // Catalogs - void AddCatalog(const std::string& name, Catalog* catalog) + void AddCatalog(absl::string_view name, Catalog* catalog) 
ABSL_LOCKS_EXCLUDED(mutex_); void AddCatalog(Catalog* catalog) ABSL_LOCKS_EXCLUDED(mutex_); void AddOwnedCatalog(const std::string& name, @@ -218,7 +218,7 @@ class SimpleCatalog : public EnumerableCatalog { ABSL_LOCKS_EXCLUDED(mutex_); void AddOwnedCatalog(const std::string& name, Catalog* catalog); void AddOwnedCatalog(Catalog* catalog) ABSL_LOCKS_EXCLUDED(mutex_); - bool AddOwnedCatalogIfNotPresent(const std::string& name, + bool AddOwnedCatalogIfNotPresent(absl::string_view name, std::unique_ptr catalog) ABSL_LOCKS_EXCLUDED(mutex_); @@ -243,7 +243,7 @@ class SimpleCatalog : public EnumerableCatalog { void AddOwnedFunction(const Function* function) ABSL_LOCKS_EXCLUDED(mutex_); // Table Valued Functions - void AddTableValuedFunction(const std::string& name, + void AddTableValuedFunction(absl::string_view name, const TableValuedFunction* function) ABSL_LOCKS_EXCLUDED(mutex_); void AddTableValuedFunction(const TableValuedFunction* function) @@ -254,7 +254,7 @@ class SimpleCatalog : public EnumerableCatalog { void AddOwnedTableValuedFunction( std::unique_ptr function); bool AddOwnedTableValuedFunctionIfNotPresent( - const std::string& name, + absl::string_view name, std::unique_ptr* table_function); bool AddOwnedTableValuedFunctionIfNotPresent( std::unique_ptr* table_function); @@ -266,18 +266,18 @@ class SimpleCatalog : public EnumerableCatalog { // Procedures void AddProcedure(absl::string_view name, const Procedure* procedure); void AddProcedure(const Procedure* procedure) ABSL_LOCKS_EXCLUDED(mutex_); - void AddOwnedProcedure(const std::string& name, + void AddOwnedProcedure(absl::string_view name, std::unique_ptr procedure); void AddOwnedProcedure(std::unique_ptr procedure) ABSL_LOCKS_EXCLUDED(mutex_); bool AddOwnedProcedureIfNotPresent(std::unique_ptr procedure) ABSL_LOCKS_EXCLUDED(mutex_); - void AddOwnedProcedure(const std::string& name, const Procedure* procedure); + void AddOwnedProcedure(absl::string_view name, const Procedure* procedure); void 
AddOwnedProcedure(const Procedure* procedure) ABSL_LOCKS_EXCLUDED(mutex_);
 
   // Constants
-  void AddConstant(const std::string& name, const Constant* constant);
+  void AddConstant(absl::string_view name, const Constant* constant);
   void AddConstant(const Constant* constant) ABSL_LOCKS_EXCLUDED(mutex_);
   void AddOwnedConstant(const std::string& name,
                         std::unique_ptr<SimpleConstant> constant);
@@ -450,6 +450,14 @@ class SimpleCatalog : public EnumerableCatalog {
       const SimpleCatalogProto& proto,
       absl::Span<const google::protobuf::DescriptorPool* const> pools,
       const ExtendedTypeDeserializer* extended_type_deserializer = nullptr);
+  // Same as the `Deserialize()` above, except that callers can provide a
+  // `type_factory`. If the provided `type_factory` is nullptr, this falls
+  // back to the `Deserialize()` above.
+  static absl::StatusOr<std::unique_ptr<SimpleCatalog>> Deserialize(
+      const SimpleCatalogProto& proto,
+      absl::Span<const google::protobuf::DescriptorPool* const> pools,
+      zetasql::TypeFactory* type_factory,
+      const ExtendedTypeDeserializer* extended_type_deserializer);
   ABSL_DEPRECATED("Inline me!")
   static absl::Status Deserialize(
       const SimpleCatalogProto& proto,
@@ -481,6 +489,8 @@ class SimpleCatalog : public EnumerableCatalog {
       absl::flat_hash_set<const Type*>* output) const override;
   absl::Status GetFunctions(
       absl::flat_hash_set<const Function*>* output) const override;
+  absl::Status GetTableValuedFunctions(
+      absl::flat_hash_set<const TableValuedFunction*>* output) const override;
 
   // Accessors for reading a copy of the object lists in this SimpleCatalog.
   // This is intended primarily for tests.
@@ -524,24 +534,24 @@ class SimpleCatalog : public EnumerableCatalog {
   // Helper methods for adding objects while holding `mutex_`.
   void AddCatalogLocked(absl::string_view name, Catalog* catalog)
       ABSL_EXCLUSIVE_LOCKS_REQUIRED(mutex_);
-  void AddOwnedCatalogLocked(const std::string& name,
+  void AddOwnedCatalogLocked(absl::string_view name,
                              std::unique_ptr<Catalog> catalog)
       ABSL_EXCLUSIVE_LOCKS_REQUIRED(mutex_);
 
   // TODO: Refactor the Add*() methods for other object types
   // to use a common locked implementation, similar to these for Function.
- void AddFunctionLocked(const std::string& name, const Function* function) + void AddFunctionLocked(absl::string_view name, const Function* function) ABSL_EXCLUSIVE_LOCKS_REQUIRED(mutex_); void AddOwnedFunctionLocked(const std::string& name, std::unique_ptr function) ABSL_EXCLUSIVE_LOCKS_REQUIRED(mutex_); - void AddTableValuedFunctionLocked(const std::string& name, + void AddTableValuedFunctionLocked(absl::string_view name, const TableValuedFunction* table_function) ABSL_EXCLUSIVE_LOCKS_REQUIRED(mutex_); void AddOwnedTableValuedFunctionLocked( - const std::string& name, + absl::string_view name, std::unique_ptr table_function) ABSL_EXCLUSIVE_LOCKS_REQUIRED(mutex_); - void AddConstantLocked(const std::string& name, const Constant* constant) + void AddConstantLocked(absl::string_view name, const Constant* constant) ABSL_EXCLUSIVE_LOCKS_REQUIRED(mutex_); int RemoveFunctionsLocked( @@ -649,7 +659,7 @@ class SimpleTable : public Table { SimpleTable(absl::string_view name, const std::vector& columns, int64_t serialization_id = 0); SimpleTable(absl::string_view name, - const std::vector& columns, + absl::Span columns, int64_t serialization_id = 0); // Make a table with the given Columns. @@ -762,7 +772,7 @@ class SimpleTable : public Table { // CreateEvaluatorTableIterator() is called. // CAVEAT: This is not preserved by serialization/deserialization. It is only // relevant to users of the evaluator API defined in public/evaluator.h. - void SetContents(const std::vector>& rows); + void SetContents(absl::Span> rows); absl::StatusOr> CreateEvaluatorTableIterator( @@ -912,8 +922,8 @@ class SimpleModel : public Model { // Make a model with input and output columns with the given names and types. // Crashes if there are duplicate column names. 
typedef std::pair NameAndType; - SimpleModel(const std::string& name, const std::vector& inputs, - const std::vector& outputs, int64_t id = 0); + SimpleModel(const std::string& name, absl::Span inputs, + absl::Span outputs, int64_t id = 0); // Make a model with the given inputs and outputs. // Crashes if there are duplicate column names. diff --git a/zetasql/public/simple_token_list.cc b/zetasql/public/simple_token_list.cc new file mode 100644 index 000000000..2be23d405 --- /dev/null +++ b/zetasql/public/simple_token_list.cc @@ -0,0 +1,79 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// + +#include "zetasql/public/simple_token_list.h" + +#include +#include +#include +#include + +#include "zetasql/public/simple_token_list.pb.h" +#include "absl/strings/escaping.h" +#include "absl/strings/str_cat.h" +#include "absl/strings/str_format.h" + +namespace zetasql::tokens { + +TextToken TextToken::Make(std::string text) { + TextToken text_token; + text_token.text_ = text; + return text_token; +} + +std::string TextToken::DebugString() const { + std::string out; + absl::StrAppend(&out, "{"); + absl::StrAppendFormat(&out, "text: '%s'", absl::Utf8SafeCEscape(text())); + absl::StrAppend(&out, "}"); + return out; +} + +std::string TokenList::GetBytes() const { + SimpleTokenListProto proto; + proto.mutable_token()->Assign(data_.begin(), data_.end()); + std::string serialized; + proto.SerializeToString(&serialized); + return serialized; +} + +TokenList TokenList::FromBytes(std::string serialized) { + SimpleTokenListProto proto; + if (proto.ParseFromString(serialized)) { + std::vector tokenlist(proto.token().begin(), + proto.token().end()); + return TokenList(tokenlist); + } else { + return TokenList(); + } +} + +bool TokenList::IsValid() const { return true; } + +size_t TokenList::SpaceUsed() const { + size_t allocated_string_size = std::accumulate( + data_.begin(), data_.end(), 0, + [](size_t size, const auto& str) -> size_t { + size += + (str.capacity() > std::string().capacity() ? str.capacity() : 0); + return size; + }); + + return sizeof(TokenList) + data_.capacity() * sizeof(std::string) + + allocated_string_size; +} + +} // namespace zetasql::tokens diff --git a/zetasql/public/simple_token_list.h b/zetasql/public/simple_token_list.h new file mode 100644 index 000000000..e213e2b7e --- /dev/null +++ b/zetasql/public/simple_token_list.h @@ -0,0 +1,129 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+
+#ifndef ZETASQL_PUBLIC_SIMPLE_TOKEN_LIST_H_
+#define ZETASQL_PUBLIC_SIMPLE_TOKEN_LIST_H_
+
+#include <cstddef>
+#include <string>
+#include <utility>
+#include <vector>
+
+#include "absl/status/status.h"
+#include "absl/status/statusor.h"
+#include "absl/strings/string_view.h"
+
+namespace zetasql::tokens {
+
+// TextToken stores token data for snippeting and scoring. TextTokens usually
+// come in a list.
+class TextToken {
+ public:
+  static TextToken Make(std::string text);
+
+  TextToken() = default;
+
+  void set_text(std::string text) { text_ = std::move(text); }
+  absl::string_view text() const { return text_; }
+
+  std::string DebugString() const;
+
+  friend bool operator==(const TextToken& a, const TextToken& b) {
+    return a.text_ == b.text_;
+  }
+
+  friend bool operator!=(const TextToken& a, const TextToken& b) {
+    return !(a == b);
+  }
+
+  template <typename H>
+  friend H AbslHashValue(H h, const TextToken& t) {
+    return H::combine(std::move(h), t.text_);
+  }
+
+ private:
+  std::string text_;
+};
+
+// A TokenList is an ordered list of tokens (used for snippeting, etc).
+class TokenList {
+ public:
+  TokenList() = default;
+  explicit TokenList(std::vector<std::string> data) : data_(std::move(data)) {}
+
+  TokenList(TokenList&& other) = default;
+  TokenList& operator=(TokenList&& other) = default;
+
+  std::string GetBytes() const;
+  // Constructs a TokenList from serialized bytes.
+  static TokenList FromBytes(std::string serialized);
+
+  // Allows iteration over this TokenList.
+  class Iterator {
+   public:
+    // Returns true if there is no more data to be iterated over.
+    bool done() const { return cur_ == data_.size(); }
+    // Retrieves the next TextToken from the TokenList being iterated over.
+    absl::Status Next(TextToken& token) {
+      if (done()) {
+        return absl::OutOfRangeError("TokenList iterator is exhausted");
+      }
+      token.set_text(data_[cur_++]);
+      return absl::OkStatus();
+    }
+
+   private:
+    friend class TokenList;
+    explicit Iterator(const std::vector<std::string>& data) : data_(data) {}
+
+    const std::vector<std::string>& data_;
+    size_t cur_ = 0;
+  };
+
+  absl::StatusOr<Iterator> GetIterator() const { return Iterator(data_); }
+
+  // Logical equality comparison for TokenList objects.
+  bool EquivalentTo(const TokenList& other) const {
+    return data_ == other.data_;
+  }
+
+  // Hash method for TokenList objects. Not meant to be efficient.
+  template <typename H>
+  friend H AbslHashValue(H h, const TokenList& token_list) {
+    auto iter = token_list.GetIterator();
+    if (!iter.ok()) {
+      return H::combine(std::move(h), 0);
+    }
+    TextToken t;
+    while (!iter->done()) {
+      if (!iter->Next(t).ok()) {
+        return H::combine(std::move(h), 0);
+      }
+      h = H::combine(std::move(h), t);
+    }
+    return h;
+  }
+
+  // Convenience: checks that the TokenList is well-formed by attempting to
+  // fully decode it. An empty TokenList is considered valid.
+  bool IsValid() const;
+
+  // Estimated in-memory byte size.
+  size_t SpaceUsed() const;
+
+ private:
+  std::vector<std::string> data_;
+};
+
+}  // namespace zetasql::tokens
+
+#endif  // ZETASQL_PUBLIC_SIMPLE_TOKEN_LIST_H_
diff --git a/zetasql/public/simple_token_list.proto b/zetasql/public/simple_token_list.proto
new file mode 100644
index 000000000..ea3a85e2b
--- /dev/null
+++ b/zetasql/public/simple_token_list.proto
@@ -0,0 +1,26 @@
+//
+// Copyright 2019 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +syntax = "proto2"; + +package zetasql; + +option java_package = "com.google.zetasql"; +option java_outer_classname = "SimpleTokenListProtos"; + +message SimpleTokenListProto { + repeated string token = 1; +} diff --git a/zetasql/public/strings_test.cc b/zetasql/public/strings_test.cc index 39bf0cc4e..2f70cccc2 100644 --- a/zetasql/public/strings_test.cc +++ b/zetasql/public/strings_test.cc @@ -1268,7 +1268,7 @@ TEST(StringsTest, ParseIdentifierPath) { // Run each path through the parser to ensure that the vectors match. std::unique_ptr parser_output; ParserOptions parser_options; - parser_options.set_language_options(&language_options); + parser_options.set_language_options(language_options); ZETASQL_ASSERT_OK(ParseExpression(test.input, parser_options, &parser_output)); const ASTPathExpression* parsed_path = parser_output->expression()->GetAs(); @@ -1372,7 +1372,7 @@ TEST(StringsTest, ParseIdentifierPathWithSlashes) { // FROM ". 
std::unique_ptr parser_output; zetasql::ParserOptions parser_options; - parser_options.set_language_options(&language_options); + parser_options.set_language_options(language_options); ZETASQL_ASSERT_OK(zetasql::ParseStatement( absl::StrFormat("SELECT * FROM %s", test.input), parser_options, &parser_output)); diff --git a/zetasql/public/table_from_proto.cc b/zetasql/public/table_from_proto.cc index a3ca9eeb4..364149068 100644 --- a/zetasql/public/table_from_proto.cc +++ b/zetasql/public/table_from_proto.cc @@ -21,13 +21,14 @@ #include "google/protobuf/descriptor.pb.h" #include "google/protobuf/descriptor.h" #include "zetasql/public/proto/wire_format_annotation.pb.h" +#include "absl/strings/string_view.h" #include "zetasql/base/source_location.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status_macros.h" namespace zetasql { -TableFromProto::TableFromProto(const std::string& name) : SimpleTable(name) {} +TableFromProto::TableFromProto(absl::string_view name) : SimpleTable(name) {} TableFromProto::~TableFromProto() { } diff --git a/zetasql/public/table_from_proto.h b/zetasql/public/table_from_proto.h index 987056d25..e958a637b 100644 --- a/zetasql/public/table_from_proto.h +++ b/zetasql/public/table_from_proto.h @@ -22,6 +22,7 @@ #include "google/protobuf/descriptor.h" #include "zetasql/public/simple_catalog.h" #include "zetasql/public/type.h" +#include "absl/strings/string_view.h" #include "zetasql/base/status.h" namespace zetasql { @@ -49,7 +50,7 @@ struct TableFromProtoOptions { // a Catalog Table using TableFromProto. 
class TableFromProto : public SimpleTable { public: - explicit TableFromProto(const std::string& name); + explicit TableFromProto(absl::string_view name); TableFromProto(const TableFromProto&) = delete; TableFromProto& operator=(const TableFromProto&) = delete; ~TableFromProto() override; diff --git a/zetasql/public/table_name_resolver.cc b/zetasql/public/table_name_resolver.cc index 27933eb35..7ca6c95ed 100644 --- a/zetasql/public/table_name_resolver.cc +++ b/zetasql/public/table_name_resolver.cc @@ -331,6 +331,15 @@ absl::Status TableNameResolver::FindInStatement(const ASTStatement* statement) { } break; + case AST_CREATE_EXTERNAL_SCHEMA_STATEMENT: + if (analyzer_options_->language().SupportsStatementKind( + RESOLVED_CREATE_EXTERNAL_SCHEMA_STMT) && + analyzer_options_->language().LanguageFeatureEnabled( + FEATURE_EXTERNAL_SCHEMA_DDL)) { + return absl::OkStatus(); + } + break; + case AST_CREATE_SNAPSHOT_TABLE_STATEMENT: if (analyzer_options_->language().SupportsStatementKind( RESOLVED_CREATE_SNAPSHOT_TABLE_STMT)) { @@ -819,6 +828,14 @@ absl::Status TableNameResolver::FindInStatement(const ASTStatement* statement) { return absl::OkStatus(); } break; + case AST_ALTER_EXTERNAL_SCHEMA_STATEMENT: + if (analyzer_options_->language().SupportsStatementKind( + RESOLVED_ALTER_EXTERNAL_SCHEMA_STMT) && + analyzer_options_->language().LanguageFeatureEnabled( + FEATURE_EXTERNAL_SCHEMA_DDL)) { + return absl::OkStatus(); + } + break; case AST_ALTER_TABLE_STATEMENT: if (analyzer_options_->language().SupportsStatementKind( RESOLVED_ALTER_TABLE_SET_OPTIONS_STMT) || diff --git a/zetasql/public/table_valued_function.cc b/zetasql/public/table_valued_function.cc index 2376d00e1..4315b365a 100644 --- a/zetasql/public/table_valued_function.cc +++ b/zetasql/public/table_valued_function.cc @@ -37,6 +37,8 @@ #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/str_cat.h" +#include "absl/strings/string_view.h" +#include "absl/types/span.h" #include 
"zetasql/base/ret_check.h" #include "zetasql/base/status_macros.h" @@ -151,20 +153,21 @@ std::string TableValuedFunction::DebugString() const { } std::string TableValuedFunction::GetTVFSignatureErrorMessage( - const std::string& tvf_name_string, + absl::string_view tvf_name_string, const std::vector& input_arg_types, int signature_idx, const SignatureMatchResult& signature_match_result, - const LanguageOptions& language_options) const { - // Merge of tvf_mismatch_message and mismatch_message should be considered - // when show_function_signature_mismatch_details is enabled by default. - if (!signature_match_result.tvf_mismatch_message().empty()) { + const LanguageOptions& language_options, + bool show_function_signature_mismatch_details) const { + // bad_argument_index is set for some specific tvf mismatch cases. + if (signature_match_result.bad_argument_index() != -1) { // TODO: Update this error message when we support more than one // TVF signature. - return absl::StrCat(signature_match_result.tvf_mismatch_message(), " of ", + return absl::StrCat(signature_match_result.mismatch_message(), " of ", GetSupportedSignaturesUserFacingText( language_options, /*print_template_and_name_details=*/false)); - } else if (!signature_match_result.mismatch_message().empty()) { + } else if (show_function_signature_mismatch_details && + !signature_match_result.mismatch_message().empty()) { return absl::StrCat( Function::GetGenericNoMatchingFunctionSignatureErrorMessage( tvf_name_string, input_arg_types, language_options.product_mode(), @@ -643,7 +646,7 @@ absl::Status ForwardInputSchemaToOutputSchemaWithAppendedColumnTVF::Deserialize( absl::Status ForwardInputSchemaToOutputSchemaWithAppendedColumnTVF:: IsValidForwardInputSchemaToOutputSchemaWithAppendedColumnTVF( bool isTemplated, - const std::vector& extra_columns) const { + absl::Span extra_columns) const { ZETASQL_RET_CHECK(isTemplated) << "Does not support non-templated argument type"; absl::flat_hash_set name_set; diff --git 
a/zetasql/public/table_valued_function.h b/zetasql/public/table_valued_function.h index 9bcbbe64a..fdfaa4321 100644 --- a/zetasql/public/table_valued_function.h +++ b/zetasql/public/table_valued_function.h @@ -17,7 +17,6 @@ #ifndef ZETASQL_PUBLIC_TABLE_VALUED_FUNCTION_H_ #define ZETASQL_PUBLIC_TABLE_VALUED_FUNCTION_H_ -#include #include #include #include @@ -27,11 +26,10 @@ #include #include -#include "zetasql/base/logging.h" -#include "google/protobuf/descriptor.h" #include "zetasql/common/errors.h" #include "zetasql/public/catalog.h" #include "zetasql/public/deprecation_warning.pb.h" +#include "zetasql/public/evaluator_table_iterator.h" #include "zetasql/public/function.pb.h" #include "zetasql/public/function_signature.h" #include "zetasql/public/input_argument_type.h" @@ -39,16 +37,20 @@ #include "zetasql/public/options.pb.h" #include "zetasql/public/parse_location.h" #include "zetasql/public/type.h" +#include "zetasql/public/types/collation.h" #include "zetasql/public/types/type_deserializer.h" #include "zetasql/public/value.h" #include "zetasql/resolved_ast/resolved_ast_enums.pb.h" +#include "absl/base/attributes.h" +#include "zetasql/base/check.h" #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/str_cat.h" #include "absl/strings/str_join.h" #include "absl/strings/string_view.h" +#include "absl/types/span.h" #include "zetasql/base/ret_check.h" -#include "zetasql/base/status.h" +#include "zetasql/base/status_builder.h" namespace zetasql { @@ -58,6 +60,7 @@ class SignatureMatchResult; class TVFInputArgumentType; class TVFRelationColumnProto; class TVFRelationProto; +class TVFSchemaColumn; class TVFSignature; class TableValuedFunctionProto; class TableValuedFunctionOptionsProto; @@ -210,10 +213,11 @@ class TableValuedFunction { // FunctionResolver::SignatureMatches and 'language_options' should contain // the language options for the query. 
std::string GetTVFSignatureErrorMessage(
-      const std::string& tvf_name_string,
+      absl::string_view tvf_name_string,
       const std::vector<InputArgumentType>& input_arg_types, int signature_idx,
       const SignatureMatchResult& signature_match_result,
-      const LanguageOptions& language_options) const;
+      const LanguageOptions& language_options,
+      bool show_function_signature_mismatch_details) const;
 
   // Serializes this table-valued function to a protocol buffer. Subclasses may
   // override this to add more information as needed.
@@ -313,7 +317,7 @@ class TableValuedFunction {
   // arguments). If the arguments are incompatible then this method returns
   // a descriptive error message indicating the nature of the failure.
   //
-  // Otherwise, this method fills 'output_tvf_call' to indicate the result
+  // Otherwise, this method fills 'output_tvf_signature' to indicate the result
   // schema of the table returned by this TVF call.
   //
   // This method accepts a Catalog and TypeFactory for possible use when
@@ -328,14 +332,57 @@ class TableValuedFunction {
       TypeFactory* type_factory,
       std::shared_ptr<TVFSignature>* output_tvf_signature) const = 0;
 
-  const TableValuedFunctionOptions& tvf_options() const {
-    return tvf_options_;
-  }
+  const TableValuedFunctionOptions& tvf_options() const { return tvf_options_; }
 
   void set_sql_security(ResolvedCreateStatementEnums::SqlSecurity security) {
     sql_security_ = security;
   }
 
+  // Argument for a TVF evaluator table iterator. Only one field will be set.
+  struct TvfEvaluatorArg {
+    // If set, this argument is a Value.
+    std::optional<Value> value;
+    // If set, this argument is a relation.
+    std::unique_ptr<EvaluatorTableIterator> relation;
+    // If set, this argument is a model.
+    const Model* model;
+  };
+
+  // The CreateEvaluator method allows a subclass to provide TVF evaluation
+  // logic for the reference implementation. The implementation should return
+  // an iterator representing a table containing the results of the TVF call.
+  //
+  // 'input_arguments' is a list of input arguments passed to the TVF.
+  // Relational inputs are represented as table iterators. The implementation
+  // takes ownership of these iterators and can iterate in any order necessary.
+  //
+  // 'output_columns' is a list of output columns selected by the resolved
+  // TVF scan, which is a subset of the output columns returned by the Resolve
+  // method. This parameter allows the implementation to prune unused columns,
+  // and provides column names and types for TVFs with dynamic schemas.
+  //
+  // 'function_call_signature' contains the signature that matched the
+  // invocation, including concrete arguments. Set only if the invocation is
+  // ambiguous and contains omitted arguments.
+  //
+  // The output iterator must provide column names, types and values for all
+  // output_columns. If the TVF schema is fixed and pruning is not necessary,
+  // the iterator can ignore the output_columns parameter and provide columns
+  // for all output columns of the result schema. The order of columns is not
+  // relevant.
+  //
+  // Not used for zetasql analysis.
+  // Used only for evaluating queries on this table with the reference
+  // implementation, using the interfaces in evaluator.h.
+  virtual absl::StatusOr<std::unique_ptr<EvaluatorTableIterator>>
+  CreateEvaluator(std::vector<TvfEvaluatorArg> input_arguments,
+                  const std::vector<TVFSchemaColumn>& output_columns,
+                  const FunctionSignature* function_call_signature) const {
+    return zetasql_base::UnimplementedErrorBuilder()
+           << "TVF " << FullName()
+           << " does not support the API in evaluator.h";
+  }
+
  protected:
   // Returns user facing text (to be used in error messages) for the
   // specified table function. For example:
@@ -547,7 +594,7 @@ class TVFRelation {
   bool is_value_table_;
 };
 
-bool operator == (const TVFRelation& a, const TVFRelation& b);
+bool operator==(const TVFRelation& a, const TVFRelation& b);
 
 inline std::ostream& operator<<(std::ostream& out,
                                 const TVFRelation& relation) {
@@ -987,8 +1034,7 @@ class ForwardInputSchemaToOutputSchemaWithAppendedColumnTVF
   //      tvf.
   //   d.
if extra column is pseudo column, which is invalid usage for this tvf. absl::Status IsValidForwardInputSchemaToOutputSchemaWithAppendedColumnTVF( - bool isTemplated, - const std::vector& extra_columns) const; + bool isTemplated, absl::Span extra_columns) const; private: const std::vector extra_columns_; diff --git a/zetasql/public/templated_sql_tvf.cc b/zetasql/public/templated_sql_tvf.cc index 2f44e5ec2..5357bd856 100644 --- a/zetasql/public/templated_sql_tvf.cc +++ b/zetasql/public/templated_sql_tvf.cc @@ -293,7 +293,7 @@ absl::Status TemplatedSQLTVF::ForwardNestedResolutionAnalysisError( } absl::Status TemplatedSQLTVF::MakeTVFQueryAnalysisError( - const std::string& message) const { + absl::string_view message) const { std::string result = absl::StrCat("Analysis of table-valued function ", FullName(), " failed"); if (!message.empty()) { diff --git a/zetasql/public/templated_sql_tvf.h b/zetasql/public/templated_sql_tvf.h index 6f5df5c68..883321cfd 100644 --- a/zetasql/public/templated_sql_tvf.h +++ b/zetasql/public/templated_sql_tvf.h @@ -32,6 +32,7 @@ #include "zetasql/resolved_ast/resolved_ast.h" #include "absl/base/macros.h" #include "absl/status/status.h" +#include "absl/strings/string_view.h" // This file includes interfaces and classes related to templated SQL // TVFs. It includes classes to represent TemplatedSQLTVFs and their @@ -138,7 +139,7 @@ class TemplatedSQLTVF : public TableValuedFunction { // Returns a new error reporting a failed expectation of the sql_body_ // (for example, if it is a CREATE TABLE instead of a SELECT statement). // If 'message' is not empty, appends it to the end of the error string. - absl::Status MakeTVFQueryAnalysisError(const std::string& message = "") const; + absl::Status MakeTVFQueryAnalysisError(absl::string_view message = "") const; // If non-NULL, this Catalog is used when the Resolve() method is called // to resolve the TVF expression for given arguments. 
diff --git a/zetasql/public/testing/test_case_options_util.cc b/zetasql/public/testing/test_case_options_util.cc index 2eb63564e..ce29c990f 100644 --- a/zetasql/public/testing/test_case_options_util.cc +++ b/zetasql/public/testing/test_case_options_util.cc @@ -28,7 +28,7 @@ namespace zetasql { absl::StatusOr GetRequiredLanguageFeatures( const file_based_test_driver::TestCaseOptions& test_case_options) { LanguageOptions::LanguageFeatureSet enabled_set; - const std::string features_string = + const std::string& features_string = test_case_options.GetString(kLanguageFeatures); if (!features_string.empty()) { const std::vector feature_list = diff --git a/zetasql/public/token_list.h b/zetasql/public/token_list.h new file mode 100644 index 000000000..c934fbbec --- /dev/null +++ b/zetasql/public/token_list.h @@ -0,0 +1,22 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#ifndef ZETASQL_PUBLIC_TOKEN_LIST_H_ +#define ZETASQL_PUBLIC_TOKEN_LIST_H_ + +#include "zetasql/public/simple_token_list.h" + +#endif // ZETASQL_PUBLIC_TOKEN_LIST_H_ diff --git a/zetasql/public/token_list_util.cc b/zetasql/public/token_list_util.cc new file mode 100644 index 000000000..f82484c95 --- /dev/null +++ b/zetasql/public/token_list_util.cc @@ -0,0 +1,31 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+
+#include "zetasql/public/token_list_util.h"
+
+#include <string>
+#include <vector>
+
+#include "zetasql/public/simple_token_list.h"
+#include "zetasql/public/value.h"
+
+namespace zetasql {
+
+Value TokenListFromStringArray(std::vector<std::string> tokens) {
+  return Value::TokenList(tokens::TokenList(std::move(tokens)));
+}
+
+}  // namespace zetasql
diff --git a/zetasql/public/token_list_util.h b/zetasql/public/token_list_util.h
new file mode 100644
index 000000000..1b55daba3
--- /dev/null
+++ b/zetasql/public/token_list_util.h
@@ -0,0 +1,32 @@
+//
+// Copyright 2019 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+
+#ifndef ZETASQL_PUBLIC_TOKEN_LIST_UTIL_H_
+#define ZETASQL_PUBLIC_TOKEN_LIST_UTIL_H_
+
+#include <string>
+#include <vector>
+
+#include "zetasql/public/simple_token_list.h"
+#include "zetasql/public/value.h"
+
+namespace zetasql {
+
+Value TokenListFromStringArray(std::vector<std::string> tokens);
+
+}  // namespace zetasql
+
+#endif  // ZETASQL_PUBLIC_TOKEN_LIST_UTIL_H_
diff --git a/zetasql/public/type.proto b/zetasql/public/type.proto
index 2d8206d68..0f944ec9b 100644
--- a/zetasql/public/type.proto
+++ b/zetasql/public/type.proto
@@ -25,7 +25,7 @@ option cc_enable_arenas = true;
 option java_package = "com.google.zetasql";
 option java_outer_classname = "ZetaSQLType";

-// NEXT_ID: 31
+// NEXT_ID: 32
 enum TypeKind {
   // User code that switches on this enum must have a default case so
   // builds won't break if new enums get added.
@@ -74,10 +74,16 @@ enum TypeKind {
   // INTERVAL type is controlled by FEATURE_INTERVAL_TYPE
   TYPE_INTERVAL = 27;

+  // TOKENLIST type is controlled by FEATURE_TOKENIZED_SEARCH
+  TYPE_TOKENLIST = 28;
+
   // RANGE type is controlled by FEATURE_RANGE_TYPE
   TYPE_RANGE = 29;
-}

+  // MAP type is controlled by FEATURE_MAP_TYPE
+  TYPE_MAP = 31;
+}
+
 // This represents the serialized form of the zetasql::Type.
 message TypeProto {
   optional TypeKind type_kind = 1;
@@ -89,6 +95,7 @@
   optional ProtoTypeProto proto_type = 4;
   optional EnumTypeProto enum_type = 5;
   optional RangeTypeProto range_type = 8;
+  optional MapTypeProto map_type = 10;

   // These <FileDescriptorSet>s may (optionally) be populated only for
   // the 'outermost' TypeProto when serializing a ZetaSQL Type,
@@ -129,6 +136,11 @@ message StructTypeProto {
   repeated StructFieldProto field = 1;
 }

+message MapTypeProto {
+  optional TypeProto key_type = 1;
+  optional TypeProto value_type = 2;
+}
+
 message ProtoTypeProto {
   // The _full_ name of the proto without the catalog name.
optional string proto_name = 1; diff --git a/zetasql/public/types/BUILD b/zetasql/public/types/BUILD index 3b30ed907..7a29fe5dc 100644 --- a/zetasql/public/types/BUILD +++ b/zetasql/public/types/BUILD @@ -27,6 +27,7 @@ cc_library( "extended_type.cc", "internal_utils.cc", "internal_utils.h", + "map_type.cc", "proto_type.cc", "range_type.cc", "simple_type.cc", @@ -45,6 +46,7 @@ cc_library( "container_type.h", "enum_type.h", "extended_type.h", + "map_type.h", "proto_type.h", "range_type.h", "simple_type.h", @@ -72,6 +74,7 @@ cc_library( "//zetasql/common:float_margin", "//zetasql/common:proto_helper", "//zetasql/common:string_util", + "//zetasql/common:thread_stack", "//zetasql/common:unicode_utils", "//zetasql/public:annotation_cc_proto", "//zetasql/public:civil_time", @@ -82,6 +85,7 @@ cc_library( "//zetasql/public:options_cc_proto", "//zetasql/public:simple_value_cc_proto", "//zetasql/public:strings", + "//zetasql/public:token_list", "//zetasql/public:type_annotation_cc_proto", "//zetasql/public:type_cc_proto", "//zetasql/public:type_modifiers_cc_proto", @@ -137,6 +141,7 @@ cc_library( "//zetasql/public:interval_value", "//zetasql/public:json_value", "//zetasql/public:numeric_value", + "//zetasql/public:token_list", "//zetasql/public:value_content", "@com_google_absl//absl/strings:cord", "@com_google_absl//absl/types:optional", @@ -156,3 +161,18 @@ cc_library( "@com_google_absl//absl/time", ], ) + +cc_test( + name = "map_type_test", + srcs = ["map_type_test.cc"], + deps = [ + ":types", + "//zetasql/base/testing:status_matchers", + "//zetasql/base/testing:zetasql_gtest_main", + "//zetasql/public:language_options", + "//zetasql/public:value", + "//zetasql/testdata:test_schema_cc_proto", + "//zetasql/testing:test_value", + "@com_google_protobuf//:protobuf", + ], +) diff --git a/zetasql/public/types/array_type.cc b/zetasql/public/types/array_type.cc index dd69e7bbb..6f249cce7 100644 --- a/zetasql/public/types/array_type.cc +++ b/zetasql/public/types/array_type.cc @@ 
-271,16 +271,21 @@ absl::Status ArrayType::SerializeToProtoAndDistinctFileDescriptorsImpl(
       file_descriptor_set_map);
 }

-std::string ArrayType::ShortTypeName(ProductMode mode) const {
-  return absl::StrCat("ARRAY<", element_type_->ShortTypeName(mode), ">");
+std::string ArrayType::ShortTypeName(ProductMode mode,
+                                     bool use_external_float32) const {
+  return absl::StrCat(
+      "ARRAY<", element_type_->ShortTypeName(mode, use_external_float32), ">");
 }

-std::string ArrayType::TypeName(ProductMode mode) const {
-  return absl::StrCat("ARRAY<", element_type_->TypeName(mode), ">");
+std::string ArrayType::TypeName(ProductMode mode,
+                                bool use_external_float32) const {
+  return absl::StrCat("ARRAY<",
+                      element_type_->TypeName(mode, use_external_float32), ">");
 }

 absl::StatusOr<std::string> ArrayType::TypeNameWithModifiers(
-    const TypeModifiers& type_modifiers, ProductMode mode) const {
+    const TypeModifiers& type_modifiers, ProductMode mode,
+    bool use_external_float32) const {
   const TypeParameters& type_params = type_modifiers.type_parameters();
   if (!type_params.IsEmpty() && type_params.num_children() != 1) {
     return MakeSqlError()
@@ -299,14 +304,14 @@ absl::StatusOr<std::string> ArrayType::TypeNameWithModifiers(
       TypeModifiers::MakeTypeModifiers(
           type_params.IsEmpty() ? TypeParameters() : type_params.child(0),
           collation.Empty() ?
Collation() : collation.child(0)), - mode)); + mode, use_external_float32)); return absl::StrCat("ARRAY<", element_type_name, ">"); } absl::StatusOr ArrayType::ValidateAndResolveTypeParameters( const std::vector& type_parameter_values, ProductMode mode) const { - return MakeSqlError() << ShortTypeName(mode) + return MakeSqlError() << ShortTypeName(mode, /*use_external_float32=*/false) << " type cannot have type parameters by itself, it " "can only have type parameters on its element type"; } @@ -555,7 +560,8 @@ std::string ArrayType::GetFormatPrefix( break; } case Type::FormatValueContentOptions::Mode::kSQLExpression: { - prefix.append(TypeName(options.product_mode)); + prefix.append( + TypeName(options.product_mode, options.use_external_float32)); prefix.push_back('['); break; } diff --git a/zetasql/public/types/array_type.h b/zetasql/public/types/array_type.h index efa4a1c45..66795f120 100644 --- a/zetasql/public/types/array_type.h +++ b/zetasql/public/types/array_type.h @@ -61,13 +61,27 @@ class ArrayType : public ContainerType { std::string* type_description) const override; bool SupportsEquality() const override; - std::string ShortTypeName(ProductMode mode) const override; - std::string TypeName(ProductMode mode) const override; + std::string ShortTypeName(ProductMode mode, + bool use_external_float32) const override; + std::string ShortTypeName(ProductMode mode) const override { + return ShortTypeName(mode, /*use_external_float32=*/false); + } + std::string TypeName(ProductMode mode, + bool use_external_float32) const override; + std::string TypeName(ProductMode mode) const override { + return TypeName(mode, /*use_external_float32=*/false); + } // Same as above, but the type modifier values are appended to the SQL name // for this ArrayType. 
absl::StatusOr TypeNameWithModifiers( - const TypeModifiers& type_modifiers, ProductMode mode) const override; + const TypeModifiers& type_modifiers, ProductMode mode, + bool use_external_float32) const override; + absl::StatusOr TypeNameWithModifiers( + const TypeModifiers& type_modifiers, ProductMode mode) const override { + return TypeNameWithModifiers(type_modifiers, mode, + /*use_external_float32=*/false); + } bool UsingFeatureV12CivilTimeType() const override { return element_type_->UsingFeatureV12CivilTimeType(); diff --git a/zetasql/public/types/container_type.cc b/zetasql/public/types/container_type.cc index a13b6e681..6f2b57f75 100644 --- a/zetasql/public/types/container_type.cc +++ b/zetasql/public/types/container_type.cc @@ -20,10 +20,15 @@ #include #include +#include "zetasql/base/check.h" + namespace zetasql { // Format container in non-recursive way. The users might give a deeply // nested struct and cause stack overflow crashes for recursive methods +// TODO: Refactor for MAP type support. MapType is a container but +// does not inherit from the (now incorrectly named) ContainerType. Investigate +// factoring the heap-based stack structure out into a common supertype. std::string ContainerType::FormatValueContent( const ValueContent& value_content, const Type::FormatValueContentOptions& options) const { diff --git a/zetasql/public/types/container_type.h b/zetasql/public/types/container_type.h index d0104198d..89cae8325 100644 --- a/zetasql/public/types/container_type.h +++ b/zetasql/public/types/container_type.h @@ -87,18 +87,6 @@ class ContainerType : public Type { const Type* type; }; - std::string FormatValueContentContainerElement( - const internal::ValueContentContainerElement element, const Type* type, - const FormatValueContentOptions& options) const { - if (element.is_null()) { - return options.as_literal() - ? 
"NULL" - : absl::StrCat("CAST(NULL AS ", - type->TypeName(options.product_mode), ")"); - } - return type->FormatValueContent(element.value_content(), options); - } - std::optional ValueContentContainerElementLess( const internal::ValueContentContainerElement& x, const internal::ValueContentContainerElement& y, const Type* x_type, diff --git a/zetasql/public/types/enum_type.cc b/zetasql/public/types/enum_type.cc index 7d046fadd..39329b57f 100644 --- a/zetasql/public/types/enum_type.cc +++ b/zetasql/public/types/enum_type.cc @@ -37,6 +37,7 @@ #include "absl/hash/hash.h" #include "absl/status/status.h" #include "absl/strings/str_cat.h" +#include "absl/strings/string_view.h" #include "absl/types/span.h" #include "zetasql/base/status_macros.h" @@ -135,7 +136,7 @@ std::string EnumType::TypeName() const { return absl::StrCat(catalog_name_path, ToIdentifierLiteral(RawEnumName())); } -std::string EnumType::ShortTypeName(ProductMode mode_unused) const { +std::string EnumType::ShortTypeName() const { // Special case for built-in zetasql enums. Since ShortTypeName is used in // the user facing error messages, we need to make these enum names look // as special language elements. 
@@ -201,7 +202,7 @@ bool EnumType::IsValidEnumValue( return true; } -bool EnumType::FindNumber(const std::string& name, int* number) const { +bool EnumType::FindNumber(absl::string_view name, int* number) const { const google::protobuf::EnumValueDescriptor* value_descr = enum_descriptor_->FindValueByName(name); if (!IsValidEnumValue(value_descr)) { diff --git a/zetasql/public/types/enum_type.h b/zetasql/public/types/enum_type.h index ba87a1feb..95760eb22 100644 --- a/zetasql/public/types/enum_type.h +++ b/zetasql/public/types/enum_type.h @@ -31,6 +31,7 @@ #include "absl/hash/hash.h" #include "absl/status/status.h" #include "absl/status/statusor.h" +#include "absl/strings/string_view.h" #include "absl/types/span.h" namespace zetasql { @@ -88,6 +89,10 @@ class EnumType : public Type { // is just the descriptor full_name (without back-ticks). The back-ticks // are not necessary for TypeName() to be reparseable, so should be removed. std::string TypeName(ProductMode mode_unused) const override; + std::string TypeName(ProductMode mode_unused, + bool use_external_float32_unused) const override { + return TypeName(); + } // EnumType does not support type parameters or collation, which is why // TypeName(mode) is used. 
absl::StatusOr TypeNameWithModifiers( @@ -98,9 +103,22 @@ class EnumType : public Type { ZETASQL_RET_CHECK(collation.Empty()); return TypeName(mode); } + absl::StatusOr TypeNameWithModifiers( + const TypeModifiers& type_modifiers, ProductMode mode, + bool use_external_float32) const override { + return TypeNameWithModifiers(type_modifiers, mode); + } + + std::string ShortTypeName(ProductMode mode_unused) const override { + return ShortTypeName(); + } + + std::string ShortTypeName(ProductMode mode_unused, + bool use_external_float32_unused) const override { + return ShortTypeName(); + } - std::string ShortTypeName( - ProductMode mode_unused = ProductMode::PRODUCT_INTERNAL) const override; + std::string ShortTypeName() const; std::string TypeName() const; // Enum-specific version does not need mode. // Nested catalog names, that were passed to the constructor. @@ -117,7 +135,7 @@ class EnumType : public Type { // Find the enum number given a corresponding name. Returns true // upon success, and false if the name is not found. - ABSL_MUST_USE_RESULT bool FindNumber(const std::string& name, + ABSL_MUST_USE_RESULT bool FindNumber(absl::string_view name, int* number) const; // Helper function to determine if a given descriptor value is valid. diff --git a/zetasql/public/types/enum_type_test.cc b/zetasql/public/types/enum_type_test.cc new file mode 100644 index 000000000..d72ffb682 --- /dev/null +++ b/zetasql/public/types/enum_type_test.cc @@ -0,0 +1,392 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and +// limitations under the License. +// + +#include "zetasql/public/types/enum_type.h" + +#include + +#include "zetasql/public/functions/rounding_mode.pb.h" +#include "zetasql/public/language_options.h" +#include "zetasql/public/types/array_type.h" +#include "zetasql/public/types/type.h" +#include "zetasql/public/types/type_factory.h" +#include "zetasql/testdata/bad_test_schema.pb.h" +#include "zetasql/testdata/recursive_schema.pb.h" +#include "zetasql/testdata/test_schema.pb.h" +#include "gmock/gmock.h" +#include "gtest/gtest.h" +#include "google/protobuf/descriptor.h" + +namespace zetasql { + +using google::protobuf::EnumDescriptor; +using testing::ElementsAre; +using testing::IsEmpty; +using testing::NotNull; + +bool TestEquals(const Type* type1, const Type* type2) { + // Test that Equivalent also returns the same thing as Equals for this + // comparison. This helper is not used for cases where Equals and + // Equivalent may differ. 
+ EXPECT_EQ(type1->Equals(type2), type1->Equivalent(type2)) + << "type1: " << type1->DebugString() + << "\ntype2: " << type2->DebugString(); + return type1->Equals(type2); +} + +TEST(EnumTypeTest, Basics) { + TypeFactory factory; + const EnumType* enum_type; + const EnumDescriptor* enum_descriptor = zetasql_test__::TestEnum_descriptor(); + ZETASQL_EXPECT_OK(factory.MakeEnumType(enum_descriptor, &enum_type)); + EXPECT_THAT(enum_type, NotNull()); + EXPECT_FALSE(enum_type->UsingFeatureV12CivilTimeType()); + { + LanguageOptions options; + options.set_product_mode(PRODUCT_INTERNAL); + EXPECT_TRUE(enum_type->IsSupportedType(options)); + } + { + LanguageOptions options; + options.set_product_mode(PRODUCT_EXTERNAL); + EXPECT_FALSE(enum_type->IsSupportedType(options)); + options.EnableLanguageFeature(FEATURE_PROTO_BASE); + EXPECT_TRUE(enum_type->IsSupportedType(options)); + } +} + +TEST(EnumTypeTest, EnumTypeFromCompiledEnumAndDescriptor) { + TypeFactory factory; + + const EnumDescriptor* enum_descriptor = zetasql_test__::TestEnum_descriptor(); + const EnumType* enum_type; + ZETASQL_ASSERT_OK(factory.MakeEnumType(enum_descriptor, &enum_type)); + EXPECT_THAT(enum_type, NotNull()); + EXPECT_EQ("ENUM", enum_type->DebugString()); + EXPECT_EQ("`zetasql_test__.TestEnum`", enum_type->TypeName(PRODUCT_INTERNAL)); + EXPECT_THAT(enum_type->CatalogNamePath(), IsEmpty()); + + const ArrayType* array_type; + ZETASQL_ASSERT_OK(factory.MakeArrayType(enum_type, &array_type)); + EXPECT_EQ("ARRAY>", array_type->DebugString()); + EXPECT_EQ("ARRAY<`zetasql_test__.TestEnum`>", + array_type->TypeName(PRODUCT_INTERNAL)); + + const std::string* name = nullptr; + EXPECT_TRUE(enum_type->FindName(1, &name)); + EXPECT_THAT(name, NotNull()); + EXPECT_EQ("TESTENUM1", *name); + EXPECT_FALSE(enum_type->FindName(777, &name)); + + int number = 777; + EXPECT_TRUE(enum_type->FindNumber("TESTENUM0", &number)); + EXPECT_EQ(0, number); + EXPECT_FALSE(enum_type->FindNumber("BLAH", &number)); + + 
EXPECT_FALSE(enum_type->IsSimpleType()); + EXPECT_FALSE(enum_type->IsArray()); + EXPECT_FALSE(enum_type->IsProto()); + EXPECT_FALSE(enum_type->IsStruct()); + EXPECT_FALSE(enum_type->IsStructOrProto()); + EXPECT_FALSE(enum_type->IsRange()); + EXPECT_FALSE(enum_type->IsMap()); + + EXPECT_TRUE(enum_type->IsEnum()); + + EXPECT_EQ(enum_type, enum_type->AsEnum()); + EXPECT_EQ(nullptr, enum_type->AsStruct()); + EXPECT_EQ(nullptr, enum_type->AsArray()); + EXPECT_EQ(nullptr, enum_type->AsProto()); + EXPECT_EQ(nullptr, enum_type->AsRange()); + + EXPECT_EQ(enum_type->DebugString(true), + "ENUM>)"); +} + +TEST(EnumTypeTest, EnumTypeWithCatalogNameFromCompiledEnumAndDescriptor) { + TypeFactory factory; + + const EnumDescriptor* enum_descriptor = zetasql_test__::TestEnum_descriptor(); + const EnumType* enum_type; + ZETASQL_ASSERT_OK( + factory.MakeEnumType(enum_descriptor, &enum_type, {"all", "catalogs"})); + + EXPECT_EQ("`all`.catalogs.`zetasql_test__.TestEnum`", + enum_type->TypeName(PRODUCT_INTERNAL)); + EXPECT_EQ("`all`.catalogs.zetasql_test__.TestEnum", + enum_type->ShortTypeName(PRODUCT_INTERNAL)); + EXPECT_THAT(enum_type->CatalogNamePath(), ElementsAre("all", "catalogs")); + + EXPECT_EQ(enum_type->DebugString(false), + "`all`.catalogs.ENUM"); + EXPECT_EQ(enum_type->DebugString(true), + "`all`.catalogs.ENUM>)"); + + const EnumType* enum_type_without_catalog; + ZETASQL_ASSERT_OK(factory.MakeEnumType(enum_descriptor, &enum_type_without_catalog)); + EXPECT_FALSE(enum_type->Equals(enum_type_without_catalog)); + EXPECT_TRUE(enum_type->Equivalent(enum_type_without_catalog)); + + const EnumType* enum_type_another_catalog; + ZETASQL_ASSERT_OK(factory.MakeEnumType(enum_descriptor, &enum_type_another_catalog, + {"another_catalog"})); + EXPECT_FALSE(enum_type->Equals(enum_type_another_catalog)); + EXPECT_TRUE(enum_type->Equivalent(enum_type_another_catalog)); +} + +TEST(EnumTypeTest, Equalities) { + TypeFactory factory; + const EnumDescriptor* enum_descriptor = 
zetasql_test__::TestEnum_descriptor(); + const EnumType* enum_type; + ZETASQL_ASSERT_OK(factory.MakeEnumType(enum_descriptor, &enum_type)); + ASSERT_TRUE(TestEquals(enum_type, enum_type)); +} + +TEST(EnumTypeTest, IsValidEnumValue) { + TypeFactory factory; + + const EnumType* opaque_type = types::RoundingModeEnumType(); + ASSERT_TRUE(opaque_type->IsOpaque()); + EXPECT_FALSE(opaque_type->IsValidEnumValue(nullptr)); + // this also returns null, but ensures the call pattern works as expected. + EXPECT_FALSE(opaque_type->IsValidEnumValue( + opaque_type->enum_descriptor()->FindValueByName("Fake Name"))); + + // Marked invalid in rounding_mode.proto + EXPECT_FALSE(opaque_type->IsValidEnumValue( + opaque_type->enum_descriptor()->FindValueByName( + "ROUNDING_MODE_UNSPECIFIED"))); + EXPECT_FALSE(opaque_type->IsValidEnumValue( + opaque_type->enum_descriptor()->FindValueByNumber( + functions::ROUNDING_MODE_UNSPECIFIED))); + + // Normal + EXPECT_TRUE(opaque_type->IsValidEnumValue( + opaque_type->enum_descriptor()->FindValueByName("ROUND_HALF_EVEN"))); +} + +TEST(EnumTypeTest, IsValidOpaqueEnumValue) { + TypeFactory factory; + const EnumType* opaque_type = types::RoundingModeEnumType(); + + // Non-opaque variant shouldn't care about the invalid_enum_value annotation. 
+  const EnumType* non_opaque_type;
+  ZETASQL_ASSERT_OK(
+      factory.MakeEnumType(opaque_type->enum_descriptor(), &non_opaque_type));
+
+  EXPECT_TRUE(non_opaque_type->IsValidEnumValue(
+      opaque_type->enum_descriptor()->FindValueByName(
+          "ROUNDING_MODE_UNSPECIFIED")));
+
+  // Every enum value should be valid, since the type is non-opaque.
+  const google::protobuf::EnumDescriptor* type_descriptor =
+      non_opaque_type->enum_descriptor();
+  ASSERT_GT(type_descriptor->value_count(), 0);
+  for (int i = 0; i < type_descriptor->value_count(); ++i) {
+    EXPECT_TRUE(non_opaque_type->IsValidEnumValue(type_descriptor->value(i)));
+  }
+}
+
+TEST(EnumTypeTest, OpaqueEnumTypesAreNotEqualToTheirEnumTypes) {
+  TypeFactory factory;
+
+  const EnumType* opaque_type = nullptr;
+  ZETASQL_ASSERT_OK(internal::TypeFactoryHelper::MakeOpaqueEnumType(
+      &factory, zetasql_test__::TestEnum_descriptor(), &opaque_type, {}));
+  ASSERT_TRUE(opaque_type->IsOpaque());
+  ASSERT_TRUE(opaque_type->Equals(opaque_type));
+
+  const EnumType* non_opaque_type;
+  ZETASQL_ASSERT_OK(
+      factory.MakeEnumType(opaque_type->enum_descriptor(), &non_opaque_type));
+  ASSERT_FALSE(non_opaque_type->IsOpaque());
+  ASSERT_FALSE(opaque_type->Equals(non_opaque_type));
+  ASSERT_FALSE(non_opaque_type->Equals(opaque_type));
+}
+
+TEST(EnumTypeTest, OpaqueEnumTypesAreNotEquivalentToTheirEnumTypes) {
+  TypeFactory factory;
+
+  const EnumType* opaque_type = nullptr;
+  ZETASQL_ASSERT_OK(internal::TypeFactoryHelper::MakeOpaqueEnumType(
+      &factory, zetasql_test__::TestEnum_descriptor(), &opaque_type, {}));
+  ASSERT_TRUE(opaque_type->IsOpaque());
+  ASSERT_TRUE(opaque_type->Equals(opaque_type));
+
+  const EnumType* non_opaque_type;
+  ZETASQL_ASSERT_OK(
+      factory.MakeEnumType(opaque_type->enum_descriptor(), &non_opaque_type));
+  ASSERT_FALSE(non_opaque_type->IsOpaque());
+  ASSERT_FALSE(opaque_type->Equivalent(non_opaque_type));
+  ASSERT_FALSE(non_opaque_type->Equivalent(opaque_type));
+}
+
+TEST(EnumTypeTest, OpaqueEnumTypesCacheCorrectly) {
+
TypeFactory factory; + const EnumType* enum_type = nullptr; + ZETASQL_ASSERT_OK(internal::TypeFactoryHelper::MakeOpaqueEnumType( + &factory, zetasql_test__::TestEnum_descriptor(), &enum_type, {})); + const EnumType* same_enum_type = nullptr; + + ZETASQL_ASSERT_OK(internal::TypeFactoryHelper::MakeOpaqueEnumType( + &factory, zetasql_test__::TestEnum_descriptor(), &same_enum_type, {})); + // Should be pointer equality. + ASSERT_EQ(enum_type, same_enum_type); +} + +TEST(EnumTypeTest, OpaqueEnumTypesHaveCorrectStringFunctions) { + const EnumType* rounding_mode = types::RoundingModeEnumType(); + EXPECT_EQ(rounding_mode->ShortTypeName(), "ROUNDING_MODE"); + EXPECT_EQ(rounding_mode->TypeName(), "ROUNDING_MODE"); +} + +TEST(EnumTypeTest, EnumTypesAreCached) { + TypeFactory factory; + + const Type* type1; + ZETASQL_ASSERT_OK( + factory.MakeEnumType(zetasql_test__::TestEnum_descriptor(), &type1)); + + const Type* type2; + ZETASQL_ASSERT_OK( + factory.MakeEnumType(zetasql_test__::TestEnum_descriptor(), &type2)); + EXPECT_EQ(type1, type2); + + const Type* type_with_empty_catalog; + ZETASQL_ASSERT_OK(factory.MakeEnumType(zetasql_test__::TestEnum_descriptor(), + &type_with_empty_catalog, {})); + EXPECT_EQ(type_with_empty_catalog, type1); + + const Type* type_with_catalog1; + ZETASQL_ASSERT_OK(factory.MakeEnumType(zetasql_test__::TestEnum_descriptor(), + &type_with_catalog1, {"catalog", "a.b"})); + EXPECT_NE(type_with_catalog1, type1); + + const Type* type_with_catalog2; + ZETASQL_ASSERT_OK(factory.MakeEnumType(zetasql_test__::TestEnum_descriptor(), + &type_with_catalog2, {"catalog", "a.b"})); + EXPECT_EQ(type_with_catalog2, type_with_catalog1); + + const Type* type_with_catalog3; + ZETASQL_ASSERT_OK(factory.MakeEnumType(zetasql_test__::TestEnum_descriptor(), + &type_with_catalog3, {"catalog.a", "b"})); + EXPECT_NE(type_with_catalog3, type_with_catalog1); +} + +TEST(EnumTypeTest, EnumTypeIsSupported) { + LanguageOptions product_external; + 
product_external.set_product_mode(ProductMode::PRODUCT_EXTERNAL); + + LanguageOptions product_internal; + product_internal.set_product_mode(ProductMode::PRODUCT_INTERNAL); + + LanguageOptions product_external_report_enabled = product_external; + product_external_report_enabled.EnableLanguageFeature( + FEATURE_DIFFERENTIAL_PRIVACY_REPORT_FUNCTIONS); + + LanguageOptions product_internal_report_enabled = product_internal; + product_internal_report_enabled.EnableLanguageFeature( + FEATURE_DIFFERENTIAL_PRIVACY_REPORT_FUNCTIONS); + + LanguageOptions proto_base_enabled; + proto_base_enabled.EnableLanguageFeature(FEATURE_PROTO_BASE); + + LanguageOptions proto_base_enabled_report_enabled; + proto_base_enabled_report_enabled.EnableLanguageFeature(FEATURE_PROTO_BASE); + proto_base_enabled_report_enabled.EnableLanguageFeature( + FEATURE_DIFFERENTIAL_PRIVACY_REPORT_FUNCTIONS); + + EXPECT_FALSE( + types::DifferentialPrivacyReportFormatEnumType()->IsSupportedType( + product_external)); + EXPECT_FALSE( + types::DifferentialPrivacyReportFormatEnumType()->IsSupportedType( + product_internal)); + EXPECT_TRUE(types::DifferentialPrivacyReportFormatEnumType()->IsSupportedType( + product_external_report_enabled)); + EXPECT_TRUE(types::DifferentialPrivacyReportFormatEnumType()->IsSupportedType( + product_internal_report_enabled)); + EXPECT_FALSE( + types::DifferentialPrivacyReportFormatEnumType()->IsSupportedType( + proto_base_enabled)); + EXPECT_TRUE(types::DifferentialPrivacyReportFormatEnumType()->IsSupportedType( + proto_base_enabled_report_enabled)); + + EXPECT_TRUE(types::DifferentialPrivacyGroupSelectionStrategyEnumType() + ->IsSupportedType(product_internal)); + EXPECT_FALSE(types::DifferentialPrivacyGroupSelectionStrategyEnumType() + ->IsSupportedType(product_external)); + + EXPECT_TRUE(types::DatePartEnumType()->IsSupportedType(product_external)); + EXPECT_TRUE(types::DatePartEnumType()->IsSupportedType(product_internal)); + 
EXPECT_TRUE(types::DatePartEnumType()->IsSupportedType(proto_base_enabled)); + + EXPECT_TRUE( + types::NormalizeModeEnumType()->IsSupportedType(product_external)); + EXPECT_TRUE( + types::NormalizeModeEnumType()->IsSupportedType(product_internal)); + EXPECT_TRUE( + types::NormalizeModeEnumType()->IsSupportedType(proto_base_enabled)); + + EXPECT_TRUE(types::RoundingModeEnumType()->IsSupportedType(product_external)); + EXPECT_TRUE(types::RoundingModeEnumType()->IsSupportedType(product_internal)); + EXPECT_TRUE( + types::RoundingModeEnumType()->IsSupportedType(proto_base_enabled)); + + EXPECT_FALSE( + types::ArrayFindModeEnumType()->IsSupportedType(product_external)); + EXPECT_TRUE( + types::ArrayFindModeEnumType()->IsSupportedType(product_internal)); + EXPECT_TRUE( + types::ArrayFindModeEnumType()->IsSupportedType(proto_base_enabled)); + + EXPECT_FALSE( + types::ArrayZipModeEnumType()->IsSupportedType(product_external)); + EXPECT_TRUE(types::ArrayZipModeEnumType()->IsSupportedType(product_internal)); + EXPECT_TRUE( + types::ArrayZipModeEnumType()->IsSupportedType(proto_base_enabled)); + + LanguageOptions range_type_enabled; + range_type_enabled.EnableLanguageFeature(FEATURE_RANGE_TYPE); + EXPECT_FALSE( + types::RangeSessionizeModeEnumType()->IsSupportedType(product_external)); + EXPECT_FALSE( + types::RangeSessionizeModeEnumType()->IsSupportedType(product_internal)); + EXPECT_FALSE(types::RangeSessionizeModeEnumType()->IsSupportedType( + proto_base_enabled)); + EXPECT_TRUE(types::RangeSessionizeModeEnumType()->IsSupportedType( + range_type_enabled)); +} + +} // namespace zetasql diff --git a/zetasql/public/types/internal_utils.h b/zetasql/public/types/internal_utils.h index eac0724a9..2fc7e7118 100644 --- a/zetasql/public/types/internal_utils.h +++ b/zetasql/public/types/internal_utils.h @@ -168,10 +168,22 @@ absl::Status PopulateDistinctFileDescriptorSets( // Generates a SQL cast expression that casts the literal represented by given // value (which can have any 
type V supported by absl::StrCat) to the given // ZetaSQL type. +// Setting the optional parameter `use_external_float32` to true will return +// FLOAT32 as the type name for TYPE_FLOAT. +// TODO: Remove `use_external_float32` once all engines are +// updated. +template +std::string GetCastExpressionString(const V& value, const Type* type, + ProductMode mode, + bool use_external_float32) { + return absl::StrCat("CAST(", value, " AS ", + type->TypeName(mode, use_external_float32), ")"); +} template std::string GetCastExpressionString(const V& value, const Type* type, ProductMode mode) { - return absl::StrCat("CAST(", value, " AS ", type->TypeName(mode), ")"); + return GetCastExpressionString(value, type, mode, + /*use_external_float32=*/false); } } // namespace internal diff --git a/zetasql/public/types/map_type.cc b/zetasql/public/types/map_type.cc new file mode 100644 index 000000000..64e98cd00 --- /dev/null +++ b/zetasql/public/types/map_type.cc @@ -0,0 +1,223 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// + +#include "zetasql/public/types/map_type.h" + +#include <algorithm> +#include <string> +#include <utility> + + +#include "zetasql/base/logging.h" +#include "zetasql/common/thread_stack.h" +#include "zetasql/public/language_options.h" +#include "zetasql/public/options.pb.h" +#include "zetasql/public/type.pb.h" +#include "zetasql/public/types/type.h" +#include "zetasql/public/types/type_factory.h" +#include "zetasql/public/types/type_modifiers.h" +#include "zetasql/public/types/value_equality_check_options.h" +#include "zetasql/public/types/value_representations.h" +#include "zetasql/public/value.pb.h" +#include "zetasql/public/value_content.h" +#include "absl/hash/hash.h" +#include "zetasql/base/check.h" +#include "absl/status/status.h" +#include "absl/status/statusor.h" +#include "absl/strings/str_cat.h" +#include "absl/strings/str_join.h" +#include "zetasql/base/status_macros.h" + +namespace zetasql { + +std::string MapType::ShortTypeName(ProductMode mode, + bool use_external_float32) const { + return absl::StrCat( + "MAP<", key_type_->ShortTypeName(mode, use_external_float32), ", ", + value_type_->ShortTypeName(mode, use_external_float32), ">"); +} + +std::string MapType::TypeName(ProductMode mode, + bool use_external_float32) const { + return absl::StrCat("MAP<", key_type_->TypeName(mode, use_external_float32), + ", ", value_type_->TypeName(mode, use_external_float32), + ">"); +} + +absl::StatusOr<std::string> MapType::TypeNameWithModifiers( + const TypeModifiers& type_modifiers, ProductMode mode, + bool use_external_float32) const { + // TODO: Implement TypeNameWithModifiers.
+ return absl::UnimplementedError( + "MapType::TypeNameWithModifiers is not yet supported."); +} + +bool MapType::SupportsOrdering(const LanguageOptions& language_options, + std::string* type_description) const { + return false; +} +bool MapType::SupportsEquality() const { return false; } + +bool MapType::IsSupportedType(const LanguageOptions& language_options) const { + return language_options.LanguageFeatureEnabled(FEATURE_V_1_4_MAP_TYPE) && + key_type_->IsSupportedType(language_options) && + key_type_->SupportsGrouping(language_options) && + value_type_->IsSupportedType(language_options); +} + +int MapType::nesting_depth() const { + return std::max(key_type_->nesting_depth(), value_type_->nesting_depth()) + 1; +} + +MapType::MapType(const TypeFactory* factory, const Type* key_type, + const Type* value_type) + : Type(factory, TYPE_MAP), key_type_(key_type), value_type_(value_type) {} +MapType::~MapType() = default; + +bool MapType::SupportsGroupingImpl(const LanguageOptions& language_options, + const Type** no_grouping_type) const { + // Map does not currently support grouping. (broken link). + *no_grouping_type = this; + return false; +} + +bool MapType::SupportsPartitioningImpl( + const LanguageOptions& language_options, + const Type** no_partitioning_type) const { + // Map does not currently support partitioning. (broken link). 
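`nesting_depth` above follows the usual recursive container rule: a MAP is one level deeper than the deeper of its key and value types. A self-contained sketch with a toy type node (assumption: leaf types count as depth 1, matching the `+ 1` recursion in the diff):

```cpp
#include <algorithm>

// Toy type node: either a leaf, or a map-like node with key/value children.
// Not the ZetaSQL Type class; just the depth recursion in isolation.
struct DemoType {
  const DemoType* key = nullptr;    // non-null only for map-like nodes
  const DemoType* value = nullptr;
  int nesting_depth() const {
    if (key == nullptr) return 1;   // leaf type
    return std::max(key->nesting_depth(), value->nesting_depth()) + 1;
  }
};
```

So `MAP<STRING, MAP<STRING, INT64>>` has depth 3: the inner map contributes 2, and the outer map adds one more level.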
+ *no_partitioning_type = this; + return false; +} + +absl::Status MapType::SerializeToProtoAndDistinctFileDescriptorsImpl( + const BuildFileDescriptorSetMapOptions& options, TypeProto* type_proto, + FileDescriptorSetMap* file_descriptor_set_map) const { + type_proto->set_type_kind(kind_); + ZETASQL_RETURN_IF_ERROR(key_type()->SerializeToProtoAndDistinctFileDescriptorsImpl( + options, type_proto->mutable_map_type()->mutable_key_type(), + file_descriptor_set_map)); + return value_type()->SerializeToProtoAndDistinctFileDescriptorsImpl( + options, type_proto->mutable_map_type()->mutable_value_type(), + file_descriptor_set_map); +} + +bool MapType::EqualsForSameKind(const Type* that, bool equivalent) const { + const MapType* other = that->AsMap(); + ABSL_DCHECK(other != nullptr) + << DebugString() << "::EqualsForSameKind cannot compare to non-MAP type " + << that->DebugString(); + return this->key_type()->EqualsImpl(other->key_type(), equivalent) && + this->value_type()->EqualsImpl(other->value_type(), equivalent); +} + +void MapType::DebugStringImpl(bool details, TypeOrStringVector* stack, + std::string* debug_string) const { + absl::StrAppend(debug_string, "MAP<"); + stack->push_back(">"); + stack->push_back(value_type()); + stack->push_back(", "); + stack->push_back(key_type()); +} + +void MapType::CopyValueContent(const ValueContent& from, + ValueContent* to) const { + from.GetAs<internal::ValueContentMapRef*>()->Ref(); + *to = from; +} + +void MapType::ClearValueContent(const ValueContent& value) const { + value.GetAs<internal::ValueContentMapRef*>()->Unref(); +} + +absl::HashState MapType::HashTypeParameter(absl::HashState state) const { + return value_type()->Hash(key_type()->Hash(std::move(state))); +} + +absl::HashState MapType::HashValueContent(const ValueContent& value, + absl::HashState state) const { + // TODO: Implement HashValueContent.
+ ABSL_LOG(FATAL) << "HashValueContent is not yet " // Crash OK + "supported on MapType."; +} + +bool MapType::ValueContentEquals( + const ValueContent& x, const ValueContent& y, + const ValueEqualityCheckOptions& options) const { + // Map does not currently support equality. (broken link). + return false; +} + +absl::Status MapType::SerializeValueContent(const ValueContent& value, + ValueProto* value_proto) const { + // TODO: Implement SerializeValueContent. + return absl::UnimplementedError( + "SerializeValueContent is not yet supported."); +} + +absl::Status MapType::DeserializeValueContent(const ValueProto& value_proto, + ValueContent* value) const { + // TODO: Implement DeserializeValueContent. + return absl::UnimplementedError( + "DeserializeValueContent is not yet supported."); +} + +bool MapType::ValueContentLess(const ValueContent& x, const ValueContent& y, + const Type* other_type) const { + ABSL_LOG(FATAL) << "Cannot compare " << DebugString() << " to " // Crash OK + << other_type->DebugString(); + return false; +} + +// TODO: When we add non-debug printing, should refactor to +// be implemented alongside existing ARRAY and STRUCT print logic. +std::string MapType::FormatValueContent( + const ValueContent& value, const FormatValueContentOptions& options) const { + + // For now, print only the map size if we are not in debug mode. + // TODO: determine a stable literal syntax for MAP type. 
+ if (options.mode != Type::FormatValueContentOptions::Mode::kDebug) { + ABSL_LOG(ERROR) << "Map printing not yet implemented."; + return "Map printing not yet implemented."; + } + + std::string result = "{"; + internal::ValueContentMap* value_content_map = + value.GetAs<internal::ValueContentMapRef*>()->value(); + auto map_entries = value_content_map->value_content_entries(); + + absl::StrAppend( + &result, absl::StrJoin( + map_entries, ", ", + [options, this](std::string* out, const auto& map_entry) { + auto& [key, value] = map_entry; + std::string key_str = FormatValueContentContainerElement( + key, this->key_type_, options); + std::string value_str = FormatValueContentContainerElement( + value, this->value_type_, options); + absl::StrAppend(out, key_str, ": ", value_str); + })); + absl::StrAppend(&result, "}"); + return result; +} + +const Type* GetMapKeyType(const Type* map_type) { + return static_cast<const MapType*>(map_type)->key_type(); +} +const Type* GetMapValueType(const Type* map_type) { + return static_cast<const MapType*>(map_type)->value_type(); +} + +} // namespace zetasql diff --git a/zetasql/public/types/map_type.h b/zetasql/public/types/map_type.h new file mode 100644 index 000000000..a536e59ef --- /dev/null +++ b/zetasql/public/types/map_type.h @@ -0,0 +1,147 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License.
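The debug formatter above reduces to: join `key: value` entries with `", "` inside braces (the real code uses `absl::StrJoin` over the map's entries; the sketch below uses a plain loop over already-formatted strings, with a hypothetical `DemoFormatMap` name):

```cpp
#include <string>
#include <utility>
#include <vector>

// Joins pre-formatted key/value strings as {k1: v1, k2: v2}. An empty entry
// list yields "{}", matching the empty-map test later in this diff.
std::string DemoFormatMap(
    const std::vector<std::pair<std::string, std::string>>& entries) {
  std::string result = "{";
  bool first = true;
  for (const auto& [key, value] : entries) {
    if (!first) result += ", ";
    first = false;
    result += key;
    result += ": ";
    result += value;
  }
  result += "}";
  return result;
}
```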
+// + +#ifndef ZETASQL_PUBLIC_TYPES_MAP_TYPE_H_ +#define ZETASQL_PUBLIC_TYPES_MAP_TYPE_H_ + +#include <cstdint> +#include <string> + +#include "zetasql/public/options.pb.h" +#include "zetasql/public/type.pb.h" +#include "zetasql/public/types/container_type.h" +#include "zetasql/public/types/type.h" +#include "absl/hash/hash.h" +#include "absl/status/status.h" +#include "absl/status/statusor.h" + +namespace zetasql { + +class LanguageOptions; +class TypeFactory; +class TypeParameterValue; +class TypeParameters; +class ValueContent; +class ValueProto; + +class MapType : public Type { + public: +#ifndef SWIG + MapType(const MapType&) = delete; + MapType& operator=(const MapType&) = delete; +#endif // SWIG + + const Type* key_type() const { return key_type_; } + const Type* value_type() const { return value_type_; } + + const MapType* AsMap() const override { return this; } + + std::string ShortTypeName(ProductMode mode, + bool use_external_float32) const override; + std::string ShortTypeName(ProductMode mode) const override { + return ShortTypeName(mode, /*use_external_float32=*/false); + } + std::string TypeName(ProductMode mode, + bool use_external_float32) const override; + std::string TypeName(ProductMode mode) const override { + return TypeName(mode, /*use_external_float32=*/false); + } + + absl::StatusOr<std::string> TypeNameWithModifiers( + const TypeModifiers& type_modifiers, ProductMode mode, + bool use_external_float32) const override; + absl::StatusOr<std::string> TypeNameWithModifiers( + const TypeModifiers& type_modifiers, ProductMode mode) const override { + return TypeNameWithModifiers(type_modifiers, mode, + /*use_external_float32=*/false); + } + + bool SupportsOrdering(const LanguageOptions& language_options, + std::string* type_description) const override; + bool SupportsEquality() const override; + + bool UsingFeatureV12CivilTimeType() const override { + return key_type_->UsingFeatureV12CivilTimeType() || + value_type_->UsingFeatureV12CivilTimeType(); + } + + bool IsSupportedType(const
LanguageOptions& language_options) const override; + + int nesting_depth() const override; + + protected: + bool EqualsForSameKind(const Type* that, bool equivalent) const override; + + void DebugStringImpl(bool details, TypeOrStringVector* stack, + std::string* debug_string) const override; + + // Return estimated size of memory owned by this type. Map's owned memory + // does not include its key or value type's memory (which is owned by some + // TypeFactory). + int64_t GetEstimatedOwnedMemoryBytesSize() const override { + return sizeof(*this); + } + + private: + MapType(const TypeFactory* factory, const Type* key_type, + const Type* value_type); + ~MapType() override; + + bool SupportsGroupingImpl(const LanguageOptions& language_options, + const Type** no_grouping_type) const override; + + bool SupportsPartitioningImpl( + const LanguageOptions& language_options, + const Type** no_partitioning_type) const override; + + absl::Status SerializeToProtoAndDistinctFileDescriptorsImpl( + const BuildFileDescriptorSetMapOptions& options, TypeProto* type_proto, + FileDescriptorSetMap* file_descriptor_set_map) const override; + + void CopyValueContent(const ValueContent& from, + ValueContent* to) const override; + void ClearValueContent(const ValueContent& value) const override; + absl::HashState HashTypeParameter(absl::HashState state) const override; + absl::HashState HashValueContent(const ValueContent& value, + absl::HashState state) const override; + std::string FormatValueContent( + const ValueContent& value, + const FormatValueContentOptions& options) const override; + bool ValueContentEquals( + const ValueContent& x, const ValueContent& y, + const ValueEqualityCheckOptions& options) const override; + bool ValueContentLess(const ValueContent& x, const ValueContent& y, + const Type* other_type) const override; + + absl::Status SerializeValueContent(const ValueContent& value, + ValueProto* value_proto) const override; + absl::Status DeserializeValueContent(const 
ValueProto& value_proto, + ValueContent* value) const override; + + const Type* const key_type_; + const Type* const value_type_; + + friend class TypeFactory; +}; + +// Get the Type of the map key. map_type *must be* a MapType. +const Type* GetMapKeyType(const Type* map_type); +// Get the Type of the map value. map_type *must be* a MapType. +const Type* GetMapValueType(const Type* map_type); + +} // namespace zetasql + +#endif // ZETASQL_PUBLIC_TYPES_MAP_TYPE_H_ diff --git a/zetasql/public/types/map_type_test.cc b/zetasql/public/types/map_type_test.cc new file mode 100644 index 000000000..56a45a365 --- /dev/null +++ b/zetasql/public/types/map_type_test.cc @@ -0,0 +1,398 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// + +#include "zetasql/public/types/map_type.h" + +#include +#include +#include +#include + +#include "zetasql/base/testing/status_matchers.h" +#include "zetasql/public/language_options.h" +#include "zetasql/public/types/array_type.h" +#include "zetasql/public/types/enum_type.h" +#include "zetasql/public/types/proto_type.h" +#include "zetasql/public/types/range_type.h" +#include "zetasql/public/types/struct_type.h" +#include "zetasql/public/types/type.h" +#include "zetasql/public/types/type_factory.h" +#include "zetasql/public/value.h" +#include "zetasql/testdata/test_schema.pb.h" +#include "zetasql/testing/test_value.h" +#include "gmock/gmock.h" +#include "gtest/gtest.h" +#include "google/protobuf/descriptor.h" + +namespace zetasql { +namespace { + +using google::protobuf::EnumDescriptor; +using testing::HasSubstr; +using testing::NotNull; +using MapTestAllSimpleTypes = testing::TestWithParam<TypeKind>; + +using MapTestFormatValueContentDebugMode = + testing::TestWithParam<std::pair<Value, std::string>>; + +} // namespace + +// All types which can be constructed via TypeFactory::TypeFromSimpleTypeKind. +const auto kSimpleTypes = { + TYPE_INT32, TYPE_INT64, TYPE_UINT32, TYPE_UINT64, TYPE_BOOL, TYPE_FLOAT, + TYPE_DOUBLE, TYPE_STRING, TYPE_BYTES, TYPE_TIMESTAMP, + TYPE_DATE, TYPE_TIME, TYPE_DATETIME, TYPE_INTERVAL, TYPE_GEOGRAPHY, + TYPE_NUMERIC, TYPE_BIGNUMERIC, TYPE_JSON, TYPE_TOKENLIST}; + +INSTANTIATE_TEST_SUITE_P( + TypeTest, MapTestAllSimpleTypes, testing::ValuesIn(kSimpleTypes), + [](const testing::TestParamInfo<TypeKind>& info) { + return TypeKind_Name(info.param); + }); + +// Asserts map type conformance to Type's Is...() and As...() methods, and +// asserts that equality and partitioning are disabled.
+void BasicMapAsserts(const Type* map_type) { + EXPECT_FALSE(map_type->IsSimpleType()); + EXPECT_FALSE(map_type->IsEnum()); + EXPECT_FALSE(map_type->IsArray()); + EXPECT_FALSE(map_type->IsStruct()); + EXPECT_FALSE(map_type->IsProto()); + EXPECT_FALSE(map_type->IsStructOrProto()); + EXPECT_FALSE(map_type->IsRangeType()); + EXPECT_TRUE(map_type->IsMapType()); + EXPECT_EQ(map_type->AsStruct(), nullptr); + EXPECT_EQ(map_type->AsArray(), nullptr); + EXPECT_EQ(map_type->AsProto(), nullptr); + EXPECT_EQ(map_type->AsEnum(), nullptr); + EXPECT_EQ(map_type->AsRange(), nullptr); + + EXPECT_FALSE(map_type->SupportsEquality()); + + LanguageOptions language_options; + EXPECT_FALSE(map_type->IsSupportedType(language_options)); + + language_options.EnableLanguageFeature(FEATURE_V_1_4_MAP_TYPE); + std::string no_partitioning_type; + EXPECT_FALSE( + map_type->SupportsPartitioning(language_options, &no_partitioning_type)); + EXPECT_EQ(no_partitioning_type, "MAP"); +} + +TEST_P(MapTestAllSimpleTypes, MapCanBeConstructedWithSimpleType) { + TypeFactory factory; + TypeKind type_kind = GetParam(); + + const Type* key_type = types::TypeFromSimpleTypeKind(type_kind); + const Type* value_type = types::TypeFromSimpleTypeKind(type_kind); + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_type, + factory.MakeMapType(key_type, value_type)); + + EXPECT_TRUE(key_type == GetMapKeyType(map_type)); + EXPECT_TRUE(value_type == GetMapValueType(map_type)); + + BasicMapAsserts(map_type); + + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_with_map_key_type, + factory.MakeMapType(map_type, value_type)); + BasicMapAsserts(map_with_map_key_type); + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_with_map_value_type, + factory.MakeMapType(key_type, map_type)); + BasicMapAsserts(map_with_map_value_type); +} + +TEST(TypeTest, MapTypeRequiresKeyTypeToBeGroupable) { + TypeFactory factory; + ZETASQL_ASSERT_OK_AND_ASSIGN( + const Type* map_type_with_groupable_simple_key, + factory.MakeMapType(factory.get_string(), 
factory.get_string())); + + LanguageOptions language_options; + language_options.EnableLanguageFeature(FEATURE_V_1_4_MAP_TYPE); + EXPECT_TRUE( + map_type_with_groupable_simple_key->IsSupportedType(language_options)); +} + +TEST(TypeTest, MapTypeRequiresKeyTypeToBeGroupableConditionallyGroupableKey) { + TypeFactory factory; + ZETASQL_ASSERT_OK_AND_ASSIGN( + const Type* map_type_with_array_key, + factory.MakeMapType(types::Int32ArrayType(), factory.get_string())); + + LanguageOptions language_options; + language_options.EnableLanguageFeature(FEATURE_V_1_4_MAP_TYPE); + EXPECT_FALSE(map_type_with_array_key->IsSupportedType(language_options)); + language_options.EnableLanguageFeature(FEATURE_V_1_2_GROUP_BY_ARRAY); + EXPECT_TRUE(map_type_with_array_key->IsSupportedType(language_options)); +} + +TEST(TypeTest, MapTypeRequiresKeyAndValueTypesToBeSupported) { + TypeFactory factory; + ZETASQL_ASSERT_OK_AND_ASSIGN( + const Type* map_type, + factory.MakeMapType(types::DateRangeType(), factory.get_time())); + LanguageOptions language_options; + language_options.EnableLanguageFeature(FEATURE_V_1_4_MAP_TYPE); + EXPECT_FALSE(map_type->IsSupportedType(language_options)); + + language_options.EnableLanguageFeature(FEATURE_V_1_2_CIVIL_TIME); + EXPECT_FALSE(map_type->IsSupportedType(language_options)); + + language_options.EnableLanguageFeature(FEATURE_RANGE_TYPE); + EXPECT_TRUE(map_type->IsSupportedType(language_options)); +} + +TEST(TypeTest, TestNamesValid) { + TypeFactory factory; + zetasql_test__::KitchenSinkPB kitchen_sink; + const ProtoType* proto_type; + ZETASQL_EXPECT_OK(factory.MakeProtoType(kitchen_sink.GetDescriptor(), &proto_type)); + EXPECT_THAT(proto_type, NotNull()); + + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_type, + factory.MakeMapType(factory.get_string(), proto_type)); + + EXPECT_EQ(map_type->DebugString(), + "MAP<STRING, PROTO<zetasql_test__.KitchenSinkPB>>"); + EXPECT_EQ(map_type->ShortTypeName(PRODUCT_INTERNAL), + "MAP<STRING, zetasql_test__.KitchenSinkPB>"); + EXPECT_EQ(map_type->TypeName(PRODUCT_INTERNAL), + "MAP<STRING, `zetasql_test__.KitchenSinkPB`>"); +}
+TEST(TypeTest, TestNamesValidWithNesting) { + TypeFactory factory; + ZETASQL_ASSERT_OK_AND_ASSIGN( + const Type* map_type, + factory.MakeMapType(factory.get_string(), factory.get_string())); + const StructType* struct_type; + ZETASQL_ASSERT_OK(factory.MakeStructType({{"a", map_type}}, &struct_type)); + const ArrayType* array_type; + ZETASQL_ASSERT_OK(factory.MakeArrayType(struct_type, &array_type)); + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* outer_map_type, + factory.MakeMapType(array_type, array_type)); + + EXPECT_EQ(outer_map_type->DebugString(), + "MAP<ARRAY<STRUCT<a MAP<STRING, STRING>>>, ARRAY<STRUCT<a MAP<STRING, STRING>>>>"); + + EXPECT_EQ(outer_map_type->ShortTypeName(PRODUCT_INTERNAL), + "MAP<ARRAY<STRUCT<a MAP<STRING, STRING>>>, ARRAY<STRUCT<a MAP<STRING, STRING>>>>"); + EXPECT_EQ(outer_map_type->ShortTypeName(PRODUCT_EXTERNAL), + "MAP<ARRAY<STRUCT<a MAP<STRING, STRING>>>, ARRAY<STRUCT<a MAP<STRING, STRING>>>>"); + + EXPECT_EQ(outer_map_type->TypeName(PRODUCT_INTERNAL), + "MAP<ARRAY<STRUCT<a MAP<STRING, STRING>>>, ARRAY<STRUCT<a MAP<STRING, STRING>>>>"); + EXPECT_EQ(outer_map_type->TypeName(PRODUCT_EXTERNAL), + "MAP<ARRAY<STRUCT<a MAP<STRING, STRING>>>, ARRAY<STRUCT<a MAP<STRING, STRING>>>>"); +} +TEST(TypeTest, MapTypeWithStructValid) { + TypeFactory factory; + const StructType* struct_type; + ZETASQL_ASSERT_OK( + factory.MakeStructType({{"a", factory.get_string()}}, &struct_type)); + + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_type, + factory.MakeMapType(struct_type, struct_type)); + BasicMapAsserts(map_type); + + LanguageOptions language_options; + language_options.EnableLanguageFeature(FEATURE_V_1_4_MAP_TYPE); + EXPECT_FALSE(map_type->IsSupportedType(language_options)); + language_options.EnableLanguageFeature(FEATURE_V_1_2_GROUP_BY_STRUCT); + EXPECT_TRUE(map_type->IsSupportedType(language_options)); +} + +TEST(MapTest, MapTypeWithArrayValid) { + TypeFactory factory; + const ArrayType* array_type; + ZETASQL_ASSERT_OK(factory.MakeArrayType(factory.get_string(), &array_type)); + + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_type, + factory.MakeMapType(array_type, array_type)); + BasicMapAsserts(map_type); + + LanguageOptions language_options; + language_options.EnableLanguageFeature(FEATURE_V_1_4_MAP_TYPE); +
EXPECT_FALSE(map_type->IsSupportedType(language_options)); + language_options.EnableLanguageFeature(FEATURE_V_1_2_GROUP_BY_ARRAY); + EXPECT_TRUE(map_type->IsSupportedType(language_options)); +} + +TEST(MapTest, MapTypeWithRangeValid) { + TypeFactory factory; + const RangeType* range_type; + ZETASQL_ASSERT_OK(factory.MakeRangeType(factory.get_timestamp(), &range_type)); + + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_type, + factory.MakeMapType(range_type, range_type)); + BasicMapAsserts(map_type); + + LanguageOptions language_options; + language_options.EnableLanguageFeature(FEATURE_V_1_4_MAP_TYPE); + EXPECT_FALSE(map_type->IsSupportedType(language_options)); + language_options.EnableLanguageFeature(FEATURE_RANGE_TYPE); + EXPECT_TRUE(map_type->IsSupportedType(language_options)); +} + +TEST(MapTest, MapTypeWithProtoValid) { + TypeFactory factory; + + zetasql_test__::KitchenSinkPB kitchen_sink; + const ProtoType* proto_type; + + ZETASQL_EXPECT_OK(factory.MakeProtoType(kitchen_sink.GetDescriptor(), &proto_type)); + EXPECT_THAT(proto_type, NotNull()); + + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_type, + factory.MakeMapType(factory.get_string(), proto_type)); + BasicMapAsserts(map_type); + + LanguageOptions language_options; + language_options.EnableLanguageFeature(FEATURE_V_1_4_MAP_TYPE); + EXPECT_TRUE(map_type->IsSupportedType(language_options)); +} +TEST(MapTest, MapTypeWithEnumValid) { + TypeFactory factory; + + const EnumType* enum_type; + const EnumDescriptor* enum_descriptor = zetasql_test__::TestEnum_descriptor(); + ZETASQL_EXPECT_OK(factory.MakeEnumType(enum_descriptor, &enum_type)); + EXPECT_THAT(enum_type, NotNull()); + + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_type, + factory.MakeMapType(enum_type, enum_type)); + BasicMapAsserts(map_type); + + LanguageOptions language_options; + language_options.EnableLanguageFeature(FEATURE_V_1_4_MAP_TYPE); + EXPECT_TRUE(map_type->IsSupportedType(language_options)); +} + +INSTANTIATE_TEST_SUITE_P( + MapTest, 
MapTestFormatValueContentDebugMode, + testing::ValuesIn(std::initializer_list<std::pair<Value, std::string>>{ + { + test_values::Map({{"a", true}}), + R"({"a": true})", + }, + { + test_values::Map( + {{"a", true}, {"b", false}, {"c", Value::NullBool()}}), + R"({"a": true, "b": false, "c": NULL})", + }, + { + test_values::Map({{"foobar", Value::Int32(1)}, + {"zoobar", Value::Int32(2)}}), + R"({"foobar": 1, "zoobar": 2})", + }, + { + test_values::Map({{"a", test_values::Array({Value::Int32(1), + Value::Int32(2)})}}), + R"({"a": Array[Int32(1), Int32(2)]})", + }, + { + test_values::Map( + {{"nested", + test_values::Map( + {{"a", test_values::Map( + {{"b", test_values::Map( + {{"c", Value::Int32(1)}})}})}})}}), + R"({"nested": {"a": {"b": {"c": 1}}}})", + }, + { + test_values::Map( + {{"nested", + test_values::Struct( + {{"field", + test_values::Map( + {{"a", test_values::Map( + {{"b", Value::Int32(1)}})}})}})}}), + R"({"nested": Struct{field:Map>({"a": {"b": 1}})}})", + }, + })); + +TEST_P(MapTestFormatValueContentDebugMode, FormatValueContentDebugMode) { + auto& [map_value, expected_format_str] = GetParam(); + + Type::FormatValueContentOptions options; + options.mode = Type::FormatValueContentOptions::Mode::kDebug; + options.verbose = true; + + EXPECT_EQ( + map_value.type()->FormatValueContent(map_value.GetContent(), options), + expected_format_str); +} + +TEST(MapTest, FormatValueContentDebugModeEmptyMap) { + TypeFactory factory; + + Type::FormatValueContentOptions options; + options.mode = Type::FormatValueContentOptions::Mode::kDebug; + options.verbose = true; + + ZETASQL_ASSERT_OK_AND_ASSIGN( + const Type* map_type, + factory.MakeMapType(factory.get_string(), factory.get_int64())); + + ZETASQL_ASSERT_OK_AND_ASSIGN(Value map_value, Value::MakeMap(map_type, {})); + EXPECT_EQ(map_type->FormatValueContent(map_value.GetContent(), options), + "{}"); +} + +TEST(TypeFactoryTest, MapTypesAreCached) { + TypeFactory factory; + ZETASQL_ASSERT_OK_AND_ASSIGN( + const Type* type1, +
factory.MakeMapType(factory.get_int64(), factory.get_double())); + ZETASQL_ASSERT_OK_AND_ASSIGN( + const Type* type2, + factory.MakeMapType(factory.get_int64(), factory.get_double())); + EXPECT_TRUE(type1 == type2) + << "Expected the two type pointers to be identical"; +} + +TEST_P(MapTestAllSimpleTypes, MapWithSimpleTypesUsesStaticFactory) { + TypeFactory factory; + TypeKind type_kind = GetParam(); + const Type* simple_type = types::TypeFromSimpleTypeKind(type_kind); + + const auto initial_factory_size = factory.GetEstimatedOwnedMemoryBytesSize(); + ZETASQL_ASSERT_OK(factory.MakeMapType(simple_type, simple_type)); + + // Our factory should not change size, because the static factory was used. + ASSERT_EQ(initial_factory_size, factory.GetEstimatedOwnedMemoryBytesSize()); +} + +TEST(TypeFactoryTest, MapWithComplexTypesUsesInstanceFactory) { + TypeFactory factory; + const Type* struct_type; + ZETASQL_ASSERT_OK(factory.MakeStructType({{"a", types::Int32Type()}}, &struct_type)); + + const auto initial_factory_size = factory.GetEstimatedOwnedMemoryBytesSize(); + ZETASQL_ASSERT_OK(factory.MakeMapType(struct_type, types::Int32Type())); + + ASSERT_NE(initial_factory_size, factory.GetEstimatedOwnedMemoryBytesSize()); +} + +} // namespace zetasql diff --git a/zetasql/public/types/proto_type.cc b/zetasql/public/types/proto_type.cc index 551ff5a61..6a28bb7c0 100644 --- a/zetasql/public/types/proto_type.cc +++ b/zetasql/public/types/proto_type.cc @@ -160,7 +160,7 @@ absl::Status ProtoType::GetFieldTypeByTagNumber(int number, CatalogNamePath(), type); } -absl::Status ProtoType::GetFieldTypeByName(const std::string& name, +absl::Status ProtoType::GetFieldTypeByName(absl::string_view name, TypeFactory* factory, bool use_obsolete_timestamp, const Type** type, @@ -191,7 +191,7 @@ std::string ProtoType::TypeName() const { return catalog_name_path; } -std::string ProtoType::ShortTypeName(ProductMode mode_unused) const { +std::string ProtoType::ShortTypeName() const { std::string 
catalog_name_path; if (catalog_name_ != nullptr) { absl::StrAppend(&catalog_name_path, *catalog_name_->path_string, "."); @@ -338,6 +338,9 @@ absl::Status ProtoType::GetTypeKindFromFieldDescriptor( case FieldFormat::INTERVAL: *kind = TYPE_INTERVAL; break; + case FieldFormat::TOKENLIST: + *kind = TYPE_TOKENLIST; + break; case FieldFormat::RANGE_DATES_ENCODED: case FieldFormat::RANGE_DATETIMES_ENCODED: case FieldFormat::RANGE_TIMESTAMPS_ENCODED: @@ -730,7 +733,7 @@ std::string ProtoType::FormatValueContent( if (!options.as_literal()) { return internal::GetCastExpressionString( ToBytesLiteral(std::string(GetCordValue(value))), this, - options.product_mode); + options.product_mode, options.use_external_float32); } google::protobuf::DynamicMessageFactory message_factory; @@ -743,12 +746,13 @@ std::string ProtoType::FormatValueContent( return "{}"; } - return absl::StrCat("{", - options.verbose - ? - message->DebugString() - : message->ShortDebugString(), - "}"); + return absl::StrCat( + "{", + options.verbose + ? + message->DebugString() + : message->ShortDebugString(), + "}"); } absl::Status status; diff --git a/zetasql/public/types/proto_type.h b/zetasql/public/types/proto_type.h index 147b2dcc3..fc07e2028 100644 --- a/zetasql/public/types/proto_type.h +++ b/zetasql/public/types/proto_type.h @@ -89,6 +89,10 @@ class ProtoType : public Type { // is just the descriptor full_name (without back-ticks). The back-ticks // are not necessary for TypeName() to be reparseable, so should be removed. std::string TypeName(ProductMode mode_unused) const override; + std::string TypeName(ProductMode mode_unused, + bool use_external_float32_unused) const override { + return TypeName(); + } // ProtoType does not support type parameters or collation, which is why // TypeName(mode) is used. 
absl::StatusOr<std::string> TypeNameWithModifiers( @@ -99,8 +103,22 @@ ZETASQL_RET_CHECK(collation.Empty()); return TypeName(mode); } - std::string ShortTypeName( - ProductMode mode_unused = ProductMode::PRODUCT_INTERNAL) const override; + absl::StatusOr<std::string> TypeNameWithModifiers( + const TypeModifiers& type_modifiers, ProductMode mode, + bool use_external_float32_unused) const override { + return TypeNameWithModifiers(type_modifiers, mode); + } + + std::string ShortTypeName(ProductMode mode_unused) const override { + return ShortTypeName(); + } + + std::string ShortTypeName(ProductMode mode_unused, + bool use_external_float32_unused) const override { + return ShortTypeName(); + } + + std::string ShortTypeName() const; std::string TypeName() const; // Proto-specific version does not need mode. // Nested catalog names, that were passed to the constructor. @@ -132,7 +150,7 @@ class ProtoType : public Type { const Type** type, std::string* name = nullptr) const; ABSL_DEPRECATED("Use overload without 'use_obsolete_timestamp' argument.") - absl::Status GetFieldTypeByName(const std::string& name, TypeFactory* factory, + absl::Status GetFieldTypeByName(absl::string_view name, TypeFactory* factory, bool use_obsolete_timestamp, const Type** type, int* number = nullptr) const; @@ -480,6 +498,7 @@ absl::Status ProtoType::ValidateTypeAnnotations( if (field_format != FieldFormat::ST_GEOGRAPHY_ENCODED && field_format != FieldFormat::NUMERIC && field_format != FieldFormat::BIGNUMERIC && + field_format != FieldFormat::TOKENLIST && field_format != FieldFormat::RANGE_DATES_ENCODED && field_format != FieldFormat::RANGE_DATETIMES_ENCODED && field_format != FieldFormat::RANGE_TIMESTAMPS_ENCODED && diff --git a/zetasql/public/types/simple_type.cc b/zetasql/public/types/simple_type.cc index bca6c0129..070ff4b8d 100644 --- a/zetasql/public/types/simple_type.cc +++ b/zetasql/public/types/simple_type.cc @@ -54,6 +54,7 @@ #include "absl/functional/any_invocable.h" #include
"absl/functional/function_ref.h" #include "absl/hash/hash.h" +#include "zetasql/base/check.h" #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/ascii.h" @@ -114,7 +115,7 @@ const std::map& SimpleTypeNameInfoMap() { {"bool", {TYPE_BOOL}}, {"boolean", {TYPE_BOOL}}, {"float", {TYPE_FLOAT, true}}, - {"float32", {TYPE_FLOAT, true}}, + {"float32", {TYPE_FLOAT}}, {"float64", {TYPE_DOUBLE}}, {"double", {TYPE_DOUBLE, true}}, {"bytes", {TYPE_BYTES}}, @@ -130,6 +131,7 @@ const std::map& SimpleTypeNameInfoMap() { {"bignumeric", {TYPE_BIGNUMERIC}}, {"bigdecimal", {TYPE_BIGNUMERIC, false, FEATURE_V_1_3_DECIMAL_ALIAS}}, {"json", {TYPE_JSON}}, + {"tokenlist", {TYPE_TOKENLIST}}, }; return *result; } @@ -142,29 +144,78 @@ struct TypeKindInfo { bool internal_product_mode_only = false; // If present, then the feature controls whether the type kind is enabled. // If absent, then the type kind does not require any language feature. + // Only one of `type_feature` or `disabling_type_feature` should be set. std::optional<LanguageFeature> type_feature; + // If present, then the feature controls whether the type kind is enabled. + // If absent, then the type kind does not require any language feature. + // Only one of `type_feature` or `disabling_type_feature` should be set. + std::optional<LanguageFeature> disabling_type_feature; + + // Builds a TypeKindInfo for both product modes. + static TypeKindInfo Build() { + return TypeKindInfo(/*internal_product_mode_only=*/false, std::nullopt, + std::nullopt); + } + + // Builds a TypeKindInfo for `PRODUCT_INTERNAL`. + static TypeKindInfo BuildInternalOnly() { + return TypeKindInfo(/*internal_product_mode_only=*/true, std::nullopt, + std::nullopt); + } + + // Builds a TypeKindInfo for both product modes, controlled by `type_feature`.
+  static TypeKindInfo BuildWithTypeFeature(LanguageFeature type_feature) {
+    return TypeKindInfo(/*internal_product_mode_only=*/false, type_feature,
+                        std::nullopt);
+  }
+
+  // Builds a TypeKindInfo for both product modes, controlled by
+  // `disabling_type_feature`.
+  static TypeKindInfo BuildWithDisablingTypeFeature(
+      LanguageFeature disabling_type_feature) {
+    return TypeKindInfo(/*internal_product_mode_only=*/false, std::nullopt,
+                        disabling_type_feature);
+  }
+
+ private:
+  TypeKindInfo(bool internal_product_mode_only,
+               std::optional<LanguageFeature> type_feature,
+               std::optional<LanguageFeature> disabling_type_feature)
+      : internal_product_mode_only(internal_product_mode_only) {
+    ABSL_CHECK(!(type_feature.has_value() &&  // Crash OK
+                 disabling_type_feature.has_value()))
+        << "Only one of type_feature or disabling_type_feature should be set.";
+    this->type_feature = type_feature;
+    this->disabling_type_feature = disabling_type_feature;
+  }
 };

 const std::map<TypeKind, TypeKindInfo>& SimpleTypeKindInfoMap() {
   static auto result = new std::map<TypeKind, TypeKindInfo>{
-      {TYPE_INT32, {true}},
-      {TYPE_UINT32, {true}},
-      {TYPE_INT64, {}},
-      {TYPE_UINT64, {true}},
-      {TYPE_BOOL, {}},
-      {TYPE_FLOAT, {true}},
-      {TYPE_DOUBLE, {}},
-      {TYPE_BYTES, {}},
-      {TYPE_STRING, {}},
-      {TYPE_DATE, {}},
-      {TYPE_TIMESTAMP, {}},
-      {TYPE_TIME, {false, FEATURE_V_1_2_CIVIL_TIME}},
-      {TYPE_DATETIME, {false, FEATURE_V_1_2_CIVIL_TIME}},
-      {TYPE_INTERVAL, {false, FEATURE_INTERVAL_TYPE}},
-      {TYPE_GEOGRAPHY, {false, FEATURE_GEOGRAPHY}},
-      {TYPE_NUMERIC, {false, FEATURE_NUMERIC_TYPE}},
-      {TYPE_BIGNUMERIC, {false, FEATURE_BIGNUMERIC_TYPE}},
-      {TYPE_JSON, {false, FEATURE_JSON_TYPE}},
+      {TYPE_INT32, TypeKindInfo::BuildInternalOnly()},
+      {TYPE_UINT32, TypeKindInfo::BuildInternalOnly()},
+      {TYPE_INT64, TypeKindInfo::Build()},
+      {TYPE_UINT64, TypeKindInfo::BuildInternalOnly()},
+      {TYPE_BOOL, TypeKindInfo::Build()},
+      {TYPE_FLOAT, TypeKindInfo::BuildWithDisablingTypeFeature(
+                       FEATURE_V_1_4_DISABLE_FLOAT32)},
+      {TYPE_DOUBLE, TypeKindInfo::Build()},
+      {TYPE_BYTES, TypeKindInfo::Build()},
+      {TYPE_STRING, TypeKindInfo::Build()},
+      {TYPE_DATE, TypeKindInfo::Build()},
+      {TYPE_TIMESTAMP, TypeKindInfo::Build()},
+      {TYPE_TIME, TypeKindInfo::BuildWithTypeFeature(FEATURE_V_1_2_CIVIL_TIME)},
+      {TYPE_DATETIME,
+       TypeKindInfo::BuildWithTypeFeature(FEATURE_V_1_2_CIVIL_TIME)},
+      {TYPE_INTERVAL,
+       TypeKindInfo::BuildWithTypeFeature(FEATURE_INTERVAL_TYPE)},
+      {TYPE_GEOGRAPHY, TypeKindInfo::BuildWithTypeFeature(FEATURE_GEOGRAPHY)},
+      {TYPE_NUMERIC, TypeKindInfo::BuildWithTypeFeature(FEATURE_NUMERIC_TYPE)},
+      {TYPE_BIGNUMERIC,
+       TypeKindInfo::BuildWithTypeFeature(FEATURE_BIGNUMERIC_TYPE)},
+      {TYPE_JSON, TypeKindInfo::BuildWithTypeFeature(FEATURE_JSON_TYPE)},
+      {TYPE_TOKENLIST,
+       TypeKindInfo::BuildWithTypeFeature(FEATURE_TOKENIZED_SEARCH)},
   };
   return *result;
 }
@@ -175,6 +226,7 @@ struct TypeInfo {
   bool internal_product_mode_only = false;
   std::optional<LanguageFeature> type_feature;
   std::optional<LanguageFeature> alias_feature;
+  std::optional<LanguageFeature> disabling_type_feature;
 };

 // A map joining SimpleTypeNameInfoMap and SimpleTypeKindInfoMap.
@@ -195,7 +247,8 @@ std::map<absl::string_view, TypeInfo>* BuildSimpleTypeInfoMap() {
         TypeInfo{type_kind,
                  type_name_info.internal_product_mode_only ||
                      type_kind_info.internal_product_mode_only,
-                 type_kind_info.type_feature, type_name_info.alias_feature});
+                 type_kind_info.type_feature, type_name_info.alias_feature,
+                 type_kind_info.disabling_type_feature});
   }
   return result;
 }
@@ -228,6 +281,10 @@ const IntervalValue& GetIntervalValue(const ValueContent& value) {
   return value.GetAs<internal::IntervalRef*>()->value();
 }

+const tokens::TokenList& GetTokenListValue(const ValueContent& value) {
+  return value.GetAs<internal::TokenListRef*>()->value();
+}
+
 std::string AddTypePrefix(absl::string_view value, const Type* type,
                           ProductMode mode) {
   return absl::StrCat(type->TypeName(mode), " ", ToStringLiteral(value));
@@ -260,8 +317,7 @@ SimpleType::SimpleType(const TypeFactory* factory, TypeKind kind)
   ABSL_CHECK(IsSimpleType(kind)) << kind;
 }

-SimpleType::~SimpleType() {
-}
+SimpleType::~SimpleType() = default;

 bool SimpleType::IsSupportedType(
     const LanguageOptions& language_options) const {
@@ -279,7 +335,12 @@ bool SimpleType::IsSupportedType(
   if (info.type_feature.has_value() &&
       !language_options.LanguageFeatureEnabled(info.type_feature.value())) {
     return false;
+  } else if (info.disabling_type_feature.has_value() &&
+             language_options.LanguageFeatureEnabled(
+                 *info.disabling_type_feature)) {
+    return false;
   }
+
   return true;
 }
@@ -295,13 +356,15 @@ absl::Status SimpleType::SerializeToProtoAndDistinctFileDescriptorsImpl(
   return absl::OkStatus();
 }

-std::string SimpleType::TypeName(ProductMode mode) const {
-  return TypeKindToString(kind(), mode);
+std::string SimpleType::TypeName(ProductMode mode,
+                                 bool use_external_float32) const {
+  return TypeKindToString(kind(), mode, use_external_float32);
 }

 absl::StatusOr<std::string> SimpleType::TypeNameWithModifiers(
-    const TypeModifiers& type_modifiers, ProductMode mode) const {
-  std::string result_type_name = TypeName(mode);
+    const TypeModifiers& type_modifiers, ProductMode mode,
+    bool use_external_float32) const {
+  std::string result_type_name = TypeName(mode, use_external_float32);
   const TypeParameters& type_params = type_modifiers.type_parameters();
   // Prepares string for type parameters to append to type name.
   if (!type_params.IsEmpty()) {
@@ -350,7 +413,7 @@ absl::StatusOr<std::string> SimpleType::TypeNameWithModifiers(

 TypeKind SimpleType::GetTypeKindIfSimple(
     absl::string_view type_name, ProductMode mode,
-    const LanguageOptions::LanguageFeatureSet* language_features) {
+    const LanguageOptions::LanguageFeatureSet* enabled_language_features) {
   static const std::map<absl::string_view, TypeInfo>* type_map =
       BuildSimpleTypeInfoMap();
   const TypeInfo* type_info =
@@ -361,13 +424,17 @@ TypeKind SimpleType::GetTypeKindIfSimple(
   if (mode == PRODUCT_EXTERNAL && type_info->internal_product_mode_only) {
     return TYPE_UNKNOWN;
   }
-  if (language_features != nullptr) {
+  if (enabled_language_features != nullptr) {
     if (type_info->type_feature.has_value() &&
-        !language_features->contains(type_info->type_feature.value())) {
+        !enabled_language_features->contains(*type_info->type_feature)) {
+      return TYPE_UNKNOWN;
+    } else if (type_info->disabling_type_feature.has_value() &&
+               enabled_language_features->contains(
+                   *type_info->disabling_type_feature)) {
       return TYPE_UNKNOWN;
     }
     if (type_info->alias_feature.has_value() &&
-        !language_features->contains(type_info->alias_feature.value())) {
+        !enabled_language_features->contains(*type_info->alias_feature)) {
       return TYPE_UNKNOWN;
     }
   }
@@ -377,8 +444,7 @@ TypeKind SimpleType::GetTypeKindIfSimple(
 bool SimpleType::SupportsGroupingImpl(const LanguageOptions& language_options,
                                       const Type** no_grouping_type) const {
   const bool supports_grouping =
-      !this->IsGeography() &&
-      !this->IsJson() &&
+      !this->IsGeography() && !this->IsJson() && !this->IsTokenList() &&
       !(this->IsFloatingPoint() && language_options.LanguageFeatureEnabled(
                                        FEATURE_DISALLOW_GROUP_BY_FLOAT));
   if (no_grouping_type != nullptr) {
@@ -409,6 +475,9 @@ void SimpleType::CopyValueContent(TypeKind kind, const ValueContent& from,
     case TYPE_JSON:
       from.GetAs<internal::JSONRef*>()->Ref();
       break;
+    case TYPE_TOKENLIST:
+      from.GetAs<internal::TokenListRef*>()->Ref();
+      break;
     default:
       break;
   }
@@ -442,6 +511,9 @@ void SimpleType::ClearValueContent(TypeKind kind, const ValueContent& value) {
     case TYPE_JSON:
       value.GetAs<internal::JSONRef*>()->Unref();
       return;
+    case TYPE_TOKENLIST:
+      value.GetAs<internal::TokenListRef*>()->Unref();
+      return;
     default:
       return;
   }
@@ -465,6 +537,8 @@ uint64_t SimpleType::GetValueContentExternallyAllocatedByteSize(
       return sizeof(internal::BigNumericRef);
     case TYPE_JSON:
       return value.GetAs<internal::JSONRef*>()->physical_byte_size();
+    case TYPE_TOKENLIST:
+      return value.GetAs<internal::TokenListRef*>()->physical_byte_size();
     default:
       return 0;
   }
@@ -474,6 +548,15 @@ absl::HashState SimpleType::HashTypeParameter(absl::HashState state) const {
   return state;  // Simple types don't have parameters.
 }

+namespace {
+
+absl::HashState HashTokenList(const ValueContent& value,
+                              absl::HashState state) {
+  return absl::HashState::combine(std::move(state), GetTokenListValue(value));
+}
+
+}  // namespace
+
 absl::HashState SimpleType::HashValueContent(const ValueContent& value,
                                              absl::HashState state) const {
   // These codes are picked arbitrarily.
@@ -539,6 +622,8 @@ absl::HashState SimpleType::HashValueContent(const ValueContent& value,
       return absl::HashState::combine(std::move(state), kGeographyHashCode);
     case TYPE_JSON:
       return absl::HashState::combine(std::move(state), GetJsonString(value));
+    case TYPE_TOKENLIST:
+      return HashTokenList(value, std::move(state));
     default:
       ABSL_LOG(ERROR) << "Unexpected type kind: " << kind();
       return state;
@@ -601,6 +686,8 @@ bool SimpleType::ValueContentEquals(
     case TYPE_JSON: {
       return GetJsonString(x) == GetJsonString(y);
     }
+    case TYPE_TOKENLIST:
+      return GetTokenListValue(x).EquivalentTo(GetTokenListValue(y));
     default:
       ABSL_LOG(FATAL) << "Unexpected simple type kind: " << kind();
   }
@@ -668,6 +755,74 @@ bool SimpleType::ValueContentLess(const ValueContent& x, const ValueContent& y,
   }
 }

+// Formats 'token' into 'out'.
+void SimpleType::FormatTextToken(std::string& out,
+                                 const tokens::TextToken& token,
+                                 const FormatValueContentOptions& options) {
+  absl::StrAppendFormat(&out, "{text: '%s', is_display_only: true}",
+                        absl::Utf8SafeCEscape(token.text()));
+}
+
+std::string SimpleType::FormatTokenList(
+    const ValueContent& value, const FormatValueContentOptions& options) const {
+  if (options.mode != FormatValueContentOptions::Mode::kDebug) {
+    // TOKENLIST doesn't have literals.
+    // TODO: generate expression with standard constructor
+    // functions when available.
+    return internal::GetCastExpressionString(
+        ToBytesLiteral(GetTokenListValue(value).GetBytes()), this,
+        options.product_mode);
+  }
+  std::vector<std::string> lines = FormatTokenLines(value, options);
+  if (lines.empty()) lines = {""};
+  return absl::StrJoin(lines, "\n");
+}
+
+std::vector<std::string> SimpleType::FormatTokenLines(
+    const ValueContent& value, const FormatValueContentOptions& options) {
+  auto iter = GetTokenListValue(value).GetIterator();
+  if (!iter.ok()) {
+    return {iter.status().ToString()};
+  }
+  tokens::TextToken buf, cur;
+  int run_length = 0;
+  std::vector<std::string> lines;
+  auto add_token_line = [&](const tokens::TextToken& token, int run_length) {
+    std::string out;
+    FormatTextToken(out, token, options);
+    if (run_length > 1) absl::StrAppend(&out, " (", run_length, " times)");
+    lines.push_back(std::move(out));
+  };
+
+  while (!iter->done()) {
+    if (!iter->Next(cur).ok()) {
+      return {iter.status().ToString()};
+    }
+
+    if (!options.collapse_identical_tokens) {
+      std::string tok_str;
+      FormatTextToken(tok_str, cur, options);
+      lines.push_back(std::move(tok_str));
+    } else {
+      if (buf == cur) {
+        ++run_length;
+      } else {
+        if (run_length > 0) {
+          add_token_line(buf, run_length);
+        }
+        buf = std::move(cur);
+        run_length = 1;
+      }
+    }
+  }
+  if (options.collapse_identical_tokens && run_length > 0) {
+    // As long as there was at least one token, buf will contain a token that
+    // has yet to be printed.
+    add_token_line(buf, run_length);
+  }
+  return lines;
+}
+
 std::string SimpleType::FormatValueContent(
     const ValueContent& value, const FormatValueContentOptions& options) const {
   switch (kind()) {
@@ -737,7 +892,7 @@ std::string SimpleType::FormatValueContent(
       if (!std::isfinite(float_value)) {
         return internal::GetCastExpressionString(
             ToStringLiteral(RoundTripFloatToString(float_value)), this,
-            options.product_mode);
+            options.product_mode, options.use_external_float32);
       } else {
         std::string s = RoundTripFloatToString(float_value);
         // Make sure that doubles always print with a . or an 'e' so they
@@ -748,7 +903,8 @@ std::string SimpleType::FormatValueContent(
         }
         return options.as_literal() ? s
                                     : internal::GetCastExpressionString(
-                                          s, this, options.product_mode);
+                                          s, this, options.product_mode,
+                                          options.use_external_float32);
       }
     }
     case TYPE_DOUBLE: {
@@ -799,6 +955,8 @@ std::string SimpleType::FormatValueContent(
                  ? absl::StrCat("JSON ", ToStringLiteral(s))
                  : s;
     }
+    case TYPE_TOKENLIST:
+      return FormatTokenList(value, options);
     default:
       ABSL_LOG(ERROR) << "Unexpected type kind: " << kind();
       return "";
@@ -868,6 +1026,9 @@ absl::Status SimpleType::SerializeValueContent(const ValueContent& value,
       value_proto->set_interval_value(
           GetIntervalValue(value).SerializeAsBytes());
       break;
+    case TYPE_TOKENLIST:
+      value_proto->set_tokenlist_value(GetTokenListValue(value).GetBytes());
+      break;
     default:
       return absl::Status(absl::StatusCode::kInternal,
                           absl::StrCat("Unsupported type ", DebugString()));
@@ -1014,12 +1175,20 @@ absl::Status SimpleType::DeserializeValueContent(const ValueProto& value_proto,
       if (!value_proto.has_interval_value()) {
         return TypeMismatchError(value_proto);
       }
-      ZETASQL_ASSIGN_OR_RETURN(IntervalValue interval_v,
-                       IntervalValue::DeserializeFromBytes(
-                           value_proto.interval_value()));
+      ZETASQL_ASSIGN_OR_RETURN(
+          IntervalValue interval_v,
+          IntervalValue::DeserializeFromBytes(value_proto.interval_value()));
       value->set(new internal::IntervalRef(interval_v));
       break;
     }
+    case TYPE_TOKENLIST: {
+      if (!value_proto.has_tokenlist_value()) {
+        return TypeMismatchError(value_proto);
+      }
+      value->set(new internal::TokenListRef(
+          tokens::TokenList::FromBytes(value_proto.tokenlist_value())));
+      break;
+    }
     default:
       return absl::Status(absl::StatusCode::kInternal,
                           absl::StrCat("Unsupported type ", DebugString()));
@@ -1107,7 +1276,7 @@ absl::StatusOr<TypeParameters> SimpleType::ValidateAndResolveTypeParameters(
 }

 absl::StatusOr<TypeParameters> SimpleType::ResolveStringBytesTypeParameters(
-    const std::vector<TypeParameterValue>& type_parameter_values,
+    absl::Span<const TypeParameterValue> type_parameter_values,
     ProductMode mode) const {
   if (type_parameter_values.size() != 1) {
     return MakeSqlError() << ShortTypeName(mode)
@@ -1137,7 +1306,7 @@ absl::StatusOr<TypeParameters> SimpleType::ResolveStringBytesTypeParameters(

 absl::StatusOr<TypeParameters>
 SimpleType::ResolveNumericBignumericTypeParameters(
-    const std::vector<TypeParameterValue>& type_parameter_values,
+    absl::Span<const TypeParameterValue> type_parameter_values,
     ProductMode mode) const {
   if (type_parameter_values.size() > 2) {
     return MakeSqlError() << ShortTypeName(mode)
diff --git a/zetasql/public/types/simple_type.h b/zetasql/public/types/simple_type.h
index 82d28f1db..68a2692ad 100644
--- a/zetasql/public/types/simple_type.h
+++ b/zetasql/public/types/simple_type.h
@@ -32,6 +32,7 @@
 #include "absl/status/statusor.h"
 #include "absl/strings/string_view.h"
 #include "absl/time/time.h"
+#include "absl/types/span.h"

 namespace zetasql {

@@ -53,12 +54,22 @@ class SimpleType : public Type {
   SimpleType& operator=(const SimpleType&) = delete;
 #endif  // SWIG

-  std::string TypeName(ProductMode mode) const override;
+  std::string TypeName(ProductMode mode,
+                       bool use_external_float32) const override;
+  std::string TypeName(ProductMode mode) const override {
+    return TypeName(mode, /*use_external_float32=*/false);
+  }

   // Same as above, but the type modifier values are appended to the SQL name
   // for this SimpleType.
   absl::StatusOr<std::string> TypeNameWithModifiers(
-      const TypeModifiers& type_modifiers, ProductMode mode) const override;
+      const TypeModifiers& type_modifiers, ProductMode mode,
+      bool use_external_float32) const override;
+  absl::StatusOr<std::string> TypeNameWithModifiers(
+      const TypeModifiers& type_modifiers, ProductMode mode) const override {
+    return TypeNameWithModifiers(type_modifiers, mode,
+                                 /*use_external_float32=*/false);
+  }

   bool IsSupportedType(const LanguageOptions& language_options) const override;

@@ -134,12 +145,12 @@ class SimpleType : public Type {

   // Resolves type parameters for STRING(L), BYTES(L).
   absl::StatusOr<TypeParameters> ResolveStringBytesTypeParameters(
-      const std::vector<TypeParameterValue>& type_parameter_values,
+      absl::Span<const TypeParameterValue> type_parameter_values,
       ProductMode mode) const;

   // Resolves type parameters for NUMERIC(P), BIGNUMERIC(P), NUMERIC(P, S),
   // BIGNUMERIC(P, S) and create respective TypeParameters class.
   absl::StatusOr<TypeParameters> ResolveNumericBignumericTypeParameters(
-      const std::vector<TypeParameterValue>& type_parameter_values,
+      absl::Span<const TypeParameterValue> type_parameter_values,
       ProductMode mode) const;

   // Validates the resolved numeric type parameters.
   // We put ValidateNumericTypeParameters() in Type class instead of
@@ -148,6 +159,17 @@ class SimpleType : public Type {
   absl::Status ValidateNumericTypeParameters(
       const NumericTypeParametersProto& numeric_param, ProductMode mode) const;

+  // Returns a TokenList formatted as a string.
+  std::string FormatTokenList(const ValueContent& value,
+                              const FormatValueContentOptions& options) const;
+  // Returns the debug content of the TokenList as a sequence of lines, each
+  // representing a single token.
+  static std::vector<std::string> FormatTokenLines(
+      const ValueContent& value, const FormatValueContentOptions& options);
+
+  static void FormatTextToken(std::string& out, const tokens::TextToken& token,
+                              const FormatValueContentOptions& options);
+
   // Used for TYPE_TIMESTAMP.
   static absl::Time GetTimestampValue(const ValueContent& value);
   static absl::Status SetTimestampValue(absl::Time time, ValueContent* value);
diff --git a/zetasql/public/types/struct_type.cc b/zetasql/public/types/struct_type.cc
index eb434f36f..e4cb2bfc4 100644
--- a/zetasql/public/types/struct_type.cc
+++ b/zetasql/public/types/struct_type.cc
@@ -203,26 +203,29 @@ absl::StatusOr<std::string> StructType::TypeNameImpl(
   return ret;
 }

-std::string StructType::ShortTypeName(ProductMode mode) const {
+std::string StructType::ShortTypeName(ProductMode mode,
+                                      bool use_external_float32) const {
   // Limit the output to three struct fields to avoid long error messages.
   const int field_limit = 3;
   const auto field_debug_fn = [=](const zetasql::Type* type,
                                   int field_index) {
-    return type->ShortTypeName(mode);
+    return type->ShortTypeName(mode, use_external_float32);
   };
   return TypeNameImpl(field_limit, field_debug_fn).value();
 }

-std::string StructType::TypeName(ProductMode mode) const {
+std::string StructType::TypeName(ProductMode mode,
+                                 bool use_external_float32) const {
   const auto field_debug_fn = [=](const zetasql::Type* type,
                                   int field_index) {
-    return type->TypeName(mode);
+    return type->TypeName(mode, use_external_float32);
   };
   return TypeNameImpl(std::numeric_limits<int>::max(), field_debug_fn).value();
 }

 absl::StatusOr<std::string> StructType::TypeNameWithModifiers(
-    const TypeModifiers& type_modifiers, ProductMode mode) const {
+    const TypeModifiers& type_modifiers, ProductMode mode,
+    bool use_external_float32) const {
   const TypeParameters& type_params = type_modifiers.type_parameters();
   if (!type_params.IsEmpty() && (type_params.num_children() != num_fields())) {
     return MakeSqlError()
@@ -241,7 +244,7 @@ absl::StatusOr<std::string> StructType::TypeNameWithModifiers(
             type_params.IsEmpty() ? TypeParameters()
                                   : type_params.child(field_index),
             collation.Empty() ? Collation() : collation.child(field_index)),
-        mode);
+        mode, use_external_float32);
   };
   return TypeNameImpl(std::numeric_limits<int>::max(), field_debug_fn);
 }
@@ -249,7 +252,7 @@ absl::StatusOr<std::string> StructType::TypeNameWithModifiers(
 absl::StatusOr<TypeParameters> StructType::ValidateAndResolveTypeParameters(
     const std::vector<TypeParameterValue>& type_parameter_values,
     ProductMode mode) const {
-  return MakeSqlError() << ShortTypeName(mode)
+  return MakeSqlError() << ShortTypeName(mode, /*use_external_float32=*/false)
                         << " type cannot have type parameters by itself, it "
                            "can only have type parameters on its struct fields";
 }
@@ -539,7 +542,8 @@ std::string StructType::GetFormatPrefix(
       prefix.push_back('(');
       break;
     case Type::FormatValueContentOptions::Mode::kSQLExpression:
-      prefix.append(TypeName(options.product_mode));
+      prefix.append(
+          TypeName(options.product_mode, options.use_external_float32));
       prefix.push_back('(');
       break;
   }
diff --git a/zetasql/public/types/struct_type.h b/zetasql/public/types/struct_type.h
index 99bc03fd2..ce0bc3082 100644
--- a/zetasql/public/types/struct_type.h
+++ b/zetasql/public/types/struct_type.h
@@ -105,13 +105,27 @@ class StructType : public ContainerType {

   bool UsingFeatureV12CivilTimeType() const override;

-  std::string ShortTypeName(ProductMode mode) const override;
-  std::string TypeName(ProductMode mode) const override;
+  std::string ShortTypeName(ProductMode mode,
+                            bool use_external_float32) const override;
+  std::string ShortTypeName(ProductMode mode) const override {
+    return ShortTypeName(mode, /*use_external_float32=*/false);
+  }
+  std::string TypeName(ProductMode mode,
+                       bool use_external_float32) const override;
+  std::string TypeName(ProductMode mode) const override {
+    return TypeName(mode, /*use_external_float32=*/false);
+  }

   // Same as above, but the type modifier values are appended to the SQL name
   // for this StructType.
   absl::StatusOr<std::string> TypeNameWithModifiers(
-      const TypeModifiers& type_modifiers, ProductMode mode) const override;
+      const TypeModifiers& type_modifiers, ProductMode mode,
+      bool use_external_float32) const override;
+  absl::StatusOr<std::string> TypeNameWithModifiers(
+      const TypeModifiers& type_modifiers, ProductMode mode) const override {
+    return TypeNameWithModifiers(type_modifiers, mode,
+                                 /*use_external_float32=*/false);
+  }

   int nesting_depth() const override { return nesting_depth_; }
diff --git a/zetasql/public/types/type.cc b/zetasql/public/types/type.cc
index 374c94571..780b2a124 100644
--- a/zetasql/public/types/type.cc
+++ b/zetasql/public/types/type.cc
@@ -37,6 +37,7 @@
 #include "zetasql/public/options.pb.h"
 #include "zetasql/public/type.pb.h"
 #include "zetasql/public/types/array_type.h"
+#include "zetasql/public/types/map_type.h"
 #include "zetasql/public/types/range_type.h"
 #include "zetasql/public/types/simple_type.h"
 #include "zetasql/public/types/struct_type.h"
@@ -140,8 +141,12 @@ static const auto& GetTypeKindInfoMap() {
        {"JSON", 26, 26, true }},
       {TYPE_INTERVAL,
        {"INTERVAL", 27, 27, true }},
+      {TYPE_TOKENLIST,
+       {"TOKENLIST", 28, 28, true }},
       {TYPE_RANGE,
        {"RANGE", 29, 29, false }},
+      {TYPE_MAP,
+       {"MAP", 31, 31, false }},
       // clang-format on
       // When a new entry is added here, update
       // TypeTest::VerifyCostAndSpecificity.
@@ -183,7 +188,8 @@ TypeKind Type::ResolveBuiltinTypeNameToKindIfSimple(
       &language_options.GetEnabledLanguageFeatures());
 }

-std::string Type::TypeKindToString(TypeKind kind, ProductMode mode) {
+std::string Type::TypeKindToString(TypeKind kind, ProductMode mode,
+                                   bool use_external_float32) {
   // Note that for types not externally supported we still want to produce
   // the internal names for them.  This is because during development
   // we want error messages to indicate what the unsupported type actually
@@ -192,6 +198,9 @@ std::string Type::TypeKindToString(TypeKind kind, ProductMode mode) {
   if (ABSL_PREDICT_TRUE(GetTypeKindInfoMap().contains(kind))) {
     if (mode == PRODUCT_EXTERNAL && kind == TYPE_DOUBLE) {
       return "FLOAT64";
+    } else if (mode == PRODUCT_EXTERNAL && use_external_float32 &&
+               kind == TYPE_FLOAT) {
+      return "FLOAT32";
     }
     return GetTypeKindInfoMap().at(kind).name;
   }
@@ -199,20 +208,22 @@ std::string Type::TypeKindToString(TypeKind kind, ProductMode mode) {
 }

 std::string Type::TypeKindListToString(const std::vector<TypeKind>& kinds,
-                                       ProductMode mode) {
+                                       ProductMode mode,
+                                       bool use_external_float32) {
   std::vector<std::string> kind_strings;
   kind_strings.reserve(kinds.size());
   for (const TypeKind& kind : kinds) {
-    kind_strings.push_back(TypeKindToString(kind, mode));
+    kind_strings.push_back(TypeKindToString(kind, mode, use_external_float32));
   }
   return absl::StrJoin(kind_strings, ", ");
 }

-std::string Type::TypeListToString(TypeListView types, ProductMode mode) {
+std::string Type::TypeListToString(TypeListView types, ProductMode mode,
+                                   bool use_external_float32) {
   std::vector<std::string> type_strings;
   type_strings.reserve(types.size());
   for (const Type* type : types) {
-    type_strings.push_back(type->ShortTypeName(mode));
+    type_strings.push_back(type->ShortTypeName(mode, use_external_float32));
   }
   return absl::StrJoin(type_strings, ", ");
 }
@@ -409,6 +420,8 @@ std::string Type::CapitalizedName() const {
       return "BigNumeric";
     case TYPE_JSON:
       return "Json";
+    case TYPE_TOKENLIST:
+      return "TokenList";
     case TYPE_RANGE:
       // TODO: Consider moving to the types library and audit use of
       // DebugString.
@@ -416,6 +429,9 @@ std::string Type::CapitalizedName() const {
       // in zetasql::Value.
       return absl::StrCat("Range<",
                           this->AsRange()->element_type()->DebugString(), ">");
+    case TYPE_MAP:
+      return absl::StrCat("Map<", GetMapKeyType(this)->CapitalizedName(), ", ",
+                          GetMapValueType(this)->CapitalizedName(), ">");
     case TYPE_ENUM: {
       if (AsEnum()->IsOpaque()) {
         return AsEnum()->ShortTypeName(ProductMode::PRODUCT_EXTERNAL);
@@ -497,6 +513,7 @@ bool Type::SupportsPartitioningImpl(const LanguageOptions& language_options,
                                     const Type** no_partitioning_type) const {
   bool supports_partitioning =
       !this->IsGeography() && !this->IsFloatingPoint() && !this->IsJson();
+  if (this->IsTokenList()) supports_partitioning = false;

   if (no_partitioning_type != nullptr) {
     *no_partitioning_type = supports_partitioning ? nullptr : this;
@@ -507,6 +524,7 @@ bool Type::SupportsPartitioningImpl(const LanguageOptions& language_options,
 bool Type::SupportsOrdering(const LanguageOptions& language_options,
                             std::string* type_description) const {
   bool supports_ordering = !IsGeography() && !IsJson();
+  if (this->IsTokenList()) supports_ordering = false;
   if (supports_ordering) return true;
   if (type_description != nullptr) {
     *type_description = TypeKindToString(this->kind(),
@@ -543,6 +561,25 @@ bool Type::SupportsEquality(
   return this->SupportsEquality();
 }

+bool Type::SupportsReturning(const LanguageOptions& language_options,
+                             std::string* type_description) const {
+  switch (this->kind()) {
+    case TYPE_ARRAY:
+      return this->AsArray()->element_type()->SupportsReturning(
+          language_options, type_description);
+    case TYPE_STRUCT:
+      for (const StructField& field : this->AsStruct()->fields()) {
+        if (!field.type->SupportsReturning(language_options,
+                                           type_description)) {
+          return false;
+        }
+      }
+      return true;
+    default:
+      return true;
+  }
+}
+
 void Type::CopyValueContent(const ValueContent& from, ValueContent* to) const {
   *to = from;
 }
@@ -558,10 +595,10 @@ absl::HashState Type::Hash(absl::HashState state) const {
 absl::Status Type::TypeMismatchError(const ValueProto& value_proto) const {
   return absl::Status(
       absl::StatusCode::kInternal,
-      absl::StrCat("Type mismatch: provided type ", DebugString(),
-                   " but proto <",
-                   value_proto.ShortDebugString(),
-                   "> doesn't have field of that type and is not null"));
+      absl::StrCat(
+          "Type mismatch: provided type ", DebugString(), " but proto <",
+          value_proto.ShortDebugString(),
+          "> doesn't have field of that type and is not null"));
 }

 bool TypeEquals::operator()(const Type* const type1,
@@ -612,7 +649,7 @@ absl::Status Type::ValidateResolvedTypeParameters(
   return absl::OkStatus();
 }

-std::string Type::AddCapitalizedTypePrefix(const std::string& input,
+std::string Type::AddCapitalizedTypePrefix(absl::string_view input,
                                            bool is_null) const {
   if (kind() == TYPE_PROTO && !is_null) {
     // Proto types wrap their values using curly brackets, so don't need
diff --git a/zetasql/public/types/type.h b/zetasql/public/types/type.h
index b05f2e1a9..ad8c28d39 100644
--- a/zetasql/public/types/type.h
+++ b/zetasql/public/types/type.h
@@ -33,6 +33,7 @@
 #include "google/protobuf/descriptor.h"
 #include "zetasql/common/float_margin.h"
 #include "zetasql/public/options.pb.h"
+#include "zetasql/public/token_list.h"
 #include "zetasql/public/type.pb.h"
 #include "zetasql/public/types/timestamp_util.h"
 #include "zetasql/public/types/value_equality_check_options.h"
@@ -53,6 +54,7 @@ class ArrayType;
 class EnumType;
 class ExtendedType;
 class LanguageOptions;
+class MapType;
 class ProtoType;
 class RangeType;
 class StructType;
@@ -105,7 +107,9 @@ class Type {
   bool IsNumericType() const { return kind_ == TYPE_NUMERIC; }
   bool IsBigNumericType() const { return kind_ == TYPE_BIGNUMERIC; }
   bool IsJsonType() const { return kind_ == TYPE_JSON; }
+  bool IsTokenListType() const { return kind_ == TYPE_TOKENLIST; }
   bool IsRangeType() const { return kind_ == TYPE_RANGE; }
+  bool IsMapType() const { return kind_ == TYPE_MAP; }

   // DEPRECATED, use UsingFeatureV12CivilTimeType() instead.
   //
@@ -137,12 +141,14 @@ class Type {

   bool IsGeography() const { return kind_ == TYPE_GEOGRAPHY; }
   bool IsJson() const { return kind_ == TYPE_JSON; }
+  bool IsTokenList() const { return kind_ == TYPE_TOKENLIST; }
   bool IsEnum() const { return kind_ == TYPE_ENUM; }
   bool IsArray() const { return kind_ == TYPE_ARRAY; }
   bool IsStruct() const { return kind_ == TYPE_STRUCT; }
   bool IsProto() const { return kind_ == TYPE_PROTO; }
   bool IsStructOrProto() const { return IsStruct() || IsProto(); }
   bool IsRange() const { return kind_ == TYPE_RANGE; }
+  bool IsMap() const { return kind_ == TYPE_MAP; }

   bool IsFloatingPoint() const { return IsFloat() || IsDouble(); }
   bool IsNumerical() const {
@@ -210,6 +216,7 @@ class Type {
   virtual const EnumType* AsEnum() const { return nullptr; }
   virtual const ExtendedType* AsExtendedType() const { return nullptr; }
   virtual const RangeType* AsRange() const { return nullptr; }
+  virtual const MapType* AsMap() const { return nullptr; }

   // Returns true if the type supports grouping with respect to the
   // 'language_options'. E.g. struct type supports grouping if the
@@ -269,9 +276,28 @@ class Type {
     if (IsGeography() || IsJson()) {
       return false;
     }
+    if (IsTokenList()) {
+      return false;
+    }
     return true;
   }

+  // Returns true if values of the type can be returned from the top level
+  // of any query, function, or other surface boundary with respect to
+  // 'language_options'. If the type is a compound type, also recursively
+  // checks its field types. Specifically, this includes:
+  //
+  // 1) Output column list of ResolvedQueryStmt
+  // 2) Output column list from TVFs and Views
+  // 3) Column defs in ResolvedCreateTableStmt/ResolvedCreateTableAsSelectStmt
+  // 4) Return type of UDFs
+  // 5) Returning clause of DML statements
+  //
+  // When this returns false and 'type_description' is not null, also returns
+  // in 'type_description' a description of the type.
+  bool SupportsReturning(const LanguageOptions& language_options,
+                         std::string* type_description = nullptr) const;
+
   // Returns true if type supports equality with respect to the
   // 'language_options'. E.g. array type supports equality if the
   // FEATURE_V_1_1_ARRAY_EQUALITY option is enabled.
@@ -413,19 +439,44 @@ class Type {
   // messages; for a parseable type name, use TypeName, and for logging, use
   // DebugString. For proto-based types, this just returns the type name, which
   // does not easily distinguish PROTOs from ENUMs.
+  // Setting the optional parameter `use_external_float32` to true will return
+  // FLOAT32 as the type name for TYPE_FLOAT, for PRODUCT_EXTERNAL mode.
+  // TODO: Remove `use_external_float32` once all engines are
+  // updated.
   virtual std::string ShortTypeName(ProductMode mode) const;
+  virtual std::string ShortTypeName(ProductMode mode,
+                                    bool use_external_float32_unused) const {
+    return ShortTypeName(mode);
+  }

   // Same as above, but returns a SQL name that is reparseable as part of a
   // query. This is not intended for user-facing informational or error
   // messages.
+  // Setting the optional parameter `use_external_float32` to true will return
+  // FLOAT32 as the type name for TYPE_FLOAT, for PRODUCT_EXTERNAL mode.
+  // TODO: Remove `use_external_float32` once all engines are
+  // updated.
   virtual std::string TypeName(ProductMode mode) const = 0;
+  virtual std::string TypeName(ProductMode mode,
+                               bool use_external_float32_unused) const {
+    return TypeName(mode);
+  }

   // Same as above, but if <type_modifiers> contains non-empty modifiers, then
   // these modifiers are included with the SQL name for this type. The output is
   // reparseable as part of a query. If <type_modifiers> contains modifiers that
   // are invalid for the given Type, then an error status will be returned.
+  // Setting the optional parameter `use_external_float32` to true will return
+  // FLOAT32 as the type name for TYPE_FLOAT, for PRODUCT_EXTERNAL mode.
+  // TODO: Remove `use_external_float32` once all engines are
+  // updated.
   virtual absl::StatusOr<std::string> TypeNameWithModifiers(
       const TypeModifiers& type_modifiers, ProductMode mode) const = 0;
+  virtual absl::StatusOr<std::string> TypeNameWithModifiers(
+      const TypeModifiers& type_modifiers, ProductMode mode,
+      bool use_external_float32_unused) const {
+    return TypeNameWithModifiers(type_modifiers, mode);
+  }

   // Returns the full description of the type without truncation. This should
   // only be used for logging or tests and not for any user-facing messages. For
@@ -436,7 +487,7 @@ class Type {

   // Adds capitalized type name to a given string.
   // TODO Remove this method and use DebugString instead.
-  std::string AddCapitalizedTypePrefix(const std::string& input,
+  std::string AddCapitalizedTypePrefix(absl::string_view input,
                                        bool is_null) const;

   // Check if this type contains a field with the given name.
@@ -478,13 +529,26 @@ class Type {

   static bool IsSimpleType(TypeKind kind);

-  static std::string TypeKindToString(TypeKind kind, ProductMode mode);
+  static std::string TypeKindToString(TypeKind kind, ProductMode mode,
+                                      bool use_external_float32);
+  static std::string TypeKindToString(TypeKind kind, ProductMode mode) {
+    return TypeKindToString(kind, mode, /*use_external_float32=*/false);
+  }
+  static std::string TypeKindListToString(const std::vector<TypeKind>& kinds,
+                                          ProductMode mode,
+                                          bool use_external_float32);
   static std::string TypeKindListToString(const std::vector<TypeKind>& kinds,
-                                          ProductMode mode);
+                                          ProductMode mode) {
+    return TypeKindListToString(kinds, mode, /*use_external_float32=*/false);
+  }

   // Returns comma-separated list of names of given <types>. Type name is
   // generated using Type::ShortTypeName().
- static std::string TypeListToString(TypeListView types, ProductMode mode); + static std::string TypeListToString(TypeListView types, ProductMode mode, + bool use_external_float32); + static std::string TypeListToString(TypeListView types, ProductMode mode) { + return TypeListToString(types, mode, /*use_external_float32=*/false); + } // Returns the type kind if 'type_name' is a simple type in 'mode', assuming // all language features are enabled. Returns TYPE_UNKNOWN otherwise. @@ -625,6 +689,13 @@ class Type { bool add_simple_type_prefix() const { return mode != Mode::kDebug; } ProductMode product_mode = ProductMode::PRODUCT_EXTERNAL; + + // Setting `use_external_float32` to true will return + // FLOAT32 as the type name for TYPE_FLOAT. + // TODO: Remove `use_external_float32` once all engines are + // updated. + bool use_external_float32 = false; + Mode mode = Mode::kDebug; bool verbose = false; // Used with debug mode only. @@ -632,12 +703,28 @@ class Type { bool include_array_ordereness = false; int indent = 0; + bool collapse_identical_tokens = false; + FormatValueContentOptions IncreaseIndent(); // Number of columns per indentation. static const int kIndentStep = 2; }; + std::string FormatValueContentContainerElement( + const internal::ValueContentContainerElement element, const Type* type, + const FormatValueContentOptions& options) const { + if (element.is_null()) { + return options.as_literal() + ? "NULL" + : absl::StrCat("CAST(NULL AS ", + type->TypeName(options.product_mode, + options.use_external_float32), + ")"); + } + return type->FormatValueContent(element.value_content(), options); + } + // List of DebugStringImpl outputs. Used to serve as a stack in // DebugStringImpl to protect from stack overflows. 
// Note: SWIG will fail to process this file if we remove a white space @@ -708,6 +795,10 @@ class Type { FRIEND_TEST(TypeTest, FormatValueContentStructSQLExpressionMode); FRIEND_TEST(TypeTest, FormatValueContentStructDebugMode); FRIEND_TEST(TypeTest, FormatValueContentStructWithAnonymousFieldsDebugMode); + FRIEND_TEST(MapTest, FormatValueContentSQLLiteralMode); + FRIEND_TEST(MapTest, FormatValueContentSQLExpressionMode); + FRIEND_TEST(MapTestFormatValueContentDebugMode, FormatValueContentDebugMode); + FRIEND_TEST(MapTest, FormatValueContentDebugModeEmptyMap); // Copies value's content to another value. Is called when one value is // assigned to another. It's expected that content of destination is empty @@ -786,6 +877,7 @@ class Type { friend class ArrayType; friend class StructType; friend class RangeType; + friend class MapType; const internal::TypeStore* type_store_; // Used for lifetime checking only. const TypeKind kind_; diff --git a/zetasql/public/types/type_deserializer.cc b/zetasql/public/types/type_deserializer.cc index 85bdb2843..0064371e5 100644 --- a/zetasql/public/types/type_deserializer.cc +++ b/zetasql/public/types/type_deserializer.cc @@ -28,6 +28,7 @@ #include "zetasql/public/types/array_type.h" #include "zetasql/public/types/enum_type.h" #include "zetasql/public/types/extended_type.h" +#include "zetasql/public/types/map_type.h" #include "zetasql/public/types/proto_type.h" #include "zetasql/public/types/struct_type.h" #include "zetasql/public/types/type.h" @@ -49,6 +50,7 @@ absl::Status ValidateTypeProto(const TypeProto& type_proto) { (type_proto.type_kind() == TYPE_PROTO) != type_proto.has_proto_type() || (type_proto.type_kind() == TYPE_STRUCT) != type_proto.has_struct_type() || (type_proto.type_kind() == TYPE_RANGE) != type_proto.has_range_type() || + (type_proto.type_kind() == TYPE_MAP) != type_proto.has_map_type() || type_proto.type_kind() == __TypeKind__switch_must_have_a_default__) { if (type_proto.type_kind() != TYPE_GEOGRAPHY) { auto 
type_proto_debug_str = type_proto.DebugString();
@@ -192,6 +194,13 @@ absl::StatusOr<const Type*> TypeDeserializer::Deserialize(
     return range_type;
   }
+  case TYPE_MAP: {
+    ZETASQL_ASSIGN_OR_RETURN(const Type* key_type,
+                     Deserialize(type_proto.map_type().key_type()));
+    ZETASQL_ASSIGN_OR_RETURN(const Type* value_type,
+                     Deserialize(type_proto.map_type().value_type()));
+    return type_factory_->MakeMapType(key_type, value_type);
+  }
   default:
     return ::zetasql_base::UnimplementedErrorBuilder()
            << "Making Type of kind "
diff --git a/zetasql/public/types/type_factory.cc b/zetasql/public/types/type_factory.cc
index f27d0d979..3e6cabd59 100644
--- a/zetasql/public/types/type_factory.cc
+++ b/zetasql/public/types/type_factory.cc
@@ -49,6 +49,7 @@
 #include "zetasql/public/types/array_type.h"
 #include "zetasql/public/types/enum_type.h"
 #include "zetasql/public/types/internal_utils.h"
+#include "zetasql/public/types/map_type.h"
 #include "zetasql/public/types/proto_type.h"
 #include "zetasql/public/types/range_type.h"
 #include "zetasql/public/types/simple_type.h"
@@ -73,6 +74,7 @@
 #include "zetasql/base/map_util.h"
 #include "zetasql/base/ret_check.h"
 #include "zetasql/base/status.h"
+#include "zetasql/base/status_builder.h"
 #include "zetasql/base/status_macros.h"
 ABSL_FLAG(int32_t, zetasql_type_factory_nesting_depth_limit,
@@ -180,6 +182,33 @@ absl::Status TypeFactoryHelper::MakeOpaqueEnumType(
 }  // namespace internal
+// The set of types for which the static type factory should be used during
+// Array and Map type construction.
+static const auto* StaticTypeSet() {
+  static const auto* kStaticTypeSet = new absl::flat_hash_set<const Type*>{
+      types::Int32Type(),
+      types::Int64Type(),
+      types::Uint32Type(),
+      types::Uint64Type(),
+      types::BoolType(),
+      types::FloatType(),
+      types::DoubleType(),
+      types::StringType(),
+      types::BytesType(),
+      types::TimestampType(),
+      types::DateType(),
+      types::DatetimeType(),
+      types::TimeType(),
+      types::IntervalType(),
+      types::GeographyType(),
+      types::NumericType(),
+      types::BigNumericType(),
+      types::JsonType(),
+      types::TokenListType(),
+  };
+  return kStaticTypeSet;
+}
+
 // Staticly initialize a few commonly used types.
 static TypeFactory* s_type_factory() {
   static TypeFactory* s_type_factory = new TypeFactory();
@@ -242,6 +271,7 @@ int64_t TypeFactory::GetEstimatedOwnedMemoryBytesSize() const {
          internal::GetExternallyAllocatedMemoryEstimate(cached_proto_types_) +
          internal::GetExternallyAllocatedMemoryEstimate(cached_enum_types_) +
          internal::GetExternallyAllocatedMemoryEstimate(cached_range_types_) +
+         internal::GetExternallyAllocatedMemoryEstimate(cached_map_types_) +
          internal::GetExternallyAllocatedMemoryEstimate(
              cached_proto_types_with_catalog_name_) +
          internal::GetExternallyAllocatedMemoryEstimate(
@@ -302,6 +332,7 @@ const Type* TypeFactory::get_geography() { return types::GeographyType(); }
 const Type* TypeFactory::get_numeric() { return types::NumericType(); }
 const Type* TypeFactory::get_bignumeric() { return types::BigNumericType(); }
 const Type* TypeFactory::get_json() { return types::JsonType(); }
+const Type* TypeFactory::get_tokenlist() { return types::TokenListType(); }
 const Type* TypeFactory::MakeSimpleType(TypeKind kind) {
   ABSL_CHECK(Type::IsSimpleType(kind))
@@ -313,27 +344,7 @@ const Type* TypeFactory::MakeSimpleType(TypeKind kind) {
 absl::Status TypeFactory::MakeArrayType(const Type* element_type,
                                         const ArrayType** result) {
-  static const auto* kStaticTypeSet = new absl::flat_hash_set<const Type*>{
-      types::Int32Type(),
-      types::Int64Type(),
-
      types::Uint32Type(),
-      types::Uint64Type(),
-      types::BoolType(),
-      types::FloatType(),
-      types::DoubleType(),
-      types::StringType(),
-      types::BytesType(),
-      types::TimestampType(),
-      types::DateType(),
-      types::DatetimeType(),
-      types::TimeType(),
-      types::IntervalType(),
-      types::GeographyType(),
-      types::NumericType(),
-      types::BigNumericType(),
-      types::JsonType(),
-  };
-  if (this != s_type_factory() && kStaticTypeSet->contains(element_type)) {
+  if (this != s_type_factory() && StaticTypeSet()->contains(element_type)) {
     return s_type_factory()->MakeArrayType(element_type, result);
   }
@@ -493,6 +504,7 @@ absl::Status TypeFactory::MakeProtoType(
 absl::Status TypeFactory::MakeEnumType(
     const google::protobuf::EnumDescriptor* enum_descriptor, const EnumType** result,
     absl::Span<const std::string> catalog_name_path) {
+  ZETASQL_RET_CHECK_NE(enum_descriptor, nullptr);
   *result = MakeEnumTypeImpl(enum_descriptor, catalog_name_path,
                              /*is_opaque=*/false);
   return absl::OkStatus();
@@ -501,6 +513,7 @@ absl::Status TypeFactory::MakeEnumType(
     const google::protobuf::EnumDescriptor* enum_descriptor, const Type** result,
     absl::Span<const std::string> catalog_name_path) {
+  ZETASQL_RET_CHECK_NE(enum_descriptor, nullptr);
   *result = MakeEnumTypeImpl(enum_descriptor, catalog_name_path,
                              /*is_opaque=*/false);
   return absl::OkStatus();
@@ -525,12 +538,12 @@ absl::Status TypeFactory::MakeRangeType(const Type* element_type,
            << "> is not supported";
   }
-  static const auto* kStaticTypeSet = new absl::flat_hash_set<const Type*>{
+  static const auto* kStaticRangeTypeSet = new absl::flat_hash_set<const Type*>{
       types::TimestampType(),
       types::DateType(),
       types::DatetimeType(),
   };
-  ABSL_DCHECK(kStaticTypeSet->contains(element_type));
+  ABSL_DCHECK(kStaticRangeTypeSet->contains(element_type));
   if (this != s_type_factory()) {
     return s_type_factory()->MakeRangeType(element_type, result);
   }
@@ -576,6 +589,38 @@ absl::Status TypeFactory::MakeRangeType(const google::protobuf::FieldDescriptor*
   return absl::OkStatus();
 }
+//
Note: all future MakeType methods should use absl::StatusOr.
+absl::StatusOr<const Type*> TypeFactory::MakeMapType(const Type* key_type,
+                                                     const Type* value_type) {
+  if (this != s_type_factory() && StaticTypeSet()->contains(key_type) &&
+      StaticTypeSet()->contains(value_type)) {
+    return s_type_factory()->MakeMapType(key_type, value_type);
+  }
+
+  AddDependency(key_type);
+  AddDependency(value_type);
+
+  const int depth_limit = nesting_depth_limit();
+  if (std::max(key_type->nesting_depth(), value_type->nesting_depth()) + 1 >
+      depth_limit) {
+    return ::zetasql_base::InvalidArgumentErrorBuilder()
+           << "Map type would exceed nesting depth limit of " << depth_limit;
+  }
+
+  // Cannot use TypeFactory::MakeTypeWithChildElementType here because we have a
+  // pair of types.
+  absl::MutexLock lock(&store_->mutex_);
+  auto type_pair = std::make_pair(key_type, value_type);
+  auto it = cached_map_types_.find(type_pair);
+  if (it == cached_map_types_.end()) {
+    auto [inserted_it, _] = cached_map_types_.insert(
+        {type_pair,
+         TakeOwnershipLocked(new MapType(this, key_type, value_type))});
+    it = inserted_it;
+  }
+  return it->second;
+}
+
 absl::StatusOr<const ExtendedType*> TypeFactory::InternalizeExtendedType(
     std::unique_ptr<const ExtendedType> extended_type) {
   ZETASQL_RET_CHECK(extended_type);
@@ -968,6 +1013,12 @@ static const Type* s_json_type() {
   return s_json_type;
 }
+static const Type* s_tokenlist_type() {
+  static const Type* s_tokenlist_type =
+      new SimpleType(s_type_factory(), TYPE_TOKENLIST);
+  return s_tokenlist_type;
+}
+
 static const EnumType* s_date_part_enum_type() {
   static const EnumType* s_date_part_enum_type = [] {
     const EnumType* enum_type;
@@ -1171,6 +1222,12 @@ static const ArrayType* s_json_array_type() {
   return s_json_array_type;
 }
+static const ArrayType* s_tokenlist_array_type() {
+  static const ArrayType* s_tokenlist_array_type =
+      MakeArrayType(s_type_factory()->get_tokenlist());
+  return s_tokenlist_array_type;
+}
+
 static const EnumType* GetArrayZipModeEnumType() {
   static const EnumType*
s_array_zip_mode_enum_type = [] { const EnumType* enum_type; @@ -1204,6 +1261,7 @@ const Type* GeographyType() { return s_geography_type(); } const Type* NumericType() { return s_numeric_type(); } const Type* BigNumericType() { return s_bignumeric_type(); } const Type* JsonType() { return s_json_type(); } +const Type* TokenListType() { return s_tokenlist_type(); } const StructType* EmptyStructType() { return s_empty_struct_type(); } const EnumType* DatePartEnumType() { return s_date_part_enum_type(); } const EnumType* NormalizeModeEnumType() { return s_normalize_mode_enum_type(); } @@ -1246,6 +1304,8 @@ const ArrayType* BigNumericArrayType() { return s_bignumeric_array_type(); } const ArrayType* JsonArrayType() { return s_json_array_type(); } +const ArrayType* TokenListArrayType() { return s_tokenlist_array_type(); } + const Type* TypeFromSimpleTypeKind(TypeKind type_kind) { switch (type_kind) { case TYPE_INT32: @@ -1284,6 +1344,8 @@ const Type* TypeFromSimpleTypeKind(TypeKind type_kind) { return BigNumericType(); case TYPE_JSON: return JsonType(); + case TYPE_TOKENLIST: + return TokenListType(); default: ZETASQL_VLOG(1) << "Could not build static Type from type: " << Type::TypeKindToString(type_kind, PRODUCT_INTERNAL); @@ -1329,6 +1391,8 @@ const ArrayType* ArrayTypeFromSimpleTypeKind(TypeKind type_kind) { return BigNumericArrayType(); case TYPE_JSON: return JsonArrayType(); + case TYPE_TOKENLIST: + return TokenListArrayType(); default: ZETASQL_VLOG(1) << "Could not build static ArrayType from type: " << Type::TypeKindToString(type_kind, PRODUCT_INTERNAL); diff --git a/zetasql/public/types/type_factory.h b/zetasql/public/types/type_factory.h index dbae7acc0..98f2e1d2f 100644 --- a/zetasql/public/types/type_factory.h +++ b/zetasql/public/types/type_factory.h @@ -214,6 +214,7 @@ class TypeFactory { const Type* get_numeric(); const Type* get_bignumeric(); const Type* get_json(); + const Type* get_tokenlist(); // Return a Type object for a simple type. 
This works for all
  // non-parameterized scalar types. Enums, arrays, structs and protos must
@@ -226,8 +227,7 @@ class TypeFactory {
  // created the must outlive this TypeFactory.
  absl::Status MakeArrayType(const Type* element_type,
                             const ArrayType** result);
-  absl::Status MakeArrayType(const Type* element_type,
-                             const Type** result);
+  absl::Status MakeArrayType(const Type* element_type, const Type** result);
  // Make a struct type.
  // The field names must be valid.
@@ -268,6 +268,12 @@ class TypeFactory {
  absl::Status MakeRangeType(const google::protobuf::FieldDescriptor* field,
                             const Type** result);
+  // Make a map type.
+  // <key_type> must support grouping for the type to be supported.
+  // <value_type> can be any type.
+  absl::StatusOr<const Type*> MakeMapType(const Type* key_type,
+                                          const Type* value_type);
+
  // Stores the unique copy of an ExtendedType in the TypeFactory. If such
  // extended type already exists in the cache, frees `extended_type` and
  // returns a pointer to existing type. Otherwise, returns a pointer to added
@@ -329,13 +335,6 @@ class TypeFactory {
  absl::Status GetProtoFieldType(
      bool ignore_annotations, const google::protobuf::FieldDescriptor* field_descr,
      absl::Span<const std::string> catalog_name_path, const Type** type);
-  ABSL_DEPRECATED("Inline me!")
-  absl::Status GetProtoFieldType(bool ignore_annotations,
-                                 const google::protobuf::FieldDescriptor* field_descr,
-                                 const Type** type) {
-    return GetProtoFieldType(ignore_annotations, field_descr,
-                             /*catalog_name_path=*/{}, type);
-  }
  // Get the Type for a proto field.
  // This is the same as the above signature with ignore_annotations = false.
@@ -349,16 +348,6 @@
                             catalog_name_path, type);
  }
-  // Get the Type for a proto field.
-  // This is the same as the above signature with <ignore_annotations> = false
-  // and an empty <catalog_name_path>.
-  ABSL_DEPRECATED("Inline me!")
-  absl::Status GetProtoFieldType(const google::protobuf::FieldDescriptor* field_descr,
-                                 const Type** type) {
-    return GetProtoFieldType(/*ignore_annotations=*/false, field_descr,
-                             /*catalog_name_path=*/{}, type);
-  }
-
  // DEPRECATED: Callers should remove their dependencies on obsolete types and
  // move to the method above.
  ABSL_DEPRECATED("Obsolete timestamp types are deprecated")
@@ -547,6 +536,9 @@ class TypeFactory {
  absl::flat_hash_map<const Type*, const RangeType*> cached_range_types_
      ABSL_GUARDED_BY(store_->mutex_);
+  absl::flat_hash_map<std::pair<const Type*, const Type*>, const Type*>
+      cached_map_types_ ABSL_GUARDED_BY(store_->mutex_);
+
  // The key is a descriptor and a catalog name path.
  absl::flat_hash_map<
      std::pair,
@@ -596,6 +588,7 @@
 const Type* GeographyType();
 const Type* NumericType();
 const Type* BigNumericType();
 const Type* JsonType();
+const Type* TokenListType();
 const StructType* EmptyStructType();
 // ArrayTypes
@@ -617,6 +610,7 @@
 const ArrayType* GeographyArrayType();
 const ArrayType* NumericArrayType();
 const ArrayType* BigNumericArrayType();
 const ArrayType* JsonArrayType();
+const ArrayType* TokenListArrayType();
 // RangeTypes
 const RangeType* DateRangeType();
diff --git a/zetasql/public/types/value_representations.h b/zetasql/public/types/value_representations.h
index 95fec67c1..581b70b7a 100644
--- a/zetasql/public/types/value_representations.h
+++ b/zetasql/public/types/value_representations.h
@@ -23,11 +23,13 @@
 #include
 #include
 #include
+#include
 #include "zetasql/base/logging.h"
 #include "zetasql/public/interval_value.h"
 #include "zetasql/public/json_value.h"
 #include "zetasql/public/numeric_value.h"
+#include "zetasql/public/token_list.h"
 #include "zetasql/public/value_content.h"
 #include "absl/strings/cord.h"
 #include "absl/types/optional.h"
@@ -88,6 +90,24 @@ class ValueContentContainer {
   }
 };
+// Interface for the Map type to access map value content.
+class ValueContentMap {
+ public:
+  virtual ~ValueContentMap() = default;
+  virtual int64_t num_elements() const = 0;
+  virtual uint64_t physical_byte_size() const = 0;
+  virtual std::vector<std::pair<ValueContent, ValueContent>>
+  value_content_entries() const = 0;
+
+  // Returns this container as const SubType*. Must only be used when it
+  // is known that the object *is* this subclass.
+  template <typename SubType>
+  const SubType* GetAs() const {
+    return static_cast<const SubType*>(this);
+  }
+};
+
 // -------------------------------------------------------
 // ValueContentContainerRef is a ref count wrapper around a pointer to
 // ValueContentContainer.
@@ -288,6 +308,50 @@ class JSONRef final
   std::variant value_;
 };
+// -------------------------------------------------------------
+// TokenListRef is a ref count wrapper around tokens::TokenList.
+// -------------------------------------------------------------
+class TokenListRef final
+    : public zetasql_base::refcount::CompactReferenceCounted<TokenListRef> {
+ public:
+  TokenListRef() {}
+  explicit TokenListRef(tokens::TokenList value) : value_(std::move(value)) {}
+
+  TokenListRef(const TokenListRef&) = delete;
+  TokenListRef& operator=(const TokenListRef&) = delete;
+
+  const tokens::TokenList& value() { return value_; }
+  uint64_t physical_byte_size() const {
+    return sizeof(TokenListRef) + value_.SpaceUsed() - sizeof(value_);
+  }
+
+ private:
+  tokens::TokenList value_;
+};
+
+// -------------------------------------------------------
+// ValueContentMapRef is a ref count wrapper around a pointer to
+// ValueContentMap.
+// -------------------------------------------------------
+class ValueContentMapRef final
+    : public zetasql_base::refcount::CompactReferenceCounted<ValueContentMapRef> {
+ public:
+  explicit ValueContentMapRef(std::unique_ptr<ValueContentMap> map)
+      : map_(std::move(map)) {}
+
+  ValueContentMapRef(const ValueContentMapRef&) = delete;
+  ValueContentMapRef& operator=(const ValueContentMapRef&) = delete;
+
+  ValueContentMap* value() const { return map_.get(); }
+
+  uint64_t physical_byte_size() const {
+    return sizeof(ValueContentMapRef) + map_->physical_byte_size();
+  }
+
+ private:
+  const std::unique_ptr<ValueContentMap> map_;
+};
+
 }  // namespace internal
 }  // namespace zetasql
diff --git a/zetasql/public/value.cc b/zetasql/public/value.cc
index a5b1f7d53..509385f9a 100644
--- a/zetasql/public/value.cc
+++ b/zetasql/public/value.cc
@@ -35,6 +35,7 @@
 #include "google/protobuf/descriptor.h"
 #include "google/protobuf/dynamic_message.h"
 #include "google/protobuf/message.h"
+#include "zetasql/common/errors.h"
 #include "zetasql/common/thread_stack.h"
 #include "zetasql/public/functions/comparison.h"
 #include "zetasql/public/functions/convert_string.h"
@@ -42,12 +43,17 @@
 #include "zetasql/public/options.pb.h"
 #include "zetasql/public/type.h"
 #include "zetasql/public/type.pb.h"
+#include "zetasql/public/types/map_type.h"
 #include "zetasql/public/types/type_factory.h"
 #include "zetasql/public/types/value_equality_check_options.h"
+#include "zetasql/public/types/value_representations.h"
 #include "zetasql/public/value_content.h"
 #include "absl/base/optimization.h"
+#include "absl/container/flat_hash_set.h"
 #include "absl/flags/flag.h"
 #include "absl/hash/hash.h"
+#include "zetasql/base/check.h"
+#include "absl/log/log.h"
 #include "absl/status/status.h"
 #include "absl/status/statusor.h"
 #include "absl/strings/cord.h"
@@ -59,7 +65,9 @@
 #include "absl/strings/str_split.h"
 #include "absl/strings/string_view.h"
 #include "absl/strings/substitute.h"
+#include "absl/time/civil_time.h"
 #include "absl/time/time.h"
+#include
"zetasql/base/map_view.h"
 #include "zetasql/base/ret_check.h"
 #include "zetasql/base/status_macros.h"
@@ -104,6 +112,11 @@ void Value::SetMetadataForNonSimpleType(const Type* type, bool is_null,
 Value::Value(const Type* type, bool is_null, OrderPreservationKind order_kind) {
   ABSL_CHECK(type != nullptr);
+  if (type->IsMap()) {
+    ABSL_DCHECK(order_kind == kIgnoresOrder);
+    order_kind = kIgnoresOrder;
+  }
+
   if (type->IsSimpleType()) {
     metadata_ = Metadata(type->kind(), is_null, order_kind,
                          /*value_extended_content=*/0);
@@ -202,7 +215,7 @@ Value::Value(const EnumType* enum_type, int64_t value,
 Value::Value(const EnumType* enum_type, absl::string_view name) {
   int32_t number;
-  if (enum_type->FindNumber(std::string(name), &number)) {
+  if (enum_type->FindNumber(name, &number)) {
     SetMetadataForNonSimpleType(enum_type);
     enum_value_ = static_cast(number);
   } else {
@@ -297,6 +310,49 @@ absl::StatusOr<Value> Value::MakeRangeInternal(bool is_validated,
   return result;
 }
+absl::StatusOr<Value> Value::MakeMapInternal(
    const Type* type, std::vector<std::pair<Value, Value>> map_entries) {
+  const Type* key_type = GetMapKeyType(type);
+  const Type* value_type = GetMapValueType(type);
+  if (kDebugMode) {
+    for (const auto& [key, value] : map_entries) {
+      ZETASQL_RET_CHECK(key.is_valid() && key.type()->Equals(key_type) &&
+                value.is_valid() && value.type()->Equals(value_type))
+          << absl::StrCat(
+                 "Values with invalid types provided to map. Expected ",
+                 type->DebugString(), " but got an entry with key of ",
+                 key.type()->DebugString(), " and value of ",
+                 value.type()->DebugString(), ".");
+    }
+  }
+
+  Value result(type, /*is_null=*/false, kIgnoresOrder);
+  result.map_ptr_ =
+      new internal::ValueContentMapRef(std::make_unique<TypedMap>(map_entries));
+
+  // If map_entries size is different from map size, we can infer that a
+  // nonzero amount of keys are duplicates.
+  if (map_entries.size() != result.map_ptr_->value()->num_elements()) {
+    absl::flat_hash_set<Value> map_entries_set;
+    for (const auto& [key, value] : map_entries) {
+      const auto& [unused_iter, inserted] = map_entries_set.insert(key);
+      if (!inserted) {
+        return MakeEvalError()
+               << "Duplicate map keys are not allowed, but got multiple "
+                  "instances of key: "
+               << key.Format(/*print_top_level_type=*/false);
+      }
+    }
+    // If execution gets here without finding the key, something is wrong with
+    // the logic.
+    return MakeEvalError()
+           << "Duplicate map keys are not allowed, but got multiple "
+              "instances of a key";
+  }
+
+  return result;
+}
+
 const Type* Value::type() const {
   ABSL_CHECK(is_valid()) << DebugString();
   return metadata_.type();
@@ -322,6 +378,18 @@ const std::vector<Value>& Value::elements() const {
   return list_ptr->values();
 }
+zetasql_base::MapView<Value, Value> Value::map_entries() const {
+  if (metadata_.type_kind() != TYPE_MAP || is_null()) {
+    ABSL_DCHECK_EQ(TYPE_MAP, metadata_.type_kind());
+    ABSL_DCHECK(!is_null()) << "Null value";
+
+    static const absl::flat_hash_map<Value, Value>* empty_map =
+        new absl::flat_hash_map<Value, Value>();
+    return *empty_map;
+  }
+  return static_cast<const TypedMap*>(map_ptr_->value())->entries();
+}
+
 Value Value::TimestampFromUnixMicros(int64_t v) {
   ABSL_CHECK(functions::IsValidTimestamp(v, functions::kMicroseconds)) << v;
   return Value(absl::FromUnixMicros(v));
@@ -370,11 +438,16 @@ std::string Value::EnumDisplayName() const {
 int64_t Value::ToInt64() const {
   ABSL_CHECK(!is_null()) << "Null value";
   switch (metadata_.type_kind()) {
-    case TYPE_INT64: return int64_value_;
-    case TYPE_INT32: return int32_value_;
-    case TYPE_UINT32: return uint32_value_;
-    case TYPE_BOOL: return bool_value_;
-    case TYPE_DATE: return int32_value_;
+    case TYPE_INT64:
+      return int64_value_;
+    case TYPE_INT32:
+      return int32_value_;
+    case TYPE_UINT32:
+      return uint32_value_;
+    case TYPE_BOOL:
+      return bool_value_;
+    case TYPE_DATE:
+      return int32_value_;
     case TYPE_TIMESTAMP:
       return ToUnixMicros();
    case
TYPE_ENUM: @@ -395,9 +468,12 @@ int64_t Value::ToInt64() const { uint64_t Value::ToUint64() const { ABSL_CHECK(!is_null()) << "Null value"; switch (metadata_.type_kind()) { - case TYPE_UINT64: return uint64_value_; - case TYPE_UINT32: return uint32_value_; - case TYPE_BOOL: return bool_value_; + case TYPE_UINT64: + return uint64_value_; + case TYPE_UINT32: + return uint32_value_; + case TYPE_BOOL: + return bool_value_; default: ABSL_LOG(FATAL) << "Cannot coerce " << TypeKind_Name(type_kind()) << " to uint64"; @@ -408,13 +484,20 @@ uint64_t Value::ToUint64() const { double Value::ToDouble() const { ABSL_CHECK(!is_null()) << "Null value"; switch (metadata_.type_kind()) { - case TYPE_BOOL: return bool_value_; - case TYPE_DATE: return int32_value_; - case TYPE_DOUBLE: return double_value_; - case TYPE_FLOAT: return float_value_; - case TYPE_INT32: return int32_value_; - case TYPE_UINT32: return uint32_value_; - case TYPE_UINT64: return uint64_value_; + case TYPE_BOOL: + return bool_value_; + case TYPE_DATE: + return int32_value_; + case TYPE_DOUBLE: + return double_value_; + case TYPE_FLOAT: + return float_value_; + case TYPE_INT32: + return int32_value_; + case TYPE_UINT32: + return uint32_value_; + case TYPE_UINT64: + return uint64_value_; case TYPE_INT64: return int64_value_; case TYPE_NUMERIC: @@ -445,6 +528,8 @@ uint64_t Value::physical_byte_size() const { if (DoesTypeUseValueList()) { physical_size += container_ptr_->physical_byte_size(); + } else if (DoesTypeUseValueMap()) { + physical_size += map_ptr_->physical_byte_size(); } else { physical_size += type()->GetValueContentExternallyAllocatedByteSize(GetContent()); @@ -468,6 +553,27 @@ absl::Cord Value::ToCord() const { } } +std::string Value::ToString() const { + ABSL_CHECK(!is_null()) << "Null value"; + switch (metadata_.type_kind()) { + case TYPE_STRING: + case TYPE_BYTES: + return string_ptr_->value(); + case TYPE_PROTO: + return std::string(proto_ptr_->value()); + default: + ABSL_LOG(FATAL) << "Cannot 
coerce " << TypeKind_Name(type_kind()) + << " to std::string"; + return std::string(); + } +} + +absl::CivilDay Value::ToCivilDay() const { + ABSL_CHECK(!is_null()) << "Null value"; + ABSL_CHECK_EQ(TYPE_DATE, metadata_.type_kind()) << "Not a date value"; + return absl::CivilDay() + date_value(); +} + absl::Time Value::ToTime() const { ABSL_CHECK(!is_null()) << "Null value"; ABSL_CHECK_EQ(TYPE_TIMESTAMP, metadata_.type_kind()) << "Not a timestamp value"; @@ -580,8 +686,12 @@ bool Value::EqualsInternal(const Value& x, const Value& y, const bool allow_bags, const ValueEqualityCheckOptions& options) { std::string* reason = options.reason; - if (!x.is_valid()) { return !y.is_valid(); } - if (!y.is_valid()) { return false; } + if (!x.is_valid()) { + return !y.is_valid(); + } + if (!y.is_valid()) { + return false; + } if (!x.type()->Equivalent(y.type())) return TypesDiffer(x, y, reason); @@ -683,7 +793,6 @@ Value Value::SqlEquals(const Value& that) const { case TYPE_KIND_PAIR(TYPE_NUMERIC, TYPE_NUMERIC): case TYPE_KIND_PAIR(TYPE_BIGNUMERIC, TYPE_BIGNUMERIC): return Value::Bool(Equals(that)); - case TYPE_KIND_PAIR(TYPE_STRUCT, TYPE_STRUCT): { if (num_fields() != that.num_fields()) { return values::False(); @@ -909,7 +1018,7 @@ std::string Value::DebugString(bool verbose) const { if (is_null()) { result = "NULL"; } else { - if (DoesTypeUseValueList()) { + if (DoesTypeUseValueList() || DoesTypeUseValueMap()) { add_type_prefix = false; } Type::FormatValueContentOptions options; @@ -935,28 +1044,32 @@ std::string Value::Format(bool print_top_level_type) const { // // This is also basically the same as GetSQLLiteral below, except this adds // CASTs and explicit type names so the exact value comes back out. 
-std::string Value::GetSQL(ProductMode mode) const {
-  return GetSQLInternal<false, true>(mode);
+std::string Value::GetSQL(ProductMode mode, bool use_external_float32) const {
+  return GetSQLInternal<false, true>(mode, use_external_float32);
 }
 // This is basically the same as GetSQL() above, except this doesn't add CASTs
 // or explicit type names if the literal would be valid without them.
-std::string Value::GetSQLLiteral(ProductMode mode) const {
-  return GetSQLInternal<true, true>(mode);
+std::string Value::GetSQLLiteral(ProductMode mode,
+                                 bool use_external_float32) const {
+  return GetSQLInternal<true, true>(mode, use_external_float32);
 }
 template <bool as_literal, bool maybe_add_simple_type_prefix>
-std::string Value::GetSQLInternal(ProductMode mode) const {
+std::string Value::GetSQLInternal(ProductMode mode,
+                                  bool use_external_float32) const {
   const Type* type = this->type();
   if (is_null()) {
     return as_literal
                ? "NULL"
-               : absl::StrCat("CAST(NULL AS ", type->TypeName(mode), ")");
+               : absl::StrCat("CAST(NULL AS ",
+                              type->TypeName(mode, use_external_float32), ")");
   }
   Type::FormatValueContentOptions options;
   options.product_mode = mode;
+  options.use_external_float32 = use_external_float32;
   if (as_literal) {
     options.mode = maybe_add_simple_type_prefix
                        ? Type::FormatValueContentOptions::Mode::kSQLLiteral
@@ -1035,8 +1148,8 @@ static int FindSubstitutionMarker(absl::string_view block_template) {
     // Break upon finding "$0"
     if (block_template[marker_index + 1] == '0') {
       return marker_index;
-    // Skip an extra character upon seeing "$$", since this is the escape
-    // sequence for a single '$'.
+      // Skip an extra character upon seeing "$$", since this is the escape
+      // sequence for a single '$'.
    } else if (block_template[marker_index + 1] == '$') {
      ++marker_index;
    }
@@ -1125,7 +1238,11 @@ std::string FormatBlock(absl::string_view block_template,
       absl::StrCat(pre, absl::StrJoin(parts, sep), post));
 }
-enum class ArrayElemFormat { ALL, NONE, FIRST_LEVEL_ONLY, };
+enum class ArrayElemFormat {
+  ALL,
+  NONE,
+  FIRST_LEVEL_ONLY,
+};
 const int kArrayIndent = 6;   // Length of "ARRAY<"
 const int kStructIndent = 7;  // Length of "STRUCT<"
@@ -1285,6 +1402,19 @@ std::string Value::FormatInternal(
     return FormatBlock(templ, boundaries_strings, ",", options.indent,
                        WrapStyle::AUTO,
                        Type::FormatValueContentOptions::kIndentStep);
+  } else if (type()->IsTokenList() && !is_null()) {
+    // Empty tokenlists are rendered as "". Following the ARRAY and
+    // STRUCT pattern, we'll print the type in this case regardless of whether
+    // force_type_at_top_level is true.
+    std::vector<std::string> lines =
+        SimpleType::FormatTokenLines(GetContent(), options);
+    std::string tmpl =
+        (options.force_type_at_top_level || lines.empty())
+            ?
type()->AddCapitalizedTypePrefix("{$0}", /*is_null=*/false) + : "{$0}"; + if (lines.empty()) lines = {""}; + return FormatBlock(tmpl, lines, ",", options.indent, WrapStyle::AUTO, + Type::FormatValueContentOptions::kIndentStep); } else { return DebugString(options.force_type_at_top_level); } @@ -1624,4 +1754,7 @@ Value::Metadata::Metadata(const Type* type, bool is_null, Value::TypedList::~TypedList() { } +Value::TypedMap::~TypedMap() { +} + } // namespace zetasql diff --git a/zetasql/public/value.h b/zetasql/public/value.h index aa10c5c4e..a208bb16d 100644 --- a/zetasql/public/value.h +++ b/zetasql/public/value.h @@ -17,12 +17,11 @@ #ifndef ZETASQL_PUBLIC_VALUE_H_ #define ZETASQL_PUBLIC_VALUE_H_ -#include - +#include #include +#include #include #include -#include #include #include #include @@ -35,23 +34,28 @@ #include "zetasql/public/civil_time.h" #include "zetasql/public/interval_value.h" #include "zetasql/public/json_value.h" +#include "zetasql/public/language_options.h" #include "zetasql/public/numeric_value.h" #include "zetasql/public/options.pb.h" +#include "zetasql/public/token_list.h" #include "zetasql/public/type.h" #include "zetasql/public/type.pb.h" #include "zetasql/public/types/extended_type.h" +#include "zetasql/public/types/map_type.h" #include "zetasql/public/types/value_equality_check_options.h" #include "zetasql/public/types/value_representations.h" #include "zetasql/public/value.pb.h" #include "zetasql/public/value_content.h" #include "absl/base/attributes.h" +#include "absl/container/flat_hash_map.h" #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/cord.h" #include "absl/strings/string_view.h" +#include "absl/time/civil_time.h" #include "absl/time/time.h" #include "absl/types/span.h" -#include "zetasql/base/status.h" +#include "zetasql/base/map_view.h" namespace zetasql { @@ -157,6 +161,9 @@ class Value { std::string EnumDisplayName() const; // REQUIRES: enum type const absl::Cord& proto_value() const; // 
REQUIRES: proto type + // Returns date value as an absl::CivilDay. + absl::CivilDay ToCivilDay() const; // REQUIRES: date type + // Returns timestamp value as absl::Time at nanoseconds precision. absl::Time ToTime() const; // REQUIRES: timestamp type @@ -200,6 +207,8 @@ class Value { // Returns the string representing stored JSON value. std::string json_string() const; + const tokens::TokenList& tokenlist_value() const; // REQUIRES: tokenlist type + // Returns the value content of extended type. // REQUIRES: type_kind() == TYPE_EXTENDED ValueContent extended_value() const; @@ -219,7 +228,8 @@ class Value { // Use of this method for timestamp_ values is DEPRECATED. double ToDouble() const; // For bool, int_, date, timestamp_, enum, Numeric, // BigNumeric types. - absl::Cord ToCord() const; // For string, bytes, and protos + absl::Cord ToCord() const; // For string, bytes, and protos + std::string ToString() const; // For string, bytes, and protos // Convert this value to a dynamically allocated proto Message. // @@ -253,12 +263,20 @@ class Value { // Does not find anonymous fields (those with empty names). const Value& FindFieldByName(absl::string_view name) const; - // Array-specific methods. REQUIRES: !is_null(). + // Array and Map-specific methods. + // REQUIRES: !is_null() + // REQUIRES: (type_kind() == TYPE_ARRAY or type_kind() == TYPE_MAP). bool empty() const; int num_elements() const; + + // Array-specific methods. REQUIRES: !is_null(), type_kind() == TYPE_ARRAY. const Value& element(int i) const; const std::vector<Value>& elements() const; + // Map-specific methods. REQUIRES: !is_null(), type_kind() == TYPE_MAP. + // Returns the entries of the map. Note that a stable order is not guaranteed. + zetasql_base::MapView map_entries() const; + // Range-specific methods. 
REQUIRES: !is_null(), type_kind() == TYPE_RANGE const Value& start() const; const Value& end() const; @@ -370,14 +388,36 @@ class Value { // GetSQLLiteral() is used in ZetaSQL's FORMAT() function implementation // (Format() in zetasql/public_functions/format.cc) so we cannot change // the output without breaking existing ZetaSQL function semantics. - std::string GetSQL(ProductMode mode = PRODUCT_EXTERNAL) const; + // Setting `use_external_float32` to true will return + // FLOAT32 as the type name for TYPE_FLOAT, for PRODUCT_EXTERNAL mode. + // TODO: Remove `use_external_float32` once all engines are + // updated. + std::string GetSQL(ProductMode mode, bool use_external_float32) const; + std::string GetSQL(ProductMode mode) const { + return GetSQL(mode, /*use_external_float32=*/false); + } + ABSL_DEPRECATED("Use signature taking ProductMode.") + std::string GetSQL() const { + return GetSQL(PRODUCT_EXTERNAL, /*use_external_float32=*/false); + } // Returns a SQL expression that is compatible as a literal for this value. // This won't include CASTs except for non-finite floating point values, and // won't necessarily produce the exact same type when parsed on its own, but // it should be the closest SQL literal form for this value. Returned type // names are sensitive to the SQL ProductMode (INTERNAL or EXTERNAL). - std::string GetSQLLiteral(ProductMode mode = PRODUCT_EXTERNAL) const; + // Setting `use_external_float32` to true will return + // FLOAT32 as the type name for TYPE_FLOAT, for PRODUCT_EXTERNAL mode. + // TODO: Remove `use_external_float32` once all engines are + // updated. 
+ std::string GetSQLLiteral(ProductMode mode, bool use_external_float32) const; + std::string GetSQLLiteral(ProductMode mode) const { + return GetSQLLiteral(mode, /*use_external_float32=*/false); + } + ABSL_DEPRECATED("Use signature taking ProductMode.") + std::string GetSQLLiteral() const { + return GetSQLLiteral(PRODUCT_EXTERNAL, /*use_external_float32=*/false); + } // We do not define < operator to prevent accidental use of values of mixed // types in STL set and map. @@ -436,6 +476,10 @@ class Value { // optimized for member access operations. static Value Json(JSONValue value); + // TODO: as of 2021Q4, the encoding is a work-in-progress + // and subject to change. Avoid storing on disk. + static Value TokenList(tokens::TokenList value); + // Creates a value of extended type with the given content. static Value Extended(const ExtendedType* type, const ValueContent& value); @@ -534,6 +578,7 @@ class Value { static Value NullNumeric(); static Value NullBigNumeric(); static Value NullJson(); + static Value NullTokenList(); // Returns an empty but non-null Geography value. static Value EmptyGeography(); @@ -678,6 +723,27 @@ class Value { static absl::StatusOr MakeRangeFromValidatedInputs( const RangeType* range_type, const Value& start, const Value& end); +#ifndef SWIG // TODO: Investigate SWIG compatibility for MAP. + // Creates a map of the given 'map_type' initialized with 'map_entries' as the + // key/value pairs. The type of each key and value must match the key and + // value types in map_type, otherwise returns an error (in debug mode). + // 'map_type' must outlive the returned object. A map_entries containing + // multiple equivalent keys will result in an OutOfRange error. + // + // REQUIRES: map_type.type_kind() == MAP_TYPE + // REQUIRES: map_entries key and value types match map_type + // REQUIRES: map_entries does not contain any entries with equivalent keys. 
+ static absl::StatusOr MakeMap( + const Type* map_type, + absl::Span> map_entries); + + static absl::StatusOr MakeMap( + const Type* map_type, std::vector>&& map_entries); + static absl::StatusOr MakeMap( + const Type* map_type, + std::initializer_list> map_entries); +#endif + // Creates a null of the given 'type'. static Value Null(const Type* type); // Creates an invalid value. @@ -709,9 +775,13 @@ class Value { FRIEND_TEST(TypeTest, FormatValueContentStructSQLExpressionMode); FRIEND_TEST(TypeTest, FormatValueContentStructDebugMode); FRIEND_TEST(TypeTest, FormatValueContentStructWithAnonymousFieldsDebugMode); + FRIEND_TEST(MapTest, FormatValueContentSQLLiteralMode); + FRIEND_TEST(MapTest, FormatValueContentSQLExpressionMode); + FRIEND_TEST(MapTestFormatValueContentDebugMode, FormatValueContentDebugMode); + FRIEND_TEST(MapTest, FormatValueContentDebugModeEmptyMap); template - std::string GetSQLInternal(ProductMode mode) const; + std::string GetSQLInternal(ProductMode mode, bool use_external_float32) const; template H HashValueInternal(H h) const; @@ -720,6 +790,7 @@ class Value { friend struct InternalComparer; // Defined in value.cc. friend struct InternalHasher; // Defined in value.cc class TypedList; // Defined in value_inl.h + class TypedMap; // Defined in value_inl.h // Specifies whether an array value preserves or ignores order (public array // values always preserve order). The enum values are designed to be used with @@ -733,19 +804,22 @@ class Value { __TypeKind__switch_must_have_a_default__; // Constructs an empty (where content contains zeros) or NULL value of the - // given 'type'. Argument order_kind is currently used only for arrays and - // should always be set to kPreservesOrder for all other types. + // given 'type'. Argument order_kind is currently used only for arrays. + // kPreservesOrder should be set for all other types, except for Map which + // always sets kIgnoresOrder regardless of the supplied order_kind. 
Value(const Type* type, bool is_null, OrderPreservationKind order_kind); // Constructs a typed NULL of the given 'type'. explicit Value(const Type* type) - : Value(type, /*is_null=*/true, kPreservesOrder) {} + : Value(type, /*is_null=*/true, + type->kind() == TYPE_MAP ? kIgnoresOrder : kPreservesOrder) {} #ifndef SWIG // SWIG has trouble with constexpr. constexpr #endif explicit Value(TypeKind kind) - : metadata_(kind, /*is_null=*/true, kPreservesOrder, + : metadata_(kind, /*is_null=*/true, + kind == TYPE_MAP ? kIgnoresOrder : kPreservesOrder, /*value_extended_content=*/0) { } @@ -781,6 +855,9 @@ class Value { // Takes ownership of 'json_ptr' without increasing its ref count. explicit Value(internal::JSONRef* json_ptr); + // Constructs a TOKENLIST value. + explicit Value(tokens::TokenList tokenlist); + // Constructs an enum. Value(const EnumType* enum_type, int64_t value, bool allow_unknown_enum_values); @@ -832,6 +909,18 @@ class Value { bool is_validated, const Value& start, const Value& end, const RangeType* range_type = nullptr); +#ifndef SWIG // TODO: Investigate SWIG compatibility for MAP. + + // Creates a map of the given 'map_type' initialized with the entries in + // 'map_entries'. + // A map_entries containing duplicate keys will result in out of range error. + // A map_entries containing values with types that do not match the provided + // type will result in system error, but only in debug mode. + static absl::StatusOr MakeMapInternal( + const Type* type, std::vector> map_entries); + +#endif + // Returns a pretty-printed (e.g. wrapped) string for the value // indented a number of spaces according to the 'indent' parameter. // 'force_type' causes the top-level value to print its type. By @@ -850,6 +939,11 @@ class Value { metadata_.type_kind() == TYPE_RANGE; } + // Type cannot create a list of Values because it cannot depend on + // "value" package. Thus for Map type that needs a list of values, we + // will create them from Value directly. 
+ bool DoesTypeUseValueMap() const { return metadata_.type_kind() == TYPE_MAP; } + // Gets Value's content. Requires: has_content() == true. ValueContent GetContent() const; @@ -995,6 +1089,10 @@ class Value { internal::JSONRef* json_ptr_; // Owned. Used for values of TYPE_JSON. internal::IntervalRef* interval_ptr_; // Owned. Used for values of TYPE_INTERVAL. + internal::TokenListRef* + tokenlist_ptr_; // Owned. Used for values of TYPE_TOKENLIST. + internal::ValueContentMapRef* + map_ptr_; // Owned. Used for values of TYPE_MAP. }; // Intentionally copyable. }; diff --git a/zetasql/public/value.proto b/zetasql/public/value.proto index 502da1633..b4f86c497 100644 --- a/zetasql/public/value.proto +++ b/zetasql/public/value.proto @@ -109,6 +109,8 @@ message ValueProto { // Encoded interval value. For the encoding format see documentation for // IntervalValue::SerializeAsBytes(). bytes interval_value = 24; + // Encoded tokenlist value. + bytes tokenlist_value = 25; // Encoded range value. See (broken link). 
Range range_value = 26; // User code that switches on this oneoff enum must have a default case so diff --git a/zetasql/public/value_inl.h b/zetasql/public/value_inl.h index 087ab82cd..56828220b 100644 --- a/zetasql/public/value_inl.h +++ b/zetasql/public/value_inl.h @@ -26,6 +26,7 @@ #include #include +#include #include #include #include @@ -40,12 +41,17 @@ #include "zetasql/public/civil_time.h" #include "zetasql/public/json_value.h" #include "zetasql/public/numeric_value.h" +#include "zetasql/public/token_list.h" #include "zetasql/public/type.h" #include "zetasql/public/type.pb.h" #include "zetasql/public/types/value_representations.h" #include "zetasql/public/value.h" +#include "zetasql/public/value_content.h" +#include "absl/container/btree_map.h" +#include "absl/container/flat_hash_map.h" #include "absl/hash/hash.h" #include "absl/status/status.h" +#include "absl/status/statusor.h" #include "absl/strings/cord.h" #include "absl/strings/string_view.h" #include "absl/time/time.h" @@ -85,6 +91,57 @@ class Value::TypedList : public internal::ValueContentContainer { std::vector values_; }; +struct ValueComparator { + bool operator()(const Value& a, const Value& b) const { + return a.LessThan(b); + } +}; + +using ValueMap = absl::btree_map; + +class Value::TypedMap : public internal::ValueContentMap { + public: + explicit TypedMap(std::vector>& values) + : map_(values.begin(), values.end()) {} + TypedMap(const TypedMap&) = delete; + TypedMap& operator=(const TypedMap&) = delete; + ~TypedMap() override; + + const ValueMap& entries() const { return map_; } + + uint64_t physical_byte_size() const override { + uint64_t size = sizeof(TypedMap); + for (const auto& entry : map_) { + size += (entry.first.physical_byte_size() + + entry.second.physical_byte_size()); + } + return size; + } + + int64_t num_elements() const override { return map_.size(); } + std::vector> + value_content_entries() const override { + std::vector> + elements; + elements.reserve(map_.size()); + for 
(const auto& [key, value] : map_) { + elements.push_back(std::make_pair( + key.is_null() + ? internal::ValueContentContainerElement() + : internal::ValueContentContainerElement(key.GetContent()), + value.is_null() + ? internal::ValueContentContainerElement() + : internal::ValueContentContainerElement(value.GetContent()))); + } + return elements; + } + + private: + ValueMap map_; +}; + // ------------------------------------------------------- // Value // ------------------------------------------------------- @@ -205,6 +262,10 @@ inline Value::Value(const IntervalValue& interval) : metadata_(TypeKind::TYPE_INTERVAL), interval_ptr_(new internal::IntervalRef(interval)) {} +inline Value::Value(tokens::TokenList tokenlist) + : metadata_(TypeKind::TYPE_TOKENLIST), + tokenlist_ptr_(new internal::TokenListRef(std::move(tokenlist))) {} + inline absl::StatusOr Value::MakeStruct(const StructType* type, std::vector&& values) { return MakeStructInternal(/*already_validated=*/false, type, @@ -268,6 +329,23 @@ inline Value Value::EmptyArray(const ArrayType* array_type) { return *MakeArrayFromValidatedInputs(array_type, std::vector{}); } +inline absl::StatusOr Value::MakeMap( + const Type* map_type, + absl::Span> map_entries) { + return MakeMapInternal(map_type, std::vector>( + map_entries.begin(), map_entries.end())); +} +inline absl::StatusOr Value::MakeMap( + const Type* map_type, std::vector>&& map_entries) { + return MakeMapInternal(map_type, map_entries); +} +inline absl::StatusOr Value::MakeMap( + const Type* map_type, + std::initializer_list> map_entries) { + return MakeMapInternal(map_type, std::vector>( + map_entries.begin(), map_entries.end())); +} + inline Value Value::Int32(int32_t v) { return Value(v); } inline Value Value::Int64(int64_t v) { return Value(v); } inline Value Value::Uint32(uint32_t v) { return Value(v); } @@ -328,6 +406,10 @@ inline Value Value::UnvalidatedJsonString(std::string v) { inline Value Value::Json(JSONValue v) { return Value(new 
internal::JSONRef(std::move(v))); } +inline Value Value::TokenList(tokens::TokenList value) { + return Value(std::move(value)); +} + inline Value Value::Enum(const EnumType* type, int64_t value, bool allow_unknown_enum_values) { return Value(type, value, allow_unknown_enum_values); @@ -363,6 +445,7 @@ inline Value Value::NullBigNumeric() { return Value(TypeKind::TYPE_BIGNUMERIC); } inline Value Value::NullJson() { return Value(TypeKind::TYPE_JSON); } +inline Value Value::NullTokenList() { return Value(types::TokenListType()); } inline Value Value::EmptyGeography() { ABSL_CHECK(false); return NullGeography(); @@ -548,12 +631,21 @@ inline std::string Value::json_string() const { return *json_ptr_->unparsed_string(); } +inline const tokens::TokenList& Value::tokenlist_value() const { + ABSL_CHECK_EQ(TYPE_TOKENLIST, metadata_.type_kind()) << "Not a tokenlist type"; + ABSL_CHECK(!metadata_.is_null()) << "Null value"; + return tokenlist_ptr_->value(); +} + inline bool Value::empty() const { return elements().empty(); } inline int Value::num_elements() const { - return elements().size(); + if (type()->IsMap()) { + return static_cast(map_entries().size()); + } + return static_cast(elements().size()); } inline int Value::num_fields() const { @@ -915,6 +1007,7 @@ inline Value NullGeography() { return Value::NullGeography(); } inline Value NullNumeric() { return Value::NullNumeric(); } inline Value NullBigNumeric() { return Value::NullBigNumeric(); } inline Value NullJson() { return Value::NullJson(); } +inline Value NullTokenList() { return Value::NullTokenList(); } inline Value Null(const Type* type) { return Value::Null(type); } inline Value Invalid() { return Value::Invalid(); } diff --git a/zetasql/public/value_test.cc b/zetasql/public/value_test.cc index 3053b2807..58167a125 100644 --- a/zetasql/public/value_test.cc +++ b/zetasql/public/value_test.cc @@ -16,11 +16,10 @@ #include "zetasql/public/value.h" -#include -#include #include #include +#include #include #include 
#include @@ -38,6 +37,8 @@ #include "google/protobuf/text_format.h" #include "zetasql/public/civil_time.h" #include "zetasql/testdata/test_proto3.pb.h" +#include "absl/container/flat_hash_map.h" +#include "absl/status/status.h" #include "absl/strings/str_format.h" #include "google/protobuf/io/coded_stream.h" #include "google/protobuf/io/zero_copy_stream_impl.h" @@ -54,6 +55,7 @@ #include "zetasql/public/numeric_value.h" #include "zetasql/public/options.pb.h" #include "zetasql/public/simple_catalog.h" +#include "zetasql/public/token_list_util.h" #include "zetasql/public/type.h" #include "zetasql/public/types/type_factory.h" #include "zetasql/testdata/test_schema.pb.h" @@ -80,9 +82,14 @@ namespace zetasql { namespace { using ::google::protobuf::internal::WireFormatLite; +using ::testing::ElementsAre; +using ::testing::ElementsAreArray; +using ::testing::EndsWith; using ::testing::HasSubstr; using ::testing::IsEmpty; using ::testing::Not; +using ::testing::Pair; +using ::zetasql_base::testing::IsOkAndHolds; using ::zetasql_base::testing::StatusIs; using interval_testing::Days; @@ -120,6 +127,11 @@ absl::Time ParseTimeHm(absl::string_view str) { return ParseTimeWithFormat("%H:%M", str); } +MATCHER_P(MapEntriesWhere, matcher, "") { + return ::testing::ExplainMatchResult(matcher, arg.map_entries(), + result_listener); +} + } // namespace // Test that GetSQL returns a string that can be re-parsed as the Value. 
@@ -143,6 +155,8 @@ static Value TestGetSQL(const Value& value) { analyzer_options.mutable_language()->EnableLanguageFeature(FEATURE_JSON_TYPE); analyzer_options.mutable_language()->EnableLanguageFeature( FEATURE_INTERVAL_TYPE); + analyzer_options.mutable_language()->EnableLanguageFeature( + FEATURE_TOKENIZED_SEARCH); analyzer_options.mutable_language()->EnableLanguageFeature( FEATURE_RANGE_TYPE); @@ -370,6 +384,61 @@ TEST_F(ValueTest, Int64Formatting) { EXPECT_EQ(Value::Int64(456).GetSQL(), "456"); } +TEST_F(ValueTest, FloatNull) { + Value value = TestGetSQL(Value::NullFloat()); + EXPECT_EQ("FLOAT", value.type()->DebugString()); + EXPECT_TRUE(value.is_null()); + EXPECT_DEATH(value.float_value(), "Null value"); + EXPECT_EQ("NULL", value.DebugString()); + EXPECT_EQ("Float(NULL)", value.FullDebugString()); + EXPECT_EQ("NULL", value.GetSQLLiteral()); + EXPECT_EQ("CAST(NULL AS FLOAT)", value.GetSQL()); + EXPECT_EQ("CAST(NULL AS FLOAT)", value.GetSQL(PRODUCT_INTERNAL)); + EXPECT_EQ("CAST(NULL AS FLOAT)", + value.GetSQL(PRODUCT_INTERNAL, /*use_external_float32=*/true)); + EXPECT_EQ("CAST(NULL AS FLOAT)", value.GetSQL(PRODUCT_EXTERNAL)); + EXPECT_EQ("CAST(NULL AS FLOAT32)", + value.GetSQL(PRODUCT_EXTERNAL, /*use_external_float32=*/true)); +} + +TEST_F(ValueTest, FloatNonNull) { + TestGetSQL(Value::Float(3)); + TestGetSQL(Value::Float(3.5)); + TestGetSQL(Value::Float(3.0000004)); + TestGetSQL(Value::Float(.0000004)); + TestGetSQL(Value::Float(-55500000000)); + TestGetSQL(Value::Float(-0.000034634643)); + TestGetSQL(Value::Float(1.5e25)); + TestGetSQL(Value::Float(-1.5e25)); + TestGetSQL(Value::Float(std::numeric_limits::quiet_NaN())); + TestGetSQL(Value::Float(std::numeric_limits::infinity())); + TestGetSQL(Value::Float(-std::numeric_limits::infinity())); +} + +TEST_F(ValueTest, FloatFormatting) { + EXPECT_EQ(Value::Float(123.456).DebugString(/*verbose=*/true), + "Float(123.456)"); + EXPECT_EQ(Value::Float(123.456).DebugString(), "123.456"); + 
EXPECT_EQ(Value::Float(123.456).Format(), "Float(123.456)"); + EXPECT_EQ(Value::Float(123.456).Format(/*print_top_level_type=*/false), + "123.456"); + EXPECT_EQ(Value::Float(123.456).GetSQLLiteral(), "123.456"); + EXPECT_EQ(Value::Float(123.456).GetSQL(), "CAST(123.456 AS FLOAT)"); + EXPECT_EQ(Value::Float(123.456).GetSQL(PRODUCT_INTERNAL), + "CAST(123.456 AS FLOAT)"); + EXPECT_EQ(Value::Float(123.456).GetSQL(PRODUCT_EXTERNAL, + /*use_external_float32=*/true), + "CAST(123.456 AS FLOAT32)"); + + EXPECT_EQ(Value::Float(123.456).GetSQLLiteral(PRODUCT_INTERNAL), "123.456"); + EXPECT_EQ(Value::Float(123.456).GetSQLLiteral(PRODUCT_INTERNAL, + /*use_external_float32=*/true), + "123.456"); + EXPECT_EQ(Value::Float(123.456).GetSQLLiteral(PRODUCT_EXTERNAL, + /*use_external_float32=*/true), + "123.456"); +} + TEST_F(ValueTest, DoubleNull) { Value value = TestGetSQL(Value::NullDouble()); EXPECT_EQ("DOUBLE", value.type()->DebugString()); @@ -629,6 +698,9 @@ TEST_F(ValueTest, ConvenienceDate) { // 1970-05-04 is 123 days after the Unix epoch. Date values are represented as // the number of days since the Unix epoch. EXPECT_EQ(Value::Date(123), values::Date(absl::CivilDay(1970, 5, 4))); + EXPECT_EQ(Value::Date(123).ToCivilDay(), absl::CivilDay(1970, 5, 4)); + EXPECT_DEATH(Value::NullDate().ToCivilDay(), "Null value"); + EXPECT_DEATH(Value::Int32(0).ToCivilDay(), "Not a date value"); } TEST_F(ValueTest, TimestampFormatting) { @@ -934,6 +1006,67 @@ TEST_F(ValueTest, JSONFormatting) { R"sql(JSON '{"foo":[1,null,"bar"],"foo2":"hello","foo3":true}')sql"); } +namespace { +Value TokenListFromArray(std::vector tokens) { + return TokenListFromStringArray(std::move(tokens)); +} + +Value TokenListFromToken(std::string token) { + return TokenListFromArray({token}); +} +} // namespace + +TEST_F(ValueTest, TokenList) { + EXPECT_TRUE(Value::NullTokenList().is_null()); + EXPECT_EQ(TYPE_TOKENLIST, Value::NullTokenList().type_kind()); + + // Verify that there are no memory leaks. 
+ { + Value v1(TokenListFromToken("test")); + EXPECT_EQ(TYPE_TOKENLIST, v1.type()->kind()); + EXPECT_FALSE(v1.is_null()); + Value v2(v1); + EXPECT_EQ(TYPE_TOKENLIST, v2.type()->kind()); + EXPECT_FALSE(v2.is_null()); + } + + // Test the assignment operator. + Value tokenlist = TokenListFromStringArray({"test"}); + Value expected = TokenListFromStringArray({"test"}); + ASSERT_TRUE( + tokenlist.tokenlist_value().EquivalentTo(expected.tokenlist_value())); + { + Value v1 = TokenListFromToken("test"); + Value v2 = tokenlist; + EXPECT_NE("", v1.DebugString()); + EXPECT_EQ(v1.DebugString(), v2.DebugString()); + + Value v3 = Value::NullTokenList(); + v3 = v1; + EXPECT_EQ(TYPE_TOKENLIST, v1.type()->kind()); + EXPECT_EQ(TYPE_TOKENLIST, v2.type()->kind()); + EXPECT_EQ(TYPE_TOKENLIST, v3.type()->kind()); + EXPECT_FALSE(v3.is_null()); + EXPECT_TRUE(v1.tokenlist_value().EquivalentTo(expected.tokenlist_value())); + EXPECT_TRUE(v2.tokenlist_value().EquivalentTo(expected.tokenlist_value())); + EXPECT_TRUE(v3.tokenlist_value().EquivalentTo(expected.tokenlist_value())); + } + + // Equals. + { + const Value v1 = TokenListFromToken("a"); + const Value v2 = TokenListFromToken("a"); + const Value v3 = TokenListFromToken("b"); + EXPECT_EQ(v1, v2); + TestHashEqual(v1, v2); + EXPECT_NE(v1, v3); + TestHashNotEqual(v1, v3); + } + + TestGetSQL(Value::NullTokenList()); + TestGetSQL(TokenListFromToken("tokenlist")); +} + TEST_F(ValueTest, GenericAccessors) { // Return types. static Value v; @@ -1085,6 +1218,8 @@ TEST_F(ValueTest, HashCode) { Value::BigNumeric(BigNumericValue(int64_t{1002})), Value::Json(JSONValue(int64_t{1})), Value::UnvalidatedJsonString("value"), + TokenListFromToken("t1"), + TokenListFromToken("t2"), // Enums of two different types. 
values::Enum(enum_type, 0), values::Enum(enum_type, 1), @@ -1300,9 +1435,11 @@ TEST_F(ValueTest, CopyConstructor) { Value v8 = TestGetSQL(Value::String("honorificabilitudinitatibus")); EXPECT_EQ("honorificabilitudinitatibus", v8.string_value()); EXPECT_EQ("honorificabilitudinitatibus", v8.ToCord()); + EXPECT_EQ("honorificabilitudinitatibus", v8.ToString()); Value v9 = TestGetSQL(Value::Bytes("honorificabilitudinitatibus")); EXPECT_EQ("honorificabilitudinitatibus", v9.bytes_value()); EXPECT_EQ("honorificabilitudinitatibus", v9.ToCord()); + EXPECT_EQ("honorificabilitudinitatibus", v9.ToString()); Value v10 = TestGetSQL(Value::Date(12345)); EXPECT_EQ(12345, v10.date_value()); @@ -1314,6 +1451,12 @@ TEST_F(ValueTest, CopyConstructor) { EXPECT_EQ(-12345, v11.ToInt64()); } +TEST_F(ValueTest, ToStringTests) { + EXPECT_DEATH(Value::NullString().ToString(), "Null value"); + EXPECT_DEATH(Value::NullBytes().ToString(), "Null value"); + EXPECT_DEATH(Value::Date(0).ToString(), ""); +} + TEST_F(ValueTest, DateTests) { Value value = TestGetSQL(Value::Date(0)); EXPECT_EQ("DATE", value.type()->DebugString()); @@ -1879,7 +2022,29 @@ TEST_F(ValueTest, FloatArray) { "ARRAY[CAST(1.5 AS FLOAT), CAST(2.5 AS FLOAT), " "CAST(\"nan\" AS FLOAT)]", v1.GetSQL()); + EXPECT_EQ( + "ARRAY[CAST(1.5 AS FLOAT), CAST(2.5 AS FLOAT), " + "CAST(\"nan\" AS FLOAT)]", + v1.GetSQL(PRODUCT_INTERNAL)); + EXPECT_EQ( + "ARRAY[CAST(1.5 AS FLOAT), CAST(2.5 AS FLOAT), " + "CAST(\"nan\" AS FLOAT)]", + v1.GetSQL(PRODUCT_INTERNAL, /*use_external_float32=*/true)); + EXPECT_EQ( + "ARRAY[CAST(1.5 AS FLOAT), CAST(2.5 AS FLOAT), " + "CAST(\"nan\" AS FLOAT)]", + v1.GetSQL(PRODUCT_EXTERNAL)); + EXPECT_EQ( + "ARRAY[CAST(1.5 AS FLOAT32), CAST(2.5 AS FLOAT32), " + "CAST(\"nan\" AS FLOAT32)]", + v1.GetSQL(PRODUCT_EXTERNAL, /*use_external_float32=*/true)); EXPECT_EQ("[1.5, 2.5, CAST(\"nan\" AS FLOAT)]", v1.GetSQLLiteral()); + EXPECT_EQ("[1.5, 2.5, CAST(\"nan\" AS FLOAT)]", + v1.GetSQLLiteral(PRODUCT_INTERNAL)); + EXPECT_EQ("[1.5, 
2.5, CAST(\"nan\" AS FLOAT)]", + v1.GetSQLLiteral(PRODUCT_INTERNAL, /*use_external_float32=*/true)); + EXPECT_EQ("[1.5, 2.5, CAST(\"nan\" AS FLOAT32)]", + v1.GetSQLLiteral(PRODUCT_EXTERNAL, /*use_external_float32=*/true)); } TEST_F(ValueTest, DoubleArray) { @@ -2140,6 +2305,253 @@ TEST_F(ValueTest, ArrayOfStructsOfStringsFormatting) { R"(ARRAY>[STRUCT("5938", "longFunctionInvocation, 2"), STRUCT("5938", "longFunctionInvocation, 2"), CAST(NULL AS STRUCT)])"); } +TEST(MapValueTest, MapConstructionInitializerList) { + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_type, + MakeMapType(Int64Type(), Int64Type())); + + EXPECT_THAT(Value::MakeMap(map_type, {}), + IsOkAndHolds(MapEntriesWhere(IsEmpty()))); + + EXPECT_THAT(Value::MakeMap( + map_type, {std::make_pair(Value::Int64(1), Value::Int64(2))}), + IsOkAndHolds(MapEntriesWhere( + ElementsAre(Pair(Value::Int64(1), Value::Int64(2)))))); +} + +TEST(MapValueTest, MapConstructionSpan) { + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_type, + MakeMapType(Int64Type(), Int64Type())); + + EXPECT_THAT( + Value::MakeMap(map_type, + absl::Span>{}), + IsOkAndHolds(MapEntriesWhere(IsEmpty()))); + + std::vector> kv_vec = { + std::make_pair(Value::Int64(1), Value::Int64(2))}; + // kv_vec is converted to absl::Span. 
+ EXPECT_THAT(Value::MakeMap(map_type, kv_vec), + IsOkAndHolds(MapEntriesWhere( + ElementsAre(Pair(Value::Int64(1), Value::Int64(2)))))); +} + +TEST(MapValueTest, MapConstructionRvalueVector) { + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_type, + MakeMapType(Int64Type(), Int64Type())); + + EXPECT_THAT(Value::MakeMap(map_type, std::vector>{}), + IsOkAndHolds(MapEntriesWhere(IsEmpty()))); + + std::vector> kv_vec = { + std::make_pair(Value::Int64(1), Value::Int64(2))}; + EXPECT_THAT(Value::MakeMap(map_type, std::move(kv_vec)), + IsOkAndHolds(MapEntriesWhere( + ElementsAre(Pair(Value::Int64(1), Value::Int64(2)))))); +} + +struct MapConstructionInvalidTestParams { + const Type* key_type; + const Type* value_type; + const std::vector> map_entries; + const std::string err_expected_type; + const std::pair err_actual_type; +}; + +#ifdef DEBUG // Map values are only type checked in debug mode. +class MapConstructionInvalidTest + : public ::testing::TestWithParam {}; + +TEST_P(MapConstructionInvalidTest, MapConstructionInvalid) { + auto params = GetParam(); + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_type, + MakeMapType(params.key_type, params.value_type)); + EXPECT_THAT( + Value::MakeMap(map_type, params.map_entries), + StatusIs( + absl::StatusCode::kInternal, + ::testing::AllOf( + HasSubstr(absl::StrCat("Expected ", params.err_expected_type)), + HasSubstr(absl::StrCat( + "entry with key of ", params.err_actual_type.first, + " and value of ", params.err_actual_type.second))))); +} + +INSTANTIATE_TEST_SUITE_P( + MapConstructionTest, MapConstructionInvalidTest, + ::testing::ValuesIn( + {{.key_type = Int64Type(), + .value_type = DoubleType(), + .map_entries = {std::make_pair(Value::String("a"), + Value::Bool(true))}, + .err_expected_type = "MAP", + .err_actual_type = std::make_pair("STRING", "BOOL")}, + {.key_type = DateType(), + .value_type = Int32Type(), + .map_entries = {std::make_pair(Value::String("a"), Value::Int32(1))}, + .err_expected_type = "MAP", + 
.err_actual_type = std::make_pair("STRING", "INT32")}, + {.key_type = StringType(), + .value_type = Int64Type(), + .map_entries = {std::make_pair(Value::String("a"), Value::Date(1))}, + .err_expected_type = "MAP", + .err_actual_type = std::make_pair("STRING", "DATE")}, + {.key_type = BoolType(), + .value_type = Int64Type(), + .map_entries = {std::make_pair(Value::Bool(true), Value::Int32(1))}, + .err_expected_type = "MAP", + .err_actual_type = std::make_pair("BOOL", "INT32")}})); +#endif + +struct MapConstructionValidTestParams { + const Type* key_type; + const Type* value_type; + const std::vector> map_entries; +}; + +class MapConstructionValidTest + : public ::testing::TestWithParam {}; + +TEST_P(MapConstructionValidTest, MapConstruction) { + auto params = GetParam(); + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_type, + MakeMapType(params.key_type, params.value_type)); + + EXPECT_THAT( + Value::MakeMap(map_type, params.map_entries), + IsOkAndHolds(MapEntriesWhere(ElementsAreArray(params.map_entries)))); +} + +INSTANTIATE_TEST_SUITE_P( + MapConstructionTest, MapConstructionValidTest, + ::testing::ValuesIn({ + { + .key_type = StringType(), + .value_type = JsonType(), + .map_entries = {}, + }, + { + .key_type = Int64Type(), + .value_type = DoubleType(), + .map_entries = {std::make_pair(Value::Int64(1), Value::Double(2))}, + }, + { + .key_type = FloatType(), + .value_type = BoolType(), + .map_entries = + {std::make_pair(Value::Float(1.0), Value::Bool(true)), + std::make_pair(Value::Float(2.0), Value::Bool(false))}, + }, + })); + +struct MapDuplicatesTestParams { + const absl::StatusOr map_type; + const std::vector> map_entries; + const bool expect_ok = true; + const std::string failure_duplicate_key; +}; + +class MapDuplicatesTest + : public ::testing::TestWithParam {}; + +TEST_P(MapDuplicatesTest, MapConstructionDuplicateKeysError) { + const auto& params = GetParam(); + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_type, params.map_type); + const auto& result = 
Value::MakeMap(map_type, params.map_entries); + + if (params.expect_ok) { + ZETASQL_EXPECT_OK(result); + } else { + EXPECT_THAT( + result, + StatusIs( + absl::StatusCode::kOutOfRange, + AllOf(HasSubstr("Duplicate map key"), + EndsWith(absl::StrCat(": ", params.failure_duplicate_key))))); + } +} +INSTANTIATE_TEST_SUITE_P( + MapDuplicatesTest, MapDuplicatesTest, + ::testing::ValuesIn({ + // Basic success tests. + { + .map_type = MakeMapType(Int64Type(), JsonType()), + .map_entries = {}, + }, + { + .map_type = MakeMapType(Int64Type(), DoubleType()), + .map_entries = {std::make_pair(Value::Int64(1), Value::Double(2))}, + }, + { + .map_type = MakeMapType(FloatType(), BoolType()), + .map_entries = + {std::make_pair(Value::Float(1.0), Value::Bool(true)), + std::make_pair(Value::Float(2.0), Value::Bool(false))}, + }, + // Basic duplicate key failure tests. + { + .map_type = MakeMapType(Int64Type(), Int64Type()), + .map_entries = {{Value::Int64(1), Value::Int64(1)}, + {Value::Int64(2), Value::Int64(1)}, + {Value::Int64(2), Value::Int64(3)}}, + .expect_ok = false, + .failure_duplicate_key = "2", + }, + { + .map_type = MakeMapType(Int64Type(), Int64Type()), + .map_entries = {{Value::Int64(1), Value::Int64(1)}, + {Value::Int64(2), Value::Int64(1)}, + {Value::Int64(2), Value::Int64(3)}, + {Value::Int64(1), Value::Int64(3)}}, + .expect_ok = false, + .failure_duplicate_key = "2", + }, + { + .map_type = MakeMapType(Int64Type(), Int64Type()), + .map_entries = {{Value::Int64(1), Value::Int64(1)}, + {Value::NullInt64(), Value::Int64(1)}, + {Value::NullInt64(), Value::Int64(3)}}, + .expect_ok = false, + .failure_duplicate_key = "NULL", + }, + // Float complex value success tests. 
+ { + .map_type = MakeMapType(FloatType(), Int64Type()), + .map_entries = + {{Value::Float(1), Value::Int64(1)}, + {Value::NullFloat(), Value::Int64(1)}, + {Value::Float(std::numeric_limits::quiet_NaN()), + Value::Int64(1)}, + {Value::Float(std::numeric_limits::infinity()), + Value::Int64(1)}, + {Value::Float(-std::numeric_limits::infinity()), + Value::Int64(1)}}, + .expect_ok = true, + }, + // Float complex value duplicate key tests. + { + .map_type = MakeMapType(FloatType(), Int64Type()), + .map_entries = + {{Value::Float(1), Value::Int64(1)}, + {Value::Float(std::numeric_limits::infinity()), + Value::Int64(1)}, + {Value::Float(std::numeric_limits::infinity()), + Value::Int64(3)}}, + .expect_ok = false, + .failure_duplicate_key = "inf", + }, + { + .map_type = MakeMapType(FloatType(), Int64Type()), + .map_entries = + {{Value::Float(1), Value::Int64(1)}, + {Value::Float(std::numeric_limits::quiet_NaN()), + Value::Int64(1)}, + {Value::Float(std::numeric_limits::quiet_NaN()), + Value::Int64(3)}}, + .expect_ok = false, + .failure_duplicate_key = "nan", + }, + })); + // A sanity test to make sure EqualsInternal does not blow up in the case // where structs have different numbers of fields. 
TEST_F(ValueTest, InternalEqualsOnDifferentSizedStructs) { @@ -2172,6 +2584,9 @@ TEST_F(ValueTest, NaN) { EXPECT_EQ("Float(nan)", float_nan.FullDebugString()); EXPECT_EQ("Double(nan)", double_nan.FullDebugString()); EXPECT_EQ("CAST(\"nan\" AS FLOAT)", float_nan.GetSQL()); + EXPECT_EQ("CAST(\"nan\" AS FLOAT)", float_nan.GetSQL(PRODUCT_EXTERNAL)); + EXPECT_EQ("CAST(\"nan\" AS FLOAT32)", + float_nan.GetSQL(PRODUCT_EXTERNAL, /*use_external_float32=*/true)); EXPECT_EQ("CAST(\"nan\" AS FLOAT64)", double_nan.GetSQL(PRODUCT_EXTERNAL)); EXPECT_EQ("CAST(\"nan\" AS DOUBLE)", double_nan.GetSQL(PRODUCT_INTERNAL)); } @@ -2510,6 +2925,7 @@ TEST_F(ValueTest, Proto) { EXPECT_FALSE(Value::Proto(proto_type, bytes).is_null()); Value proto = TestGetSQL(Proto(proto_type, bytes)); EXPECT_EQ(bytes, proto.ToCord()); + EXPECT_EQ(bytes, proto.ToString()); EXPECT_EQ(bytes, proto.proto_value()); EXPECT_EQ("Proto{}", proto.FullDebugString()); EXPECT_EQ("{}", proto.ShortDebugString()); @@ -2532,7 +2948,10 @@ TEST_F(ValueTest, Proto) { // Cord representation is different, but protos compare as equal. 
EXPECT_EQ(2, proto1.ToCord().size()); EXPECT_EQ(4, proto3.ToCord().size()); + EXPECT_EQ(2, proto1.ToString().size()); + EXPECT_EQ(4, proto3.ToString().size()); EXPECT_NE(std::string(proto1.ToCord()), std::string(proto3.ToCord())); + EXPECT_NE(absl::Cord(proto1.ToString()), absl::Cord(proto3.ToString())); EXPECT_TRUE(proto1.Equals(proto3)); EXPECT_EQ(proto1, proto3); TestHashEqual(proto1, proto3); @@ -2566,6 +2985,7 @@ TEST_F(ValueTest, Proto) { bytes.Append(bytes4); Value proto4 = TestGetSQL(Proto(proto_type, bytes)); EXPECT_EQ(6, proto4.ToCord().size()); + EXPECT_EQ(6, proto4.ToString().size()); EXPECT_FALSE(proto1.Equals(proto4)); EXPECT_NE(proto1, proto4); EXPECT_EQ("{int32_val: 7}", proto4.ShortDebugString()); @@ -2574,6 +2994,7 @@ TEST_F(ValueTest, Proto) { EXPECT_EQ(proto4, proto5); TestHashEqual(proto4, proto5); EXPECT_GT(proto4.ToCord().size(), proto5.ToCord().size()); + EXPECT_GT(proto4.ToString().size(), proto5.ToString().size()); // One example where we get a reason out from the proto MessageDifferencer. std::string reason; @@ -2638,6 +3059,7 @@ TEST_F(ValueTest, Proto) { proto_2_1.FullDebugString()); EXPECT_EQ(proto_1_2.FullDebugString(), proto_2_1.FullDebugString()); EXPECT_NE(proto_1_2.ToCord(), proto_2_1.ToCord()); + EXPECT_NE(proto_1_2.ToString(), proto_2_1.ToString()); // Test equality and hash codes when one message explicitly sets a field to // the default value and the other message leaves the field unset. @@ -2715,7 +3137,8 @@ TEST_F(ValueTest, ClassAndProtoSize) { EXPECT_EQ(16, sizeof(Value)) << "The size of Value class has changed, please also update the proto " << "and serialization code if you added/removed fields in it."; - EXPECT_EQ(24, ValueProto::descriptor()->field_count()) + // TODO: Implement serialization/deserialization for RANGE. 
+ EXPECT_EQ(25, ValueProto::descriptor()->field_count()) << "The number of fields in ValueProto has changed, please also update " << "the serialization code accordingly."; EXPECT_EQ(1, ValueProto::Array::descriptor()->field_count()) @@ -3990,6 +4413,7 @@ TEST_F(ValueTest, PhysicalByteSize) { EXPECT_EQ(sizeof(Value), Value::NullTimestamp().physical_byte_size()); EXPECT_EQ(sizeof(Value), Value::NullUint32().physical_byte_size()); EXPECT_EQ(sizeof(Value), Value::NullUint64().physical_byte_size()); + EXPECT_EQ(sizeof(Value), Value::NullTokenList().physical_byte_size()); // Constant sized types. auto bool_value = Value::Bool(true); @@ -4106,6 +4530,22 @@ TEST_F(ValueTest, PhysicalByteSize) { absl::CivilSecond(2020, 01, 01, 10, 00, 00), utc)), Value::UnboundedEndTimestamp()) .physical_byte_size()); + + // Map type + ZETASQL_ASSERT_OK_AND_ASSIGN(const Type* map_type, + MakeMapType(Int64Type(), Int64Type())); + + ZETASQL_ASSERT_OK_AND_ASSIGN(Value empty_map, Value::MakeMap(map_type, {})); + EXPECT_EQ(sizeof(Value) + sizeof(internal::ValueContentMapRef) + + sizeof(Value::TypedMap), + empty_map.physical_byte_size()); + + const Value map_string = Value::String("foo"); + const Value map_int64 = Value::Int64(1); + EXPECT_EQ(sizeof(Value) + sizeof(internal::ValueContentMapRef) + + sizeof(Value::TypedMap) + map_string.physical_byte_size() + + map_int64.physical_byte_size(), + Map({std::make_pair(map_string, map_int64)}).physical_byte_size()); } // Roundtrips Value through ValueProto and back. @@ -4325,6 +4765,13 @@ TEST_F(ValueTest, Serialize) { Value::Interval(Days(-5)), Value::Interval(Micros(123456789)), Value::Interval(interval_min), Value::Interval(interval_max)})); + // TokenList + SerializeDeserialize(Value::NullTokenList()); + SerializeDeserialize(TokenListFromToken("test")); + SerializeDeserialize(EmptyArray(types::TokenListArrayType())); + SerializeDeserialize( + Array({Value::NullTokenList(), TokenListFromToken("test")})); + // Enum. 
const EnumType* enum_type = GetTestEnumType(); SerializeDeserialize(Null(enum_type)); diff --git a/zetasql/reference_impl/BUILD b/zetasql/reference_impl/BUILD index 1362c63c3..338114131 100644 --- a/zetasql/reference_impl/BUILD +++ b/zetasql/reference_impl/BUILD @@ -35,6 +35,7 @@ cc_library( "//zetasql/resolved_ast", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", ], ) @@ -137,6 +138,7 @@ cc_library( "@com_google_absl//absl/hash", "@com_google_absl//absl/status", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", ], ) @@ -172,7 +174,9 @@ cc_library( "@com_google_googleapis//google/type:timeofday_cc_proto", # buildcleaner: keep "//zetasql/common:thread_stack", + "//zetasql/public/functions:array_zip_mode_cc_proto", "//zetasql/public/functions:differential_privacy_cc_proto", + "//zetasql/public/types:timestamp_util", "//zetasql/public:anonymization_utils", "//zetasql/common:errors", "//zetasql/common:initialize_required_fields", @@ -185,6 +189,7 @@ cc_library( "//zetasql/public:collator_lite", "//zetasql/public:evaluator_table_iterator", "//zetasql/public:function", + "//zetasql/public:interval_value", "//zetasql/public:json_value", "//zetasql/public:language_options", "//zetasql/public:numeric_value", @@ -205,7 +210,6 @@ cc_library( "//zetasql/public/functions:string_format", "//zetasql/public/functions:generate_array", "//zetasql/public/functions:json", - "//zetasql/public/functions:like", "//zetasql/public/functions:math", "//zetasql/public/functions:net", "//zetasql/public/functions:normalize_mode_cc_proto", @@ -213,7 +217,9 @@ cc_library( "//zetasql/public/functions:percentile", "//zetasql/public/functions:regexp", "//zetasql/public/functions:string", + "//zetasql/public/functions:like", "//zetasql/public/proto:type_annotation_cc_proto", + "//zetasql/reference_impl/functions:like", "//zetasql/resolved_ast", "//zetasql/resolved_ast:resolved_ast_enums_cc_proto", 
"//zetasql/resolved_ast:resolved_node_kind_cc_proto", @@ -229,6 +235,7 @@ cc_library( "@com_google_absl//absl/flags:flag", "@com_google_absl//absl/hash", "//zetasql/base:check", + "@com_google_absl//absl/log:die_if_null", "@com_google_absl//absl/memory", "@com_google_absl//absl/random", "@com_google_absl//absl/random:distributions", @@ -314,6 +321,7 @@ cc_library( "//zetasql/base:status", "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/types:span", "@com_google_googletest//:gtest", ], ) @@ -343,7 +351,6 @@ cc_library( "//zetasql/base:flat_set", "//zetasql/base:map_util", "//zetasql/base:ret_check", - "//zetasql/base:source_location", "//zetasql/base:status", "//zetasql/base:stl_util", "//zetasql/common:aggregate_null_handling", @@ -353,15 +360,18 @@ cc_library( "//zetasql/public:builtin_function_cc_proto", "//zetasql/public:catalog", "//zetasql/public:coercer", + "//zetasql/public:collator_lite", "//zetasql/public:evaluator_table_iterator", "//zetasql/public:function", "//zetasql/public:id_string", "//zetasql/public:language_options", + "//zetasql/public:numeric_value", "//zetasql/public:simple_catalog", "//zetasql/public:sql_function", "//zetasql/public:templated_sql_function", "//zetasql/public:type", "//zetasql/public:value", + "//zetasql/public/functions:array_zip_mode_cc_proto", "//zetasql/public/functions:date_time_util", "//zetasql/public/functions:differential_privacy_cc_proto", "//zetasql/public/functions:json", @@ -371,6 +381,7 @@ cc_library( "//zetasql/resolved_ast:resolved_ast_enums_cc_proto", "//zetasql/resolved_ast:resolved_node_kind_cc_proto", "//zetasql/resolved_ast:serialization_cc_proto", + "@com_google_absl//absl/algorithm:container", "@com_google_absl//absl/cleanup", "@com_google_absl//absl/container:flat_hash_map", "@com_google_absl//absl/container:flat_hash_set", @@ -458,6 +469,7 @@ cc_library( "//zetasql/public:analyzer_options", "//zetasql/public:analyzer_output", 
"//zetasql/public:error_helpers", + "//zetasql/public:function_headers", "//zetasql/public:language_options", "//zetasql/public:multi_catalog", "//zetasql/public:options_cc_proto", @@ -598,6 +610,18 @@ cc_library( ], ) +cc_test( + name = "evaluation_context_test", + size = "small", + srcs = ["evaluation_context_test.cc"], + deps = [ + ":evaluation", + "//zetasql/base/testing:zetasql_gtest_main", + "//zetasql/public:language_options", + "@com_google_absl//absl/time", + ], +) + cc_test( name = "aggregate_op_test", size = "small", @@ -614,6 +638,7 @@ cc_test( "//zetasql/public:numeric_value", "//zetasql/public:type", "//zetasql/public:value", + "//zetasql/public/types", "//zetasql/testdata:test_schema_cc_proto", "//zetasql/testing:test_value", "@com_google_absl//absl/memory", @@ -681,6 +706,8 @@ cc_test( "//zetasql/public:simple_catalog", "//zetasql/public:type", "//zetasql/public:value", + "//zetasql/public/functions:array_zip_mode_cc_proto", + "//zetasql/public/types", "//zetasql/resolved_ast", "//zetasql/testdata:test_schema_cc_proto", "//zetasql/testing:test_value", @@ -726,11 +753,13 @@ cc_test( "//zetasql/public:type_cc_proto", "//zetasql/public:value", "//zetasql/resolved_ast", + "//zetasql/resolved_ast:resolved_ast_builder", "//zetasql/resolved_ast:resolved_node_kind_cc_proto", "//zetasql/testdata:test_schema_cc_proto", "//zetasql/testing:test_function", "//zetasql/testing:test_value", "@com_google_absl//absl/container:flat_hash_map", + "@com_google_absl//absl/flags:flag", "@com_google_absl//absl/memory", "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", @@ -749,20 +778,21 @@ cc_test( srcs = ["algebrizer_test.cc"], deps = [ ":algebrizer", - "//zetasql/base", + ":common", + ":parameters", + ":variable_generator", "//zetasql/base:map_util", "//zetasql/base:ret_check", "//zetasql/base:status", "//zetasql/base/testing:status_matchers", "//zetasql/base/testing:zetasql_gtest_main", - "//zetasql/common:evaluator_test_table", - 
"//zetasql/public:analyzer", "//zetasql/public:builtin_function", "//zetasql/public:builtin_function_cc_proto", "//zetasql/public:catalog", "//zetasql/public:civil_time", "//zetasql/public:function", "//zetasql/public:function_cc_proto", + "//zetasql/public:id_string", "//zetasql/public:options_cc_proto", "//zetasql/public:simple_catalog", "//zetasql/public:type", @@ -770,9 +800,9 @@ cc_test( "//zetasql/resolved_ast", "//zetasql/resolved_ast:make_node_vector", "//zetasql/resolved_ast:resolved_node_kind_cc_proto", - "//zetasql/testdata:sample_catalog", "//zetasql/testing:test_value", "@com_google_absl//absl/memory", + "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", "@com_google_absl//absl/types:span", diff --git a/zetasql/reference_impl/aggregate_op.cc b/zetasql/reference_impl/aggregate_op.cc index 25bd9a6d5..2c6cc4bcb 100644 --- a/zetasql/reference_impl/aggregate_op.cc +++ b/zetasql/reference_impl/aggregate_op.cc @@ -38,10 +38,12 @@ #include "absl/container/flat_hash_map.h" #include "absl/container/flat_hash_set.h" #include "absl/flags/flag.h" +#include "zetasql/base/check.h" #include "absl/memory/memory.h" #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/str_cat.h" +#include "absl/strings/str_format.h" #include "absl/strings/str_join.h" #include "absl/strings/string_view.h" #include "absl/types/optional.h" @@ -434,6 +436,26 @@ absl::StatusOr GetValueSortKey(const Value& value, return values::Bytes(sort_key); } +bool IsGroupingFunction(const AggregateFunctionCallExpr* func_expr) { + ABSL_DCHECK(func_expr != nullptr); + const BuiltinAggregateFunction* builtin_func = + dynamic_cast(func_expr->function()); + if (builtin_func != nullptr) { + return builtin_func->kind() == FunctionKind::kGrouping; + } + return false; +} + +// A struct holding the accumulator and other additional information about the +// current aggregate arg. 
+struct AggregateArgAccumulatorParam { + // The accumulator for the current aggregate_arg. + std::unique_ptr accumulator; + bool stop_bit; + // Whether the current aggregate_arg is a grouping function call. + bool is_grouping_function; +}; + } // namespace // Accumulator that only passes through distinct values. @@ -908,6 +930,55 @@ class IntermediateAggregateAccumulatorAdaptor : public AggregateArgAccumulator { EvaluationContext* context_; }; +// An accumulator for the GROUPING function. +// Regular aggregate functions only consume the evaluated arguments, which is +// what an implementation of AggregateAccumulator does. The argument of the +// GROUPING function is just a constant key index; the accumulator also needs +// to consume the input rows to calculate the final result 0 or 1, so it +// implements the IntermediateAggregateAccumulator interface instead. The +// input_row of +// GroupingAggregateArgAccumulator is a specially-built TupleData, where +// input_row[i] represents the result of the grouping call at index i. +class GroupingAggregateArgAccumulator + : public IntermediateAggregateAccumulator { + public: + GroupingAggregateArgAccumulator() = default; + + absl::Status Reset() override { + grouping_value_ = -1; + return absl::OkStatus(); + } + + bool Accumulate(const TupleData& input_row, const Value& value, + bool* stop_accumulation, absl::Status* status) override { + // The argument of the GROUPING function is the key index of the grouping key. + ABSL_DCHECK_EQ(value.type(), types::Int64Type()); + int key_index = static_cast(value.int64_value()); + ABSL_DCHECK_LT(key_index, input_row.num_slots()); + ABSL_DCHECK_GE(key_index, 0); + + grouping_value_ = input_row.slot(key_index).value().int64_value(); + // Sanity check. The output of the GROUPING function can be either 0 or 1. + ABSL_DCHECK(grouping_value_ == 0 || grouping_value_ == 1); + // The output of the GROUPING function has the same value within the same + // group, so we can skip the following accumulations.
+ *stop_accumulation = true; + return true; + } + + absl::StatusOr GetFinalResult(bool inputs_in_defined_order) override { + // Sanity check. + if (grouping_value_ == 0 || grouping_value_ == 1) { + return Value::Int64(grouping_value_); + } + return absl::InternalError( + absl::StrFormat("Unexpected grouping_value: %d", grouping_value_)); + } + + private: + // Set an initial invalid value for the grouping result. + int64_t grouping_value_ = -1; +}; + } // namespace static absl::Status PopulateSlotsForKeysAndValues( @@ -950,17 +1021,29 @@ AggregateArg::CreateAccumulator(absl::Span params, } ZETASQL_ASSIGN_OR_RETURN(CollatorList collator_list, MakeCollatorList(collation_list())); - ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr underlying_accumulator, - aggregate_function()->function()->CreateAccumulator( - args, std::move(collator_list), context)); - - // Adapt the underlying AggregateAccumulator to the - // IntermediateAggregateAccumulator interface so that we can stack other - // intermediate accumulators on top of it. - std::unique_ptr accumulator = - std::make_unique( - aggregate_function()->output_type(), error_mode_, - std::move(underlying_accumulator)); + + std::unique_ptr accumulator; + + // Create a special accumulator for the GROUPING function aggregator. Other + // aggregators use AggregateAccumulator as the underlying accumulator, which + // only accumulates the argument value. The GROUPING function is different: + // its argument is just a key index, and the aggregated result is calculated + // based on both the input rows and the key index, so we need to create a + // special accumulator for it.
+ if (IsGroupingFunction(aggregate_function())) { + accumulator = std::make_unique(); + } else { + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr underlying_accumulator, + aggregate_function()->function()->CreateAccumulator( + args, std::move(collator_list), context)); + // Adapt the underlying AggregateAccumulator to the + // IntermediateAggregateAccumulator interface so that we can stack other + // intermediate accumulators on top of it. + accumulator = std::make_unique( + aggregate_function()->output_type(), error_mode_, + std::move(underlying_accumulator)); + } const TupleSchema* agg_fn_input_schema = group_schema_.get(); std::unique_ptr group_rows_schema; @@ -1244,12 +1327,13 @@ std::string AggregateOp::GetIteratorDebugString( absl::StatusOr> AggregateOp::Create( std::vector> keys, std::vector> aggregators, - std::unique_ptr input) { + std::unique_ptr input, std::vector grouping_sets) { for (auto& arg : keys) { ZETASQL_RETURN_IF_ERROR(ValidateTypeSupportsEqualityComparison(arg->type())); } - return absl::WrapUnique(new AggregateOp( - std::move(keys), std::move(aggregators), std::move(input))); + return absl::WrapUnique( + new AggregateOp(std::move(keys), std::move(aggregators), std::move(input), + std::move(grouping_sets))); } absl::Status AggregateOp::SetSchemasForEvaluation( @@ -1334,8 +1418,7 @@ class AggregateTupleIterator : public TupleIterator { // The bool is true if we should stop accumulation for the corresponding // accumulator. -using AccumulatorList = - std::vector, bool>>; +using AccumulatorList = std::vector; // The data associated with a grouping key during aggregation. class GroupValue { @@ -1488,6 +1571,23 @@ absl::StatusOr> AggregateOp::CreateIterator( UnorderedArrayCollisionTracker unordered_array_collision_tracker; absl::Status status; + std::vector grouping_sets = grouping_sets_; + bool has_grouping_sets = !grouping_sets.empty(); + // The number of keys in AggregateOp. 
+ int key_size = static_cast(keys().size()); + // The number of actual keys used for grouping. For grouping sets, we group by + // an extra key: the grouping set offset. + int grouping_key_size = has_grouping_sets ? key_size + 1 : key_size; + // To simplify the code below, when it's a regular group by query without + // GROUPING SETS/CUBE/ROLLUP, we also convert the group-by keys to a grouping + // set id with value -1. Theoretically we could use (1 << n) - 1 to represent + // GROUP BY key1, key2, ..., keyn, but that would limit a regular group by + // query to at most 63 keys. A grouping set id from a grouping set query is + // guaranteed to be positive. + constexpr int64_t kNoGroupingSetId = -1; + if (grouping_sets.empty()) { + grouping_sets.push_back(kNoGroupingSetId); + } while (true) { const TupleData* next_input = input_iter->Next(); if (next_input == nullptr) { @@ -1495,96 +1595,142 @@ absl::StatusOr> AggregateOp::CreateIterator( break; } - // Determine the key to 'group_to_accumulator_map'. - const std::vector params_and_input_tuple = - ConcatSpans(params, {next_input}); - auto key_data = std::make_unique(keys().size()); - // If collator is present for , is - // collation_key for value of . Otherwise, - // is the same as . - auto collated_key_data = std::make_unique(keys().size()); - - for (int i = 0; i < keys().size(); ++i) { - TupleSlot* slot = key_data->mutable_slot(i); - const KeyArg* key = keys()[i]; - absl::Status status; - if (!key->value_expr()->EvalSimple(params_and_input_tuple, context, slot, - &status)) { - return status; - } + for (int offset = 0; offset < grouping_sets.size(); ++offset) { + int64_t grouping_set = grouping_sets[offset]; + // This means the grouping set id is from a grouping sets/rollup/cube + // query, rather than a regular query. + bool is_grouping_set = grouping_set != kNoGroupingSetId; + // Determine the key to 'group_to_accumulator_map'.
+ const std::vector params_and_input_tuple = + ConcatSpans(params, {next_input}); + // When it's a grouping set query, we also need to group by an additional + // grouping set offset to allow duplicated grouping sets in the query. In + // this case, it's guaranteed the last key is always the offset. + auto key_data = std::make_unique(grouping_key_size); + // If collator is present for , is + // collation_key for value of . Otherwise, + // is the same as . + auto collated_key_data = std::make_unique(grouping_key_size); + // If a GROUPING function call is present in the AggregateOp, it needs + // special input data containing only 0 or 1 when conducting aggregation, + // rather than the original input rows. grouping_value_data[i] is 0 if the + // key at index i is in the current grouping set, otherwise its value is 1. + // The grouping accumulator will calculate the output of the GROUPING + // function with this input data and its argument key index. Basically the + // accumulator just returns grouping_value_data.slot(key_index). + auto grouping_value_data = std::make_unique(grouping_key_size); + + for (int i = 0; i < key_size; ++i) { + TupleSlot* slot = key_data->mutable_slot(i); + const KeyArg* key = keys()[i]; + absl::Status status; + if (!key->value_expr()->EvalSimple(params_and_input_tuple, context, + slot, &status)) { + return status; + } + // If the group by key is not in the current grouping set, then the + // value contributed to aggregation will be NULL, and the grouping call + // output is 1. The logic here assumes the key order won't change. + if (is_grouping_set && (grouping_set & (1ull << i)) == 0) { + // The current grouping set doesn't contain keys[i]. + slot->SetValue(Value::Null(key->type())); + grouping_value_data->mutable_slot(i)->SetValue(Value::Int64(1)); + } else { + // The current grouping set contains keys[i] or it's a regular group + // by query.
+ grouping_value_data->mutable_slot(i)->SetValue(Value::Int64(0)); + } - if ( // Once we know the query is known to be non-deterministic, we - // short-circuit to avoid any overhead from non-determinism - // detection. - context->IsDeterministicOutput() && - unordered_array_collision_tracker.CouldIndicateNondetermisticGrouping( - i, slot->value()) && - // On the first row we know it is not real non-determinism yet. - !group_map.empty()) { - context->SetNonDeterministicOutput(); + if ( // Once we know the query is known to be non-deterministic, we + // short-circuit to avoid any overhead from non-determinism + // detection. + context->IsDeterministicOutput() && + unordered_array_collision_tracker + .CouldIndicateNondetermisticGrouping(i, slot->value()) && + // On the first row we know it is not real non-determinism yet. + !group_map.empty()) { + context->SetNonDeterministicOutput(); + } + + Value* collated_slot_value = + collated_key_data->mutable_slot(i)->mutable_value(); + if (collators[i] == nullptr) { + *collated_slot_value = slot->value(); + } else { + ZETASQL_ASSIGN_OR_RETURN(*collated_slot_value, + GetValueSortKey(slot->value(), *(collators[i]))); + } } + if (is_grouping_set) { + ZETASQL_RET_CHECK_EQ(grouping_key_size, key_size + 1); + // Add the offset to the group by key list. + key_data->mutable_slot(key_size)->SetValue(Value::Int32(offset)); + collated_key_data->mutable_slot(key_size)->SetValue( + Value::Int32(offset)); + } + + // Look up the value in 'group_to_accumulator_map', initializing a new + // one if necessary. + AccumulatorList* accumulators = nullptr; + std::unique_ptr* found_group_value = + zetasql_base::FindOrNull(group_map, TupleDataPtr(collated_key_data.get())); + if (found_group_value == nullptr) { + // Create the new GroupValue. + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr inserted_group_value, + GroupValue::Create(std::move(key_data), + context->memory_accountant())); + + // Initialize the accumulators. 
+ accumulators = inserted_group_value->mutable_accumulator_list(); + accumulators->reserve(aggregators().size()); + for (const AggregateArg* aggregator : aggregators()) { + AggregateArgAccumulatorParam accumulator_param; + ZETASQL_ASSIGN_OR_RETURN(accumulator_param.accumulator, + aggregator->CreateAccumulator(params, context)); + accumulator_param.is_grouping_function = + IsGroupingFunction(aggregator->aggregate_function()); + accumulator_param.stop_bit = false; + accumulators->push_back(std::move(accumulator_param)); + } - Value* collated_slot_value = - collated_key_data->mutable_slot(i)->mutable_value(); - if (collators[i] == nullptr) { - *collated_slot_value = slot->value(); + // Insert the new GroupValue. + ZETASQL_RET_CHECK(group_map + .emplace(TupleDataPtr(collated_key_data.get()), + std::move(inserted_group_value)) + .second); + group_map_keys_memory.push_back(std::move(collated_key_data)); } else { - ZETASQL_ASSIGN_OR_RETURN(*collated_slot_value, - GetValueSortKey(slot->value(), *(collators[i]))); + accumulators = (*found_group_value)->mutable_accumulator_list(); + key_data.reset(); + collated_key_data.reset(); + grouping_value_data.reset(); } - } - // Look up the value in 'group_to_accumulator_map', initializing a new one - // if necessary. - AccumulatorList* accumulators = nullptr; - std::unique_ptr* found_group_value = - zetasql_base::FindOrNull(group_map, TupleDataPtr(collated_key_data.get())); - if (found_group_value == nullptr) { - // Create the new GroupValue. - ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr inserted_group_value, - GroupValue::Create(std::move(key_data), - context->memory_accountant())); - - // Initialize the accumulators. 
- accumulators = inserted_group_value->mutable_accumulator_list(); - accumulators->reserve(aggregators().size()); - for (const AggregateArg* aggregator : aggregators()) { - std::pair, bool> - accumulator_and_stop_bit; - ZETASQL_ASSIGN_OR_RETURN(accumulator_and_stop_bit.first, - aggregator->CreateAccumulator(params, context)); - accumulators->push_back(std::move(accumulator_and_stop_bit)); + // Accumulate. + ZETASQL_RET_CHECK_EQ(accumulators->size(), aggregators().size()); + bool all_accumulators_stopped = true; + for (auto& accumulator_param : *accumulators) { + bool& stop_bit = accumulator_param.stop_bit; + if (stop_bit) continue; + // When a GROUPING function call is present in the current AggregateOp, + // we need to replace the input row with the special grouping_value_data, + // which contains either 0 or 1 for each key index. + const TupleData* actual_next_input = next_input; + if (accumulator_param.is_grouping_function) { + actual_next_input = grouping_value_data.get(); + } + if (!accumulator_param.accumulator->Accumulate(*actual_next_input, + &stop_bit, &status)) { + return status; + } + if (!stop_bit) all_accumulators_stopped = false; } - // Insert the new GroupValue. - ZETASQL_RET_CHECK(group_map - .emplace(TupleDataPtr(collated_key_data.get()), - std::move(inserted_group_value)) - .second); - group_map_keys_memory.push_back(std::move(collated_key_data)); - } else { - accumulators = (*found_group_value)->mutable_accumulator_list(); - key_data.reset(); - collated_key_data.reset(); - } - - // Accumulate.
- ZETASQL_RET_CHECK_EQ(accumulators->size(), aggregators().size()); - bool all_accumulators_stopped = true; - for (auto& accumulator_and_stop_bit : *accumulators) { - bool& stop_bit = accumulator_and_stop_bit.second; - if (stop_bit) continue; - if (!accumulator_and_stop_bit.first->Accumulate(*next_input, &stop_bit, - &status)) { - return status; + if (all_accumulators_stopped && keys().empty()) { + // We are doing full aggregation and all the accumulators have stopped, + // we can stop reading the input. + break; } - if (!stop_bit) all_accumulators_stopped = false; - } - - if (all_accumulators_stopped && keys().empty()) { - // We are doing full aggregation and all the accumulators have stopped, we - // can stop reading the input. - break; } } @@ -1600,10 +1746,10 @@ absl::StatusOr> AggregateOp::CreateIterator( tuple->AddSlots(accumulators.size() + num_extra_slots); for (int i = 0; i < accumulators.size(); ++i) { - AggregateArgAccumulator& accumulator = *accumulators[i].first; + AggregateArgAccumulator& accumulator = *accumulators[i].accumulator; ZETASQL_ASSIGN_OR_RETURN(Value value, accumulator.GetFinalResult( /*inputs_in_defined_order=*/false)); - tuple->mutable_slot(keys().size() + i)->SetValue(value); + tuple->mutable_slot(grouping_key_size + i)->SetValue(value); } // This can free up considerable memory. E.g., for STRING_AGG. accumulators.clear(); @@ -1652,15 +1798,37 @@ absl::StatusOr> AggregateOp::CreateIterator( // (which are based on purely textual matching). It can also break some user // tests. std::vector slots_for_keys; - slots_for_keys.reserve(keys().size()); - for (int i = 0; i < keys().size(); ++i) { + slots_for_keys.reserve(key_size); + for (int i = 0; i < key_size; ++i) { slots_for_keys.push_back(i); } + std::vector extra_slots_for_keys; + // If the AggregateOp contains grouping sets, then also sort the tuple with + // the extra grouping set offset value. The slot index is key_size. 
+ if (has_grouping_sets) { + extra_slots_for_keys.push_back(key_size); + } ZETASQL_ASSIGN_OR_RETURN( std::unique_ptr tuple_comparator, - TupleComparator::Create(keys(), slots_for_keys, params, context)); + TupleComparator::Create(keys(), slots_for_keys, extra_slots_for_keys, + params, context)); tuples->Sort(*tuple_comparator, context->options().always_use_stable_sort); + // We need to remove the extra grouping key "offset" from the tuples. It's + // only used internally for grouping and sorting, and is not exposed to + // other operators. + if (has_grouping_sets) { + int64_t tuple_size = tuples->GetSize(); + for (int64_t i = 0; i < tuple_size; ++i) { + std::unique_ptr tuple = tuples->PopFront(); + // The extra grouping key is at index key_size. + tuple->RemoveSlotAt(key_size); + if (!tuples->PushBack(std::move(tuple), &status)) { + return status; + } + } + } + auto input_schema = std::make_unique(input_iter->Schema().variables()); std::unique_ptr iter = @@ -1689,18 +1857,33 @@ std::string AggregateOp::IteratorDebugString() const { std::string AggregateOp::DebugInternal(const std::string& indent, bool verbose) const { - return absl::StrCat("AggregateOp(", - ArgDebugString({"keys", "aggregators", "input"}, - {kN, kN, k1}, indent, verbose), - ")"); + bool has_grouping_sets = !grouping_sets_.empty(); + std::string args_debug_string = ArgDebugString( + {"keys", "aggregators", "input"}, {kN, kN, k1}, indent, verbose, + /*more_children=*/has_grouping_sets); + // Only append the grouping_sets debug string to AggregateOp when it's not + // empty.
+ std::string grouping_sets_debug_string = ""; + if (has_grouping_sets) { + grouping_sets_debug_string = absl::StrCat( + indent, kIndentFork, "grouping_sets: [", + absl::StrJoin(grouping_sets_, ",", + [](std::string* out, int64_t grouping_set) { + absl::StrAppend(out, "0x", absl::Hex(grouping_set)); + }), + "]"); + } + return absl::StrCat("AggregateOp(", args_debug_string, + grouping_sets_debug_string, ")"); } AggregateOp::AggregateOp(std::vector> keys, std::vector> aggregators, - std::unique_ptr input) { + std::unique_ptr input, + std::vector grouping_sets) { SetArgs(kKey, std::move(keys)); SetArgs(kAggregator, std::move(aggregators)); SetArg(kInput, std::make_unique(std::move(input))); + grouping_sets_ = grouping_sets; } absl::Span AggregateOp::keys() const { @@ -1727,6 +1910,10 @@ RelationalOp* AggregateOp::mutable_input() { return GetMutableArg(kInput)->mutable_node()->AsMutableRelationalOp(); } +absl::Span AggregateOp::grouping_sets() const { + return absl::MakeSpan(grouping_sets_); +} + // static absl::StatusOr> GroupRowsOp::Create( std::vector> columns) { diff --git a/zetasql/reference_impl/aggregate_op_test.cc b/zetasql/reference_impl/aggregate_op_test.cc index 070f3f916..dd6b829df 100644 --- a/zetasql/reference_impl/aggregate_op_test.cc +++ b/zetasql/reference_impl/aggregate_op_test.cc @@ -27,6 +27,7 @@ #include "zetasql/base/testing/status_matchers.h" #include "zetasql/public/numeric_value.h" #include "zetasql/public/type.h" +#include "zetasql/public/types/type_factory.h" #include "zetasql/public/value.h" #include "zetasql/reference_impl/evaluation.h" #include "zetasql/reference_impl/function.h" @@ -110,7 +111,7 @@ struct AggregateFunctionTemplate { }; static std::vector ConcatTemplates( - const std::vector>& + absl::Span> aggregate_vectors) { std::vector result; for (const auto& v : aggregate_vectors) { @@ -594,7 +595,8 @@ TEST(OrderPreservationTest, GroupByAggregate) { {Int64(1), Int64(10)}, {Int64(1), Int64(10)}, {Int64(1), NullInt64()}}), - 
/*preserves_order=*/true)))); + /*preserves_order=*/true)), + /*grouping_sets=*/{})); EXPECT_EQ( "AggregateOp(\n" @@ -719,7 +721,8 @@ TEST(CreateIteratorTest, AggregateAll) { {Int64(1), NullInt64()}, {Int64(5), NullInt64()}, {Int64(1), Int64(2)}}), - /*preserves_order=*/true)))); + /*preserves_order=*/true)), + /*grouping_sets=*/{})); EXPECT_EQ(aggregate_op->IteratorDebugString(), "AggregationTupleIterator(TestTupleIterator)"); EXPECT_EQ( @@ -995,7 +998,8 @@ TEST(CreateIteratorTest, AggregateOrderBy) { {Int64(1), Int64(10), String("b")}, {Int64(1), NullInt64(), String("a")}, {Int64(1), NullInt64(), NullString()}}), - /*preserves_order=*/true)))); + /*preserves_order=*/true)), + /*grouping_sets=*/{})); EXPECT_EQ( "AggregateOp(\n" "+-keys: {\n" @@ -1276,7 +1280,8 @@ TEST(CreateIteratorTest, AggregateLimit) { {Int64(1), Int64(10), String("b")}, {Int64(1), NullInt64(), String("a")}, {Int64(1), NullInt64(), NullString()}}), - /*preserves_order=*/true)))); + /*preserves_order=*/true)), + /*grouping_sets=*/{})); EXPECT_EQ( "AggregateOp(\n" @@ -1438,7 +1443,8 @@ TEST(CreateIteratorTest, AggregateHaving) { {Int64(1), Int64(10), String("b")}, {Int64(1), NullInt64(), String("a")}, {Int64(1), NullInt64(), NullString()}}), - /*preserves_order=*/true)))); + /*preserves_order=*/true)), + /*grouping_sets=*/{})); EXPECT_EQ( "AggregateOp(\n" @@ -1549,7 +1555,8 @@ TEST(EvalAggTest, ArrayAggWithLimitNonDeterministic) { {Int64(0), Int64(-2)}, {Int64(0), Int64(1)}, {Int64(0), Int64(2)}}), - /*preserves_order=*/true)))); + /*preserves_order=*/true)), + /*grouping_sets=*/{})); EXPECT_EQ( "AggregateOp(\n" @@ -1651,7 +1658,8 @@ TEST(EvalAggTest, ArrayAggWithLimitDeterministic) { {Int64(0), Int64(-2)}, {Int64(0), Int64(1)}, {Int64(0), Int64(2)}}), - /*preserves_order=*/true)))); + /*preserves_order=*/true)), + /*grouping_sets=*/{})); EXPECT_EQ( "AggregateOp(\n" @@ -1703,5 +1711,259 @@ TEST(EvalAggTest, ArrayAggWithLimitDeterministic) { EXPECT_TRUE(context.IsDeterministicOutput()); } 
+TEST(EvalAggTest, GroupingSetTest) {
+  TypeFactory type_factory;
+
+  // The following code builds an AggregateOp for the query
+  // "SELECT key, value, COUNT(*) FROM KeyValue GROUP BY GROUPING SETS(key,
+  // value)"
+  // Build group-by keys
+  VariableId col_key("col_key"), col_value("col_value"), key("key"),
+      value("value"), agg_count("count");
+  ZETASQL_ASSERT_OK_AND_ASSIGN(auto deref_col_key,
+                       DerefExpr::Create(col_key, types::Int64Type()));
+  ZETASQL_ASSERT_OK_AND_ASSIGN(auto deref_col_value,
+                       DerefExpr::Create(col_value, types::StringType()));
+  std::vector<std::unique_ptr<KeyArg>> keys;
+  keys.push_back(std::make_unique<KeyArg>(key, std::move(deref_col_key)));
+  keys.push_back(std::make_unique<KeyArg>(value, std::move(deref_col_value)));
+
+  // Build aggregator count(*)
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      auto agg,
+      AggregateArg::Create(agg_count,
+                           std::make_unique<BuiltinAggregateFunction>(
+                               FunctionKind::kCount, Int64Type(),
+                               /*num_input_fields=*/0, EmptyStructType(),
+                               /*ignores_null=*/true),
+                           {}));
+  std::vector<std::unique_ptr<AggregateArg>> aggregators;
+  aggregators.push_back(std::move(agg));
+
+  // Build the AggregateOp.
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      auto aggregate_op,
+      AggregateOp::Create(std::move(keys), std::move(aggregators),
+                          absl::WrapUnique(new TestRelationalOp(
+                              {col_key, col_value},
+                              CreateTestTupleDatas({{Int64(1), String("a")},
+                                                    {Int64(2), String("b")},
+                                                    {Int64(1), String("a")},
+                                                    {Int64(2), NullString()}}),
+                              /*preserves_order=*/true)),
+                          /*grouping_sets=*/{1, 2}));
+
+  // Validate the AggregateOp debug string.
+  EXPECT_EQ(
+      "AggregateOp(\n"
+      "+-keys: {\n"
+      "| +-$key := $col_key,\n"
+      "| +-$value := $col_value},\n"
+      "+-aggregators: {\n"
+      "| +-$count := Count()},\n"
+      "+-input: TestRelationalOp\n"
+      "+-grouping_sets: [0x1,0x2])",
+      aggregate_op->DebugString());
+
+  // TODO: Abstract the following execution code to a method and
+  // remove redundancy in all tests.
+ // Build the output schema + std::unique_ptr output_schema = + aggregate_op->CreateOutputSchema(); + EXPECT_THAT(output_schema->variables(), ElementsAre(key, value, agg_count)); + + EvaluationContext context((EvaluationOptions())); + + // Return result as array of structs. + auto struct_type = MakeStructType( + {{"key", Int64Type()}, {"value", StringType()}, {"count", Int64Type()}}); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto deref_key, DerefExpr::Create(key, Int64Type())); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto deref_value, + DerefExpr::Create(value, StringType())); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto deref_count, + DerefExpr::Create(agg_count, Int64Type())); + + // Build the result struct expression + std::vector> args_for_struct; + args_for_struct.push_back(std::make_unique(std::move(deref_key))); + args_for_struct.push_back(std::make_unique(std::move(deref_value))); + args_for_struct.push_back(std::make_unique(std::move(deref_count))); + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto struct_expr, + NewStructExpr::Create(struct_type, std::move(args_for_struct))); + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto nest_op, + ArrayNestExpr::Create(MakeArrayType(struct_type), std::move(struct_expr), + std::move(aggregate_op), + /*is_with_table=*/false)); + ZETASQL_ASSERT_OK(nest_op->SetSchemasForEvaluation(EmptyParamsSchemas())); + + // Use the reference implementation to evaluate the result. 
+ TupleSlot slot; + absl::Status status; + ASSERT_TRUE(nest_op->EvalSimple(EmptyParams(), &context, &slot, &status)) + << status; + const Value& reference = slot.value(); + + auto expected = StructArray({"key", "value", "count"}, + { + {NullInt64(), NullString(), Int64(1)}, + {NullInt64(), String("a"), Int64(2)}, + {NullInt64(), String("b"), Int64(1)}, + {Int64(1), NullString(), Int64(2)}, + {Int64(2), NullString(), Int64(2)}, + }); + EXPECT_THAT(reference, EqualsValue(expected)); +} + +TEST(EvalAggTest, GroupingFunctionTest) { + TypeFactory type_factory; + + // The following code builds an AggregateOp for the query + // "SELECT key, value, COUNT(*), GROUPING(key), GROUPING(value) + // FROM KeyValue GROUP BY GROUPING SETS(key, value)" + // Build group-by keys + VariableId col_key("col_key"), col_value("col_value"), key("key"), + value("value"), agg_count("count"), agg_grouping_key("grouping_key"), + agg_grouping_value("grouping_value"); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto deref_col_key, + DerefExpr::Create(col_key, types::Int64Type())); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto deref_col_value, + DerefExpr::Create(col_value, types::StringType())); + std::vector> keys; + keys.push_back(std::make_unique(key, std::move(deref_col_key))); + keys.push_back(std::make_unique(value, std::move(deref_col_value))); + + // Build aggregator count(*) + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto agg1, + AggregateArg::Create(agg_count, + std::make_unique( + FunctionKind::kCount, Int64Type(), + /*num_input_fields=*/0, EmptyStructType()), + {})); + + // Build aggregator grouping(key) + ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr grouping_key_arg, + ConstExpr::Create(Value::Int64(0))); + std::vector> grouping_key_args; + grouping_key_args.push_back(std::move(grouping_key_arg)); + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto agg2, + AggregateArg::Create(agg_grouping_key, + std::make_unique( + FunctionKind::kGrouping, Int64Type(), + /*num_input_fields=*/1, Int64Type()), + std::move(grouping_key_args))); + 
+  // Build aggregator grouping(value)
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr<ValueExpr> grouping_value_arg,
+                       ConstExpr::Create(Value::Int64(1)));
+  std::vector<std::unique_ptr<ValueExpr>> grouping_value_args;
+  grouping_value_args.push_back(std::move(grouping_value_arg));
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      auto agg3,
+      AggregateArg::Create(agg_grouping_value,
+                           std::make_unique<BuiltinAggregateFunction>(
+                               FunctionKind::kGrouping, Int64Type(),
+                               /*num_input_fields=*/1, Int64Type()),
+                           std::move(grouping_value_args)));
+  std::vector<std::unique_ptr<AggregateArg>> aggregators;
+  aggregators.push_back(std::move(agg1));
+  aggregators.push_back(std::move(agg2));
+  aggregators.push_back(std::move(agg3));
+
+  // Build the AggregateOp.
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      auto aggregate_op,
+      AggregateOp::Create(std::move(keys), std::move(aggregators),
+                          absl::WrapUnique(new TestRelationalOp(
+                              {col_key, col_value},
+                              CreateTestTupleDatas({{Int64(1), String("a")},
+                                                    {Int64(2), String("b")},
+                                                    {Int64(1), String("a")},
+                                                    {Int64(2), NullString()}}),
+                              /*preserves_order=*/true)),
+                          /*grouping_sets=*/{1, 2}));
+
+  // Validate the AggregateOp debug string.
+  EXPECT_EQ(
+      "AggregateOp(\n"
+      "+-keys: {\n"
+      "| +-$key := $col_key,\n"
+      "| +-$value := $col_value},\n"
+      "+-aggregators: {\n"
+      "| +-$count := Count(),\n"
+      "| +-$grouping_key := Grouping(ConstExpr(0)),\n"
+      "| +-$grouping_value := Grouping(ConstExpr(1))},\n"
+      "+-input: TestRelationalOp\n"
+      "+-grouping_sets: [0x1,0x2])",
+      aggregate_op->DebugString());
+
+  // TODO: Abstract the following execution code to a method and
+  // remove redundancy in all tests.
+  // Build the output schema
+  std::unique_ptr<TupleSchema> output_schema =
+      aggregate_op->CreateOutputSchema();
+  EXPECT_THAT(
+      output_schema->variables(),
+      ElementsAre(key, value, agg_count, agg_grouping_key, agg_grouping_value));
+
+  EvaluationContext context((EvaluationOptions()));
+
+  // Return result as array of structs.
+ auto struct_type = MakeStructType({{"key", Int64Type()}, + {"value", StringType()}, + {"count", Int64Type()}, + {"grouping_key", Int64Type()}, + {"grouping_value", Int64Type()}}); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto deref_key, DerefExpr::Create(key, Int64Type())); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto deref_value, + DerefExpr::Create(value, StringType())); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto deref_count, + DerefExpr::Create(agg_count, Int64Type())); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto deref_grouping_key, + DerefExpr::Create(agg_grouping_key, Int64Type())); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto deref_grouping_value, + DerefExpr::Create(agg_grouping_value, Int64Type())); + + // Build the result struct expression + std::vector> args_for_struct; + args_for_struct.push_back(std::make_unique(std::move(deref_key))); + args_for_struct.push_back(std::make_unique(std::move(deref_value))); + args_for_struct.push_back(std::make_unique(std::move(deref_count))); + args_for_struct.push_back( + std::make_unique(std::move(deref_grouping_key))); + args_for_struct.push_back( + std::make_unique(std::move(deref_grouping_value))); + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto struct_expr, + NewStructExpr::Create(struct_type, std::move(args_for_struct))); + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto nest_op, + ArrayNestExpr::Create(MakeArrayType(struct_type), std::move(struct_expr), + std::move(aggregate_op), + /*is_with_table=*/false)); + ZETASQL_ASSERT_OK(nest_op->SetSchemasForEvaluation(EmptyParamsSchemas())); + + // Use the reference implementation to evaluate the result. 
+ TupleSlot slot; + absl::Status status; + ASSERT_TRUE(nest_op->EvalSimple(EmptyParams(), &context, &slot, &status)) + << status; + const Value& reference = slot.value(); + + auto expected = + StructArray({"key", "value", "count", "grouping_key", "grouping_value"}, + { + {NullInt64(), NullString(), Int64(1), Int64(1), Int64(0)}, + {NullInt64(), String("a"), Int64(2), Int64(1), Int64(0)}, + {NullInt64(), String("b"), Int64(1), Int64(1), Int64(0)}, + {Int64(1), NullString(), Int64(2), Int64(0), Int64(1)}, + {Int64(2), NullString(), Int64(2), Int64(0), Int64(1)}, + }); + EXPECT_THAT(reference, EqualsValue(expected)); +} + } // namespace } // namespace zetasql diff --git a/zetasql/reference_impl/algebrizer.cc b/zetasql/reference_impl/algebrizer.cc index 1712b92ca..a3dd2da64 100644 --- a/zetasql/reference_impl/algebrizer.cc +++ b/zetasql/reference_impl/algebrizer.cc @@ -17,7 +17,9 @@ #include "zetasql/reference_impl/algebrizer.h" #include +#include #include +#include #include #include #include @@ -32,7 +34,6 @@ #include #include "zetasql/base/logging.h" -#include "google/protobuf/descriptor.h" #include "zetasql/analyzer/expr_resolver_helper.h" #include "zetasql/common/aggregate_null_handling.h" #include "zetasql/common/thread_stack.h" @@ -40,15 +41,19 @@ #include "zetasql/public/builtin_function.pb.h" #include "zetasql/public/cast.h" #include "zetasql/public/catalog.h" +#include "zetasql/public/collator.h" #include "zetasql/public/function.h" #include "zetasql/public/function_signature.h" +#include "zetasql/public/functions/array_zip_mode.pb.h" #include "zetasql/public/functions/differential_privacy.pb.h" #include "zetasql/public/functions/json.h" #include "zetasql/public/id_string.h" #include "zetasql/public/language_options.h" +#include "zetasql/public/numeric_value.h" #include "zetasql/public/proto_util.h" #include "zetasql/public/simple_catalog.h" #include "zetasql/public/sql_function.h" +#include "zetasql/public/table_valued_function.h" #include 
"zetasql/public/templated_sql_function.h" #include "zetasql/public/type.h" #include "zetasql/public/types/type_factory.h" @@ -69,10 +74,13 @@ #include "zetasql/resolved_ast/resolved_ast_visitor.h" #include "zetasql/resolved_ast/resolved_collation.h" #include "zetasql/resolved_ast/resolved_column.h" +#include "zetasql/resolved_ast/resolved_node.h" #include "zetasql/resolved_ast/resolved_node_kind.pb.h" #include "zetasql/resolved_ast/serialization.pb.h" -#include "absl/cleanup/cleanup.h" -#include "absl/memory/memory.h" +#include "absl/algorithm/container.h" +#include "absl/container/flat_hash_map.h" +#include "absl/container/flat_hash_set.h" +#include "zetasql/base/check.h" #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/ascii.h" @@ -83,8 +91,8 @@ #include "absl/strings/string_view.h" #include "absl/strings/substitute.h" #include "absl/types/span.h" +#include "google/protobuf/descriptor.h" #include "zetasql/base/map_util.h" -#include "zetasql/base/source_location.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status_builder.h" #include "zetasql/base/status_macros.h" @@ -93,6 +101,11 @@ using zetasql::types::BoolType; namespace zetasql { +#define RETURN_ERROR_IF_OUT_OF_STACK_SPACE() \ + ZETASQL_RETURN_IF_NOT_ENOUGH_STACK( \ + "Out of stack space due to deeply nested query expression during query " \ + "algebrization") + namespace { constexpr FunctionSignatureId kCollationSupportedAnalyticFunctions[] = { @@ -146,6 +159,87 @@ bool IsAnalyticFunctionCollationSupported(const FunctionSignature& signature) { std::end(kCollationSupportedAnalyticFunctions); } +// Converts ResolvedGroupingSet to an int64_t grouping set id. 
+absl::StatusOr<int64_t> ConvertGroupingSetToGroupingSetId(
+    const ResolvedGroupingSet* node,
+    const absl::flat_hash_map<std::string, int>& key_to_index_map) {
+  int64_t grouping_set = 0;
+  for (const std::unique_ptr<const ResolvedColumnRef>& column_ref :
+       node->group_by_column_list()) {
+    const int* key_index =
+        zetasql_base::FindOrNull(key_to_index_map, column_ref->column().name());
+    ZETASQL_RET_CHECK_NE(key_index, nullptr);
+    grouping_set |= (1ll << *key_index);
+  }
+  return grouping_set;
+}
+
+// Converts ResolvedCube to a list of int64_t grouping set ids.
+absl::StatusOr<std::vector<int64_t>> ConvertCubeToGroupingSetIds(
+    const ResolvedCube* cube,
+    const absl::flat_hash_map<std::string, int>& key_to_index_map) {
+  std::vector<int64_t> grouping_sets;
+
+  int cube_size = cube->cube_column_list_size();
+  // This is a hard limit to avoid bitset overflow; the actual maximum number
+  // of grouping sets allowed is specified by AggregateOp::kMaxGroupingSets.
+  if (cube_size > 31) {
+    return absl::InvalidArgumentError(
+        "Cube cannot have more than 31 elements");
+  }
+  // CUBE with n columns generates 2^n grouping sets.
+  uint32_t expanded_grouping_set_size = (1u << cube_size);
+  grouping_sets.reserve(expanded_grouping_set_size);
+  // Though expanded_grouping_set_size is a uint32_t, the check above
+  // guarantees it is at most 2^31, so expanded_grouping_set_size - 1 still
+  // fits in an int.
+ for (int i = expanded_grouping_set_size - 1; i >= 0; --i) { + int64_t current_grouping_set = 0; + std::bitset<32> grouping_set_bit(i); + for (int column_index = 0; column_index < cube_size; ++column_index) { + if (!grouping_set_bit.test(column_index)) { + continue; + } + for (const std::unique_ptr& column_ref : + cube->cube_column_list(column_index)->column_list()) { + const int* key_index = + zetasql_base::FindOrNull(key_to_index_map, column_ref->column().name()); + ZETASQL_RET_CHECK_NE(key_index, nullptr); + current_grouping_set |= (1ll << *key_index); + } + } + grouping_sets.push_back(current_grouping_set); + } + return grouping_sets; +} + +// Converts ResolvedRollup to a list of int64_t grouping set ids. +absl::StatusOr> ConvertRollupToGroupingSetIds( + const ResolvedRollup* rollup, + const absl::flat_hash_map& key_to_index_map) { + std::vector grouping_sets; + // ROLLUP with n columns generates n+1 grouping sets + grouping_sets.reserve(rollup->rollup_column_list_size() + 1); + // Add the empty grouping set to the grouping sets list. + grouping_sets.push_back(0); + int64_t current_grouping_set = 0; + for (const std::unique_ptr& + multi_column : rollup->rollup_column_list()) { + for (const std::unique_ptr& column_ref : + multi_column->column_list()) { + const int* key_index = + zetasql_base::FindOrNull(key_to_index_map, column_ref->column().name()); + ZETASQL_RET_CHECK_NE(key_index, nullptr); + current_grouping_set |= (1ll << *key_index); + } + grouping_sets.push_back(current_grouping_set); + } + // Prefer to output the grouping sets from more to less granular levels of + // subtotals, e.g. (a, b, c), (a, b), (a), and then (). 
+ std::reverse(grouping_sets.begin(), grouping_sets.end()); + return grouping_sets; +} + } // namespace Algebrizer::Algebrizer(const LanguageOptions& language_options, @@ -280,7 +374,7 @@ constexpr absl::string_view kCollatedFunctionNamePostfix = "_with_collation"; absl::Status GetCollatedFunctionNameAndArguments( absl::string_view function_name, std::vector> arguments, - const std::vector& collation_list, + absl::Span collation_list, const LanguageOptions& language_options, std::string* collated_function_name, std::vector>* collated_arguments) { @@ -319,7 +413,12 @@ absl::Status GetCollatedFunctionNameAndArguments( function_name == "strpos" || function_name == "instr" || function_name == "starts_with" || function_name == "ends_with" || function_name == "$like" || function_name == "$like_any" || - function_name == "$like_all") { + function_name == "$not_like_any" || function_name == "$like_all" || + function_name == "$not_like_all" || + function_name == "$like_any_array" || + function_name == "$not_like_any_array" || + function_name == "$like_all_array" || + function_name == "$not_like_all_array") { // For string functions whose collated version take an extra collator // argument, we insert the as a String argument at the // beginning of the vector. We append the postfix to @@ -439,17 +538,21 @@ absl::StatusOr> Algebrizer::AlgebrizeFunctionCall( } // User-defined functions. if (!function_call->function()->IsZetaSQLBuiltin()) { - FunctionEvaluator evaluator; + ContextAwareFunctionEvaluator evaluator; auto callback = function_call->function()->GetFunctionEvaluatorFactory(); if (callback != nullptr) { // An evaluator is already defined for this function. 
       auto status_or_evaluator = callback(function_call->signature());
       ZETASQL_RETURN_IF_ERROR(status_or_evaluator.status());
-      evaluator = status_or_evaluator.value();
-      if (evaluator == nullptr) {
+      if (status_or_evaluator.value() == nullptr) {
         return ::zetasql_base::InternalErrorBuilder()
                << "NULL evaluator returned for user-defined function " << name;
       }
+      evaluator = [evaluator = status_or_evaluator.value()](
+                      const absl::Span<const Value> arguments,
+                      EvaluationContext& context) {
+        return evaluator(arguments);
+      };
     } else {
       // Extract the function body ResolvedAST and argument names
       const ResolvedExpr* expr;
@@ -491,19 +594,23 @@ absl::StatusOr<std::unique_ptr<ValueExpr>> Algebrizer::AlgebrizeFunctionCall(
       const LanguageOptions& language_options = this->language_options_;
       const AlgebrizerOptions& algebrizer_options = this->algebrizer_options_;
       TypeFactory* type_factory = this->type_factory_;
-      FunctionEvaluator udf_evaluator(
+      ContextAwareFunctionEvaluator udf_evaluator(
          [evaluator_state =
               MakeUdfEvaluator(expr, argument_names, language_options,
                                algebrizer_options, type_factory)](
-              const absl::Span<const Value> args) -> absl::StatusOr<Value> {
+              const absl::Span<const Value> args,
+              EvaluationContext& context) -> absl::StatusOr<Value> {
            ZETASQL_ASSIGN_OR_RETURN(EvaluatorState state, evaluator_state);
-           // Create an empty EvaluationContext and populate the map with
-           // argument pairs.
-           EvaluationContext context((EvaluationOptions()));
+           // Create a local EvaluationContext for evaluating the function
+           // body. It is important that the local context uses the same
+           // EvaluationOptions as the outer context, since it contains
+           // settings that enable some non-determinism checks.
+ std::unique_ptr local_context = + context.MakeChildContext(); ZETASQL_RET_CHECK(state.argument_names.size() == args.size()); if (!args.empty()) { for (int i = 0; i < state.argument_names.size(); ++i) { - ZETASQL_RETURN_IF_ERROR(context.AddFunctionArgumentRef( + ZETASQL_RETURN_IF_ERROR(local_context->AddFunctionArgumentRef( state.argument_names[i], args[i])); } } @@ -514,11 +621,12 @@ absl::StatusOr> Algebrizer::AlgebrizeFunctionCall( absl::Status status; ZETASQL_RETURN_IF_ERROR(state.algebrized_tree->SetSchemasForEvaluation({})); if (!state.algebrized_tree->EvalSimple( - /*params=*/{}, &context, &result, &status)) { + /*params=*/{}, local_context.get(), &result, &status)) { ZETASQL_RET_CHECK(!status.ok()); return status; } ZETASQL_RET_CHECK(status.ok()); + ZETASQL_RETURN_IF_ERROR(local_context->VerifyNotAborted()); return result.value(); }); evaluator = udf_evaluator; @@ -1028,7 +1136,7 @@ absl::StatusOr> Algebrizer::AlgebrizeUdaCall( ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr agg_op, AggregateOp::Create( /*keys=*/{}, std::move(algebrized_aggregate_exprs), - std::move(input))); + std::move(input), /*grouping_sets=*/{})); // Construct a compute op which augments the output of AggregateOp with an // additional entry representing the algebrized UDA function expression. 
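The hunk above migrates user-defined function evaluation from the argument-only `FunctionEvaluator` signature to a context-aware one by capturing the legacy callback in a lambda that accepts, and simply ignores, the extra context parameter. The adapter shape can be sketched standalone; the types below (`EvalContext`, `PlainEvaluator`, `Adapt`) are illustrative stand-ins, not the real ZetaSQL APIs:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an evaluation context threaded through calls.
struct EvalContext {
  bool deterministic = true;
};

// The legacy signature sees only the arguments; the new one also receives
// the context, so callers can share options and non-determinism tracking.
using PlainEvaluator =
    std::function<std::string(const std::vector<std::string>&)>;
using ContextAwareEvaluator = std::function<std::string(
    const std::vector<std::string>&, EvalContext&)>;

// Wraps a legacy evaluator in a context-aware lambda, mirroring how the
// diff wraps the user-supplied callback: capture by value, ignore context.
ContextAwareEvaluator Adapt(PlainEvaluator legacy) {
  return [legacy = std::move(legacy)](const std::vector<std::string>& args,
                                      EvalContext& /*context*/) {
    return legacy(args);
  };
}
```

This keeps existing user callbacks working unchanged while letting new code (like the UDF body evaluator in the diff, which builds a child context) actually use the context parameter.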
@@ -1574,8 +1682,9 @@ absl::StatusOr> Algebrizer::AlgebrizeFlattenedArg( absl::StatusOr> Algebrizer::AlgebrizeGetProtoFieldOfPath( const ResolvedExpr* column_or_param_expr, - const std::vector>& path) { + absl::Span> + path) { SharedProtoFieldPath column_and_field_path; switch (column_or_param_expr->node_kind()) { case RESOLVED_COLUMN_REF: @@ -1820,9 +1929,10 @@ Algebrizer::AlgebrizeInLikeAnyLikeAllRelation( std::vector> agg_args; agg_args.push_back(std::move(agg_arg)); - ZETASQL_ASSIGN_OR_RETURN(auto agg_rel, AggregateOp::Create( - /*keys=*/{}, std::move(agg_args), - std::move(haystack_rel))); + ZETASQL_ASSIGN_OR_RETURN(auto agg_rel, + AggregateOp::Create( + /*keys=*/{}, std::move(agg_args), + std::move(haystack_rel), /*grouping_sets=*/{})); // Create a scalar expression from the aggregate. ZETASQL_ASSIGN_OR_RETURN(auto deref_matches_var, @@ -1842,7 +1952,7 @@ absl::StatusOr> Algebrizer::AlgebrizeScalarArrayFunctionWithCollation( FunctionKind kind, const Type* output_type, absl::string_view function_name, std::vector> converted_arguments, - const std::vector& collation_list) { + absl::Span collation_list) { ZETASQL_ASSIGN_OR_RETURN(CollatorList collator_list, MakeCollatorList(collation_list)); @@ -2789,8 +2899,8 @@ absl::StatusOr> Algebrizer::AlgebrizeArrayScan( const JoinOp::JoinKind join_kind = array_scan->is_outer() ? 
JoinOp::kOuterApply : JoinOp::kCrossApply; - std::vector right_output_columns; - right_output_columns.push_back(array_scan->element_column()); + std::vector right_output_columns = + array_scan->element_column_list(); if (array_scan->array_offset_column() != nullptr) { right_output_columns.push_back( array_scan->array_offset_column()->column()); @@ -2822,12 +2932,15 @@ absl::StatusOr> Algebrizer::AlgebrizeArrayScanWithoutJoin( const ResolvedArrayScan* array_scan, std::vector* active_conjuncts) { - ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr array, - AlgebrizeExpression(array_scan->array_expr())); - - const VariableId array_element_in = - column_to_variable_->GetVariableNameFromColumn( - array_scan->element_column()); + int element_column_count = array_scan->array_expr_list_size(); + std::vector element_list(element_column_count); + std::vector> array_list(element_column_count); + for (int i = 0; i < element_column_count; ++i) { + ZETASQL_ASSIGN_OR_RETURN(array_list[i], + AlgebrizeExpression(array_scan->array_expr_list(i))); + element_list[i] = column_to_variable_->GetVariableNameFromColumn( + array_scan->element_column_list(i)); + } VariableId array_position_in; if (array_scan->array_offset_column() != nullptr) { @@ -2835,9 +2948,21 @@ Algebrizer::AlgebrizeArrayScanWithoutJoin( array_scan->array_offset_column()->column()); } - ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr rel_op, - ArrayScanOp::Create(array_element_in, array_position_in, - /*fields=*/{}, std::move(array))); + // ARRAY_ZIP_MODE `mode` argument, which defaults to "PAD" if unspecified. 
+ std::unique_ptr mode; + if (array_scan->array_zip_mode() != nullptr) { + ZETASQL_ASSIGN_OR_RETURN(mode, AlgebrizeExpression(array_scan->array_zip_mode())); + ZETASQL_RET_CHECK(mode->output_type()->IsEnum()); + } else { + ZETASQL_ASSIGN_OR_RETURN( + mode, ConstExpr::Create(Value::Enum(types::ArrayZipModeEnumType(), + functions::ArrayZipEnums::PAD))); + } + + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr rel_op, + ArrayScanOp::Create(std::move(element_list), array_position_in, + std::move(array_list), std::move(mode))); return MaybeApplyFilterConjuncts(std::move(rel_op), active_conjuncts); } @@ -3874,22 +3999,62 @@ Algebrizer::AlgebrizeGroupSelectionThresholdExpression( return algebrized_conjuncts; } +absl::StatusOr> Algebrizer::AlgebrizeGroupingCall( + const ResolvedGroupingCall* grouping_call, + std::optional anonymization_options, + absl::flat_hash_set& output_columns, + absl::flat_hash_map& key_to_index_map) { + const ResolvedColumnRef* grouping_call_expr = + grouping_call->group_by_column(); + const ResolvedColumn& column = grouping_call->output_column(); + if (!anonymization_options.has_value()) { + // Sanity check that all aggregate functions appear in the output column + // list, so that we don't accidentally return an aggregate function that + // the analyzer pruned from the scan. (If it did that, it should have + // pruned the aggregate function as well.) + ZETASQL_RET_CHECK(output_columns.contains(column)) << column.DebugString(); + } + + // Add the GROUPING function to the output. 
+  const VariableId agg_variable_name =
+      column_to_variable_->AssignNewVariableToColumn(column);
+
+  const std::string arg_name = grouping_call_expr->column().name();
+  const int* key_index = zetasql_base::FindOrNull(key_to_index_map, arg_name);
+  ZETASQL_RET_CHECK_NE(key_index, nullptr) << "GROUPING function argument " << arg_name
+                                   << " not found in key_to_index_map";
+
+  // The GROUPING function differs from other aggregate functions: its output
+  // is computed from whether its argument is part of the current grouping
+  // set, not by aggregating the argument's actual value. Here we simplify
+  // its argument to the key index, so at runtime we can quickly check
+  // whether the argument is included in the current grouping set with a bit
+  // operation on the grouping set id.
+  ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<ValueExpr> argument_expr,
+                   ConstExpr::Create(Value::Int64(*key_index)));
+  std::vector<std::unique_ptr<ValueExpr>> args;
+  args.push_back(std::move(argument_expr));
+
+  std::unique_ptr<AggregateFunctionBody> function_body =
+      std::make_unique<BuiltinAggregateFunction>(
+          FunctionKind::kGrouping, type_factory_->get_int64(),
+          /*num_input_fields=*/1, type_factory_->get_int64());
+  return AggregateArg::Create(agg_variable_name, std::move(function_body),
+                              std::move(args));
+}
+
 absl::StatusOr<std::unique_ptr<RelationalOp>>
 Algebrizer::AlgebrizeAggregateScanBase(
     const ResolvedAggregateScanBase* aggregate_scan,
     std::optional<AnonymizationOptions> anonymization_options) {
-  if (aggregate_scan->grouping_set_list_size() != 0) {
-    return absl::UnimplementedError(
-        "GROUPING SETS, CUBE, and ROLLUP are not implemented.");
-  }
-  if (aggregate_scan->grouping_call_list_size() != 0) {
-    return absl::UnimplementedError("GROUPING function is not implemented.");
-  }
   // Algebrize the relational input of the aggregate.
   ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<RelationalOp> input,
                    AlgebrizeScan(aggregate_scan->input_scan()));
   // Build the list of grouping keys.
   std::vector<std::unique_ptr<KeyArg>> keys;
+  // The map from the key name to key index. The key index is used to generate
+  // grouping set ids.
+ absl::flat_hash_map key_to_index_map; ZETASQL_RET_CHECK(aggregate_scan->collation_list().empty() || aggregate_scan->collation_list().size() == aggregate_scan->group_by_list_size()) @@ -3900,6 +4065,7 @@ Algebrizer::AlgebrizeAggregateScanBase( AlgebrizeExpression(key_expr->expr())); const VariableId key_variable_name = column_to_variable_->AssignNewVariableToColumn(key_expr->column()); + key_to_index_map[key_expr->column().name()] = i; keys.push_back(std::make_unique(key_variable_name, std::move(key))); if (!aggregate_scan->collation_list().empty() && !aggregate_scan->collation_list(i).Empty()) { @@ -3920,9 +4086,11 @@ Algebrizer::AlgebrizeAggregateScanBase( // Build the list of aggregate functions. std::vector> aggregators; - for (const std::unique_ptr& agg_expr : + for (const std::unique_ptr& agg_expr : aggregate_scan->aggregate_list()) { - const ResolvedColumn& column = agg_expr->column(); + ZETASQL_RET_CHECK(agg_expr->Is()); + const ResolvedColumn& column = + agg_expr->GetAs()->column(); if (!anonymization_options.has_value()) { // Sanity check that all aggregate functions appear in the output column // list, so that we don't accidentally return an aggregate function that @@ -3936,12 +4104,87 @@ Algebrizer::AlgebrizeAggregateScanBase( column_to_variable_->AssignNewVariableToColumn(column); ZETASQL_ASSIGN_OR_RETURN( std::unique_ptr agg, - AlgebrizeAggregateFn(agg_variable_name, anonymization_options, - /*filter=*/nullptr, agg_expr->expr())); + AlgebrizeAggregateFn( + agg_variable_name, anonymization_options, + /*filter=*/nullptr, + agg_expr->GetAs()->expr())); + aggregators.push_back(std::move(agg)); + } + + // Algebrize the list of GROUPING function calls. 
+  for (const std::unique_ptr<const ResolvedGroupingCall>& grouping_call :
+       aggregate_scan->grouping_call_list()) {
+    ZETASQL_ASSIGN_OR_RETURN(
+        std::unique_ptr<AggregateArg> agg,
+        AlgebrizeGroupingCall(grouping_call.get(), anonymization_options,
+                              output_columns, key_to_index_map));
     aggregators.push_back(std::move(agg));
   }
+
+  std::vector<int64_t> grouping_sets;
+  if (!aggregate_scan->grouping_set_list().empty()) {
+    // Before GROUPING SETS was supported, rollup columns were stored in
+    // rollup_column_list and expanded into grouping_set_list by default. So
+    // this is just a sanity check that the grouping set count matches the
+    // number of rollup columns; we still extract grouping sets from the
+    // grouping_set_list.
+    if (!aggregate_scan->rollup_column_list().empty()) {
+      ZETASQL_RET_CHECK_EQ(aggregate_scan->grouping_set_list_size(),
+                   aggregate_scan->rollup_column_list_size() + 1);
+      // Mark columns accessed in rollup_column_list.
+      absl::c_for_each(
+          aggregate_scan->rollup_column_list(),
+          [](const std::unique_ptr<const ResolvedColumnRef>& column_ref) {
+            column_ref->MarkFieldsAccessed();
+          });
+    }
+
+    // Make sure we can use an int64_t to represent the grouping set id.
+    if (keys.size() > AggregateOp::kMaxColumnsInGroupingSet) {
+      return absl::InvalidArgumentError(absl::StrFormat(
+          "Too many columns in grouping sets, at most %d columns are allowed",
+          AggregateOp::kMaxColumnsInGroupingSet));
+    }
+    for (const std::unique_ptr<const ResolvedGroupingSetBase>&
+             grouping_set_base_node : aggregate_scan->grouping_set_list()) {
+      ZETASQL_RET_CHECK(grouping_set_base_node->Is<ResolvedGroupingSet>() ||
+                grouping_set_base_node->Is<ResolvedRollup>() ||
+                grouping_set_base_node->Is<ResolvedCube>());
+      if (grouping_set_base_node->Is<ResolvedGroupingSet>()) {
+        ZETASQL_ASSIGN_OR_RETURN(
+            int64_t grouping_set,
+            ConvertGroupingSetToGroupingSetId(
+                grouping_set_base_node->GetAs<ResolvedGroupingSet>(),
+                key_to_index_map));
+        grouping_sets.push_back(grouping_set);
+      } else if (grouping_set_base_node->Is<ResolvedRollup>()) {
+        ZETASQL_ASSIGN_OR_RETURN(std::vector<int64_t> rollup_grouping_set_ids,
+                         ConvertRollupToGroupingSetIds(
+                             grouping_set_base_node->GetAs<ResolvedRollup>(),
+                             key_to_index_map));
+        grouping_sets.insert(grouping_sets.end(),
+                             rollup_grouping_set_ids.begin(),
+                             rollup_grouping_set_ids.end());
+      } else if (grouping_set_base_node->Is<ResolvedCube>()) {
+        ZETASQL_ASSIGN_OR_RETURN(std::vector<int64_t> cube_grouping_set_ids,
+                         ConvertCubeToGroupingSetIds(
+                             grouping_set_base_node->GetAs<ResolvedCube>(),
+                             key_to_index_map));
+        grouping_sets.insert(grouping_sets.end(), cube_grouping_set_ids.begin(),
+                             cube_grouping_set_ids.end());
+      }
+      // Sanity check the number of grouping sets.
+      if (grouping_sets.size() > AggregateOp::kMaxGroupingSets) {
+        return absl::InvalidArgumentError(absl::StrFormat(
+            "Too many grouping sets, at most %d grouping sets are allowed",
+            AggregateOp::kMaxGroupingSets));
+      }
+    }
+  }
+  absl::c_sort(grouping_sets);
+
   return AggregateOp::Create(std::move(keys), std::move(aggregators),
-                             std::move(input));
+                             std::move(input), grouping_sets);
 }
 
 namespace {
@@ -4232,8 +4475,7 @@ Algebrizer::MaybeCreateSortForAnalyticOperator(
 absl::Status Algebrizer::AlgebrizeOrderByItems(
     bool drop_correlated_columns, bool create_new_ids,
-    const std::vector<std::unique_ptr<const ResolvedOrderByItem>>&
-        order_by_items,
+    absl::Span<const std::unique_ptr<const ResolvedOrderByItem>> order_by_items,
     absl::flat_hash_map<int, VariableId>* column_to_id_map,
     std::vector<std::unique_ptr<KeyArg>>* order_by_keys) {
   for (const std::unique_ptr<const ResolvedOrderByItem>& order_by_item :
@@ -4666,9 +4908,10 @@ absl::StatusOr<std::unique_ptr<RelationalOp>> Algebrizer::AlgebrizeUnionScan(
       keys.back()->set_collation(std::move(collation_expr));
     }
   }
-  ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<AggregateOp> aggr_op,
-                   AggregateOp::Create(std::move(keys), {} /* aggregators */,
-                                       std::move(union_op)));
+  ZETASQL_ASSIGN_OR_RETURN(
+      std::unique_ptr<AggregateOp> aggr_op,
+      AggregateOp::Create(std::move(keys), /*aggregators=*/{},
+                          std::move(union_op), /*grouping_sets=*/{}));
   return aggr_op;
 }
 
@@ -4797,9 +5040,9 @@ Algebrizer::AlgebrizeExceptIntersectScan(
             std::move(agg_func_args)));
     aggregators.push_back(std::move(agg_arg));
   }
-  ZETASQL_ASSIGN_OR_RETURN(auto query_t,
-                   AggregateOp::Create(std::move(keys), std::move(aggregators),
-                                       std::move(query_u)));
+  ZETASQL_ASSIGN_OR_RETURN(auto query_t, AggregateOp::Create(
+                                     std::move(keys), std::move(aggregators),
+                                     std::move(query_u), /*grouping_sets=*/{}));
 
   // Add filter or cross-apply depending on the kind of set operation.
   switch (set_scan->op_type()) {
@@ -4896,6 +5139,7 @@ Algebrizer::AlgebrizeExceptIntersectScan(
 absl::StatusOr<std::unique_ptr<RelationalOp>> Algebrizer::AlgebrizeProjectScan(
     const ResolvedProjectScan* resolved_project,
     std::vector<FilterConjunctInfo*>* active_conjuncts) {
+  RETURN_ERROR_IF_OUT_OF_STACK_SPACE();
   // Determine the new columns and their definitions.
absl::flat_hash_set defined_columns; const std::vector>& expr_list = @@ -5131,9 +5375,10 @@ absl::StatusOr> Algebrizer::AlgebrizePivotScan( aggregators.push_back(std::move(aggregator)); } - ZETASQL_ASSIGN_OR_RETURN(auto agg_op, - AggregateOp::Create(std::move(keys), std::move(aggregators), - std::move(wrapped_input))); + ZETASQL_ASSIGN_OR_RETURN( + auto agg_op, + AggregateOp::Create(std::move(keys), std::move(aggregators), + std::move(wrapped_input), /*grouping_sets=*/{})); return agg_op; } @@ -5258,6 +5503,11 @@ absl::StatusOr> Algebrizer::AlgebrizeScan( rel_op, AlgebrizeGroupRowsScan(scan->GetAs())); break; } + case RESOLVED_TVFSCAN: { + ZETASQL_ASSIGN_OR_RETURN(rel_op, + AlgebrizeTvfScan(scan->GetAs())); + break; + } default: return ::zetasql_base::UnimplementedErrorBuilder() << "Unhandled node type algebrizing a scan: " @@ -5301,7 +5551,7 @@ absl::StatusOr> Algebrizer::AlgebrizeScan( } bool Algebrizer::FindColumnDefinition( - const std::vector>& expr_list, + absl::Span> expr_list, int column_id, const ResolvedExpr** definition) { (*definition) = nullptr; for (int i = 0; i < expr_list.size(); ++i) { @@ -5844,7 +6094,7 @@ absl::Status Algebrizer::PopulateResolvedExprMap( absl::Status Algebrizer::AlgebrizeDefaultAndGeneratedExpressions( const ResolvedTableScan* table_scan, ColumnExprMap* column_expr_map, - const std::vector>& + absl::Span> generated_column_exprs, std::vector topologically_sorted_generated_column_ids) { ZETASQL_RET_CHECK(column_expr_map != nullptr); @@ -6082,6 +6332,12 @@ absl::StatusOr> Algebrizer::FilterDuplicates( absl::StatusOr> Algebrizer::AlgebrizeRecursiveScan( const ResolvedRecursiveScan* recursive_scan) { + // TODO: Implement recursion depth modifier. + if (recursive_scan->recursion_depth_modifier() != nullptr) { + return absl::UnimplementedError( + "WITH DEPTH modifier is not supported yet in reference implementation"); + } + // Algebrize non-recursive term first. 
   ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<RelationalOp> non_recursive_term,
                    AlgebrizeScan(recursive_scan->non_recursive_term()->scan()));
@@ -6411,4 +6667,87 @@ Algebrizer::AlgebrizeGroupRowsScan(
   return std::move(group_rows_op);  // necessary to work around bugs in gcc.
 }
 
+absl::StatusOr<std::unique_ptr<RelationalOp>> Algebrizer::AlgebrizeTvfScan(
+    const ResolvedTVFScan* tvf_scan) {
+  // Algebrize input arguments.
+  std::vector<TVFOp::TVFOpArgument> arguments;
+  for (int i = 0; i < tvf_scan->argument_list().size(); ++i) {
+    const ResolvedFunctionArgument* argument = tvf_scan->argument_list(i);
+    if (argument->expr() != nullptr) {
+      ZETASQL_RET_CHECK(tvf_scan->signature()->argument(i).is_scalar());
+      ZETASQL_ASSIGN_OR_RETURN(auto expr_argument,
+                               AlgebrizeExpression(argument->expr()));
+      arguments.push_back({.value = std::move(expr_argument)});
+      continue;
+    }
+
+    if (argument->scan() != nullptr) {
+      ZETASQL_RET_CHECK(tvf_scan->signature()->argument(i).is_relation());
+      ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<RelationalOp> relation,
+                               AlgebrizeScan(argument->scan()));
+
+      // The TVF implementation needs to access the input table by column
+      // name, so preserve the variable-to-column-name mapping.
+      const TVFRelation& relation_signature =
+          tvf_scan->signature()->argument(i).relation();
+      ZETASQL_RET_CHECK_EQ(argument->argument_column_list_size(),
+                           relation_signature.num_columns());
+      std::vector<TVFOp::TvfInputRelation::TvfInputRelationColumn> columns;
+      for (int j = 0; j < argument->argument_column_list_size(); ++j) {
+        const ResolvedColumn& argument_column =
+            argument->argument_column_list(j);
+        const TVFSchemaColumn& relation_signature_column =
+            relation_signature.column(j);
+
+        ZETASQL_RET_CHECK_EQ(argument_column.type(), relation_signature_column.type);
+        ZETASQL_ASSIGN_OR_RETURN(
+            VariableId input_variable,
+            column_to_variable_->LookupVariableNameForColumn(argument_column));
+        columns.push_back({relation_signature_column.name,
+                           argument_column.type(), input_variable});
+      }
+      arguments.push_back({.relation = TVFOp::TvfInputRelation{
+                               std::move(relation), std::move(columns)}});
+      continue;
+    }
+
+    if (argument->model() != nullptr) {
+      ZETASQL_RET_CHECK(tvf_scan->signature()->argument(i).is_model());
+      arguments.push_back({.model = argument->model()->model()});
+      continue;
+    }
+
+    return ::zetasql_base::UnimplementedErrorBuilder()
+           << "Unimplemented table valued function argument: "
+           << argument->node_kind_string() << ". "
+           << "Only expressions, relations and models are currently supported";
+  }
+
+  // Algebrize output column names and variables.
+  ZETASQL_RET_CHECK_EQ(tvf_scan->column_list_size(),
+                       tvf_scan->column_index_list_size());
+  std::vector<TVFSchemaColumn> output_columns;
+  output_columns.reserve(tvf_scan->column_list().size());
+  std::vector<VariableId> variables;
+  variables.reserve(tvf_scan->column_list().size());
+  for (int i = 0; i < tvf_scan->column_list_size(); ++i) {
+    const ResolvedColumn& column = tvf_scan->column_list(i);
+    const TVFSchemaColumn& signature_column =
+        tvf_scan->signature()->result_schema().column(
+            tvf_scan->column_index_list(i));
+    ZETASQL_RET_CHECK(column.type()->Equals(signature_column.type))
+        << column.type()->DebugString()
+        << " != " << signature_column.type->DebugString();
+    output_columns.push_back(signature_column);
+    variables.push_back(column_to_variable_->GetVariableNameFromColumn(column));
+  }
+
+  ZETASQL_ASSIGN_OR_RETURN(
+      std::unique_ptr<TVFOp> tvf_op,
+      TVFOp::Create(tvf_scan->tvf(), std::move(arguments),
+                    std::move(output_columns), std::move(variables),
+                    tvf_scan->function_call_signature()));
+  return std::move(tvf_op);
+}
+
 }  // namespace zetasql
diff --git a/zetasql/reference_impl/algebrizer.h b/zetasql/reference_impl/algebrizer.h
index 83c918b94..9ebc2f8b3 100644
--- a/zetasql/reference_impl/algebrizer.h
+++ b/zetasql/reference_impl/algebrizer.h
@@ -29,11 +29,12 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
+#include "zetasql/public/catalog.h"
+#include "zetasql/public/id_string.h"
 #include "zetasql/public/language_options.h"
 #include "zetasql/public/type.h"
 #include "zetasql/reference_impl/function.h"
@@ -43,6 +44,7 @@
 #include "zetasql/reference_impl/variable_generator.h"
 #include "zetasql/reference_impl/variable_id.h"
 #include "zetasql/resolved_ast/resolved_ast.h"
+#include "zetasql/resolved_ast/resolved_collation.h"
 #include "zetasql/resolved_ast/resolved_column.h"
 #include "zetasql/resolved_ast/resolved_node_kind.pb.h"
 #include "gtest/gtest_prod.h"
@@ -53,6 +55,7 @@
 #include "absl/strings/str_cat.h"
 #include "absl/strings/string_view.h"
 #include
"absl/types/optional.h" +#include "absl/types/span.h" #include "absl/types/variant.h" namespace zetasql { @@ -194,6 +197,7 @@ class Algebrizer { FRIEND_TEST(AlgebrizerTestGroupingAggregation, GroupByMax); FRIEND_TEST(AlgebrizerTestGroupingAggregation, GroupByMin); FRIEND_TEST(AlgebrizerTestGroupingAggregation, GroupBySum); + FRIEND_TEST(StatementAlgebrizerTest, TVF); Algebrizer(const LanguageOptions& options, const AlgebrizerOptions& algebrizer_options, @@ -237,8 +241,9 @@ class Algebrizer { // ResolvedColumnRef, a ResolvedParameter, or a ResolvedExpressionColumn. absl::StatusOr> AlgebrizeGetProtoFieldOfPath( const ResolvedExpr* column_or_param_expr, - const std::vector>& path); + absl::Span> + path); // Algebrize specific expressions. absl::StatusOr> @@ -303,7 +308,7 @@ class Algebrizer { FunctionKind kind, const Type* output_type, absl::string_view function_name, std::vector> converted_arguments, - const std::vector& collation_list); + absl::Span collation_list); // Algebrizes IN, LIKE ANY, or LIKE ALL when the rhs is a subquery. 
absl::StatusOr> AlgebrizeInLikeAnyLikeAllRelation( @@ -444,6 +449,11 @@ class Algebrizer { absl::StatusOr> AlgebrizeNullFilterForUnpivotScan( const ResolvedUnpivotScan* unpivot_scan, std::unique_ptr input); + absl::StatusOr> AlgebrizeGroupingCall( + const ResolvedGroupingCall* grouping_call, + std::optional anonymization_options, + absl::flat_hash_set& output_columns, + absl::flat_hash_map& key_to_index_map); absl::StatusOr> AlgebrizeAggregateScanBase( const ResolvedAggregateScanBase* aggregate_scan, std::optional anonymization_options); @@ -481,7 +491,8 @@ class Algebrizer { const ResolvedWithRefScan* scan); absl::StatusOr> AlgebrizeGroupRowsScan( const ResolvedGroupRowsScan* group_rows_scan); - + absl::StatusOr> AlgebrizeTvfScan( + const ResolvedTVFScan* tvf_scan); absl::StatusOr> AlgebrizeTableScan( const ResolvedTableScan* table_scan, std::vector* active_conjuncts); @@ -558,7 +569,7 @@ class Algebrizer { // initial VariableId before any change has been made in this function. absl::Status AlgebrizeOrderByItems( bool drop_correlated_columns, bool create_new_ids, - const std::vector>& + absl::Span> order_by_items, absl::flat_hash_map* column_to_id_map, std::vector>* order_by_keys); @@ -662,7 +673,7 @@ class Algebrizer { // 1-to-1 mapping. absl::Status AlgebrizeDefaultAndGeneratedExpressions( const ResolvedTableScan* table_scan, ColumnExprMap* column_expr_map, - const std::vector>& + absl::Span> generated_column_exprs, std::vector topologically_sorted_generated_column_ids); @@ -670,8 +681,7 @@ class Algebrizer { // (*definition) the expression that defines that column, or nullptr if not // found. 
bool FindColumnDefinition( - const std::vector>& - expr_list, + absl::Span> expr_list, int column_id, const ResolvedExpr** definition); // Returns a RelationalOp representing a filtered view of , excluding diff --git a/zetasql/reference_impl/algebrizer_test.cc b/zetasql/reference_impl/algebrizer_test.cc index 84d537401..9902cf4cd 100644 --- a/zetasql/reference_impl/algebrizer_test.cc +++ b/zetasql/reference_impl/algebrizer_test.cc @@ -26,29 +26,33 @@ #include #include -#include "zetasql/base/logging.h" -#include "zetasql/common/evaluator_test_table.h" #include "zetasql/base/testing/status_matchers.h" -#include "zetasql/public/analyzer.h" #include "zetasql/public/builtin_function.h" #include "zetasql/public/builtin_function.pb.h" #include "zetasql/public/catalog.h" #include "zetasql/public/civil_time.h" #include "zetasql/public/function.h" #include "zetasql/public/function.pb.h" +#include "zetasql/public/function_signature.h" +#include "zetasql/public/id_string.h" #include "zetasql/public/options.pb.h" #include "zetasql/public/simple_catalog.h" +#include "zetasql/public/table_valued_function.h" #include "zetasql/public/type.h" #include "zetasql/public/value.h" +#include "zetasql/reference_impl/operator.h" +#include "zetasql/reference_impl/parameters.h" +#include "zetasql/reference_impl/variable_generator.h" +#include "zetasql/reference_impl/variable_id.h" #include "zetasql/resolved_ast/make_node_vector.h" #include "zetasql/resolved_ast/resolved_ast.h" #include "zetasql/resolved_ast/resolved_column.h" #include "zetasql/resolved_ast/resolved_node_kind.pb.h" -#include "zetasql/testdata/sample_catalog.h" #include "zetasql/testing/using_test_value.cc" #include "gmock/gmock.h" #include "gtest/gtest.h" #include "absl/memory/memory.h" +#include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/str_cat.h" #include "absl/strings/string_view.h" @@ -2025,4 +2029,44 @@ INSTANTIATE_TEST_SUITE_P( GroupBySum, AlgebrizerTestGroupingAggregation, 
ValuesIn(AlgebrizerTestGroupingAggregation::AllGroupByTests())); +TEST_F(StatementAlgebrizerTest, TVF) { + TVFRelation tvf_output_schema({ + {"o1", Int64Type()}, + {"o2", DoubleType()}, + }); + + FixedOutputSchemaTVF tvf( + {"tvf_no_args"}, + FunctionSignature(FunctionArgumentType::RelationWithSchema( + tvf_output_schema, + /*extra_relation_input_columns_allowed=*/false), + FunctionArgumentTypeList(), /*context_ptr=*/nullptr), + tvf_output_schema); + + ResolvedColumn o1(1, IdString::MakeGlobal("tvf_no_args"), + IdString::MakeGlobal("o1"), types::Int64Type()); + ResolvedColumn o2(2, IdString::MakeGlobal("tvf_no_args"), + IdString::MakeGlobal("o2"), types::DoubleType()); + + std::vector signature_arguments; + auto signature = + std::make_shared(signature_arguments, tvf_output_schema); + + auto tvf_scan = + MakeResolvedTVFScan({o1, o2}, &tvf, signature, /*argument_list=*/{}, + /*column_index_list=*/{0, 1}, /*alias=*/""); + ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr algebrized_scan, + algebrizer_->AlgebrizeScan(tvf_scan.get())); + EXPECT_EQ(algebrized_scan->DebugString(true), + "TvfOp(\n" + "+-tvf: tvf_no_args\n" + "+-arguments: {}\n" + "+-output_columns: {\n" + "| +-o1\n" + "| +-o2}\n" + "+-variables: {\n" + "| +-$o1\n" + "| +-$o2})"); +} + } // namespace zetasql diff --git a/zetasql/reference_impl/analytic_op_test.cc b/zetasql/reference_impl/analytic_op_test.cc index 530b553b1..c0cba911c 100644 --- a/zetasql/reference_impl/analytic_op_test.cc +++ b/zetasql/reference_impl/analytic_op_test.cc @@ -235,7 +235,7 @@ std::ostream& operator<<(std::ostream& out, } std::vector GetTupleDataPtrs( - const std::vector& tuples) { + absl::Span tuples) { std::vector ptrs; ptrs.reserve(tuples.size()); for (const TupleData& tuple : tuples) { @@ -1759,10 +1759,10 @@ class AnalyticWindowTest // Adjusts by removing tuples with ids in . // The tuple ids in must be in the ascending order. 
- static AnalyticWindow RemoveFromWindow(const std::vector& ids_to_remove, + static AnalyticWindow RemoveFromWindow(absl::Span ids_to_remove, const AnalyticWindow& window); - static std::string ToString(const std::vector& windows) { + static std::string ToString(absl::Span windows) { std::string ret("{"); for (const AnalyticWindow& window : windows) { absl::StrAppend(&ret, "[", window.start_tuple_id, ",", window.num_tuples, @@ -1894,7 +1894,7 @@ Value AnalyticWindowTest::CreateNaNOrZero(TypeKind type_kind) { } AnalyticWindow AnalyticWindowTest::RemoveFromWindow( - const std::vector& ids_to_remove, const AnalyticWindow& window) { + absl::Span ids_to_remove, const AnalyticWindow& window) { if (window.num_tuples == 0) { return window; } @@ -3220,7 +3220,7 @@ INSTANTIATE_TEST_SUITE_P(AnalyticWindowInfinityOffsetTest, // Appends a new column to and to the end of each tuple. // provides the values of the new column, and // corresponds 1:1 with . -void AddColumn(const VariableId& var, const std::vector& column_values, +void AddColumn(const VariableId& var, absl::Span column_values, std::vector* vars, std::vector* tuples) { vars->push_back(var); ABSL_CHECK_EQ(tuples->size(), column_values.size()); diff --git a/zetasql/reference_impl/common.cc b/zetasql/reference_impl/common.cc index 272e1c283..a8fd97552 100644 --- a/zetasql/reference_impl/common.cc +++ b/zetasql/reference_impl/common.cc @@ -25,6 +25,7 @@ #include "zetasql/public/type.pb.h" #include "zetasql/public/value.h" #include "absl/status/status.h" +#include "absl/types/span.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status_macros.h" @@ -133,7 +134,7 @@ GetCollatorFromResolvedCollation(const ResolvedCollation& resolved_collation) { absl::StatusOr> GetCollatorFromResolvedCollationList( - const std::vector& collation_list) { + absl::Span collation_list) { ZETASQL_RET_CHECK_LE(collation_list.size(), 1); if (collation_list.empty()) { return nullptr; @@ -157,7 +158,7 @@ 
GetCollatorFromResolvedCollationValue(const Value& collation_value) { } absl::StatusOr MakeCollatorList( - const std::vector& collation_list) { + absl::Span collation_list) { CollatorList collator_list; if (collation_list.empty()) { diff --git a/zetasql/reference_impl/common.h b/zetasql/reference_impl/common.h index df6143d2b..5b8791f52 100644 --- a/zetasql/reference_impl/common.h +++ b/zetasql/reference_impl/common.h @@ -25,6 +25,7 @@ #include "zetasql/public/type.h" #include "zetasql/resolved_ast/resolved_collation.h" #include "absl/status/status.h" +#include "absl/types/span.h" #include "zetasql/base/status.h" namespace zetasql { @@ -50,7 +51,7 @@ GetCollatorFromResolvedCollation(const ResolvedCollation& resolved_collation); // Returns error when the collation list has more than one collation. absl::StatusOr> GetCollatorFromResolvedCollationList( - const std::vector& collation_list); + absl::Span collation_list); // Returns a collator from a value representing a ResolvedCollation object. // An error will be returned if the input cannot be converted @@ -63,7 +64,7 @@ using CollatorList = std::vector>; // Returns a list of ZetaSqlCollator based on collation information obtained // from resolved function call. 
 absl::StatusOr<CollatorList> MakeCollatorList(
-    const std::vector<ResolvedCollation>& collation_list);
+    absl::Span<const ResolvedCollation> collation_list);
 
 }  // namespace zetasql
diff --git a/zetasql/reference_impl/evaluation.cc b/zetasql/reference_impl/evaluation.cc
index 24b76258f..15ff14af1 100644
--- a/zetasql/reference_impl/evaluation.cc
+++ b/zetasql/reference_impl/evaluation.cc
@@ -17,6 +17,7 @@
 #include "zetasql/reference_impl/evaluation.h"
 
 #include
+#include
 #include
 #include
 #include
@@ -28,6 +29,7 @@
 #include "zetasql/public/functions/datetime.pb.h"
 #include "zetasql/public/type.h"
 #include "zetasql/public/value.h"
+#include "zetasql/reference_impl/tuple.h"
 #include "absl/container/flat_hash_set.h"
 #include "absl/container/node_hash_set.h"
 #include "absl/flags/flag.h"
@@ -85,10 +87,32 @@ absl::Status ValidateFirstColumnPrimaryKey(
 }
 
 EvaluationContext::EvaluationContext(const EvaluationOptions& options)
+    : EvaluationContext(
+          options,
+          std::make_shared<MemoryAccountant>(options.max_intermediate_byte_size,
+                                             "max_intermediate_byte_size"),
+          /*parent_context=*/nullptr) {}
+
+EvaluationContext::EvaluationContext(
+    const EvaluationOptions& options,
+    std::shared_ptr<MemoryAccountant> memory_accountant,
+    EvaluationContext* parent_context)
     : options_(options),
-      memory_accountant_(options.max_intermediate_byte_size,
-                         "max_intermediate_byte_size"),
-      deterministic_output_(true) {}
+      memory_accountant_(memory_accountant),
+      deterministic_output_(true),
+      parent_context_(parent_context) {}
+
+std::unique_ptr<EvaluationContext> EvaluationContext::MakeChildContext() const {
+  EvaluationContext* mutable_parent_ref = const_cast<EvaluationContext*>(this);
+  std::unique_ptr<EvaluationContext> child_context = absl::WrapUnique(
+      new EvaluationContext(options_, memory_accountant_, mutable_parent_ref));
+  child_context->SetLanguageOptions(language_options_);
+  child_context->SetSessionUser(session_user_);
+  child_context->SetStatementEvaluationDeadline(statement_eval_deadline_);
+  if (!IsDeterministicOutput()) {
+    child_context->SetNonDeterministicOutput();
+  }
+  return child_context;
+}
 
 absl::Status
 EvaluationContext::AddTableAsArray(
     absl::string_view table_name, bool is_value_table, Value array,
@@ -152,6 +176,16 @@ void EvaluationContext::InitializeDefaultTimeZone() {
 }
 
 void EvaluationContext::InitializeCurrentTimestamp() {
+  if (parent_context_ != nullptr) {
+    current_timestamp_ = parent_context_->GetCurrentTimestamp();
+    current_date_in_default_timezone_ =
+        parent_context_->GetCurrentDateInDefaultTimezone();
+    current_datetime_in_default_timezone_ =
+        parent_context_->GetCurrentDatetimeInDefaultTimezone();
+    current_time_in_default_timezone_ =
+        parent_context_->GetCurrentTimeInDefaultTimezone();
+    return;
+  }
   current_timestamp_ = absl::ToUnixMicros(clock_->TimeNow());
 
   LazilyInitializeDefaultTimeZone();
@@ -172,6 +206,21 @@ void EvaluationContext::InitializeCurrentTimestamp() {
       default_timezone_.value(), &current_time_in_default_timezone_));
 }
 
+void EvaluationContext::SetNonDeterministicOutput() {
+  deterministic_output_ = false;
+  if (parent_context_ != nullptr) {
+    parent_context_->SetNonDeterministicOutput();
+  }
+}
+
+absl::TimeZone EvaluationContext::GetDefaultTimeZone() {
+  if (parent_context_ != nullptr) {
+    return parent_context_->GetDefaultTimeZone();
+  }
+  LazilyInitializeDefaultTimeZone();
+  return default_timezone_.value();
+}
+
 // Indicate which errors should be converted to NULL in SAFE mode.
 // For built-in functions, we expect to see only OUT_OF_RANGE.
 // We try to handle others here in a reasonable way in case users are
diff --git a/zetasql/reference_impl/evaluation.h b/zetasql/reference_impl/evaluation.h
index 345e86f1c..fcb25fd72 100644
--- a/zetasql/reference_impl/evaluation.h
+++ b/zetasql/reference_impl/evaluation.h
@@ -128,9 +128,18 @@ class EvaluationContext {
   EvaluationContext(const EvaluationContext&) = delete;
   EvaluationContext& operator=(const EvaluationContext&) = delete;
 
+  // Creates a local evaluator that inherits statement-level properties
+  // from the parent evaluator.
Useful for executing SQL-defined
+  // function bodies, which require their own scope but should share some
+  // global execution state.
+  // The local evaluator keeps a reference to the parent and will
+  // propagate non-determinism upward when it is encountered.
+  // The parent context must outlive any child contexts it creates.
+  std::unique_ptr<EvaluationContext> MakeChildContext() const;
+
   const EvaluationOptions& options() const { return options_; }
 
-  MemoryAccountant* memory_accountant() { return &memory_accountant_; }
+  MemoryAccountant* memory_accountant() { return memory_accountant_.get(); }
 
   // Returns the `value` associated with `arg_name` or an invalid Value.
   Value GetFunctionArgumentRef(std::string arg_name);
@@ -157,8 +166,9 @@ class EvaluationContext {
       const LanguageOptions& language_options);
 
   // Indicates that the result of evaluation is non-deterministic.
-  void SetNonDeterministicOutput() { deterministic_output_ = false; }
-
+  // If this context has a non-null parent context, non-determinism is also
+  // set on the parent context.
+  void SetNonDeterministicOutput();
   bool IsDeterministicOutput() const { return deterministic_output_; }
 
   void SetLanguageOptions(LanguageOptions options) {
@@ -181,10 +191,7 @@ class EvaluationContext {
   // If necessary, (lazily) initializes the default timezone. Lazy
   // initialization saves time for most evaluations, which don't require time
   // zone information.
-  absl::TimeZone GetDefaultTimeZone() {
-    LazilyInitializeDefaultTimeZone();
-    return default_timezone_.value();
-  }
+  absl::TimeZone GetDefaultTimeZone();
 
   // If necessary, (lazily) initializes the random number generator. Lazy
   // initialization saves time for most evaluations, which don't require random
@@ -374,6 +381,10 @@ class EvaluationContext {
       udf_argument_references_;
 
  private:
+  EvaluationContext(const EvaluationOptions& options,
+                    std::shared_ptr<MemoryAccountant> memory_accountant,
+                    EvaluationContext* parent_context);
+
   void LazilyInitializeDefaultTimeZone() {
     if (!default_timezone_.has_value()) {
       InitializeDefaultTimeZone();
@@ -391,7 +402,7 @@ class EvaluationContext {
   void InitializeCurrentTimestamp();
 
   const EvaluationOptions options_;
-  MemoryAccountant memory_accountant_;
+  std::shared_ptr<MemoryAccountant> memory_accountant_;
 
   // Tables added by AddTableAsArray().
   std::map> tables_;
@@ -439,6 +450,10 @@ class EvaluationContext {
   // The current user, specified by the engine. Used to evaluate the
   // SESSION_USER function. Defaults to an empty string if not set.
   std::string session_user_ = "";
+
+  // A reference to the EvaluationContext that created this context by
+  // calling `MakeChildContext`. Always nullptr if not created this way.
+  EvaluationContext* parent_context_ = nullptr;
 };
 
 // Returns true if we should suppress 'error' (which must not be OK) in
diff --git a/zetasql/reference_impl/evaluation_context_test.cc b/zetasql/reference_impl/evaluation_context_test.cc
new file mode 100644
index 000000000..ad6617c06
--- /dev/null
+++ b/zetasql/reference_impl/evaluation_context_test.cc
@@ -0,0 +1,104 @@
+//
+// Copyright 2019 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+// + +#include +#include + +#include "zetasql/public/language_options.h" +#include "zetasql/reference_impl/evaluation.h" +#include "gtest/gtest.h" +#include "absl/time/clock.h" +#include "absl/time/time.h" + +namespace zetasql { +namespace { + +TEST(EvaluationContext, ChildContextTest) { + EvaluationOptions options; + options.always_use_stable_sort = true; + EvaluationContext context = EvaluationContext(options); + LanguageOptions language_options; + language_options.EnableLanguageFeature(FEATURE_TEST_IDEALLY_DISABLED); + language_options.DisableLanguageFeature( + FEATURE_TEST_IDEALLY_ENABLED_BUT_IN_DEVELOPMENT); + context.SetLanguageOptions(language_options); + context.SetStatementEvaluationDeadline(absl::Now() + absl::Seconds(1)); + context.SetSessionUser("test_user"); + + // Check that a child context created from the above context inherits + // expected properties. + std::unique_ptr child_context = context.MakeChildContext(); + // EvaluationOptions fields should be preserved + EXPECT_EQ(context.options().always_use_stable_sort, + child_context->options().always_use_stable_sort); + // LanguageOptions should be preserved + EXPECT_EQ(context.GetLanguageOptions().GetEnabledLanguageFeatures(), + child_context->GetLanguageOptions().GetEnabledLanguageFeatures()); + EXPECT_TRUE(child_context->GetLanguageOptions().LanguageFeatureEnabled( + FEATURE_TEST_IDEALLY_DISABLED)); + EXPECT_FALSE(child_context->GetLanguageOptions().LanguageFeatureEnabled( + FEATURE_TEST_IDEALLY_ENABLED_BUT_IN_DEVELOPMENT)); + // Statement deadline, if previously set on the parent, is preserved + EXPECT_EQ(context.GetStatementEvaluationDeadline(), + child_context->GetStatementEvaluationDeadline()); + // Session user, if previously set on the parent, is preserved + EXPECT_EQ(context.GetSessionUser(), child_context->GetSessionUser()); + // Both parent and child context should point to the same MemoryAccountant + // object + EXPECT_EQ(context.memory_accountant(), child_context->memory_accountant()); 
+} + +TEST(EvaluationContext, ChildContextCalledFirstTest) { + EvaluationContext context = EvaluationContext(EvaluationOptions()); + std::unique_ptr child_context = context.MakeChildContext(); + + // Nondeterminism is propagated upwards from child to parent + child_context->SetNonDeterministicOutput(); + EXPECT_FALSE(child_context->IsDeterministicOutput()); + EXPECT_FALSE(context.IsDeterministicOutput()); + + absl::TimeZone timezone1 = child_context->GetDefaultTimeZone(); + absl::TimeZone timezone2 = context.GetDefaultTimeZone(); + EXPECT_EQ(timezone1, timezone2); + + int64_t ts1 = child_context->GetCurrentTimestamp(); + int64_t ts2 = context.GetCurrentTimestamp(); + EXPECT_EQ(ts1, ts2); +} + +TEST(EvaluationContext, ParentContextCalledFirstTest) { + EvaluationContext context = EvaluationContext(EvaluationOptions()); + std::unique_ptr child_context = context.MakeChildContext(); + + // Non-determinism is not propagated downward by default when set on parent + // context. We need non-determinism to propagate upwards in the case of + // SQL defined function bodies which use a child context, by contrast + // propagating downwards doesn't matter as much. 
+ context.SetNonDeterministicOutput(); + EXPECT_FALSE(context.IsDeterministicOutput()); + EXPECT_TRUE(child_context->IsDeterministicOutput()); + + absl::TimeZone timezone1 = context.GetDefaultTimeZone(); + absl::TimeZone timezone2 = child_context->GetDefaultTimeZone(); + EXPECT_EQ(timezone1, timezone2); + + int64_t ts1 = context.GetCurrentTimestamp(); + int64_t ts2 = child_context->GetCurrentTimestamp(); + EXPECT_EQ(ts1, ts2); +} + +} // namespace +} // namespace zetasql diff --git a/zetasql/reference_impl/expected_errors.cc b/zetasql/reference_impl/expected_errors.cc index cecb4a644..102896226 100644 --- a/zetasql/reference_impl/expected_errors.cc +++ b/zetasql/reference_impl/expected_errors.cc @@ -86,6 +86,9 @@ std::unique_ptr> ReferenceExpectedErrorMatcher( error_matchers.emplace_back( new StatusRegexMatcher(absl::StatusCode::kUnimplemented, "as (?:a )?PIVOT expression is not supported")); + error_matchers.emplace_back(std::make_unique( + absl::StatusCode::kInvalidArgument, + "SQL-defined aggregate functions are not supported in PIVOT")); // TODO: RQG should not generate proto expressions for protos in // zetasql.functions.* as they are often "special" (e.g. 
only allowed as diff --git a/zetasql/reference_impl/function.cc b/zetasql/reference_impl/function.cc index b258838fb..3fe547757 100644 --- a/zetasql/reference_impl/function.cc +++ b/zetasql/reference_impl/function.cc @@ -24,8 +24,8 @@ #include #include #include +#include #include -#include #include #include #include @@ -52,6 +52,7 @@ #include "zetasql/public/function.h" #include "zetasql/public/function_signature.h" #include "zetasql/public/functions/arithmetics.h" +#include "zetasql/public/functions/array_zip_mode.pb.h" #include "zetasql/public/functions/bitcast.h" #include "zetasql/public/functions/bitwise.h" #include "zetasql/public/functions/common_proto.h" @@ -60,14 +61,26 @@ #include "zetasql/public/functions/datetime.pb.h" #include "zetasql/public/functions/differential_privacy.pb.h" #include "zetasql/public/functions/distance.h" +#include "zetasql/public/interval_value.h" +#include "zetasql/public/types/timestamp_util.h" #include "zetasql/public/types/type.h" +#include "zetasql/reference_impl/functions/like.h" #include "zetasql/reference_impl/operator.h" +#include "zetasql/reference_impl/tuple.h" +#include "absl/base/attributes.h" +#include "absl/base/const_init.h" +#include "absl/container/flat_hash_map.h" #include "absl/container/flat_hash_set.h" #include "zetasql/base/check.h" +#include "absl/log/die_if_null.h" +#include "absl/random/distributions.h" #include "absl/strings/ascii.h" +#include "absl/strings/match.h" #include "absl/strings/str_format.h" #include "absl/strings/substitute.h" #include "absl/time/civil_time.h" +#include "google/protobuf/descriptor.h" +#include "google/protobuf/dynamic_message.h" #include "google/protobuf/io/coded_stream.h" #include "google/protobuf/io/zero_copy_stream_impl_lite.h" #include "zetasql/public/functions/string_format.h" @@ -119,11 +132,9 @@ #include "proto/data.pb.h" #include "algorithms/quantiles.h" #include "zetasql/base/map_util.h" -#include "zetasql/base/source_location.h" #include "zetasql/base/exactfloat.h" 
#include "re2/re2.h" #include "zetasql/base/ret_check.h" -#include "zetasql/base/status.h" #include "zetasql/base/status_builder.h" #include "zetasql/base/status_macros.h" @@ -138,6 +149,8 @@ namespace zetasql { namespace { +using ::zetasql::functions::ArrayZipEnums; + static bool IsTypeWithDistinguishableTies(const Type* type, const CollatorList& collator_list) { return type->IsInterval() || (type->IsString() && !collator_list.empty()); @@ -556,6 +569,7 @@ FunctionMap::FunctionMap() { RegisterFunction(FunctionKind::kIsNotDistinct, "$is_not_distinct_from", "IsDistinct"); RegisterFunction(FunctionKind::kExists, "exists", "Exists"); + RegisterFunction(FunctionKind::kGrouping, "grouping", "Grouping"); RegisterFunction(FunctionKind::kGenerateArray, "generate_array", "GenerateArray"); RegisterFunction(FunctionKind::kGenerateDateArray, "generate_date_array", @@ -583,14 +597,48 @@ FunctionMap::FunctionMap() { RegisterFunction(FunctionKind::kJsonValueArray, "json_value_array", "JsonValueArray"); RegisterFunction(FunctionKind::kToJson, "to_json", "ToJson"); + RegisterFunction(FunctionKind::kStringArray, "string_array", "StringArray"); + RegisterFunction(FunctionKind::kInt32, "int32", "Int32"); + RegisterFunction(FunctionKind::kInt32Array, "int32_array", "Int32Array"); RegisterFunction(FunctionKind::kInt64, "int64", "Int64"); + RegisterFunction(FunctionKind::kInt64Array, "int64_array", "Int64Array"); + RegisterFunction(FunctionKind::kUint32, "uint32", "Uint32"); + RegisterFunction(FunctionKind::kUint32Array, "uint32_array", "Uint32Array"); + RegisterFunction(FunctionKind::kUint64, "uint64", "Uint64"); + RegisterFunction(FunctionKind::kUint64Array, "uint64_array", "Uint64Array"); RegisterFunction(FunctionKind::kDouble, "float64", "Float64"); + RegisterFunction(FunctionKind::kDoubleArray, "float64_array", + "Float64Array"); + RegisterFunction(FunctionKind::kFloat, "float32", "Float32"); + RegisterFunction(FunctionKind::kFloatArray, "float32_array", + "Float32Array"); 
RegisterFunction(FunctionKind::kBool, "bool", "Bool"); + RegisterFunction(FunctionKind::kBoolArray, "bool_array", "BoolArray"); RegisterFunction(FunctionKind::kJsonType, "json_type", "JsonType"); RegisterFunction(FunctionKind::kLaxBool, "lax_bool", "LaxBool"); + RegisterFunction(FunctionKind::kLaxBoolArray, "lax_bool_array", + "LaxBoolArray"); + RegisterFunction(FunctionKind::kLaxInt32, "lax_int32", "LaxInt32"); + RegisterFunction(FunctionKind::kLaxInt32Array, "lax_int32_array", + "LaxInt32Array"); RegisterFunction(FunctionKind::kLaxInt64, "lax_int64", "LaxInt64"); + RegisterFunction(FunctionKind::kLaxInt64Array, "lax_int64_array", + "LaxInt64Array"); + RegisterFunction(FunctionKind::kLaxUint32, "lax_uint32", "LaxUint32"); + RegisterFunction(FunctionKind::kLaxUint32Array, "lax_uint32_array", + "LaxUint32Array"); + RegisterFunction(FunctionKind::kLaxUint64, "lax_uint64", "LaxUint64"); + RegisterFunction(FunctionKind::kLaxUint64Array, "lax_uint64_array", + "LaxUint64Array"); RegisterFunction(FunctionKind::kLaxDouble, "lax_float64", "LaxFloat64"); + RegisterFunction(FunctionKind::kLaxDoubleArray, "lax_float64_array", + "LaxFloat64Array"); + RegisterFunction(FunctionKind::kLaxFloat, "lax_float32", "LaxFloat32"); + RegisterFunction(FunctionKind::kLaxFloatArray, "lax_float32_array", + "LaxFloat32Array"); RegisterFunction(FunctionKind::kLaxString, "lax_string", "LaxString"); + RegisterFunction(FunctionKind::kLaxStringArray, "lax_string_array", + "LaxStringArray"); RegisterFunction(FunctionKind::kToJsonString, "to_json_string", "ToJsonString"); RegisterFunction(FunctionKind::kParseJson, "parse_json", "ParseJson"); @@ -618,15 +666,37 @@ FunctionMap::FunctionMap() { RegisterFunction(FunctionKind::kLikeWithCollation, "$like_with_collation", "LikeWithCollation"); RegisterFunction(FunctionKind::kLikeAny, "$like_any", "LikeAny"); + RegisterFunction(FunctionKind::kNotLikeAny, "$not_like_any", "NotLikeAny"); RegisterFunction(FunctionKind::kLikeAnyWithCollation, 
"$like_any_with_collation", "LikeAnyWithCollation"); + RegisterFunction(FunctionKind::kNotLikeAnyWithCollation, + "$not_like_any_with_collation", "NotLikeAnyWithCollation"); RegisterFunction(FunctionKind::kLikeAll, "$like_all", "LikeAll"); + RegisterFunction(FunctionKind::kNotLikeAll, "$not_like_all", "NotLikeAll"); RegisterFunction(FunctionKind::kLikeAllWithCollation, "$like_all_with_collation", "LikeAllWithCollation"); + RegisterFunction(FunctionKind::kNotLikeAllWithCollation, + "$not_like_all_with_collation", "NotLikeAllWithCollation"); RegisterFunction(FunctionKind::kLikeAnyArray, "$like_any_array", "LikeAnyArray"); + RegisterFunction(FunctionKind::kNotLikeAnyArray, "$not_like_any_array", + "NotLikeAnyArray"); + RegisterFunction(FunctionKind::kLikeAnyArrayWithCollation, + "$like_any_array_with_collation", + "LikeAnyArrayWithCollation"); + RegisterFunction(FunctionKind::kNotLikeAnyArrayWithCollation, + "$not_like_any_array_with_collation", + "NotLikeAnyArrayWithCollation"); RegisterFunction(FunctionKind::kLikeAllArray, "$like_all_array", "LikeAllArray"); + RegisterFunction(FunctionKind::kNotLikeAllArray, "$not_like_all_array", + "NotLikeAllArray"); + RegisterFunction(FunctionKind::kLikeAllArrayWithCollation, + "$like_all_array_with_collation", + "LikeAllArrayWithCollation"); + RegisterFunction(FunctionKind::kNotLikeAllArrayWithCollation, + "$not_like_all_array_with_collation", + "NotLikeAllArrayWithCollation"); RegisterFunction(FunctionKind::kLogicalAnd, "logical_and", "LogicalAnd"); RegisterFunction(FunctionKind::kLogicalOr, "logical_or", "LogicalOr"); RegisterFunction(FunctionKind::kMakeProto, "make_proto", "MakeProto"); @@ -995,10 +1065,28 @@ FunctionMap::FunctionMap() { "ArrayFindAll"); RegisterFunction(FunctionKind::kCosineDistance, "cosine_distance", "CosineDistance"); + RegisterFunction(FunctionKind::kApproxCosineDistance, + "approx_cosine_distance", "ApproxCosineDistance"); RegisterFunction(FunctionKind::kEuclideanDistance, "euclidean_distance", 
"EuclideanDistance"); + RegisterFunction(FunctionKind::kApproxEuclideanDistance, + "approx_euclidean_distance", "ApproxEuclideanDistance"); + RegisterFunction(FunctionKind::kApproxDotProduct, "approx_dot_product", + "ApproxDotProduct"); + RegisterFunction(FunctionKind::kDotProduct, "dot_product", "DotProduct"); + RegisterFunction(FunctionKind::kManhattanDistance, "manhattan_distance", + "ManhattanDistance"); + RegisterFunction(FunctionKind::kL1Norm, "l1_norm", "L1Norm"); + RegisterFunction(FunctionKind::kL2Norm, "l2_norm", "L2Norm"); RegisterFunction(FunctionKind::kEditDistance, "edit_distance", "EditDistance"); + RegisterFunction(FunctionKind::kArrayZip, "array_zip", "ArrayZip"); + RegisterFunction(FunctionKind::kElementwiseSum, "elementwise_sum", + "ElementwiseSum"); + RegisterFunction(FunctionKind::kElementwiseAvg, "elementwise_avg", + "ElementwiseAvg"); + RegisterFunction(FunctionKind::kMapFromArray, "map_from_array", + "MapFromArray"); }(); } // NOLINT(readability/fn_size) @@ -1703,51 +1791,107 @@ static absl::Status ValidateSupportedTypes( return absl::OkStatus(); } +struct CheckVectorDistanceInputTypeOptions { + // Whether to expect 1 or 2 input vectors. If false, expect and check + // exactly 1. If true, expect and check exactly 2. + bool expect_pair_of_vectors; + bool allow_int64_elements; + bool allow_struct_elements; +}; + static absl::Status CheckVectorDistanceInputType( - const std::vector& input_types) { - ZETASQL_RET_CHECK_EQ(input_types.size(), 2) << absl::Substitute( - "input type size must be exactly 2 but got $0", input_types.size()); + const std::vector& input_types, + CheckVectorDistanceInputTypeOptions options) { + const int num_vectors = options.expect_pair_of_vectors ? 
2 : 1; + ZETASQL_RET_CHECK_EQ(input_types.size(), num_vectors) + << absl::Substitute("Input type size must be exactly $0 but got $1", + num_vectors, input_types.size()); for (int i = 0; i < input_types.size(); ++i) { - ZETASQL_RET_CHECK(input_types[i]->IsArray()) << "both input types must be array"; + ZETASQL_RET_CHECK(input_types[i]->IsArray()) << "All input types must be arrays"; } + + std::string same_element_type_error_message = + "Array element types must be the same"; + if (input_types[0]->AsArray()->element_type()->IsDouble()) { - ZETASQL_RET_CHECK(input_types[1]->AsArray()->element_type()->IsDouble()) - << "array element type must be both DOUBLE"; + if (options.expect_pair_of_vectors) { + ZETASQL_RET_CHECK(input_types[1]->AsArray()->element_type()->IsDouble()) + << same_element_type_error_message; + } return absl::OkStatus(); - } else if (input_types[0]->AsArray()->element_type()->IsFloat()) { - ZETASQL_RET_CHECK(input_types[1]->AsArray()->element_type()->IsFloat()) - << "array element type must be both FLOAT"; + } + + if (input_types[0]->AsArray()->element_type()->IsFloat()) { + if (options.expect_pair_of_vectors) { + ZETASQL_RET_CHECK(input_types[1]->AsArray()->element_type()->IsFloat()) + << same_element_type_error_message; + } return absl::OkStatus(); } - for (int i = 0; i < input_types.size(); ++i) { - ZETASQL_RET_CHECK(input_types[i]->AsArray()->element_type()->IsStruct()) - << "array element type must be struct"; - ZETASQL_RET_CHECK_EQ( - input_types[i]->AsArray()->element_type()->AsStruct()->num_fields(), 2) - << "array struct element type must have exactly 2 fields"; - ZETASQL_RET_CHECK(input_types[i] - ->AsArray() - ->element_type() - ->AsStruct() - ->fields()[1] - .type->IsDouble()) - << "array struct 2nd element type must be DOUBLE"; - } - auto key_type0 = - input_types[0]->AsArray()->element_type()->AsStruct()->fields()[0].type; - auto key_type1 = - input_types[1]->AsArray()->element_type()->AsStruct()->fields()[0].type; - 
ZETASQL_RET_CHECK_EQ(key_type0, key_type1) << "key types must be the same"; + if (options.allow_int64_elements) { + if (input_types[0]->AsArray()->element_type()->IsInt64()) { + if (options.expect_pair_of_vectors) { + ZETASQL_RET_CHECK(input_types[1]->AsArray()->element_type()->IsInt64()) + << same_element_type_error_message; + } + return absl::OkStatus(); + } + } - return absl::OkStatus(); + if (options.allow_struct_elements) { + if (input_types[0]->AsArray()->element_type()->IsStruct()) { + for (int i = 0; i < input_types.size(); ++i) { + ZETASQL_RET_CHECK(input_types[i]->AsArray()->element_type()->IsStruct()) + << same_element_type_error_message; + ZETASQL_RET_CHECK_EQ( + input_types[i]->AsArray()->element_type()->AsStruct()->num_fields(), + 2) + << "Array struct element type must have exactly 2 fields"; + ZETASQL_RET_CHECK(input_types[i] + ->AsArray() + ->element_type() + ->AsStruct() + ->fields()[1] + .type->IsDouble()) + << "Array struct 2nd element type must be DOUBLE"; + + if (options.expect_pair_of_vectors) { + auto key_type0 = input_types[0] + ->AsArray() + ->element_type() + ->AsStruct() + ->fields()[0] + .type; + auto key_type1 = input_types[1] + ->AsArray() + ->element_type() + ->AsStruct() + ->fields()[0] + .type; + ZETASQL_RET_CHECK_EQ(key_type0, key_type1) << "Key types must be the same"; + } + } + return absl::OkStatus(); + } + } + + // If no valid argument type was identified and correctly validated above, + // then the type is unsupported. 
+ std::string unsupported_type_error_message = "Unsupported array element type"; + ZETASQL_RET_CHECK_FAIL() << unsupported_type_error_message; + return absl::InvalidArgumentError(unsupported_type_error_message); } static absl::StatusOr> CreateCosineDistanceFunction(std::vector& input_types, const Type* output_type) { - ZETASQL_RET_CHECK_OK(CheckVectorDistanceInputType(input_types)); + ZETASQL_RET_CHECK_OK(CheckVectorDistanceInputType( + input_types, + CheckVectorDistanceInputTypeOptions{.expect_pair_of_vectors = true, + .allow_int64_elements = false, + .allow_struct_elements = true})); bool is_signature_dense = input_types[0]->AsArray()->element_type()->IsDouble() || @@ -1774,7 +1918,7 @@ CreateCosineDistanceFunction(std::vector& input_types, ->AsStruct() ->field(0) .type->IsString(); - ZETASQL_RET_CHECK(is_string) << "input type must be either STRUCT with INT64 index " + ZETASQL_RET_CHECK(is_string) << "Input type must be either STRUCT with INT64 index " "field or STRING index field"; return std::make_unique( FunctionKind::kCosineDistance, output_type); @@ -1783,7 +1927,11 @@ CreateCosineDistanceFunction(std::vector& input_types, static absl::StatusOr> CreateEuclideanDistanceFunction(std::vector& input_types, const Type* output_type) { - ZETASQL_RET_CHECK_OK(CheckVectorDistanceInputType(input_types)); + ZETASQL_RET_CHECK_OK(CheckVectorDistanceInputType( + input_types, + CheckVectorDistanceInputTypeOptions{.expect_pair_of_vectors = true, + .allow_int64_elements = false, + .allow_struct_elements = true})); bool is_signature_dense = input_types[0]->AsArray()->element_type()->IsDouble() || input_types[0]->AsArray()->element_type()->IsFloat(); @@ -1809,12 +1957,62 @@ CreateEuclideanDistanceFunction(std::vector& input_types, ->AsStruct() ->field(0) .type->IsString(); - ZETASQL_RET_CHECK(is_string) << "input type must be either STRUCT with INT64 index " + ZETASQL_RET_CHECK(is_string) << "Input type must be either STRUCT with INT64 index " "field or STRING index field"; 
return std::make_unique( FunctionKind::kEuclideanDistance, output_type); } +static absl::StatusOr> +CreateDotProductFunction(std::vector& input_types, + const Type* output_type) { + ZETASQL_RET_CHECK_OK(CheckVectorDistanceInputType( + input_types, + CheckVectorDistanceInputTypeOptions{.expect_pair_of_vectors = true, + .allow_int64_elements = true, + .allow_struct_elements = false})); + + return std::make_unique(FunctionKind::kDotProduct, + output_type); +} + +static absl::StatusOr> +CreateManhattanDistanceFunction(std::vector& input_types, + const Type* output_type) { + ZETASQL_RET_CHECK_OK(CheckVectorDistanceInputType( + input_types, + CheckVectorDistanceInputTypeOptions{.expect_pair_of_vectors = true, + .allow_int64_elements = true, + .allow_struct_elements = false})); + + return std::make_unique( + FunctionKind::kManhattanDistance, output_type); +} + +static absl::StatusOr> +CreateL1NormFunction(std::vector& input_types, + const Type* output_type) { + ZETASQL_RET_CHECK_OK(CheckVectorDistanceInputType( + input_types, + CheckVectorDistanceInputTypeOptions{.expect_pair_of_vectors = false, + .allow_int64_elements = true, + .allow_struct_elements = false})); + + return std::make_unique(FunctionKind::kL1Norm, output_type); +} + +static absl::StatusOr> +CreateL2NormFunction(std::vector& input_types, + const Type* output_type) { + ZETASQL_RET_CHECK_OK(CheckVectorDistanceInputType( + input_types, + CheckVectorDistanceInputTypeOptions{.expect_pair_of_vectors = false, + .allow_int64_elements = true, + .allow_struct_elements = false})); + + return std::make_unique(FunctionKind::kL2Norm, output_type); +} + absl::StatusOr> BuiltinScalarFunction::CreateCast( const LanguageOptions& language_options, const Type* output_type, @@ -1872,6 +2070,25 @@ BuiltinScalarFunction::CreateCall( std::move(arguments), error_mode); } +// Verifies that the input `arguments` contain at most one lambda argument and +// returns it. If there are no lambda arguments, returns nullptr.
+static absl::StatusOr GetLambdaArgumentForArrayZip( + absl::Span> arguments) { + const InlineLambdaExpr* inline_lambda_expr = nullptr; + for (const std::unique_ptr& arg : arguments) { + if (arg->value_expr() == nullptr) { + // If an argument to ARRAY_ZIP is not a value expression, it is a + // lambda. Note that we cannot directly check `inline_lambda_expr() == + // nullptr` because it will crash if the argument is not a lambda. + ZETASQL_RET_CHECK_EQ(inline_lambda_expr, nullptr) + << "Multiple lambda arguments are found for ARRAY_ZIP: " + << inline_lambda_expr->DebugString() << " and " << arg->DebugString(); + inline_lambda_expr = arg->mutable_inline_lambda_expr(); + } + } + return inline_lambda_expr; +} + absl::StatusOr BuiltinScalarFunction::CreateValidatedRaw( FunctionKind kind, const LanguageOptions& language_options, @@ -1931,42 +2148,42 @@ BuiltinScalarFunction::CreateValidatedRaw( case FunctionKind::kBitCastToUint32: case FunctionKind::kBitCastToUint64: return new BitCastFunction(kind, output_type); - case FunctionKind::kLike: { + case FunctionKind::kLike: + case FunctionKind::kLikeWithCollation: { ZETASQL_RETURN_IF_ERROR( ValidateInputTypesSupportEqualityComparison(kind, input_types)); ZETASQL_ASSIGN_OR_RETURN(auto fct, CreateLikeFunction(kind, output_type, arguments)); return fct.release(); } - case FunctionKind::kLikeAny: { - ZETASQL_RETURN_IF_ERROR( - ValidateInputTypesSupportEqualityComparison(kind, input_types)); - ZETASQL_ASSIGN_OR_RETURN(auto fct, - CreateLikeAnyFunction(kind, output_type, arguments)); - return fct.release(); - } - case FunctionKind::kLikeAll: { + case FunctionKind::kLikeAny: + case FunctionKind::kNotLikeAny: + case FunctionKind::kLikeAnyWithCollation: + case FunctionKind::kNotLikeAnyWithCollation: + case FunctionKind::kLikeAll: + case FunctionKind::kNotLikeAll: + case FunctionKind::kLikeAllWithCollation: + case FunctionKind::kNotLikeAllWithCollation: { ZETASQL_RETURN_IF_ERROR( ValidateInputTypesSupportEqualityComparison(kind, 
input_types)); ZETASQL_ASSIGN_OR_RETURN(auto fct, - CreateLikeAllFunction(kind, output_type, arguments)); + CreateLikeAnyAllFunction(kind, output_type, arguments)); return fct.release(); } case FunctionKind::kLikeAnyArray: - case FunctionKind::kLikeAllArray: { - ZETASQL_RET_CHECK_EQ(arguments.size(), 2); + case FunctionKind::kLikeAnyArrayWithCollation: + case FunctionKind::kNotLikeAnyArray: + case FunctionKind::kNotLikeAnyArrayWithCollation: + case FunctionKind::kLikeAllArray: + case FunctionKind::kLikeAllArrayWithCollation: + case FunctionKind::kNotLikeAllArray: + case FunctionKind::kNotLikeAllArrayWithCollation: { ZETASQL_RETURN_IF_ERROR( ValidateInputTypesSupportEqualityComparison(kind, input_types)); ZETASQL_ASSIGN_OR_RETURN(auto fct, CreateLikeAnyAllArrayFunction( kind, output_type, arguments)); return fct.release(); } - case FunctionKind::kLikeWithCollation: - case FunctionKind::kLikeAnyWithCollation: - case FunctionKind::kLikeAllWithCollation: - ZETASQL_RETURN_IF_ERROR( - ValidateInputTypesSupportEqualityComparison(kind, input_types)); - return BuiltinFunctionRegistry::GetScalarFunction(kind, output_type); case FunctionKind::kBitwiseNot: case FunctionKind::kBitwiseOr: case FunctionKind::kBitwiseXor: @@ -2120,13 +2337,37 @@ BuiltinScalarFunction::CreateValidatedRaw( case FunctionKind::kToJsonString: case FunctionKind::kParseJson: case FunctionKind::kJsonType: + case FunctionKind::kStringArray: + case FunctionKind::kInt32: + case FunctionKind::kInt32Array: case FunctionKind::kInt64: + case FunctionKind::kInt64Array: + case FunctionKind::kUint32: + case FunctionKind::kUint32Array: + case FunctionKind::kUint64: + case FunctionKind::kUint64Array: case FunctionKind::kDouble: + case FunctionKind::kDoubleArray: + case FunctionKind::kFloat: + case FunctionKind::kFloatArray: case FunctionKind::kBool: + case FunctionKind::kBoolArray: case FunctionKind::kLaxBool: + case FunctionKind::kLaxBoolArray: + case FunctionKind::kLaxInt32: + case FunctionKind::kLaxInt32Array: 
case FunctionKind::kLaxInt64: + case FunctionKind::kLaxInt64Array: + case FunctionKind::kLaxUint32: + case FunctionKind::kLaxUint32Array: + case FunctionKind::kLaxUint64: + case FunctionKind::kLaxUint64Array: case FunctionKind::kLaxDouble: + case FunctionKind::kLaxDoubleArray: + case FunctionKind::kLaxFloat: + case FunctionKind::kLaxFloatArray: case FunctionKind::kLaxString: + case FunctionKind::kLaxStringArray: case FunctionKind::kJsonArray: case FunctionKind::kJsonObject: case FunctionKind::kJsonRemove: @@ -2316,13 +2557,47 @@ BuiltinScalarFunction::CreateValidatedRaw( CreateCosineDistanceFunction(input_types, output_type)); return f.release(); } + case FunctionKind::kApproxCosineDistance: { + return new ApproxCosineDistanceFunction(kind, output_type); + } case FunctionKind::kEuclideanDistance: { ZETASQL_ASSIGN_OR_RETURN( auto f, CreateEuclideanDistanceFunction(input_types, output_type)); return f.release(); } + case FunctionKind::kApproxEuclideanDistance: { + return new ApproxEuclideanDistanceFunction(kind, output_type); + } + case FunctionKind::kDotProduct: { + ZETASQL_ASSIGN_OR_RETURN(auto f, + CreateDotProductFunction(input_types, output_type)); + return f.release(); + } + case FunctionKind::kApproxDotProduct: { + return new ApproxDotProductFunction(kind, output_type); + } + case FunctionKind::kManhattanDistance: { + ZETASQL_ASSIGN_OR_RETURN( + auto f, CreateManhattanDistanceFunction(input_types, output_type)); + return f.release(); + } + case FunctionKind::kL1Norm: { + ZETASQL_ASSIGN_OR_RETURN(auto f, CreateL1NormFunction(input_types, output_type)); + return f.release(); + } + case FunctionKind::kL2Norm: { + ZETASQL_ASSIGN_OR_RETURN(auto f, CreateL2NormFunction(input_types, output_type)); + return f.release(); + } case FunctionKind::kEditDistance: return new EditDistanceFunction(kind, output_type); + case FunctionKind::kArrayZip: { + ZETASQL_ASSIGN_OR_RETURN(const InlineLambdaExpr* inline_lambda_expr, + GetLambdaArgumentForArrayZip(arguments)); + return new 
ArrayZipFunction(kind, output_type, inline_lambda_expr); + } + case FunctionKind::kMapFromArray: + return BuiltinFunctionRegistry::GetScalarFunction(kind, output_type); default: ZETASQL_RET_CHECK_FAIL() << BuiltinFunctionCatalog::GetDebugNameByKind(kind) << " is not a scalar function"; @@ -2368,43 +2643,32 @@ absl::StatusOr> GetLikePatternRegexp( absl::StatusOr> BuiltinScalarFunction::CreateLikeFunction( FunctionKind kind, const Type* output_type, - const std::vector>& arguments) { + absl::Span> arguments) { ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr regexp, GetLikePatternRegexp(*arguments[1]->value_expr())); - return std::unique_ptr( - new LikeFunction(kind, output_type, std::move(regexp))); -} - -absl::StatusOr> -BuiltinScalarFunction::CreateLikeAnyFunction( - FunctionKind kind, const Type* output_type, - const std::vector>& arguments) { - std::vector> regexp; - for (int i = 1; i < arguments.size(); ++i) { - ZETASQL_ASSIGN_OR_RETURN(regexp.emplace_back(), - GetLikePatternRegexp(*arguments[i]->value_expr())); - } - return std::unique_ptr( - new LikeAnyFunction(kind, output_type, std::move(regexp))); + return std::make_unique(kind, output_type, std::move(regexp)); } absl::StatusOr> -BuiltinScalarFunction::CreateLikeAllFunction( +BuiltinScalarFunction::CreateLikeAnyAllFunction( FunctionKind kind, const Type* output_type, - const std::vector>& arguments) { + absl::Span> arguments) { std::vector> regexp; - for (int i = 1; i < arguments.size(); ++i) { - ZETASQL_ASSIGN_OR_RETURN(regexp.emplace_back(), - GetLikePatternRegexp(*arguments[i]->value_expr())); + if (kind == FunctionKind::kLikeAny || kind == FunctionKind::kNotLikeAny || + kind == FunctionKind::kLikeAll || kind == FunctionKind::kNotLikeAll) { + for (int i = 1; i < arguments.size(); ++i) { + ZETASQL_ASSIGN_OR_RETURN(regexp.emplace_back(), + GetLikePatternRegexp(*arguments[i]->value_expr())); + } } return std::unique_ptr( - new LikeAllFunction(kind, output_type, std::move(regexp))); + new LikeAnyAllFunction(kind, 
output_type, std::move(regexp))); } absl::StatusOr> BuiltinScalarFunction::CreateLikeAnyAllArrayFunction( FunctionKind kind, const Type* output_type, - const std::vector>& arguments) { + absl::Span> arguments) { std::vector> regexp; // The second argument to this function will be an array. @@ -2426,13 +2690,8 @@ BuiltinScalarFunction::CreateLikeAnyAllArrayFunction( } } - if (kind == FunctionKind::kLikeAnyArray) { - return std::make_unique(kind, output_type, - std::move(regexp)); - } else { - return std::make_unique(kind, output_type, - std::move(regexp)); - } + return std::make_unique(kind, output_type, + std::move(regexp)); } absl::StatusOr> @@ -2497,7 +2756,7 @@ absl::StatusOr> CreateRegexp( absl::StatusOr> BuiltinScalarFunction::CreateRegexpFunction( FunctionKind kind, const Type* output_type, - const std::vector>& arguments) { + absl::Span> arguments) { std::vector input_types; input_types.reserve(arguments.size()); for (const auto& expr : arguments) { @@ -2685,7 +2944,8 @@ absl::StatusOr FormatFunction::Eval( ZETASQL_RETURN_IF_ERROR(functions::StringFormatUtf8( args[0].string_value(), values, context->GetLanguageOptions().product_mode(), &output, - &is_null, true)); + &is_null, true, context->GetLanguageOptions().product_mode() == + PRODUCT_EXTERNAL)); Value value; if (is_null) { value = Value::NullString(); @@ -4186,6 +4446,22 @@ absl::StatusOr ArrayMinMaxFunction::Eval( } return Value::Interval(min_value); } + case FCT(FunctionKind::kArrayMin, TYPE_RANGE): { + Value min_value = Value::Null(output_type()); + for (const Value& element : args[0].elements()) { + if (element.is_null()) { + continue; + } + has_non_null = true; + if (min_value.is_null() || element.LessThan(min_value)) { + min_value = element; + } + } + if (!has_non_null) { + return output_null; + } + return min_value; + } // ARRAY_MAX case FCT(FunctionKind::kArrayMax, TYPE_FLOAT): @@ -4406,13 +4682,29 @@ absl::StatusOr ArrayMinMaxFunction::Eval( } return Value::Interval(max_value); } + case 
FCT(FunctionKind::kArrayMax, TYPE_RANGE): { + Value max_value = Value::Null(output_type()); + for (const Value& element : args[0].elements()) { + if (element.is_null()) { + continue; + } + has_non_null = true; + if (max_value.is_null() || max_value.LessThan(element)) { + max_value = element; + } + } + if (!has_non_null) { + return output_null; + } + return max_value; + } default: ZETASQL_RET_CHECK_FAIL(); } } static absl::StatusOr AggregateDoubleArraySumValue( - const std::vector& elements, const Type* output_type) { + absl::Span elements, const Type* output_type) { zetasql_base::ExactFloat sum = 0; bool has_non_null = false; for (const Value& element : elements) { @@ -4433,7 +4725,7 @@ static absl::StatusOr AggregateDoubleArraySumValue( } static absl::StatusOr AggregateInt64ArraySumValue( - const std::vector& elements, const Type* output_type) { + absl::Span elements, const Type* output_type) { __int128 sum = 0; bool has_non_null = false; for (const Value& element : elements) { @@ -4454,7 +4746,7 @@ static absl::StatusOr AggregateInt64ArraySumValue( } static absl::StatusOr AggregateUint64ArraySumValue( - const std::vector& elements, const Type* output_type) { + absl::Span elements, const Type* output_type) { unsigned __int128 sum = 0; bool has_non_null = false; for (const Value& element : elements) { @@ -4474,7 +4766,7 @@ static absl::StatusOr AggregateUint64ArraySumValue( } static absl::StatusOr AggregateNumericArraySumValue( - const std::vector& elements, const Type* output_type) { + absl::Span elements, const Type* output_type) { bool has_non_null = false; NumericValue::SumAggregator aggregator = NumericValue::SumAggregator(); for (const Value& element : elements) { @@ -4494,7 +4786,7 @@ static absl::StatusOr AggregateNumericArraySumValue( } static absl::StatusOr AggregateBigNumericArraySumValue( - const std::vector& elements, const Type* output_type) { + absl::Span elements, const Type* output_type) { bool has_non_null = false; BigNumericValue::SumAggregator 
aggregator = BigNumericValue::SumAggregator(); for (const Value& element : elements) { @@ -4514,7 +4806,7 @@ static absl::StatusOr AggregateBigNumericArraySumValue( } static absl::StatusOr AggregateIntervalArraySumValue( - const std::vector& elements, const Type* output_type) { + absl::Span elements, const Type* output_type) { IntervalValue sum; bool has_non_null = false; IntervalValue::SumAggregator aggregator = IntervalValue::SumAggregator(); @@ -4550,7 +4842,7 @@ absl::Status ValidateNoDoubleOverflow(long double value) { // When the input array contains +/-inf, ARRAY_AVG might return +/-inf or NaN // which means the output cannot be validated by approximate comparison. static absl::StatusOr AggregateDoubleArrayAvgValue( - const std::vector& elements) { + absl::Span elements) { long double avg = 0; int64_t non_null_count = 0; for (const Value& element : elements) { @@ -4580,7 +4872,7 @@ static absl::StatusOr AggregateDoubleArrayAvgValue( } static absl::StatusOr AggregateNumericArrayAvgValue( - const std::vector& elements) { + absl::Span elements) { int64_t non_null_count = 0; NumericValue::SumAggregator aggregator = NumericValue::SumAggregator(); for (const Value& element : elements) { @@ -4598,7 +4890,7 @@ static absl::StatusOr AggregateNumericArrayAvgValue( } static absl::StatusOr AggregateBigNumericArrayAvgValue( - const std::vector& elements) { + absl::Span elements) { int64_t non_null_count = 0; BigNumericValue::SumAggregator aggregator = BigNumericValue::SumAggregator(); for (const Value& element : elements) { @@ -4618,7 +4910,7 @@ static absl::StatusOr AggregateBigNumericArrayAvgValue( } static absl::StatusOr AggregateIntervalArrayAvgValue( - const std::vector& elements, EvaluationContext* context) { + absl::Span elements, EvaluationContext* context) { int64_t non_null_count = 0; IntervalValue::SumAggregator aggregator = IntervalValue::SumAggregator(); for (const Value& element : elements) { @@ -5114,9 +5406,33 @@ class BuiltinAggregateAccumulator : public 
AggregateAccumulator { NumericValue out_numeric_; // Min, Max BigNumericValue out_bignumeric_; // Min, Max Value out_range_; // Min, Max - NumericValue::SumAggregator numeric_aggregator_; // Avg, Sum - BigNumericValue::SumAggregator bignumeric_aggregator_; // Avg, Sum - IntervalValue::SumAggregator interval_aggregator_; // Sum + NumericValue::SumAggregator numeric_aggregator_; // Avg, Sum + BigNumericValue::SumAggregator bignumeric_aggregator_; // Avg, Sum + IntervalValue::SumAggregator interval_aggregator_; // Sum + + // Elementwise aggregates (Elementwise_sum, Elementwise_avg) + std::vector out_double_array_; + std::vector out_exact_float_array_; + std::vector<__int128> out_int128_array_; + std::vector out_uint128_array_; + std::vector out_numeric_array_; + std::vector out_bignumeric_array_; + std::vector out_range_array_; + std::vector numeric_aggregator_array_; + std::vector bignumeric_aggregator_array_; + std::vector interval_aggregator_array_; + // The size of the array argument to an elementwise aggregate. Initially set + // to an invalid value and overwritten when the first non-NULL array is + // encountered. It is a runtime error if the non-NULL arrays within + // a group of rows do not all have the same length. + int array_length_ = -1; + // This tracks the count of non-null elements encountered at a particular + // index during accumulation. This is important in `GetFinalResult`, + // as it signals which indices of the output array should return + // NULL (i.e., when all elements at index `i` are NULL), and also serves + // as the denominator for average computations.
+ std::vector elementwise_non_null_element_tracker_; + NumericValue::VarianceAggregator numeric_variance_aggregator_; // Var, Stddev BigNumericValue::VarianceAggregator bignumeric_variance_aggregator_; // Var, Stddev @@ -5262,6 +5578,7 @@ class UserDefinedAggregateFunction : public AggregateFunctionBody { // This should never happen because we already check for null evaluator // in the algebrizer. ZETASQL_RET_CHECK(evaluator != nullptr); + evaluator->SetEvaluationContext(context); return UserDefinedAggregateAccumulator::Create( std::move(evaluator), output_type_, num_input_fields()); } @@ -5293,7 +5610,7 @@ static absl::Status SetAnonBuilderEpsilon(const Value& arg, T* builder) { } template -static absl::Status InitializeAnonBuilder(const std::vector& args, +static absl::Status InitializeAnonBuilder(absl::Span args, T* builder) { // The last two args represent 'delta' and 'epsilon'. If clamping // bounds are explicitly set, then there will be two additional args @@ -5324,7 +5641,7 @@ static absl::Status InitializeAnonBuilder(const std::vector& args, template <> absl::Status InitializeAnonBuilder<::differential_privacy::Quantiles::Builder>( - const std::vector& args, + absl::Span args, ::differential_privacy::Quantiles::Builder* builder) { // The current implementation always expects 5 arguments (until b/205277450 is // fixed and optional clamping bounds is supported): @@ -5371,8 +5688,8 @@ InitializeAnonBuilder<::differential_privacy::Quantiles::Builder>( // TODO: Plumb the value through rather than relying on the // constant. 
 template <typename T>
-absl::Status InitializeAnonBuilderForArrayFunction(
-    const std::vector<Value>& args, T* builder) {
+absl::Status InitializeAnonBuilderForArrayFunction(absl::Span<const Value> args,
+                                                   T* builder) {
   builder->SetMaxContributionsPerPartition(
       anonymization::kPerUserArrayAggLimit);
   return InitializeAnonBuilder(args, builder);
@@ -5716,6 +6033,23 @@ absl::Status BuiltinAggregateAccumulator::Reset() {
       interval_aggregator_ = IntervalValue::SumAggregator();
       break;

+    // Elementwise_sum and Elementwise_avg
+    case FCT(FunctionKind::kElementwiseSum, TYPE_ARRAY):
+    case FCT(FunctionKind::kElementwiseAvg, TYPE_ARRAY):
+      array_length_ = -1;
+      elementwise_non_null_element_tracker_.clear();
+      out_double_array_.clear();
+      out_exact_float_array_.clear();
+      out_int128_array_.clear();
+      out_uint128_array_.clear();
+      out_numeric_array_.clear();
+      out_bignumeric_array_.clear();
+      out_range_array_.clear();
+      numeric_aggregator_array_.clear();
+      bignumeric_aggregator_array_.clear();
+      interval_aggregator_array_.clear();
+      break;
+
     // Variance and standard deviation.
case FCT(FunctionKind::kStddevPop, TYPE_DOUBLE): case FCT(FunctionKind::kStddevSamp, TYPE_DOUBLE): @@ -5853,7 +6187,7 @@ absl::Status BuiltinAggregateAccumulator::Reset() { } return absl::OkStatus(); -} +} // NOLINT(readability/fn_size) bool BuiltinAggregateAccumulator::Accumulate(const Value& value, bool* stop_accumulation, @@ -6284,6 +6618,190 @@ bool BuiltinAggregateAccumulator::Accumulate(const Value& value, interval_aggregator_.Add(value.interval_value()); break; } + case FCT(FunctionKind::kElementwiseSum, TYPE_ARRAY): { + const Value& array = value; + if (array.is_null()) { + break; + } + // Array is non null, validate its length + if (array_length_ != -1 && array_length_ != array.num_elements()) { + *status = ::zetasql_base::InvalidArgumentErrorBuilder() + << "Elementwise aggregate requires all non-NULL arrays have " + "the same length; found length " + << array_length_ << " and length " << array.num_elements() + << " in the same group"; + return false; + } + // If we encounter any non-order preserving array during accumulation, + // then we should mark the non-determinism bit. 
+ MaybeSetNonDeterministicArrayOutput(array, context_); + if (array_length_ == -1) { + switch (input_type_->AsArray()->element_type()->kind()) { + case TYPE_FLOAT: + case TYPE_DOUBLE: + out_exact_float_array_.resize(array.num_elements(), 0); + break; + case TYPE_INT32: + case TYPE_INT64: + out_int128_array_.resize(array.num_elements(), 0); + break; + case TYPE_UINT32: + case TYPE_UINT64: + out_uint128_array_.resize(array.num_elements(), 0); + break; + case TYPE_NUMERIC: + numeric_aggregator_array_.resize(array.num_elements(), + NumericValue::SumAggregator()); + break; + case TYPE_BIGNUMERIC: + bignumeric_aggregator_array_.resize( + array.num_elements(), BigNumericValue::SumAggregator()); + break; + case TYPE_INTERVAL: + interval_aggregator_array_.resize(array.num_elements(), + IntervalValue::SumAggregator()); + break; + default: + *status = ::zetasql_base::InternalErrorBuilder() + << "Unsupported element type " + << input_type_->AsArray()->element_type()->DebugString() + << " in ELEMENTWISE_SUM"; + return false; + } + array_length_ = array.num_elements(); + elementwise_non_null_element_tracker_.resize(array.num_elements(), 0); + } + for (int i = 0; i < array.num_elements(); ++i) { + const Value& element = array.element(i); + if (element.is_null()) { + continue; + } + elementwise_non_null_element_tracker_[i]++; + switch (input_type_->AsArray()->element_type()->kind()) { + case TYPE_FLOAT: + out_exact_float_array_[i] += element.float_value(); + break; + case TYPE_DOUBLE: + out_exact_float_array_[i] += element.double_value(); + break; + case TYPE_INT32: + out_int128_array_[i] += element.int32_value(); + break; + case TYPE_INT64: + out_int128_array_[i] += element.int64_value(); + break; + case TYPE_UINT32: + out_uint128_array_[i] += element.uint32_value(); + break; + case TYPE_UINT64: + out_uint128_array_[i] += element.uint64_value(); + break; + case TYPE_NUMERIC: + numeric_aggregator_array_[i].Add(element.numeric_value()); + break; + case TYPE_BIGNUMERIC: + 
bignumeric_aggregator_array_[i].Add(element.bignumeric_value()); + break; + case TYPE_INTERVAL: + interval_aggregator_array_[i].Add(element.interval_value()); + break; + default: + break; + } + } + break; + } + case FCT(FunctionKind::kElementwiseAvg, TYPE_ARRAY): { + const Value& array = value; + if (array.is_null()) { + break; + } + // Array is non null, validate its length + if (array_length_ != -1 && array_length_ != array.num_elements()) { + *status = ::zetasql_base::InvalidArgumentErrorBuilder() + << "Elementwise aggregate requires all non-NULL arrays have " + "the same length; found length " + << array_length_ << " and length " << array.num_elements() + << " in the same group"; + return false; + } + // If we encounter any non order preserving array during accumulation, + // then we should mark the non-determinism bit. + MaybeSetNonDeterministicArrayOutput(array, context_); + if (array_length_ == -1) { + switch (input_type_->AsArray()->element_type()->kind()) { + case TYPE_INT32: + case TYPE_UINT32: + case TYPE_INT64: + case TYPE_UINT64: + case TYPE_FLOAT: + case TYPE_DOUBLE: + out_double_array_.resize(array.num_elements(), 0); + break; + case TYPE_NUMERIC: + numeric_aggregator_array_.resize(array.num_elements(), + NumericValue::SumAggregator()); + break; + case TYPE_BIGNUMERIC: + bignumeric_aggregator_array_.resize( + array.num_elements(), BigNumericValue::SumAggregator()); + break; + case TYPE_INTERVAL: + interval_aggregator_array_.resize(array.num_elements(), + IntervalValue::SumAggregator()); + break; + default: + *status = ::zetasql_base::InternalErrorBuilder() + << "Unsupported element type " + << input_type_->AsArray()->element_type()->DebugString() + << " in ELEMENTWISE_AVG"; + return false; + } + array_length_ = array.num_elements(); + elementwise_non_null_element_tracker_.resize(array.num_elements(), 0); + } + for (int i = 0; i < array.num_elements(); ++i) { + const Value& element = array.element(i); + if (element.is_null()) { + continue; + } + 
elementwise_non_null_element_tracker_[i]++; + switch (input_type_->AsArray()->element_type()->kind()) { + case TYPE_INT32: + case TYPE_INT64: + case TYPE_UINT32: + case TYPE_UINT64: + case TYPE_FLOAT: + case TYPE_DOUBLE: + // Iterative algorithm to calculate the average that is less likely + // to overflow in the common case where there are lots of values of + // similar magnitude. + // Note that this mimics the algorithm used by AVG. + long double delta; + if (!functions::Subtract((long double)element.ToDouble(), + out_double_array_[i], &delta, status) || + !functions::Add( + out_double_array_[i], + delta / elementwise_non_null_element_tracker_[i], + &out_double_array_[i], status)) { + return false; + } + break; + case TYPE_NUMERIC: + numeric_aggregator_array_[i].Add(element.numeric_value()); + break; + case TYPE_BIGNUMERIC: + bignumeric_aggregator_array_[i].Add(element.bignumeric_value()); + break; + case TYPE_INTERVAL: + interval_aggregator_array_[i].Add(element.interval_value()); + break; + default: + break; + } + } + break; + } case FCT(FunctionKind::kStringAgg, TYPE_STRING): { if (count_ > 1) { additional_bytes_to_request = delimiter_.size(); @@ -6395,8 +6913,8 @@ absl::StatusOr BuiltinAggregateAccumulator::GetFinalResult( } template -absl::StatusOr ComputePercentileCont( - const std::vector& values_arg, T percentile, bool ignore_nulls) { +absl::StatusOr ComputePercentileCont(absl::Span values_arg, + T percentile, bool ignore_nulls) { ZETASQL_ASSIGN_OR_RETURN(PercentileEvaluator percentile_evalutor, PercentileEvaluator::Create(percentile)); @@ -6429,7 +6947,7 @@ template absl::StatusOr ComputePercentileDisc( const PercentileEvaluator& percentile_evalutor, - const std::vector& values_arg, const Type* type, + absl::Span values_arg, const Type* type, V (Value::*extract_value_fn)() const /* e.g., &Value::double_value */, const ValueCreationFn& value_creation_fn /* e.g., &Value::Double */, bool ignore_nulls, const zetasql::ZetaSqlCollator* collator = nullptr) { @@ 
-6464,7 +6982,7 @@ absl::StatusOr ComputePercentileDisc( template absl::StatusOr ComputePercentileDisc( - const std::vector& values_arg, const Type* type, + absl::Span values_arg, const Type* type, PercentileType percentile, bool ignore_nulls, const zetasql::ZetaSqlCollator* collator) { ZETASQL_ASSIGN_OR_RETURN(PercentileEvaluator percentile_evalutor, @@ -7096,6 +7614,224 @@ absl::StatusOr BuiltinAggregateAccumulator::GetFinalResultInternal( ZETASQL_ASSIGN_OR_RETURN(out_interval_, interval_aggregator_.GetSum()); return Value::Interval(out_interval_); } + case FCT(FunctionKind::kElementwiseSum, TYPE_ARRAY): { + switch (input_type_->AsArray()->element_type()->kind()) { + case TYPE_INT32: + case TYPE_INT64: { + if (count_ == 0) { + return Value::Null(types::Int64ArrayType()); + } + std::vector int64_vals; + for (int i = 0; i < out_int128_array_.size(); ++i) { + if (elementwise_non_null_element_tracker_[i]) { + if (out_int128_array_[i] > std::numeric_limits::max() || + out_int128_array_[i] < std::numeric_limits::min()) { + return ::zetasql_base::OutOfRangeErrorBuilder() << "int64 overflow"; + } + int64_vals.push_back( + Value::Int64(static_cast(out_int128_array_[i]))); + } else { + int64_vals.push_back(Value::NullInt64()); + } + } + return Value::MakeArray(types::Int64ArrayType(), + std::move(int64_vals)); + } + case TYPE_UINT32: + case TYPE_UINT64: { + if (count_ == 0) { + return Value::Null(types::Uint64ArrayType()); + } + std::vector uint64_vals; + for (int i = 0; i < out_uint128_array_.size(); ++i) { + if (elementwise_non_null_element_tracker_[i]) { + if (out_uint128_array_[i] > + std::numeric_limits::max()) { + return ::zetasql_base::OutOfRangeErrorBuilder() << "uint64 overflow"; + } + uint64_vals.push_back( + Value::Uint64(static_cast(out_uint128_array_[i]))); + } else { + uint64_vals.push_back(Value::NullUint64()); + } + } + return Value::MakeArray(types::Uint64ArrayType(), + std::move(uint64_vals)); + } + case TYPE_FLOAT: + case TYPE_DOUBLE: { + if (count_ == 
0) { + return Value::Null(types::DoubleArrayType()); + } + std::vector double_vals; + for (int i = 0; i < out_exact_float_array_.size(); ++i) { + if (elementwise_non_null_element_tracker_[i]) { + if (out_exact_float_array_[i].is_finite() && + (out_exact_float_array_[i] > + std::numeric_limits::max() || + out_exact_float_array_[i] < + -std::numeric_limits::max())) { + return ::zetasql_base::OutOfRangeErrorBuilder() << "double overflow"; + } + double_vals.push_back( + Value::Double(out_exact_float_array_[i].ToDouble())); + } else { + double_vals.push_back(Value::NullDouble()); + } + } + return Value::MakeArray(types::DoubleArrayType(), + std::move(double_vals)); + } + case TYPE_NUMERIC: { + if (count_ == 0) { + return Value::Null(types::NumericArrayType()); + } + std::vector numeric_vals; + for (int i = 0; i < numeric_aggregator_array_.size(); ++i) { + if (elementwise_non_null_element_tracker_[i]) { + ZETASQL_ASSIGN_OR_RETURN(NumericValue val, + numeric_aggregator_array_[i].GetSum()); + numeric_vals.push_back(Value::Numeric(val)); + } else { + numeric_vals.push_back(Value::NullNumeric()); + } + } + return Value::MakeArray(types::NumericArrayType(), + std::move(numeric_vals)); + } + case TYPE_BIGNUMERIC: { + if (count_ == 0) { + return Value::Null(types::BigNumericArrayType()); + } + std::vector bignumeric_vals; + for (int i = 0; i < bignumeric_aggregator_array_.size(); ++i) { + if (elementwise_non_null_element_tracker_[i]) { + ZETASQL_ASSIGN_OR_RETURN(BigNumericValue val, + bignumeric_aggregator_array_[i].GetSum()); + bignumeric_vals.push_back(Value::BigNumeric(val)); + } else { + bignumeric_vals.push_back(Value::NullBigNumeric()); + } + } + return Value::MakeArray(types::BigNumericArrayType(), + std::move(bignumeric_vals)); + } + case TYPE_INTERVAL: { + if (count_ == 0) { + return Value::Null(types::IntervalArrayType()); + } + std::vector interval_vals; + for (int i = 0; i < interval_aggregator_array_.size(); ++i) { + if (elementwise_non_null_element_tracker_[i]) { 
+ ZETASQL_ASSIGN_OR_RETURN(IntervalValue val, + interval_aggregator_array_[i].GetSum()); + interval_vals.push_back(Value::Interval(val)); + } else { + interval_vals.push_back(Value::NullInterval()); + } + } + return Value::MakeArray(types::IntervalArrayType(), + std::move(interval_vals)); + } + default: { + return ::zetasql_base::InternalErrorBuilder() + << "Unsupported element type " + << input_type_->AsArray()->element_type()->DebugString() + << " in ELEMENTWISE_SUM"; + } + } + } + case FCT(FunctionKind::kElementwiseAvg, TYPE_ARRAY): { + switch (input_type_->AsArray()->element_type()->kind()) { + case TYPE_INT32: + case TYPE_INT64: + case TYPE_UINT32: + case TYPE_UINT64: + case TYPE_FLOAT: + case TYPE_DOUBLE: { + ZETASQL_RET_CHECK_GE(count_, 0); + if (count_ < 0) { + return Value::Null(types::DoubleArrayType()); + } + std::vector double_vals; + for (int i = 0; i < out_double_array_.size(); ++i) { + if (elementwise_non_null_element_tracker_[i]) { + ZETASQL_RET_CHECK_OK(ValidateNoDoubleOverflow(out_double_array_[i])); + double_vals.push_back(Value::Double(out_double_array_[i])); + } else { + double_vals.push_back(Value::NullDouble()); + } + } + return count_ > 0 ? 
Value::MakeArray(types::DoubleArrayType(), + std::move(double_vals)) + : Value::Null(types::DoubleArrayType()); + } + case TYPE_NUMERIC: { + if (count_ == 0) { + return Value::Null(types::NumericArrayType()); + } + std::vector numeric_vals; + for (int i = 0; i < numeric_aggregator_array_.size(); ++i) { + if (elementwise_non_null_element_tracker_[i]) { + ZETASQL_ASSIGN_OR_RETURN(NumericValue val, + numeric_aggregator_array_[i].GetAverage( + elementwise_non_null_element_tracker_[i])); + numeric_vals.push_back(Value::Numeric(val)); + } else { + numeric_vals.push_back(Value::NullNumeric()); + } + } + return Value::MakeArray(types::NumericArrayType(), + std::move(numeric_vals)); + } + case TYPE_BIGNUMERIC: { + if (count_ == 0) { + return Value::Null(types::BigNumericArrayType()); + } + std::vector bignumeric_vals; + for (int i = 0; i < bignumeric_aggregator_array_.size(); ++i) { + if (elementwise_non_null_element_tracker_[i]) { + ZETASQL_ASSIGN_OR_RETURN(BigNumericValue val, + bignumeric_aggregator_array_[i].GetAverage( + elementwise_non_null_element_tracker_[i])); + bignumeric_vals.push_back(Value::BigNumeric(val)); + } else { + bignumeric_vals.push_back(Value::NullBigNumeric()); + } + } + return Value::MakeArray(types::BigNumericArrayType(), + std::move(bignumeric_vals)); + } + case TYPE_INTERVAL: { + if (count_ == 0) { + return Value::Null(types::IntervalArrayType()); + } + bool round_to_micros = + GetTimestampScale(context_->GetLanguageOptions()) == + functions::TimestampScale::kMicroseconds; + std::vector interval_vals; + for (int i = 0; i < interval_aggregator_array_.size(); ++i) { + if (elementwise_non_null_element_tracker_[i]) { + ZETASQL_ASSIGN_OR_RETURN(IntervalValue val, + interval_aggregator_array_[i].GetAverage( + elementwise_non_null_element_tracker_[i], + round_to_micros)); + interval_vals.push_back(Value::Interval(val)); + } else { + interval_vals.push_back(Value::NullInterval()); + } + } + return Value::MakeArray(types::IntervalArrayType(), + 
std::move(interval_vals)); + } + default: { + return ::zetasql_base::InternalErrorBuilder() + << "Unsupported element type " + << input_type_->AsArray()->element_type()->DebugString() + << " in ELEMENTWISE_AVG"; + } + } + } // Variance and Stddev case FCT(FunctionKind::kStddevPop, TYPE_DOUBLE): { if (count_ == 0) return Value::NullDouble(); @@ -7632,28 +8368,6 @@ BinaryStatFunction::CreateAccumulator(absl::Span args, } namespace { -absl::StatusOr LikeImpl(const Value& lhs, const Value& rhs, - const RE2* regexp) { - if (lhs.is_null() || rhs.is_null()) { - return Value::Null(types::BoolType()); - } - - const std::string& text = - lhs.type_kind() == TYPE_STRING ? lhs.string_value() : lhs.bytes_value(); - - if (regexp != nullptr) { - // Regexp is precompiled - return Value::Bool(RE2::FullMatch(text, *regexp)); - } else { - // Regexp is not precompiled, compile it on the fly - const std::string& pattern = - rhs.type_kind() == TYPE_STRING ? rhs.string_value() : rhs.bytes_value(); - std::unique_ptr regexp; - ZETASQL_RETURN_IF_ERROR( - functions::CreateLikeRegexp(pattern, lhs.type_kind(), ®exp)); - return Value::Bool(RE2::FullMatch(text, *regexp)); - } -} bool IsTrue(const Value& value) { return !value.is_null() && value.bool_value(); @@ -7663,133 +8377,144 @@ bool IsFalse(const Value& value) { return !value.is_null() && !value.bool_value(); } +absl::StatusOr +GetQuantifiedLikeOperationType(FunctionKind kind) { + switch (kind) { + case FunctionKind::kLike: + case FunctionKind::kLikeWithCollation: + return QuantifiedLikeEvaluationParams::OperationType::kLike; + case FunctionKind::kLikeAny: + case FunctionKind::kNotLikeAny: + case FunctionKind::kLikeAnyWithCollation: + case FunctionKind::kNotLikeAnyWithCollation: + case FunctionKind::kLikeAnyArray: + case FunctionKind::kLikeAnyArrayWithCollation: + case FunctionKind::kNotLikeAnyArray: + case FunctionKind::kNotLikeAnyArrayWithCollation: + return QuantifiedLikeEvaluationParams::OperationType::kLikeAny; + case 
FunctionKind::kLikeAll: + case FunctionKind::kNotLikeAll: + case FunctionKind::kLikeAllWithCollation: + case FunctionKind::kNotLikeAllWithCollation: + case FunctionKind::kLikeAllArray: + case FunctionKind::kLikeAllArrayWithCollation: + case FunctionKind::kNotLikeAllArray: + case FunctionKind::kNotLikeAllArrayWithCollation: + return QuantifiedLikeEvaluationParams::OperationType::kLikeAll; + default: + return ::zetasql_base::InvalidArgumentErrorBuilder() + << "Expected some variant of like function. Found: " + << static_cast(kind); + } +} + } // namespace absl::StatusOr LikeFunction::Eval( absl::Span params, absl::Span args, EvaluationContext* context) const { - ABSL_CHECK_EQ(2, args.size()); - return LikeImpl(args[0], args[1], regexp_.get()); -} - -absl::StatusOr LikeAnyFunction::Eval( - absl::Span params, absl::Span args, - EvaluationContext* context) const { - ABSL_CHECK_LE(1, args.size()); - ABSL_CHECK_EQ(regexp_.size(), args.size() - 1); - - if (args[0].is_null()) { - return Value::Null(output_type()); + ZETASQL_ASSIGN_OR_RETURN(QuantifiedLikeEvaluationParams::OperationType operation_type, + GetQuantifiedLikeOperationType(kind())); + if (has_collation_) { + ZETASQL_RET_CHECK_GE(args.size(), 3) + << "LIKE with collation has 3 or more arguments"; + QuantifiedLikeEvaluationParams quantified_like_eval_params( + /*search_value=*/args[1], + /*pattern_elements=*/args.subspan(2), + /*operation_type=*/operation_type, + /*is_not=*/false, + /*collation_str=*/args[0].string_value()); + return EvaluateQuantifiedLike(quantified_like_eval_params); } - Value result = Value::Bool(false); - - for (int i = 1; i < args.size(); ++i) { - ZETASQL_ASSIGN_OR_RETURN(Value local_result, - LikeImpl(args[0], args[i], regexp_[i - 1].get())); - if (IsTrue(local_result)) { - return local_result; - } else if (!IsTrue(result) && !IsFalse(local_result)) { - result = local_result; - } - } - return result; + ABSL_CHECK_EQ(2, args.size()); + QuantifiedLikeEvaluationParams 
quantified_like_eval_params( + /*search_value=*/args[0], + /*pattern_elements=*/args.subspan(1), + /*pattern_regex=*/®exp_, + /*operation_type=*/QuantifiedLikeEvaluationParams::kLike, + /*is_not=*/false); + return EvaluateQuantifiedLike(quantified_like_eval_params); } -absl::StatusOr LikeAllFunction::Eval( +absl::StatusOr LikeAnyAllFunction::Eval( absl::Span params, absl::Span args, EvaluationContext* context) const { - ABSL_CHECK_LE(1, args.size()); - ABSL_CHECK_EQ(regexp_.size(), args.size() - 1); - - if (args[0].is_null()) { - return Value::Null(output_type()); - } - - Value result = Value::Bool(true); - - for (int i = 1; i < args.size(); ++i) { - ZETASQL_ASSIGN_OR_RETURN(Value local_result, - LikeImpl(args[0], args[i], regexp_[i - 1].get())); - if (!IsFalse(result) && !IsTrue(local_result)) { - result = local_result; - } - } - return result; -} - -absl::StatusOr LikeAnyArrayFunction::Eval( + ZETASQL_ASSIGN_OR_RETURN(QuantifiedLikeEvaluationParams::OperationType operation_type, + GetQuantifiedLikeOperationType(kind())); + if (has_collation_) { + ZETASQL_RET_CHECK_GE(args.size(), 3) + << "[NOT] LIKE ANY|ALL with collation 3 or more arguments"; + QuantifiedLikeEvaluationParams quantified_like_eval_params( + /*search_value=*/args[1], + /*pattern_elements=*/args.subspan(2), + /*operation_type=*/operation_type, + /*is_not=*/is_not_, + /*collation_str=*/args[0].string_value()); + return EvaluateQuantifiedLike(quantified_like_eval_params); + } + + ZETASQL_RET_CHECK_LE(1, args.size()); + ZETASQL_RET_CHECK_EQ(regexp_.size(), args.size() - 1); + QuantifiedLikeEvaluationParams quantified_like_eval_params( + /*search_value=*/args[0], + /*pattern_elements=*/args.subspan(1), + /*pattern_regex=*/®exp_, + /*operation_type=*/operation_type, + /*is_not=*/is_not_); + return EvaluateQuantifiedLike(quantified_like_eval_params); +} + +absl::StatusOr LikeAnyAllArrayFunction::Eval( absl::Span params, absl::Span args, EvaluationContext* context) const { - 
ZETASQL_RET_CHECK_EQ(args.size(), 2) - << "LIKE ANY with UNNEST has exactly 2 arguments"; - - // Return FALSE if the patterns array is NULL or empty and NULL if the search - // input is NULL - if (args[1].is_null() || args[1].is_empty_array()) { - return Value::Bool(false); - } - if (args[0].is_null()) { - return Value::Null(output_type()); + const Value* search_value; + const Value* pattern_elements; + Value collation_str = Value::String(""); + if (has_collation_) { + ZETASQL_RET_CHECK_EQ(args.size(), 3) + << "[NOT] LIKE ANY with UNNEST and collation has exactly 3 arguments"; + collation_str = args[0]; + search_value = &args[1]; + pattern_elements = &args[2]; + } else { + ZETASQL_RET_CHECK_EQ(args.size(), 2) + << "[NOT] LIKE ANY with UNNEST has exactly 2 arguments"; + search_value = &args[0]; + pattern_elements = &args[1]; } - // For cases with the rhs is a subquery expression creating an ARRAY, the - // number of regexps will be less than the number of elements and the regexp - // for each element will be generated during execution - ZETASQL_RET_CHECK_LE(regexp_.size(), args[1].num_elements()) - << "The number of regular expressions should be less than or equal to" - "the number of arguments in the pattern list"; - - Value result = Value::Bool(false); + ZETASQL_ASSIGN_OR_RETURN(QuantifiedLikeEvaluationParams::OperationType operation_type, + GetQuantifiedLikeOperationType(kind())); - for (int i = 0; i < args[1].num_elements(); ++i) { - const RE2* current_regexp = i < regexp_.size() ? 
regexp_[i].get() : nullptr; - ZETASQL_ASSIGN_OR_RETURN(Value local_result, - LikeImpl(args[0], args[1].element(i), current_regexp)); - if (IsTrue(local_result)) { - return local_result; - } else if (!IsTrue(result) && !IsFalse(local_result)) { - result = local_result; + // If the patterns array is NULL or empty then short circuit and return - + // FALSE for like any + // TRUE for like all + if (pattern_elements->is_null() || pattern_elements->is_empty_array()) { + if (operation_type == + QuantifiedLikeEvaluationParams::OperationType::kLikeAny) { + return Value::Bool(false); + } else if (operation_type == + QuantifiedLikeEvaluationParams::OperationType::kLikeAll) { + return Value::Bool(true); } } - return result; -} -absl::StatusOr LikeAllArrayFunction::Eval( - absl::Span params, absl::Span args, - EvaluationContext* context) const { - ZETASQL_RET_CHECK_EQ(args.size(), 2) - << "LIKE ANY with UNNEST has exactly 2 arguments"; - - // Return TRUE if the patterns array is NULL or empty and NULL if the search - // input is NULL - if (args[1].is_null() || args[1].is_empty_array()) { - return Value::Bool(true); - } - if (args[0].is_null()) { + // Short circuit if the search value is NULL. + if (search_value->is_null()) { return Value::Null(output_type()); } - // For cases with the rhs is a subquery expression creating an ARRAY, the - // number of regexps will be less than the number of elements and the regexp - // for each element will be generated during execution - ZETASQL_RET_CHECK_LE(regexp_.size(), args[1].num_elements()) - << "The number of regular expressions should be less than or equal to" - "the number of arguments in the pattern list"; - - Value result = Value::Bool(true); - - for (int i = 0; i < args[1].num_elements(); ++i) { - // If there is not a precomputed regexp for a pattern, then a nullptr can - // be passed to LikeImpl() to compute the regexp during execution - const RE2* current_regexp = i < regexp_.size() ? 
regexp_[i].get() : nullptr; - ZETASQL_ASSIGN_OR_RETURN(Value local_result, - LikeImpl(args[0], args[1].element(i), current_regexp)); - if (!IsFalse(result) && !IsTrue(local_result)) { - result = local_result; - } + if (has_collation_) { + QuantifiedLikeEvaluationParams quantified_like_eval_params( + *search_value, pattern_elements->elements(), operation_type, is_not_, + collation_str.string_value()); + return EvaluateQuantifiedLike(quantified_like_eval_params); } - return result; + QuantifiedLikeEvaluationParams quantified_like_eval_params( + *search_value, pattern_elements->elements(), ®exp_, operation_type, + is_not_); + return EvaluateQuantifiedLike(quantified_like_eval_params); } bool BitwiseFunction::Eval(absl::Span params, @@ -10206,11 +10931,15 @@ absl::StatusOr IntervalFunction::Eval( IntervalValue interval; switch (kind()) { case FunctionKind::kIntervalCtor: { + functions::TimestampScale scale = + GetTimestampScale(context->GetLanguageOptions()); + bool allow_nanos = scale == functions::TimestampScale::kNanoseconds; ZETASQL_ASSIGN_OR_RETURN( interval, IntervalValue::FromInteger( args[0].int64_value(), - static_cast(args[1].enum_value()))); + static_cast(args[1].enum_value()), + allow_nanos)); break; } case FunctionKind::kMakeInterval: { @@ -10298,7 +11027,8 @@ bool UserDefinedScalarFunction::Eval(absl::Span params, absl::Span args, EvaluationContext* context, Value* result, absl::Status* status) const { - auto status_or_result = evaluator_(args); + ZETASQL_DCHECK_OK(*status); + auto status_or_result = evaluator_(args, *context); if (!status_or_result.ok()) { *status = status_or_result.status(); return false; @@ -10814,7 +11544,7 @@ absl::Status NthValueFunction::Eval( // Returns the value at 'offset' in 'arg_values' if the offset is within // the bound, otherwise returns 'default_value'. 
-static Value GetOutputAtOffset(int offset, const std::vector& arg_values, +static Value GetOutputAtOffset(int offset, absl::Span arg_values, const Value& default_value) { ABSL_DCHECK(!arg_values.empty()); if (offset < 0 || offset >= arg_values.size()) { @@ -11328,6 +12058,27 @@ absl::StatusOr CosineDistanceFunctionSparseStringKey::Eval( return result; } +absl::StatusOr ApproxCosineDistanceFunction::Eval( + absl::Span params, absl::Span args, + EvaluationContext* context) const { + ZETASQL_RET_CHECK_GE(args.size(), 2); + ZETASQL_RET_CHECK_LE(args.size(), 3); + if (args.size() == 3) { + return ::zetasql_base::InvalidArgumentErrorBuilder() + << "Optional argument `options` is not supported by the ZetaSQL " + << "reference implementation."; + } + if (HasNulls(args.subspan(0, 2))) { + return Value::Null(output_type()); + } + // Approximate distance functions are nondeterministic. + context->SetNonDeterministicOutput(); + ZETASQL_ASSIGN_OR_RETURN(Value result, + functions::CosineDistanceDense(args[0], args[1]), + _.With(&DistanceFunctionResultConverter)); + return result; +} + absl::StatusOr EuclideanDistanceFunctionDense::Eval( absl::Span params, absl::Span args, EvaluationContext* context) const { @@ -11368,6 +12119,97 @@ absl::StatusOr EuclideanDistanceFunctionSparseStringKey::Eval( return result; } +absl::StatusOr ApproxEuclideanDistanceFunction::Eval( + absl::Span params, absl::Span args, + EvaluationContext* context) const { + ZETASQL_RET_CHECK_GE(args.size(), 2); + ZETASQL_RET_CHECK_LE(args.size(), 3); + if (args.size() == 3) { + return ::zetasql_base::InvalidArgumentErrorBuilder() + << "Optional argument `options` is not supported by the ZetaSQL " + << "reference implementation."; + } + if (HasNulls(args.subspan(0, 2))) { + return Value::Null(output_type()); + } + // Approximate distance functions are nondeterministic. 
+ context->SetNonDeterministicOutput(); + ZETASQL_ASSIGN_OR_RETURN(Value result, + functions::EuclideanDistanceDense(args[0], args[1]), + _.With(&DistanceFunctionResultConverter)); + return result; +} + +absl::StatusOr DotProductFunction::Eval( + absl::Span params, absl::Span args, + EvaluationContext* context) const { + ZETASQL_RET_CHECK_EQ(args.size(), 2); + if (HasNulls(args)) { + return Value::Null(output_type()); + } + for (const Value& arg : args) { + MaybeSetNonDeterministicArrayOutput(arg, context); + } + ZETASQL_ASSIGN_OR_RETURN(Value result, functions::DotProduct(args[0], args[1])); + return result; +} + +absl::StatusOr ApproxDotProductFunction::Eval( + absl::Span params, absl::Span args, + EvaluationContext* context) const { + ZETASQL_RET_CHECK_GE(args.size(), 2); + ZETASQL_RET_CHECK_LE(args.size(), 3); + if (args.size() == 3) { + return ::zetasql_base::InvalidArgumentErrorBuilder() + << "Optional argument `options` is not supported by the ZetaSQL " + << "reference implementation."; + } + if (HasNulls(args.subspan(0, 2))) { + return Value::Null(output_type()); + } + // Approximate distance functions are nondeterministic. 
+ context->SetNonDeterministicOutput(); + ZETASQL_ASSIGN_OR_RETURN(Value result, functions::DotProduct(args[0], args[1])); + return result; +} + +absl::StatusOr ManhattanDistanceFunction::Eval( + absl::Span params, absl::Span args, + EvaluationContext* context) const { + ZETASQL_RET_CHECK_EQ(args.size(), 2); + if (HasNulls(args)) { + return Value::Null(output_type()); + } + for (const Value& arg : args) { + MaybeSetNonDeterministicArrayOutput(arg, context); + } + ZETASQL_ASSIGN_OR_RETURN(Value result, + functions::ManhattanDistance(args[0], args[1])); + return result; +} + +absl::StatusOr L1NormFunction::Eval( + absl::Span params, absl::Span args, + EvaluationContext* context) const { + ZETASQL_RET_CHECK_EQ(args.size(), 1); + if (HasNulls(args)) { + return Value::Null(output_type()); + } + ZETASQL_ASSIGN_OR_RETURN(Value result, functions::L1Norm(args[0])); + return result; +} + +absl::StatusOr L2NormFunction::Eval( + absl::Span params, absl::Span args, + EvaluationContext* context) const { + ZETASQL_RET_CHECK_EQ(args.size(), 1); + if (HasNulls(args)) { + return Value::Null(output_type()); + } + ZETASQL_ASSIGN_OR_RETURN(Value result, functions::L2Norm(args[0])); + return result; +} + absl::StatusOr EditDistanceFunction::Eval( absl::Span params, absl::Span args, EvaluationContext* context) const { @@ -11406,4 +12248,213 @@ absl::StatusOr EditDistanceFunction::Eval( return Value::Int64(result); } +// Returns true if the all elements of the given `array` are equal, or if the +// array is empty/contains only a single element. `array.type()->IsArray()` must +// be true. +static bool ArrayElementsAreEqual(const Value& array) { + for (int i = 1; i < array.num_elements(); ++i) { + if (array.element(i - 1) != array.element(i)) { + return false; + } + } + return true; +} + +// Returns whether the ARRAY_ZIP function call with the given input arrays +// `array_values` produce non-deterministic results. 
+static absl::StatusOr ArrayZipResultIsNonDeterministic( + const absl::Span array_values) { + for (const Value& array_value : array_values) { + ZETASQL_RET_CHECK(array_value.type()->IsArray()); + if (array_value.is_null() || array_value.num_elements() <= 1) { + continue; + } + if (ArrayElementsAreEqual(array_value)) { + continue; + } + // If one array does not have a defined order, the result of ARRAY_ZIP is + // non-deterministic. Note strictly speaking this is neither a necessary or + // sufficient condition for the lambda signatures: + // + // (1) If the lambda body does not reference the kIgnoresOrder arrays, the + // result can be deterministic. + // (2) The result is non-deterministic if the lambda body itself is + // non-deterministic. + // + // We don't have a good way to implement (2), and for simplicity we don't + // implement (1) either. + if (InternalValue::GetOrderKind(array_value) == + InternalValue::kIgnoresOrder) { + return true; + } + } + return false; +} + +absl::StatusOr ArrayZipFunction::Eval( + absl::Span params, absl::Span args, + EvaluationContext* context) const { + ZETASQL_RET_CHECK(output_type()->IsArray()); + if (HasNulls(args)) { + // Returns NULL as long as one input argument is NULL, even if the + // `array_zip_mode` is STRICT, meaning no array length validation will be + // performed. 
+    return Value::Null(output_type());
+  }
+  ZETASQL_ASSIGN_OR_RETURN(CategorizedArguments categorized_arguments,
+                   GetCategorizedArguments(args));
+  ZETASQL_ASSIGN_OR_RETURN(int zipped_array_length,
+                   GetZippedArrayLength(categorized_arguments.arrays,
+                                        categorized_arguments.array_zip_mode));
+  ZETASQL_ASSIGN_OR_RETURN(
+      bool is_non_deterministic,
+      ArrayZipResultIsNonDeterministic(categorized_arguments.arrays));
+  if (is_non_deterministic) {
+    context->SetNonDeterministicOutput();
+  }
+  if (lambda_ == nullptr) {
+    return EvalNoLambda(categorized_arguments, zipped_array_length);
+  }
+  LambdaEvaluationContext lambda_context(params, context);
+  return EvalLambda(categorized_arguments, zipped_array_length, lambda_context);
+}
+
+absl::StatusOr<ArrayZipFunction::CategorizedArguments>
+ArrayZipFunction::GetCategorizedArguments(absl::Span<const Value> args) const {
+  // Must have at least two arrays and one array_zip_mode arg.
+  ZETASQL_RET_CHECK_GE(args.size(), 3);
+  ZETASQL_RET_CHECK(args.back().type()->IsEnum());
+  CategorizedArguments categorized_arguments = {
+      .arrays = args.subspan(0, args.size() - 1),
+      .array_zip_mode = static_cast<ArrayZipEnums::ArrayZipMode>(
+          args.back().enum_value()),
+  };
+  for (const Value& array_value : categorized_arguments.arrays) {
+    ZETASQL_RET_CHECK(array_value.type()->IsArray());
+  }
+  return categorized_arguments;
+}
+
+absl::StatusOr<int> ArrayZipFunction::GetZippedArrayLength(
+    absl::Span<const Value> arrays,
+    ArrayZipEnums::ArrayZipMode array_zip_mode) const {
+  switch (array_zip_mode) {
+    case ArrayZipEnums::PAD:
+      return MaxArrayLength(arrays);
+    case ArrayZipEnums::TRUNCATE:
+      return MinArrayLength(arrays);
+    case ArrayZipEnums::STRICT:
+      if (!EqualArrayLength(arrays)) {
+        // The error message should stay the same as the error message in the
+        // rewrite template in builtin_function_array.cc.
+        return absl::OutOfRangeError(
+            "Unequal array length in ARRAY_ZIP using STRICT mode");
+      }
+      return arrays[0].num_elements();
+    case ArrayZipEnums::ARRAY_ZIP_MODE_INVALID:
+      ZETASQL_RET_CHECK_FAIL();
+  }
+}
+
+absl::StatusOr<Value> ArrayZipFunction::EvalNoLambda(
+    const CategorizedArguments& args, int zipped_array_length) const {
+  ZETASQL_RET_CHECK_EQ(lambda_, nullptr);
+  absl::Span<const Value> arrays = args.arrays;
+  ZETASQL_RET_CHECK_GE(arrays.size(), 2);
+  ZETASQL_RET_CHECK(output_type()->AsArray()->element_type()->IsStruct());
+
+  // The checks in `Value::MakeStruct` and `Value::MakeArray` will implicitly
+  // verify that the input element types match the output element type, i.e.
+  // - The number of fields of the result struct equals the number of input
+  //   arrays.
+  // - The i-th field has the same type as the element type of the i-th array.
+  // Signatures without lambda arguments must output ARRAY<STRUCT>.
+  const StructType* zipped_struct_type =
+      output_type()->AsArray()->element_type()->AsStruct();
+  std::vector<Value> elements(zipped_array_length);
+  for (int i = 0; i < elements.size(); ++i) {
+    ZETASQL_ASSIGN_OR_RETURN(elements[i], ToStructValue(zipped_struct_type, args.arrays,
+                                                /*element_index=*/i));
+  }
+  return Value::MakeArray(output_type()->AsArray(), elements);
+}
+
+absl::StatusOr<Value> ArrayZipFunction::EvalLambda(
+    const CategorizedArguments& args, int zipped_array_length,
+    LambdaEvaluationContext& lambda_context) const {
+  ZETASQL_RET_CHECK_NE(lambda_, nullptr);
+  absl::Span<const Value> arrays = args.arrays;
+  ZETASQL_RET_CHECK_GE(arrays.size(), 2);
+
+  std::vector<Value> elements(zipped_array_length);
+  for (int i = 0; i < elements.size(); ++i) {
+    ZETASQL_ASSIGN_OR_RETURN(elements[i],
+                     ToLambdaReturnValue(arrays,
+                                         /*element_index=*/i, lambda_context));
+  }
+  return Value::MakeArray(output_type()->AsArray(), elements);
+}
+
+absl::StatusOr<Value> ArrayZipFunction::ToStructValue(
+    const StructType* struct_type, absl::Span<const Value> arrays,
+    int element_index) const {
+  ZETASQL_RET_CHECK_EQ(struct_type->num_fields(), arrays.size());
+  std::vector<Value> fields;
+  fields.reserve(arrays.size());
+  for (int i = 0; i < struct_type->num_fields(); ++i) {
+    const Value array_value = arrays[i];
+    if (element_index < array_value.num_elements()) {
+      fields.push_back(array_value.element(element_index));
+    } else {
+      // Pad NULL for shorter arrays. This only happens when the array_zip_mode
+      // is `PAD`.
+      fields.push_back(Value::Null(struct_type->field(i).type));
+    }
+  }
+  return Value::MakeStruct(struct_type, fields);
+}
+
+absl::StatusOr<Value> ArrayZipFunction::ToLambdaReturnValue(
+    absl::Span<const Value> arrays, int element_index,
+    LambdaEvaluationContext& lambda_context) const {
+  std::vector<Value> lambda_args;
+  lambda_args.reserve(arrays.size());
+  for (const Value& array : arrays) {
+    if (element_index < array.num_elements()) {
+      lambda_args.push_back(array.element(element_index));
+    } else {
+      // Pad NULL for shorter arrays. This happens only when the array_zip_mode
+      // is `PAD`.
+      lambda_args.push_back(
+          Value::Null(array.type()->AsArray()->element_type()));
+    }
+  }
+  return lambda_context.EvaluateLambda(lambda_, lambda_args);
+}
+
+int ArrayZipFunction::MaxArrayLength(absl::Span<const Value> arrays) const {
+  int max_array_length = 0;
+  for (const Value& array_value : arrays) {
+    max_array_length = std::max(max_array_length, array_value.num_elements());
+  }
+  return max_array_length;
+}
+
+int ArrayZipFunction::MinArrayLength(absl::Span<const Value> arrays) const {
+  int min_array_length = arrays[0].num_elements();
+  for (const Value& array_value : arrays) {
+    min_array_length = std::min(min_array_length, array_value.num_elements());
+  }
+  return min_array_length;
+}
+
+bool ArrayZipFunction::EqualArrayLength(absl::Span<const Value> arrays) const {
+  for (int i = 1; i < arrays.size(); ++i) {
+    if (arrays[i].num_elements() != arrays[i - 1].num_elements()) {
+      return false;
+    }
+  }
+  return true;
+}
+
 }  // namespace zetasql
diff --git a/zetasql/reference_impl/function.h b/zetasql/reference_impl/function.h
index 07c3ded2a..7221a56da 100644
--- a/zetasql/reference_impl/function.h
+++ b/zetasql/reference_impl/function.h
@@ -30,6 +30,7 @@
 #include "google/protobuf/descriptor.h"
 #include "zetasql/public/function.h"
 #include "zetasql/public/function_signature.h"
+#include "zetasql/public/functions/array_zip_mode.pb.h"
 #include "zetasql/public/functions/date_time_util.h"
 #include "zetasql/public/functions/regexp.h"
 #include "zetasql/public/language_options.h"
@@ -108,6 +109,8 @@ enum class FunctionKind {
   kSum,
   kVarPop,
   kVarSamp,
+  kElementwiseSum,
+  kElementwiseAvg,
   // Anonymization functions (broken link)
   kAnonSum,
   kAnonSumWithReportProto,
@@ -128,6 +131,8 @@
   kDifferentialPrivacyQuantiles,
   // Exists function
   kExists,
+  // GROUPING function
+  kGrouping,
   // IsNull function
   kIsNull,
   kIsTrue,
@@ -137,11 +142,21 @@
   kLike,
   kLikeWithCollation,
   kLikeAny,
+  kNotLikeAny,
   kLikeAnyWithCollation,
+  kNotLikeAnyWithCollation,
   kLikeAll,
+  kNotLikeAll,
   kLikeAllWithCollation,
+  kNotLikeAllWithCollation,
   kLikeAnyArray,
+  kLikeAnyArrayWithCollation,
+  kNotLikeAnyArray,
+  kNotLikeAnyArrayWithCollation,
   kLikeAllArray,
+  kLikeAllArrayWithCollation,
+  kNotLikeAllArray,
+  kNotLikeAllArrayWithCollation,
   // BitCast functions
   kBitCastToInt32,
   kBitCastToInt64,
@@ -238,6 +253,7 @@ enum class FunctionKind {
   kArrayFind,
   kArrayOffsets,
   kArrayFindAll,
+  kArrayZip,
   // Proto map functions. Like array functions, the map functions must use
   // MaybeSetNonDeterministicArrayOutput.
@@ -257,14 +273,38 @@
   kToJson,
   kToJsonString,
   kParseJson,
+  kStringArray,
+  kInt32,
+  kInt32Array,
   kInt64,
+  kInt64Array,
+  kUint32,
+  kUint32Array,
+  kUint64,
+  kUint64Array,
   kDouble,
+  kDoubleArray,
+  kFloat,
+  kFloatArray,
   kBool,
+  kBoolArray,
   kJsonType,
   kLaxBool,
+  kLaxBoolArray,
+  kLaxInt32,
+  kLaxInt32Array,
   kLaxInt64,
+  kLaxInt64Array,
+  kLaxUint32,
+  kLaxUint32Array,
+  kLaxUint64,
+  kLaxUint64Array,
   kLaxDouble,
+  kLaxDoubleArray,
+  kLaxFloat,
+  kLaxFloatArray,
   kLaxString,
+  kLaxStringArray,
   kJsonArray,
   kJsonObject,
   kJsonRemove,
@@ -457,9 +497,20 @@
   kGenerateRangeArray,
   kRangeContains,
+  // Distance functions
   kCosineDistance,
+  kApproxCosineDistance,
   kEuclideanDistance,
+  kApproxEuclideanDistance,
+  kDotProduct,
+  kApproxDotProduct,
+  kManhattanDistance,
+  kL1Norm,
+  kL2Norm,
   kEditDistance,
+
+  // Map functions
+  kMapFromArray,
 };
 
 // Provides two utility methods to look up a built-in function name or function
@@ -543,31 +594,24 @@ class BuiltinScalarFunction : public ScalarFunctionBody {
   // Creates a like function.
   static absl::StatusOr<std::unique_ptr<BuiltinScalarFunction>>
   CreateLikeFunction(FunctionKind kind, const Type* output_type,
-                     const std::vector<std::unique_ptr<ValueExpr>>& arguments);
+                     absl::Span<std::unique_ptr<ValueExpr>> arguments);
 
-  // Creates a like any function.
+  // Creates a like any/all function.
   static absl::StatusOr<std::unique_ptr<BuiltinScalarFunction>>
-  CreateLikeAnyFunction(
+  CreateLikeAnyAllFunction(
       FunctionKind kind, const Type* output_type,
-      const std::vector<std::unique_ptr<ValueExpr>>& arguments);
-
-  // Creates a like all function.
-  static absl::StatusOr<std::unique_ptr<BuiltinScalarFunction>>
-  CreateLikeAllFunction(
-      FunctionKind kind, const Type* output_type,
-      const std::vector<std::unique_ptr<ValueExpr>>& arguments);
+      absl::Span<std::unique_ptr<ValueExpr>> arguments);
 
   // Creates a like any/all array function.
   static absl::StatusOr<std::unique_ptr<BuiltinScalarFunction>>
   CreateLikeAnyAllArrayFunction(
       FunctionKind kind, const Type* output_type,
-      const std::vector<std::unique_ptr<ValueExpr>>& arguments);
+      absl::Span<std::unique_ptr<ValueExpr>> arguments);
 
   // Creates a regexp function.
   static absl::StatusOr<std::unique_ptr<BuiltinScalarFunction>>
-  CreateRegexpFunction(
-      FunctionKind kind, const Type* output_type,
-      const std::vector<std::unique_ptr<ValueExpr>>& arguments);
+  CreateRegexpFunction(FunctionKind kind, const Type* output_type,
+                       absl::Span<std::unique_ptr<ValueExpr>> arguments);
 
   FunctionKind kind_;
 };
@@ -637,9 +681,12 @@ class BinaryStatFunction : public BuiltinAggregateFunction {
                     EvaluationContext* context) const override;
 };
 
+using ContextAwareFunctionEvaluator = std::function<absl::StatusOr<Value>(
+    const absl::Span<const Value> arguments, EvaluationContext& context)>;
+
 class UserDefinedScalarFunction : public ScalarFunctionBody {
  public:
-  UserDefinedScalarFunction(const FunctionEvaluator& evaluator,
+  UserDefinedScalarFunction(const ContextAwareFunctionEvaluator& evaluator,
                             const Type* output_type,
                             absl::string_view function_name)
       : ScalarFunctionBody(output_type),
@@ -651,7 +698,7 @@ class UserDefinedScalarFunction : public ScalarFunctionBody {
             Value* result, absl::Status* status) const override;
 
  private:
-  FunctionEvaluator evaluator_;
+  ContextAwareFunctionEvaluator evaluator_;
   const std::string function_name_;
 };
 
@@ -1000,8 +1047,10 @@ class LikeFunction : public SimpleBuiltinScalarFunction {
  public:
   LikeFunction(FunctionKind kind, const Type* output_type,
                std::unique_ptr<RE2> regexp)
-      : SimpleBuiltinScalarFunction(kind, output_type),
-        regexp_(std::move(regexp)) {}
+      : SimpleBuiltinScalarFunction(kind, output_type) {
+    regexp_.push_back(std::move(regexp));
+    has_collation_ = kind == FunctionKind::kLikeWithCollation;
+  }
   absl::StatusOr<Value> Eval(absl::Span<const TupleData* const> params,
                              absl::Span<const Value> args,
                              EvaluationContext* context) const override;
@@ -1011,83 +1060,84 @@ class LikeFunction : public SimpleBuiltinScalarFunction {
 
  private:
   // Regexp precompiled at prepare time; null if cannot be precompiled.
-  std::unique_ptr<RE2> regexp_;
-};
-
-class LikeAnyFunction : public SimpleBuiltinScalarFunction {
- public:
-  LikeAnyFunction(FunctionKind kind, const Type* output_type,
-                  std::vector<std::unique_ptr<RE2>> regexp)
-      : SimpleBuiltinScalarFunction(kind, output_type),
-        regexp_(std::move(regexp)) {}
-
-  absl::StatusOr<Value> Eval(absl::Span<const TupleData* const> params,
-                             absl::Span<const Value> args,
-                             EvaluationContext* context) const override;
-
-  LikeAnyFunction(const LikeAnyFunction&) = delete;
-  LikeAnyFunction& operator=(const LikeAnyFunction&) = delete;
-
- private:
-  std::vector<std::unique_ptr<RE2>> regexp_;
-};
-
-class LikeAllFunction : public SimpleBuiltinScalarFunction {
- public:
-  LikeAllFunction(FunctionKind kind, const Type* output_type,
-                  std::vector<std::unique_ptr<RE2>> regexp)
-      : SimpleBuiltinScalarFunction(kind, output_type),
-        regexp_(std::move(regexp)) {}
-
-  absl::StatusOr<Value> Eval(absl::Span<const TupleData* const> params,
-                             absl::Span<const Value> args,
-                             EvaluationContext* context) const override;
-
-  LikeAllFunction(const LikeAllFunction&) = delete;
-  LikeAllFunction& operator=(const LikeAllFunction&) = delete;
-
- private:
   std::vector<std::unique_ptr<RE2>> regexp_;
+  bool has_collation_;
 };
 
 // Invoked by expression such as:
-// <input> LIKE ANY UNNEST(<array_expression>)
-class LikeAnyArrayFunction : public SimpleBuiltinScalarFunction {
+// <input> [NOT] LIKE ANY|ALL (pattern1, pattern2, ...)
+class LikeAnyAllFunction : public SimpleBuiltinScalarFunction {
  public:
-  LikeAnyArrayFunction(FunctionKind kind, const Type* output_type,
-                       std::vector<std::unique_ptr<RE2>> regexp)
+  LikeAnyAllFunction(FunctionKind kind, const Type* output_type,
+                     std::vector<std::unique_ptr<RE2>> regexp)
       : SimpleBuiltinScalarFunction(kind, output_type),
-        regexp_(std::move(regexp)) {}
+        regexp_(std::move(regexp)) {
+    ABSL_CHECK(kind == FunctionKind::kLikeAny || kind == FunctionKind::kNotLikeAny ||
+          kind == FunctionKind::kLikeAnyWithCollation ||
+          kind == FunctionKind::kNotLikeAnyWithCollation ||
+          kind == FunctionKind::kLikeAll || kind == FunctionKind::kNotLikeAll ||
+          kind == FunctionKind::kLikeAllWithCollation ||
+          kind == FunctionKind::kNotLikeAllWithCollation);
+    has_collation_ = kind == FunctionKind::kLikeAnyWithCollation ||
+                     kind == FunctionKind::kNotLikeAnyWithCollation ||
+                     kind == FunctionKind::kLikeAllWithCollation ||
+                     kind == FunctionKind::kNotLikeAllWithCollation;
+    is_not_ = kind == FunctionKind::kNotLikeAny ||
+              kind == FunctionKind::kNotLikeAll ||
+              kind == FunctionKind::kNotLikeAllWithCollation ||
+              kind == FunctionKind::kNotLikeAnyWithCollation;
+  }
 
   absl::StatusOr<Value> Eval(absl::Span<const TupleData* const> params,
                              absl::Span<const Value> args,
                              EvaluationContext* context) const override;
 
-  LikeAnyArrayFunction(const LikeAnyArrayFunction&) = delete;
-  LikeAnyArrayFunction& operator=(const LikeAnyArrayFunction&) = delete;
+  LikeAnyAllFunction(const LikeAnyAllFunction&) = delete;
+  LikeAnyAllFunction& operator=(const LikeAnyAllFunction&) = delete;
 
  private:
   std::vector<std::unique_ptr<RE2>> regexp_;
+  bool has_collation_;
+  bool is_not_;
 };
 
 // Invoked by expression such as:
-// <input> LIKE ALL UNNEST(<array_expression>)
-class LikeAllArrayFunction : public SimpleBuiltinScalarFunction {
+// <input> [NOT] LIKE ANY|ALL UNNEST(<array_expression>)
+class LikeAnyAllArrayFunction : public SimpleBuiltinScalarFunction {
  public:
-  LikeAllArrayFunction(FunctionKind kind, const Type* output_type,
-                       std::vector<std::unique_ptr<RE2>> regexp)
+  LikeAnyAllArrayFunction(FunctionKind kind, const Type* output_type,
+                          std::vector<std::unique_ptr<RE2>> regexp)
       : SimpleBuiltinScalarFunction(kind, output_type),
-        regexp_(std::move(regexp)) {}
+        regexp_(std::move(regexp)) {
+    ABSL_CHECK(kind == FunctionKind::kLikeAnyArray ||
+          kind == FunctionKind::kLikeAnyArrayWithCollation ||
+          kind == FunctionKind::kNotLikeAnyArray ||
+          kind == FunctionKind::kNotLikeAnyArrayWithCollation ||
+          kind == FunctionKind::kLikeAllArray ||
+          kind == FunctionKind::kLikeAllArrayWithCollation ||
+          kind == FunctionKind::kNotLikeAllArray ||
+          kind == FunctionKind::kNotLikeAllArrayWithCollation);
+    is_not_ = kind == FunctionKind::kNotLikeAnyArray ||
+              kind == FunctionKind::kNotLikeAnyArrayWithCollation ||
+              kind == FunctionKind::kNotLikeAllArray ||
+              kind == FunctionKind::kNotLikeAllArrayWithCollation;
+    has_collation_ = kind == FunctionKind::kLikeAnyArrayWithCollation ||
+                     kind == FunctionKind::kLikeAllArrayWithCollation ||
+                     kind == FunctionKind::kNotLikeAnyArrayWithCollation ||
+                     kind == FunctionKind::kNotLikeAllArrayWithCollation;
+  }
 
   absl::StatusOr<Value> Eval(absl::Span<const TupleData* const> params,
                              absl::Span<const Value> args,
                              EvaluationContext* context) const override;
 
-  LikeAllArrayFunction(const LikeAllArrayFunction&) = delete;
-  LikeAllArrayFunction& operator=(const LikeAllArrayFunction&) = delete;
+  LikeAnyAllArrayFunction(const LikeAnyAllArrayFunction&) = delete;
+  LikeAnyAllArrayFunction& operator=(const LikeAnyAllArrayFunction&) = delete;
 
  private:
   std::vector<std::unique_ptr<RE2>> regexp_;
+  bool is_not_;
+  bool has_collation_;
 };
 
 class BitwiseFunction : public BuiltinScalarFunction {
@@ -1965,6 +2015,14 @@ class CosineDistanceFunctionDense : public SimpleBuiltinScalarFunction {
                              EvaluationContext* context) const override;
 };
 
+class ApproxCosineDistanceFunction : public SimpleBuiltinScalarFunction {
+ public:
+  using SimpleBuiltinScalarFunction::SimpleBuiltinScalarFunction;
+  absl::StatusOr<Value> Eval(absl::Span<const TupleData* const> params,
+                             absl::Span<const Value> args,
+                             EvaluationContext* context) const override;
+};
+
 class CosineDistanceFunctionSparseInt64Key
     : public SimpleBuiltinScalarFunction {
  public:
@@ -2009,6 +2067,54 @@ class EuclideanDistanceFunctionSparseStringKey
                              EvaluationContext* context) const override;
 };
 
+class ApproxEuclideanDistanceFunction : public SimpleBuiltinScalarFunction {
+ public:
+  using SimpleBuiltinScalarFunction::SimpleBuiltinScalarFunction;
+  absl::StatusOr<Value> Eval(absl::Span<const TupleData* const> params,
+                             absl::Span<const Value> args,
+                             EvaluationContext* context) const override;
+};
+
+class DotProductFunction : public SimpleBuiltinScalarFunction {
+ public:
+  using SimpleBuiltinScalarFunction::SimpleBuiltinScalarFunction;
+  absl::StatusOr<Value> Eval(absl::Span<const TupleData* const> params,
+                             absl::Span<const Value> args,
+                             EvaluationContext* context) const override;
+};
+
+class ApproxDotProductFunction : public SimpleBuiltinScalarFunction {
+ public:
+  using SimpleBuiltinScalarFunction::SimpleBuiltinScalarFunction;
+  absl::StatusOr<Value> Eval(absl::Span<const TupleData* const> params,
+                             absl::Span<const Value> args,
+                             EvaluationContext* context) const override;
+};
+
+class ManhattanDistanceFunction : public SimpleBuiltinScalarFunction {
+ public:
+  using SimpleBuiltinScalarFunction::SimpleBuiltinScalarFunction;
+  absl::StatusOr<Value> Eval(absl::Span<const TupleData* const> params,
+                             absl::Span<const Value> args,
+                             EvaluationContext* context) const override;
+};
+
+class L1NormFunction : public SimpleBuiltinScalarFunction {
+ public:
+  using SimpleBuiltinScalarFunction::SimpleBuiltinScalarFunction;
+  absl::StatusOr<Value> Eval(absl::Span<const TupleData* const> params,
+                             absl::Span<const Value> args,
+                             EvaluationContext* context) const override;
+};
+
+class L2NormFunction : public SimpleBuiltinScalarFunction {
+ public:
+  using SimpleBuiltinScalarFunction::SimpleBuiltinScalarFunction;
+  absl::StatusOr<Value> Eval(absl::Span<const TupleData* const> params,
+                             absl::Span<const Value> args,
+                             EvaluationContext* context) const override;
+};
+
 class EditDistanceFunction : public SimpleBuiltinScalarFunction {
  public:
   using SimpleBuiltinScalarFunction::SimpleBuiltinScalarFunction;
@@ -2017,6 +2123,74 @@ class EditDistanceFunction : public SimpleBuiltinScalarFunction {
                              EvaluationContext* context) const override;
 };
 
+// Evaluates the function calls to all the signatures of `ARRAY_ZIP`.
+class ArrayZipFunction : public SimpleBuiltinScalarFunction {
+ public:
+  ArrayZipFunction(FunctionKind kind, const Type* output_type,
+                   const InlineLambdaExpr* lambda)
+      : SimpleBuiltinScalarFunction(kind, output_type), lambda_(lambda) {}
+
+  absl::StatusOr<Value> Eval(absl::Span<const TupleData* const> params,
+                             absl::Span<const Value> args,
+                             EvaluationContext* context) const override;
+
+ private:
+  // Groups the input arguments according to their types.
+  struct CategorizedArguments {
+    absl::Span<const Value> arrays;
+    functions::ArrayZipEnums::ArrayZipMode array_zip_mode;
+  };
+
+  // Validates the input `args` and converts it to the `CategorizedArguments`
+  // format for easier processing.
+  absl::StatusOr<CategorizedArguments> GetCategorizedArguments(
+      absl::Span<const Value> args) const;
+
+  // Returns the result array length based on the input `arrays` and the
+  // `array_zip_mode`. If `array_zip_mode` == STRICT and the input arrays have
+  // different lengths, a SQL error will be returned.
+  absl::StatusOr<int> GetZippedArrayLength(
+      absl::Span<const Value> arrays,
+      functions::ArrayZipEnums::ArrayZipMode array_zip_mode) const;
+
+  // Evaluates the function calls to the signatures without lambda arguments.
+  // `zipped_array_length` is the length of the result array.
+  absl::StatusOr<Value> EvalNoLambda(const CategorizedArguments& args,
+                                     int zipped_array_length) const;
+
+  // Evaluates the function calls to the signatures with lambda arguments.
+  // `zipped_array_length` is the length of the result array.
+  absl::StatusOr<Value> EvalLambda(
+      const CategorizedArguments& args, int zipped_array_length,
+      LambdaEvaluationContext& lambda_context) const;
+
+  // Returns a struct value of the given `struct_type`, whose i-th field value
+  // equals the element at `element_index` of the i-th array in `arrays`, or
+  // NULL if the i-th array does not have an element at `element_index`.
+  absl::StatusOr<Value> ToStructValue(const StructType* struct_type,
+                                      absl::Span<const Value> arrays,
+                                      int element_index) const;
+
+  // Returns the evaluation result of `lambda_` under the given
+  // `lambda_context`. The i-th input argument to `lambda_` is
+  // - the element of `arrays[i]` at `element_index`,
+  // - or NULL if `element_index` is out of bounds for `arrays[i]`.
+  absl::StatusOr<Value> ToLambdaReturnValue(
+      absl::Span<const Value> arrays, int element_index,
+      LambdaEvaluationContext& lambda_context) const;
+
+  // Returns the maximum length of the arrays in `arrays`.
+  int MaxArrayLength(absl::Span<const Value> arrays) const;
+
+  // Returns the minimum length of the arrays in `arrays`.
+  int MinArrayLength(absl::Span<const Value> arrays) const;
+
+  // Returns true if all the arrays in `arrays` have the same length.
+  bool EqualArrayLength(absl::Span<const Value> arrays) const;
+
+  const InlineLambdaExpr* lambda_;
+};
+
 // This method is used only for setting non-deterministic output.
 // This method does not detect floating point types within STRUCTs or PROTOs,
 // which would be too expensive to call for each row.
diff --git a/zetasql/reference_impl/functions/BUILD b/zetasql/reference_impl/functions/BUILD
index dd03360da..441b20eff 100644
--- a/zetasql/reference_impl/functions/BUILD
+++ b/zetasql/reference_impl/functions/BUILD
@@ -32,6 +32,7 @@ cc_library(
     deps = [
         ":hash",
         ":json",
+        ":map",
         ":range",
         ":string_with_collation",
         ":uuid",
@@ -59,6 +60,7 @@ cc_library(
         "//zetasql/base:ret_check",
         "//zetasql/common:errors",
        "//zetasql/public:json_value",
+        "//zetasql/public:language_options",
         "//zetasql/public:type_cc_proto",
         "//zetasql/public:value",
         "//zetasql/public/functions:json",
@@ -76,11 +78,39 @@ cc_library(
     srcs = ["string_with_collation.cc"],
     hdrs = ["string_with_collation.h"],
     deps = [
+        ":like",
+        "//zetasql/base:ret_check",
+        "//zetasql/base:status",
         "//zetasql/public:collator",
         "//zetasql/public:value",
         "//zetasql/public/functions:string_with_collation",
+        "//zetasql/public/types",
         "//zetasql/reference_impl:evaluation",
         "@com_google_absl//absl/status:statusor",
+        "@com_google_absl//absl/strings:cord",
+        "@com_google_absl//absl/types:span",
     ],
 )
 
+cc_library(
+    name = "like",
+    srcs = ["like.cc"],
+    hdrs =
["like.h"], + deps = [ + "//zetasql/base:check", + "//zetasql/base:ret_check", + "//zetasql/base:status", + "//zetasql/public:collator", + "//zetasql/public:type", + "//zetasql/public:value", + "//zetasql/public/functions:like", + "//zetasql/public/functions:string_with_collation", + "//zetasql/public/proto:type_annotation_cc_proto", + "//zetasql/public/types", + "@com_google_absl//absl/status", + "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/types:span", + "@com_googlesource_code_re2//:re2", ], ) @@ -115,3 +145,20 @@ cc_library( "@com_google_absl//absl/types:span", ], ) + +cc_library( + name = "map", + srcs = ["map.cc"], + hdrs = ["map.h"], + deps = [ + "//zetasql/base:ret_check", + "//zetasql/base:status", + "//zetasql/public:type", + "//zetasql/public:type_cc_proto", + "//zetasql/public:value", + "//zetasql/public/types", + "//zetasql/reference_impl:evaluation", + "@com_google_absl//absl/status:statusor", + "@com_google_absl//absl/types:span", + ], +) diff --git a/zetasql/reference_impl/functions/json.cc b/zetasql/reference_impl/functions/json.cc index b344f764e..6479d873e 100644 --- a/zetasql/reference_impl/functions/json.cc +++ b/zetasql/reference_impl/functions/json.cc @@ -29,7 +29,9 @@ #include "zetasql/public/functions/json_internal.h" #include "zetasql/public/functions/to_json.h" #include "zetasql/public/json_value.h" +#include "zetasql/public/language_options.h" #include "zetasql/public/type.pb.h" +#include "zetasql/public/types/array_type.h" #include "zetasql/public/types/type_factory.h" #include "zetasql/public/value.h" #include "zetasql/reference_impl/function.h" @@ -41,6 +43,15 @@ namespace { using functions::json_internal::StrictJSONPathIterator; +JSONParsingOptions GetJSONParsingOptions( + const LanguageOptions& language_options) { + return JSONParsingOptions{ + .wide_number_mode = (language_options.LanguageFeatureEnabled( + FEATURE_JSON_STRICT_NUMBER_PARSING) + ? 
                JSONParsingOptions::WideNumberMode::kExact
+              : JSONParsingOptions::WideNumberMode::kRound)};
+}
+
 absl::StatusOr<JSONValueConstRef> GetJSONValueConstRef(
     const Value& json, const JSONParsingOptions& json_parsing_options,
     JSONValue& json_storage) {
@@ -90,6 +101,22 @@ Value CreateValueFromOptional(std::optional<T> opt) {
   return Value::MakeNull<T>();
 }
 
+template <typename T>
+absl::StatusOr<Value> CreateArrayValueFromOptional(
+    const std::optional<std::vector<std::optional<T>>>& opt,
+    const ArrayType* array_type) {
+  if (opt.has_value()) {
+    std::vector<Value> values;
+    for (const auto& item : opt.value()) {
+      values.push_back(item.has_value() ? Value::Make<T>(*item)
+                                        : Value::MakeNull<T>());
+    }
+    return Value::MakeArray(array_type, std::move(values));
+  } else {
+    return Value::Null(array_type);
+  }
+}
+
 // Signal that statement evaluation encountered non-determinism if a potentially
 // imprecise value is converted to JSON or to JSON_STRING.
 void MaybeSetNonDeterministicContext(const Value& arg,
@@ -314,6 +341,34 @@ absl::StatusOr<Value> JsonExtractJson(
   return Value::Null(output_type);
 }
 
+// Compute JSON_QUERY with lax semantics if applicable. The expression is in
+// lax mode if the input path is a lax version such as "lax $.a".
+//
+// Returns std::nullopt if the path is not in lax mode.
+absl::StatusOr<std::optional<Value>> MaybeExecuteJsonQueryLax(
+    const Value& json, absl::string_view json_path,
+    const LanguageOptions& language_options) {
+  // Check if JSONPath is in lax mode.
+  // For example: JSON_QUERY(json_col, "lax $.a");
+  ZETASQL_ASSIGN_OR_RETURN(bool is_lax,
+                   functions::json_internal::IsValidAndLaxJSONPath(json_path));
+  if (!is_lax) {
+    // This is a non-lax path.
+ return std::nullopt; + } + JSONValue json_backing; + ZETASQL_ASSIGN_OR_RETURN( + JSONValueConstRef json_value_const_ref, + GetJSONValueConstRef(json, GetJSONParsingOptions(language_options), + json_backing)); + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr path_iterator, + StrictJSONPathIterator::Create(json_path, /*enable_lax_mode=*/true)); + ZETASQL_ASSIGN_OR_RETURN(JSONValue result, functions::JsonQueryLax( + json_value_const_ref, *path_iterator)); + return Value::Json(std::move(result)); +} + absl::StatusOr JsonSubscriptFunction::Eval( absl::Span params, absl::Span args, EvaluationContext* context) const { @@ -369,32 +424,42 @@ absl::StatusOr JsonExtractFunction::Eval( if (HasNulls(args)) { return Value::Null(output_type()); } + std::string json_path = args.size() > 1 ? args[1].string_value() : "$"; + const auto& language_options = context->GetLanguageOptions(); + + // Compute JSON_QUERY with lax semantics if applicable. The expression is in + // lax mode if the following criteria is met: + // 1) The language option FEATURE_JSON_QUERY_LAX is enabled + // 2) The function is JSON_QUERY with JSON input type. + // 3) The input path is a lax version + if (language_options.LanguageFeatureEnabled(FEATURE_JSON_QUERY_LAX) && + kind() == FunctionKind::kJsonQuery && args[0].type_kind() == TYPE_JSON) { + ZETASQL_ASSIGN_OR_RETURN( + std::optional result, + MaybeExecuteJsonQueryLax(args[0], json_path, language_options)); + if (result.has_value()) { + return std::move(*result); + } + // This is a non-lax path. Execute JSON_QUERY in non-lax mode. + } + // Note that since the second argument, json_path, is always a constant, it // would be better performance-wise just to create the JsonPathEvaluator once. bool sql_standard_mode = (kind() == FunctionKind::kJsonQuery || kind() == FunctionKind::kJsonValue); - ZETASQL_ASSIGN_OR_RETURN( - std::unique_ptr evaluator, - functions::JsonPathEvaluator::Create( - /*json_path=*/((args.size() > 1) ? 
args[1].string_value() : "$"), - sql_standard_mode, - /*enable_special_character_escaping_in_values=*/true, - /*enable_special_character_escaping_in_keys=*/true)); + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr evaluator, + functions::JsonPathEvaluator::Create( + json_path, sql_standard_mode, + /*enable_special_character_escaping_in_values=*/true, + /*enable_special_character_escaping_in_keys=*/true)); bool scalar = kind() == FunctionKind::kJsonValue || kind() == FunctionKind::kJsonExtractScalar; if (args[0].type_kind() == TYPE_STRING) { return JsonExtractString(*evaluator, args[0].string_value(), scalar); } else { - const auto& language_options = context->GetLanguageOptions(); - return JsonExtractJson( - *evaluator, args[0], output_type(), scalar, - JSONParsingOptions{ - .wide_number_mode = - (language_options.LanguageFeatureEnabled( - FEATURE_JSON_STRICT_NUMBER_PARSING) - ? JSONParsingOptions::WideNumberMode::kExact - : JSONParsingOptions::WideNumberMode::kRound)}); + return JsonExtractJson(*evaluator, args[0], output_type(), scalar, + GetJSONParsingOptions(language_options)); } } @@ -650,6 +715,33 @@ absl::StatusOr ConvertJsonFunction::Eval( functions::ConvertJsonToInt64(json_value_const_ref)); return Value::Int64(output); } + case FunctionKind::kInt32: { + ZETASQL_RET_CHECK_EQ(args.size(), 1); + ZETASQL_ASSIGN_OR_RETURN( + JSONValueConstRef json_value_const_ref, + GetJSONValueConstRef(args[0], json_parsing_options, json_storage)); + ZETASQL_ASSIGN_OR_RETURN(const int32_t output, + functions::ConvertJsonToInt32(json_value_const_ref)); + return Value::Int32(output); + } + case FunctionKind::kUint64: { + ZETASQL_RET_CHECK_EQ(args.size(), 1); + ZETASQL_ASSIGN_OR_RETURN( + JSONValueConstRef json_value_const_ref, + GetJSONValueConstRef(args[0], json_parsing_options, json_storage)); + ZETASQL_ASSIGN_OR_RETURN(const uint64_t output, + functions::ConvertJsonToUint64(json_value_const_ref)); + return Value::Uint64(output); + } + case FunctionKind::kUint32: { + 
ZETASQL_RET_CHECK_EQ(args.size(), 1); + ZETASQL_ASSIGN_OR_RETURN( + JSONValueConstRef json_value_const_ref, + GetJSONValueConstRef(args[0], json_parsing_options, json_storage)); + ZETASQL_ASSIGN_OR_RETURN(const uint32_t output, + functions::ConvertJsonToUint32(json_value_const_ref)); + return Value::Uint32(output); + } case FunctionKind::kDouble: { ZETASQL_RET_CHECK_EQ(args.size(), 2); ZETASQL_RET_CHECK(args[1].type()->IsString()); @@ -676,6 +768,32 @@ absl::StatusOr ConvertJsonFunction::Eval( language_options.product_mode())); return Value::Double(output); } + case FunctionKind::kFloat: { + ZETASQL_RET_CHECK_EQ(args.size(), 2); + ZETASQL_RET_CHECK(args[1].type()->IsString()); + std::string wide_number_mode_as_string = args[1].string_value(); + functions::WideNumberMode wide_number_mode; + if (wide_number_mode_as_string == "exact") { + wide_number_mode = functions::WideNumberMode::kExact; + } else if (wide_number_mode_as_string == "round") { + wide_number_mode = functions::WideNumberMode::kRound; + } else { + return MakeEvalError() << "Invalid `wide_number_mode` specified: " + << wide_number_mode_as_string; + } + json_parsing_options.wide_number_mode = + (wide_number_mode == functions::WideNumberMode::kExact + ? 
JSONParsingOptions::WideNumberMode::kExact + : JSONParsingOptions::WideNumberMode::kRound); + ZETASQL_ASSIGN_OR_RETURN( + JSONValueConstRef json_value_const_ref, + GetJSONValueConstRef(args[0], json_parsing_options, json_storage)); + ZETASQL_ASSIGN_OR_RETURN( + const float output, + functions::ConvertJsonToFloat(json_value_const_ref, wide_number_mode, + language_options.product_mode())); + return Value::Float(output); + } case FunctionKind::kBool: { ZETASQL_RET_CHECK_EQ(args.size(), 1); ZETASQL_ASSIGN_OR_RETURN( @@ -685,6 +803,117 @@ absl::StatusOr ConvertJsonFunction::Eval( functions::ConvertJsonToBool(json_value_const_ref)); return Value::Bool(output); } + case FunctionKind::kStringArray: { + ZETASQL_RET_CHECK_EQ(args.size(), 1); + ZETASQL_ASSIGN_OR_RETURN( + JSONValueConstRef json_value_const_ref, + GetJSONValueConstRef(args[0], json_parsing_options, json_storage)); + ZETASQL_ASSIGN_OR_RETURN( + std::vector output, + functions::ConvertJsonToStringArray(json_value_const_ref)); + return values::StringArray(output); + } + case FunctionKind::kInt64Array: { + ZETASQL_RET_CHECK_EQ(args.size(), 1); + ZETASQL_ASSIGN_OR_RETURN( + JSONValueConstRef json_value_const_ref, + GetJSONValueConstRef(args[0], json_parsing_options, json_storage)); + ZETASQL_ASSIGN_OR_RETURN( + std::vector output, + functions::ConvertJsonToInt64Array(json_value_const_ref)); + return values::Int64Array(output); + } + case FunctionKind::kInt32Array: { + ZETASQL_RET_CHECK_EQ(args.size(), 1); + ZETASQL_ASSIGN_OR_RETURN( + JSONValueConstRef json_value_const_ref, + GetJSONValueConstRef(args[0], json_parsing_options, json_storage)); + ZETASQL_ASSIGN_OR_RETURN( + std::vector output, + functions::ConvertJsonToInt32Array(json_value_const_ref)); + return values::Int32Array(output); + } + case FunctionKind::kUint64Array: { + ZETASQL_RET_CHECK_EQ(args.size(), 1); + ZETASQL_ASSIGN_OR_RETURN( + JSONValueConstRef json_value_const_ref, + GetJSONValueConstRef(args[0], json_parsing_options, json_storage)); + 
ZETASQL_ASSIGN_OR_RETURN( + std::vector<uint64_t> output, + functions::ConvertJsonToUint64Array(json_value_const_ref)); + return values::Uint64Array(output); + } + case FunctionKind::kUint32Array: { + ZETASQL_RET_CHECK_EQ(args.size(), 1); + ZETASQL_ASSIGN_OR_RETURN( + JSONValueConstRef json_value_const_ref, + GetJSONValueConstRef(args[0], json_parsing_options, json_storage)); + ZETASQL_ASSIGN_OR_RETURN( + std::vector<uint32_t> output, + functions::ConvertJsonToUint32Array(json_value_const_ref)); + return values::Uint32Array(output); + } + case FunctionKind::kDoubleArray: { + ZETASQL_RET_CHECK_EQ(args.size(), 2); + ZETASQL_RET_CHECK(args[1].type()->IsString()); + std::string wide_number_mode_as_string = args[1].string_value(); + functions::WideNumberMode wide_number_mode; + if (wide_number_mode_as_string == "exact") { + wide_number_mode = functions::WideNumberMode::kExact; + } else if (wide_number_mode_as_string == "round") { + wide_number_mode = functions::WideNumberMode::kRound; + } else { + return MakeEvalError() << "Invalid `wide_number_mode` specified: " + << wide_number_mode_as_string; + } + json_parsing_options.wide_number_mode = + (wide_number_mode == functions::WideNumberMode::kExact + ? 
JSONParsingOptions::WideNumberMode::kExact + : JSONParsingOptions::WideNumberMode::kRound); + ZETASQL_ASSIGN_OR_RETURN( + JSONValueConstRef json_value_const_ref, + GetJSONValueConstRef(args[0], json_parsing_options, json_storage)); + ZETASQL_ASSIGN_OR_RETURN(std::vector<double> output, + functions::ConvertJsonToDoubleArray( + json_value_const_ref, wide_number_mode, + language_options.product_mode())); + return values::DoubleArray(output); + } + case FunctionKind::kFloatArray: { + ZETASQL_RET_CHECK_EQ(args.size(), 2); + ZETASQL_RET_CHECK(args[1].type()->IsString()); + std::string wide_number_mode_as_string = args[1].string_value(); + functions::WideNumberMode wide_number_mode; + if (wide_number_mode_as_string == "exact") { + wide_number_mode = functions::WideNumberMode::kExact; + } else if (wide_number_mode_as_string == "round") { + wide_number_mode = functions::WideNumberMode::kRound; + } else { + return MakeEvalError() << "Invalid `wide_number_mode` specified: " + << wide_number_mode_as_string; + } + json_parsing_options.wide_number_mode = + (wide_number_mode == functions::WideNumberMode::kExact + ? 
JSONParsingOptions::WideNumberMode::kExact + : JSONParsingOptions::WideNumberMode::kRound); + ZETASQL_ASSIGN_OR_RETURN( + JSONValueConstRef json_value_const_ref, + GetJSONValueConstRef(args[0], json_parsing_options, json_storage)); + ZETASQL_ASSIGN_OR_RETURN(std::vector<float> output, + functions::ConvertJsonToFloatArray( + json_value_const_ref, wide_number_mode, + language_options.product_mode())); + return values::FloatArray(output); + } + case FunctionKind::kBoolArray: { + ZETASQL_RET_CHECK_EQ(args.size(), 1); + ZETASQL_ASSIGN_OR_RETURN( + JSONValueConstRef json_value_const_ref, + GetJSONValueConstRef(args[0], json_parsing_options, json_storage)); + ZETASQL_ASSIGN_OR_RETURN(std::vector<bool> output, + functions::ConvertJsonToBoolArray(json_value_const_ref)); + return values::BoolArray(output); + } default: return ::zetasql_base::InvalidArgumentErrorBuilder() << "Unsupported function"; } @@ -717,16 +946,81 @@ absl::StatusOr<Value> ConvertJsonLaxFunction::Eval( ZETASQL_RET_CHECK(result.ok()); return CreateValueFromOptional(*result); } + case FunctionKind::kLaxInt32: { + auto result = functions::LaxConvertJsonToInt32(json_value_const_ref); + ZETASQL_RET_CHECK(result.ok()); + return CreateValueFromOptional(*result); + } + case FunctionKind::kLaxUint64: { + auto result = functions::LaxConvertJsonToUint64(json_value_const_ref); + ZETASQL_RET_CHECK(result.ok()); + return CreateValueFromOptional(*result); + } + case FunctionKind::kLaxUint32: { + auto result = functions::LaxConvertJsonToUint32(json_value_const_ref); + ZETASQL_RET_CHECK(result.ok()); + return CreateValueFromOptional(*result); + } case FunctionKind::kLaxDouble: { auto result = functions::LaxConvertJsonToFloat64(json_value_const_ref); ZETASQL_RET_CHECK(result.ok()); return CreateValueFromOptional(*result); } + case FunctionKind::kLaxFloat: { + auto result = functions::LaxConvertJsonToFloat32(json_value_const_ref); + ZETASQL_RET_CHECK(result.ok()); + return CreateValueFromOptional(*result); + } case FunctionKind::kLaxString: { 
auto result = functions::LaxConvertJsonToString(json_value_const_ref); ZETASQL_RET_CHECK(result.ok()); return CreateValueFromOptional(*result); } + case FunctionKind::kLaxBoolArray: { + auto result = functions::LaxConvertJsonToBoolArray(json_value_const_ref); + ZETASQL_RET_CHECK(result.ok()); + return CreateArrayValueFromOptional(*result, types::BoolArrayType()); + } + case FunctionKind::kLaxInt64Array: { + auto result = functions::LaxConvertJsonToInt64Array(json_value_const_ref); + ZETASQL_RET_CHECK(result.ok()); + return CreateArrayValueFromOptional(*result, types::Int64ArrayType()); + } + case FunctionKind::kLaxInt32Array: { + auto result = functions::LaxConvertJsonToInt32Array(json_value_const_ref); + ZETASQL_RET_CHECK(result.ok()); + return CreateArrayValueFromOptional(*result, types::Int32ArrayType()); + } + case FunctionKind::kLaxUint64Array: { + auto result = + functions::LaxConvertJsonToUint64Array(json_value_const_ref); + ZETASQL_RET_CHECK(result.ok()); + return CreateArrayValueFromOptional(*result, types::Uint64ArrayType()); + } + case FunctionKind::kLaxUint32Array: { + auto result = + functions::LaxConvertJsonToUint32Array(json_value_const_ref); + ZETASQL_RET_CHECK(result.ok()); + return CreateArrayValueFromOptional(*result, types::Uint32ArrayType()); + } + case FunctionKind::kLaxDoubleArray: { + auto result = + functions::LaxConvertJsonToFloat64Array(json_value_const_ref); + ZETASQL_RET_CHECK(result.ok()); + return CreateArrayValueFromOptional(*result, types::DoubleArrayType()); + } + case FunctionKind::kLaxFloatArray: { + auto result = + functions::LaxConvertJsonToFloat32Array(json_value_const_ref); + ZETASQL_RET_CHECK(result.ok()); + return CreateArrayValueFromOptional(*result, types::FloatArrayType()); + } + case FunctionKind::kLaxStringArray: { + auto result = + functions::LaxConvertJsonToStringArray(json_value_const_ref); + ZETASQL_RET_CHECK(result.ok()); + return CreateArrayValueFromOptional(*result, types::StringArrayType()); + } default: return 
::zetasql_base::InvalidArgumentErrorBuilder() << "Unsupported function"; } @@ -1088,13 +1382,25 @@ void RegisterBuiltinJsonFunctions() { return new ParseJsonFunction(); }); BuiltinFunctionRegistry::RegisterScalarFunction( - {FunctionKind::kInt64, FunctionKind::kDouble, FunctionKind::kBool}, + {FunctionKind::kBool, FunctionKind::kBoolArray, FunctionKind::kDouble, + FunctionKind::kDoubleArray, FunctionKind::kFloat, + FunctionKind::kFloatArray, FunctionKind::kInt64, + FunctionKind::kInt64Array, FunctionKind::kInt32, + FunctionKind::kInt32Array, FunctionKind::kUint64, + FunctionKind::kUint64Array, FunctionKind::kUint32, + FunctionKind::kUint32Array, FunctionKind::kStringArray}, [](FunctionKind kind, const zetasql::Type* output_type) { return new ConvertJsonFunction(kind, output_type); }); BuiltinFunctionRegistry::RegisterScalarFunction( - {FunctionKind::kLaxInt64, FunctionKind::kLaxDouble, - FunctionKind::kLaxBool, FunctionKind::kLaxString}, + {FunctionKind::kLaxBool, FunctionKind::kLaxBoolArray, + FunctionKind::kLaxDouble, FunctionKind::kLaxDoubleArray, + FunctionKind::kLaxFloat, FunctionKind::kLaxFloatArray, + FunctionKind::kLaxInt64, FunctionKind::kLaxInt64Array, + FunctionKind::kLaxInt32, FunctionKind::kLaxInt32Array, + FunctionKind::kLaxUint64, FunctionKind::kLaxUint64Array, + FunctionKind::kLaxUint32, FunctionKind::kLaxUint32Array, + FunctionKind::kLaxString, FunctionKind::kLaxStringArray}, [](FunctionKind kind, const zetasql::Type* output_type) { return new ConvertJsonLaxFunction(kind, output_type); }); diff --git a/zetasql/reference_impl/functions/like.cc b/zetasql/reference_impl/functions/like.cc new file mode 100644 index 000000000..e9909a645 --- /dev/null +++ b/zetasql/reference_impl/functions/like.cc @@ -0,0 +1,193 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#include "zetasql/reference_impl/functions/like.h" + +#include +#include +#include + +#include "zetasql/public/collator.h" +#include "zetasql/public/functions/like.h" +#include "zetasql/public/functions/string_with_collation.h" +#include "zetasql/public/types/type_factory.h" +#include "zetasql/public/value.h" +#include "absl/status/status.h" +#include "absl/status/statusor.h" +#include "absl/types/span.h" +#include "re2/re2.h" +#include "zetasql/base/ret_check.h" +#include "zetasql/base/status_macros.h" + +namespace zetasql { + +absl::StatusOr<Value> LikeImpl(const Value& lhs, const Value& rhs, + const RE2* regexp) { + if (lhs.is_null() || rhs.is_null()) { + return Value::Null(types::BoolType()); + } + + const std::string& text = + lhs.type_kind() == TYPE_STRING ? lhs.string_value() : lhs.bytes_value(); + + if (regexp != nullptr) { + // Regexp is precompiled. + return Value::Bool(RE2::FullMatch(text, *regexp)); + } else { + // Regexp is not precompiled; compile it on the fly. + const std::string& pattern = + rhs.type_kind() == TYPE_STRING ? 
rhs.string_value() : rhs.bytes_value(); + std::unique_ptr<RE2> regexp; + ZETASQL_RETURN_IF_ERROR( + functions::CreateLikeRegexp(pattern, lhs.type_kind(), &regexp)); + return Value::Bool(RE2::FullMatch(text, *regexp)); + } +} + +bool IsTrue(const Value& value) { + return !value.is_null() && value.bool_value(); +} + +bool IsFalse(const Value& value) { + return !value.is_null() && !value.bool_value(); +} + +// Performs a logical AND of the elements in `values`. +// Returns false if at least one element in `values` is false. +// Returns NULL if some element in `values` is NULL and no element is false. +// Returns true in all other cases. +Value LogicalAnd(absl::Span<const Value> values) { + bool found_null = false; + for (const auto& value : values) { + if (value.is_null()) { + found_null = true; + } else if (IsFalse(value)) { + return Value::Bool(false); + } + } + return found_null ? Value::NullBool() : Value::Bool(true); +} + +// Performs a logical OR of the elements in `values`. +// Returns true if at least one element in `values` is true. +// Returns NULL if some element in `values` is NULL and no element is true. +// Returns false in all other cases. +Value LogicalOr(absl::Span<const Value> values) { + bool found_null = false; + for (const auto& value : values) { + if (value.is_null()) { + found_null = true; + } else if (IsTrue(value)) { + return Value::Bool(true); + } + } + return found_null ? Value::NullBool() : Value::Bool(false); +} + +Value LogicalNot(const Value& value) { + if (value.is_null()) { + return value; + } + return IsTrue(value) ? 
Value::Bool(false) : Value::Bool(true); +} + +absl::Status ValidateQuantifiedLikeEvaluationParams( + const QuantifiedLikeEvaluationParams& params) { + if (params.collation_str.empty()) { + ZETASQL_RET_CHECK(params.pattern_regex != nullptr) << "Pattern regex is null"; + // When the pattern is a subquery expression creating an ARRAY, the number + // of precompiled regexps can be less than the number of pattern elements; + // the remaining regexps are compiled on the fly. + ZETASQL_RET_CHECK_LE(params.pattern_regex->size(), params.pattern_elements.size()) + << "Number of regexps is greater than the number of elements"; + } + return absl::OkStatus(); +} + +absl::StatusOr<std::unique_ptr<const ZetaSqlCollator>> BuildCollator( + const std::string& collation_str) { + if (collation_str.empty()) { + return nullptr; + } + return zetasql::MakeSqlCollator(collation_str); +} + +absl::StatusOr<Value> EvaluateQuantifiedLike( + const QuantifiedLikeEvaluationParams& params) { + ZETASQL_RET_CHECK_OK(ValidateQuantifiedLikeEvaluationParams(params)); + if (params.search_value.is_null()) { + return Value::NullBool(); + } + + if (params.pattern_elements.empty()) { + return Value::Bool(false); + } + + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<const ZetaSqlCollator> collator, + BuildCollator(params.collation_str)); + if (!params.collation_str.empty()) { + ZETASQL_RET_CHECK(collator != nullptr); + } + + Value local_result; + std::vector<Value> result_values; + result_values.reserve(params.pattern_elements.size()); + + for (int i = 0; i < params.pattern_elements.size(); ++i) { + auto& pattern_element = params.pattern_elements.at(i); + if (pattern_element.is_null()) { + result_values.push_back(Value::NullBool()); + continue; + } + + if (collator != nullptr) { + // If a collator is present, invoke LIKE with collation. + ZETASQL_ASSIGN_OR_RETURN(bool result, + functions::LikeUtf8WithCollation( + params.search_value.string_value(), + pattern_element.string_value(), *collator)); + local_result = Value::Bool(result); + } else { + // If the collator is absent, invoke LIKE without collation. + const RE2* current_regexp = i < 
params.pattern_regex->size() + ? params.pattern_regex->at(i).get() + : nullptr; + ZETASQL_ASSIGN_OR_RETURN(local_result, LikeImpl(params.search_value, + pattern_element, current_regexp)); + } + + // If NOT LIKE ANY/ALL, flip the result. + if (params.is_not) { + local_result = LogicalNot(local_result); + } + result_values.push_back(local_result); + } + + switch (params.operation_type) { + case QuantifiedLikeEvaluationParams::kLike: + ZETASQL_RET_CHECK_EQ(result_values.size(), 1); + return result_values[0]; + case QuantifiedLikeEvaluationParams::kLikeAny: + return LogicalOr(result_values); + case QuantifiedLikeEvaluationParams::kLikeAll: + return LogicalAnd(result_values); + default: + return absl::InvalidArgumentError("Unknown operation type"); + } +} + +} // namespace zetasql diff --git a/zetasql/reference_impl/functions/like.h b/zetasql/reference_impl/functions/like.h new file mode 100644 index 000000000..03d6f4008 --- /dev/null +++ b/zetasql/reference_impl/functions/like.h @@ -0,0 +1,96 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// + +#ifndef ZETASQL_REFERENCE_IMPL_FUNCTIONS_LIKE_H_ +#define ZETASQL_REFERENCE_IMPL_FUNCTIONS_LIKE_H_ + +#include +#include +#include + +#include "zetasql/public/proto/type_annotation.pb.h" +#include "zetasql/public/type.h" +#include "zetasql/public/types/type.h" +#include "zetasql/public/value.h" +#include "zetasql/base/check.h" +#include "absl/status/statusor.h" +#include "absl/types/span.h" +#include "re2/re2.h" + +namespace zetasql { + +// Parameters required for evaluation of like and quantified like operators. +// Format of like operator: +// search_value LIKE pattern_element +// Format of quantified like operator: +// search_value [NOT] LIKE ANY|ALL (pattern_element1, pattern_element2, ...) +// search_value [NOT] LIKE ANY|ALL UNNEST([pattern_element1, ...]) +struct QuantifiedLikeEvaluationParams { + // Type of operation - like, like_any, like_all. + enum OperationType { kLike, kLikeAny, kLikeAll }; + // The LHS value of like and quantified like operators. + const Value& search_value; + // The RHS pattern elements of like and quantified like operators. + // pattern_elements should be non-empty. + const absl::Span<const Value> pattern_elements; + // The RHS pattern regex of like and quantified like operators. + // pattern_regex is used for comparison for cases without collation. + // pattern_regex should be null when collation is specified. If collation_str + // is empty (collation not specified), then pattern_regex must contain at most + // as many elements as pattern_elements. + const std::vector<std::unique_ptr<RE2>>* pattern_regex; + const OperationType operation_type; + // Indicates whether the quantified LIKE has a preceding NOT operator. + const bool is_not; + // The collation string of like and quantified like operators. + // If collation is not specified, collation_str is empty. 
+ const std::string collation_str; + + QuantifiedLikeEvaluationParams() = delete; + QuantifiedLikeEvaluationParams( + const Value& search_value, absl::Span<const Value> pattern_elements, + const std::vector<std::unique_ptr<RE2>>* pattern_regex, + OperationType operation_type, bool is_not) + : search_value(search_value), + pattern_elements(pattern_elements), + pattern_regex(pattern_regex), + operation_type(operation_type), + is_not(is_not) {} + + QuantifiedLikeEvaluationParams(const Value& search_value, + absl::Span<const Value> pattern_elements, + OperationType operation_type, bool is_not, + const std::string& collation_str) + : search_value(search_value), + pattern_elements(pattern_elements), + pattern_regex(nullptr), + operation_type(operation_type), + is_not(is_not), + collation_str(collation_str) {} +}; + +// Evaluates the following like expressions with and without collation: +// search_value LIKE pattern +// search_value [NOT] LIKE ANY (pattern1, pattern2, ...) +// search_value [NOT] LIKE ALL (pattern1, pattern2, ...) +// search_value [NOT] LIKE ANY UNNEST([pattern1, pattern2, ...]) +// search_value [NOT] LIKE ALL UNNEST([pattern1, pattern2, ...]) +absl::StatusOr<Value> EvaluateQuantifiedLike( + const QuantifiedLikeEvaluationParams& params); + +} // namespace zetasql + +#endif // ZETASQL_REFERENCE_IMPL_FUNCTIONS_LIKE_H_ diff --git a/zetasql/reference_impl/functions/map.cc b/zetasql/reference_impl/functions/map.cc new file mode 100644 index 000000000..69974ffe3 --- /dev/null +++ b/zetasql/reference_impl/functions/map.cc @@ -0,0 +1,87 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#include "zetasql/reference_impl/functions/map.h" + +#include +#include + +#include "zetasql/public/type.h" +#include "zetasql/public/type.pb.h" +#include "zetasql/public/types/type_factory.h" +#include "zetasql/public/value.h" +#include "zetasql/reference_impl/evaluation.h" +#include "zetasql/reference_impl/function.h" +#include "zetasql/reference_impl/tuple.h" +#include "absl/status/statusor.h" +#include "absl/types/span.h" +#include "zetasql/base/ret_check.h" +#include "zetasql/base/status_macros.h" + +namespace zetasql { +namespace { + +class MapFromArrayFunction : public SimpleBuiltinScalarFunction { + public: + MapFromArrayFunction(FunctionKind kind, const Type* output_type) + : SimpleBuiltinScalarFunction(kind, output_type) {} + absl::StatusOr<Value> Eval(absl::Span<const TupleData* const> params, + absl::Span<const Value> args, + EvaluationContext* context) const override; +}; + +absl::StatusOr<Value> MapFromArrayFunction::Eval( + absl::Span<const TupleData* const> params, absl::Span<const Value> args, + EvaluationContext* context) const { + ZETASQL_RET_CHECK_EQ(args.size(), 1); + const Value& array_arg = args[0]; + + ZETASQL_RET_CHECK(array_arg.type()->IsArray()); + const ArrayType* array_type = array_arg.type()->AsArray(); + + ZETASQL_RET_CHECK(array_type->element_type()->IsStruct()); + const StructType* struct_type = array_type->element_type()->AsStruct(); + + ZETASQL_RET_CHECK_EQ(struct_type->fields().size(), 2); + TypeFactory type_factory; + ZETASQL_ASSIGN_OR_RETURN(const Type* map_type, + type_factory.MakeMapType(struct_type->fields()[0].type, + struct_type->fields()[1].type)); + + if (array_arg.is_null()) { + 
return Value::Null(map_type); + } + + std::vector<std::pair<Value, Value>> map_entries; + map_entries.reserve(array_arg.elements().size()); + for (const auto& struct_val : array_arg.elements()) { + map_entries.push_back( + std::make_pair(struct_val.fields()[0], struct_val.fields()[1])); + } + + return Value::MakeMap(map_type, std::move(map_entries)); +} + +} // namespace + +void RegisterBuiltinMapFunctions() { + BuiltinFunctionRegistry::RegisterScalarFunction( + {FunctionKind::kMapFromArray}, + [](FunctionKind kind, const Type* output_type) { + return new MapFromArrayFunction(kind, output_type); + }); +} +} // namespace zetasql diff --git a/zetasql/reference_impl/functions/map.h b/zetasql/reference_impl/functions/map.h new file mode 100644 index 000000000..e358a16a9 --- /dev/null +++ b/zetasql/reference_impl/functions/map.h @@ -0,0 +1,27 @@ +// +// Copyright 2019 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// + +#ifndef ZETASQL_REFERENCE_IMPL_FUNCTIONS_MAP_H_ +#define ZETASQL_REFERENCE_IMPL_FUNCTIONS_MAP_H_ + +namespace zetasql { + +// This module registers implementations for all of the Map functions. 
+void RegisterBuiltinMapFunctions(); + +} // namespace zetasql + +#endif // ZETASQL_REFERENCE_IMPL_FUNCTIONS_MAP_H_ diff --git a/zetasql/reference_impl/functions/register_all.cc b/zetasql/reference_impl/functions/register_all.cc index ebe4644a2..9600e8c8f 100644 --- a/zetasql/reference_impl/functions/register_all.cc +++ b/zetasql/reference_impl/functions/register_all.cc @@ -18,6 +18,7 @@ #include "zetasql/reference_impl/functions/hash.h" #include "zetasql/reference_impl/functions/json.h" +#include "zetasql/reference_impl/functions/map.h" #include "zetasql/reference_impl/functions/range.h" #include "zetasql/reference_impl/functions/string_with_collation.h" #include "zetasql/reference_impl/functions/uuid.h" @@ -30,6 +31,7 @@ void RegisterAllOptionalBuiltinFunctions() { RegisterBuiltinHashFunctions(); RegisterBuiltinStringWithCollationFunctions(); RegisterBuiltinRangeFunctions(); + RegisterBuiltinMapFunctions(); } } // namespace zetasql diff --git a/zetasql/reference_impl/functions/string_with_collation.cc b/zetasql/reference_impl/functions/string_with_collation.cc index c4f4f772c..8477f8b92 100644 --- a/zetasql/reference_impl/functions/string_with_collation.cc +++ b/zetasql/reference_impl/functions/string_with_collation.cc @@ -23,9 +23,18 @@ #include "zetasql/public/collator.h" #include "zetasql/public/functions/string_with_collation.h" +#include "zetasql/public/types/type.h" #include "zetasql/public/value.h" +#include "zetasql/reference_impl/evaluation.h" #include "zetasql/reference_impl/function.h" +#include "zetasql/reference_impl/functions/like.h" +#include "zetasql/reference_impl/tuple.h" #include "absl/status/statusor.h" +#include "absl/strings/cord.h" +#include "absl/types/span.h" +#include "zetasql/base/ret_check.h" +#include "zetasql/base/status_builder.h" +#include "zetasql/base/status_macros.h" namespace zetasql { namespace { @@ -70,22 +79,6 @@ class LikeWithCollationFunction : public SimpleBuiltinScalarFunction { EvaluationContext* context) const 
override; }; -class LikeAllWithCollationFunction : public SimpleBuiltinScalarFunction { - public: - using SimpleBuiltinScalarFunction::SimpleBuiltinScalarFunction; - absl::StatusOr Eval(absl::Span params, - absl::Span args, - EvaluationContext* context) const override; -}; - -class LikeAnyWithCollationFunction : public SimpleBuiltinScalarFunction { - public: - using SimpleBuiltinScalarFunction::SimpleBuiltinScalarFunction; - absl::StatusOr Eval(absl::Span params, - absl::Span args, - EvaluationContext* context) const override; -}; - template bool InvokeWithCollation(FunctionType function, Value* result, absl::Status* status, absl::string_view collation_name, @@ -225,79 +218,14 @@ absl::StatusOr CollationKeyFunction::Eval( absl::StatusOr LikeWithCollationFunction::Eval( absl::Span params, absl::Span args, EvaluationContext* context) const { - ZETASQL_RET_CHECK_EQ(args.size(), 3); - if (HasNulls(args)) { - return Value::NullBool(); - } - - absl::StatusOr> collator_or_status = - MakeSqlCollator(args[0].string_value()); - if (!collator_or_status.ok()) { - return collator_or_status.status(); - } - ZETASQL_ASSIGN_OR_RETURN(bool result, - functions::LikeUtf8WithCollation( - args[1].string_value(), args[2].string_value(), - *(collator_or_status.value()))); - return Value::Bool(result); -} - -absl::StatusOr LikeAllWithCollationFunction::Eval( - absl::Span params, absl::Span args, - EvaluationContext* context) const { - ZETASQL_RET_CHECK_GE(args.size(), 3); - if (args[1].is_null()) { - return Value::NullBool(); - } - - ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr collator, - MakeSqlCollator(args[0].string_value())); - - bool is_rhs_element_null = false; - auto& lhs = args[1].string_value(); - for (int rhs_idx = 2; rhs_idx < args.size(); rhs_idx++) { - if (args[rhs_idx].is_null()) { - is_rhs_element_null = true; - continue; - } - ZETASQL_ASSIGN_OR_RETURN(bool result, - functions::LikeUtf8WithCollation( - lhs, args[rhs_idx].string_value(), *collator)); - if (!result) { - return 
Value::Bool(false); - } - } - - return is_rhs_element_null ? Value::NullBool() : Value::Bool(true); -} - -absl::StatusOr LikeAnyWithCollationFunction::Eval( - absl::Span params, absl::Span args, - EvaluationContext* context) const { - ZETASQL_RET_CHECK_GE(args.size(), 3); - if (args[1].is_null()) { - return Value::NullBool(); - } - - ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr collator, - MakeSqlCollator(args[0].string_value())); - - bool is_rhs_element_null = false; - auto& lhs = args[1].string_value(); - for (int rhs_idx = 2; rhs_idx < args.size(); rhs_idx++) { - if (args[rhs_idx].is_null()) { - is_rhs_element_null = true; - continue; - } - ZETASQL_ASSIGN_OR_RETURN(bool result, - functions::LikeUtf8WithCollation( - lhs, args[rhs_idx].string_value(), *collator)); - if (result) { - return Value::Bool(true); - } - } - - return is_rhs_element_null ? Value::NullBool() : Value::Bool(false); + ZETASQL_RET_CHECK_GE(args.size(), 3) << "LIKE with collation has 3 or more arguments"; + QuantifiedLikeEvaluationParams quantified_like_eval_params( + /*search_value=*/args[1], + /*pattern_elements=*/args.subspan(2), + /*operation_type=*/QuantifiedLikeEvaluationParams::kLike, + /*is_not=*/false, + /*collation_str=*/args[0].string_value()); + return EvaluateQuantifiedLike(quantified_like_eval_params); } } // namespace @@ -322,21 +250,6 @@ void RegisterBuiltinStringWithCollationFunctions() { [](FunctionKind kind, const Type* output_type) { return new CollationKeyFunction(kind, output_type); }); - BuiltinFunctionRegistry::RegisterScalarFunction( - {FunctionKind::kLikeWithCollation}, - [](FunctionKind kind, const Type* output_type) { - return new LikeWithCollationFunction(kind, output_type); - }); - BuiltinFunctionRegistry::RegisterScalarFunction( - {FunctionKind::kLikeAllWithCollation}, - [](FunctionKind kind, const Type* output_type) { - return new LikeAllWithCollationFunction(kind, output_type); - }); - BuiltinFunctionRegistry::RegisterScalarFunction( - 
{FunctionKind::kLikeAnyWithCollation}, - [](FunctionKind kind, const Type* output_type) { - return new LikeAnyWithCollationFunction(kind, output_type); - }); } } // namespace zetasql diff --git a/zetasql/reference_impl/operator.h b/zetasql/reference_impl/operator.h index 021df0f6a..d47ec6578 100644 --- a/zetasql/reference_impl/operator.h +++ b/zetasql/reference_impl/operator.h @@ -38,21 +38,20 @@ // - relational_op.cc (other relational operation code) // - value_expr.cc (code for ValueExprs) +#include #include #include #include #include #include -#include -#include #include #include #include #include "zetasql/base/logging.h" -#include "google/protobuf/descriptor.h" #include "zetasql/public/catalog.h" #include "zetasql/public/evaluator_table_iterator.h" +#include "zetasql/public/table_valued_function.h" #include "zetasql/public/type.h" #include "zetasql/public/value.h" #include "zetasql/reference_impl/common.h" @@ -62,11 +61,13 @@ #include "zetasql/reference_impl/variable_generator.h" #include "zetasql/reference_impl/variable_id.h" #include "zetasql/resolved_ast/resolved_ast.h" +#include "zetasql/resolved_ast/resolved_collation.h" #include "zetasql/resolved_ast/resolved_column.h" #include "zetasql/resolved_ast/resolved_node.h" #include "absl/container/flat_hash_map.h" #include "absl/container/node_hash_map.h" #include "absl/hash/hash.h" +#include "zetasql/base/check.h" #include "absl/memory/memory.h" #include "absl/status/status.h" #include "absl/status/statusor.h" @@ -74,9 +75,9 @@ #include "absl/types/optional.h" #include "absl/types/span.h" #include "absl/types/variant.h" +#include "google/protobuf/descriptor.h" #include "zetasql/base/stl_util.h" #include "zetasql/base/ret_check.h" -#include "zetasql/base/status.h" namespace zetasql { @@ -690,6 +691,8 @@ class AggregateArg final : public ExprArg { absl::Span params, EvaluationContext* context) const; + const AggregateFunctionCallExpr* aggregate_function() const; + std::string DebugInternal(const std::string& 
indent, bool verbose) const override; @@ -710,7 +713,6 @@ class AggregateArg final : public ExprArg { Distinctness distinct() const { return distinct_; } - const AggregateFunctionCallExpr* aggregate_function() const; AggregateFunctionCallExpr* mutable_aggregate_function(); // Number of aggregate arguments to (). @@ -1312,6 +1314,74 @@ class EvaluatorTableScanOp final : public RelationalOp { std::unique_ptr<ValueExpr> read_time_; }; +// Produces a relation from a TVF. +class TVFOp final : public RelationalOp { + public: + // Relational input of the TVF. In addition to the RelationalOp, contains + // column schema information that allows the TVF evaluator to find columns by + // name. + struct TvfInputRelation { + // Descriptor of each input column. Needed to preserve the schema and the + // column name to variable mapping. + struct TvfInputRelationColumn { + // The name of the column. + std::string name; + // The type of the column. + const Type* type; + // Variable associated with this column. + VariableId variable; + }; + // Input relation. + std::unique_ptr<RelationalOp> relational_op; + // Schema of the input relation. + std::vector<TvfInputRelationColumn> columns; + }; + + // Argument for the TVFOp. + struct TVFOpArgument { + // If set, the argument is a value expression. + std::unique_ptr<ValueExpr> value; + // If set, the argument is a relation. + std::optional<TvfInputRelation> relation; + // If set, the argument is a model. 
+ const Model* model; + }; + + static absl::StatusOr<std::unique_ptr<TVFOp>> Create( + const TableValuedFunction* tvf, std::vector<TVFOpArgument> arguments, + std::vector<TVFSchemaColumn> output_columns, + std::vector<VariableId> variables, + std::shared_ptr<FunctionSignature> function_call_signature); + + absl::Status SetSchemasForEvaluation( + absl::Span<const TupleSchema* const> params_schemas) override; + + absl::StatusOr<std::unique_ptr<TupleIterator>> CreateIterator( + absl::Span<const TupleData* const> params, int num_extra_slots, + EvaluationContext* context) const override; + + std::unique_ptr<TupleSchema> CreateOutputSchema() const override; + std::string IteratorDebugString() const override; + std::string DebugInternal(const std::string& indent, + bool verbose) const override; + + private: + TVFOp(const TableValuedFunction* tvf, std::vector<TVFOpArgument> arguments, + std::vector<TVFSchemaColumn> output_columns, + std::vector<VariableId> variables, + std::shared_ptr<FunctionSignature> function_call_signature); + + // The invoked table valued function. + const TableValuedFunction* tvf_; + // Arguments for the invocation. + const std::vector<TVFOpArgument> arguments_; + // Names and types of TVF output columns. + const std::vector<TVFSchemaColumn> output_columns_; + // Variables matching 'output_columns_' positionally. + const std::vector<VariableId> variables_; + // Signature of the invocation. Set only if the invocation is ambiguous. + const std::shared_ptr<FunctionSignature> function_call_signature_; +}; + // Evaluates some expressions and makes them available to 'body'. Each // expression is allowed to depend on the results of the previous expressions. class LetOp final : public RelationalOp { @@ -1493,6 +1563,12 @@ class AggregateOp final : public RelationalOp { AggregateOp(const AggregateOp&) = delete; AggregateOp& operator=(const AggregateOp&) = delete; + // The maximum number of grouping sets allowed. + static constexpr int kMaxGroupingSets = 4096; + // The maximum number of columns allowed in grouping sets, equivalent to + // the maximum number of aggregation keys allowed in a grouping sets query. 
+  static constexpr int kMaxColumnsInGroupingSet = 50;
+
   static std::string GetIteratorDebugString(
       absl::string_view input_iter_debug_string);
@@ -1501,7 +1577,7 @@ class AggregateOp final : public RelationalOp {
   static absl::StatusOr<std::unique_ptr<AggregateOp>> Create(
       std::vector<std::unique_ptr<KeyArg>> keys,
       std::vector<std::unique_ptr<AggregateArg>> aggregators,
-      std::unique_ptr<RelationalOp> input);
+      std::unique_ptr<RelationalOp> input, std::vector<int64_t> grouping_sets);
   absl::Status SetSchemasForEvaluation(
       absl::Span<const TupleSchema* const> params_schemas) override;
@@ -1524,7 +1600,8 @@
   AggregateOp(std::vector<std::unique_ptr<KeyArg>> keys,
               std::vector<std::unique_ptr<AggregateArg>> aggregators,
-              std::unique_ptr<RelationalOp> input);
+              std::unique_ptr<RelationalOp> input,
+              std::vector<int64_t> grouping_sets);
   absl::Span<const KeyArg* const> keys() const;
   absl::Span<KeyArg* const> mutable_keys();
@@ -1534,6 +1611,14 @@
   const RelationalOp* input() const;
   RelationalOp* mutable_input();
+
+  absl::Span<const int64_t> grouping_sets() const;
+
+  // Grouping sets are stored using one bit per "group by key"; this also
+  // includes grouping sets expanded from ROLLUP or CUBE.
+  // The least significant bit is for the first key, and so on. There are no
+  // grouping sets when the vector is empty.
+  std::vector<int64_t> grouping_sets_;
 };
 // Represents scan operator for returning all rows corresponding to the current
@@ -1750,14 +1835,17 @@ class SortOp final : public RelationalOp {
   const bool is_stable_sort_;
 };
-// Scans (or unnests) an 'array' as a relation. Each output tuple contains an
-// optional 'element' variable bound to an element of the array (if 'element' is
-// non-empty), and an optional 'position' variable (if 'position' is
-// non-empty). If the array has not fully specified order, output positions are
-// non-deterministic. Note that all tables have unspecified order of rows.
-// For an 'array' of structs, 'fields' may specify (variable, field_index)
+// Scans (or unnests) `arrays` as a relation.
Each output tuple contains
+// optional `elements` variables, each of which is bound to an element of one of
+// the arrays, and an optional `position` variable (if `position` is non-empty).
+//
+// If any of the arrays has undefined order, and there is at least one
+// additional column, the output positions are non-deterministic.
+// Note that all tables have unspecified order of rows.
+//
+// For an `array` of structs, `fields` may specify (variable, field_index)
 // pairs (which are useful for scanning a table represented as an array of
-// structs in the compliance test framework). field_index refers to the field
+// structs in the compliance test framework). `field_index` refers to the field
 // number in the struct type inside the array type.
 class ArrayScanOp final : public RelationalOp {
  public:
@@ -1783,11 +1871,20 @@ class ArrayScanOp final : public RelationalOp {
   static std::string GetIteratorDebugString(
       absl::string_view array_debug_string);
+  // Legacy factory method. We keep it for backward compatibility, to preserve
+  // the `fields` use case.
   static absl::StatusOr<std::unique_ptr<ArrayScanOp>> Create(
       const VariableId& element, const VariableId& position,
       absl::Span<const std::pair<VariableId, int>> fields,
       std::unique_ptr<ValueExpr> array);
+  // Multiway UNNEST factory function. It is mutually exclusive with the use of
+  // `fields`. `elements` and `arrays` have the same length.
+  static absl::StatusOr<std::unique_ptr<ArrayScanOp>> Create(
+      absl::Span<const VariableId> elements, const VariableId& position,
+      std::vector<std::unique_ptr<ValueExpr>> arrays,
+      std::unique_ptr<ValueExpr> zip_mode_expr);
+
   absl::Status SetSchemasForEvaluation(
       absl::Span<const TupleSchema* const> params_schemas) override;
@@ -1795,9 +1892,9 @@
       absl::Span<const TupleData* const> params, int num_extra_slots,
       EvaluationContext* context) const override;
-  // Returns the schema consisting of the fields, followed optionally by
-  // 'element' and 'position' (depending on whether those VariableIds are
-  // valid).
+ // Returns the schema consisting of the fields, followed optionally by a + // number of `elements` and `position` (depending on whether those VariableIds + // are valid). std::unique_ptr CreateOutputSchema() const override; std::string IteratorDebugString() const override; @@ -1806,20 +1903,32 @@ class ArrayScanOp final : public RelationalOp { bool verbose) const override; private: - enum ArgKind { kElement, kPosition, kField, kArray }; + enum ArgKind { kElement, kPosition, kField, kArray, kMode }; - // 'fields' contains (variable, field_index) pairs given an 'array' of + // `fields` contains (variable, field_index) pairs given an `array` of // structs, and must be empty otherwise. ArrayScanOp(const VariableId& element, const VariableId& position, absl::Span> fields, std::unique_ptr array); - const VariableId& element() const; // May be empty, i.e., unused. + // Multiway UNNEST constructor. If used, field_list() should be empty. + ArrayScanOp(std::vector> elements, + std::unique_ptr position, + std::vector> arrays, + std::unique_ptr zip_mode_expr); + + // May be empty, i.e., unused. + absl::Span elements() const; const VariableId& position() const; // May be empty, i.e., unused. absl::Span field_list() const; - const ValueExpr* array_expr() const; - ValueExpr* mutable_array_expr(); + absl::Span array_expr_list() const; + absl::Span mutable_array_expr_list(); + int num_arrays() const; + + // Array zipping mode using ARRAY_ZIP_MODE enum. The default is nullptr which + // will be interpreted as "PAD". + const ValueExpr* zip_mode_expr() const; }; // Evaluates a set of keys for each row produced by an input iterator. @@ -4017,7 +4126,7 @@ class DMLInsertValueExpr final : public DMLValueExpr { // mode properly for each corresponding row in "dml_returning_rows". 
absl::StatusOr InsertRows( const InsertColumnMap& insert_column_map, - const std::vector>& rows_to_insert, + std::vector>& rows_to_insert, std::vector>& dml_returning_rows, EvaluationContext* context, PrimaryKeyRowMap* row_map) const; diff --git a/zetasql/reference_impl/reference_driver.cc b/zetasql/reference_impl/reference_driver.cc index e6f8c9c8c..179f01fb1 100644 --- a/zetasql/reference_impl/reference_driver.cc +++ b/zetasql/reference_impl/reference_driver.cc @@ -35,6 +35,7 @@ #include "zetasql/public/analyzer_output.h" #include "zetasql/public/annotation/collation.h" #include "zetasql/public/error_helpers.h" +#include "zetasql/public/function.h" #include "zetasql/public/functions/date_time_util.h" #include "zetasql/public/language_options.h" #include "zetasql/public/multi_catalog.h" @@ -242,6 +243,12 @@ void ReferenceDriver::SetLanguageOptions(const LanguageOptions& options) { absl::Status ReferenceDriver::AddSqlUdfs( absl::Span create_function_stmts) { + return AddSqlUdfs(create_function_stmts, FunctionOptions()); +} + +absl::Status ReferenceDriver::AddSqlUdfs( + absl::Span create_function_stmts, + FunctionOptions function_options) { // Ensure the language options used allow CREATE FUNCTION in schema setup LanguageOptions language = language_options_; language.AddSupportedStatementKind(RESOLVED_CREATE_FUNCTION_STMT); @@ -251,12 +258,14 @@ absl::Status ReferenceDriver::AddSqlUdfs( // Don't pre-rewrite function bodies. // TODO: In RQG mode, apply a random subset of rewriters. analyzer_options.set_enabled_rewrites({}); + // TODO: b/277368430 - Remove this rewriter once the reference implementation + // for UDA is fixed. 
+  analyzer_options.enable_rewrite(REWRITE_INLINE_SQL_UDAS);
   for (const std::string& create_function : create_function_stmts) {
     sql_udf_artifacts_.emplace_back();
     ZETASQL_RETURN_IF_ERROR(AddFunctionFromCreateFunction(
         create_function, analyzer_options, /*allow_persistent_function=*/false,
-        /*function_options=*/std::nullopt, sql_udf_artifacts_.back(),
-        *catalog_.catalog()));
+        function_options, sql_udf_artifacts_.back(), *catalog_.catalog()));
   }
   return absl::OkStatus();
 }
diff --git a/zetasql/reference_impl/reference_driver.h b/zetasql/reference_impl/reference_driver.h
index 7504cd3b8..9d6661ef5 100644
--- a/zetasql/reference_impl/reference_driver.h
+++ b/zetasql/reference_impl/reference_driver.h
@@ -28,6 +28,7 @@
 #include "zetasql/compliance/test_database_catalog.h"
 #include "zetasql/compliance/test_driver.h"
 #include "zetasql/public/analyzer.h"
+#include "zetasql/public/function.h"
 #include "zetasql/public/language_options.h"
 #include "zetasql/public/options.pb.h"
 #include "zetasql/public/simple_catalog.h"
@@ -43,6 +44,7 @@
 #include "absl/status/statusor.h"
 #include "absl/strings/string_view.h"
 #include "absl/time/time.h"
+#include "absl/types/span.h"
 #include "zetasql/base/ret_check.h"
 #include "zetasql/base/status.h"
 #include "zetasql/base/status_builder.h"
@@ -126,6 +128,12 @@ class ReferenceDriver : public TestDriver {
   // is a collection of "CREATE TEMP FUNCTION" statements.
   absl::Status AddSqlUdfs(
       absl::Span<const std::string> create_function_stmts) override;
+  // A reference-driver-specific overload of AddSqlUdfs that also takes a
+  // FunctionOptions argument. Even though most function options cannot be
+  // controlled through a CREATE FUNCTION statement, this driver is used for the
+  // query generator. We supply FunctionOptions to affect the RQG behavior.
+  absl::Status AddSqlUdfs(absl::Span<const std::string> create_function_stmts,
+                          FunctionOptions function_options);
   // Adds some views to the catalog owned by this test driver. The argument
   // is a collection of "CREATE TEMP VIEW" statements.
diff --git a/zetasql/reference_impl/reference_impl_known_errors.textproto b/zetasql/reference_impl/reference_impl_known_errors.textproto index f3aaab810..937d2c198 100644 --- a/zetasql/reference_impl/reference_impl_known_errors.textproto +++ b/zetasql/reference_impl/reference_impl_known_errors.textproto @@ -16,3 +16,12 @@ # Known errors file for zetasql compliance framework. Add labels and queries # here to exclude them from compliance tests. + +# TODO: Remove once internal engines are updated to use FLOAT32 +# for FORMAT("%T"). +known_errors { + mode: ALLOW_ERROR_OR_WRONG_ANSWER + reason: "Internal engines still return the type as FLOAT for FORMAT('%T')" + label: "code:format_STRING__ARRAY_<4, -2@5, nan>" + label: "code:format_STRING__FLOAT_.*" +} diff --git a/zetasql/reference_impl/relational_op.cc b/zetasql/reference_impl/relational_op.cc index 2db06b850..58e5abc7a 100644 --- a/zetasql/reference_impl/relational_op.cc +++ b/zetasql/reference_impl/relational_op.cc @@ -24,6 +24,7 @@ #include #include #include +#include #include #include #include @@ -33,6 +34,9 @@ #include "zetasql/common/thread_stack.h" #include "zetasql/public/catalog.h" #include "zetasql/public/evaluator_table_iterator.h" +#include "zetasql/public/function_signature.h" +#include "zetasql/public/functions/array_zip_mode.pb.h" +#include "zetasql/public/table_valued_function.h" #include "zetasql/public/type.h" #include "zetasql/public/type.pb.h" #include "zetasql/public/value.h" @@ -43,9 +47,12 @@ #include "zetasql/reference_impl/tuple_comparator.h" #include "zetasql/reference_impl/variable_id.h" #include "zetasql/resolved_ast/resolved_ast.h" +#include "absl/algorithm/container.h" +#include "absl/container/btree_set.h" #include "absl/container/flat_hash_map.h" #include "absl/container/flat_hash_set.h" #include "absl/flags/flag.h" +#include "zetasql/base/check.h" #include "absl/memory/memory.h" #include "absl/random/distributions.h" #include "absl/random/random.h" @@ -54,16 +61,16 @@ #include 
"absl/strings/str_cat.h" #include "absl/strings/str_join.h" #include "absl/strings/string_view.h" +#include "absl/time/time.h" #include "absl/types/optional.h" #include "absl/types/span.h" -#include "zetasql/base/source_location.h" #include "zetasql/base/ret_check.h" -#include "zetasql/base/status.h" #include "zetasql/base/status_builder.h" #include "zetasql/base/status_macros.h" using zetasql::values::Bool; using zetasql::values::Int64; +using zetasql::values::Null; namespace zetasql { @@ -618,6 +625,316 @@ EvaluatorTableScanOp::EvaluatorTableScanOp( and_filters_(std::move(and_filters)), read_time_(std::move(read_time)) {} +// ------------------------------------------------------- +// TVFOp +// ------------------------------------------------------- + +namespace { +// EvaluatorTableIterator representing input relation scan. +// The TVF implementation operates on relations with columns, therefore it is +// necessary to adapt the input relation tuple iterator and translate column +// indexes into matching tuple slots. +class InputRelationIterator : public EvaluatorTableIterator { + public: + // Creates a new instance of InputRelationIterator. Arguments: + // * columns - Names and types of columns produced by this iterator. + // Accessed through NumColumns, GetColumnName, GetColumnType. + // * tuple_indexes - Maps indices of 'columns' to tuple iterator so that + // proper values can be retrieved. Length must match 'columns'. + // * context - Common evaluation context. + // * iter - Tuple iterator of the relation being adapted. 
+ InputRelationIterator( + std::vector> columns, + const std::vector tuple_indexes, EvaluationContext* context, + std::unique_ptr iter) + : columns_(std::move(columns)), + tuple_indexes_(std::move(tuple_indexes)), + context_(context), + iter_(std::move(iter)) { + ABSL_DCHECK_EQ(columns_.size(), tuple_indexes_.size()); + } + + InputRelationIterator(const InputRelationIterator&) = delete; + InputRelationIterator& operator=(const InputRelationIterator&) = delete; + + int NumColumns() const override { return static_cast(columns_.size()); } + + std::string GetColumnName(int i) const override { + ABSL_DCHECK_LT(i, columns_.size()); + return columns_[i].first; + } + + const Type* GetColumnType(int i) const override { + ABSL_DCHECK_LT(i, columns_.size()); + return columns_[i].second; + } + + bool NextRow() override { + current_ = iter_->Next(); + return current_ != nullptr; + } + + const Value& GetValue(int i) const override { + ABSL_DCHECK_LT(i, tuple_indexes_.size()); + return current_->slot(tuple_indexes_[i]).value(); + } + + absl::Status Status() const override { return iter_->Status(); } + + absl::Status Cancel() override { return context_->CancelStatement(); } + + void SetDeadline(absl::Time deadline) override { + context_->SetStatementEvaluationDeadline(deadline); + } + + private: + // Names and types of the columns + const std::vector> columns_; + // Tuple slot index for each column. Size must match columns_. + const std::vector tuple_indexes_; + // Current evaluation context. + EvaluationContext* context_; + // Input relation tuple iterator. + std::unique_ptr iter_; + // Current tuple values obtained from iter_. + const TupleData* current_ = nullptr; +}; + +// Tuple iterator that adapts TVF EvaluatorTableIterator and converts between +// TVF columnar abstractions to tuples. +// +// The query can select only a subset of columns produced by TVF +// EvaluatorTableIterator. Unselected columns can be pruned and won't have +// tuple slots allocated. 
Tuple index allows this iterator to map all produced +// tuples to selected TVF columns and ignore the rest of its columns. +class EvaluatorTVFTupleIterator : public TupleIterator { + public: + EvaluatorTVFTupleIterator( + absl::string_view name, std::unique_ptr schema, + int num_extra_slots, std::vector tuple_indexes, + EvaluationContext* context, + std::unique_ptr evaluator_table_iter) + : name_(name), + schema_(std::move(schema)), + tuple_indexes_(std::move(tuple_indexes)), + context_(context), + evaluator_table_iter_(std::move(evaluator_table_iter)), + current_(schema_->num_variables() + num_extra_slots) { + context_->RegisterCancelCallback( + [this] { return evaluator_table_iter_->Cancel(); }); + } + + EvaluatorTVFTupleIterator(const EvaluatorTVFTupleIterator&) = delete; + EvaluatorTVFTupleIterator& operator=(const EvaluatorTVFTupleIterator&) = + delete; + + const TupleSchema& Schema() const override { return *schema_; } + + TupleData* Next() override { + if (!called_next_) { + evaluator_table_iter_->SetDeadline( + context_->GetStatementEvaluationDeadline()); + called_next_ = true; + } + if (!evaluator_table_iter_->NextRow()) { + status_ = evaluator_table_iter_->Status(); + return nullptr; + } + + for (int i = 0; i < tuple_indexes_.size(); ++i) { + current_.mutable_slot(i)->SetValue( + evaluator_table_iter_->GetValue(static_cast(tuple_indexes_[i]))); + } + return ¤t_; + } + + absl::Status Status() const override { return status_; } + + std::string DebugString() const override { + return EvaluatorTableScanOp::GetIteratorDebugString(name_); + } + + private: + const std::string name_; + const std::unique_ptr schema_; + const std::vector tuple_indexes_; + EvaluationContext* context_; + bool called_next_ = false; + std::unique_ptr evaluator_table_iter_; + TupleData current_; + absl::Status status_; +}; +} // namespace + +/*static*/ absl::StatusOr> TVFOp::Create( + const TableValuedFunction* tvf, std::vector arguments, + std::vector output_columns, + std::vector 
variables, + std::shared_ptr function_call_signature) { + return absl::WrapUnique( + new TVFOp(tvf, std::move(arguments), std::move(output_columns), + std::move(variables), std::move(function_call_signature))); +} + +TVFOp::TVFOp(const TableValuedFunction* tvf, + std::vector arguments, + std::vector output_columns, + std::vector variables, + std::shared_ptr function_call_signature) + : tvf_(tvf), + arguments_(std::move(arguments)), + output_columns_(std::move(output_columns)), + variables_(std::move(variables)), + function_call_signature_(std::move(function_call_signature)) {} + +absl::Status TVFOp::SetSchemasForEvaluation( + absl::Span params_schemas) { + for (const TVFOpArgument& argument : arguments_) { + if (argument.value) { + ZETASQL_RETURN_IF_ERROR(argument.value->SetSchemasForEvaluation(params_schemas)); + } else if (argument.relation) { + ZETASQL_RETURN_IF_ERROR(argument.relation->relational_op->SetSchemasForEvaluation( + params_schemas)); + } else if (argument.model) { + // No-op. 
+ } else { + ZETASQL_RET_CHECK_FAIL() << "Unexpected TVFOpArgument"; + } + } + return absl::OkStatus(); +} + +absl::StatusOr> TVFOp::CreateIterator( + absl::Span params, int num_extra_slots, + EvaluationContext* context) const { + std::vector input_arguments; + for (const TVFOpArgument& argument : arguments_) { + if (argument.value) { + absl::Status status; + TupleSlot result; + if (!argument.value->EvalSimple(params, context, &result, &status)) { + return status; + } + input_arguments.push_back({.value = {result.value()}}); + } else if (argument.relation) { + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr tuple_iterator, + argument.relation->relational_op->Eval( + params, num_extra_slots, context)); + + std::vector> columns; + std::vector tuple_indexes; + const TupleSchema& tuple_schema = tuple_iterator->Schema(); + for (const TVFOp::TvfInputRelation::TvfInputRelationColumn& column : + argument.relation->columns) { + columns.push_back({column.name, column.type}); + + auto type_index = tuple_schema.FindIndexForVariable(column.variable); + ZETASQL_RET_CHECK(type_index.has_value()); + tuple_indexes.push_back(*type_index); + } + ZETASQL_RET_CHECK_EQ(columns.size(), tuple_indexes.size()); + input_arguments.push_back( + {.relation = {std::make_unique( + std::move(columns), std::move(tuple_indexes), context, + std::move(tuple_iterator))}}); + } else if (argument.model) { + input_arguments.push_back({.model = argument.model}); + } else { + ZETASQL_RET_CHECK_FAIL() << "Unexpected TVFOpArgument"; + } + } + + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr evaluator_table_iterator, + tvf_->CreateEvaluator(std::move(input_arguments), output_columns_, + function_call_signature_.get())); + + // evaluator_table_iterator can produce more output columns than were + // selected, especially if the implementation assumes a fixed schema. The tvf + // tuple iterator adapter must ensure that tuple slots match to correct output + // columns. 
+ std::vector tuple_indexes; + for (int i = 0; i < output_columns_.size(); ++i) { + int64_t tuple_index = -1; + for (int j = 0; j < evaluator_table_iterator->NumColumns(); ++j) { + if (output_columns_[i].name == + evaluator_table_iterator->GetColumnName(j)) { + tuple_index = j; + break; + } + } + ZETASQL_RET_CHECK_GE(tuple_index, 0) + << " TVF iterator does not produce output column " + << output_columns_[i].name; + tuple_indexes.push_back(tuple_index); + } + + std::unique_ptr tuple_iterator = + std::make_unique( + tvf_->Name(), CreateOutputSchema(), num_extra_slots, + std::move(tuple_indexes), context, + std::move(evaluator_table_iterator)); + return MaybeReorder(std::move(tuple_iterator), context); +} + +std::unique_ptr TVFOp::CreateOutputSchema() const { + return std::make_unique(variables_); +} + +std::string TVFOp::IteratorDebugString() const { + return absl::StrCat("TvfOp(", tvf_->Name(), ")"); +} + +std::string TVFOp::DebugInternal(const std::string& indent, + bool verbose) const { + const std::string indent_field = absl::StrCat(indent, kIndentFork); + const std::string indent_list = absl::StrCat(indent, kIndentBar, kIndentFork); + const std::string indent_nested = + absl::StrCat(indent, kIndentBar, kIndentSpace); + std::string result = "TvfOp("; + absl::StrAppend(&result, indent_field, "tvf: ", tvf_->Name()); + + absl::StrAppend(&result, indent_field, "arguments: {"); + for (const TVFOpArgument& argument : arguments_) { + if (argument.value) { + absl::StrAppend(&result, indent_list, + argument.value->DebugInternal(indent_nested, verbose)); + } else if (argument.relation) { + absl::StrAppend(&result, indent_list, + argument.relation->relational_op->DebugInternal( + indent_nested, verbose)); + } else if (argument.model) { + absl::StrAppend(&result, indent_list, "MODEL ", argument.model->Name()); + } else { + absl::StrAppend(&result, kIndentBar, kIndentFork, "UNEXPECTED ARGUMENT"); + } + } + absl::StrAppend(&result, "}"); + + if (verbose) { + 
absl::StrAppend(&result, indent_field, "output_columns: {"); + for (int i = 0; i < output_columns_.size(); ++i) { + absl::StrAppend(&result, indent_list, output_columns_[i].name); + } + absl::StrAppend(&result, "}"); + } + + absl::StrAppend(&result, indent_field, "variables: {"); + for (int i = 0; i < variables_.size(); ++i) { + absl::StrAppend(&result, indent_list, "$", variables_[i].ToString()); + } + absl::StrAppend(&result, "}"); + + if (verbose && function_call_signature_) { + absl::StrAppend( + &result, indent_field, "function_call_signature: ", + function_call_signature_->DebugString(tvf_->Name(), verbose)); + } + + absl::StrAppend(&result, ")"); // To match TvfOp( + return result; +} + // ------------------------------------------------------- // LetOp // ------------------------------------------------------- @@ -3658,33 +3975,72 @@ absl::StatusOr> ArrayScanOp::Create( new ArrayScanOp(element, position, fields, std::move(array))); } +absl::StatusOr> ArrayScanOp::Create( + absl::Span elements, const VariableId& position, + std::vector> arrays, + std::unique_ptr zip_mode_expr) { + ZETASQL_RET_CHECK_EQ(elements.size(), arrays.size()); + ZETASQL_RET_CHECK(absl::c_all_of(arrays, [](const auto& expr) { + return expr != nullptr && expr->output_type()->IsArray(); + })); + ZETASQL_RET_CHECK(zip_mode_expr != nullptr); + + int num_arrays = static_cast(arrays.size()); + std::vector> element_columns(num_arrays); + std::vector> array_columns(num_arrays); + for (int i = 0; i < num_arrays; ++i) { + const Type* element_type = + arrays[i]->output_type()->AsArray()->element_type(); + ZETASQL_RET_CHECK(elements[i].is_valid()); + element_columns[i] = std::make_unique(elements[i], element_type); + array_columns[i] = std::make_unique(std::move(arrays[i])); + } + + std::unique_ptr position_column = + !position.is_valid() + ? 
nullptr + : std::make_unique(position, types::Int64Type()); + return absl::WrapUnique( + new ArrayScanOp(std::move(element_columns), std::move(position_column), + std::move(array_columns), + std::make_unique(std::move(zip_mode_expr)))); +} + absl::Status ArrayScanOp::SetSchemasForEvaluation( absl::Span params_schemas) { - return mutable_array_expr()->SetSchemasForEvaluation(params_schemas); + for (auto* array_expr : mutable_array_expr_list()) { + ZETASQL_RETURN_IF_ERROR(array_expr->mutable_value_expr()->SetSchemasForEvaluation( + params_schemas)); + } + return absl::OkStatus(); } namespace { -// Returns one tuple per element of 'array_value'. -// - If 'element' is valid, the tuple includes a variable containing the array -// element. -// - If 'position' is valid, the tuple includes a variable containing the +// Returns one tuple for every element of `array_values` with the same position. +// - If `include_element` is true, the tuple includes variables containing an +// element from each of the arrays. +// - If `include_position` is true, the tuple includes a variable containing the // zero-based element position. -// - For each element in 'field_list', the tuple contains a variable containing +// - `max_num_elements` indicates the maximum number of elements we scan in +// every array. This decides the total number of output rows coming out of the +// iterator. +// - For each element in `field_list`, the tuple contains a variable containing // the corresponding field of the array element (which must be a struct if -// 'field_list' is non-empty). This functionality is useful for scanning an +// `field_list` is non-empty). This functionality is useful for scanning an // table represented as an array (e.g., in the compliance tests). 
class ArrayScanTupleIterator : public TupleIterator { public: ArrayScanTupleIterator( - const Value& array_value, const VariableId& element, - const VariableId& position, + const std::vector& array_values, bool include_element, + bool include_position, int max_num_elements, absl::Span field_list, std::unique_ptr schema, int num_extra_slots, EvaluationContext* context) - : array_value_(array_value), + : array_values_(array_values), schema_(std::move(schema)), - include_element_(element.is_valid()), - include_position_(position.is_valid()), + include_element_(include_element), + include_position_(include_position), + max_num_elements_(max_num_elements), field_list_(field_list.begin(), field_list.end()), current_(schema_->num_variables() + num_extra_slots), context_(context) { @@ -3697,22 +4053,32 @@ class ArrayScanTupleIterator : public TupleIterator { const TupleSchema& Schema() const override { return *schema_; } TupleData* Next() override { - // Scanning a NULL array results in no output. - if (array_value_.is_null()) { + // If final array length is 0, we produce no output. + if (max_num_elements_ == 0) { return nullptr; } - if (next_element_idx_ == array_value_.num_elements()) { - // Done iterating over the array. The output is non-deterministic if it - // includes the position but the array is unordered and has more than one - // element. - if (include_position_ && - (InternalValue::GetOrderKind(array_value_) == - InternalValue::kIgnoresOrder) && - array_value_.num_elements() > 1) { + bool has_multiple_columns = include_position_ || array_values_.size() > 1; + if (next_element_idx_ == max_num_elements_) { + // Done iterating over the arrays. The output is non-deterministic if any + // of the arrays is unordered and has multiple elements, and the output + // tuple contains multiple columns (slots). The multiple columns can show + // up when there is an array offset column, or there are multiple arrays. 
+      bool ignore_order = false;
+      for (const auto& array : array_values_) {
+        if (InternalValue::GetOrderKind(array) ==
+                InternalValue::kIgnoresOrder &&
+            !array.is_null() && array.num_elements() > 1) {
+          // We don't need to check whether an unordered array with more than
+          // one element contains equivalent elements. If it does, that
+          // indicates a bug in the producer of the unordered array.
+          ignore_order = true;
+          break;
+        }
+      }
+      if (has_multiple_columns && ignore_order) {
         context_->SetNonDeterministicOutput();
       }
-
       return nullptr;
     }
@@ -3722,19 +4088,31 @@
       return nullptr;
     }
-    const Value& element = array_value_.element(next_element_idx_);
-    for (int i = 0; i < field_list_.size(); ++i) {
-      current_.mutable_slot(i)->SetValue(
-          element.field(field_list_[i]->field_index()));
+    // We only prepend field list columns if they are populated.
+    if (!field_list_.empty()) {
+      ABSL_DCHECK(!array_values_[0].is_null());
+      ABSL_DCHECK(next_element_idx_ < array_values_[0].num_elements());
+      const Value& element_of_first_array =
+          array_values_[0].element(next_element_idx_);
+      for (int i = 0; i < field_list_.size(); ++i) {
+        current_.mutable_slot(i)->SetValue(
+            element_of_first_array.field(field_list_[i]->field_index()));
+      }
     }
-    int next_value_idx = field_list_.size();
+    int next_slot_idx = static_cast<int>(field_list_.size());
     if (include_element_) {
-      current_.mutable_slot(next_value_idx)->SetValue(element);
-
-      ++next_value_idx;
+      for (int i = 0; i < array_values_.size(); ++i) {
+        Value value =
+            (array_values_[i].is_null() ||
+             next_element_idx_ >= array_values_[i].num_elements())
+                ?
Null(array_values_[i].type()->AsArray()->element_type()) + : array_values_[i].element(next_element_idx_); + current_.mutable_slot(next_slot_idx)->SetValue(value); + next_slot_idx++; + } } if (include_position_) { - current_.mutable_slot(next_value_idx)->SetValue(Int64(next_element_idx_)); + current_.mutable_slot(next_slot_idx)->SetValue(Int64(next_element_idx_)); } ++next_element_idx_; @@ -3744,7 +4122,10 @@ class ArrayScanTupleIterator : public TupleIterator { absl::Status Status() const override { return status_; } std::string DebugString() const override { - return ArrayScanOp::GetIteratorDebugString(array_value_.DebugString()); + return ArrayScanOp::GetIteratorDebugString(absl::StrJoin( + array_values_, ", ", [](std::string* out, const Value& value) { + absl::StrAppend(out, value.DebugString()); + })); } absl::Status Cancel() { @@ -3753,10 +4134,11 @@ class ArrayScanTupleIterator : public TupleIterator { } private: - const Value array_value_; + const std::vector array_values_; const std::unique_ptr schema_; const bool include_element_; const bool include_position_; + const int max_num_elements_; const std::vector field_list_; TupleData current_; int next_element_idx_ = 0; @@ -3764,34 +4146,81 @@ class ArrayScanTupleIterator : public TupleIterator { absl::Status status_; EvaluationContext* context_; }; + +int OutputArrayLength(functions::ArrayZipEnums::ArrayZipMode mode, + int min_length, int max_length) { + return mode == functions::ArrayZipEnums::PAD ? 
max_length : min_length; +} + } // namespace absl::StatusOr> ArrayScanOp::CreateIterator( absl::Span params, int num_extra_slots, EvaluationContext* context) const { - TupleSlot array_slot; - absl::Status status; - if (!array_expr()->EvalSimple(params, context, &array_slot, &status)) - return status; - std::unique_ptr iter = - std::make_unique( - array_slot.value(), element(), position(), field_list(), - CreateOutputSchema(), num_extra_slots, context); + absl::Span array_exprs = array_expr_list(); + ZETASQL_RET_CHECK(!array_exprs.empty()); + std::vector array_values(array_exprs.size()); + std::vector array_lengths(array_exprs.size()); + for (int i = 0; i < array_exprs.size(); ++i) { + TupleSlot array_slot; + absl::Status status; + if (!array_exprs[i]->value_expr()->EvalSimple(params, context, &array_slot, + &status)) { + return status; + } + array_values[i] = array_slot.value(); + array_lengths[i] = + array_slot.value().is_null() ? 0 : array_slot.value().num_elements(); + } + + // If the mode argument is unspecified, its value defaults to "PAD". + const ValueExpr* mode_expr = zip_mode_expr(); + functions::ArrayZipEnums::ArrayZipMode mode = functions::ArrayZipEnums::PAD; + if (mode_expr != nullptr) { + TupleSlot mode_slot; + absl::Status status; + if (!mode_expr->EvalSimple(params, context, &mode_slot, &status)) { + return status; + } + ZETASQL_RET_CHECK(mode_slot.value().type()->IsEnum()); + if (mode_slot.value().is_null()) { + return absl::OutOfRangeError("UNNEST does not allow NULL mode argument"); + } + mode = static_cast( + mode_slot.value().enum_value()); + } + + // Throw error against unequal arrays if `mode` is set to "STRICT". 
+ const auto [min_length, max_length] = + std::minmax_element(array_lengths.begin(), array_lengths.end()); + if (mode == functions::ArrayZipEnums::STRICT && *min_length != *max_length) { + return absl::OutOfRangeError( + "Unnested arrays under STRICT mode must have equal lengths"); + } + + std::unique_ptr iter = std::make_unique< + ArrayScanTupleIterator>( + array_values, /*include_element=*/!elements().empty(), + /*include_position=*/position().is_valid(), + /*max_num_elements=*/OutputArrayLength(mode, *min_length, *max_length), + field_list(), CreateOutputSchema(), num_extra_slots, context); return MaybeReorder(std::move(iter), context); } std::unique_ptr ArrayScanOp::CreateOutputSchema() const { // Returns the variables to use for the scan of an // ArrayScanTupleIterator. These are the variables in field_list, followed by - // 'element' (if it is valid), followed by 'position' (if it is valid). See + // valid `elements`, followed by `position` (if it is valid). See // the class comment for ArrayScanTupleIterator for more details. std::vector vars; - vars.reserve(field_list().size() + 2); + vars.reserve(field_list().size() + elements().size() + 1); for (const ArrayScanOp::FieldArg* field : field_list()) { vars.push_back(field->variable()); } - if (element().is_valid()) { - vars.push_back(element()); + for (const ExprArg* element : elements()) { + // The elements vector only contains valid VariableId, thus, we can directly + // use it. + vars.push_back(element->variable()); } if (position().is_valid()) { vars.push_back(position()); @@ -3807,26 +4236,50 @@ std::string ArrayScanOp::DebugInternal(const std::string& indent, bool verbose) const { std::string indent_child = indent + kIndentSpace; std::string indent_input = indent + kIndentFork; - const Type* element_type = - array_expr()->output_type()->AsArray()->element_type(); + // Field list can only be used when there is only one array expression. It + // can not be used with multiway UNNEST. 
+ const Type* element_type; + if (!field_list().empty()) { + ABSL_DCHECK_EQ(array_expr_list().size(), 1); + element_type = array_expr_list()[0] + ->value_expr() + ->output_type() + ->AsArray() + ->element_type(); + } std::vector fstr; for (auto ch : field_list()) { const std::string& field_name = element_type->AsStruct()->field(ch->field_index()).name; - fstr.push_back(absl::StrCat(ch->DebugInternal(indent, verbose), ":", - field_name, ",", indent_input)); + fstr.push_back(absl::StrCat(indent_input, + ch->DebugInternal(indent, verbose), ":", + field_name, ",")); } std::sort(fstr.begin(), fstr.end()); - return absl::StrCat( - "ArrayScanOp(", indent_input, - (!element().is_valid() ? "" - : absl::StrCat(GetArg(kElement)->DebugString(), - " := element,", indent_input)), - (!position().is_valid() ? "" - : absl::StrCat(GetArg(kPosition)->DebugString(), - " := position,", indent_input)), - absl::StrJoin(fstr, ""), - "array: ", array_expr()->DebugInternal(indent_child, verbose), ")"); + std::string out = "ArrayScanOp("; + for (const auto* element : GetArgs(kElement)) { + absl::StrAppend(&out, indent_input, + !element->variable().is_valid() + ? "" + : absl::StrCat(element->DebugString(), " := element,")); + } + absl::StrAppend( + &out, !position().is_valid() + ? 
"" + : absl::StrCat(indent_input, GetArg(kPosition)->DebugString(), + " := position,")); + if (num_arrays() > 1 && zip_mode_expr() != nullptr) { + absl::StrAppend(&out, indent_input, "mode: ", + GetArg(kMode)->DebugInternal(indent_child, verbose)); + } + absl::StrAppend(&out, absl::StrJoin(fstr, "")); + for (const auto* array_expr : array_expr_list()) { + absl::StrAppend( + &out, indent_input, "array: ", + array_expr->value_expr()->DebugInternal(indent_child, verbose)); + } + absl::StrAppend(&out, ")"); + return out; } ArrayScanOp::ArrayScanOp(const VariableId& element, const VariableId& position, @@ -3834,13 +4287,22 @@ ArrayScanOp::ArrayScanOp(const VariableId& element, const VariableId& position, std::unique_ptr array) { ABSL_CHECK(array->output_type()->IsArray()); const Type* element_type = array->output_type()->AsArray()->element_type(); - SetArg(kElement, !element.is_valid() - ? nullptr - : std::make_unique(element, element_type)); + std::vector> elements; + if (element.is_valid()) { + // Only valid VariableId get populated into the elements vector. + // Note that, the only case that `elements` is empty is when the user of + // ArrayScanOp only supplies one array and explicitly asked for no element + // column in the schema. + elements.push_back(std::make_unique(element, element_type)); + } + + SetArgs(kElement, std::move(elements)); SetArg(kPosition, !position.is_valid() ? 
nullptr : std::make_unique( position, types::Int64Type())); - SetArg(kArray, std::make_unique(std::move(array))); + std::vector> arrays(1); + arrays[0] = std::make_unique(std::move(array)); + SetArgs(kArray, std::move(arrays)); std::vector> field_args; field_args.reserve(fields.size()); for (const auto& f : fields) { @@ -3848,24 +4310,38 @@ ArrayScanOp::ArrayScanOp(const VariableId& element, const VariableId& position, f.first, f.second, element_type->AsStruct()->field(f.second).type)); } SetArgs(kField, std::move(field_args)); + // The singleton array scan does not support `mode` argument. + SetArg(kMode, nullptr); +} + +ArrayScanOp::ArrayScanOp(std::vector> elements, + std::unique_ptr position, + std::vector> arrays, + std::unique_ptr zip_mode_expr) { + // Factory method `Create` has already validated the size and output_type of + // array expressions. + SetArgs(kElement, std::move(elements)); + SetArg(kPosition, std::move(position)); + SetArg(kMode, std::move(zip_mode_expr)); + SetArgs(kArray, std::move(arrays)); + std::vector> field_args; + SetArgs(kField, std::move(field_args)); } -const ValueExpr* ArrayScanOp::array_expr() const { - return GetArg(kArray)->value_expr(); +absl::Span ArrayScanOp::array_expr_list() const { + return GetArgs(kArray); } -ValueExpr* ArrayScanOp::mutable_array_expr() { - return GetMutableArg(kArray)->mutable_value_expr(); +absl::Span ArrayScanOp::mutable_array_expr_list() { + return GetMutableArgs(kArray); } absl::Span ArrayScanOp::field_list() const { return GetArgs(kField); } -const VariableId& ArrayScanOp::element() const { - static const VariableId* empty_str = new VariableId(); - return GetArg(kElement) != nullptr ? 
GetArg(kElement)->variable() - : *empty_str; +absl::Span ArrayScanOp::elements() const { + return GetArgs(kElement); } const VariableId& ArrayScanOp::position() const { @@ -3874,6 +4350,14 @@ const VariableId& ArrayScanOp::position() const { : *empty_str; } +int ArrayScanOp::num_arrays() const { + return static_cast(GetArgs(kArray).size()); +} + +const ValueExpr* ArrayScanOp::zip_mode_expr() const { + return GetArg(kMode) != nullptr ? GetArg(kMode)->value_expr() : nullptr; +} + // ------------------------------------------------------- // DistinctOp // ------------------------------------------------------- diff --git a/zetasql/reference_impl/relational_op_test.cc b/zetasql/reference_impl/relational_op_test.cc index 63d0af1c9..cbf8ae7bd 100644 --- a/zetasql/reference_impl/relational_op_test.cc +++ b/zetasql/reference_impl/relational_op_test.cc @@ -34,9 +34,11 @@ #include "zetasql/common/testing/testing_proto_util.h" #include "zetasql/common/thread_stack.h" #include "zetasql/public/evaluator_table_iterator.h" +#include "zetasql/public/functions/array_zip_mode.pb.h" #include "zetasql/public/language_options.h" #include "zetasql/public/simple_catalog.h" #include "zetasql/public/type.h" +#include "zetasql/public/types/type_factory.h" #include "zetasql/public/value.h" #include "zetasql/reference_impl/evaluation.h" #include "zetasql/reference_impl/function.h" @@ -913,8 +915,8 @@ class TestCppValuesOp : public RelationalOp { // output_vars: Variables to use in the output schema. Requirement: // output_vars.size() == tuple_vars.size() + cpp_vars.size() TestCppValuesOp(std::vector tuple_vars, - std::vector cpp_vars, - std::vector output_vars) + std::vector cpp_vars, + std::vector output_vars) : cpp_vars_(cpp_vars), output_vars_(output_vars) { // Initialize slot indices for tuple variables to -1, we'll substitute in // the actual values during SetSchemasForEvaluation(). 
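The PAD/TRUNCATE/STRICT handling that `ArrayScanOp::CreateIterator` implements above can be sketched as a standalone rule. This is a sketch only: `ZipMode` and this free-standing `OutputArrayLength` are hypothetical stand-ins for `functions::ArrayZipEnums::ArrayZipMode` and the operator's private helper, and a thrown exception stands in for `absl::OutOfRangeError`.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for functions::ArrayZipEnums::ArrayZipMode.
enum class ZipMode { kPad, kTruncate, kStrict };

// Sketch of the scan-length rule: PAD scans to the longest array (the shorter
// arrays are padded with NULL elements), TRUNCATE stops at the shortest, and
// STRICT requires all array lengths to be equal.
size_t OutputArrayLength(ZipMode mode, const std::vector<size_t>& lengths) {
  const auto [min_it, max_it] =
      std::minmax_element(lengths.begin(), lengths.end());
  if (mode == ZipMode::kStrict && *min_it != *max_it) {
    throw std::runtime_error(
        "Unnested arrays under STRICT mode must have equal lengths");
  }
  return mode == ZipMode::kTruncate ? *min_it : *max_it;
}
```

Zipping arrays of lengths 3 and 2 therefore yields 3 rows under PAD, 2 rows under TRUNCATE, and an error under STRICT, matching the tests added below.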
@@ -3248,10 +3250,17 @@ TEST_F(CreateIteratorTest, ArrayScanOp) { ZETASQL_ASSERT_OK_AND_ASSIGN(auto deref_param, DerefExpr::Create(param, proto_array_type_)); + std::vector> arrays; + arrays.push_back(std::move(deref_param)); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto mode_expr, ConstExpr::Create(Value::Enum( + types::ArrayZipModeEnumType(), + functions::ArrayZipEnums::PAD))); ZETASQL_ASSERT_OK_AND_ASSIGN( auto scan_op, - ArrayScanOp::Create(a, p, /*fields=*/{}, std::move(deref_param))); + ArrayScanOp::Create(/*elements=*/{a}, /*position=*/p, + /*arrays=*/std::move(arrays), + /*zip_mode_expr=*/std::move(mode_expr))); EXPECT_EQ(scan_op->IteratorDebugString(), "ArrayScanTupleIterator()"); EXPECT_EQ( "ArrayScanOp(\n" @@ -3318,8 +3327,16 @@ TEST_F(CreateIteratorTest, ArrayScanOpNonDeterministic) { ZETASQL_ASSERT_OK_AND_ASSIGN( auto array_expr, ConstExpr::Create(Array({Int64(1), Int64(2)}, kIgnoresOrder))); - ZETASQL_ASSERT_OK_AND_ASSIGN(auto scan_op, - ArrayScanOp::Create(a, p, {}, std::move(array_expr))); + std::vector> arrays; + arrays.push_back(std::move(array_expr)); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto mode_expr, ConstExpr::Create(Value::Enum( + types::ArrayZipModeEnumType(), + functions::ArrayZipEnums::PAD))); + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto scan_op, + ArrayScanOp::Create(/*elements=*/{a}, /*position=*/p, + /*arrays=*/std::move(arrays), + /*zip_mode_expr=*/std::move(mode_expr))); EvaluationContext context((EvaluationOptions())); ZETASQL_ASSERT_OK_AND_ASSIGN( @@ -3350,14 +3367,271 @@ TEST_F(CreateIteratorTest, ArrayScanOpNonDeterministic) { EXPECT_FALSE(context.IsDeterministicOutput()); } +TEST_F(CreateIteratorTest, ArrayScanOpMultiwayUnnest) { + VariableId arr1("arr1"), arr2("arr2"), p("p"); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto array_expr1, + ConstExpr::Create(Array({Int64(1), Int64(2), Int64(3)}, + kPreservesOrder))); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto array_expr2, ConstExpr::Create(Array( + {String("hello"), String("world")}, + kPreservesOrder))); + std::vector> 
arrays; + arrays.push_back(std::move(array_expr1)); + arrays.push_back(std::move(array_expr2)); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto mode_expr, ConstExpr::Create(Value::Enum( + types::ArrayZipModeEnumType(), + functions::ArrayZipEnums::PAD))); + + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto scan_op_without_position, + ArrayScanOp::Create(/*elements=*/{arr1, arr2}, /*position=*/VariableId(), + /*arrays=*/std::move(arrays), + /*zip_mode_expr=*/std::move(mode_expr))); + EXPECT_EQ(scan_op_without_position->IteratorDebugString(), + "ArrayScanTupleIterator()"); + EXPECT_EQ( + "ArrayScanOp(\n" + "+-$arr1 := element,\n" + "+-$arr2 := element,\n" + "+-mode: ConstExpr(PAD)\n" + "+-array: ConstExpr([1, 2, 3])\n" + "+-array: ConstExpr([\"hello\", \"world\"]))", + scan_op_without_position->DebugString()); + + EvaluationContext context((EvaluationOptions())); + ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr iter, + scan_op_without_position->CreateIterator( + EmptyParams(), /*num_extra_slots=*/0, &context)); + EXPECT_TRUE(iter->PreservesOrder()); + ZETASQL_ASSERT_OK_AND_ASSIGN(std::vector data, + ReadFromTupleIterator(iter.get())); + ASSERT_EQ(data.size(), 3); + EXPECT_EQ(Tuple(&iter->Schema(), &data[0]).DebugString(), + ""); + EXPECT_EQ(Tuple(&iter->Schema(), &data[1]).DebugString(), + ""); + EXPECT_EQ(Tuple(&iter->Schema(), &data[2]).DebugString(), + ""); + // Check for no extra slots. + EXPECT_EQ(data[0].num_slots(), 2); + EXPECT_EQ(data[1].num_slots(), 2); + EXPECT_EQ(data[2].num_slots(), 2); + // If all arrays are ordered in multiway UNNEST, the results are + // deterministic. + EXPECT_TRUE(context.IsDeterministicOutput()); + + // Check that scrambling works. + EvaluationContext scramble_context(GetScramblingEvaluationOptions()); + ZETASQL_ASSERT_OK_AND_ASSIGN( + iter, scan_op_without_position->CreateIterator( + EmptyParams(), /*num_extra_slots=*/0, &scramble_context)); + EXPECT_FALSE(iter->PreservesOrder()); + + // Unordered array with only one element. 
+ ZETASQL_ASSERT_OK_AND_ASSIGN( + auto array_expr3, + ConstExpr::Create(Array({String("hello")}, kIgnoresOrder))); + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto array_expr4, + ConstExpr::Create(Array({Int64(1), Int64(2)}, kPreservesOrder))); + std::vector> arrays2; + arrays2.push_back(std::move(array_expr3)); + arrays2.push_back(std::move(array_expr4)); + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto mode_expr2, + ConstExpr::Create(Value::Enum(types::ArrayZipModeEnumType(), + functions::ArrayZipEnums::TRUNCATE))); + + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto scan_op_with_position, + ArrayScanOp::Create(/*elements=*/{arr1, arr2}, /*position=*/VariableId(), + /*arrays=*/std::move(arrays2), + /*zip_mode_expr=*/std::move(mode_expr2))); + EXPECT_EQ(scan_op_with_position->IteratorDebugString(), + "ArrayScanTupleIterator()"); + EXPECT_EQ( + "ArrayScanOp(\n" + "+-$arr1 := element,\n" + "+-$arr2 := element,\n" + "+-mode: ConstExpr(TRUNCATE)\n" + "+-array: ConstExpr([\"hello\"])\n" + "+-array: ConstExpr([1, 2]))", + scan_op_with_position->DebugString()); + + EvaluationContext context2((EvaluationOptions())); + ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr iter2, + scan_op_with_position->CreateIterator( + EmptyParams(), /*num_extra_slots=*/0, &context2)); + // Unordered arrays with only one element do not cause the output rows to be + // non-deterministic.
+ EXPECT_TRUE(iter2->PreservesOrder()); + EXPECT_TRUE(context2.IsDeterministicOutput()); +} + +static std::unique_ptr DivByZeroErrorExpr() { + std::vector> div_args; + div_args.push_back(ConstExpr::Create(Int64(1)).value()); + div_args.push_back(ConstExpr::Create(Int64(0)).value()); + + return ScalarFunctionCallExpr::Create( + CreateFunction(FunctionKind::kDiv, Int64Type()), + std::move(div_args), DEFAULT_ERROR_MODE) + .value(); +} + +TEST_F(CreateIteratorTest, ArrayScanOpBadModeExpr) { + VariableId arr1("arr1"), arr2("arr2"), x("x"); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto array_expr1, + ConstExpr::Create(Array({Int64(1), Int64(2), Int64(3)}, + kPreservesOrder))); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto array_expr2, ConstExpr::Create(Array( + {String("hello"), String("world")}, + kPreservesOrder))); + std::vector> arrays; + arrays.push_back(std::move(array_expr1)); + arrays.push_back(std::move(array_expr2)); + + ZETASQL_ASSERT_OK_AND_ASSIGN(auto body, ConstExpr::Create(Value::Enum( + types::ArrayZipModeEnumType(), + functions::ArrayZipEnums::TRUNCATE))); + std::vector> let_assign; + let_assign.push_back(std::make_unique(x, DivByZeroErrorExpr())); + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto mode_expr, WithExpr::Create(std::move(let_assign), std::move(body))); + + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto scan_op_without_position, + ArrayScanOp::Create(/*elements=*/{arr1, arr2}, /*position=*/VariableId(), + /*arrays=*/std::move(arrays), + /*zip_mode_expr=*/std::move(mode_expr))); + + EvaluationContext context((EvaluationOptions())); + EXPECT_THAT( + scan_op_without_position->CreateIterator(EmptyParams(), + /*num_extra_slots=*/0, &context), + StatusIs(absl::StatusCode::kOutOfRange, HasSubstr("division by zero"))); +} + +TEST_F(CreateIteratorTest, ArrayScanOpNonDeterministicMultiwayUnnest) { + VariableId arr1("arr1"), arr2("arr2"); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto array_expr1, + ConstExpr::Create(Array({Int64(1), Int64(2), Int64(3)}, + kPreservesOrder))); + 
ZETASQL_ASSERT_OK_AND_ASSIGN(auto array_expr2, + ConstExpr::Create(Array( + {String("hello"), String("world")}, kIgnoresOrder))); + std::vector> arrays; + arrays.push_back(std::move(array_expr1)); + arrays.push_back(std::move(array_expr2)); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto mode_expr, ConstExpr::Create(Value::Enum( + types::ArrayZipModeEnumType(), + functions::ArrayZipEnums::PAD))); + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto scan_op, + ArrayScanOp::Create(/*elements=*/{arr1, arr2}, /*position=*/VariableId(), + /*arrays=*/std::move(arrays), + /*zip_mode_expr=*/std::move(mode_expr))); + + EvaluationContext context((EvaluationOptions())); + ZETASQL_ASSERT_OK_AND_ASSIGN( + std::unique_ptr iter, + scan_op->CreateIterator(EmptyParams(), /*num_extra_slots=*/0, &context)); + EXPECT_TRUE(iter->PreservesOrder()); + ZETASQL_ASSERT_OK_AND_ASSIGN(std::vector data, + ReadFromTupleIterator(iter.get())); + ASSERT_EQ(data.size(), 3); + EXPECT_EQ(Tuple(&iter->Schema(), &data[0]).DebugString(), + ""); + EXPECT_EQ(Tuple(&iter->Schema(), &data[1]).DebugString(), + ""); + EXPECT_EQ(Tuple(&iter->Schema(), &data[2]).DebugString(), + ""); + // Check for no extra slots. + EXPECT_EQ(data[0].num_slots(), 2); + EXPECT_EQ(data[1].num_slots(), 2); + EXPECT_EQ(data[2].num_slots(), 2); + // Unordered arrays used in multiway UNNEST are non-deterministic if there is + // more than one element. + EXPECT_FALSE(context.IsDeterministicOutput()); + + // Check that scrambling works. + EvaluationContext scramble_context(GetScramblingEvaluationOptions()); + ZETASQL_ASSERT_OK_AND_ASSIGN( + iter, scan_op->CreateIterator(EmptyParams(), /*num_extra_slots=*/0, + &scramble_context)); + EXPECT_FALSE(iter->PreservesOrder()); + ZETASQL_ASSERT_OK_AND_ASSIGN(data, ReadFromTupleIterator(iter.get())); + ASSERT_EQ(data.size(), 3); + // Don't look at 'data' in more detail because it is scrambled. 
+ EXPECT_FALSE(context.IsDeterministicOutput()); +} + +TEST_F(CreateIteratorTest, + ArrayScanOpNonDeterministicMultiwayUnnestWithPosition) { + VariableId arr1("arr1"), arr2("arr2"), p("p"); + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto array_expr1, + ConstExpr::Create(Array({Int64(1), Int64(2), Int64(3)}, kIgnoresOrder))); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto array_expr2, ConstExpr::Create(Array( + {String("hello"), String("world")}, + kPreservesOrder))); + std::vector> arrays; + arrays.push_back(std::move(array_expr1)); + arrays.push_back(std::move(array_expr2)); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto mode_expr, ConstExpr::Create(Value::Enum( + types::ArrayZipModeEnumType(), + functions::ArrayZipEnums::PAD))); + + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto scan_op, + ArrayScanOp::Create(/*elements=*/{arr1, arr2}, /*position=*/p, + /*arrays=*/std::move(arrays), + /*zip_mode_expr=*/std::move(mode_expr))); + + EvaluationContext context((EvaluationOptions())); + ZETASQL_ASSERT_OK_AND_ASSIGN( + std::unique_ptr iter, + scan_op->CreateIterator(EmptyParams(), /*num_extra_slots=*/0, &context)); + EXPECT_TRUE(iter->PreservesOrder()); + ZETASQL_ASSERT_OK_AND_ASSIGN(std::vector data, + ReadFromTupleIterator(iter.get())); + ASSERT_EQ(data.size(), 3); + EXPECT_EQ(Tuple(&iter->Schema(), &data[0]).DebugString(), + ""); + EXPECT_EQ(Tuple(&iter->Schema(), &data[1]).DebugString(), + ""); + EXPECT_EQ(Tuple(&iter->Schema(), &data[2]).DebugString(), + ""); + // Check for no extra slots. + EXPECT_EQ(data[0].num_slots(), 3); + EXPECT_EQ(data[1].num_slots(), 3); + EXPECT_EQ(data[2].num_slots(), 3); + // Unordered arrays with position variables are non-deterministic if there is + // more than one element. + EXPECT_FALSE(context.IsDeterministicOutput()); + + // Check that scrambling works. 
+ EvaluationContext scramble_context(GetScramblingEvaluationOptions()); + ZETASQL_ASSERT_OK_AND_ASSIGN( + iter, scan_op->CreateIterator(EmptyParams(), /*num_extra_slots=*/0, + &scramble_context)); + EXPECT_FALSE(iter->PreservesOrder()); + ZETASQL_ASSERT_OK_AND_ASSIGN(data, ReadFromTupleIterator(iter.get())); + ASSERT_EQ(data.size(), 3); + // Don't look at 'data' in more detail because it is scrambled. + EXPECT_FALSE(context.IsDeterministicOutput()); +} + TEST_F(CreateIteratorTest, ScanArrayOfStructs) { VariableId x("x"), v1("v1"), v2("v2"); ZETASQL_ASSERT_OK_AND_ASSIGN( auto array_expr, ConstExpr::Create(Array({Struct({"foo", "bar"}, {Int64(1), Int64(2)})}))); - ZETASQL_ASSERT_OK_AND_ASSIGN(auto scan_op, - ArrayScanOp::Create(x, VariableId(), {{v1, 0}, {v2, 1}}, - std::move(array_expr))); + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto scan_op, + ArrayScanOp::Create(/*element=*/x, /*position=*/VariableId(), + /*fields=*/{{v1, 0}, {v2, 1}}, + /*array=*/std::move(array_expr))); EXPECT_EQ( "ArrayScanOp(\n" "+-$x := element,\n" @@ -3384,16 +3658,16 @@ TEST_F(CreateIteratorTest, ScanArrayOfStructs) { TEST_F(CreateIteratorTest, TableScanAsArray) { VariableId v1("v1"), v2("v2"); - Value table = Array({ - Struct({"a", "b"}, {Int64(1), Int64(2)}), - Struct({"a", "b"}, {Int64(3), Int64(4)})}); + Value table = Array({Struct({"a", "b"}, {Int64(1), Int64(2)}), + Struct({"a", "b"}, {Int64(3), Int64(4)})}); ZETASQL_ASSERT_OK_AND_ASSIGN( auto table_as_array_expr, TableAsArrayExpr::Create("mytable", table.type()->AsArray())); ZETASQL_ASSERT_OK_AND_ASSIGN( auto scan_op, - ArrayScanOp::Create(VariableId(), VariableId(), {{v1, 0}, {v2, 1}}, - std::move(table_as_array_expr))); + ArrayScanOp::Create(/*element=*/VariableId(), /*position=*/VariableId(), + /*fields=*/{{v1, 0}, {v2, 1}}, + /*array=*/std::move(table_as_array_expr))); EXPECT_EQ( "ArrayScanOp(\n" "+-$v1 := field[0]:a,\n" diff --git a/zetasql/reference_impl/rewrite_flags.cc b/zetasql/reference_impl/rewrite_flags.cc index 
672eb4f22..befdeb1f0 100644 --- a/zetasql/reference_impl/rewrite_flags.cc +++ b/zetasql/reference_impl/rewrite_flags.cc @@ -45,6 +45,8 @@ absl::btree_set MinimalRewritesForReference() { // clang-format off // (broken link) start REWRITE_INLINE_SQL_TVFS, + // TODO: Remove this after resolving memory leak in direct UDA eval. + REWRITE_INLINE_SQL_UDAS, REWRITE_INLINE_SQL_VIEWS, // (broken link) end // clang-format on diff --git a/zetasql/reference_impl/tuple.cc b/zetasql/reference_impl/tuple.cc index 599e7c308..858ef5ff5 100644 --- a/zetasql/reference_impl/tuple.cc +++ b/zetasql/reference_impl/tuple.cc @@ -31,6 +31,7 @@ #include "absl/status/statusor.h" #include "absl/strings/str_cat.h" #include "absl/strings/str_join.h" +#include "absl/types/span.h" #include "zetasql/base/map_util.h" namespace zetasql { @@ -117,7 +118,7 @@ std::string Tuple::DebugString(bool verbose) const { ">"); } -Tuple ConcatTuples(const std::vector& tuples, +Tuple ConcatTuples(absl::Span tuples, std::unique_ptr* new_schema, std::unique_ptr* new_data) { std::vector vars; diff --git a/zetasql/reference_impl/tuple.h b/zetasql/reference_impl/tuple.h index a3239ffcd..77642cdcc 100644 --- a/zetasql/reference_impl/tuple.h +++ b/zetasql/reference_impl/tuple.h @@ -40,6 +40,7 @@ #include "zetasql/reference_impl/variable_id.h" #include "absl/container/flat_hash_map.h" #include "absl/container/flat_hash_set.h" +#include "zetasql/base/check.h" #include "absl/status/statusor.h" #include "absl/strings/str_cat.h" #include "absl/strings/str_join.h" @@ -489,6 +490,13 @@ class TupleData { void AddSlots(int num_slots) { slots_.resize(slots_.size() + num_slots); } + // Removes the slot at 'slot_index' from this TupleData. The complexity is + // O(n); prefer more efficient alternatives when possible.
+ void RemoveSlotAt(int slot_index) { + ABSL_DCHECK_LT(slot_index, num_slots()); + slots_.erase(slots_.begin() + slot_index); + } + void RemoveSlots(int num_slots) { slots_.resize(slots_.size() - num_slots); } int num_slots() const { return slots_.size(); } @@ -598,7 +606,7 @@ std::vector ConcatSpans(absl::Span span1, // concatenation of the corresponding TupleData. If a TupleData has more slots // than the associated TupleSchema, the extra values are dropped. Returns the // corresponding Tuple. -Tuple ConcatTuples(const std::vector& tuples, +Tuple ConcatTuples(absl::Span tuples, std::unique_ptr* new_schema, std::unique_ptr* new_data); diff --git a/zetasql/reference_impl/tuple_comparator.cc b/zetasql/reference_impl/tuple_comparator.cc index 4979fbea1..b498ccddb 100644 --- a/zetasql/reference_impl/tuple_comparator.cc +++ b/zetasql/reference_impl/tuple_comparator.cc @@ -32,6 +32,7 @@ #include "zetasql/reference_impl/common.h" #include "zetasql/reference_impl/operator.h" #include "zetasql/reference_impl/tuple.h" +#include "absl/memory/memory.h" #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/string_view.h" @@ -101,11 +102,20 @@ static absl::Status GetZetaSqlCollators( absl::StatusOr> TupleComparator::Create( absl::Span keys, absl::Span slots_for_keys, absl::Span params, EvaluationContext* context) { + return Create(keys, slots_for_keys, /*extra_sort_key_slots=*/{}, params, + context); +} + +absl::StatusOr> TupleComparator::Create( + absl::Span keys, absl::Span slots_for_keys, + absl::Span extra_sort_key_slots, + absl::Span params, EvaluationContext* context) { std::shared_ptr collators = std::make_shared(CollatorList()); ZETASQL_RETURN_IF_ERROR( GetZetaSqlCollators(keys, params, context, collators.get())); - return absl::WrapUnique(new TupleComparator(keys, slots_for_keys, collators)); + return absl::WrapUnique(new TupleComparator(keys, slots_for_keys, + extra_sort_key_slots, collators)); } bool 
TupleComparator::operator()(const TupleData& t1, @@ -159,6 +169,25 @@ bool TupleComparator::operator()(const TupleData& t1, } } } + + // Sort by extra sort keys. + for (int i = 0; i < extra_sort_key_slots_.size(); ++i) { + const int slot_idx = extra_sort_key_slots_[i]; + const Value& v1 = t1.slot(slot_idx).value(); + const Value& v2 = t2.slot(slot_idx).value(); + + if (v1.is_null() || v2.is_null()) { + if (v1.is_null() && v2.is_null()) { // NULLs are considered equal + continue; + } + // NULLS FIRST is the default behavior + return !v2.is_null(); + } + // ASC by default. + if (!v1.Equals(v2)) { + return v1.LessThan(v2); + } + } // The keys are equal. return false; } @@ -242,7 +271,8 @@ bool TupleComparator::InvolvesUncertainArrayComparisons( TupleComparator prefix_comparator( absl::MakeSpan(keys_).subspan(0, safe_slot_count), - absl::MakeSpan(slots_for_keys_).subspan(0, safe_slot_count), collators_); + absl::MakeSpan(slots_for_keys_).subspan(0, safe_slot_count), + /*extra_sort_key_slots=*/{}, collators_); for (int i = 1; i < tuples.size(); ++i) { const TupleData* a = tuples[i - 1]; const TupleData* b = tuples[i]; diff --git a/zetasql/reference_impl/tuple_comparator.h b/zetasql/reference_impl/tuple_comparator.h index 8112d8f33..69558c9ff 100644 --- a/zetasql/reference_impl/tuple_comparator.h +++ b/zetasql/reference_impl/tuple_comparator.h @@ -44,6 +44,13 @@ class TupleComparator { absl::Span slots_for_keys, absl::Span params, EvaluationContext* context); + // Same as above, but with an additional extra_sort_key_slots parameter. + static absl::StatusOr> Create( + absl::Span keys, + absl::Span slots_for_keys, + absl::Span extra_sort_key_slots, + absl::Span params, EvaluationContext* context); + // Returns true if t1 is less than t2.
bool operator()(const TupleData& t1, const TupleData& t2) const; @@ -78,13 +85,22 @@ private: TupleComparator(absl::Span keys, absl::Span slots_for_keys, + absl::Span extra_sort_key_slots, std::shared_ptr collators) : keys_(keys.begin(), keys.end()), slots_for_keys_(slots_for_keys.begin(), slots_for_keys.end()), + extra_sort_key_slots_(extra_sort_key_slots.begin(), + extra_sort_key_slots.end()), collators_(collators) {} const std::vector keys_; const std::vector slots_for_keys_; + + // Extra keys used during sorting. An extra sort key typically has no key + // expression, only a slot number. The sort specification for extra sort keys + // is ASC with NULLS FIRST by default. Tuples are sorted by the regular keys + // first, then by the extra keys. + const std::vector extra_sort_key_slots_; // indicates the COLLATE specific rules to compare strings for // each sort key in . This corresponds 1-1 with keys_. // NOTE: If any element of is nullptr, then the strings are diff --git a/zetasql/reference_impl/tuple_test_util.h b/zetasql/reference_impl/tuple_test_util.h index f9a091ffb..de605de78 100644 --- a/zetasql/reference_impl/tuple_test_util.h +++ b/zetasql/reference_impl/tuple_test_util.h @@ -30,6 +30,7 @@ #include "gtest/gtest.h" #include "absl/status/status.h" #include "absl/status/statusor.h" +#include "absl/types/span.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status.h" #include "zetasql/base/status_macros.h" @@ -121,7 +122,7 @@ inline absl::StatusOr> ReadFromTupleIterator( // Returns a TupleData corresponding to 'values' where all slots have trivial // SharedProtoStates, which are also added to 'shared_states' if it is non-NULL.
inline TupleData CreateTestTupleData( - const std::vector& values, + absl::Span values, std::vector* shared_states = nullptr) { TupleData data(values.size()); if (shared_states != nullptr) { @@ -140,7 +141,7 @@ inline TupleData CreateTestTupleData( // Returns a std::vector corresponding to 'values' where all slots // have empty maps, which are also added to 'shared_states' if it is non-NULL inline std::vector CreateTestTupleDatas( - const std::vector>& values, + absl::Span> values, std::vector>* shared_states = nullptr) { std::vector datas; diff --git a/zetasql/reference_impl/type_helpers.cc b/zetasql/reference_impl/type_helpers.cc index a2f6122ed..9344feba4 100644 --- a/zetasql/reference_impl/type_helpers.cc +++ b/zetasql/reference_impl/type_helpers.cc @@ -24,6 +24,7 @@ #include "zetasql/public/type.h" #include "absl/status/statusor.h" #include "absl/strings/str_cat.h" +#include "absl/types/span.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status_macros.h" @@ -103,7 +104,7 @@ absl::StatusOr CreateTableArrayType( absl::StatusOr CreatePrimaryKeyType( const ResolvedColumnList& table_columns, - const std::vector& key_column_indexes, TypeFactory* type_factory) { + absl::Span key_column_indexes, TypeFactory* type_factory) { std::vector fields; fields.reserve(key_column_indexes.size()); for (int index : key_column_indexes) { diff --git a/zetasql/reference_impl/type_helpers.h b/zetasql/reference_impl/type_helpers.h index 7b06131cc..be78f0b3a 100644 --- a/zetasql/reference_impl/type_helpers.h +++ b/zetasql/reference_impl/type_helpers.h @@ -21,6 +21,7 @@ #include "zetasql/public/type.h" #include "zetasql/resolved_ast/resolved_column.h" #include "absl/status/statusor.h" +#include "absl/types/span.h" #ifndef ZETASQL_REFERENCE_IMPL_TYPE_HELPERS_H_ #define ZETASQL_REFERENCE_IMPL_TYPE_HELPERS_H_ @@ -51,7 +52,7 @@ extern const char* kDMLOutputReturningColumnName; // table_columns. 
absl::StatusOr CreatePrimaryKeyType( const ResolvedColumnList& table_columns, - const std::vector& key_column_indexes, TypeFactory* type_factory); + absl::Span key_column_indexes, TypeFactory* type_factory); // Creates the DML output struct type corresponding to a DML statement on a // table whose corresponding array type is 'table_array_type'. diff --git a/zetasql/reference_impl/uda_evaluation.cc b/zetasql/reference_impl/uda_evaluation.cc index 8f462a1c1..113c01b43 100644 --- a/zetasql/reference_impl/uda_evaluation.cc +++ b/zetasql/reference_impl/uda_evaluation.cc @@ -53,6 +53,10 @@ class UserDefinedAggregateFunctionEvaluator argument_is_aggregate_(std::move(argument_is_aggregate)) {} ~UserDefinedAggregateFunctionEvaluator() override = default; + void SetEvaluationContext(EvaluationContext* context) override { + eval_context_ = context; + } + absl::Status Reset() override { memory_accountant_ = std::make_unique( EvaluationOptions().max_intermediate_byte_size); @@ -77,8 +81,14 @@ class UserDefinedAggregateFunctionEvaluator } absl::StatusOr GetFinalResult() override { - auto context = std::make_unique(EvaluationOptions()); - context->set_active_group_rows(inputs_.get()); + ZETASQL_RET_CHECK(eval_context_ != nullptr) + << "UserDefinedAggregateFunctionEvaluator must have EvaluationContext " + << "set before calling GetFinalResult()."; + // Create a local context to evaluate the UDA function body on the + // accumulated rows. + std::unique_ptr local_context = + eval_context_->MakeChildContext(); + local_context->set_active_group_rows(inputs_.get()); if (!inputs_->IsEmpty()) { auto first_row = inputs_->GetTuplePtrs()[0]; @@ -86,9 +96,9 @@ class UserDefinedAggregateFunctionEvaluator // NOT_AGGREGATE arguments should be mapped by value since they are // represented by a FunctionArgumentRefExpr. These arguments have a // constant value for each grouped rows, so we can just add them once. 
- if (!context->HasFunctionArgumentRef(argument_names_[i]) && + if (!local_context->HasFunctionArgumentRef(argument_names_[i]) && !argument_is_aggregate_[i]) { - ZETASQL_RETURN_IF_ERROR(context->AddFunctionArgumentRef( + ZETASQL_RETURN_IF_ERROR(local_context->AddFunctionArgumentRef( argument_names_[i], first_row->slot(i).value())); } } @@ -99,7 +109,7 @@ class UserDefinedAggregateFunctionEvaluator ZETASQL_ASSIGN_OR_RETURN( std::unique_ptr iter, algebrized_tree_->Eval(/*params=*/{}, - /*num_extra_slots=*/0, context.get())); + /*num_extra_slots=*/0, local_context.get())); Value result; while (true) { const TupleData* next_input = iter->Next(); @@ -121,6 +131,7 @@ class UserDefinedAggregateFunctionEvaluator std::vector argument_is_aggregate_; std::unique_ptr memory_accountant_; std::unique_ptr inputs_; + EvaluationContext* eval_context_; }; std::unique_ptr diff --git a/zetasql/reference_impl/value_expr.cc b/zetasql/reference_impl/value_expr.cc index 167f1e331..47867790a 100644 --- a/zetasql/reference_impl/value_expr.cc +++ b/zetasql/reference_impl/value_expr.cc @@ -18,21 +18,17 @@ #include #include +#include #include -#include -#include #include #include -#include #include -#include -#include -#include #include #include #include "zetasql/base/logging.h" #include "zetasql/common/internal_value.h" +#include "zetasql/common/thread_stack.h" #include "zetasql/public/catalog.h" #include "zetasql/public/language_options.h" #include "zetasql/public/options.pb.h" @@ -46,6 +42,7 @@ #include "zetasql/reference_impl/evaluation.h" #include "zetasql/reference_impl/operator.h" #include "zetasql/reference_impl/tuple.h" +#include "zetasql/reference_impl/variable_generator.h" #include "zetasql/reference_impl/variable_id.h" #include "zetasql/resolved_ast/resolved_ast.h" #include "zetasql/resolved_ast/resolved_ast_enums.pb.h" @@ -57,6 +54,7 @@ #include "absl/container/flat_hash_map.h" #include "absl/container/flat_hash_set.h" #include "absl/container/node_hash_map.h" +#include 
"zetasql/base/check.h" #include "absl/memory/memory.h" #include "absl/status/status.h" #include "absl/status/statusor.h" @@ -66,12 +64,12 @@ #include "absl/strings/str_split.h" #include "absl/strings/string_view.h" #include "absl/strings/strip.h" +#include "absl/time/time.h" #include "absl/types/optional.h" #include "absl/types/span.h" +#include "google/protobuf/dynamic_message.h" #include "zetasql/base/map_util.h" -#include "zetasql/base/source_location.h" #include "zetasql/base/ret_check.h" -#include "zetasql/base/status.h" #include "zetasql/base/status_builder.h" #include "zetasql/base/status_macros.h" @@ -1075,8 +1073,10 @@ bool ScalarFunctionCallExpr::Eval(absl::Span params, std::shared_ptr arg_shared_state; VirtualTupleSlot arg_result(&call_args.emplace_back(), &arg_shared_state); if (!args[i]->value_expr()->Eval(params, context, &arg_result, status)) { + ABSL_DCHECK(!status->ok()); return false; } + ZETASQL_DCHECK_OK(*status); } } @@ -1087,8 +1087,10 @@ bool ScalarFunctionCallExpr::Eval(absl::Span params, result->SetValue(Value::Null(output_type())); return true; } + ABSL_DCHECK(!status->ok()); return false; } + ZETASQL_DCHECK_OK(*status); result->MaybeResetSharedProtoState(); return true; } @@ -1289,6 +1291,8 @@ absl::StatusOr> IfExpr::Create( absl::Status IfExpr::SetSchemasForEvaluation( absl::Span params_schemas) { + ZETASQL_RETURN_IF_NOT_ENOUGH_STACK( + "Out of stack space due to deeply nested if expression"); ZETASQL_RETURN_IF_ERROR(mutable_join_expr()->SetSchemasForEvaluation(params_schemas)); ZETASQL_RETURN_IF_ERROR( mutable_true_value()->SetSchemasForEvaluation(params_schemas)); @@ -2675,7 +2679,7 @@ absl::StatusOr DMLUpdateValueExpr::UpdateNode::GetNewProtoValue( // set to unordered values (via new_field_value). Verification assumes // that the repeated field value is ordered leading to false negatives. 
If // new_field_value contains an unordered array value for a repeated field, - // result from ZetaSQL reference driver is marked non-determinstic and + // result from ZetaSQL reference driver is marked non-deterministic and // is ignored. // TODO : Fix the ordering issue in Proto repeated field, // after which below safeguard can be removed. @@ -3549,6 +3553,13 @@ absl::StatusOr DMLInsertValueExpr::Eval( ZETASQL_RETURN_IF_ERROR(resolved_node_->CheckFieldsAccessed()); + if (!context->options().return_all_rows_for_dml && + stmt()->insert_mode() == ResolvedInsertStmt::OR_IGNORE) { + // INSERT IGNORE inserts only new rows, others are ignored. + // Number of rows modified must match the number of rows actually inserted. + ZETASQL_RET_CHECK_EQ(num_rows_modified, rows_to_insert.size()); + } + return context->options().return_all_rows_for_dml ? GetDMLOutputValue(num_rows_modified, row_map, dml_returning_rows, context) @@ -3785,7 +3796,7 @@ absl::Status DMLInsertValueExpr::PopulateRowsInOriginalTable( absl::StatusOr DMLInsertValueExpr::InsertRows( const InsertColumnMap& insert_column_map, - const std::vector>& rows_to_insert, + std::vector>& rows_to_insert, std::vector>& dml_returning_rows, EvaluationContext* context, PrimaryKeyRowMap* row_map) const { absl::flat_hash_map modified_primary_keys; @@ -3906,10 +3917,16 @@ absl::StatusOr DMLInsertValueExpr::InsertRows( } } - if (!rows_ignored_indexes.empty() && stmt()->returning() != nullptr) { - // Needs to skip these rows in the dml returning output + if (!rows_ignored_indexes.empty()) { for (int64_t i = rows_ignored_indexes.size() - 1; i >= 0; --i) { - dml_returning_rows.erase(dml_returning_rows.begin() + i); + // Needs to skip these rows in the dml returning output. + if (stmt()->returning() != nullptr) { + dml_returning_rows.erase(dml_returning_rows.begin() + i); + } + // Erase the rows that were not inserted because they already exist. 
+ if (stmt()->insert_mode() == ResolvedInsertStmt::OR_IGNORE) { + rows_to_insert.erase(rows_to_insert.begin() + rows_ignored_indexes[i]); + } } } diff --git a/zetasql/reference_impl/value_expr_test.cc b/zetasql/reference_impl/value_expr_test.cc index 365ca7b24..49b9bf896 100644 --- a/zetasql/reference_impl/value_expr_test.cc +++ b/zetasql/reference_impl/value_expr_test.cc @@ -55,6 +55,7 @@ #include "zetasql/reference_impl/variable_generator.h" #include "zetasql/reference_impl/variable_id.h" #include "zetasql/resolved_ast/resolved_ast.h" +#include "zetasql/resolved_ast/resolved_ast_builder.h" #include "zetasql/resolved_ast/resolved_column.h" #include "zetasql/resolved_ast/resolved_node_kind.h" #include "zetasql/resolved_ast/resolved_node_kind.pb.h" @@ -65,6 +66,7 @@ #include "gmock/gmock.h" #include "gtest/gtest.h" #include "absl/container/flat_hash_map.h" +#include "absl/flags/flag.h" #include "absl/memory/memory.h" #include "absl/status/status.h" #include "absl/status/statusor.h" @@ -151,7 +153,7 @@ struct NaryFunctionTemplate { NaryFunctionTemplate(FunctionKind kind, const std::vector& arguments, const ValueConstructor& result, - const std::string& error_message) + absl::string_view error_message) : kind(kind), params(arguments, result, error_message) {} NaryFunctionTemplate(FunctionKind kind, const std::vector& arguments, @@ -189,7 +191,7 @@ std::ostream& operator<<(std::ostream& out, const NaryFunctionTemplate& t) { } std::vector GetFunctionTemplates( - FunctionKind kind, const std::vector& tests) { + FunctionKind kind, absl::Span tests) { std::vector templates; for (const auto& t : tests) { templates.emplace_back(kind, t); @@ -202,7 +204,7 @@ std::vector GetFunctionTemplates( // because the cast implementation would cast any null input to any output type // (it relies on the function signatures to prevent these casts in SQL queries). 
std::vector NonNullArguments( - const std::vector& tests) { + absl::Span tests) { std::vector result; for (const auto& t : tests) { bool nulls_only = true; @@ -914,10 +916,63 @@ class DMLValueExprEvalTest : public EvalTest { const Table* table() { return &table_; } - const Function* function(const std::string& name) { + const Function* function(absl::string_view name) { return functions_[name].get(); } + absl::StatusOr> + InsertStmtToInsertExprConverter(const ResolvedInsertStmt* stmt, + const ArrayType* table_array_type) { + ZETASQL_ASSIGN_OR_RETURN( + const StructType* primary_key_type, + CreatePrimaryKeyType(stmt->table_scan()->column_list(), + stmt->table_scan()->table()->PrimaryKey().value(), + type_factory())); + ZETASQL_ASSIGN_OR_RETURN(const StructType* dml_output_type, + CreateDMLOutputType(table_array_type, type_factory())); + // Create a ColumnToVariableMapping. + auto column_to_variable_mapping = std::make_unique( + std::make_unique()); + + // Build a ResolvedScanMap and a ResolvedExprMap from the AST. 
+ auto resolved_scan_map = std::make_unique(); + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr table_as_array_expr, + TableAsArrayExpr::Create(stmt->table_scan()->table()->Name(), + table_array_type)); + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr relation_op, + ArrayScanOp::Create( + /*element=*/VariableId(), + /*position=*/VariableId(), + {{column_to_variable_mapping->GetVariableNameFromColumn( + stmt->table_scan()->column_list()[0]), + 0}, + {column_to_variable_mapping->GetVariableNameFromColumn( + stmt->table_scan()->column_list()[1]), + 1}}, + std::move(table_as_array_expr))); + (*resolved_scan_map)[stmt->table_scan()] = std::move(relation_op); + auto resolved_expr_map = std::make_unique(); + for (const auto& value : stmt->row_list(0)->value_list()) { + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr const_expr, + ConstExpr::Create(value->value()->GetAs()->value())); + (*resolved_expr_map)[value->value()] = std::move(const_expr); + } + + auto column_expr_map = std::make_unique(); + + // Create the DMLInsertValueExpr to be tested. 
+ return DMLInsertValueExpr::Create( + stmt->table_scan()->table(), table_array_type, + /*returning_array_type=*/nullptr, primary_key_type, dml_output_type, + stmt, &stmt->table_scan()->column_list(), + /*returning_column_values=*/nullptr, + std::move(column_to_variable_mapping), std::move(resolved_scan_map), + std::move(resolved_expr_map), std::move(column_expr_map)); + } + private: SimpleTable table_{"test_table", {{"int_val", Int64Type()}, {"str_val", StringType()}}}; @@ -959,57 +1014,10 @@ TEST_F(DMLValueExprEvalTest, DMLInsertValueExpr) { CreateTableArrayType(stmt->table_scan()->column_list(), stmt->table_scan()->table()->IsValueTable(), type_factory())); - ZETASQL_ASSERT_OK_AND_ASSIGN( - const StructType* primary_key_type, - CreatePrimaryKeyType(stmt->table_scan()->column_list(), - stmt->table_scan()->table()->PrimaryKey().value(), - type_factory())); - ZETASQL_ASSERT_OK_AND_ASSIGN(const StructType* dml_output_type, - CreateDMLOutputType(table_array_type, type_factory())); - - // Create a ColumnToVariableMapping. - auto column_to_variable_mapping = std::make_unique( - std::make_unique()); - - // Build a ResolvedScanMap and a ResolvedExprMap from the AST. 
- auto resolved_scan_map = std::make_unique(); - ZETASQL_ASSERT_OK_AND_ASSIGN( - std::unique_ptr table_as_array_expr, - TableAsArrayExpr::Create(stmt->table_scan()->table()->Name(), - table_array_type)); - ZETASQL_ASSERT_OK_AND_ASSIGN( - std::unique_ptr relation_op, - ArrayScanOp::Create( - /*element=*/VariableId(), - /*position=*/VariableId(), - {std::make_pair(column_to_variable_mapping->GetVariableNameFromColumn( - stmt->table_scan()->column_list()[0]), - 0), - std::make_pair(column_to_variable_mapping->GetVariableNameFromColumn( - stmt->table_scan()->column_list()[1]), - 1)}, - std::move(table_as_array_expr))); - (*resolved_scan_map)[stmt->table_scan()] = std::move(relation_op); - auto resolved_expr_map = std::make_unique(); - for (const auto& value : stmt->row_list(0)->value_list()) { - ZETASQL_ASSERT_OK_AND_ASSIGN( - std::unique_ptr const_expr, - ConstExpr::Create(value->value()->GetAs()->value())); - (*resolved_expr_map)[value->value()] = std::move(const_expr); - } - - auto column_expr_map = std::make_unique(); - // Create the DMLInsertValueExpr to be tested. ZETASQL_ASSERT_OK_AND_ASSIGN( std::unique_ptr expr, - DMLInsertValueExpr::Create( - stmt->table_scan()->table(), table_array_type, - /*returning_array_type=*/nullptr, primary_key_type, dml_output_type, - stmt.get(), &stmt->table_scan()->column_list(), - /*returning_column_values=*/nullptr, - std::move(column_to_variable_mapping), std::move(resolved_scan_map), - std::move(resolved_expr_map), std::move(column_expr_map))); + InsertStmtToInsertExprConverter(stmt.get(), table_array_type)); // Evaluate and check. 
TupleSlot result; @@ -1038,6 +1046,69 @@ TEST_F(DMLValueExprEvalTest, DMLInsertValueExpr) { &Value::fields, ElementsAre(Int64(3), String("three"))))); } +TEST_F(DMLValueExprEvalTest, DMLInsertOrIgnoreValueExpr) { + const ResolvedColumnList& column_list = { + ResolvedColumn{1, IdString::MakeGlobal("test_table"), + IdString::MakeGlobal("int_val"), Int64Type()}, + ResolvedColumn{2, IdString::MakeGlobal("test_table"), + zetasql::IdString::MakeGlobal("str_val"), StringType()}}; + auto table_scan = MakeResolvedTableScan(column_list, table(), + /*for_system_time_expr=*/nullptr); + + // Build a resolved AST for inserting a row with PK (1), that already exists. + std::vector> row_values; + row_values.push_back( + MakeResolvedDMLValue(MakeResolvedLiteral(Int64Type(), Int64(1)))); + row_values.push_back(MakeResolvedDMLValue( + MakeResolvedLiteral(StringType(), String("one again")))); + std::vector> row_list; + row_list.push_back(MakeResolvedInsertRow(std::move(row_values))); + + ZETASQL_ASSERT_OK_AND_ASSIGN(auto stmt, + ResolvedInsertStmtBuilder() + .set_table_scan(std::move(table_scan)) + .set_insert_mode(ResolvedInsertStmt::OR_IGNORE) + .set_row_list(std::move(row_list)) + .set_insert_column_list(column_list) + .Build()); + + // Create output types. + ZETASQL_ASSERT_OK_AND_ASSIGN( + const ArrayType* table_array_type, + CreateTableArrayType(stmt->table_scan()->column_list(), + stmt->table_scan()->table()->IsValueTable(), + type_factory())); + + ZETASQL_ASSERT_OK_AND_ASSIGN( + std::unique_ptr expr, + InsertStmtToInsertExprConverter(stmt.get(), table_array_type)); + + // Evaluate and check. 
+ TupleSlot result; + absl::Status status; + // the case for return_all_rows_for_dml = true is covered by the reference + // implementation compliance tests + EvaluationOptions options{}; + options.return_all_rows_for_dml = false; + EvaluationContext context{options}; + ZETASQL_ASSERT_OK(context.AddTableAsArray( + "test_table", /*is_value_table=*/false, + Value::Array(table_array_type, + {Value::Struct(table_array_type->element_type()->AsStruct(), + {Int64(1), String("one")}), + Value::Struct(table_array_type->element_type()->AsStruct(), + {Int64(2), NullString()}), + Value::Struct(table_array_type->element_type()->AsStruct(), + {Int64(4), NullString()})}), + LanguageOptions{})); + ZETASQL_ASSERT_OK(expr->SetSchemasForEvaluation({})); + EXPECT_TRUE(expr->EvalSimple({}, &context, &result, &status)); + ZETASQL_ASSERT_OK(status); + // No rows are returned + EXPECT_EQ(result.value().field(0).int64_value(), 0); + EXPECT_TRUE(result.value().field(1).empty()); +} + TEST_F(DMLValueExprEvalTest, DMLInsertValueExprSetsPrimaryKeyValuesToNullWhenDisallowed) { // Build a resolved AST for inserting a new row (3, "three") into the table. @@ -1832,7 +1903,7 @@ class ProtoEvalTest : public ::testing::Test { // Reads 'field_name' of 'proto_value' using a GetProtoFieldExpr. absl::StatusOr GetProtoField(const Value& proto_value, - const std::string& field_name) { + absl::string_view field_name) { TupleSlot proto_slot; proto_slot.SetValue(proto_value); EvaluationContext context((EvaluationOptions())); @@ -1855,7 +1926,7 @@ class ProtoEvalTest : public ::testing::Test { // Checks presence of 'field_name' in 'proto_value' using a // GetProtoFieldExpr. Crashes on error. Value HasProtoFieldOrDie(const Value& proto_value, - const std::string& field_name) { + absl::string_view field_name) { TupleSlot proto_slot; proto_slot.SetValue(proto_value); EvaluationContext context((EvaluationOptions())); @@ -1869,7 +1940,7 @@ class ProtoEvalTest : public ::testing::Test { // 'field_name'. 
Invokes it on 'proto_slot' by passing 'proto_slot' in // parameter 'p'. absl::StatusOr EvalGetProtoFieldExpr(const TupleSlot& proto_slot, - const std::string& field_name, + absl::string_view field_name, bool get_has_bit, EvaluationContext* context) { const google::protobuf::FieldDescriptor* field_descr = @@ -2009,9 +2080,8 @@ class ProtoEvalTest : public ::testing::Test { return absl::OkStatus(); } - absl::Status MakeProto( - const std::vector>& fields, - google::protobuf::Message* out) { + absl::Status MakeProto(absl::Span> fields, + google::protobuf::Message* out) { std::vector field_and_formats; std::vector values; std::vector> arguments; diff --git a/zetasql/resolved_ast/BUILD b/zetasql/resolved_ast/BUILD index 199cfc50b..82670747b 100644 --- a/zetasql/resolved_ast/BUILD +++ b/zetasql/resolved_ast/BUILD @@ -298,6 +298,7 @@ cc_library( "@com_google_absl//absl/memory", "@com_google_absl//absl/status", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", "@com_google_protobuf//:protobuf", ], ) @@ -611,6 +612,7 @@ cc_library( "//zetasql/base:status", "//zetasql/public:analyzer_options", "//zetasql/public:builtin_function_cc_proto", + "//zetasql/public:coercer", "//zetasql/public:function", "//zetasql/public/annotation:collation", "//zetasql/public/types", @@ -618,6 +620,7 @@ cc_library( "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", ], ) @@ -627,6 +630,7 @@ cc_test( srcs = ["rewrite_utils_test.cc"], deps = [ ":resolved_ast", + ":resolved_ast_builder", ":rewrite_utils", ":test_utils", "//zetasql/base/testing:status_matchers", @@ -634,9 +638,13 @@ cc_test( "//zetasql/public:analyzer", "//zetasql/public:analyzer_options", "//zetasql/public:analyzer_output", + "//zetasql/public:builtin_function_options", "//zetasql/public:id_string", + "//zetasql/public:numeric_value", "//zetasql/public:simple_catalog", + "//zetasql/public:value", 
"//zetasql/public/types", + "@com_google_absl//absl/status", "@com_google_absl//absl/strings", "@com_google_absl//absl/strings:str_format", ], @@ -656,10 +664,14 @@ cc_test( "//zetasql/base:status", "//zetasql/base/testing:status_matchers", "//zetasql/base/testing:zetasql_gtest_main", + "//zetasql/public:analyzer_options", + "//zetasql/public:function_cc_proto", + "//zetasql/public:function_headers", "//zetasql/public:id_string", "//zetasql/public:language_options", "//zetasql/public:options_cc_proto", "//zetasql/public:simple_catalog", + "//zetasql/public:value", "//zetasql/public/types", "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", diff --git a/zetasql/resolved_ast/gen_resolved_ast.py b/zetasql/resolved_ast/gen_resolved_ast.py index 7c0276b67..5cdf7d9a8 100644 --- a/zetasql/resolved_ast/gen_resolved_ast.py +++ b/zetasql/resolved_ast/gen_resolved_ast.py @@ -296,7 +296,8 @@ def Field(name, to_string_method=None, java_to_string_method=None, comment=None, - propagate_order=False): + propagate_order=False, + override_virtual_getter=False): """Make a field to put in a node class. Args: @@ -326,6 +327,10 @@ def Field(name, propagate_order: If true, this field and the parent node must both be ResolvedScan subclasses, and the is_ordered property of the field will be propagated to the containing scan. + override_virtual_getter: If true, the getter is marked with `override`. + Used for cases where the parent class declares a + virtual method interface for this getter, so this + getter needs an 'override'. Returns: The newly created field. 
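The `override_virtual_getter` flag documented above exists so that a generated getter can carry `override` when an abstract ancestor (such as `ResolvedComputedColumnBase` later in this diff) declares the same getter as pure virtual. A minimal hand-written sketch of that shape, using simplified stand-in names rather than the actual generated code:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Abstract base declaring a pure virtual getter interface, mirroring what
// ResolvedComputedColumnBase does for column() and expr(). Class and member
// names here are simplified stand-ins for the generated classes.
class ComputedColumnBase {
 public:
  virtual ~ComputedColumnBase() = default;
  virtual const std::string& column() const = 0;
};

// Concrete subclass whose getter carries `override`, which is what
// override_virtual_getter=True asks the generator to emit.
class ComputedColumn : public ComputedColumnBase {
 public:
  explicit ComputedColumn(std::string column) : column_(std::move(column)) {}
  const std::string& column() const override { return column_; }

 private:
  std::string column_;
};
```

Callers can then program against the base interface while each concrete generated subclass keeps its usual member layout.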
@@ -381,7 +386,10 @@ def Field(name, not_serialize_if_default = ctype.not_serialize_if_default if ctype.passed_by_reference: - setter_arg_type = 'const %s&' % member_type + if ctype.ctype == SCALAR_STRING.ctype: + setter_arg_type = 'absl::string_view' + else: + setter_arg_type = 'const %s&' % member_type getter_return_type = 'const %s&' % member_type else: setter_arg_type = member_type @@ -506,6 +514,7 @@ def Field(name, 'maybe_ptr_setter_arg_type': maybe_ptr_setter_arg_type, 'scoped_setter_arg_type': scoped_setter_arg_type, 'getter_return_type': getter_return_type, + 'override_virtual_getter': override_virtual_getter, 'release_return_type': release_return_type, 'is_vector': vector, 'element_getter_return_type': element_getter_return_type, @@ -544,24 +553,25 @@ def __init__(self): # is a list of the first-level children. self.root_child_nodes = [] - def AddNode(self, - name, - tag_id, - parent, - fields, - is_abstract=False, - extra_defs='', - extra_defs_node_only='', - emit_default_constructor=True, - use_custom_debug_string=False, - comment=None): + def AddNode( + self, + name, + tag_id, # Next tag_id: 257 + parent, + fields, + is_abstract=False, + extra_defs='', + extra_defs_node_only='', + emit_default_constructor=True, + use_custom_debug_string=False, + comment=None, + ): """Add a node class to be generated. Args: name: class name for this node tag_id: unique tag number for the node as a proto field or an enum value. - tag_id for each node type is hard coded and should never change. - Next tag_id: 248 + tag_id for each node type is hard coded and should never change. parent: class name of the parent node fields: list of fields in this class; created with Field function is_abstract: true if this node is an abstract class @@ -2378,9 +2388,8 @@ def main(unused_argv): tag_id=141, parent='ResolvedArgument', comment=""" - Represents a connection object as a TVF argument. - is the connection object encapsulated metadata to connect to - an external data source. 
+ Represents a connection object, which encapsulates engine-specific + metadata used to connect to an external data source. """, fields=[ Field('connection', SCALAR_CONNECTION, tag_id=2), @@ -2502,15 +2511,29 @@ def main(unused_argv): 'join_type', SCALAR_JOIN_TYPE, tag_id=2, - ignorable=IGNORABLE_DEFAULT), + ignorable=IGNORABLE_DEFAULT, + ), Field('left_scan', 'ResolvedScan', tag_id=3), Field('right_scan', 'ResolvedScan', tag_id=4), Field( - 'join_expr', - 'ResolvedExpr', - tag_id=5, - ignorable=IGNORABLE_DEFAULT) - ]) + 'join_expr', 'ResolvedExpr', tag_id=5, ignorable=IGNORABLE_DEFAULT + ), + Field( + 'has_using', + SCALAR_BOOL, + is_optional_constructor_arg=True, + tag_id=6, + ignorable=IGNORABLE, + comment=""" + This indicates this join was generated from syntax with USING. + The sql_builder will use this field only as a suggestion. + JOIN USING(...) syntax will be used if and only if + `has_using` is True and `join_expr` has the correct shape. + Otherwise the sql_builder will generate JOIN ON. + """, + ), + ], + ) gen.AddNode( name='ResolvedArrayScan', @@ -2830,7 +2853,10 @@ def main(unused_argv): is_constructor_arg=False, ), Field( - 'aggregate_list', 'ResolvedComputedColumn', tag_id=4, vector=True + 'aggregate_list', + 'ResolvedComputedColumnBase', + tag_id=4, + vector=True, ), Field( 'grouping_set_list', @@ -2853,7 +2879,7 @@ def main(unused_argv): vector=True, ignorable=IGNORABLE_DEFAULT, is_optional_constructor_arg=True, - ) + ), ], ) @@ -3131,8 +3157,11 @@ def main(unused_argv): window ORDER BY. The output contains all columns from , - one column per analytic function. It may also conain partitioning/ordering + one column per analytic function. It may also contain partitioning/ordering expression columns if they reference to select columns. + + Currently, the analyzer combines equivalent OVER clauses into the same + ResolvedAnalyticFunctionGroup only for OVER () or a named window. 
""", fields=[ Field('input_scan', 'ResolvedScan', tag_id=2), @@ -3140,8 +3169,10 @@ def main(unused_argv): 'function_group_list', 'ResolvedAnalyticFunctionGroup', tag_id=3, - vector=True) - ]) + vector=True, + ), + ], + ) gen.AddNode( name='ResolvedSampleScan', @@ -3195,25 +3226,131 @@ def main(unused_argv): # Other nodes gen.AddNode( - name='ResolvedComputedColumn', - tag_id=32, + name='ResolvedComputedColumnBase', + tag_id=253, parent='ResolvedArgument', - use_custom_debug_string=True, + is_abstract=True, comment=""" This is used when an expression is computed and given a name (a new ResolvedColumn) that can be referenced elsewhere. The new ResolvedColumn can appear in a column_list or in ResolvedColumnRefs in other expressions, when appropriate. This node is not an expression itself - it is a container that holds an expression. + + There are 2 concrete subclasses: ResolvedComputedColumn and + ResolvedDeferredComputedColumn. + + ResolvedDeferredComputedColumn has extra information about deferring + side effects like errors. This can be used in cases like AggregateScans + before conditional expressions like IF(), where errors from the aggregate + function should only be exposed if the right IF branch is chosen. + + Nodes where deferred side effects are not possible (like GROUP BY + expressions) are declared as ResolvedComputedColumn directly. + + Nodes that might need to defer errors, such as AggregateScan's + aggregate_list(), are declared as ResolvedComputedColumnBase. + The runtime type will be either ResolvedComputedColumn or + ResolvedDeferredComputedColumn, depending on whether any side effects need + to be captured. + + If FEATURE_V_1_4_ENFORCE_CONDITIONAL_EVALUATION is not set, the runtime + type is always just ResolvedComputedColumn. + + See (broken link) for more details. 
+ """, + fields=[], + extra_defs=""" + // Virtual getter to avoid changing ResolvedComputedColumnProto + virtual const ResolvedColumn& column() const = 0; + + // Virtual getter to avoid changing ResolvedComputedColumnProto + virtual const ResolvedExpr* expr() const = 0; + """) + + gen.AddNode( + name='ResolvedComputedColumnImpl', + tag_id=254, + parent='ResolvedComputedColumnBase', + is_abstract=True, + comment=""" + An intermediate abstract superclass that holds common getters for + ResolvedComputedColumn and ResolvedDeferredComputedColumn. This class + exists to ensure that callers static_cast to the appropriate subclass, + rather than processing ResolvedComputedColumnBase directly. + """, + fields=[]) + + gen.AddNode( + name='ResolvedComputedColumn', + tag_id=32, + parent='ResolvedComputedColumnImpl', + use_custom_debug_string=True, + comment=""" + This is the usual ResolvedComputedColumn without deferred side effects. + See comments on ResolvedComputedColumnBase. """, fields=[ Field( 'column', SCALAR_RESOLVED_COLUMN, tag_id=2, - ignorable=IGNORABLE), - Field('expr', 'ResolvedExpr', tag_id=3) - ]) + ignorable=IGNORABLE, + override_virtual_getter=True, + ), + Field('expr', 'ResolvedExpr', tag_id=3, override_virtual_getter=True), + ], + ) + + gen.AddNode( + name='ResolvedDeferredComputedColumn', + tag_id=255, + parent='ResolvedComputedColumnImpl', + use_custom_debug_string=True, + comment=""" + This is a ResolvedComputedColumn variant that adds deferred side effect + capture. + + This is used for computations that get separated into multiple scans, + where side effects like errors in earlier scans need to be deferred + until conditional expressions in later scans are evaluated. + See (broken link) for details. + For example: + SELECT IF(C, SUM(A/B), -1) FROM T + The division A/B could produce an error when B is 0, but errors should not + be exposed when C is false, due to IF's conditional evaluation semantics.
+ + `side_effect_column` is a new column (of type BYTES) created at the same + time as `column`, storing side effects like errors from the computation. + This column will store an implementation-specific representation of the + side effect (e.g. util::StatusProto) and will get a NULL value if there + are no captured side effects. + + Typically, this column will be passed to a call to the internal function + $with_side_effect() later to expose the side effects. The validator checks + that it is consumed downstream. + """, + fields=[ + Field( + 'column', SCALAR_RESOLVED_COLUMN, + tag_id=2, + ignorable=IGNORABLE, + override_virtual_getter=True + ), + Field('expr', 'ResolvedExpr', tag_id=3, override_virtual_getter=True), + Field( + 'side_effect_column', + SCALAR_RESOLVED_COLUMN, + tag_id=4, + ignorable=NOT_IGNORABLE, + comment="""Creates the companion side effects columns for this + computation, of type BYTES. Instead of immediately exposing the + side effect (e.g. an error), the side effect is captured in the + side_effects_column. + """, + ), + ], + ) gen.AddNode( name='ResolvedOrderByItem', @@ -4166,17 +4303,41 @@ def main(unused_argv): ignorable=IGNORABLE_DEFAULT), ]) + gen.AddNode( + name='ResolvedCreateSchemaStmtBase', + tag_id=248, + parent='ResolvedCreateStatement', + is_abstract=True, + comment=""" + A base for statements that create schemas, such as: + CREATE [OR REPLACE] SCHEMA [IF NOT EXISTS] + [DEFAULT COLLATE ] + [OPTIONS (name=value, ...)] + + CREATE [OR REPLACE] [TEMP|TEMPORARY|PUBLIC|PRIVATE] EXTERNAL SCHEMA + [IF NOT EXISTS] WITH CONNECTION + OPTIONS (name=value, ...) 
+ + contains engine-specific options associated with the schema + """, + fields=[ + Field( + 'option_list', + 'ResolvedOption', + tag_id=2, + vector=True, + ignorable=IGNORABLE_DEFAULT), + ]) + gen.AddNode( name='ResolvedCreateSchemaStmt', tag_id=157, - parent='ResolvedCreateStatement', + parent='ResolvedCreateSchemaStmtBase', comment=""" This statement: CREATE [OR REPLACE] SCHEMA [IF NOT EXISTS] [DEFAULT COLLATE ] [OPTIONS (name=value, ...)] - - engine-specific options. specifies the default collation specification for future tables created in the dataset. If a table is created in this dataset without specifying table-level default collation, it inherits the @@ -4194,12 +4355,30 @@ def main(unused_argv): 'ResolvedExpr', tag_id=3, ignorable=IGNORABLE_DEFAULT), + ]) + + gen.AddNode( + name='ResolvedCreateExternalSchemaStmt', + tag_id=249, + parent='ResolvedCreateSchemaStmtBase', + comment=""" + This statement: + CREATE [OR REPLACE] [TEMP|TEMPORARY|PUBLIC|PRIVATE] EXTERNAL SCHEMA + [IF NOT EXISTS] WITH CONNECTION + OPTIONS (name=value, ...) + + encapsulates engine-specific metadata used to connect + to an external data source + + Note: external schemas are pointers to schemas defined in an external + system. CREATE EXTERNAL SCHEMA does not actually build a new schema. + """, + fields=[ Field( - 'option_list', - 'ResolvedOption', + 'connection', + 'ResolvedConnection', tag_id=2, - vector=True, - ignorable=IGNORABLE_DEFAULT) + ignorable=IGNORABLE_DEFAULT), ]) gen.AddNode( @@ -4465,13 +4644,13 @@ def main(unused_argv): * Trained: * External: ! * Remote models = TRUE - * Trained: [Not supported yet] + * Trained: * External: ! has engine-specific directives for how to train this model. - is the AS SELECT statement. It can be only set when is - false and all of , - and are empty. + is the AS SELECT statement. It can be only set when all of + , and + are empty. TODO: consider rename to . 
matches 1:1 with the 's column_list and identifies the names and types of the columns output from the select @@ -4484,8 +4663,7 @@ columns. Cannot be set if is true. Might be absent when is true, meaning schema is read from the remote model itself. - is true if this is a remote model. Cannot be set when - is true. + is true if this is a remote model. is the identifier path of the connection object. It can only be set when is true. is the list of ResolvedComputedColumn in TRANSFORM @@ -5166,6 +5344,54 @@ """, fields=[]) + gen.AddNode( + name='ResolvedRecursionDepthModifier', + tag_id=256, + parent='ResolvedArgument', + comment=""" + This represents a recursion depth modifier to a recursive CTE: + WITH DEPTH [ AS ] + [ BETWEEN AND ] + + and represents the range of iterations (both + sides included) whose results are part of the CTE's final output. + + lower_bound and upper_bound are two integer literals or + query parameters. Query parameter values must be checked at run-time by + ZetaSQL-compliant backend systems. + - both lower/upper_bound must be non-negative; + - lower_bound is by default zero if unspecified; + - upper_bound is by default infinity if unspecified; + - lower_bound must be less than or equal to upper_bound; + + is the column that represents the + recursion depth semantics: the iteration number that outputs this row; + it is part of ResolvedRecursiveScan's column list when specified, but + there is no corresponding column in the inputs of the recursive CTE. + + See (broken link):explicit-recursion-depth for details. 
+ """, + fields=[ + Field( + 'lower_bound', + 'ResolvedExpr', + tag_id=2, + ignorable=IGNORABLE_DEFAULT, + ), + Field( + 'upper_bound', + 'ResolvedExpr', + tag_id=3, + ignorable=IGNORABLE_DEFAULT, + ), + Field( + 'recursion_depth_column', + 'ResolvedColumnHolder', + tag_id=4, + ), + ], + ) + gen.AddNode( name='ResolvedRecursiveScan', tag_id=148, @@ -5188,19 +5414,29 @@ At runtime, a recursive scan is evaluated using an iterative process: - Step 1: Evaluate the non-recursive term. If UNION DISTINCT + Step 1 (iteration 0): Evaluate the non-recursive term. If UNION DISTINCT is specified, discard duplicates. - Step 2: + Step 2 (iteration k): Repeat until step 2 produces an empty result: Evaluate the recursive term, binding the recursive table to the - new rows produced by previous step. If UNION DISTINCT is specified, - discard duplicate rows, as well as any rows which match any - previously-produced result. + new rows produced by the previous step (iteration k-1). + If UNION DISTINCT is specified, discard duplicate rows, as well as any + rows which match any previously-produced result. Step 3: The final content of the recursive table is the UNION ALL of all results - produced (step 1, plus all iterations of step 2). + produced in the [lower_bound, upper_bound] iterations specified in the + recursion depth modifier (the results are already DISTINCT because of + step 2, if the query had UNION DISTINCT). The final content is augmented + by the column specified in the recursion depth modifier (if specified), + which represents the iteration number at which the row is output. + If UNION DISTINCT is specified, the depth column represents the first + iteration that produces a given row. + The depth column will be part of the output column list. + + When recursion_depth_modifier is unspecified, the lower bound is + effectively zero, the upper bound is infinite. 
ResolvedRecursiveScan only supports a recursive WITH entry which directly references itself; ZetaSQL does not support mutual recursion @@ -5212,7 +5448,15 @@ def main(unused_argv): Field('op_type', SCALAR_RECURSIVE_SET_OPERATION_TYPE, tag_id=2), Field('non_recursive_term', 'ResolvedSetOperationItem', tag_id=3), Field('recursive_term', 'ResolvedSetOperationItem', tag_id=4), - ]) + Field( + 'recursion_depth_modifier', + 'ResolvedRecursionDepthModifier', + is_optional_constructor_arg=True, + tag_id=5, + ignorable=IGNORABLE_DEFAULT, + ), + ], + ) gen.AddNode( name='ResolvedWithScan', @@ -6496,6 +6740,16 @@ def main(unused_argv): """, fields=[]) + gen.AddNode( + name='ResolvedAlterExternalSchemaStmt', + tag_id=250, + parent='ResolvedAlterObjectStmt', + comment=""" + This statement: + ALTER EXTERNAL SCHEMA [IF EXISTS] ; + """, + fields=[]) + gen.AddNode( name='ResolvedAlterModelStmt', tag_id=205, @@ -8976,7 +9230,8 @@ def main(unused_argv): comment=""" This statement: UNDROP [IF NOT EXISTS] - FOR SYSTEM_TIME AS OF []; + FOR SYSTEM_TIME AS OF [] + [OPTIONS (name=value, ...)]; is a string identifier for the entity to be undropped. Currently, only 'SCHEMA' object is supported. @@ -8989,6 +9244,8 @@ specifies point in time from which entity is to be undropped. + + contains engine-specific options associated with the schema. 
""", fields=[ Field('schema_object_kind', SCALAR_STRING, tag_id=2), @@ -9005,6 +9262,13 @@ def main(unused_argv): tag_id=5, ignorable=IGNORABLE_DEFAULT, ), + Field( + 'option_list', + 'ResolvedOption', + tag_id=6, + vector=True, + ignorable=IGNORABLE_DEFAULT, + ), ], ) diff --git a/zetasql/resolved_ast/node_sources.h b/zetasql/resolved_ast/node_sources.h index 38d07aecf..48efd8e96 100644 --- a/zetasql/resolved_ast/node_sources.h +++ b/zetasql/resolved_ast/node_sources.h @@ -35,6 +35,8 @@ inline constexpr char kNodeSourceResolverSetOperationCorresponding[] = inline constexpr char kNodeSourceSingleTableArrayNamePath[] = "single_table_array_name_path"; +inline constexpr char kGqlLetOperatorScan[] = "gql_let_operator"; + } // namespace zetasql #endif // ZETASQL_RESOLVED_AST_NODE_SOURCES_H_ diff --git a/zetasql/resolved_ast/query_expression.cc b/zetasql/resolved_ast/query_expression.cc index 243c19007..0e9fb1ad6 100644 --- a/zetasql/resolved_ast/query_expression.cc +++ b/zetasql/resolved_ast/query_expression.cc @@ -30,6 +30,7 @@ #include "absl/strings/str_cat.h" #include "absl/strings/str_join.h" #include "absl/strings/string_view.h" +#include "absl/types/span.h" #include "zetasql/base/case.h" #include "zetasql/base/map_util.h" #include "zetasql/base/ret_check.h" @@ -40,7 +41,7 @@ namespace zetasql { // . While appending each pair we add the second element (if present) // as an alias to the first element. 
static std::string JoinListWithAliases( - const std::vector>& list, + absl::Span> list, absl::string_view delimiter) { std::string list_str; bool first = true; @@ -338,7 +339,7 @@ bool QueryExpression::TrySetWithClause( bool QueryExpression::TrySetSelectClause( const std::vector>& select_list, - const std::string& select_hints) { + absl::string_view select_hints) { if (!CanSetSelectClause()) { return false; } @@ -352,7 +353,7 @@ void QueryExpression::ResetSelectClause() { select_list_.clear(); } -bool QueryExpression::TrySetFromClause(const std::string& from) { +bool QueryExpression::TrySetFromClause(absl::string_view from) { if (!CanSetFromClause()) { return false; } @@ -360,7 +361,7 @@ bool QueryExpression::TrySetFromClause(const std::string& from) { return true; } -bool QueryExpression::TrySetWhereClause(const std::string& where) { +bool QueryExpression::TrySetWhereClause(absl::string_view where) { if (!CanSetWhereClause()) { return false; } @@ -370,10 +371,10 @@ bool QueryExpression::TrySetWhereClause(const std::string& where) { bool QueryExpression::TrySetSetOpScanList( std::vector>* set_op_scan_list, - const std::string& set_op_type, const std::string& set_op_modifier, - const std::string& set_op_column_match_mode, - const std::string& set_op_column_propagation_mode, - const std::string& query_hints) { + absl::string_view set_op_type, absl::string_view set_op_modifier, + absl::string_view set_op_column_match_mode, + absl::string_view set_op_column_propagation_mode, + absl::string_view query_hints) { if (!CanSetSetOpScanList()) { return false; } @@ -392,7 +393,7 @@ bool QueryExpression::TrySetSetOpScanList( bool QueryExpression::TrySetGroupByClause( const std::map& group_by_list, - const std::string& group_by_hints, + absl::string_view group_by_hints, const std::vector& grouping_set_id_list, const std::vector& rollup_column_id_list) { if (!CanSetGroupByClause()) { @@ -408,7 +409,7 @@ bool QueryExpression::TrySetGroupByClause( absl::Status 
QueryExpression::SetGroupByAllClause( const std::map& group_by_list, - const std::string& group_by_hints) { + absl::string_view group_by_hints) { ZETASQL_RET_CHECK(CanSetGroupByClause()); group_by_all_ = true; group_by_list_ = group_by_list; @@ -429,7 +430,7 @@ bool QueryExpression::TrySetOrderByClause( return true; } -bool QueryExpression::TrySetLimitClause(const std::string& limit) { +bool QueryExpression::TrySetLimitClause(absl::string_view limit) { if (!CanSetLimitClause()) { return false; } @@ -437,7 +438,7 @@ bool QueryExpression::TrySetLimitClause(const std::string& limit) { return true; } -bool QueryExpression::TrySetOffsetClause(const std::string& offset) { +bool QueryExpression::TrySetOffsetClause(absl::string_view offset) { if (!CanSetOffsetClause()) { return false; } @@ -446,7 +447,7 @@ bool QueryExpression::TrySetOffsetClause(const std::string& offset) { } bool QueryExpression::TrySetWithAnonymizationClause( - const std::string& anonymization_options) { + absl::string_view anonymization_options) { if (!CanSetWithAnonymizationClause()) { return false; } @@ -462,7 +463,7 @@ bool QueryExpression::TrySetPivotClause(const std::string& pivot) { return true; } -bool QueryExpression::TrySetUnpivotClause(const std::string& unpivot) { +bool QueryExpression::TrySetUnpivotClause(absl::string_view unpivot) { if (!CanSetUnpivotClause()) { return false; } @@ -584,7 +585,7 @@ absl::Status QueryExpression::SetAliasesForSelectList( return absl::OkStatus(); } -void QueryExpression::SetSelectAsModifier(const std::string& modifier) { +void QueryExpression::SetSelectAsModifier(absl::string_view modifier) { ABSL_DCHECK(select_as_modifier_.empty()); select_as_modifier_ = modifier; } diff --git a/zetasql/resolved_ast/query_expression.h b/zetasql/resolved_ast/query_expression.h index 0fb42c2eb..24693cf1b 100644 --- a/zetasql/resolved_ast/query_expression.h +++ b/zetasql/resolved_ast/query_expression.h @@ -68,30 +68,30 @@ class QueryExpression { bool recursive); bool 
TrySetSelectClause( const std::vector>& select_list, - const std::string& select_hints); - bool TrySetFromClause(const std::string& from); - bool TrySetWhereClause(const std::string& where); + absl::string_view select_hints); + bool TrySetFromClause(absl::string_view from); + bool TrySetWhereClause(absl::string_view where); bool TrySetSetOpScanList( std::vector>* set_op_scan_list, - const std::string& set_op_type, const std::string& set_op_modifier, - const std::string& set_op_column_match_mode, - const std::string& set_op_column_propagation_mode, - const std::string& query_hints); + absl::string_view set_op_type, absl::string_view set_op_modifier, + absl::string_view set_op_column_match_mode, + absl::string_view set_op_column_propagation_mode, + absl::string_view query_hints); bool TrySetGroupByClause( const std::map& group_by_list, - const std::string& group_by_hints, + absl::string_view group_by_hints, const std::vector& grouping_set_id_list, const std::vector& rollup_column_id_list); absl::Status SetGroupByAllClause( const std::map& group_by_list, - const std::string& group_by_hints); + absl::string_view group_by_hints); bool TrySetOrderByClause(const std::vector& order_by_list, const std::string& order_by_hints); - bool TrySetLimitClause(const std::string& limit); - bool TrySetOffsetClause(const std::string& offset); - bool TrySetWithAnonymizationClause(const std::string& anonymization_options); + bool TrySetLimitClause(absl::string_view limit); + bool TrySetOffsetClause(absl::string_view offset); + bool TrySetWithAnonymizationClause(absl::string_view anonymization_options); bool TrySetPivotClause(const std::string& pivot); - bool TrySetUnpivotClause(const std::string& unpivot); + bool TrySetUnpivotClause(absl::string_view unpivot); // The below CanSet... 
methods return true if filling in the concerned clause // in the QueryExpression will succeed (without mutating it or wrapping it as @@ -147,7 +147,7 @@ class QueryExpression { const absl::flat_hash_map& aliases); // Set the AS modifier for the SELECT. e.g. "AS VALUE". - void SetSelectAsModifier(const std::string& modifier); + void SetSelectAsModifier(absl::string_view modifier); // Returns a mutable pointer to the group_by_list_ of QueryExpression. Used // mostly to update the sql text of the group_by columns to reflect the diff --git a/zetasql/resolved_ast/resolved_ast.cc.template b/zetasql/resolved_ast/resolved_ast.cc.template index 149928426..15e38324e 100644 --- a/zetasql/resolved_ast/resolved_ast.cc.template +++ b/zetasql/resolved_ast/resolved_ast.cc.template @@ -569,7 +569,7 @@ absl::StatusOr RestoreFromImpl( if (proto.name().empty()) { return zetasql_base::InvalidArgumentErrorBuilder(zetasql_base::SourceLocation::current()) << "Tried to parse function with blank name: " - << proto.DebugString(); + << absl::StrCat(proto); } const Constant* constant; @@ -596,7 +596,7 @@ absl::StatusOr RestoreFromImpl( absl::StrSplit(proto.name(), ":"); if (group_and_name.empty()) { return zetasql_base::InvalidArgumentErrorBuilder(zetasql_base::SourceLocation::current()) - << "Tried to parse function with blank name: " << proto.DebugString(); + << "Tried to parse function with blank name: " << absl::StrCat(proto); } const Function* func; @@ -864,15 +864,8 @@ static absl::Status SaveToImpl( for (const TVFInputArgumentType& arg : tvf_signature->input_arguments()) { TVFArgumentProto* arg_proto = proto->add_argument(); if (arg.is_relation()) { - for (const TVFRelation::Column& col : arg.relation().columns()) { - TVFRelationColumnProto* col_proto = - arg_proto->mutable_relation_argument()->add_column(); - col_proto->set_name(col.name); - ZETASQL_RETURN_IF_ERROR(SaveToImpl( - col.type, file_descriptor_set_map, col_proto->mutable_type())); - } - 
arg_proto->mutable_relation_argument()->set_is_value_table( - arg.relation().is_value_table()); + ZETASQL_RETURN_IF_ERROR(arg.relation().Serialize( + file_descriptor_set_map, arg_proto->mutable_relation_argument())); } else if (arg.is_model()) { TVFModelProto* model_proto = arg_proto->mutable_model_argument(); model_proto->set_name(arg.model().model()->Name()); @@ -902,21 +895,8 @@ static absl::Status SaveToImpl( } } } - TVFRelationProto* output_schema_proto = proto->mutable_output_schema(); - for (const TVFRelation::Column& col : - tvf_signature->result_schema().columns()) { - TVFRelationColumnProto* col_proto = output_schema_proto->add_column(); - col_proto->set_name(col.name); - col_proto->set_is_pseudo_column(col.is_pseudo_column); - ZETASQL_RETURN_IF_ERROR(SaveToImpl( - col.type, file_descriptor_set_map, col_proto->mutable_type())); - if (col.annotation_map != nullptr) { - ZETASQL_RETURN_IF_ERROR( - col.annotation_map->Serialize(col_proto->mutable_annotation_map())); - } - } - output_schema_proto->set_is_value_table( - tvf_signature->result_schema().is_value_table()); + ZETASQL_RETURN_IF_ERROR(tvf_signature->result_schema().Serialize( + file_descriptor_set_map, proto->mutable_output_schema())); for (const FreestandingDeprecationWarning& warning : tvf_signature->options().additional_deprecation_warnings) { @@ -935,21 +915,10 @@ absl::StatusOr> RestoreFromImpl( input_args.reserve(proto.argument_size()); for (const TVFArgumentProto& argument : proto.argument()) { if (argument.has_relation_argument()) { - const TVFRelationProto& relation_arg = argument.relation_argument(); - std::vector cols; - cols.reserve(relation_arg.column_size()); - const Type* type = nullptr; - for (const TVFRelationColumnProto& col_proto : relation_arg.column()) { - ZETASQL_ASSIGN_OR_RETURN( - type, RestoreFromImpl(col_proto.type(), params)); - cols.emplace_back(TVFRelation::Column(col_proto.name(), type)); - } - if (relation_arg.is_value_table()) { - 
input_args.push_back(TVFInputArgumentType( - TVFRelation::ValueTable(type))); - } else { - input_args.push_back(TVFInputArgumentType(TVFRelation(cols))); - } + ZETASQL_ASSIGN_OR_RETURN(TVFRelation r, TVFRelation::Deserialize( + argument.relation_argument(), + TypeDeserializer(params.type_factory, params.pools))); + input_args.push_back(TVFInputArgumentType(r)); } else if (argument.has_model_argument()) { const Model* model; const std::vector path = absl::StrSplit( @@ -986,22 +955,9 @@ absl::StatusOr> RestoreFromImpl( } } } - std::vector cols; - cols.reserve(proto.output_schema().column_size()); - for (const TVFRelationColumnProto& col_proto : - proto.output_schema().column()) { - const Type* type = nullptr; - const AnnotationMap* annotation_map = nullptr; - ZETASQL_ASSIGN_OR_RETURN( - type, RestoreFromImpl(col_proto.type(), params)); - if (col_proto.has_annotation_map()) { - ZETASQL_RETURN_IF_ERROR(params.type_factory->DeserializeAnnotationMap( - col_proto.annotation_map(), &annotation_map)); - } - cols.emplace_back(TVFRelation::Column(col_proto.name(), - {type, annotation_map}, - col_proto.is_pseudo_column())); - } + ZETASQL_ASSIGN_OR_RETURN(TVFRelation table_schema, TVFRelation::Deserialize( + proto.output_schema(), + TypeDeserializer(params.type_factory, params.pools))); TVFSignatureOptions options; for (const FreestandingDeprecationWarning& warning : @@ -1009,17 +965,8 @@ absl::StatusOr> RestoreFromImpl( options.additional_deprecation_warnings.push_back(warning); } - if (proto.output_schema().is_value_table()) { - AnnotatedType annotated_type = cols[0].annotated_type(); - cols.erase(cols.begin()); - ZETASQL_ASSIGN_OR_RETURN(TVFRelation table_schema, - TVFRelation::ValueTable(annotated_type, cols)); - return std::shared_ptr( - new TVFSignature(input_args, std::move(table_schema), options)); - } else { - return std::shared_ptr( - new TVFSignature(input_args, TVFRelation(cols), options)); - } + return std::shared_ptr( + new TVFSignature(input_args, 
std::move(table_schema), options)); } static absl::Status SaveToImpl( diff --git a/zetasql/resolved_ast/resolved_ast.h.template b/zetasql/resolved_ast/resolved_ast.h.template index 3dd91c551..08572a10e 100644 --- a/zetasql/resolved_ast/resolved_ast.h.template +++ b/zetasql/resolved_ast/resolved_ast.h.template @@ -20,6 +20,7 @@ #define ZETASQL_RESOLVED_AST_RESOLVED_AST_H_ #include +#include #include #include @@ -42,6 +43,7 @@ #include "zetasql/resolved_ast/resolved_node_kind.h" #include "zetasql/base/status.h" #include "absl/status/statusor.h" +#include "absl/strings/string_view.h" {% macro ZeroArgCtor(node) %} {{node.name}}() : {{node.parent}}() @@ -570,7 +572,7 @@ template < typename group_by_list_t = std::vector>, typename aggregate_list_t - = std::vector>, + = std::vector>, typename grouping_set_list_t = std::vector>, typename rollup_column_list_t @@ -588,10 +590,10 @@ std::unique_ptr MakeResolvedAggregateScan( "group_by_list must be a container of unique_ptr with elements of type " "ResolvedComputedColumn (or its descendants)."); static_assert(std::is_base_of< - ResolvedComputedColumn, + ResolvedComputedColumnBase, typename std::decay::type>::value, "aggregate_list must be a container of unique_ptr with elements of type " - "ResolvedComputedColumn (or its descendants)."); + "ResolvedComputedColumnBase (or its descendants)."); static_assert(std::is_base_of< ResolvedGroupingSetBase, typename std::decay::type>::value, diff --git a/zetasql/resolved_ast/resolved_ast_builder.h.template b/zetasql/resolved_ast/resolved_ast_builder.h.template index f965d3406..1763ef703 100644 --- a/zetasql/resolved_ast/resolved_ast_builder.h.template +++ b/zetasql/resolved_ast/resolved_ast_builder.h.template @@ -156,19 +156,20 @@ class {{node.builder_name}} final { return *this; }; - // Build() releases the current inner node, so it is callable only on an - // r-value, where the builder is expected to be going away. 
Resets the + // BuildMutable() releases the current inner node, so it is callable only on + // an r-value, where the builder is expected to be going away. Resets the // `accessed_` bits. - absl::StatusOr> Build() && { - {# // The result of Build() is treated as a new node so + absl::StatusOr> BuildMutable() && { + {# // The result of BuildMutable() is treated as a new node so // CheckFieldsAccessed() requires revisiting all fields again as their // values may have changed. Furthermore, accessed_ bits may have been set as // a side effect of accessing existing values through the builder to // incrementally create new values, rather than a client actually accessing // the value. #} - // Performs an emptiness check on node.fields to determine if accessed_ should - // be created. In the case of a concrete node without fields it will not be. + // Performs an emptiness check on node.fields to determine if accessed_ + // should be created. In the case of a concrete node without fields it will + // not be. # if (node.fields) {{inner_node_member_name}}->accessed_ = 0; # endif @@ -185,10 +186,14 @@ class {{node.builder_name}} final { if ({{status_member_name}}.ok()) { return std::move({{inner_node_member_name}}); } - return {{status_member_name}}; } + // Same as the above method, except that it returns an immutable object. + absl::StatusOr> Build() && { + return std::move(*this).BuildMutable(); + } + // Getters and chained setters # for field in (node.fields + node.inherited_fields) # if field.comment diff --git a/zetasql/resolved_ast/resolved_ast_field.h.template.import b/zetasql/resolved_ast/resolved_ast_field.h.template.import index 328d46f3c..a5ac066c9 100644 --- a/zetasql/resolved_ast/resolved_ast_field.h.template.import +++ b/zetasql/resolved_ast/resolved_ast_field.h.template.import @@ -37,7 +37,7 @@ import directive. 
Example: # if field.comment {{field.comment}} # endif - {{field.getter_return_type}} {{field.name}}() const { + {{field.getter_return_type}} {{field.name}}() const{%if field.override_virtual_getter%} override{%endif%} { # if is_from_builder ABSL_DCHECK({{inner_node_member_name}} != nullptr); # endif diff --git a/zetasql/resolved_ast/resolved_column.h b/zetasql/resolved_ast/resolved_column.h index 11039f60f..b9ac6ecba 100644 --- a/zetasql/resolved_ast/resolved_column.h +++ b/zetasql/resolved_ast/resolved_column.h @@ -49,6 +49,7 @@ class ResolvedColumn { // Default constructor makes an uninitialized ResolvedColumn. ResolvedColumn() = default; ResolvedColumn(const ResolvedColumn&) = default; + ResolvedColumn& operator=(const ResolvedColumn&) = default; // Construct a ResolvedColumn with the given and . // and are for display only, have no defined meaning and diff --git a/zetasql/resolved_ast/resolved_node.cc b/zetasql/resolved_ast/resolved_node.cc index ace09ab1b..b91ae9796 100644 --- a/zetasql/resolved_ast/resolved_node.cc +++ b/zetasql/resolved_ast/resolved_node.cc @@ -40,8 +40,10 @@ #include "zetasql/resolved_ast/resolved_collation.h" #include "zetasql/resolved_ast/resolved_column.h" #include "absl/status/statusor.h" +#include "absl/strings/match.h" #include "absl/strings/str_cat.h" #include "absl/strings/str_join.h" +#include "absl/strings/str_split.h" #include "absl/strings/string_view.h" #include "zetasql/base/map_util.h" @@ -126,7 +128,8 @@ void ResolvedNode::DebugStringImpl(const ResolvedNode* node, *output += "\n"; for (const DebugStringField& field : fields) { const bool print_field_name = !field.name.empty(); - const bool print_one_line = field.nodes.empty(); + const bool value_has_newlines = absl::StrContains(field.value, "\n"); + const bool print_one_line = field.nodes.empty() && !value_has_newlines; absl::string_view accessed_string = config.print_accessed ? field.accessed ? 
"{*}" : "{ }" : ""; @@ -143,6 +146,13 @@ void ResolvedNode::DebugStringImpl(const ResolvedNode* node, } if (!print_one_line) { + if (value_has_newlines) { + absl::StrAppend(output, prefix1, "| \"\"\"\n"); + for (auto line : absl::StrSplit(field.value, '\n')) { + absl::StrAppend(output, prefix1, "| ", line, "\n"); + } + absl::StrAppend(output, prefix1, "| \"\"\"\n"); + } for (const ResolvedNode* node : field.nodes) { const std::string field_name_indent = print_field_name ? (&field != &fields.back() ? "| " : " ") : ""; @@ -340,6 +350,18 @@ std::string ResolvedComputedColumn::GetNameForDebugString() const { expr_.get()); } +void ResolvedDeferredComputedColumn::CollectDebugStringFields( + std::vector* fields) const { + SUPER::CollectDebugStringFields(fields); + CollectDebugStringFieldsWithNameFormat(expr_.get(), fields); +} + +std::string ResolvedDeferredComputedColumn::GetNameForDebugString() const { + std::string name = column().ShortDebugString(); + absl::StrAppend(&name, " [", side_effect_column_.ShortDebugString(), "]"); + return GetNameForDebugStringWithNameFormat(name, expr()); +} + // ResolvedOutputColumn gets formatted as // AS [type>] void ResolvedOutputColumn::CollectDebugStringFields( diff --git a/zetasql/resolved_ast/resolved_node.h b/zetasql/resolved_ast/resolved_node.h index ac0db446a..401ce00e4 100644 --- a/zetasql/resolved_ast/resolved_node.h +++ b/zetasql/resolved_ast/resolved_node.h @@ -20,6 +20,7 @@ #include #include #include +#include #include #include @@ -410,6 +411,17 @@ class ResolvedNode { std::unique_ptr parse_location_range_; // May be NULL. 
}; +template +std::vector GetAsPointerList( + const std::vector& input_list) { + std::vector output_list; + output_list.reserve(input_list.size()); + for (const T2* element : input_list) { + output_list.push_back(element); + } + return output_list; +} + } // namespace zetasql #endif // ZETASQL_RESOLVED_AST_RESOLVED_NODE_H_ diff --git a/zetasql/resolved_ast/rewrite_utils.cc b/zetasql/resolved_ast/rewrite_utils.cc index 10a1bf65a..6c0488d11 100644 --- a/zetasql/resolved_ast/rewrite_utils.cc +++ b/zetasql/resolved_ast/rewrite_utils.cc @@ -28,6 +28,7 @@ #include "zetasql/public/builtin_function.pb.h" #include "zetasql/public/function.h" #include "zetasql/public/function_signature.h" +#include "zetasql/public/input_argument_type.h" #include "zetasql/public/types/annotation.h" #include "zetasql/public/types/simple_value.h" #include "zetasql/public/types/type.h" @@ -37,12 +38,14 @@ #include "zetasql/resolved_ast/resolved_ast_deep_copy_visitor.h" #include "zetasql/resolved_ast/resolved_ast_helper.h" #include "zetasql/resolved_ast/resolved_ast_visitor.h" +#include "zetasql/resolved_ast/resolved_collation.h" #include "zetasql/resolved_ast/resolved_column.h" #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/str_cat.h" #include "absl/strings/string_view.h" #include "absl/strings/substitute.h" +#include "absl/types/span.h" #include "zetasql/base/ret_check.h" #include "zetasql/base/status_builder.h" #include "zetasql/base/status_macros.h" @@ -50,6 +53,24 @@ namespace zetasql { namespace { +// A visitor to check whether the ResolvedAST has ResolvedGroupingCall nodes. 
+class GroupingCallDetectorVisitor : public ResolvedASTVisitor { + public: + explicit GroupingCallDetectorVisitor(bool* has_grouping_call) + : has_grouping_call_(has_grouping_call) {} + + absl::Status VisitResolvedAggregateScan( + const ResolvedAggregateScan* node) override { + if (!node->grouping_call_list().empty()) { + *has_grouping_call_ = true; + } + return DefaultVisit(node); + } + + private: + bool* has_grouping_call_; +}; + // A visitor that changes ResolvedColumnRef nodes to be correlated. class CorrelateColumnRefVisitor : public ResolvedASTDeepCopyVisitor { private: @@ -96,7 +117,7 @@ class CorrelateColumnRefVisitor : public ResolvedASTDeepCopyVisitor { // If this is the first lambda or subquery encountered, we need to correlate // the column references in the parameter list and for the in expression. - // Column refererences of outer columns are already correlated. + // Column references of outer columns are already correlated. if (!in_subquery_or_lambda_) { std::unique_ptr expr = ConsumeTopOfStack(); @@ -300,7 +321,7 @@ class ColumnRemappingResolvedASTDeepCopyVisitor const ResolvedColumn& column) override { if (!column_map_.contains(column)) { column_map_[column] = column_factory_.MakeCol( - column.table_name(), column.name(), column.type()); + column.table_name(), column.name(), column.annotated_type()); } return column_map_[column]; } @@ -334,7 +355,10 @@ absl::StatusOr> FunctionCallBuilder::If( ZETASQL_RET_CHECK_NE(then_case.get(), nullptr); ZETASQL_RET_CHECK_NE(else_case.get(), nullptr); ZETASQL_RET_CHECK(condition->type()->IsBool()); - ZETASQL_RET_CHECK(then_case->type()->Equals(else_case->type())); + ZETASQL_RET_CHECK(then_case->type()->Equals(else_case->type())) + << "Inconsistent types of then_case and else_case: " + << then_case->type()->DebugString() << " vs " + << else_case->type()->DebugString(); const Function* if_fn = nullptr; ZETASQL_RETURN_IF_ERROR(GetBuiltinFunctionFromCatalog("if", &if_fn)); @@ -371,7 +395,7 @@ absl::StatusOr> 
ReplaceScanColumns( std::vector CreateReplacementColumns( ColumnFactory& column_factory, - const std::vector& column_list) { + absl::Span column_list) { std::vector replacement_columns; replacement_columns.reserve(column_list.size()); @@ -383,8 +407,6 @@ std::vector CreateReplacementColumns( return replacement_columns; } -// TODO: Propagate annotations correctly for this function, if -// needed, after creating resolved function node. absl::StatusOr> FunctionCallBuilder::IsNull(std::unique_ptr arg) { ZETASQL_RET_CHECK_NE(arg.get(), nullptr); @@ -401,6 +423,21 @@ FunctionCallBuilder::IsNull(std::unique_ptr arg) { ResolvedFunctionCall::DEFAULT_ERROR_MODE); } +absl::StatusOr> +FunctionCallBuilder::AnyIsNull( + std::vector> args) { + std::vector> is_nulls; + is_nulls.reserve(args.size()); + for (const auto& arg : args) { + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr arg_copy, + ResolvedASTDeepCopyVisitor::Copy(arg.get())); + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr is_null, + IsNull(std::move(arg_copy))); + is_nulls.push_back(std::move(is_null)); + } + return Or(std::move(is_nulls)); +} + // TODO: Propagate annotations correctly for this function, if // needed, after creating resolved function node. 
absl::StatusOr> @@ -429,6 +466,37 @@ FunctionCallBuilder::IfError(std::unique_ptr try_expr, .Build(); } +absl::StatusOr> +FunctionCallBuilder::Error(const std::string& error_text, + const Type* target_type) { + std::unique_ptr error_expr = + MakeResolvedLiteral(types::StringType(), Value::StringValue(error_text)); + return Error(std::move(error_expr), target_type); +} + +absl::StatusOr> +FunctionCallBuilder::Error(std::unique_ptr error_expr, + const Type* target_type) { + ZETASQL_RET_CHECK_NE(error_expr.get(), nullptr); + ZETASQL_RET_CHECK(error_expr->type()->IsString()); + + const Function* error_fn = nullptr; + ZETASQL_RETURN_IF_ERROR(GetBuiltinFunctionFromCatalog("error", &error_fn)); + FunctionArgumentType arg_type(types::StringType(), /*num_occurrences=*/1); + if (target_type == nullptr) { + target_type = types::Int64Type(); + } + FunctionArgumentType return_type(target_type, /*num_occurrences=*/1); + return ResolvedFunctionCallBuilder() + .set_type(return_type.type()) + .set_function(error_fn) + .set_signature({return_type, {arg_type}, FN_ERROR}) + .add_argument_list(std::move(error_expr)) + .set_error_mode(ResolvedFunctionCall::DEFAULT_ERROR_MODE) + .set_function_call_info(std::make_shared()) + .Build(); +} + absl::StatusOr> FunctionCallBuilder::MakeArray( const Type* element_type, @@ -595,6 +663,245 @@ FunctionCallBuilder::Equal(std::unique_ptr left_expr, .Build(); } +absl::StatusOr> +FunctionCallBuilder::NotEqual(std::unique_ptr left_expr, + std::unique_ptr right_expr) { + ZETASQL_RET_CHECK_NE(left_expr.get(), nullptr); + ZETASQL_RET_CHECK_NE(right_expr.get(), nullptr); + ZETASQL_RET_CHECK(left_expr->type()->Equals(right_expr->type())); + ZETASQL_RET_CHECK(left_expr->type()->SupportsEquality()); + + const Function* not_equal_fn = nullptr; + ZETASQL_RETURN_IF_ERROR(GetBuiltinFunctionFromCatalog("$not_equal", ¬_equal_fn)); + + // Only the first signature has collation enabled in function signature + // options. 
+ ZETASQL_RET_CHECK_GT(not_equal_fn->signatures().size(), 1); + const FunctionSignature* catalog_signature = not_equal_fn->GetSignature(0); + ZETASQL_RET_CHECK(catalog_signature != nullptr); + ZETASQL_RET_CHECK_EQ(catalog_signature->arguments().size(), 2); + + FunctionArgumentType result_type(types::BoolType(), + catalog_signature->result_type().options(), + /*num_occurrences=*/1); + FunctionArgumentType left_arg_type(left_expr->type(), + catalog_signature->argument(0).options(), + /*num_occurrences=*/1); + FunctionArgumentType right_arg_type(right_expr->type(), + catalog_signature->argument(1).options(), + /*num_occurrences=*/1); + + FunctionSignature not_equal_signature( + result_type, {left_arg_type, right_arg_type}, + catalog_signature->context_id(), catalog_signature->options()); + std::vector> args; + args.reserve(2); + args.push_back(std::move(left_expr)); + args.push_back(std::move(right_expr)); + + ZETASQL_ASSIGN_OR_RETURN( + std::unique_ptr resolved_function, + ResolvedFunctionCallBuilder() + .set_type(types::BoolType()) + .set_function(not_equal_fn) + .set_signature(not_equal_signature) + .set_argument_list(std::move(args)) + .set_error_mode(ResolvedFunctionCall::DEFAULT_ERROR_MODE) + .set_function_call_info(std::make_shared()) + .BuildMutable()); + // Attach type annotation to `collation_list` if there is any and it is + // consistent in all arguments with annotation. + auto annotation_map = CollationAnnotation().GetCollationFromFunctionArguments( + /*error_location=*/nullptr, *resolved_function, + FunctionEnums::AFFECTS_OPERATION); + if (annotation_map.ok() && annotation_map.value() != nullptr) { + ZETASQL_ASSIGN_OR_RETURN( + ResolvedCollation resolved_collation, + ResolvedCollation::MakeResolvedCollation(*annotation_map.value())); + resolved_function->add_collation_list(std::move(resolved_collation)); + } + // We don't need to propagate type annotation map for this function because + // the return type is not STRING. 
+  return resolved_function;
+}
+
+absl::StatusOr<std::unique_ptr<ResolvedFunctionCall>>
+FunctionCallBuilder::FunctionCallWithSameTypeArgumentsSupportingOrdering(
+    std::vector<std::unique_ptr<const ResolvedExpr>> expressions,
+    absl::string_view builtin_function_name) {
+  ZETASQL_RET_CHECK_GE(expressions.size(), 1);
+  ZETASQL_RET_CHECK_NE(expressions[0].get(), nullptr);
+
+  const Type* type = expressions[0]->type();
+  ZETASQL_RET_CHECK(type->SupportsOrdering(analyzer_options_.language(),
+                                   /*type_description=*/nullptr));
+  for (int i = 1; i < expressions.size(); ++i) {
+    ZETASQL_RET_CHECK(expressions[i]->type()->Equals(type))
+        << "Type of expression " << i << " is not the same as the first one: "
+        << expressions[i]->type()->DebugString() << " vs "
+        << type->DebugString();
+  }
+  const Function* fn = nullptr;
+  ZETASQL_RETURN_IF_ERROR(GetBuiltinFunctionFromCatalog(builtin_function_name, &fn));
+  ZETASQL_RET_CHECK(fn != nullptr);
+
+  ZETASQL_RET_CHECK_EQ(fn->signatures().size(), 1);
+  const FunctionSignature* catalog_signature = fn->GetSignature(0);
+  ZETASQL_RET_CHECK(catalog_signature != nullptr);
+
+  // Construct arguments type and result type to pass to FunctionSignature.
+  FunctionArgumentType result_type(
+      type, catalog_signature->result_type().options(), /*num_occurrences=*/1);
+  FunctionArgumentType arguments_type(type,
+                                      catalog_signature->argument(0).options(),
+                                      static_cast<int>(expressions.size()));
+  FunctionSignature concrete_signature(result_type, {arguments_type},
+                                       catalog_signature->context_id(),
+                                       catalog_signature->options());
+  ZETASQL_ASSIGN_OR_RETURN(
+      std::unique_ptr<ResolvedFunctionCall> resolved_function,
+      ResolvedFunctionCallBuilder()
+          .set_type(type)
+          .set_function(fn)
+          .set_signature(concrete_signature)
+          .set_argument_list(std::move(expressions))
+          .set_error_mode(ResolvedFunctionCall::DEFAULT_ERROR_MODE)
+          .set_function_call_info(std::make_shared<ResolvedFunctionCallInfo>())
+          .BuildMutable());
+
+  ZETASQL_RETURN_IF_ERROR(annotation_propagator_.CheckAndPropagateAnnotations(
+      /*error_node=*/nullptr, resolved_function.get()));
+  return resolved_function;
+}
+
+absl::StatusOr<std::unique_ptr<ResolvedFunctionCall>>
+FunctionCallBuilder::Least(
+    std::vector<std::unique_ptr<const ResolvedExpr>> expressions) {
+  return FunctionCallWithSameTypeArgumentsSupportingOrdering(
+      std::move(expressions), "least");
+}
+
+absl::StatusOr<std::unique_ptr<ResolvedFunctionCall>>
+FunctionCallBuilder::Greatest(
+    std::vector<std::unique_ptr<const ResolvedExpr>> expressions) {
+  return FunctionCallWithSameTypeArgumentsSupportingOrdering(
+      std::move(expressions), "greatest");
+}
+
+absl::StatusOr<std::unique_ptr<ResolvedFunctionCall>>
+FunctionCallBuilder::Coalesce(
+    std::vector<std::unique_ptr<const ResolvedExpr>> expressions) {
+  ZETASQL_RET_CHECK_GE(expressions.size(), 1);
+  ZETASQL_RET_CHECK_NE(expressions[0].get(), nullptr);
+
+  InputArgumentTypeSet arg_set;
+  for (int i = 0; i < expressions.size(); ++i) {
+    arg_set.Insert(InputArgumentType(expressions[i]->type()));
+  }
+  const Type* super_type = nullptr;
+  ZETASQL_RETURN_IF_ERROR(coercer_.GetCommonSuperType(arg_set, &super_type));
+
+  const Function* coalesce_fn = nullptr;
+  ZETASQL_RETURN_IF_ERROR(GetBuiltinFunctionFromCatalog("coalesce", &coalesce_fn));
+  ZETASQL_RET_CHECK(coalesce_fn != nullptr);
+
+  ZETASQL_RET_CHECK_EQ(coalesce_fn->signatures().size(), 1);
+  const FunctionSignature* catalog_signature =
+      coalesce_fn->GetSignature(0);
+  ZETASQL_RET_CHECK(catalog_signature != nullptr);
+
+  // Construct arguments type and result type to pass to FunctionSignature.
+  FunctionArgumentType result_type(super_type,
+                                   catalog_signature->result_type().options(),
+                                   /*num_occurrences=*/1);
+  FunctionArgumentType arguments_type(super_type,
+                                      catalog_signature->argument(0).options(),
+                                      static_cast<int>(expressions.size()));
+  FunctionSignature coalesce_signature(result_type, {arguments_type},
+                                       catalog_signature->context_id(),
+                                       catalog_signature->options());
+  ZETASQL_ASSIGN_OR_RETURN(
+      std::unique_ptr<ResolvedFunctionCall> resolved_function,
+      ResolvedFunctionCallBuilder()
+          .set_type(super_type)
+          .set_function(coalesce_fn)
+          .set_signature(coalesce_signature)
+          .set_argument_list(std::move(expressions))
+          .set_error_mode(ResolvedFunctionCall::DEFAULT_ERROR_MODE)
+          .set_function_call_info(std::make_shared<ResolvedFunctionCallInfo>())
+          .BuildMutable());
+
+  ZETASQL_RETURN_IF_ERROR(annotation_propagator_.CheckAndPropagateAnnotations(
+      /*error_node=*/nullptr, resolved_function.get()));
+  return resolved_function;
+}
+
+absl::StatusOr<std::unique_ptr<ResolvedFunctionCall>>
+FunctionCallBuilder::Less(std::unique_ptr<const ResolvedExpr> left_expr,
+                          std::unique_ptr<const ResolvedExpr> right_expr) {
+  ZETASQL_RET_CHECK_NE(left_expr.get(), nullptr);
+  ZETASQL_RET_CHECK_NE(right_expr.get(), nullptr);
+  ZETASQL_RET_CHECK(left_expr->type()->Equals(right_expr->type()))
+      << "Types of expressions are not the same: "
+      << left_expr->type()->DebugString() << " vs "
+      << right_expr->type()->DebugString();
+  std::string unused_type_description;
+  ZETASQL_RET_CHECK(left_expr->type()->SupportsOrdering(analyzer_options_.language(),
+                                                &unused_type_description));
+
+  const Function* less_fn = nullptr;
+  ZETASQL_RETURN_IF_ERROR(GetBuiltinFunctionFromCatalog("$less", &less_fn));
+
+  // Only the first signature has collation enabled in function signature
+  // options.
+  ZETASQL_RET_CHECK_GT(less_fn->signatures().size(), 1);
+  const FunctionSignature* catalog_signature = less_fn->GetSignature(0);
+  ZETASQL_RET_CHECK(catalog_signature != nullptr);
+  ZETASQL_RET_CHECK_EQ(catalog_signature->arguments().size(), 2);
+
+  FunctionArgumentType result_type(types::BoolType(),
+                                   catalog_signature->result_type().options(),
+                                   /*num_occurrences=*/1);
+  FunctionArgumentType left_arg_type(left_expr->type(),
+                                     catalog_signature->argument(0).options(),
+                                     /*num_occurrences=*/1);
+  FunctionArgumentType right_arg_type(right_expr->type(),
+                                      catalog_signature->argument(1).options(),
+                                      /*num_occurrences=*/1);
+
+  FunctionSignature less_signature(result_type, {left_arg_type, right_arg_type},
+                                   catalog_signature->context_id(),
+                                   catalog_signature->options());
+  std::vector<std::unique_ptr<const ResolvedExpr>> args;
+  args.reserve(2);
+  args.push_back(std::move(left_expr));
+  args.push_back(std::move(right_expr));
+
+  ZETASQL_ASSIGN_OR_RETURN(
+      std::unique_ptr<ResolvedFunctionCall> resolved_function,
+      ResolvedFunctionCallBuilder()
+          .set_type(types::BoolType())
+          .set_function(less_fn)
+          .set_signature(less_signature)
+          .set_argument_list(std::move(args))
+          .set_error_mode(ResolvedFunctionCall::DEFAULT_ERROR_MODE)
+          .set_function_call_info(std::make_shared<ResolvedFunctionCallInfo>())
+          .BuildMutable());
+  // Attach type annotation to `collation_list` if there is any and it is
+  // consistent in all arguments with annotation.
+  auto annotation_map = CollationAnnotation().GetCollationFromFunctionArguments(
+      /*error_location=*/nullptr, *resolved_function,
+      FunctionEnums::AFFECTS_OPERATION);
+  if (annotation_map.ok() && annotation_map.value() != nullptr) {
+    ZETASQL_ASSIGN_OR_RETURN(
+        ResolvedCollation resolved_collation,
+        ResolvedCollation::MakeResolvedCollation(*annotation_map.value()));
+    resolved_function->add_collation_list(std::move(resolved_collation));
+  }
+  // We don't need to propagate type annotation map for this function because
+  // the return type is not STRING.
+  return resolved_function;
+}
+
 namespace {
 absl::StatusOr<FunctionSignature> GetBinaryFunctionSignatureFromArgumentTypes(
     const Function* function, const Type* left_expr_type,
@@ -755,6 +1062,184 @@ FunctionCallBuilder::NaryLogic(
       .Build();
 }
 
+absl::StatusOr<std::unique_ptr<const ResolvedFunctionCall>>
+FunctionCallBuilder::ArrayLength(
+    std::unique_ptr<const ResolvedExpr> array_expr) {
+  ZETASQL_RET_CHECK_NE(array_expr.get(), nullptr);
+  ZETASQL_RET_CHECK(array_expr->type()->IsArray());
+  const Function* array_length_fn = nullptr;
+  ZETASQL_RETURN_IF_ERROR(
+      GetBuiltinFunctionFromCatalog("array_length", &array_length_fn));
+
+  ZETASQL_RET_CHECK_EQ(array_length_fn->signatures().size(), 1);
+  const FunctionSignature* catalog_signature = array_length_fn->GetSignature(0);
+  ZETASQL_RET_CHECK(catalog_signature != nullptr);
+  ZETASQL_RET_CHECK_EQ(catalog_signature->arguments().size(), 1);
+
+  FunctionArgumentType result_type(types::Int64Type(),
+                                   catalog_signature->result_type().options(),
+                                   /*num_occurrences=*/1);
+  FunctionArgumentType arg_type(array_expr->type(),
+                                catalog_signature->argument(0).options(),
+                                /*num_occurrences=*/1);
+
+  FunctionSignature concrete_signature(result_type, {arg_type},
+                                       catalog_signature->context_id(),
+                                       catalog_signature->options());
+  std::vector<std::unique_ptr<const ResolvedExpr>> args;
+  args.push_back(std::move(array_expr));
+
+  return ResolvedFunctionCallBuilder()
+      .set_type(types::Int64Type())
+      .set_function(array_length_fn)
+      .set_signature(concrete_signature)
+      .set_argument_list(std::move(args))
+      .set_error_mode(ResolvedFunctionCall::DEFAULT_ERROR_MODE)
+      .set_function_call_info(std::make_shared<ResolvedFunctionCallInfo>())
+      .Build();
+}
+
+absl::StatusOr<std::unique_ptr<const ResolvedFunctionCall>>
+FunctionCallBuilder::ArrayAtOffset(
+    std::unique_ptr<const ResolvedExpr> array_expr,
+    std::unique_ptr<const ResolvedExpr> offset_expr) {
+  ZETASQL_RET_CHECK(array_expr->type()->IsArray());
+  ZETASQL_RET_CHECK_EQ(offset_expr->type(), types::Int64Type());
+
+  const Function* array_at_offset_fn = nullptr;
+  ZETASQL_RETURN_IF_ERROR(
+      GetBuiltinFunctionFromCatalog("$array_at_offset", &array_at_offset_fn));
+
+  ZETASQL_RET_CHECK_EQ(array_at_offset_fn->signatures().size(), 1);
+  const FunctionSignature* catalog_signature =
+      array_at_offset_fn->GetSignature(0);
+  ZETASQL_RET_CHECK(catalog_signature != nullptr);
+  ZETASQL_RET_CHECK_EQ(catalog_signature->arguments().size(), 2);
+
+  FunctionArgumentType result_type(
+      array_expr->type()->AsArray()->element_type(),
+      catalog_signature->result_type().options(),
+      /*num_occurrences=*/1);
+  FunctionArgumentType array_arg(array_expr->type(),
+                                 catalog_signature->argument(0).options(),
+                                 /*num_occurrences=*/1);
+  FunctionArgumentType offset_arg(offset_expr->type(),
+                                  catalog_signature->argument(1).options(),
+                                  /*num_occurrences=*/1);
+
+  FunctionSignature concrete_signature(result_type, {array_arg, offset_arg},
+                                       catalog_signature->context_id(),
+                                       catalog_signature->options());
+  std::vector<std::unique_ptr<const ResolvedExpr>> args;
+  args.push_back(std::move(array_expr));
+  args.push_back(std::move(offset_expr));
+
+  ZETASQL_ASSIGN_OR_RETURN(
+      std::unique_ptr<const ResolvedFunctionCall> resolved_function,
+      ResolvedFunctionCallBuilder()
+          .set_type(result_type.type())
+          .set_function(array_at_offset_fn)
+          .set_signature(concrete_signature)
+          .set_argument_list(std::move(args))
+          .set_error_mode(ResolvedFunctionCall::DEFAULT_ERROR_MODE)
+          .set_function_call_info(std::make_shared<ResolvedFunctionCallInfo>())
+          .Build());
+
+  ZETASQL_RETURN_IF_ERROR(annotation_propagator_.CheckAndPropagateAnnotations(
+      /*error_node=*/nullptr,
+      const_cast<ResolvedFunctionCall*>(resolved_function.get())));
+  return resolved_function;
+}
+
+// Returns the FunctionSignatureId of MOD corresponding to `input_type`.
+static absl::StatusOr<FunctionSignatureId> GetModSignatureIdForInputType(
+    const Function* mod_fn, const Type* input_type) {
+  if (input_type == types::Int64Type()) {
+    return FN_MOD_INT64;
+  }
+  if (input_type == types::Uint64Type()) {
+    return FN_MOD_UINT64;
+  }
+  if (input_type == types::NumericType()) {
+    return FN_MOD_NUMERIC;
+  }
+  if (input_type == types::BigNumericType()) {
+    return FN_MOD_BIGNUMERIC;
+  }
+  return absl::InvalidArgumentError(absl::StrCat(
+      "Unsupported input type for mod: ", input_type->DebugString()));
+}
+
+// Returns the FunctionSignature of MOD corresponding to `input_type`.
+static absl::StatusOr<const FunctionSignature*> GetModSignature(
+    const Function* mod_fn, const Type* input_type) {
+  ZETASQL_ASSIGN_OR_RETURN(FunctionSignatureId mod_signature_id,
+                   GetModSignatureIdForInputType(mod_fn, input_type));
+  const FunctionSignature* catalog_signature = nullptr;
+  for (const FunctionSignature& signature : mod_fn->signatures()) {
+    if (signature.context_id() == mod_signature_id) {
+      catalog_signature = &signature;
+      break;
+    }
+  }
+  if (catalog_signature == nullptr) {
+    switch (mod_signature_id) {
+      case FN_MOD_NUMERIC:
+        return absl::InvalidArgumentError(
+            "The provided catalog does not have the FN_MOD_NUMERIC signature. "
+            "Did you forget to enable FEATURE_NUMERIC_TYPE?");
+      case FN_MOD_BIGNUMERIC:
+        return absl::InvalidArgumentError(
+            "The provided catalog does not have the FN_MOD_BIGNUMERIC "
+            "signature. Did you forget to enable FEATURE_BIGNUMERIC_TYPE?");
+      default:
+        ZETASQL_RET_CHECK_FAIL();
+    }
+  }
+  return catalog_signature;
+}
+
+absl::StatusOr<std::unique_ptr<const ResolvedFunctionCall>>
+FunctionCallBuilder::Mod(std::unique_ptr<const ResolvedExpr> dividend_expr,
+                         std::unique_ptr<const ResolvedExpr> divisor_expr) {
+  ZETASQL_RET_CHECK_EQ(dividend_expr->type(), divisor_expr->type());
+  const Type* input_type = dividend_expr->type();
+
+  const Function* mod_fn = nullptr;
+  ZETASQL_RETURN_IF_ERROR(GetBuiltinFunctionFromCatalog("mod", &mod_fn));
+
+  ZETASQL_ASSIGN_OR_RETURN(const FunctionSignature* catalog_signature,
+                   GetModSignature(mod_fn, input_type));
+  ZETASQL_RET_CHECK_EQ(catalog_signature->arguments().size(), 2);
+
+  FunctionArgumentType result_type(catalog_signature->result_type().type(),
+                                   catalog_signature->result_type().options(),
+                                   /*num_occurrences=*/1);
+  FunctionArgumentType dividend_arg(dividend_expr->type(),
+                                    catalog_signature->argument(0).options(),
+                                    /*num_occurrences=*/1);
+  FunctionArgumentType divisor_arg(divisor_expr->type(),
+                                   catalog_signature->argument(1).options(),
+                                   /*num_occurrences=*/1);
+
+  FunctionSignature concrete_signature(result_type, {dividend_arg, divisor_arg},
+                                       catalog_signature->context_id(),
+                                       catalog_signature->options());
+
+  std::vector<std::unique_ptr<const ResolvedExpr>> args;
+  args.push_back(std::move(dividend_expr));
+  args.push_back(std::move(divisor_expr));
+
+  return ResolvedFunctionCallBuilder()
+      .set_type(result_type.type())
+      .set_function(mod_fn)
+      .set_signature(concrete_signature)
+      .set_argument_list(std::move(args))
+      .set_error_mode(ResolvedFunctionCall::DEFAULT_ERROR_MODE)
+      .set_function_call_info(std::make_shared<ResolvedFunctionCallInfo>())
+      .Build();
+}
+
 absl::StatusOr<bool> CatalogSupportsBuiltinFunction(
     absl::string_view function_name, const AnalyzerOptions& analyzer_options,
     Catalog& catalog) {
@@ -791,6 +1276,14 @@ absl::Status CheckCatalogSupportsSafeMode(
   return absl::OkStatus();
 }
 
+// Checks whether the ResolvedAST has grouping function related nodes.
+absl::StatusOr<bool> HasGroupingCallNode(const ResolvedNode* node) {
+  bool has_grouping_call = false;
+  GroupingCallDetectorVisitor visitor(&has_grouping_call);
+  ZETASQL_RETURN_IF_ERROR(node->Accept(&visitor));
+  return has_grouping_call;
+}
+
 absl::Status FunctionCallBuilder::GetBuiltinFunctionFromCatalog(
     absl::string_view function_name, const Function** fn_out) {
   ZETASQL_RET_CHECK_NE(fn_out, nullptr);
diff --git a/zetasql/resolved_ast/rewrite_utils.h b/zetasql/resolved_ast/rewrite_utils.h
index 0c37f6739..e34d2d9b8 100644
--- a/zetasql/resolved_ast/rewrite_utils.h
+++ b/zetasql/resolved_ast/rewrite_utils.h
@@ -25,15 +25,18 @@
 #include "zetasql/analyzer/annotation_propagator.h"
 #include "zetasql/public/analyzer_options.h"
 #include "zetasql/public/builtin_function.pb.h"
+#include "zetasql/public/coercer.h"
 #include "zetasql/public/types/annotation.h"
 #include "zetasql/public/types/simple_value.h"
 #include "zetasql/public/types/type.h"
 #include "zetasql/public/types/type_factory.h"
 #include "zetasql/resolved_ast/resolved_ast.h"
 #include "zetasql/resolved_ast/resolved_ast_visitor.h"
+#include "zetasql/resolved_ast/resolved_node.h"
 #include "absl/memory/memory.h"
 #include "absl/status/statusor.h"
 #include "absl/strings/string_view.h"
+#include "absl/types/span.h"
 #include "zetasql/base/status_builder.h"
 #include "zetasql/base/status_macros.h"
@@ -205,7 +208,7 @@ absl::StatusOr<std::unique_ptr<ResolvedScan>> ReplaceScanColumns(
 // Useful when replacing columns for a ResolvedExecuteAsRole node.
 std::vector<ResolvedColumn> CreateReplacementColumns(
     ColumnFactory& column_factory,
-    const std::vector<ResolvedColumn>& column_list);
+    absl::Span<const ResolvedColumn> column_list);
 
 // Helper for rewriters to check whether a needed built-in function is part
 // of the catalog. This is useful to generate good error messages when a
@@ -224,6 +227,9 @@ absl::Status CheckCatalogSupportsSafeMode(
     absl::string_view function_name, const AnalyzerOptions& analyzer_options,
     Catalog& catalog);
 
+// Checks whether the ResolvedAST has ResolvedGroupingCall nodes.
+absl::StatusOr<bool> HasGroupingCallNode(const ResolvedNode* node);
+
 // Contains helper functions that reduce boilerplate in rewriting rules logic
 // related to constructing new ResolvedFunctionCall instances.
 // TODO: Move FunctionCallBuilder class from rewriter utils
@@ -237,7 +243,8 @@ class FunctionCallBuilder {
         catalog_(catalog),
         type_factory_(type_factory),
         annotation_propagator_(
-            AnnotationPropagator(analyzer_options, type_factory)) {}
+            AnnotationPropagator(analyzer_options, type_factory)),
+        coercer_(&type_factory, &analyzer_options.language(), &catalog) {}
 
   // Helper to check that engines support the required IFERROR and NULLIFERROR
   // functions that are used to implement SAFE mode in rewriters. If the
@@ -270,6 +277,12 @@ class FunctionCallBuilder {
   absl::StatusOr<std::unique_ptr<const ResolvedExpr>> IsNull(
       std::unique_ptr<const ResolvedExpr> arg);
 
+  // Construct a ResolvedFunctionCall for arg[0] IS NULL OR arg[1] IS NULL OR ..
+  //
+  // Like `IsNull`, a built-in function "$is_null" must be available.
+  absl::StatusOr<std::unique_ptr<const ResolvedExpr>> AnyIsNull(
+      std::vector<std::unique_ptr<const ResolvedExpr>> args);
+
   // Construct a ResolvedFunctionCall for IFERROR(try_expr, handle_expr)
   //
   // Requires: try_expr and handle_expr must return equal types.
@@ -280,6 +293,27 @@ class FunctionCallBuilder {
       std::unique_ptr<const ResolvedExpr> try_expr,
      std::unique_ptr<const ResolvedExpr> handle_expr);
 
+  // Construct a ResolvedFunctionCall for ERROR(error_text).
+  //
+  // The signature for the built-in function "error" must be available in
+  // <catalog> or an error status is returned.
+  // If `target_type` is supplied, set the return type of ERROR function to
+  // `target_type` if needed.
+  absl::StatusOr<std::unique_ptr<ResolvedFunctionCall>> Error(
+      const std::string& error_text, const Type* target_type = nullptr);
+
+  // Construct a ResolvedFunctionCall for ERROR(error_expr).
+  //
+  // Requires: error_expr has STRING type.
+  //
+  // The signature for the built-in function "error" must be available in
+  // <catalog> or an error status is returned.
+  // If `target_type` is supplied, set the return type of ERROR function to
+  // `target_type` if needed.
+  absl::StatusOr<std::unique_ptr<ResolvedFunctionCall>> Error(
+      std::unique_ptr<const ResolvedExpr> error_expr,
+      const Type* target_type = nullptr);
+
   // Constructs a ResolvedFunctionCall for the $make_array function to create an
   // array for a list of elements
   //
@@ -346,6 +380,58 @@ class FunctionCallBuilder {
       std::unique_ptr<const ResolvedExpr> left_expr,
       std::unique_ptr<const ResolvedExpr> right_expr);
 
+  // Construct a ResolvedFunctionCall for <left_expr> != <right_expr>.
+  //
+  // Requires: <left_expr> and <right_expr> must return equal types AND
+  // the type supports equality.
+  //
+  // The signature for the built-in function "$not_equal" must be available in
+  // <catalog> or an error status is returned.
+  absl::StatusOr<std::unique_ptr<ResolvedFunctionCall>> NotEqual(
+      std::unique_ptr<const ResolvedExpr> left_expr,
+      std::unique_ptr<const ResolvedExpr> right_expr);
+
+  // Construct a ResolvedFunctionCall for LEAST(REPEATED <expressions>).
+  //
+  // Requires: All elements in <expressions> must have the same type which
+  // supports ordering.
+  //
+  // The signature for the built-in function "least" must be available in
+  // <catalog> or an error status is returned.
+  absl::StatusOr<std::unique_ptr<ResolvedFunctionCall>> Least(
+      std::vector<std::unique_ptr<const ResolvedExpr>> expressions);
+
+  // Construct a ResolvedFunctionCall for GREATEST(REPEATED <expressions>).
+  //
+  // Requires: All elements in <expressions> must have the same type which
+  // supports ordering.
+  //
+  // The signature for the built-in function "greatest" must be available in
+  // <catalog> or an error status is returned.
+  absl::StatusOr<std::unique_ptr<ResolvedFunctionCall>> Greatest(
+      std::vector<std::unique_ptr<const ResolvedExpr>> expressions);
+
+  // Construct a ResolvedFunctionCall for COALESCE(REPEATED <expressions>).
+  //
+  // Requires: Elements in <expressions> must have types which are implicitly
+  // coercible to a common supertype.
+  //
+  // The signature for the built-in function "coalesce" must be available in
+  // <catalog> or an error status is returned.
+  absl::StatusOr<std::unique_ptr<ResolvedFunctionCall>> Coalesce(
+      std::vector<std::unique_ptr<const ResolvedExpr>> expressions);
+
+  // Construct a ResolvedFunctionCall for <left_expr> < <right_expr>.
+  //
+  // Requires: <left_expr> and <right_expr> must return equal types AND
+  // the type supports comparison (aka. ordering).
+  //
+  // The signature for the built-in function "$less" must be available in
+  // <catalog> or an error status is returned.
+  absl::StatusOr<std::unique_ptr<ResolvedFunctionCall>> Less(
+      std::unique_ptr<const ResolvedExpr> left_expr,
+      std::unique_ptr<const ResolvedExpr> right_expr);
+
   // Construct a ResolvedFunctionCall for <left_expr> >= <right_expr>.
   //
   // Requires: Both `left_expr` and `right_expr` must be order-able types. The
@@ -405,6 +491,38 @@ class FunctionCallBuilder {
   absl::StatusOr<std::unique_ptr<const ResolvedExpr>> Or(
       std::vector<std::unique_ptr<const ResolvedExpr>> expressions);
 
+  // Construct a ResolvedFunctionCall for ARRAY_LENGTH(array_expr).
+  //
+  // Requires: array_expr is of ARRAY type.
+  //
+  // The signature for the built-in function "array_length" must be available
+  // in <catalog> or an error status is returned.
+  absl::StatusOr<std::unique_ptr<const ResolvedFunctionCall>> ArrayLength(
+      std::unique_ptr<const ResolvedExpr> array_expr);
+
+  // Constructs a ResolvedFunctionCall for ARRAY[OFFSET(offset_expr)].
+  //
+  // Requires:
+  // - `array_expr` is ARRAY.
+  // - `offset_expr` is INT64.
+  //
+  // The signature for the built-in function "$array_at_offset" must be
+  // available in <catalog> or an error status is returned.
+  absl::StatusOr<std::unique_ptr<const ResolvedFunctionCall>> ArrayAtOffset(
+      std::unique_ptr<const ResolvedExpr> array_expr,
+      std::unique_ptr<const ResolvedExpr> offset_expr);
+
+  // Constructs a ResolvedFunctionCall for the MOD(dividend, divisor).
+  //
+  // Requires: `dividend_expr` and `divisor_expr` must be of the same type and
+  // are of one of the following types: [INT64, UINT64, NUMERIC, BIGNUMERIC].
+  //
+  // The signature for the built-in function "mod" must be available in
+  // <catalog> or an error status is returned.
+  absl::StatusOr<std::unique_ptr<const ResolvedFunctionCall>> Mod(
+      std::unique_ptr<const ResolvedExpr> dividend_expr,
+      std::unique_ptr<const ResolvedExpr> divisor_expr);
+
  private:
   static AnnotationPropagator BuildAnnotationPropagator(
       const AnalyzerOptions& analyzer_options, TypeFactory& type_factory) {
@@ -424,6 +542,14 @@ class FunctionCallBuilder {
       absl::string_view op_catalog_name, FunctionSignatureId op_function_id,
       std::vector<std::unique_ptr<const ResolvedExpr>> expressions);
 
+  // Construct a ResolvedFunctionCall of
+  //   builtin_function_name(REPEATED <expressions>)
+  // whose arguments have the same type and supports ordering.
+  absl::StatusOr<std::unique_ptr<ResolvedFunctionCall>>
+  FunctionCallWithSameTypeArgumentsSupportingOrdering(
+      std::vector<std::unique_ptr<const ResolvedExpr>> expressions,
+      absl::string_view builtin_function_name);
+
   // Helper that controls the error message when built-in functions are not
   // found in the catalog.
   absl::Status GetBuiltinFunctionFromCatalog(absl::string_view function_name,
@@ -434,6 +560,7 @@ class FunctionCallBuilder {
   TypeFactory& type_factory_;
   AnnotationPropagator annotation_propagator_;
+  Coercer coercer_;
 };
 
 // Contains helper functions for building components of the ResolvedAST when
diff --git a/zetasql/resolved_ast/rewrite_utils_test.cc b/zetasql/resolved_ast/rewrite_utils_test.cc
index 8e94943d1..45a4f4cfa 100644
--- a/zetasql/resolved_ast/rewrite_utils_test.cc
+++ b/zetasql/resolved_ast/rewrite_utils_test.cc
@@ -25,18 +25,23 @@
 #include "zetasql/public/analyzer.h"
 #include "zetasql/public/analyzer_options.h"
 #include "zetasql/public/analyzer_output.h"
+#include "zetasql/public/builtin_function_options.h"
 #include "zetasql/public/id_string.h"
+#include "zetasql/public/numeric_value.h"
 #include "zetasql/public/simple_catalog.h"
 #include "zetasql/public/types/annotation.h"
 #include "zetasql/public/types/simple_type.h"
 #include "zetasql/public/types/simple_value.h"
 #include "zetasql/public/types/type.h"
 #include "zetasql/public/types/type_factory.h"
+#include "zetasql/public/value.h"
 #include "zetasql/resolved_ast/resolved_ast.h"
+#include "zetasql/resolved_ast/resolved_ast_builder.h"
 #include "zetasql/resolved_ast/resolved_column.h"
 #include "zetasql/resolved_ast/test_utils.h"
 #include "gmock/gmock.h"
 #include "gtest/gtest.h"
+#include "absl/status/status.h"
 #include "absl/strings/ascii.h"
 #include "absl/strings/str_format.h"
@@ -780,6 +785,501 @@ TEST_F(FunctionCallBuilderTest, AndInvalidExpressionsTest) {
               StatusIs(absl::StatusCode::kInternal));
 }
 
+TEST_F(FunctionCallBuilderTest, ErrorFunctionWithCollationTest) {
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr<const ResolvedExpr> a1,
+                       ResolvedLiteralBuilder()
+                           .set_value(Value::String("a"))
+                           .set_type(types::StringType())
+                           .Build());
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr<const ResolvedExpr> a2,
+                       ResolvedLiteralBuilder()
+                           .set_value(Value::String("A"))
+                           .set_type(types::StringType())
+                           .Build());
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedExpr> expr1,
+      testing::MakeCollateCallForTest(
+          std::move(a1), "und:ci", analyzer_options_, catalog_, type_factory_));
+
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedExpr> expr2,
+      testing::MakeCollateCallForTest(
+          std::move(a2), "und:cs", analyzer_options_, catalog_, type_factory_));
+
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedFunctionCall> resolved_fn1,
+      fn_builder_.Error(std::move(expr1)));
+  EXPECT_EQ(resolved_fn1->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:error(STRING) -> INT64)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+  +-type_annotation_map={Collation:"und:ci"}
+  +-Literal(type=STRING, value="a")
+  +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
+)"));
+
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedFunctionCall> resolved_fn2,
+      fn_builder_.Error(std::move(expr2), types::StringType()));
+  EXPECT_EQ(resolved_fn2->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:error(STRING) -> STRING)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+  +-type_annotation_map={Collation:"und:cs"}
+  +-Literal(type=STRING, value="A")
+  +-Literal(type=STRING, value="und:cs", preserve_in_literal_remover=TRUE)
+)"));
+}
+
+TEST_F(FunctionCallBuilderTest, NotEqualWithSameCollationTest) {
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr<const ResolvedExpr> a1,
+                       ResolvedLiteralBuilder()
+                           .set_value(Value::String("a"))
+                           .set_type(types::StringType())
+                           .Build());
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr<const ResolvedExpr> a2,
+                       ResolvedLiteralBuilder()
+                           .set_value(Value::String("A"))
+                           .set_type(types::StringType())
+                           .Build());
+
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedExpr> expr1,
+      testing::MakeCollateCallForTest(
+          std::move(a1), "und:ci", analyzer_options_, catalog_, type_factory_));
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedExpr> expr2,
+      testing::MakeCollateCallForTest(
+          std::move(a2), "und:ci", analyzer_options_, catalog_, type_factory_));
+
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedFunctionCall> resolved_fn,
+      fn_builder_.NotEqual(std::move(expr1), std::move(expr2)));
+
+  EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:$not_equal(STRING, STRING) -> BOOL)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+| +-type_annotation_map={Collation:"und:ci"}
+| +-Literal(type=STRING, value="a")
+| +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+  +-type_annotation_map={Collation:"und:ci"}
+  +-Literal(type=STRING, value="A")
+  +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
++-collation_list=[und:ci]
+)"));
+}
+
+TEST_F(FunctionCallBuilderTest, NotEqualWithMixedCollationTest) {
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr<const ResolvedExpr> a1,
+                       ResolvedLiteralBuilder()
+                           .set_value(Value::String("a"))
+                           .set_type(types::StringType())
+                           .Build());
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr<const ResolvedExpr> a2,
+                       ResolvedLiteralBuilder()
+                           .set_value(Value::String("A"))
+                           .set_type(types::StringType())
+                           .Build());
+
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedExpr> expr1,
+      testing::MakeCollateCallForTest(
+          std::move(a1), "und:ci", analyzer_options_, catalog_, type_factory_));
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedExpr> expr2,
+      testing::MakeCollateCallForTest(
+          std::move(a2), "und:cs", analyzer_options_, catalog_, type_factory_));
+
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedFunctionCall> resolved_fn,
+      fn_builder_.NotEqual(std::move(expr1), std::move(expr2)));
+  // When different arguments have different collation attached, the collation
+  // propagator does not attach `collation_list`.
+  EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:$not_equal(STRING, STRING) -> BOOL)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+| +-type_annotation_map={Collation:"und:ci"}
+| +-Literal(type=STRING, value="a")
+| +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+  +-type_annotation_map={Collation:"und:cs"}
+  +-Literal(type=STRING, value="A")
+  +-Literal(type=STRING, value="und:cs", preserve_in_literal_remover=TRUE)
+)"));
+}
+
+TEST_F(FunctionCallBuilderTest, LeastWithSameCollationTest) {
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::vector<std::unique_ptr<const ResolvedExpr>> args,
+                       testing::BuildResolvedLiteralsWithCollationForTest(
+                           {{"foo", "und:ci"}, {"bar", "und:ci"}},
+                           analyzer_options_, catalog_, type_factory_));
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr<const ResolvedFunctionCall> resolved_fn,
+                       fn_builder_.Least(std::move(args)));
+
+  EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:least(repeated(2) STRING) -> STRING)
++-type_annotation_map={Collation:"und:ci"}
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+| +-type_annotation_map={Collation:"und:ci"}
+| +-Literal(type=STRING, value="foo", has_explicit_type=TRUE)
+| +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+  +-type_annotation_map={Collation:"und:ci"}
+  +-Literal(type=STRING, value="bar", has_explicit_type=TRUE)
+  +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
+)"));
+}
+
+TEST_F(FunctionCallBuilderTest, LeastWithMixedCollationTest) {
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::vector<std::unique_ptr<const ResolvedExpr>> args,
+                       testing::BuildResolvedLiteralsWithCollationForTest(
+                           {{"foo", "und:ci"}, {"FOO", "binary"}},
+                           analyzer_options_, catalog_, type_factory_));
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr<const ResolvedFunctionCall> resolved_fn,
+                       fn_builder_.Least(std::move(args)));
+
+  EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:least(repeated(2) STRING) -> STRING)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+| +-type_annotation_map={Collation:"und:ci"}
+| +-Literal(type=STRING, value="foo", has_explicit_type=TRUE)
+| +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+  +-type_annotation_map={Collation:"binary"}
+  +-Literal(type=STRING, value="FOO", has_explicit_type=TRUE)
+  +-Literal(type=STRING, value="binary", preserve_in_literal_remover=TRUE)
+)"));
+}
+
+TEST_F(FunctionCallBuilderTest, GreatestWithSameCollationTest) {
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::vector<std::unique_ptr<const ResolvedExpr>> args,
+      testing::BuildResolvedLiteralsWithCollationForTest(
+          {{"foo", "und:ci"}, {"FOO", "und:ci"}, {"BaR", "und:ci"}},
+          analyzer_options_, catalog_, type_factory_));
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr<const ResolvedFunctionCall> resolved_fn,
+                       fn_builder_.Greatest(std::move(args)));
+
+  EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:greatest(repeated(3) STRING) -> STRING)
++-type_annotation_map={Collation:"und:ci"}
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+| +-type_annotation_map={Collation:"und:ci"}
+| +-Literal(type=STRING, value="foo", has_explicit_type=TRUE)
+| +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+| +-type_annotation_map={Collation:"und:ci"}
+| +-Literal(type=STRING, value="FOO", has_explicit_type=TRUE)
+| +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+  +-type_annotation_map={Collation:"und:ci"}
+  +-Literal(type=STRING, value="BaR", has_explicit_type=TRUE)
+  +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
+)"));
+}
+
+TEST_F(FunctionCallBuilderTest,
GreatestWithMixedCollationTest) { + ZETASQL_ASSERT_OK_AND_ASSIGN(std::vector> args, + testing::BuildResolvedLiteralsWithCollationForTest( + {{"foo", "und:ci"}, {"FOO", "binary"}}, + analyzer_options_, catalog_, type_factory_)); + ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr resolved_fn, + fn_builder_.Greatest(std::move(args))); + + EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"( +FunctionCall(ZetaSQL:greatest(repeated(2) STRING) -> STRING) ++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) +| +-type_annotation_map={Collation:"und:ci"} +| +-Literal(type=STRING, value="foo", has_explicit_type=TRUE) +| +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE) ++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING) + +-type_annotation_map={Collation:"binary"} + +-Literal(type=STRING, value="FOO", has_explicit_type=TRUE) + +-Literal(type=STRING, value="binary", preserve_in_literal_remover=TRUE) +)")); +} + +TEST_F(FunctionCallBuilderTest, CoalesceWithCommonSuperTypeTest) { + ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr a1, + ResolvedLiteralBuilder() + .set_value(Value::Int64(1)) + .set_type(types::Int64Type()) + .Build()); + ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr a2, + ResolvedLiteralBuilder() + .set_value(Value::Int32(2)) + .set_type(types::Int32Type()) + .Build()); + ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr a3, + ResolvedLiteralBuilder() + .set_value(Value::Uint32(300)) + .set_type(types::Uint32Type()) + .Build()); + + std::vector> args; + args.push_back(std::move(a1)); + args.push_back(std::move(a2)); + args.push_back(std::move(a3)); + + ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr resolved_fn, + fn_builder_.Coalesce(std::move(args))); + + EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"( +FunctionCall(ZetaSQL:coalesce(repeated(3) INT64) -> INT64) ++-Literal(type=INT64, value=1) ++-Literal(type=INT32, value=2) ++-Literal(type=UINT32, value=300) +)")); +} + 
+TEST_F(FunctionCallBuilderTest, CoalesceWithSameCollationTest) {
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::vector<std::unique_ptr<const ResolvedExpr>> args,
+      testing::BuildResolvedLiteralsWithCollationForTest(
+          {{"foo", "und:ci"}, {"FOO", "und:ci"}, {"BaR", "und:ci"}},
+          analyzer_options_, catalog_, type_factory_));
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedFunctionCall> resolved_fn,
+      fn_builder_.Coalesce(std::move(args)));
+
+  EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:coalesce(repeated(3) STRING) -> STRING)
++-type_annotation_map={Collation:"und:ci"}
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+| +-type_annotation_map={Collation:"und:ci"}
+| +-Literal(type=STRING, value="foo", has_explicit_type=TRUE)
+| +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+| +-type_annotation_map={Collation:"und:ci"}
+| +-Literal(type=STRING, value="FOO", has_explicit_type=TRUE)
+| +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+  +-type_annotation_map={Collation:"und:ci"}
+  +-Literal(type=STRING, value="BaR", has_explicit_type=TRUE)
+  +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
+)"));
+}
+
+TEST_F(FunctionCallBuilderTest, CoalesceWithMixedCollationTest) {
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::vector<std::unique_ptr<const ResolvedExpr>> args,
+      testing::BuildResolvedLiteralsWithCollationForTest(
+          {{"foo", "und:ci"}, {"FOO", "binary"}},
+          analyzer_options_, catalog_, type_factory_));
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedFunctionCall> resolved_fn,
+      fn_builder_.Coalesce(std::move(args)));
+
+  EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:coalesce(repeated(2) STRING) -> STRING)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+| +-type_annotation_map={Collation:"und:ci"}
+| +-Literal(type=STRING, value="foo", has_explicit_type=TRUE)
+| +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+  +-type_annotation_map={Collation:"binary"}
+  +-Literal(type=STRING, value="FOO", has_explicit_type=TRUE)
+  +-Literal(type=STRING, value="binary", preserve_in_literal_remover=TRUE)
+)"));
+}
+
+TEST_F(FunctionCallBuilderTest, LessWithSameCollationTest) {
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr<const ResolvedExpr> a1,
+                       ResolvedLiteralBuilder()
+                           .set_value(Value::String("a"))
+                           .set_type(types::StringType())
+                           .Build());
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr<const ResolvedExpr> a2,
+                       ResolvedLiteralBuilder()
+                           .set_value(Value::String("A"))
+                           .set_type(types::StringType())
+                           .Build());
+
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedExpr> expr1,
+      testing::MakeCollateCallForTest(
+          std::move(a1), "und:ci", analyzer_options_, catalog_, type_factory_));
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedExpr> expr2,
+      testing::MakeCollateCallForTest(
+          std::move(a2), "und:ci", analyzer_options_, catalog_, type_factory_));
+
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedFunctionCall> resolved_fn,
+      fn_builder_.Less(std::move(expr1), std::move(expr2)));
+
+  EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:$less(STRING, STRING) -> BOOL)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+| +-type_annotation_map={Collation:"und:ci"}
+| +-Literal(type=STRING, value="a")
+| +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
++-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+  +-type_annotation_map={Collation:"und:ci"}
+  +-Literal(type=STRING, value="A")
+  +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
++-collation_list=[und:ci]
+)"));
+}
+
+TEST_F(FunctionCallBuilderTest, ArrayLengthWithSameCollationTest) {
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::vector<std::unique_ptr<const ResolvedExpr>> args,
+      testing::BuildResolvedLiteralsWithCollationForTest(
+          {{"foo", "und:ci"}, {"bar", "und:ci"}},
+          analyzer_options_, catalog_, type_factory_));
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr<const ResolvedExpr> array_expr,
+                       fn_builder_.MakeArray(args[0]->type(), args));
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedFunctionCall> resolved_fn,
+      fn_builder_.ArrayLength(std::move(array_expr)));
+
+  EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:array_length(ARRAY<STRING>) -> INT64)
++-FunctionCall(ZetaSQL:$make_array(repeated(2) STRING) -> ARRAY<STRING>)
+  +-type_annotation_map=[{Collation:"und:ci"}]
+  +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+  | +-type_annotation_map={Collation:"und:ci"}
+  | +-Literal(type=STRING, value="foo", has_explicit_type=TRUE)
+  | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
+  +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+    +-type_annotation_map={Collation:"und:ci"}
+    +-Literal(type=STRING, value="bar", has_explicit_type=TRUE)
+    +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
+)"));
+}
+
+TEST_F(FunctionCallBuilderTest, ArrayAtOffsetTest) {
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::vector<std::unique_ptr<const ResolvedExpr>> args,
+      testing::BuildResolvedLiteralsWithCollationForTest(
+          {{"foo", "und:ci"}, {"bar", "und:ci"}},
+          analyzer_options_, catalog_, type_factory_));
+  ZETASQL_ASSERT_OK_AND_ASSIGN(std::unique_ptr<const ResolvedExpr> array_expr,
+                       fn_builder_.MakeArray(args[0]->type(), args));
+  std::unique_ptr<const ResolvedExpr> offset_expr =
+      MakeResolvedLiteral(Value::Int64(0));
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedFunctionCall> resolved_fn,
+      fn_builder_.ArrayAtOffset(std::move(array_expr), std::move(offset_expr)));
+
+  EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:$array_at_offset(ARRAY<STRING>, INT64) -> STRING)
++-type_annotation_map={Collation:"und:ci"}
++-FunctionCall(ZetaSQL:$make_array(repeated(2) STRING) -> ARRAY<STRING>)
+| +-type_annotation_map=[{Collation:"und:ci"}]
+| +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+| | +-type_annotation_map={Collation:"und:ci"}
+| | +-Literal(type=STRING, value="foo", has_explicit_type=TRUE)
+| | +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
+| +-FunctionCall(ZetaSQL:collate(STRING, STRING) -> STRING)
+|   +-type_annotation_map={Collation:"und:ci"}
+|   +-Literal(type=STRING, value="bar", has_explicit_type=TRUE)
+|   +-Literal(type=STRING, value="und:ci", preserve_in_literal_remover=TRUE)
++-Literal(type=INT64, value=0)
+)"));
+}
+
+TEST_F(FunctionCallBuilderTest, ModInt64Test) {
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedFunctionCall> resolved_fn,
+      fn_builder_.Mod(/*dividend_expr=*/MakeResolvedLiteral(Value::Int64(1)),
+                      /*divisor_expr=*/MakeResolvedLiteral(Value::Int64(2))));
+
+  EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:mod(INT64, INT64) -> INT64)
++-Literal(type=INT64, value=1)
++-Literal(type=INT64, value=2)
+)"));
+}
+
+TEST_F(FunctionCallBuilderTest, ModUint64Test) {
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedFunctionCall> resolved_fn,
+      fn_builder_.Mod(/*dividend_expr=*/MakeResolvedLiteral(Value::Uint64(1)),
+                      /*divisor_expr=*/MakeResolvedLiteral(Value::Uint64(2))));
+
+  EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:mod(UINT64, UINT64) -> UINT64)
++-Literal(type=UINT64, value=1)
++-Literal(type=UINT64, value=2)
+)"));
+}
+
+// The catalog does not have the FN_MOD_NUMERIC signature.
+TEST_F(FunctionCallBuilderTest, ModNumericNoSignatureTest) {
+  EXPECT_THAT(
+      fn_builder_.Mod(/*dividend_expr=*/MakeResolvedLiteral(
+                          Value::Numeric(NumericValue(1))),
+                      /*divisor_expr=*/MakeResolvedLiteral(
+                          Value::Numeric(NumericValue(2)))),
+      StatusIs(
+          absl::StatusCode::kInvalidArgument,
+          ::testing::HasSubstr(
+              "The provided catalog does not have the FN_MOD_NUMERIC "
+              "signature. Did you forget to enable FEATURE_NUMERIC_TYPE?")));
+}
+
+// The new catalog has the FN_MOD_NUMERIC signature.
+TEST_F(FunctionCallBuilderTest, ModNumericTest) {
+  AnalyzerOptions analyzer_options;
+  analyzer_options.mutable_language()->EnableLanguageFeature(
+      FEATURE_NUMERIC_TYPE);
+  SimpleCatalog catalog("mod_numeric_builder_catalog");
+  catalog.AddBuiltinFunctions(
+      BuiltinFunctionOptions(analyzer_options.language()));
+  FunctionCallBuilder fn_builder(analyzer_options, catalog, type_factory_);
+
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedFunctionCall> resolved_fn,
+      fn_builder.Mod(/*dividend_expr=*/MakeResolvedLiteral(
+                         Value::Numeric(NumericValue(1))),
+                     /*divisor_expr=*/MakeResolvedLiteral(
+                         Value::Numeric(NumericValue(2)))));
+
+  EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:mod(NUMERIC, NUMERIC) -> NUMERIC)
++-Literal(type=NUMERIC, value=1)
++-Literal(type=NUMERIC, value=2)
+)"));
+}
+
+// The catalog does not have the FN_MOD_BIGNUMERIC signature.
+TEST_F(FunctionCallBuilderTest, ModBigNumericNoSignatureTest) {
+  EXPECT_THAT(
+      fn_builder_.Mod(/*dividend_expr=*/MakeResolvedLiteral(
+                          Value::BigNumeric(BigNumericValue(1))),
+                      /*divisor_expr=*/MakeResolvedLiteral(
+                          Value::BigNumeric(BigNumericValue(2)))),
+      StatusIs(
+          absl::StatusCode::kInvalidArgument,
+          ::testing::HasSubstr(
+              "The provided catalog does not have the FN_MOD_BIGNUMERIC "
+              "signature. Did you forget to enable FEATURE_BIGNUMERIC_TYPE?")));
+}
+
+// The new catalog has the FN_MOD_BIGNUMERIC signature.
+TEST_F(FunctionCallBuilderTest, ModBigNumericTest) {
+  AnalyzerOptions analyzer_options;
+  analyzer_options.mutable_language()->EnableLanguageFeature(
+      FEATURE_BIGNUMERIC_TYPE);
+  SimpleCatalog catalog("mod_big_numeric_builder_catalog");
+  catalog.AddBuiltinFunctions(
+      BuiltinFunctionOptions(analyzer_options.language()));
+  FunctionCallBuilder fn_builder(analyzer_options, catalog, type_factory_);
+
+  ZETASQL_ASSERT_OK_AND_ASSIGN(
+      std::unique_ptr<const ResolvedFunctionCall> resolved_fn,
+      fn_builder.Mod(/*dividend_expr=*/MakeResolvedLiteral(
+                         Value::BigNumeric(BigNumericValue(1))),
+                     /*divisor_expr=*/MakeResolvedLiteral(
+                         Value::BigNumeric(BigNumericValue(2)))));
+
+  EXPECT_EQ(resolved_fn->DebugString(), absl::StripLeadingAsciiWhitespace(R"(
+FunctionCall(ZetaSQL:mod(BIGNUMERIC, BIGNUMERIC) -> BIGNUMERIC)
++-Literal(type=BIGNUMERIC, value=1)
++-Literal(type=BIGNUMERIC, value=2)
+)"));
+}
+
+TEST_F(FunctionCallBuilderTest, ModInvalidInputTypeTest) {
+  EXPECT_THAT(
+      fn_builder_.Mod(/*dividend_expr=*/MakeResolvedLiteral(Value::String("a")),
+                      /*divisor_expr=*/MakeResolvedLiteral(Value::String("b"))),
+      StatusIs(absl::StatusCode::kInvalidArgument,
+               ::testing::HasSubstr("Unsupported input type for mod: STRING")));
+}
+
 class LikeAnyAllSubqueryScanBuilderTest
     : public ::testing::TestWithParam {
  public:
diff --git a/zetasql/resolved_ast/sql_builder.cc b/zetasql/resolved_ast/sql_builder.cc
index 4493cbefc..ebfcf614e 100644
--- a/zetasql/resolved_ast/sql_builder.cc
+++ b/zetasql/resolved_ast/sql_builder.cc
@@ -123,6 +123,10 @@ std::string SQLBuilder::GetColumnAlias(const ResolvedColumn& column) {
   return alias;
 }
 
+bool SQLBuilder::HasColumnAlias(const ResolvedColumn& column) {
+  return zetasql_base::ContainsKey(computed_column_alias_, column.column_id());
+}
+
 std::string SQLBuilder::UpdateColumnAlias(const ResolvedColumn& column) {
   auto it = computed_column_alias_.find(column.column_id());
   ABSL_CHECK(it != computed_column_alias_.end())
@@ -156,7 +160,7 @@ class ColumnNameCollector : public ResolvedASTVisitor {
       : col_ref_names_(col_ref_names) {}
 
  private:
-  void Register(const std::string& col_name) {
+  void Register(absl::string_view col_name) {
    std::string name = absl::AsciiStrToLower(col_name);
    // Value tables do not currently participate in FilterScan flattening, so
    // avoid complexities and don't worry about refs to them.
@@ -333,6 +337,22 @@ absl::flat_hash_map CreateGroupingColumnIdMap(
   }
   return grouping_column_id_map;
 }
+
+ResolvedColumnList GetRecursiveScanColumnsExcludingDepth(
+    const ResolvedRecursiveScan* scan) {
+  if (scan->recursion_depth_modifier() == nullptr) {
+    return scan->column_list();
+  }
+  ResolvedColumn depth_column =
+      scan->recursion_depth_modifier()->recursion_depth_column()->column();
+  ResolvedColumnList columns_excluding_depth;
+  for (const auto& col : scan->column_list()) {
+    if (col != depth_column) {
+      columns_excluding_depth.push_back(col);
+    }
+  }
+  return columns_excluding_depth;
+}
 }  // namespace
 
 absl::Status SQLBuilder::Process(const ResolvedNode& ast) {
@@ -400,9 +420,11 @@ SQLBuilder::ProcessNode(const ResolvedNode* node) {
   return PopQueryFragment();
 }
 
-static std::string AddExplicitCast(const std::string& sql, const Type* type,
-                                   ProductMode mode) {
-  return absl::StrCat("CAST(", sql, " AS ", type->TypeName(mode), ")");
+static std::string AddExplicitCast(absl::string_view sql, const Type* type,
+                                   ProductMode mode,
+                                   bool use_external_float32) {
+  return absl::StrCat("CAST(", sql, " AS ",
+                      type->TypeName(mode, use_external_float32), ")");
 }
 
 // Usually we add explicit casts to ensure that typed literals match their
@@ -416,7 +438,7 @@ static std::string AddExplicitCast(const std::string& sql, const Type* type,
 // remove the hack from this method.
 absl::StatusOr<std::string> SQLBuilder::GetSQL(
     const Value& value, const AnnotationMap* annotation_map, ProductMode mode,
-    bool is_constant_value) {
+    bool use_external_float32, bool is_constant_value) {
   const Type* type = value.type();
 
   if (annotation_map != nullptr) {
@@ -432,7 +454,7 @@ absl::StatusOr<std::string> SQLBuilder::GetSQL(
     // print them as a casted literal.
     return std::string("NULL");
   } else if (!CollationAnnotation::ExistsIn(annotation_map)) {
-    return value.GetSQL(mode);
+    return value.GetSQL(mode, use_external_float32);
   } else {
     // TODO: Put this logic into value.GetSQL(mode) function to
    // avoid logic duplication. Would need to change the function signature or
@@ -443,7 +465,7 @@ absl::StatusOr<std::string> SQLBuilder::GetSQL(
        std::string type_name_with_collation,
        type->TypeNameWithModifiers(
            TypeModifiers::MakeTypeModifiers(TypeParameters(), collation),
-            mode));
+            mode, use_external_float32));
    return absl::StrCat("CAST(NULL AS ", type_name_with_collation, ")");
   }
 }
@@ -461,7 +483,7 @@ absl::StatusOr<std::string> SQLBuilder::GetSQL(
   if (is_constant_value) {
     return value.DebugString();
   }
-  return value.GetSQL(mode);
+  return value.GetSQL(mode, use_external_float32);
 }
 
 if (type->IsEnum()) {
@@ -526,7 +548,7 @@ absl::StatusOr<std::string> SQLBuilder::GetSQL(
   if (is_constant_value) {
     return literal_str;
   }
-  return AddExplicitCast(literal_str, type, mode);
+  return AddExplicitCast(literal_str, type, mode, use_external_float32);
 }
 
 if (type->IsStruct()) {
   // Once STRUCT<...>(...) syntax is supported in hints, and CASTs work in
@@ -534,8 +556,9 @@ absl::StatusOr<std::string> SQLBuilder::GetSQL(
   const StructType* struct_type = type->AsStruct();
   std::vector<std::string> fields_sql;
   for (const auto& field_value : value.fields()) {
-    ZETASQL_ASSIGN_OR_RETURN(const std::string result,
-                     GetSQL(field_value, mode, is_constant_value));
+    ZETASQL_ASSIGN_OR_RETURN(
+        const std::string result,
+        GetSQL(field_value, mode, use_external_float32, is_constant_value));
     fields_sql.push_back(result);
   }
   // If any of the fields have names (are not anonymous) then we need to add
@@ -548,25 +571,26 @@ absl::StatusOr<std::string> SQLBuilder::GetSQL(
       has_explicit_field_name = true;
       field_name = absl::StrCat(ToIdentifierLiteral(field_type.name), " ");
     }
-    field_types.push_back(
-        absl::StrCat(field_name, field_type.type->TypeName(mode)));
+    field_types.push_back(absl::StrCat(
+        field_name, field_type.type->TypeName(mode, use_external_float32)));
   }
   ABSL_DCHECK_EQ(type->AsStruct()->num_fields(), fields_sql.size());
   if (has_explicit_field_name) {
     return absl::StrCat("STRUCT<", absl::StrJoin(field_types, ", "), ">(",
                         absl::StrJoin(fields_sql, ", "), ")");
   }
-  return absl::StrCat(struct_type->TypeName(mode), "(",
+  return absl::StrCat(struct_type->TypeName(mode, use_external_float32), "(",
                       absl::StrJoin(fields_sql, ", "), ")");
 }
 if (type->IsArray()) {
   std::vector<std::string> elements_sql;
   for (const auto& elem : value.elements()) {
-    ZETASQL_ASSIGN_OR_RETURN(const std::string result,
-                     GetSQL(elem, mode, is_constant_value));
+    ZETASQL_ASSIGN_OR_RETURN(
+        const std::string result,
+        GetSQL(elem, mode, use_external_float32, is_constant_value));
    elements_sql.push_back(result);
   }
-  return absl::StrCat(type->TypeName(mode), "[",
+  return absl::StrCat(type->TypeName(mode, use_external_float32), "[",
                      absl::StrJoin(elements_sql, ", "), "]");
 }
 if (type->IsRange()) {
@@ -583,7 +607,23 @@ absl::StatusOr<std::string> SQLBuilder::GetSQL(
 
 absl::StatusOr<std::string> SQLBuilder::GetSQL(const Value& value,
                                                ProductMode mode,
                                                bool is_constant_value) {
-  return GetSQL(value, /*annotation_map=*/nullptr, mode, is_constant_value);
+  return GetSQL(value, /*annotation_map=*/nullptr, mode,
+                /*use_external_float32=*/false, is_constant_value);
+}
+
+absl::StatusOr<std::string> SQLBuilder::GetSQL(
+    const Value& value, const AnnotationMap* annotation_map, ProductMode mode,
+    bool is_constant_value) {
+  return GetSQL(value, annotation_map, mode,
+                /*use_external_float32=*/false, is_constant_value);
+}
+
+absl::StatusOr<std::string> SQLBuilder::GetSQL(const Value& value,
+                                               ProductMode mode,
+                                               bool use_external_float32,
+                                               bool is_constant_value) {
+  return GetSQL(value, /*annotation_map=*/nullptr, mode, use_external_float32,
+                is_constant_value);
 }
 
 absl::Status SQLBuilder::VisitResolvedCloneDataStmt(
@@ -628,9 +668,11 @@ absl::Status SQLBuilder::VisitResolvedCatalogColumnRef(
   return absl::OkStatus();
 }
 absl::Status SQLBuilder::VisitResolvedLiteral(const ResolvedLiteral* node) {
-  ZETASQL_ASSIGN_OR_RETURN(const std::string result,
-                   GetSQL(node->value(), node->type_annotation_map(),
-                          options_.language_options.product_mode()));
+  ZETASQL_ASSIGN_OR_RETURN(
+      const std::string result,
+      GetSQL(node->value(), node->type_annotation_map(),
+             options_.language_options.product_mode(),
+             options_.use_external_float32, /*is_constant_value=*/false));
   PushQueryFragment(node, result);
   return absl::OkStatus();
 }
@@ -638,7 +680,7 @@ absl::Status SQLBuilder::VisitResolvedLiteral(const ResolvedLiteral* node) {
 absl::Status SQLBuilder::VisitResolvedConstant(const ResolvedConstant* node) {
   PushQueryFragment(
       node, absl::StrJoin(node->constant()->name_path(), ".",
-                          [](std::string* out, const std::string& part) {
+                          [](std::string* out, absl::string_view part) {
                            absl::StrAppend(out, ToIdentifierLiteral(part));
                          }));
   return absl::OkStatus();
@@ -685,7 +727,8 @@ absl::Status SQLBuilder::VisitResolvedFunctionCall(
      // For MakeArray function we explicitly prepend the array type to the
      // function sql, and is passed as a part of the inputs.
      inputs.push_back(
-          node->type()->TypeName(options_.language_options.product_mode()));
+          node->type()->TypeName(options_.language_options.product_mode(),
+                                 options_.use_external_float32));
    }
    for (const auto& argument : node->argument_list()) {
      ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<QueryFragment> result,
@@ -883,11 +926,11 @@ class SQLBuilder::AnalyticFunctionInfo {
   std::string GetSQL() const;
 
-  void set_partition_by(const std::string& partition_by) {
+  void set_partition_by(absl::string_view partition_by) {
    partition_by_ = partition_by;
   }
 
-  void set_order_by(const std::string& order_by) { order_by_ = order_by; }
+  void set_order_by(absl::string_view order_by) { order_by_ = order_by; }
 
   void set_window(const std::string& window) { window_ = window; }
 
@@ -1357,7 +1400,8 @@ absl::Status SQLBuilder::VisitResolvedCast(const ResolvedCast* node) {
   ZETASQL_ASSIGN_OR_RETURN(
      std::string type_name,
      node->type()->TypeNameWithModifiers(
-          node->type_modifiers(), options_.language_options.product_mode()));
+          node->type_modifiers(), options_.language_options.product_mode(),
+          options_.use_external_float32));
   std::string format_clause;
   if (node->format() != nullptr) {
    ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<QueryFragment> format,
@@ -1589,11 +1633,13 @@ absl::Status SQLBuilder::AppendColumnSchema(
            type->TypeNameWithModifiers(
                TypeModifiers::MakeTypeModifiers(annotations->type_parameters(),
                                                 zetasql::Collation()),
-                options_.language_options.product_mode()));
+                options_.language_options.product_mode(),
+                options_.use_external_float32));
    absl::StrAppend(text, typename_with_parameters);
   } else {
-    absl::StrAppend(
-        text, type->TypeName(options_.language_options.product_mode()));
+    absl::StrAppend(text,
+                    type->TypeName(options_.language_options.product_mode(),
+                                   options_.use_external_float32));
   }
 
   if (annotations != nullptr && annotations->collation_name() != nullptr) {
@@ -1681,7 +1727,7 @@ absl::Status SQLBuilder::AppendColumnSchema(
 }
 
 absl::StatusOr<std::string> SQLBuilder::GetHintListString(
-    const std::vector<std::unique_ptr<const ResolvedOption>>& hint_list) {
+    absl::Span<const std::unique_ptr<const ResolvedOption>> hint_list) {
   if (hint_list.empty()) {
    return std::string() /* no hints */;
   }
@@ -1758,7 +1804,7 @@ absl::Status SQLBuilder::VisitResolvedOption(const ResolvedOption* node) {
      ZETASQL_ASSIGN_OR_RETURN(
          const std::string result,
          GetSQL(literal->value(), options_.language_options.product_mode(),
-                true /* is_constant_value */));
+                options_.use_external_float32, true /* is_constant_value */));
      absl::StrAppend(&text, result);
    } else {
      ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<QueryFragment> result,
@@ -1797,7 +1843,8 @@ absl::Status SQLBuilder::VisitResolvedParameter(const ResolvedParameter* node) {
   if (node->position() - 1 < options_.undeclared_positional_parameters.size()) {
    param_str = AddExplicitCast(param_str, node->type(),
-                                options_.language_options.product_mode());
+                                options_.language_options.product_mode(),
+                                options_.use_external_float32);
    }
    PushQueryFragment(node, param_str);
  } else {
@@ -1805,7 +1852,8 @@ absl::Status SQLBuilder::VisitResolvedParameter(const ResolvedParameter* node) {
        absl::StrCat("@", ToIdentifierLiteral(node->name()));
    if (zetasql_base::ContainsKey(options_.undeclared_parameters, node->name())) {
      param_str = AddExplicitCast(param_str, node->type(),
-                                  options_.language_options.product_mode());
+                                  options_.language_options.product_mode(),
+                                  options_.use_external_float32);
    }
    PushQueryFragment(node, param_str);
  }
@@ -1853,7 +1901,9 @@ absl::Status SQLBuilder::VisitResolvedMakeStruct(
   const StructType* struct_type = node->type()->AsStruct();
   std::string text;
   absl::StrAppend(
-      &text, struct_type->TypeName(options_.language_options.product_mode()),
+      &text,
+      struct_type->TypeName(options_.language_options.product_mode(),
+                            options_.use_external_float32),
      "(");
   if (struct_type->num_fields() != node->field_list_size()) {
    return ::zetasql_base::InvalidArgumentErrorBuilder()
@@ -2710,7 +2760,74 @@ absl::StatusOr<std::string> SQLBuilder::GetJoinOperand(
   return absl::StrCat("(", scan_f->GetSQL(), ") AS ", alias);
 }
 
-std::string SQLBuilder::MakeNonconflictingAlias(const std::string& name) {
+absl::StatusOr<std::string> SQLBuilder::GetJoinOperand(
+    const ResolvedScan* scan,
+    absl::flat_hash_map<int, std::vector<int>>&
+        right_to_left_column_id_mapping) {
+  std::string result;
+  ZETASQL_ASSIGN_OR_RETURN(std::string inner_result, GetJoinOperand(scan));
+  std::string alias = GetScanAlias(scan);
+
+  ZETASQL_ASSIGN_OR_RETURN(std::string using_augment,
+                   GetUsingWrapper(alias, scan->column_list(),
+                                   right_to_left_column_id_mapping));
+  absl::StrAppend(&result, "(SELECT ", using_augment, " FROM ", inner_result,
+                  ") AS ", alias);
+  return result;
+}
+
+absl::StatusOr<std::string> SQLBuilder::GetUsingWrapper(
+    absl::string_view scan_alias, const ResolvedColumnList& column_list,
+    absl::flat_hash_map<int, std::vector<int>>&
+        right_to_left_column_id_mapping) {
+  std::string result;
+  absl::flat_hash_map<ResolvedColumn, const std::string*> updated_alias;
+  for (const auto& col : column_list) {
+    int right_scan_col_id = col.column_id();
+    const std::string& column_alias = GetColumnAlias(col);
+    // The `right_to_left_column_id_mapping` will contain the `column_ids` of
+    // each item in the right scan column list which needs to be given a new
+    // alias corresponding to one in the left side of the join scan.
+    std::vector<int>* right_using_ids = zetasql_base::FindOrNull(
+        right_to_left_column_id_mapping, right_scan_col_id);
+    if (right_to_left_column_id_mapping.contains(right_scan_col_id) &&
+        right_using_ids != nullptr && !right_using_ids->empty()) {
+      int left_scan_col_id = right_using_ids->back();
+      // Pop the left column id in case there are duplicate right columns in
+      // the USING(...) expression.
+      right_using_ids->pop_back();
+      // The alias should have already been computed at this point.
+      const std::string* const new_alias =
+          zetasql_base::FindOrNull(computed_column_alias_, left_scan_col_id);
+      ZETASQL_RET_CHECK(new_alias != nullptr);
+      if (!result.empty()) {
+        absl::StrAppend(&result, ", ");
+      }
+      absl::StrAppend(&result, scan_alias, ".", column_alias, " AS ",
+                      *new_alias);
+
+      // Update the computed column alias so that outer queries which need to
+      // reference this column use the updated alias. We'll update at the end
+      // in case there are duplicate right columns not in the USING clause.
+      if (right_using_ids->empty()) {
+        updated_alias[col] = new_alias;
+      }
+    } else {
+      // Write a copy for other columns which did not need a new alias.
+      if (!result.empty()) {
+        absl::StrAppend(&result, ", ");
+      }
+      absl::StrAppend(&result, scan_alias, ".", column_alias);
+    }
+  }
+  for (const auto& elem : updated_alias) {
+    computed_column_alias_[elem.first.column_id()] = *elem.second;
+    SetPathForColumn(elem.first, absl::StrCat(scan_alias, ".", *elem.second));
+  }
+  return result;
+}
+
+std::string SQLBuilder::MakeNonconflictingAlias(absl::string_view name) {
   const std::string alias_prefix_lower = absl::AsciiStrToLower(name);
   std::string alias;
   do {
@@ -2828,27 +2945,203 @@ absl::Status SQLBuilder::SetPathForColumnsInReturningExpr(
   return absl::OkStatus();
 }
 
+absl::StatusOr<ResolvedColumn> ExtractColumnFromArgumentForUsing(
+    const ResolvedExpr* arg) {
+  const ResolvedColumnRef* c_ref;
+  if (arg->Is<ResolvedColumnRef>()) {
+    c_ref = arg->GetAs<ResolvedColumnRef>();
+  } else if (arg->Is<ResolvedCast>()) {
+    ZETASQL_RET_CHECK(arg->GetAs<ResolvedCast>()->expr()->Is<ResolvedColumnRef>());
+    c_ref = arg->GetAs<ResolvedCast>()->expr()->GetAs<ResolvedColumnRef>();
+  } else {
+    return absl::InvalidArgumentError("Bad argument in USING(...) expression");
+  }
+  return c_ref->column();
+}
+
+bool ValidateArgumentForUsing(const ResolvedExpr* expr) {
+  // If there is only one column in the using clause the join expr is just
+  // an "$equal" function call between two columns. Otherwise, the join_expr
+  // is an "$and" function call joining multiple "$equal" function calls. This
+  // function validates each "$equal" function call and thus we check that the
+  // argument_list_size is 2.
+  auto func = expr->GetAs<ResolvedFunctionCall>();
+  if (func->argument_list_size() != 2) {
+    return false;
+  }
+  auto arg_one = func->argument_list(0);
+  auto arg_two = func->argument_list(1);
+
+  // May be incorrect in the case that the two types can be coerced into the
+  // same supertype which supports equality.
+  if (!(arg_one->type()->Equals(arg_two->type()) ||
+        (arg_one->type()->IsNumerical() && arg_two->type()->IsNumerical()))) {
+    return false;
+  }
+
+  if (!(arg_one->Is<ResolvedColumnRef>() ||
+        (arg_one->Is<ResolvedCast>() &&
+         arg_one->GetAs<ResolvedCast>()->expr()->Is<ResolvedColumnRef>()))) {
+    return false;
+  }
+  if (!(arg_two->Is<ResolvedColumnRef>() ||
+        (arg_two->Is<ResolvedCast>() &&
+         arg_two->GetAs<ResolvedCast>()->expr()->Is<ResolvedColumnRef>()))) {
+    return false;
+  }
+  return true;
+}
+
+bool ValidateUsingScan(const ResolvedJoinScan* node) {
+  if (node->join_expr() == nullptr) {
+    return false;
+  }
+  if (!node->join_expr()->Is<ResolvedFunctionCall>()) {
+    return false;
+  }
+  auto func = node->join_expr()->GetAs<ResolvedFunctionCall>();
+  for (const auto& arg : func->argument_list()) {
+    // There are only two cases.
+    // 1. If there is only one argument in the using clause then the outer
+    //    function is an "$equal" function call with two column refs as
+    //    arguments.
+    //
+    // 2. If there are multiple arguments in the using clause then the
+    //    outer function is an "$and" function call with each argument being
+    //    an "$equal" function call.
+    if (!(arg->Is<ResolvedFunctionCall>() || arg->Is<ResolvedColumnRef>() ||
+          arg->Is<ResolvedCast>())) {
+      return false;
+    }
+    if (arg->Is<ResolvedColumnRef>() || arg->Is<ResolvedCast>()) {
+      if (!ValidateArgumentForUsing(node->join_expr())) {
+        return false;
+      }
+      break;
+    } else if (arg->Is<ResolvedFunctionCall>()) {
+      if (!ValidateArgumentForUsing(arg.get())) {
+        return false;
+      }
+    }
+  }
+  return true;
+}
+
 absl::Status SQLBuilder::VisitResolvedJoinScan(const ResolvedJoinScan* node) {
+  absl::flat_hash_map<int, std::vector<int>> right_to_left_column_id_mapping;
+  // Only build a USING clause if the scan has one and the join scan is of the
+  // correct form.
+  bool build_using = node->has_using() && ValidateUsingScan(node);
+
+  if (build_using) {
+    auto func = node->join_expr()->GetAs<ResolvedFunctionCall>();
+    // Need this to pass CheckFieldsAccessed. Validator should already confirm
+    // the shape of these things.
+    ZETASQL_RET_CHECK(func->function()->Name() == "$equal" ||
+              func->function()->Name() == "$and");
+    std::set<int> left_column_ids;
+    for (const auto& arg : func->argument_list()) {
+      if (arg->Is<ResolvedFunctionCall>()) {
+        auto sub_func = arg->GetAs<ResolvedFunctionCall>();
+        ZETASQL_RET_CHECK(sub_func->function()->Name() == "$equal");
+
+        const ResolvedExpr* left_arg = sub_func->argument_list(0);
+        const ResolvedExpr* right_arg = sub_func->argument_list(1);
+        ZETASQL_ASSIGN_OR_RETURN(const ResolvedColumn left_col,
+                         ExtractColumnFromArgumentForUsing(left_arg));
+        ZETASQL_ASSIGN_OR_RETURN(const ResolvedColumn right_col,
+                         ExtractColumnFromArgumentForUsing(right_arg));
+        int left_id = left_col.column_id();
+        int right_id = right_col.column_id();
+
+        // Deparsing a statement with USING(...) that has duplicate left
+        // columns is unsupported.
+        if (left_column_ids.find(left_id) != left_column_ids.end()) {
+          build_using = false;
+          break;
+        }
+        left_column_ids.insert(left_id);
+        if (right_to_left_column_id_mapping.contains(right_id)) {
+          right_to_left_column_id_mapping[right_id].push_back(left_id);
+        } else {
+          right_to_left_column_id_mapping[right_id] = {left_id};
+        }
+      } else {
+        const ResolvedExpr* right_arg = func->argument_list(1);
+        ZETASQL_ASSIGN_OR_RETURN(const ResolvedColumn left_col,
+                         ExtractColumnFromArgumentForUsing(arg.get()));
+        ZETASQL_ASSIGN_OR_RETURN(const ResolvedColumn right_col,
+                         ExtractColumnFromArgumentForUsing(right_arg));
+        int left_id = left_col.column_id();
+        int right_id = right_col.column_id();
+        right_to_left_column_id_mapping[right_id] = {left_id};
+        break;
+      }
+    }
+    // Need to check collation list in case there is a non default value. The
+    // collation list does not affect how the using clause is deparsed.
+    ZETASQL_RET_CHECK(func->collation_list().empty() ||
+              func->collation_list_size() > 0);
+  }
   std::unique_ptr<QueryExpression> query_expression(new QueryExpression);
   ZETASQL_ASSIGN_OR_RETURN(const std::string left_join_operand,
                    GetJoinOperand(node->left_scan()));
-  ZETASQL_ASSIGN_OR_RETURN(const std::string right_join_operand,
-                   GetJoinOperand(node->right_scan()));
+  std::string right_join_operand;
+  if (build_using) {
+    ZETASQL_ASSIGN_OR_RETURN(
+        right_join_operand,
+        GetJoinOperand(node->right_scan(), right_to_left_column_id_mapping));
+  } else {
+    ZETASQL_ASSIGN_OR_RETURN(right_join_operand, GetJoinOperand(node->right_scan()));
+  }
   std::string hints = "";
   if (node->hint_list_size() > 0) {
    ZETASQL_RETURN_IF_ERROR(AppendHintsIfPresent(node->hint_list(), &hints));
   }
+
+  std::string join_expr_sql;
+  // A non-null `join_expr` in a non-USING join condition is necessary and
+  // sufficient for a JOIN with ON.
+  if (!build_using && node->join_expr() != nullptr) {
+    ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<QueryFragment> result,
+                     ProcessNode(node->join_expr()));
+    absl::StrAppend(&join_expr_sql, " ON ", result->GetSQL());
+  }
+
+  if (build_using) {
+    std::string using_names;
+    auto func = node->join_expr()->GetAs<ResolvedFunctionCall>();
+    for (const auto& arg : func->argument_list()) {
+      if (arg->Is<ResolvedFunctionCall>()) {
+        auto sub_func = arg->GetAs<ResolvedFunctionCall>();
+        ZETASQL_RET_CHECK(sub_func->function()->Name() == "$equal");
+
+        const ResolvedExpr* left_arg = sub_func->argument_list(0);
+        ZETASQL_ASSIGN_OR_RETURN(const ResolvedColumn column,
+                         ExtractColumnFromArgumentForUsing(left_arg));
+        const std::string& alias_name = GetColumnAlias(column);
+        if (!using_names.empty()) {
+          absl::StrAppend(&using_names, ",");
+        }
+        absl::StrAppend(&using_names, alias_name);
+      } else {
+        ZETASQL_ASSIGN_OR_RETURN(const ResolvedColumn column,
+                         ExtractColumnFromArgumentForUsing(arg.get()));
+        const std::string& alias_name = GetColumnAlias(column);
+        absl::StrAppend(&using_names, alias_name);
+        break;
+      }
+    }
+    absl::StrAppend(&join_expr_sql, " USING (", using_names, ")");
+  }
+
   std::string from;
   absl::StrAppend(
      &from, left_join_operand, " ",
      GetJoinTypeString(node->join_type(), node->join_expr() != nullptr), hints,
-      " ", right_join_operand);
-  if (node->join_expr() != nullptr) {
-    ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<QueryFragment> result,
-                     ProcessNode(node->join_expr()));
-    absl::StrAppend(&from, " ON ", result->GetSQL());
-  }
+      " ", right_join_operand, join_expr_sql);
+
   ZETASQL_RET_CHECK(query_expression->TrySetFromClause(from));
   PushSQLForQueryExpression(node, query_expression.release());
   return absl::OkStatus();
@@ -2929,10 +3222,34 @@ absl::Status SQLBuilder::VisitResolvedArrayScan(const ResolvedArrayScan* node) {
    absl::StrAppend(&from, join_operand, node->is_outer() ? " LEFT" : "",
                    " JOIN ");
   }
-  ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<QueryFragment> result,
-                   ProcessNode(node->array_expr_list(0)));
-  absl::StrAppend(&from, "UNNEST(", result->GetSQL(), ") ",
-                  GetColumnAlias(node->element_column_list(0)));
+
+  bool is_multiway_enabled = options_.language_options.LanguageFeatureEnabled(
+      FEATURE_V_1_4_MULTIWAY_UNNEST);
+  absl::StrAppend(&from, "UNNEST(");
+  for (int i = 0; i < node->array_expr_list_size(); ++i) {
+    ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<QueryFragment> array_expr,
+                     ProcessNode(node->array_expr_list(i)));
+    absl::StrAppend(&from, i > 0 ? ", " : "", array_expr->GetSQL());
+    if (is_multiway_enabled) {
+      absl::StrAppend(&from, " AS ",
+                      GetColumnAlias(node->element_column_list(i)));
+    }
+  }
+
+  if (node->array_zip_mode() != nullptr) {
+    ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<QueryFragment> mode,
+                     ProcessNode(node->array_zip_mode()));
+    absl::StrAppend(&from, ", mode =>", mode->GetSQL());
+  }
+  absl::StrAppend(&from, ") ");
+
+  // When multiway UNNEST language feature is disabled and we see a singleton
+  // UNNEST, we omit "AS" and write `UNNEST(<array_expr>) <alias>` directly for
+  // backward compatibility reasons.
+ if (!is_multiway_enabled) { + ZETASQL_RET_CHECK(node->element_column_list_size() == 1); + absl::StrAppend(&from, GetColumnAlias(node->element_column_list(0))); + } if (node->array_offset_column() != nullptr) { ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr result, @@ -2963,15 +3280,9 @@ absl::Status SQLBuilder::VisitResolvedLimitOffsetScan( ZETASQL_RETURN_IF_ERROR( WrapQueryExpression(node->input_scan(), query_expression.get())); } - ZETASQL_ASSIGN_OR_RETURN( - std::unique_ptr result, - // If limit is a ResolvedCast, it means that the original limit in the - // query is a literal or parameter with a type other than int64_t and - // hence, analyzer has added a cast on top of it. We should skip this - // cast here to avoid returning CAST(CAST ...)). - ProcessNode(node->limit()->node_kind() != RESOLVED_CAST - ? node->limit() - : node->limit()->GetAs()->expr())); + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr result, + ProcessNode(node->limit())); + ZETASQL_RET_CHECK(query_expression->TrySetLimitClause(result->GetSQL())); } if (node->offset() != nullptr) { @@ -2979,11 +3290,8 @@ absl::Status SQLBuilder::VisitResolvedLimitOffsetScan( ZETASQL_RETURN_IF_ERROR( WrapQueryExpression(node->input_scan(), query_expression.get())); } - ZETASQL_ASSIGN_OR_RETURN( - std::unique_ptr result, - ProcessNode(node->offset()->node_kind() != RESOLVED_CAST - ? node->offset() - : node->offset()->GetAs()->expr())); + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr result, + ProcessNode(node->offset())); ZETASQL_RET_CHECK(query_expression->TrySetOffsetClause(result->GetSQL())); } ZETASQL_RETURN_IF_ERROR( @@ -3045,7 +3353,7 @@ static bool HasDuplicateAliases( return false; } -static bool HasDuplicateAliases(const std::vector& aliases) { +static bool HasDuplicateAliases(absl::Span aliases) { absl::flat_hash_set seen_aliases; @@ -3164,8 +3472,9 @@ using ColumnPathAndAliasList = std::vector; // Helper method to provide a copy of the `inputs` in the form of // ColumnPathAndAliasList. 
ColumnPathAndAliasList ToColumnPathAndAliasList( - const std::vector>& inputs) { + absl::Span> + inputs) { ColumnPathAndAliasList column_path_and_alias_list; column_path_and_alias_list.reserve(inputs.size()); for (const auto& [column_path, column_alias] : inputs) { @@ -3401,7 +3710,7 @@ static absl::StatusOr GetScanColumnsFromOutputColumnList( absl::Status SQLBuilder::RenameSetOperationItemsFullCorresponding( const ResolvedSetOperationScan* node, - const std::vector>& final_column_list, + absl::Span> final_column_list, std::vector>& set_op_scan_list) { // `final_column_list` positionally matches and is equivalent to the // `output_column_list` of each set operation item. Their structures can be @@ -3558,7 +3867,7 @@ static bool AliasAssignmentShouldRespectTheOrderInOutputColumnList( absl::Status SQLBuilder::RenameSetOperationItemsNonFullCorresponding( const ResolvedSetOperationScan* node, - const std::vector>& final_column_list, + absl::Span> final_column_list, std::vector>& set_op_scan_list) { ZETASQL_RET_CHECK_NE(node, nullptr); // Unparse and rename each set operation item. 
@@ -3835,7 +4144,9 @@ absl::Status SQLBuilder::ProcessAggregateScanBase( collation.CollationName(); } - for (const auto& computed_col : node->aggregate_list()) { + for (const auto& computed_column : node->aggregate_list()) { + const auto* computed_col = + computed_column->GetAs<ResolvedComputedColumn>(); ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<QueryFragment> result, ProcessNode(computed_col->expr())); zetasql_base::InsertOrDie(&pending_columns_, computed_col->column().column_id(), @@ -3844,8 +4155,11 @@ absl::Status SQLBuilder::ProcessAggregateScanBase( std::string group_by_hints; ZETASQL_RETURN_IF_ERROR(AppendHintsIfPresent(node->hint_list(), &group_by_hints)); - if (zetasql_base::ContainsKeyValuePair(options_.target_syntax, node, - SQLBuildTargetSyntax::kGroupByAll)) { + + ZETASQL_ASSIGN_OR_RETURN(bool has_grouping_func, HasGroupingCallNode(node)); + bool has_group_by_all = zetasql_base::ContainsKeyValuePair( + options_.target_syntax, node, SQLBuildTargetSyntax::kGroupByAll); + if (has_group_by_all && !has_grouping_func) { ZETASQL_RETURN_IF_ERROR( query_expression->SetGroupByAllClause(group_by_list, group_by_hints)); } else { @@ -4007,11 +4321,18 @@ absl::Status SQLBuilder::VisitResolvedWithScan(const ResolvedWithScan* node) { bool actually_recursive = false; std::optional<const ResolvedRecursiveScan*> recursive_scan = MaybeGetRecursiveScan(with_entry.get()); + std::string modifier_sql; if (recursive_scan.has_value()) { has_recursive_entries = true; actually_recursive = true; - recursive_query_info_.push( - {ToIdentifierLiteral(name), recursive_scan.value()}); + const ResolvedRecursiveScan* scan = recursive_scan.value(); + recursive_query_info_.push({ToIdentifierLiteral(name), scan}); + + if (scan->recursion_depth_modifier() != nullptr) { + ZETASQL_ASSIGN_OR_RETURN(auto modifier, + ProcessNode(scan->recursion_depth_modifier())); + modifier_sql = modifier->GetSQL(); + } } ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr<QueryFragment> result, ProcessNode(scan)); @@ -4020,9 +4341,11 @@ absl::Status SQLBuilder::VisitResolvedWithScan(const
ResolvedWithScan* node) { ZETASQL_RETURN_IF_ERROR( AddSelectListIfNeeded(scan->column_list(), query_expression.get())); - with_list.push_back(std::make_pair( - ToIdentifierLiteral(name), - absl::StrCat("(", query_expression->GetSQLQuery(), ")"))); + with_list.push_back( + std::make_pair(ToIdentifierLiteral(name), + absl::StrCat("(", query_expression->GetSQLQuery(), ")", + std::move(modifier_sql)))); + SetPathForColumnList(scan->column_list(), ToIdentifierLiteral(name)); if (actually_recursive) { @@ -4081,9 +4404,10 @@ absl::Status SQLBuilder::VisitResolvedSampleScan( if (node->size()->node_kind() == RESOLVED_LITERAL) { const Value value = node->size()->GetAs()->value(); ZETASQL_RET_CHECK(!value.is_null()); - ZETASQL_ASSIGN_OR_RETURN(const std::string value_sql, - GetSQL(value, options_.language_options.product_mode(), - true /* is_constant_value */)); + ZETASQL_ASSIGN_OR_RETURN( + const std::string value_sql, + GetSQL(value, options_.language_options.product_mode(), + options_.use_external_float32, true /* is_constant_value */)); absl::StrAppend(&sample, value_sql); } else { ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr size, @@ -4181,7 +4505,7 @@ absl::Status SQLBuilder::VisitResolvedSampleScan( } absl::Status SQLBuilder::MatchOutputColumns( - const std::vector>& + absl::Span> output_column_list, const ResolvedScan* query, QueryExpression* query_expression) { ResolvedColumnList column_list; @@ -4370,25 +4694,22 @@ absl::Status SQLBuilder::MaybeSetupRecursiveView( node->name_path(), ".", [](std::string* out, const std::string& name) { absl::StrAppend(out, ToIdentifierLiteral(name)); }); - recursive_query_info_.push( - {query_name, node->query()->GetAs()}); + const auto* recursive_scan = node->query()->GetAs(); + ZETASQL_RET_CHECK(recursive_scan->recursion_depth_modifier() == nullptr) + << "Recursive view should NOT have recursion depth modifier."; + recursive_query_info_.push({query_name, recursive_scan}); // Force the actual column names to be used against the 
recursive table; // we cannot use generated column names with an outer SELECT wrapper, as // that wrapper would violate the form that recursive queries must follow, // preventing the unparsed string from resolving. - ZETASQL_RET_CHECK_EQ(node->query()->column_list_size(), - node->query() - ->GetAs() - ->non_recursive_term() - ->output_column_list_size()); - for (int i = 0; i < node->query()->column_list_size(); ++i) { - const ResolvedColumn recursive_query_column = node->query()->column_list(i); + ZETASQL_RET_CHECK_EQ(recursive_scan->column_list_size(), + recursive_scan->non_recursive_term()->output_column_list_size()); + for (int i = 0; i < recursive_scan->column_list_size(); ++i) { + const ResolvedColumn recursive_query_column = + recursive_scan->column_list(i); const ResolvedColumn nonrecursive_term_column = - node->query() - ->GetAs() - ->non_recursive_term() - ->output_column_list(i); + recursive_scan->non_recursive_term()->output_column_list(i); ZETASQL_RET_CHECK_EQ(recursive_query_column.name(), nonrecursive_term_column.name()); zetasql_base::InsertOrDie(&computed_column_alias_, @@ -4502,7 +4823,7 @@ absl::Status SQLBuilder::GetCreateStatementPrefix( } absl::Status SQLBuilder::GetPartitionByListString( - const std::vector>& partition_by_list, + absl::Span> partition_by_list, std::string* sql) { ABSL_DCHECK(!partition_by_list.empty()); @@ -4519,7 +4840,7 @@ absl::Status SQLBuilder::GetPartitionByListString( } absl::Status SQLBuilder::GetTableAndColumnInfoList( - const std::vector>& + absl::Span> table_and_column_info_list, std::string* sql) { std::vector expressions; @@ -4580,7 +4901,7 @@ static std::string GetColumnListSql( } static std::string GetColumnListSql( - const std::vector& column_index_list, + absl::Span column_index_list, const std::function& get_name) { std::vector column_names; column_names.reserve(column_index_list.size()); @@ -4791,6 +5112,24 @@ absl::Status SQLBuilder::VisitResolvedCreateSchemaStmt( return absl::OkStatus(); } +absl::Status 
SQLBuilder::VisitResolvedCreateExternalSchemaStmt( + const ResolvedCreateExternalSchemaStmt* node) { + std::string sql; + ZETASQL_RETURN_IF_ERROR(GetCreateStatementPrefix(node, "EXTERNAL SCHEMA", &sql)); + + if (node->connection() != nullptr) { + const std::string connection_alias = + ToIdentifierLiteral(node->connection()->connection()->Name()); + absl::StrAppend(&sql, "WITH CONNECTION ", connection_alias, " "); + } + + ZETASQL_ASSIGN_OR_RETURN(const std::string options_string, + GetHintListString(node->option_list())); + absl::StrAppend(&sql, " OPTIONS(", options_string, ")"); + PushQueryFragment(node, sql); + return absl::OkStatus(); +} + absl::Status SQLBuilder::AppendCloneDataSource(const ResolvedScan* source, std::string* sql) { ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr result, ProcessNode(source)); @@ -5313,7 +5652,7 @@ absl::Status SQLBuilder::VisitResolvedCreateExternalTableStmt( } absl::StatusOr SQLBuilder::GetFunctionArgListString( - const std::vector& arg_name_list, + absl::Span arg_name_list, const FunctionSignature& signature) { if (signature.arguments().empty()) { return std::string(); // no args @@ -5507,13 +5846,13 @@ absl::Status SQLBuilder::VisitResolvedCreateProcedureStmt( absl::Status SQLBuilder::VisitResolvedArgumentDef( const ResolvedArgumentDef* node) { PushQueryFragment( - node, - absl::StrCat( - ToIdentifierLiteral(node->name()), " ", - node->type()->TypeName(options_.language_options.product_mode()), - node->argument_kind() == ResolvedArgumentDef::NOT_AGGREGATE - ? " NOT AGGREGATE" - : "")); + node, absl::StrCat( + ToIdentifierLiteral(node->name()), " ", + node->type()->TypeName(options_.language_options.product_mode(), + options_.use_external_float32), + node->argument_kind() == ResolvedArgumentDef::NOT_AGGREGATE + ? 
" NOT AGGREGATE" + : "")); return absl::OkStatus(); } @@ -5669,8 +6008,10 @@ absl::Status SQLBuilder::VisitResolvedShowStmt(const ResolvedShowStmt* node) { if (node->like_expr() != nullptr) { const Value value = node->like_expr()->value(); ZETASQL_RET_CHECK(!value.is_null()); - ZETASQL_ASSIGN_OR_RETURN(const std::string result, - GetSQL(value, options_.language_options.product_mode())); + ZETASQL_ASSIGN_OR_RETURN( + const std::string result, + GetSQL(value, options_.language_options.product_mode(), + options_.use_external_float32, /*is_constant_value=*/false)); absl::StrAppend(&sql, " LIKE ", result); } PushQueryFragment(node, sql); @@ -5924,8 +6265,8 @@ absl::Status SQLBuilder::VisitResolvedReturningClause( absl::StrAppend(&sql, " ", result->GetSQL(), " AS ", ToIdentifierLiteral(col->name())); } else { - absl::StrAppend(&sql, " ", returning_table_alias_, ".", - col->column().name(), " AS ", + absl::StrAppend(&sql, " `", returning_table_alias_, "`.`", + col->column().name(), "` AS ", ToIdentifierLiteral(col->name())); } @@ -5950,6 +6291,11 @@ absl::Status SQLBuilder::VisitResolvedUndropStmt( ProcessNode(node->for_system_time_expr())); absl::StrAppend(&sql, " FOR SYSTEM_TIME AS OF ", result->GetSQL()); } + if (node->option_list_size() > 0) { + ZETASQL_ASSIGN_OR_RETURN(const std::string options_string, + GetHintListString(node->option_list())); + absl::StrAppend(&sql, " OPTIONS(", options_string, ")"); + } PushQueryFragment(node, sql); return absl::OkStatus(); } @@ -5967,7 +6313,7 @@ static std::string GetDropModeSQL(ResolvedDropStmtEnums::DropMode mode) { absl::Status SQLBuilder::VisitResolvedDropStmt(const ResolvedDropStmt* node) { std::string sql; - absl::StrAppend(&sql, "DROP ", ToIdentifierLiteral(node->object_type()), + absl::StrAppend(&sql, "DROP ", node->object_type(), node->is_if_exists() ? 
" IF EXISTS " : " ", IdentifierPathToString(node->name_path())); absl::StrAppend(&sql, GetDropModeSQL(node->drop_mode())); @@ -6541,6 +6887,11 @@ absl::Status SQLBuilder::VisitResolvedAlterSchemaStmt( return GetResolvedAlterObjectStmtSQL(node, "SCHEMA"); } +absl::Status SQLBuilder::VisitResolvedAlterExternalSchemaStmt( + const ResolvedAlterExternalSchemaStmt* node) { + return GetResolvedAlterObjectStmtSQL(node, "EXTERNAL SCHEMA"); +} + absl::Status SQLBuilder::VisitResolvedAlterTableSetOptionsStmt( const ResolvedAlterTableSetOptionsStmt* node) { std::string sql = "ALTER TABLE "; @@ -6591,7 +6942,7 @@ absl::Status SQLBuilder::VisitResolvedAlterModelStmt( } absl::StatusOr SQLBuilder::GetAlterActionListSQL( - const std::vector>& + absl::Span> alter_action_list) { std::vector alter_action_sql; for (const auto& alter_action : alter_action_list) { @@ -6899,8 +7250,8 @@ class EscapeFormatter { } // namespace absl::StatusOr SQLBuilder::GetGranteeListSQL( - const std::string& prefix, const std::vector& grantee_list, - const std::vector>& grantee_expr_list) { + absl::string_view prefix, const std::vector& grantee_list, + absl::Span> grantee_expr_list) { std::string sql; // We ABSL_CHECK the expected invariant that only one of grantee_list or // grantee_expr_list is empty. 
@@ -7283,7 +7634,7 @@ absl::Status SQLBuilder::DefaultVisit(const ResolvedNode* node) { } absl::StatusOr SQLBuilder::GetUpdateItemListSQL( - const std::vector>& + absl::Span> update_item_list) { std::vector update_item_list_sql; update_item_list_sql.reserve(update_item_list.size()); @@ -7299,7 +7650,7 @@ absl::StatusOr SQLBuilder::GetUpdateItemListSQL( } std::string SQLBuilder::GetInsertColumnListSQL( - const std::vector& insert_column_list) const { + absl::Span insert_column_list) const { std::vector columns_sql; columns_sql.reserve(insert_column_list.size()); for (const auto& col : insert_column_list) { @@ -7354,10 +7705,12 @@ absl::Status SQLBuilder::VisitResolvedRecursiveScan( std::unique_ptr query_expression(new QueryExpression); std::vector> set_op_scan_list; + const ResolvedColumnList columns_excluding_depth = + GetRecursiveScanColumnsExcludingDepth(node); for (const auto& input_item : std::vector{ node->non_recursive_term(), node->recursive_term()}) { ZETASQL_RET_CHECK_EQ(input_item->output_column_list_size(), - node->column_list_size()); + columns_excluding_depth.size()); const ResolvedScan* scan = input_item->scan(); ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr result, ProcessNode(scan)); set_op_scan_list.push_back(std::move(result->query_expression)); @@ -7372,8 +7725,8 @@ absl::Status SQLBuilder::VisitResolvedRecursiveScan( // If node->column_list() was empty, first_select_list will have a NULL // added. 
ZETASQL_RET_CHECK_EQ(first_select_list.size(), - std::max(node->column_list_size(), 1)); - for (int i = 0; i < node->column_list_size(); i++) { + std::max(columns_excluding_depth.size(), 1)); + for (int i = 0; i < columns_excluding_depth.size(); i++) { if (zetasql_base::ContainsKey(computed_column_alias_, node->column_list(i).column_id())) { ZETASQL_RET_CHECK_EQ( @@ -7413,6 +7766,28 @@ absl::Status SQLBuilder::VisitResolvedRecursiveScan( return absl::OkStatus(); } +absl::Status SQLBuilder::VisitResolvedRecursionDepthModifier( + const ResolvedRecursionDepthModifier* node) { + std::string lower_bound = "UNBOUNDED", upper_bound = "UNBOUNDED"; + if (node->lower_bound() != nullptr) { + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr lower, + ProcessNode(node->lower_bound())); + lower_bound = lower->GetSQL(); + } + if (node->upper_bound() != nullptr) { + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr upper, + ProcessNode(node->upper_bound())); + upper_bound = upper->GetSQL(); + } + std::string sql = "WITH DEPTH"; + ZETASQL_ASSIGN_OR_RETURN(std::unique_ptr column_alias, + ProcessNode(node->recursion_depth_column())); + absl::StrAppend(&sql, " AS ", column_alias->GetSQL(), " BETWEEN ", + std::move(lower_bound), " AND ", std::move(upper_bound)); + PushQueryFragment(node, std::move(sql)); + return absl::OkStatus(); +} + absl::Status SQLBuilder::VisitResolvedRecursiveRefScan( const ResolvedRecursiveRefScan* node) { std::unique_ptr query_expression(new QueryExpression); @@ -7423,10 +7798,14 @@ absl::Status SQLBuilder::VisitResolvedRecursiveRefScan( const ResolvedScan* with_scan = recursive_query_info_.top().scan; std::string query_name = recursive_query_info_.top().query_name; + ZETASQL_RET_CHECK(with_scan->Is()); + const ResolvedColumnList columns_excluding_depth = + GetRecursiveScanColumnsExcludingDepth( + with_scan->GetAs()); std::string from; absl::StrAppend(&from, query_name, " AS ", alias); - ZETASQL_RET_CHECK_EQ(node->column_list_size(), with_scan->column_list_size()); + 
ZETASQL_RET_CHECK_EQ(node->column_list_size(), columns_excluding_depth.size()); for (int i = 0; i < node->column_list_size(); ++i) { // Entry was added to computed_column_alias_ back in // VisitResolvedRecursiveScan() while processing the non-recursive term; a diff --git a/zetasql/resolved_ast/sql_builder.h b/zetasql/resolved_ast/sql_builder.h index e125f64c1..9cf50b50c 100644 --- a/zetasql/resolved_ast/sql_builder.h +++ b/zetasql/resolved_ast/sql_builder.h @@ -45,6 +45,7 @@ #include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/string_view.h" +#include "absl/types/span.h" namespace zetasql { @@ -78,11 +79,15 @@ class SQLBuilder : public ResolvedASTVisitor { // PRODUCT_EXTERNAL is the default product mode. language_options.set_product_mode(PRODUCT_EXTERNAL); } - explicit SQLBuilderOptions(ProductMode product_mode) { + explicit SQLBuilderOptions(ProductMode product_mode, + bool use_external_float32 = false) + : use_external_float32(use_external_float32) { language_options.set_product_mode(product_mode); } - explicit SQLBuilderOptions(const LanguageOptions& language_options) - : language_options(language_options) {} + explicit SQLBuilderOptions(const LanguageOptions& language_options, + bool use_external_float32 = false) + : language_options(language_options), + use_external_float32(use_external_float32) {} ~SQLBuilderOptions() {} // Language options included enabled/disabled features and product mode, @@ -129,6 +134,12 @@ class SQLBuilder : public ResolvedASTVisitor { // It's an advanced feature with specific intended use cases and misusing it // can cause the SQLBuilder to misbehave. TargetSyntaxMap target_syntax; + + // Setting to true will return FLOAT32 as the type name for TYPE_FLOAT, + // for PRODUCT_EXTERNAL mode. + // TODO: Remove once all engines are updated to use FLOAT32 in + // the external mode. 
+ bool use_external_float32 = false; }; explicit SQLBuilder(const SQLBuilderOptions& options = SQLBuilderOptions()); @@ -155,6 +166,8 @@ class SQLBuilder : public ResolvedASTVisitor { const ResolvedCreateModelStmt* node) override; absl::Status VisitResolvedCreateSchemaStmt( const ResolvedCreateSchemaStmt* node) override; + absl::Status VisitResolvedCreateExternalSchemaStmt( + const ResolvedCreateExternalSchemaStmt* node) override; absl::Status VisitResolvedCreateTableStmt( const ResolvedCreateTableStmt* node) override; absl::Status VisitResolvedCreateSnapshotTableStmt( @@ -253,6 +266,8 @@ class SQLBuilder : public ResolvedASTVisitor { const ResolvedAlterAllRowAccessPoliciesStmt* node) override; absl::Status VisitResolvedAlterSchemaStmt( const ResolvedAlterSchemaStmt* node) override; + absl::Status VisitResolvedAlterExternalSchemaStmt( + const ResolvedAlterExternalSchemaStmt* node) override; absl::Status VisitResolvedAlterTableSetOptionsStmt( const ResolvedAlterTableSetOptionsStmt* node) override; absl::Status VisitResolvedAlterTableStmt( @@ -376,6 +391,8 @@ class SQLBuilder : public ResolvedASTVisitor { absl::Status VisitResolvedWithScan(const ResolvedWithScan* node) override; absl::Status VisitResolvedRecursiveRefScan( const ResolvedRecursiveRefScan* node) override; + absl::Status VisitResolvedRecursionDepthModifier( + const ResolvedRecursionDepthModifier* node) override; absl::Status VisitResolvedWithRefScan( const ResolvedWithRefScan* node) override; absl::Status VisitResolvedSampleScan( @@ -490,7 +507,7 @@ class SQLBuilder : public ResolvedASTVisitor { const ResolvedColumnDefaultValue* default_value, std::string* text); absl::StatusOr GetHintListString( - const std::vector>& hint_list); + absl::Span> hint_list); absl::Status AppendCloneDataSource(const ResolvedScan* source, std::string* sql); @@ -515,6 +532,9 @@ class SQLBuilder : public ResolvedASTVisitor { // i.e. . 
std::string GetColumnPath(const ResolvedColumn& column); + // Returns whether the column has an existing alias. + bool HasColumnAlias(const ResolvedColumn& column); + // Returns the alias to be used to select the column. std::string GetColumnAlias(const ResolvedColumn& column); @@ -525,9 +545,47 @@ // scan alias if necessary. absl::StatusOr<std::string> GetJoinOperand(const ResolvedScan* scan); + // For USING(...) expressions track `right_to_left_column_id_mapping`. More + // details can be found in the function comment for `GetUsingWrapper`. + absl::StatusOr<std::string> GetJoinOperand( + const ResolvedScan* scan, absl::flat_hash_map<int, std::vector<int>>& + right_to_left_column_id_mapping); + + // Returns wrapper sql text for a JOIN with USING(...) expression for the + // right side of the join to ensure that columns within the USING(...) + // expression have the same alias. + // + // The following query fragment + // + // (select key from keyvalue) join (select key, value from keyvalue2) + // using (key) + // + // would get deparsed into + // + // (select key as a1 from keyvalue) sq1 + // join (select sq2.a2 as a1, sq2.a3 from + // (select key as a2, value as a3 from keyvalue2) sq2) AS sq2 + // using (a1) + // + // The wrapper wraps the right side of the join to match column aliases with + // the left side of the join (`sq2.a2 as a1`) and re-aliases these columns in + // the case that outer parts of the query reference them. + // + // We maintain the `right_to_left_column_id_mapping` from `int` to + // `vector<int>` in the case that there are duplicate columns in the + // USING(...) expression on the right side of the join. + // + // We do not support deparsing USING(...) with duplicate left columns. + // If there is a duplicate left column then the sql_builder will build JOIN + // ON. 
+ absl::StatusOr<std::string> GetUsingWrapper( + absl::string_view scan_alias, const ResolvedColumnList& column_list, + absl::flat_hash_map<int, std::vector<int>>& + right_to_left_column_id_mapping); + // Helper function which fetches the list of function arguments absl::StatusOr<std::string> GetFunctionArgListString( - const std::vector<std::string>& arg_name_list, + absl::Span<const std::string> arg_name_list, const FunctionSignature& signature); // Fetches the scan alias corresponding to the given scan node. If not @@ -540,7 +598,7 @@ // Helper for the above. Keep generating aliases until find one that does not // conflict with a column name. - std::string MakeNonconflictingAlias(const std::string& name); + std::string MakeNonconflictingAlias(absl::string_view name); // Checks whether the table can be used without an explicit "AS" clause, which // will use the last component of its table name as its alias. If we have two @@ -572,13 +630,13 @@ // Appends PARTITION BY or CLUSTER BY expressions to the provided string, not // including the "PARTITION BY " or "CLUSTER BY " prefix. absl::Status GetPartitionByListString( - const std::vector<std::unique_ptr<const ResolvedExpr>>& partition_by_list, + absl::Span<const std::unique_ptr<const ResolvedExpr>> partition_by_list, std::string* sql); // Helper function to get corresponding SQL for a list of TableAndColumnInfo // to be analyzed in ANALYZE STATEMENT. absl::Status GetTableAndColumnInfoList( - const std::vector<std::unique_ptr<const ResolvedTableAndColumnInfo>>& + absl::Span<const std::unique_ptr<const ResolvedTableAndColumnInfo>> table_and_column_info_list, std::string* sql); @@ -611,7 +669,7 @@ // - Column aliases in (if not internal) matches the // select-list. 
absl::Status MatchOutputColumns( - const std::vector>& + absl::Span> output_column_list, const ResolvedScan* query, QueryExpression* query_expression); @@ -627,7 +685,14 @@ class SQLBuilder : public ResolvedASTVisitor { static absl::StatusOr GetNullHandlingModifier( ResolvedNonScalarFunctionCallBase::NullHandlingModifier kind); + // Setting the optional parameter `use_external_float32` to true will return + // FLOAT32 as the type name for TYPE_FLOAT, for PRODUCT_EXTERNAL mode. + // TODO: Remove `use_external_float32` once all engines are + // updated to use FLOAT32 as the external name. + absl::StatusOr GetSQL(const Value& value, ProductMode mode, + bool is_constant_value = false); absl::StatusOr GetSQL(const Value& value, ProductMode mode, + bool use_external_float32, bool is_constant_value = false); // Similar to the above function, but uses to indicate the @@ -638,6 +703,11 @@ class SQLBuilder : public ResolvedASTVisitor { const AnnotationMap* annotation_map, ProductMode mode, bool is_constant_value = false); + absl::StatusOr GetSQL(const Value& value, + const AnnotationMap* annotation_map, + ProductMode mode, + bool use_external_float32, + bool is_constant_value = false); absl::StatusOr GetFunctionCallSQL( const ResolvedFunctionCallBase* function_call, @@ -646,13 +716,13 @@ class SQLBuilder : public ResolvedASTVisitor { // Helper function to return corresponding SQL for a list of // ResolvedUpdateItems. absl::StatusOr GetUpdateItemListSQL( - const std::vector>& + absl::Span> update_item_list); // Helper function to return corresponding SQL for a list of columns to be // inserted. std::string GetInsertColumnListSQL( - const std::vector& insert_column_list) const; + absl::Span insert_column_list) const; absl::Status ProcessWithPartitionColumns( std::string* sql, const ResolvedWithPartitionColumns* node); @@ -674,7 +744,7 @@ class SQLBuilder : public ResolvedASTVisitor { // Helper function to return corresponding SQL for a list of // ResolvedAlterActions. 
absl::StatusOr GetAlterActionListSQL( - const std::vector>& + absl::Span> alter_action_list); // Helper function to return corresponding SQL for a single @@ -689,9 +759,8 @@ class SQLBuilder : public ResolvedASTVisitor { // Helper function to return corresponding SQL from the grantee list of // GRANT, REVOKE, CREATE/ALTER ROW POLICY statements. absl::StatusOr GetGranteeListSQL( - const std::string& prefix, const std::vector& grantee_list, - const std::vector>& - grantee_expr_list); + absl::string_view prefix, const std::vector& grantee_list, + absl::Span> grantee_expr_list); // Helper function to append table_element, including column_schema and // table_constraints, to sql. absl::Status ProcessTableElementsBase( @@ -888,7 +957,7 @@ class SQLBuilder : public ResolvedASTVisitor { // which is handled by method RenameSetOperationItemsNonFullCorresponding. absl::Status RenameSetOperationItemsFullCorresponding( const ResolvedSetOperationScan* node, - const std::vector>& final_column_list, + absl::Span> final_column_list, std::vector>& set_op_scan_list); // Renames the columns of each set operation item (represented by each entry @@ -898,7 +967,7 @@ class SQLBuilder : public ResolvedASTVisitor { // - or `node->column_match_mode` is CORRESPONDING_BY (not CORRESPONDING). absl::Status RenameSetOperationItemsNonFullCorresponding( const ResolvedSetOperationScan* node, - const std::vector>& final_column_list, + absl::Span> final_column_list, std::vector>& set_op_scan_list); // Returns the original scan from the input `scan`. 
diff --git a/zetasql/resolved_ast/test_utils.cc b/zetasql/resolved_ast/test_utils.cc index 4189a62e1..bd3fea6b7 100644 --- a/zetasql/resolved_ast/test_utils.cc +++ b/zetasql/resolved_ast/test_utils.cc @@ -104,7 +104,7 @@ std::unique_ptr WrapInFunctionCall( absl::StatusOr<std::vector<std::unique_ptr<const ResolvedExpr>>> BuildResolvedLiteralsWithCollationForTest( std::vector> literals, - AnalyzerOptions& analyzer_options, Catalog& catlog, + AnalyzerOptions& analyzer_options, Catalog& catalog, TypeFactory& type_factory) { std::vector<std::unique_ptr<const ResolvedExpr>> result; @@ -115,7 +115,7 @@ BuildResolvedLiteralsWithCollationForTest( ZETASQL_ASSIGN_OR_RETURN( std::unique_ptr<const ResolvedExpr> expr, MakeCollateCallForTest(std::move(arg), collation_str, analyzer_options, - catlog, type_factory)); + catalog, type_factory)); result.push_back(std::move(expr)); } return result; diff --git a/zetasql/resolved_ast/test_utils.h b/zetasql/resolved_ast/test_utils.h index f953d7acb..134976f73 100644 --- a/zetasql/resolved_ast/test_utils.h +++ b/zetasql/resolved_ast/test_utils.h @@ -48,7 +48,7 @@ absl::StatusOr<std::unique_ptr<const ResolvedExpr>> MakeCollateCallForTest( absl::StatusOr<std::vector<std::unique_ptr<const ResolvedExpr>>> BuildResolvedLiteralsWithCollationForTest( std::vector> literals, - AnalyzerOptions& analyzer_options, Catalog& catlog, + AnalyzerOptions& analyzer_options, Catalog& catalog, TypeFactory& type_factory); // Convenience function to create concat function for string literals. 
diff --git a/zetasql/resolved_ast/validator.cc b/zetasql/resolved_ast/validator.cc index 71911afc8..d82babccd 100644 --- a/zetasql/resolved_ast/validator.cc +++ b/zetasql/resolved_ast/validator.cc @@ -17,6 +17,7 @@ #include "zetasql/resolved_ast/validator.h" #include +#include #include #include #include @@ -70,6 +71,7 @@ #include "absl/strings/str_cat.h" #include "absl/strings/str_join.h" #include "absl/strings/string_view.h" +#include "absl/types/span.h" #include "zetasql/base/map_util.h" #include "zetasql/base/source_location.h" #include "zetasql/base/stl_util.h" @@ -184,7 +186,7 @@ Validator::~Validator() {} absl::Status Validator::ValidateResolvedExprList( const std::set& visible_columns, const std::set& visible_parameters, - const std::vector>& expr_list) { + absl::Span> expr_list) { RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); for (const auto& expr_iter : expr_list) { ZETASQL_RETURN_IF_ERROR(ValidateResolvedExpr(visible_columns, visible_parameters, @@ -196,7 +198,7 @@ absl::Status Validator::ValidateResolvedExprList( absl::Status Validator::ValidateResolvedFunctionArgumentList( const std::set& visible_columns, const std::set& visible_parameters, - const std::vector>& + absl::Span> expr_list) { RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); for (const auto& expr_iter : expr_list) { @@ -368,7 +370,7 @@ absl::Status Validator::ValidateResolvedFilterField( absl::Status Validator::ValidateArgumentAliases( const FunctionSignature& signature, - const std::vector>& + absl::Span> arguments) { VALIDATOR_RET_CHECK_EQ(arguments.size(), signature.NumConcreteArguments()); for (int i = 0; i < arguments.size(); i++) { @@ -475,6 +477,13 @@ absl::Status Validator::ValidateResolvedFunctionCallBase( return absl::OkStatus(); } +absl::Status Validator::ValidateFinalState() { + VALIDATOR_RET_CHECK(unconsumed_side_effect_columns_.empty()) + << "Unconsumed side effect columns: " + << absl::StrJoin(unconsumed_side_effect_columns_, ", "); + return absl::OkStatus(); +} + absl::Status 
Validator::ValidateStandaloneResolvedExpr( const ResolvedExpr* expr) { Reset(); @@ -495,7 +504,8 @@ absl::Status Validator::ValidateStandaloneResolvedExpr( << expr->DebugString(ResolvedNode::DebugStringConfig{ {{error_context_, "(validation failed here)"}}, false}); } - return absl::OkStatus(); + + return ValidateFinalState(); } absl::Status Validator::ValidateResolvedExpr( @@ -681,27 +691,31 @@ absl::Status Validator::ValidateOrderByAndLimitClausesOfAggregateFunctionCall( } if (aggregate_function_call->limit() != nullptr) { - ZETASQL_RETURN_IF_ERROR( - ValidateArgumentIsInt64Constant(aggregate_function_call->limit())); + bool validate_constant_nonnegative = + !language_options_.LanguageFeatureEnabled( + FEATURE_V_1_4_LIMIT_OFFSET_EXPRESSIONS); + ZETASQL_RETURN_IF_ERROR(ValidateArgumentIsInt64( + aggregate_function_call->limit(), validate_constant_nonnegative, + absl::StrCat("Limit in ", function_name))); } return absl::OkStatus(); } absl::Status Validator::ValidateGroupingSetListAreEmpty( - const std::vector>& + absl::Span> grouping_set_list, - const std::vector>& + absl::Span> rollup_column_list) { VALIDATOR_RET_CHECK(grouping_set_list.empty() && rollup_column_list.empty()); return absl::OkStatus(); } absl::Status Validator::ValidateGroupingSetList( - const std::vector>& + absl::Span> grouping_set_list, - const std::vector>& + absl::Span> rollup_column_list, - const std::vector>& + absl::Span> group_by_list) { if (!grouping_set_list.empty()) { // This validation is true when only ROLLUP exists in the query. @@ -738,7 +752,8 @@ absl::Status Validator::ValidateGroupingSetList( } } - bool has_non_empty_grouping_sets = false; + // All columns used in grouping sets, rollup and cube. 
+ std::set grouping_sets_all_columns; for (const auto& grouping_set_base : grouping_set_list) { if (grouping_set_base->Is()) { const ResolvedGroupingSet* grouping_set = @@ -751,9 +766,8 @@ absl::Status Validator::ValidateGroupingSetList( VALIDATOR_RET_CHECK(zetasql_base::InsertIfNotPresent(&grouping_set_columns, column_ref->column())); } - if (!grouping_set->group_by_column_list().empty()) { - has_non_empty_grouping_sets = true; - } + grouping_sets_all_columns.insert(grouping_set_columns.begin(), + grouping_set_columns.end()); } else { // rollup or cube VALIDATOR_RET_CHECK(grouping_set_base->Is() || @@ -775,17 +789,15 @@ absl::Status Validator::ValidateGroupingSetList( VALIDATOR_RET_CHECK(zetasql_base::InsertIfNotPresent(&multi_column_set, column_ref->column())); } + grouping_sets_all_columns.insert(multi_column_set.begin(), + multi_column_set.end()); } } } - // Validate the group_by_column being non-empty only when the query has - // non-empty grouping sets. A query with a list of empty grouping sets - // (a.k.a GROUPING SETS((), (), (), ...)) is valid, and there are no - // group by keys in this case. - if (has_non_empty_grouping_sets) { - // group_by_columns should be non-empty, and each item in the rollup list - // or a grouping set should be a computed column from group_by_columns. - VALIDATOR_RET_CHECK(!group_by_columns.empty()); + // Verify that all group by columns are also in grouping sets/rollup/cube. + for (const ResolvedColumn& column : group_by_columns) { + ZETASQL_RETURN_IF_ERROR( + CheckColumnIsPresentInColumnSet(column, grouping_sets_all_columns)); } } else { // Presence of grouping sets should indicate that there is a rollup list. 
@@ -1234,19 +1246,27 @@ absl::Status Validator::ValidateResolvedWithExpr( absl::Status Validator::ValidateResolvedComputedColumn( const std::set<ResolvedColumn>& visible_columns, const std::set<ResolvedColumn>& visible_parameters, - const ResolvedComputedColumn* computed_column) { + const ResolvedComputedColumnBase* computed_column) { RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); VALIDATOR_RET_CHECK(nullptr != computed_column); + + if (computed_column->Is<ResolvedDeferredComputedColumn>()) { + VALIDATOR_RET_CHECK(language_options_.LanguageFeatureEnabled( + FEATURE_V_1_4_ENFORCE_CONDITIONAL_EVALUATION)); + } + + VALIDATOR_RET_CHECK(computed_column->Is<ResolvedComputedColumnImpl>()); + auto computed_col = computed_column->GetAs<ResolvedComputedColumnImpl>(); PushErrorContext push(this, computed_column); - const ResolvedExpr* expr = computed_column->expr(); + const ResolvedExpr* expr = computed_col->expr(); VALIDATOR_RET_CHECK(nullptr != expr); - ZETASQL_RETURN_IF_ERROR(ValidateResolvedExpr(visible_columns, visible_parameters, - expr)); - VALIDATOR_RET_CHECK(computed_column->column().type()->Equals(expr->type())) - << computed_column->DebugString() // includes newline - << "column: " << computed_column->column().DebugString() - << " type: " << computed_column->column().type()->DebugString(); + ZETASQL_RETURN_IF_ERROR( + ValidateResolvedExpr(visible_columns, visible_parameters, expr)); + VALIDATOR_RET_CHECK(computed_col->column().type()->Equals(expr->type())) + << computed_col->DebugString() // includes newline + << "column: " << computed_col->column().DebugString() + << " type: " << computed_col->column().type()->DebugString(); // TODO: Enable this check. // VALIDATOR_RET_CHECK(AnnotationMap::HasEqualAnnotations( @@ -1255,12 +1275,12 @@ absl::Status Validator::ValidateResolvedComputedColumn( // TODO: Add a more general check to handle any ResolvedExpr // (not just RESOLVED_COLUMN_REF). The ResolvedExpr should not // reference the ResolvedColumn to be computed.
- if (computed_column->expr()->node_kind() == RESOLVED_COLUMN_REF && - computed_column->column() == - computed_column->expr()->GetAs()->column()) { + if (computed_col->expr()->node_kind() == RESOLVED_COLUMN_REF && + computed_col->column() == + computed_col->expr()->GetAs()->column()) { return InternalErrorBuilder() << "ResolvedComputedColumn expression cannot reference itself: " - << computed_column->DebugString(); + << computed_col->DebugString(); } return absl::OkStatus(); } @@ -1268,7 +1288,7 @@ absl::Status Validator::ValidateResolvedComputedColumn( absl::Status Validator::ValidateResolvedComputedColumnList( const std::set& visible_columns, const std::set& visible_parameters, - const std::vector>& + absl::Span> computed_column_list) { RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); for (const auto& computed_column : computed_column_list) { @@ -1279,9 +1299,22 @@ absl::Status Validator::ValidateResolvedComputedColumnList( return absl::OkStatus(); } +absl::Status Validator::ValidateResolvedComputedColumnList( + const std::set& visible_columns, + const std::set& visible_parameters, + absl::Span> + computed_column_list) { + RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); + for (const auto& computed_column : computed_column_list) { + ZETASQL_RETURN_IF_ERROR(ValidateResolvedComputedColumn( + visible_columns, visible_parameters, computed_column.get())); + } + return absl::OkStatus(); +} + absl::Status Validator::ValidateGroupingFunctionCallList( const std::set& visible_columns, - const std::vector>& + absl::Span> grouping_call_list, const std::set& group_by_columns) { RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); @@ -1307,8 +1340,8 @@ absl::Status Validator::ValidateResolvedOutputColumn( } absl::Status Validator::ValidateResolvedOutputColumnList( - const std::vector& visible_columns, - const std::vector>& + absl::Span visible_columns, + absl::Span> output_column_list, bool is_value_table) { RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); @@ -1361,13 +1394,33 @@ absl::Status Validator::AddColumnList( 
return absl::OkStatus(); } +absl::Status Validator::MakeColumnList( + const ResolvedColumnList& column_list, + std::set<ResolvedColumn>* visible_columns) { + VALIDATOR_RET_CHECK_NE(visible_columns, nullptr); + for (const ResolvedColumn& column : column_list) { + visible_columns->insert(column); + } + return absl::OkStatus(); +} + absl::Status Validator::AddColumnFromComputedColumn( - const ResolvedComputedColumn* computed_column, + const ResolvedComputedColumnBase* computed_column, std::set<ResolvedColumn>* visible_columns) { VALIDATOR_RET_CHECK(nullptr != visible_columns && nullptr != computed_column); PushErrorContext push(this, computed_column); - ZETASQL_RETURN_IF_ERROR(CheckUniqueColumnId(computed_column->column())); - visible_columns->insert(computed_column->column()); + auto computed_col = computed_column->GetAs<ResolvedComputedColumnImpl>(); + ZETASQL_RETURN_IF_ERROR(CheckUniqueColumnId(computed_col->column())); + visible_columns->insert(computed_col->column()); + if (computed_column->Is<ResolvedDeferredComputedColumn>()) { + const ResolvedColumn& side_effect_column = + computed_column->GetAs<ResolvedDeferredComputedColumn>() + ->side_effect_column(); + VALIDATOR_RET_CHECK(side_effect_column.type()->IsBytes()); + ZETASQL_RETURN_IF_ERROR(CheckUniqueColumnId(side_effect_column)); + visible_columns->insert(side_effect_column); + unconsumed_side_effect_columns_.insert(side_effect_column.column_id()); + } return absl::OkStatus(); } @@ -1381,19 +1434,31 @@ absl::Status Validator::AddGroupingFunctionCallColumn( } absl::Status Validator::AddColumnsFromComputedColumnList( - const std::vector<std::unique_ptr<const ResolvedComputedColumn>>& + absl::Span<const std::unique_ptr<const ResolvedComputedColumn>> computed_column_list, std::set<ResolvedColumn>* visible_columns) { VALIDATOR_RET_CHECK(nullptr != visible_columns); for (const auto& computed_column : computed_column_list) { - ZETASQL_RETURN_IF_ERROR(AddColumnFromComputedColumn(computed_column.get(), - visible_columns)); + ZETASQL_RETURN_IF_ERROR( + AddColumnFromComputedColumn(computed_column.get(), visible_columns)); + } + return absl::OkStatus(); +} + +absl::Status Validator::AddColumnsFromComputedColumnList( + absl::Span<const std::unique_ptr<const ResolvedComputedColumnBase>> + computed_column_list,
+ std::set* visible_columns) { + VALIDATOR_RET_CHECK(nullptr != visible_columns); + for (const auto& computed_column : computed_column_list) { + ZETASQL_RETURN_IF_ERROR( + AddColumnFromComputedColumn(computed_column.get(), visible_columns)); } return absl::OkStatus(); } absl::Status Validator::AddColumnsFromGroupingCallList( - const std::vector>& + absl::Span> grouping_call_list, std::set* visible_columns) { VALIDATOR_RET_CHECK(nullptr != visible_columns); @@ -1478,6 +1543,7 @@ absl::Status Validator::ValidateResolvedJoinScan( const std::set visible_columns = zetasql_base::STLSetUnion(left_visible_columns, right_visible_columns); + if (nullptr != scan->join_expr()) { ZETASQL_RETURN_IF_ERROR(ValidateResolvedExpr(visible_columns, visible_parameters, scan->join_expr())); @@ -1485,6 +1551,7 @@ absl::Status Validator::ValidateResolvedJoinScan( << "JoinScan has join_expr with non-BOOL type: " << scan->join_expr()->type()->DebugString(); } + ZETASQL_RETURN_IF_ERROR(CheckColumnList(scan, visible_columns)); return absl::OkStatus(); @@ -1532,17 +1599,26 @@ absl::Status Validator::ValidateResolvedArrayScan( AddColumnList(scan->input_scan()->column_list(), &visible_columns)); } - // TODO: b/236300834 - update validator to support multiple array elements. - VALIDATOR_RET_CHECK_EQ(scan->array_expr_list_size(), 1); - VALIDATOR_RET_CHECK(scan->array_expr_list(0) != nullptr); - ZETASQL_RETURN_IF_ERROR(ValidateResolvedExpr(visible_columns, visible_parameters, - scan->array_expr_list(0))); - VALIDATOR_RET_CHECK(scan->array_expr_list(0)->type()->IsArray()) - << "ArrayScan of non-ARRAY type: " - << scan->array_expr_list(0)->type()->DebugString(); - VALIDATOR_RET_CHECK(scan->element_column_list_size() == 1); - ZETASQL_RETURN_IF_ERROR(CheckUniqueColumnId(scan->element_column_list(0))); - visible_columns.insert(scan->element_column_list(0)); + // Validate named argument `mode`. 
+ if (scan->array_zip_mode() != nullptr) { + VALIDATOR_RET_CHECK(scan->array_zip_mode()->type()->IsEnum()); + ZETASQL_RETURN_IF_ERROR(ValidateResolvedExpr(visible_columns, visible_parameters, + scan->array_zip_mode())); + } + + VALIDATOR_RET_CHECK_EQ(scan->array_expr_list_size(), + scan->element_column_list_size()); + for (int i = 0; i < scan->array_expr_list_size(); ++i) { + VALIDATOR_RET_CHECK(scan->array_expr_list(i) != nullptr); + ZETASQL_RETURN_IF_ERROR(ValidateResolvedExpr(visible_columns, visible_parameters, + scan->array_expr_list(i))); + VALIDATOR_RET_CHECK(scan->array_expr_list(i)->type()->IsArray()) + << "ArrayScan of non-ARRAY type: " + << scan->array_expr_list(i)->type()->DebugString(); + ZETASQL_RETURN_IF_ERROR(CheckUniqueColumnId(scan->element_column_list(i))); + visible_columns.insert(scan->element_column_list(i)); + } + if (nullptr != scan->array_offset_column()) { ZETASQL_RETURN_IF_ERROR(CheckUniqueColumnId(scan->array_offset_column()->column())); visible_columns.insert(scan->array_offset_column()->column()); @@ -1555,9 +1631,7 @@ absl::Status Validator::ValidateResolvedArrayScan( << scan->join_expr()->type()->DebugString(); } ZETASQL_RETURN_IF_ERROR(CheckColumnList(scan, visible_columns)); - scan->is_outer(); // Mark field as visited. - scan->array_zip_mode(); // Mark field as visited. return absl::OkStatus(); } @@ -1583,7 +1657,7 @@ absl::Status Validator::ValidateResolvedFilterScan( } absl::Status Validator::ValidateResolvedAggregateComputedColumn( - const ResolvedComputedColumn* computed_column, + const ResolvedComputedColumnImpl* computed_column, const std::set& input_scan_visible_columns, const std::set& visible_parameters) { RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); @@ -1636,9 +1710,10 @@ absl::Status Validator::ValidateResolvedAggregateScanBase( // Validates other constructs in aggregates such as ORDER BY and LIMIT. 
for (const auto& computed_column : scan->aggregate_list()) { + ZETASQL_RET_CHECK(computed_column->Is()); ZETASQL_RETURN_IF_ERROR(ValidateResolvedAggregateComputedColumn( - computed_column.get(), *input_scan_visible_columns, - visible_parameters)); + computed_column->GetAs(), + *input_scan_visible_columns, visible_parameters)); } ZETASQL_RETURN_IF_ERROR(ValidateGroupingSetList(scan->grouping_set_list(), @@ -1857,7 +1932,7 @@ absl::Status Validator::ValidateGroupSelectionThresholdExpr( const ResolvedExpr* group_threshold_expr, const std::set& visible_columns, const std::set& visible_parameters, - const std::vector>& scan_options, + absl::Span> scan_options, absl::string_view expression_name) { if (group_threshold_expr == nullptr) { return absl::OkStatus(); @@ -2021,7 +2096,9 @@ absl::Status Validator::ValidateResolvedSampleScan( const ResolvedSampleScan::SampleUnit unit = scan->unit(); if (unit == ResolvedSampleScan::ROWS) { - ZETASQL_RETURN_IF_ERROR(ValidateArgumentIsInt64Constant(scan->size())); + ZETASQL_RETURN_IF_ERROR(ValidateArgumentIsInt64( + scan->size(), /*validate_constant_nonnegative=*/true, + /*context_msg=*/"Sample unit")); } else { VALIDATOR_RET_CHECK_EQ(unit, ResolvedSampleScan::PERCENT); ZETASQL_RETURN_IF_ERROR(ValidatePercentArgument(scan->size())); @@ -2029,8 +2106,9 @@ absl::Status Validator::ValidateResolvedSampleScan( VALIDATOR_RET_CHECK(scan->size() != nullptr); if (scan->repeatable_argument() != nullptr) { - ZETASQL_RETURN_IF_ERROR( - ValidateArgumentIsInt64Constant(scan->repeatable_argument())); + ZETASQL_RETURN_IF_ERROR(ValidateArgumentIsInt64( + scan->repeatable_argument(), /*validate_constant_nonnegative=*/true, + /*context_msg=*/"Sample repeatable argument")); } std::set visible_columns; @@ -2413,45 +2491,6 @@ absl::Status Validator::ValidateResolvedSetOperationScan( input_item.get(), set_op_scan->column_list(), visible_parameters)); } - // TODO: This validation cannot be enabled for BY_POSITION - // right now because the BY_POSITION set 
operations generated from rewriter - // potentially have `node_source` = - // kNodeSourceResolverSetOperationCorresponding. Enable this when the set - // operation rewriter is deprecated. - if (set_op_scan->column_match_mode() == - ResolvedSetOperationScan::CORRESPONDING || - set_op_scan->column_match_mode() == - ResolvedSetOperationScan::CORRESPONDING_BY) { - // `column_propagation_mode`-specific validation for set operation items. - switch (set_op_scan->column_propagation_mode()) { - case ResolvedSetOperationScan::LEFT: { - const ResolvedSetOperationItem& first_item = - *set_op_scan->input_item_list(0); - // The first item must not be wrapped with a ProjectScan to pad NULL - // columns. - VALIDATOR_RET_CHECK_NE(first_item.scan()->node_source(), - kNodeSourceResolverSetOperationCorresponding) - << " The first scan for LEFT mode should not have node_source = " - << kNodeSourceResolverSetOperationCorresponding; - break; - } - case ResolvedSetOperationScan::FULL: - case ResolvedSetOperationScan::INNER: { - break; - } - case ResolvedSetOperationScan::STRICT: { - // No item should be wrapped with a ProjectScan to pad NULL columns. - for (const auto& input_item : set_op_scan->input_item_list()) { - VALIDATOR_RET_CHECK_NE(input_item->scan()->node_source(), - kNodeSourceResolverSetOperationCorresponding) - << " STRICT mode should not have node_source = " - << kNodeSourceResolverSetOperationCorresponding; - } - break; - } - } - } - set_op_scan->op_type(); // Mark field as visited. 
switch (set_op_scan->column_match_mode()) { @@ -2489,23 +2528,28 @@ static bool IsResolvedLiteralOrParameter(ResolvedNodeKind kind) { return kind == RESOLVED_LITERAL || kind == RESOLVED_PARAMETER; } -absl::Status Validator::ValidateArgumentIsInt64Constant( - const ResolvedExpr* expr) { +absl::Status Validator::ValidateArgumentIsInt64( + const ResolvedExpr* expr, bool validate_constant_nonnegative, + absl::string_view context_msg) { VALIDATOR_RET_CHECK(expr != nullptr); PushErrorContext push(this, expr); ZETASQL_RETURN_IF_ERROR(ValidateResolvedExpr( /*visible_columns=*/{}, /*visible_parameters=*/{}, expr)); + VALIDATOR_RET_CHECK(expr->type()->IsInt64()) + << context_msg << " must take INT64"; + + if (!validate_constant_nonnegative) { + return absl::OkStatus(); + } + VALIDATOR_RET_CHECK(IsResolvedLiteralOrParameter(expr->node_kind()) || (expr->node_kind() == RESOLVED_CAST && expr->type()->IsInt64() && IsResolvedLiteralOrParameter( expr->GetAs()->expr()->node_kind()))) - << "LIMIT ... OFFSET ... arg is of incorrect node kind: " - << expr->node_kind_string(); - - VALIDATOR_RET_CHECK(expr->type()->IsInt64()) - << "LIMIT ... OFFSET .... literal must be an integer"; + << context_msg + << " arg is of incorrect node kind: " << expr->node_kind_string(); if (expr->node_kind() == RESOLVED_LITERAL) { // If a literal, we can also validate its value. @@ -2533,15 +2577,21 @@ absl::Status Validator::ValidateResolvedLimitOffsetScan( PushErrorContext push(this, scan); VALIDATOR_RET_CHECK(scan->limit() != nullptr); - ZETASQL_RETURN_IF_ERROR(ValidateArgumentIsInt64Constant(scan->limit())); + const bool validate_constant_nonnegative = + !language_options_.LanguageFeatureEnabled( + FEATURE_V_1_4_LIMIT_OFFSET_EXPRESSIONS); + + constexpr absl::string_view kClauseName = "LIMIT ... OFFSET ..."; + ZETASQL_RETURN_IF_ERROR(ValidateArgumentIsInt64( + scan->limit(), validate_constant_nonnegative, kClauseName)); if (scan->offset() != nullptr) { // OFFSET is optional. 
- ZETASQL_RETURN_IF_ERROR(ValidateArgumentIsInt64Constant(scan->offset())); + ZETASQL_RETURN_IF_ERROR(ValidateArgumentIsInt64( + scan->offset(), validate_constant_nonnegative, kClauseName)); } - ZETASQL_RETURN_IF_ERROR(ValidateResolvedScan(scan->input_scan(), - visible_parameters)); + ZETASQL_RETURN_IF_ERROR(ValidateResolvedScan(scan->input_scan(), visible_parameters)); return absl::OkStatus(); } @@ -2990,12 +3040,14 @@ void Validator::Reset() { column_ids_seen_.clear(); context_stack_.clear(); error_context_ = nullptr; + unconsumed_side_effect_columns_.clear(); } absl::Status Validator::ValidateResolvedStatement( const ResolvedStatement* statement) { Reset(); - return ValidateResolvedStatementInternal(statement); + ZETASQL_RETURN_IF_ERROR(ValidateResolvedStatementInternal(statement)); + return ValidateFinalState(); } absl::Status Validator::ValidateResolvedStatementInternal( @@ -3024,6 +3076,10 @@ absl::Status Validator::ValidateResolvedStatementInternal( status = ValidateResolvedCreateSchemaStmt( statement->GetAs()); break; + case RESOLVED_CREATE_EXTERNAL_SCHEMA_STMT: + status = ValidateResolvedCreateExternalSchemaStmt( + statement->GetAs()); + break; case RESOLVED_CREATE_SNAPSHOT_TABLE_STMT: status = ValidateResolvedCreateSnapshotTableStmt( statement->GetAs()); @@ -3238,6 +3294,10 @@ absl::Status Validator::ValidateResolvedStatementInternal( status = ValidateResolvedAlterObjectStmt( statement->GetAs()); break; + case RESOLVED_ALTER_EXTERNAL_SCHEMA_STMT: + status = ValidateResolvedAlterObjectStmt( + statement->GetAs()); + break; case RESOLVED_ALTER_TABLE_STMT: status = ValidateResolvedAlterObjectStmt( statement->GetAs()); @@ -3383,6 +3443,15 @@ absl::Status Validator::ValidateResolvedCreateSchemaStmt( return absl::OkStatus(); } +absl::Status Validator::ValidateResolvedCreateExternalSchemaStmt( + const ResolvedCreateExternalSchemaStmt* stmt) { + RETURN_ERROR_IF_OUT_OF_STACK_SPACE(); + PushErrorContext push(this, stmt); + VALIDATOR_RET_CHECK(stmt->connection() != 
nullptr); + ZETASQL_RETURN_IF_ERROR(ValidateOptionsList(stmt->option_list())); + return absl::OkStatus(); +} + absl::Status Validator::ValidateResolvedForeignKey( const ResolvedForeignKey* foreign_key, const std::vector column_types, @@ -3469,7 +3538,7 @@ absl::Status Validator::ValidateResolvedPrimaryKey( } absl::Status Validator::ValidateColumnDefinitions( - const std::vector>& + absl::Span> column_definitions, std::set* visible_columns) { for (const auto& column_definition : column_definitions) { @@ -3764,24 +3833,28 @@ absl::Status Validator::ValidateResolvedCreateModelStmt( if (stmt->is_remote()) { // Remote model. VALIDATOR_RET_CHECK(enable_remote_model) << "Remote model is not supported"; - - if (stmt->query() == nullptr && stmt->aliased_query_list().empty()) { + VALIDATOR_RET_CHECK(stmt->transform_list().empty()) + << "Remote model training with transform list is not supported"; + VALIDATOR_RET_CHECK(stmt->transform_input_column_list().empty()); + VALIDATOR_RET_CHECK(stmt->transform_output_column_list().empty()); + VALIDATOR_RET_CHECK(stmt->transform_analytic_function_group_list().empty()); + VALIDATOR_RET_CHECK(stmt->aliased_query_list().empty()) + << "Remote model training with aliased query list is not supported"; + + if (stmt->query() == nullptr) { // External model. - // Input & output are optional. ZETASQL_RETURN_IF_ERROR(validate_input_output_columns()); // The rest must be empty. - VALIDATOR_RET_CHECK_EQ(stmt->query(), nullptr); VALIDATOR_RET_CHECK(stmt->output_column_list().empty()); - VALIDATOR_RET_CHECK(stmt->transform_list().empty()); - VALIDATOR_RET_CHECK(stmt->transform_input_column_list().empty()); - VALIDATOR_RET_CHECK(stmt->transform_output_column_list().empty()); - VALIDATOR_RET_CHECK( - stmt->transform_analytic_function_group_list().empty()); } else { - // Remotely trained model. - VALIDATOR_RET_CHECK_FAIL() << "Remotely trained models are unsupported"; + // Remotely trained model with AS SELECT clause. 
+ ZETASQL_RETURN_IF_ERROR( + ValidateResolvedScan(stmt->query(), {} /* visible_parameters */)); + ZETASQL_RETURN_IF_ERROR(ValidateResolvedOutputColumnList( + stmt->query()->column_list(), stmt->output_column_list(), + /*is_value_table=*/false)); } } else { // Local model. @@ -4752,11 +4825,27 @@ absl::Status Validator::ValidateResolvedRecursiveScan( << "Recursive scan detected in non-recursive context"; VALIDATOR_RET_CHECK(scan->non_recursive_term() != nullptr); VALIDATOR_RET_CHECK(scan->recursive_term() != nullptr); + + ResolvedColumn depth_column; + if (scan->recursion_depth_modifier() != nullptr) { + VALIDATOR_RET_CHECK_OK(ValidateResolvedRecursionDepthModifier( + scan->recursion_depth_modifier(), scan->column_list())); + depth_column = + scan->recursion_depth_modifier()->recursion_depth_column()->column(); + } + + ResolvedColumnList columns_excluding_depth; + for (const auto& col : scan->column_list()) { + if (col != depth_column) { + columns_excluding_depth.push_back(col); + } + } + ZETASQL_RETURN_IF_ERROR(ValidateResolvedSetOperationItem( - scan->non_recursive_term(), scan->column_list(), visible_parameters)); + scan->non_recursive_term(), columns_excluding_depth, visible_parameters)); nested_recursive_scans_.push_back(RecursiveScanInfo(scan)); ZETASQL_RETURN_IF_ERROR(ValidateResolvedSetOperationItem( - scan->recursive_term(), scan->column_list(), visible_parameters)); + scan->recursive_term(), columns_excluding_depth, visible_parameters)); VALIDATOR_RET_CHECK_EQ(nested_recursive_scans_.back().scan, scan); VALIDATOR_RET_CHECK(nested_recursive_scans_.back().saw_recursive_ref) << "Recursive scan generated without a recursive reference in the " @@ -4768,7 +4857,45 @@ absl::Status Validator::ValidateResolvedRecursiveScan( for (const ResolvedColumn& column : scan->column_list()) { ZETASQL_RETURN_IF_ERROR(CheckUniqueColumnId(column)); } + return absl::Status(); +} + +absl::Status Validator::ValidateResolvedRecursionDepthModifier( + const 
ResolvedRecursionDepthModifier* modifier, + const ResolvedColumnList& recursion_column_list) { + if (modifier->lower_bound() != nullptr) { + ZETASQL_RETURN_IF_ERROR(ValidateArgumentIsInt64( + modifier->lower_bound(), /*validate_constant_nonnegative=*/true, + /*context_msg=*/"Recursion lower bound")); + } + if (modifier->upper_bound() != nullptr) { + ZETASQL_RETURN_IF_ERROR(ValidateArgumentIsInt64( + modifier->upper_bound(), /*validate_constant_nonnegative=*/true, + /*context_msg=*/"Recursion upper bound")); + } + if (modifier->lower_bound() != nullptr && + modifier->upper_bound() != nullptr && + modifier->lower_bound()->Is<ResolvedLiteral>() && + modifier->upper_bound()->Is<ResolvedLiteral>()) { + const auto& lower_value = + modifier->lower_bound()->GetAs<ResolvedLiteral>()->value(); + VALIDATOR_RET_CHECK(lower_value.is_valid()); + VALIDATOR_RET_CHECK(!lower_value.is_null()); + VALIDATOR_RET_CHECK(lower_value.type()->IsInt64()); + + const auto& upper_value = + modifier->upper_bound()->GetAs<ResolvedLiteral>()->value(); + VALIDATOR_RET_CHECK(upper_value.is_valid()); + VALIDATOR_RET_CHECK(!upper_value.is_null()); + VALIDATOR_RET_CHECK(upper_value.type()->IsInt64()); + VALIDATOR_RET_CHECK_LE(lower_value.int64_value(), + upper_value.int64_value()); + } + VALIDATOR_RET_CHECK(modifier->recursion_depth_column() != nullptr); + VALIDATOR_RET_CHECK(absl::c_linear_search( + recursion_column_list, modifier->recursion_depth_column()->column())) + << "Recursion depth column is not in the output column list"; return absl::OkStatus(); } @@ -4869,7 +4996,7 @@ absl::Status Validator::ValidateGroupRowsScan( } absl::Status Validator::ValidateHintList( - const std::vector<std::unique_ptr<const ResolvedOption>>& hint_list) { + absl::Span<const std::unique_ptr<const ResolvedOption>> hint_list) { for (const std::unique_ptr<const ResolvedOption>& hint : hint_list) { // The value in a Hint must be a constant so we don't pass any visible // column names.
@@ -4882,7 +5009,7 @@ absl::Status Validator::ValidateHintList( } absl::Status Validator::ValidateOptionsList( - const std::vector>& hint_list) { + absl::Span> hint_list) { for (const std::unique_ptr& hint : hint_list) { // The value in a Hint must be a constant so we don't pass any visible // column names. @@ -4897,7 +5024,7 @@ absl::Status Validator::ValidateOptionsList( template absl::Status Validator::ValidateOptionsList( - const std::vector>& list, + absl::Span> list, const MapType& allowed_options, const std::set& visible_columns, const std::set& visible_parameters, @@ -4949,7 +5076,7 @@ absl::Status Validator::ValidateResolvedTableAndColumnInfo( } absl::Status Validator::ValidateResolvedTableAndColumnInfoList( - const std::vector>& + absl::Span> table_and_column_info_list) { std::set table_names; for (const std::unique_ptr& @@ -5036,7 +5163,9 @@ absl::Status Validator::ValidateResolvedDMLStmt( if (stmt->assert_rows_modified() != nullptr) { ZETASQL_RETURN_IF_ERROR( - ValidateArgumentIsInt64Constant(stmt->assert_rows_modified()->rows())); + ValidateArgumentIsInt64(stmt->assert_rows_modified()->rows(), + /*validate_constant_nonnegative=*/true, + /*context_msg=*/"Assert rows modified")); } return absl::OkStatus(); diff --git a/zetasql/resolved_ast/validator.h b/zetasql/resolved_ast/validator.h index 16c549feb..5a3a7538d 100644 --- a/zetasql/resolved_ast/validator.h +++ b/zetasql/resolved_ast/validator.h @@ -35,6 +35,7 @@ #include "absl/container/flat_hash_set.h" #include "absl/status/status.h" #include "absl/strings/string_view.h" +#include "absl/types/span.h" #include "zetasql/base/status.h" #include "zetasql/base/status_builder.h" namespace zetasql { @@ -74,6 +75,10 @@ class Validator { absl::Status ValidateStandaloneResolvedExpr(const ResolvedExpr* expr); private: + // Called at the end of each entry point, ValidateResolvedStatement() and + // ValidateStandaloneExpr(). + absl::Status ValidateFinalState(); + // Statements. 
absl::Status ValidateResolvedStatementInternal( const ResolvedStatement* statement); @@ -322,7 +327,7 @@ class Validator { const ResolvedAggregateFunctionCall* aggregate_function_call); absl::Status ValidateResolvedAggregateComputedColumn( - const ResolvedComputedColumn* computed_column, + const ResolvedComputedColumnImpl* computed_column, const std::set& input_scan_visible_columns, const std::set& visible_parameters); @@ -390,29 +395,34 @@ class Validator { absl::Status ValidateResolvedExprList( const std::set& visible_columns, const std::set& visible_parameters, - const std::vector>& expr_list); + absl::Span> expr_list); absl::Status ValidateResolvedFunctionArgumentList( const std::set& visible_columns, const std::set& visible_parameters, - const std::vector>& + absl::Span> expr_list); absl::Status ValidateResolvedComputedColumn( const std::set& visible_columns, const std::set& visible_parameters, - const ResolvedComputedColumn* computed_column); + const ResolvedComputedColumnBase* computed_column); absl::Status ValidateGroupingFunctionCallList( const std::set& visible_columns, - const std::vector>& + absl::Span> grouping_call_list, const std::set& group_by_columns); absl::Status ValidateResolvedComputedColumnList( const std::set& visible_columns, const std::set& visible_parameters, - const std::vector>& + absl::Span> + computed_column_list); + absl::Status ValidateResolvedComputedColumnList( + const std::set& visible_columns, + const std::set& visible_parameters, + absl::Span> computed_column_list); absl::Status ValidateResolvedOutputColumn( @@ -420,14 +430,17 @@ class Validator { const ResolvedOutputColumn* output_column); absl::Status ValidateResolvedOutputColumnList( - const std::vector& visible_columns, - const std::vector>& + absl::Span visible_columns, + absl::Span> output_column_list, bool is_value_table); absl::Status ValidateResolvedCreateSchemaStmt( const ResolvedCreateSchemaStmt* stmt); + absl::Status ValidateResolvedCreateExternalSchemaStmt( + const 
ResolvedCreateExternalSchemaStmt* stmt); + absl::Status ValidateResolvedCreateTableStmtBase( const ResolvedCreateTableStmtBase* stmt, std::set* visible_columns); @@ -454,28 +467,28 @@ class Validator { absl::Status ValidateArgumentAliases( const FunctionSignature& signature, - const std::vector>& + absl::Span> arguments); absl::Status ValidateOptionsList( - const std::vector>& list); + absl::Span> list); template absl::Status ValidateOptionsList( - const std::vector>& list, + absl::Span> list, const MapType& allowed_options, const std::set& visible_columns, const std::set& visible_parameters, absl::string_view option_type); absl::Status ValidateHintList( - const std::vector>& list); + absl::Span> list); absl::Status ValidateResolvedTableAndColumnInfo( const ResolvedTableAndColumnInfo* table_and_column_info); absl::Status ValidateResolvedTableAndColumnInfoList( - const std::vector>& + absl::Span> table_and_column_info_list); absl::Status ValidateCollateExpr( @@ -488,7 +501,7 @@ class Validator { const ResolvedColumnAnnotations* annotations); absl::Status ValidateColumnDefinitions( - const std::vector>& + absl::Span> column_definitions, std::set* visible_columns); @@ -526,24 +539,31 @@ class Validator { absl::Status CheckColumnList(const ResolvedScan* scan, const std::set& visible_columns); + absl::Status MakeColumnList(const ResolvedColumnList& column_list, + std::set* visible_columns); + absl::Status AddColumnList(const ResolvedColumnList& column_list, std::set* visible_columns); absl::Status AddColumnList( const ResolvedColumnList& column_list, absl::flat_hash_set* visible_columns); absl::Status AddColumnFromComputedColumn( - const ResolvedComputedColumn* computed_column, + const ResolvedComputedColumnBase* computed_column, std::set* visible_columns); absl::Status AddGroupingFunctionCallColumn( ResolvedColumn grouping_call_column, std::set* visible_columns); absl::Status AddColumnsFromComputedColumnList( - const std::vector>& + absl::Span> + computed_column_list, 
+ std::set* visible_columns); + absl::Status AddColumnsFromComputedColumnList( + absl::Span> computed_column_list, std::set* visible_columns); absl::Status AddColumnsFromGroupingCallList( - const std::vector>& + absl::Span> grouping_call_list, std::set* visible_columns); @@ -590,6 +610,10 @@ class Validator { absl::Status ValidateResolvedRecursiveRefScan( const ResolvedRecursiveRefScan* scan); + absl::Status ValidateResolvedRecursionDepthModifier( + const ResolvedRecursionDepthModifier* modifier, + const ResolvedColumnList& recursion_column_list); + absl::Status ValidateResolvedPivotScan( const ResolvedPivotScan* scan, const std::set& visible_parameters); @@ -635,7 +659,7 @@ class Validator { const ResolvedExpr* group_threshold_expr, const std::set& visible_columns, const std::set& visible_parameters, - const std::vector>& scan_options, + absl::Span> scan_options, absl::string_view expression_name); // Validates the group selection threshold expression for differential privacy @@ -668,18 +692,18 @@ class Validator { // Validates GroupingSet and grouping columns are empty. // This is only for the nodes that don't have grouping sets implemented yet. absl::Status ValidateGroupingSetListAreEmpty( - const std::vector>& + absl::Span> grouping_set_list, - const std::vector>& + absl::Span> rollup_column_list); // Validates GroupingSet and grouping columns based on grouping conditions. absl::Status ValidateGroupingSetList( - const std::vector>& + absl::Span> grouping_set_list, - const std::vector>& + absl::Span> rollup_column_list, - const std::vector>& + absl::Span> group_by_list); // Checks that contains only ColumnRefs, GetProtoField, GetStructField @@ -689,8 +713,11 @@ class Validator { const ResolvedColumnRef** ref); // Validates whether is a literal or a parameter. In either case, it - // should be of type int64_t. - absl::Status ValidateArgumentIsInt64Constant(const ResolvedExpr* expr); + // should be of type int64_t. represents where this validation + // happens. 
+ absl::Status ValidateArgumentIsInt64(const ResolvedExpr* expr, + bool validate_constant_nonnegative, + absl::string_view context_msg); absl::Status ValidateGenericArgumentsAgainstConcreteArguments( const ResolvedFunctionCallBase* resolved_function_call, @@ -792,6 +819,11 @@ class Validator { // has a distinct id. absl::flat_hash_set column_ids_seen_; + // List of side effect columns that are yet to be consumed by a + // $with_side_effects() call. At the end of validation, this list must be + // empty. + absl::flat_hash_set unconsumed_side_effect_columns_; + // The node at the top of the stack is the innermost node being validated. std::vector context_stack_; diff --git a/zetasql/resolved_ast/validator_test.cc b/zetasql/resolved_ast/validator_test.cc index d9e346b60..3591a95ec 100644 --- a/zetasql/resolved_ast/validator_test.cc +++ b/zetasql/resolved_ast/validator_test.cc @@ -16,17 +16,23 @@ #include "zetasql/resolved_ast/validator.h" +#include #include #include #include #include #include "zetasql/base/testing/status_matchers.h" +#include "zetasql/public/analyzer_options.h" +#include "zetasql/public/function.h" +#include "zetasql/public/function.pb.h" +#include "zetasql/public/function_signature.h" #include "zetasql/public/id_string.h" #include "zetasql/public/language_options.h" #include "zetasql/public/options.pb.h" #include "zetasql/public/simple_catalog.h" #include "zetasql/public/types/type_factory.h" +#include "zetasql/public/value.h" #include "zetasql/resolved_ast/make_node_vector.h" #include "zetasql/resolved_ast/node_sources.h" #include "zetasql/resolved_ast/resolved_ast.h" @@ -276,6 +282,111 @@ MakeAggregationThresholdQuery( .Build(); } +enum class GroupingSetTestMode { + kValid, + // group by key list doesn't contain all referenced keys in the grouping + // set list. + kMissGroupByKey, + // grouping set list doesn't contains references of all keys in the group + // by list. 
+ KMissGroupByKeyRef, +}; + +absl::StatusOr> +MakeGroupingSetsResolvedAST(IdStringPool& pool, GroupingSetTestMode mode) { + // Prepare 3 group by columns, col3 will be used as the missing test key or + // reference. + ResolvedColumn col1 = ResolvedColumn(1, pool.Make("$groupby"), + pool.Make("col1"), types::Int64Type()); + ResolvedColumn col2 = ResolvedColumn(2, pool.Make("$groupby"), + pool.Make("col2"), types::Int64Type()); + ResolvedColumn col3 = ResolvedColumn(3, pool.Make("$groupby"), + pool.Make("col3"), types::Int64Type()); + std::vector columns = {col1, col2, col3}; + + // Prepare the input scan. + ResolvedAggregateScanBuilder builder = + ResolvedAggregateScanBuilder().set_input_scan( + ResolvedSingleRowScanBuilder()); + // Prepare the group by list + builder + .add_group_by_list( + ResolvedComputedColumnBuilder().set_column(col1).set_expr( + ResolvedLiteralBuilder() + .set_value(Value::Int64(1)) + .set_type(types::Int64Type()))) + .add_group_by_list( + ResolvedComputedColumnBuilder().set_column(col2).set_expr( + ResolvedLiteralBuilder() + .set_value(Value::Int64(2)) + .set_type(types::Int64Type()))); + + // Otherwise col3 is missing in the group by key list. + if (mode != GroupingSetTestMode::kMissGroupByKey) { + builder.add_group_by_list( + ResolvedComputedColumnBuilder().set_column(col3).set_expr( + ResolvedLiteralBuilder() + .set_value(Value::Int64(3)) + .set_type(types::Int64Type()))); + } + + // Prepare grouping set list + // Simulate the group by clause GROUPING SETS((col1, col3), CUBE(col1, + // col2), ROLLUP((col1, col2))) + auto resolved_grouping_set = + ResolvedGroupingSetBuilder().add_group_by_column_list( + ResolvedColumnRefBuilder() + .set_type(col1.type()) + .set_column(col1) + .set_is_correlated(false)); + // Otherwise the key reference of col3 is missing. 
+ if (mode != GroupingSetTestMode::KMissGroupByKeyRef) { + resolved_grouping_set.add_group_by_column_list( + ResolvedColumnRefBuilder() + .set_type(col3.type()) + .set_column(col3) + .set_is_correlated(false)); + } + builder.add_grouping_set_list(resolved_grouping_set); + builder.add_grouping_set_list( + ResolvedCubeBuilder() + .add_cube_column_list( + ResolvedGroupingSetMultiColumnBuilder().add_column_list( + ResolvedColumnRefBuilder() + .set_type(col1.type()) + .set_column(col1) + .set_is_correlated(false))) + .add_cube_column_list( + ResolvedGroupingSetMultiColumnBuilder().add_column_list( + ResolvedColumnRefBuilder() + .set_type(col2.type()) + .set_column(col2) + .set_is_correlated(false)))); + builder.add_grouping_set_list(ResolvedRollupBuilder().add_rollup_column_list( + ResolvedGroupingSetMultiColumnBuilder() + .add_column_list(ResolvedColumnRefBuilder() + .set_type(col1.type()) + .set_column(col1) + .set_is_correlated(false)) + .add_column_list(ResolvedColumnRefBuilder() + .set_type(col2.type()) + .set_column(col2) + .set_is_correlated(false)))); + builder.set_column_list(columns); + + return ResolvedQueryStmtBuilder() + .add_output_column_list( + ResolvedOutputColumnBuilder().set_column(col1).set_name("col1")) + .add_output_column_list( + ResolvedOutputColumnBuilder().set_column(col2).set_name("col2")) + .add_output_column_list( + ResolvedOutputColumnBuilder().set_column(col3).set_name("col3")) + .set_query( + ResolvedProjectScanBuilder().set_column_list(columns).set_input_scan( + builder)) + .Build(); +} + TEST(ValidatorTest, ValidQueryStatement) { IdStringPool pool; std::unique_ptr query_stmt = MakeSelect1Stmt(pool); @@ -1652,54 +1763,124 @@ static std::unique_ptr CreateSetOperationItem( ResolvedColumn column = ResolvedColumn( column_id, pool.Make("table"), pool.Make("column"), types::Int64Type()); std::unique_ptr scan = MakeResolvedSingleRowScan({column}); - scan->set_node_source(std::string(node_source)); + scan->set_node_source(node_source); 
std::unique_ptr item = MakeResolvedSetOperationItem(std::move(scan), {column}); return item; } -TEST(ValidateTest, SetOperationCorrespondingStrictMode) { +TEST(ValidateTest, ValidGroupingSetsResolvedAST) { IdStringPool pool; - // Prepare a set operation scan with mode = STRICT CORRESPONDING, but its - // input scans have node_source = - // kNodeSourceResolverSetOperationCorresponding, so the validation should - // fail. - std::vector> items; - items.push_back(CreateSetOperationItem( - /*column_id=*/1, kNodeSourceResolverSetOperationCorresponding, pool)); - items.push_back(CreateSetOperationItem( - /*column_id=*/2, kNodeSourceResolverSetOperationCorresponding, pool)); - - ResolvedColumn result_column = ResolvedColumn( - 3, pool.Make("set_op"), pool.Make("column"), types::Int64Type()); - std::unique_ptr set_operation_scan = - MakeResolvedSetOperationScan({result_column}, - ResolvedSetOperationScan::UNION_ALL, - std::move(items)); - set_operation_scan->set_column_match_mode( - ResolvedSetOperationScan::CORRESPONDING); - set_operation_scan->set_column_propagation_mode( - ResolvedSetOperationScan::STRICT); + ZETASQL_ASSERT_OK_AND_ASSIGN(auto query_stmt, MakeGroupingSetsResolvedAST( + pool, GroupingSetTestMode::kValid)); + LanguageOptions language_options; + language_options.EnableLanguageFeature(FEATURE_V_1_4_GROUPING_SETS); + Validator validator(language_options); + + ZETASQL_EXPECT_OK(validator.ValidateResolvedStatement(query_stmt.get())); +} +// The ResolvedAggregateScan.grouping_set_list has additional key references +// that are not in the group_by_list. 
+TEST(ValidateTest, InvalidGroupingSetsResolvedASTMissingGroupByKey) { + IdStringPool pool; + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto query_stmt, + MakeGroupingSetsResolvedAST(pool, GroupingSetTestMode::kMissGroupByKey)); LanguageOptions language_options; - language_options.EnableLanguageFeature(FEATURE_V_1_4_CORRESPONDING); - language_options.EnableLanguageFeature( - FEATURE_V_1_4_SET_OPERATION_COLUMN_PROPAGATION_MODE); + language_options.EnableLanguageFeature(FEATURE_V_1_4_GROUPING_SETS); Validator validator(language_options); - std::vector> output_column_list; - output_column_list.push_back( - MakeResolvedOutputColumn("column", result_column)); - std::unique_ptr query_stmt = MakeResolvedQueryStmt( - std::move(output_column_list), - /*is_value_table=*/false, std::move(set_operation_scan)); + absl::Status status = validator.ValidateResolvedStatement(query_stmt.get()); + EXPECT_THAT(status, testing::StatusIs(absl::StatusCode::kInternal)); + EXPECT_THAT(status.message(), HasSubstr("Incorrect reference to column")); +} + +// The ResolvedAggregateScan.grouping_set_list doesn't contain all keys in the +// group_by_list. 
+TEST(ValidateTest, InvalidGroupingSetsResolvedASTMissingGroupByKeyReference) { + IdStringPool pool; + ZETASQL_ASSERT_OK_AND_ASSIGN(auto query_stmt, + MakeGroupingSetsResolvedAST( + pool, GroupingSetTestMode::KMissGroupByKeyRef)); + LanguageOptions language_options; + language_options.EnableLanguageFeature(FEATURE_V_1_4_GROUPING_SETS); + Validator validator(language_options); absl::Status status = validator.ValidateResolvedStatement(query_stmt.get()); EXPECT_THAT(status, testing::StatusIs(absl::StatusCode::kInternal)); - EXPECT_THAT(status.message(), - testing::HasSubstr( - absl::StrCat("STRICT mode should not have node_source = ", - kNodeSourceResolverSetOperationCorresponding))); + EXPECT_THAT(status.message(), HasSubstr("Incorrect reference to column")); +} + +TEST(ValidateTest, ErrorWhenSideEffectColumnIsNotConsumed) { + IdStringPool pool; + + ResolvedColumn main_column(/*column_id=*/1, pool.Make("table"), + pool.Make("main_col"), types::Int64Type()); + ResolvedColumn side_effect_column(/*column_id=*/2, pool.Make("table"), + pool.Make("side_effect"), + types::BytesType()); + + // Manually construct an invalid ResolvedAST where the side effect column + // is not consumed. 
+ Function agg_fn("agg1", "test_group", Function::AGGREGATE); + FunctionSignature sig(/*result_type=*/FunctionArgumentType( + types::Int64Type(), /*num_occurrences=*/1), + /*arguments=*/{}, + /*context_id=*/static_cast(1234)); + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto stmt, + ResolvedQueryStmtBuilder() + .add_output_column_list(ResolvedOutputColumnBuilder() + .set_column(main_column) + .set_name("col1")) + .set_query( + ResolvedAggregateScanBuilder() + .set_input_scan(ResolvedSingleRowScanBuilder()) + .add_column_list(main_column) + .add_aggregate_list( + ResolvedDeferredComputedColumnBuilder() + .set_column(main_column) + .set_side_effect_column(side_effect_column) + .set_expr(ResolvedAggregateFunctionCallBuilder() + .set_type(types::Int64Type()) + .set_function(&agg_fn) + .set_signature(sig)))) + .Build()); + + LanguageOptions options_with_conditional_eval; + options_with_conditional_eval.EnableLanguageFeature( + FEATURE_V_1_4_ENFORCE_CONDITIONAL_EVALUATION); + + // Validate statement + EXPECT_THAT(Validator().ValidateResolvedStatement(stmt.get()), + testing::StatusIs( + absl::StatusCode::kInternal, + HasSubstr("FEATURE_V_1_4_ENFORCE_CONDITIONAL_EVALUATION)"))); + + EXPECT_THAT( + Validator(options_with_conditional_eval) + .ValidateResolvedStatement(stmt.get()), + testing::StatusIs(absl::StatusCode::kInternal, + HasSubstr("unconsumed_side_effect_columns_.empty()"))); + + // Validating a standalone expr + ZETASQL_ASSERT_OK_AND_ASSIGN( + auto expr, ResolvedSubqueryExprBuilder() + .set_type(types::Int64Type()) + .set_subquery_type(ResolvedSubqueryExpr::SCALAR) + .set_subquery(ToBuilder(std::move(stmt)).release_query()) + .Build()); + + EXPECT_THAT(Validator().ValidateStandaloneResolvedExpr(expr.get()), + testing::StatusIs( + absl::StatusCode::kInternal, + HasSubstr("FEATURE_V_1_4_ENFORCE_CONDITIONAL_EVALUATION"))); + EXPECT_THAT( + Validator(options_with_conditional_eval) + .ValidateStandaloneResolvedExpr(expr.get()), + testing::StatusIs(absl::StatusCode::kInternal, + 
HasSubstr("unconsumed_side_effect_columns_.empty()"))); } } // namespace diff --git a/zetasql/scripting/parsed_script.cc b/zetasql/scripting/parsed_script.cc index ed06a9283..25effda5b 100644 --- a/zetasql/scripting/parsed_script.cc +++ b/zetasql/scripting/parsed_script.cc @@ -38,6 +38,7 @@ #include "absl/flags/flag.h" #include "absl/status/statusor.h" #include "absl/strings/str_cat.h" +#include "absl/strings/string_view.h" #include "absl/types/variant.h" #include "zetasql/base/map_util.h" #include "zetasql/base/status.h" @@ -233,7 +234,7 @@ class ValidateVariableDeclarationsVisitor } absl::Status MakeVariableDeclarationError( - const ASTNode* node, const std::string& error_message, + const ASTNode* node, absl::string_view error_message, absl::string_view source_message, const ParseLocationPoint& source_location) { std::string script_text(parsed_script_->script_text()); @@ -247,7 +248,7 @@ class ValidateVariableDeclarationsVisitor } absl::Status MakeVariableDeclarationErrorSkipSourceLocation( - const ASTNode* node, const std::string& error_message, + const ASTNode* node, absl::string_view error_message, absl::string_view source_message) { std::string script_text(parsed_script_->script_text()); const InternalErrorLocation location = SetErrorSourcesFromStatus( diff --git a/zetasql/scripting/script_executor_impl.cc b/zetasql/scripting/script_executor_impl.cc index 09637f82b..bd901d5cd 100644 --- a/zetasql/scripting/script_executor_impl.cc +++ b/zetasql/scripting/script_executor_impl.cc @@ -121,7 +121,7 @@ absl::StatusOr> ScriptExecutorImpl::Create( options.script_variables())); } else { ParserOptions parser_options; - parser_options.set_language_options(&options.language_options()); + parser_options.set_language_options(options.language_options()); ZETASQL_ASSIGN_OR_RETURN( parsed_script, ParsedScript::Create(script, parser_options, error_message_mode, @@ -332,7 +332,7 @@ absl::StatusOr ScriptExecutorImpl::SetupNewException( return zetasql_base::InternalErrorBuilder() 
<< "Engines should not set ScriptException::internal field when " "raising an exception" - << exception.DebugString(); + << absl::StrCat(exception); } ZETASQL_ASSIGN_OR_RETURN(*exception.mutable_internal()->mutable_statement_text(), @@ -1875,7 +1875,7 @@ absl::Status ScriptExecutorImpl::ValidateVariablesOnSetState( ParserOptions ScriptExecutorImpl::GetParserOptions() const { ParserOptions parser_options; - parser_options.set_language_options(&options_.language_options()); + parser_options.set_language_options(options_.language_options()); return parser_options; } @@ -2008,10 +2008,9 @@ absl::Status ScriptExecutorImpl::SetState( ZETASQL_RET_CHECK_NE(next_cfg_node, nullptr) << "Deserialized AST node has no associated control flow node"; - ZETASQL_RETURN_IF_ERROR( - ValidateVariablesOnSetState( - next_cfg_node, new_variables, *parsed_script)) - << state.DebugString(); + ZETASQL_RETURN_IF_ERROR(ValidateVariablesOnSetState(next_cfg_node, new_variables, + *parsed_script)) + << absl::StrCat(state); ZETASQL_RETURN_IF_ERROR( ResetVariableSizes(next_node, new_variables, &new_variable_sizes)); ZETASQL_RETURN_IF_ERROR( diff --git a/zetasql/testdata/BUILD b/zetasql/testdata/BUILD index 04af53640..cd5cd2abd 100644 --- a/zetasql/testdata/BUILD +++ b/zetasql/testdata/BUILD @@ -12,7 +12,6 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
-# package( default_visibility = ["//zetasql/base:zetasql_implementation"], @@ -195,17 +194,21 @@ cc_library( ":test_schema_cc_proto", "//zetasql/base", "//zetasql/base:check", + "//zetasql/base:map_util", "//zetasql/base:ret_check", + "//zetasql/base:source_location", "//zetasql/base:status", "//zetasql/base/testing:status_matchers", "//zetasql/common:errors", "//zetasql/public:analyzer", "//zetasql/public:analyzer_output", "//zetasql/public:anon_function", + "//zetasql/public:builtin_function_options", "//zetasql/public:catalog", "//zetasql/public:cycle_detector", "//zetasql/public:deprecation_warning_cc_proto", "//zetasql/public:error_location_cc_proto", + "//zetasql/public:evaluator_table_iterator", "//zetasql/public:function", "//zetasql/public:function_cc_proto", "//zetasql/public:language_options", @@ -215,7 +218,6 @@ cc_library( "//zetasql/public:simple_catalog_util", "//zetasql/public:sql_function", "//zetasql/public:sql_tvf", - "//zetasql/public:sql_view", "//zetasql/public:strings", "//zetasql/public:templated_sql_function", "//zetasql/public:templated_sql_tvf", @@ -227,11 +229,17 @@ cc_library( "//zetasql/resolved_ast", "//zetasql/resolved_ast:resolved_ast_enums_cc_proto", "//zetasql/resolved_ast:resolved_node_kind_cc_proto", + "@com_google_absl//absl/container:btree", + "@com_google_absl//absl/container:flat_hash_map", + "@com_google_absl//absl/container:flat_hash_set", "@com_google_absl//absl/container:node_hash_map", "@com_google_absl//absl/memory", + "@com_google_absl//absl/status", "@com_google_absl//absl/status:statusor", "@com_google_absl//absl/strings", "@com_google_absl//absl/strings:cord", + "@com_google_absl//absl/time", + "@com_google_absl//absl/types:span", "@com_google_protobuf//:protobuf", ], ) diff --git a/zetasql/testdata/populate_sample_tables.cc b/zetasql/testdata/populate_sample_tables.cc index 5fa6cb2a2..c61358d3f 100644 --- a/zetasql/testdata/populate_sample_tables.cc +++ b/zetasql/testdata/populate_sample_tables.cc @@ -31,10 
+31,15 @@ using zetasql_test__::KitchenSinkPB; using zetasql::test_values::Struct; +using zetasql::values::Bytes; +using zetasql::values::Date; using zetasql::values::Int32; using zetasql::values::Int64; using zetasql::values::Proto; using zetasql::values::String; +using zetasql::values::TimestampFromUnixMicros; +using zetasql::values::Uint32; +using zetasql::values::Uint64; absl::Status PopulateSampleTables(TypeFactory* type_factory, SampleCatalog* catalog) { diff --git a/zetasql/testdata/sample_catalog.cc b/zetasql/testdata/sample_catalog.cc index 55c7b9d3a..55d5002cc 100644 --- a/zetasql/testdata/sample_catalog.cc +++ b/zetasql/testdata/sample_catalog.cc @@ -18,6 +18,7 @@ #include #include +#include #include #include #include @@ -25,17 +26,17 @@ #include #include "zetasql/base/logging.h" -#include "google/protobuf/descriptor.h" -#include "google/protobuf/descriptor_database.h" #include "zetasql/common/errors.h" #include "zetasql/public/analyzer.h" #include "zetasql/public/analyzer_output.h" #include "zetasql/public/annotation/collation.h" #include "zetasql/public/anon_function.h" +#include "zetasql/public/builtin_function_options.h" #include "zetasql/public/catalog.h" #include "zetasql/public/cycle_detector.h" #include "zetasql/public/deprecation_warning.pb.h" #include "zetasql/public/error_location.pb.h" +#include "zetasql/public/evaluator_table_iterator.h" #include "zetasql/public/function.h" #include "zetasql/public/function.pb.h" #include "zetasql/public/function_signature.h" @@ -46,13 +47,13 @@ #include "zetasql/public/simple_catalog_util.h" #include "zetasql/public/sql_function.h" #include "zetasql/public/sql_tvf.h" -#include "zetasql/public/sql_view.h" #include "zetasql/public/strings.h" #include "zetasql/public/table_valued_function.h" #include "zetasql/public/templated_sql_function.h" #include "zetasql/public/templated_sql_tvf.h" #include "zetasql/public/type.pb.h" #include "zetasql/public/types/annotation.h" +#include 
"zetasql/public/types/simple_value.h" #include "zetasql/public/types/type.h" #include "zetasql/public/types/type_factory.h" #include "zetasql/public/value.h" @@ -63,17 +64,28 @@ #include "zetasql/testdata/referenced_schema.pb.h" #include "zetasql/testdata/sample_annotation.h" #include "zetasql/testdata/test_proto3.pb.h" +#include "absl/container/btree_map.h" #include "zetasql/base/testing/status_matchers.h" +#include "absl/container/flat_hash_map.h" +#include "absl/container/flat_hash_set.h" #include "zetasql/base/check.h" #include "absl/memory/memory.h" +#include "absl/status/status.h" #include "absl/status/statusor.h" #include "absl/strings/ascii.h" #include "absl/strings/cord.h" #include "absl/strings/str_cat.h" #include "absl/strings/string_view.h" +#include "absl/time/time.h" +#include "zetasql/base/source_location.h" +#include "absl/types/span.h" +#include "google/protobuf/descriptor.h" +#include "google/protobuf/descriptor_database.h" +#include "google/protobuf/message.h" +#include "zetasql/base/map_util.h" #include "zetasql/base/ret_check.h" -#include "zetasql/base/status.h" #include "zetasql/base/status_builder.h" +#include "zetasql/base/status_macros.h" namespace zetasql { @@ -250,7 +262,7 @@ const EnumType* SampleCatalog::GetEnumType( static absl::StatusOr ComputeResultTypeCallbackForNullOfType( Catalog* catalog, TypeFactory* type_factory, CycleDetector* cycle_detector, const FunctionSignature& signature, - const std::vector& arguments, + absl::Span arguments, const AnalyzerOptions& analyzer_options) { ZETASQL_RET_CHECK_EQ(arguments.size(), 1); ZETASQL_RET_CHECK_EQ(signature.NumConcreteArguments(), arguments.size()); @@ -278,7 +290,7 @@ static absl::StatusOr ComputeResultTypeCallbackForNullOfType( static absl::StatusOr ComputeResultTypeFromStringArgumentValue( Catalog* catalog, TypeFactory* type_factory, CycleDetector* cycle_detector, const FunctionSignature& signature, - const std::vector& arguments, + absl::Span arguments, const AnalyzerOptions& 
analyzer_options) { ZETASQL_RET_CHECK_EQ(signature.NumConcreteArguments(), arguments.size()); const LanguageOptions& language_options = analyzer_options.language(); @@ -342,7 +354,7 @@ static absl::StatusOr ComputeResultTypeCallbackToStructUseArgumentAliases( Catalog* catalog, TypeFactory* type_factory, CycleDetector* cycle_detector, const FunctionSignature& signature, - const std::vector& arguments, + absl::Span arguments, const AnalyzerOptions& analyzer_options) { const StructType* struct_type; std::vector struct_fields; @@ -598,6 +610,7 @@ void SampleCatalog::LoadCatalogImpl(const LanguageOptions& language_options) { LoadTemplatedSQLUDFs(); LoadTableValuedFunctions1(); LoadTableValuedFunctions2(); + LoadTableValuedFunctionsWithEvaluators(); LoadTVFWithExtraColumns(); LoadDescriptorTableValuedFunctions(); LoadConnectionTableValuedFunctions(); @@ -651,8 +664,7 @@ void SampleCatalog::LoadTypes() { {{"a", types_->get_int32()}, {"b", types_->get_string()}}, &struct_type_)); ZETASQL_CHECK_OK(types_->MakeStructType( - {{"c", types_->get_int32()}, {"d", struct_type_}}, - &nested_struct_type_)); + {{"c", types_->get_int32()}, {"d", struct_type_}}, &nested_struct_type_)); ZETASQL_CHECK_OK(types_->MakeStructType( {{"e", types_->get_int32()}, {"f", nested_struct_type_}}, &doubly_nested_struct_type_)); @@ -672,10 +684,16 @@ void SampleCatalog::LoadTypes() { ZETASQL_CHECK_OK(types_->MakeArrayType(proto_TestExtraPB_, &proto_array_type_)); ZETASQL_CHECK_OK(types_->MakeArrayType(struct_type_, &struct_array_type_)); ZETASQL_CHECK_OK(types_->MakeArrayType(types_->get_json(), &json_array_type_)); + ZETASQL_CHECK_OK(types_->MakeArrayType(types_->get_numeric(), &numeric_array_type_)); + ZETASQL_CHECK_OK( + types_->MakeArrayType(types_->get_bignumeric(), &bignumeric_array_type_)); + ZETASQL_CHECK_OK( + types_->MakeArrayType(types_->get_interval(), &interval_array_type_)); - ZETASQL_CHECK_OK(types_->MakeStructType( - {{"x", types_->get_int64()}, {"y", struct_type_}, - {"z", 
struct_array_type_}}, &struct_with_array_field_type_)); + ZETASQL_CHECK_OK(types_->MakeStructType({{"x", types_->get_int64()}, + {"y", struct_type_}, + {"z", struct_array_type_}}, + &struct_with_array_field_type_)); ZETASQL_CHECK_OK(types_->MakeStructType({{"x", types_->get_int64()}}, &struct_with_one_field_type_)); @@ -796,10 +814,12 @@ void SampleCatalog::LoadTables() { SimpleTable* key_value_table = new SimpleTable( "KeyValue", {{"Key", types_->get_int64()}, {"Value", types_->get_string()}}); + key_value_table->SetContents({{Value::Int64(1), Value::String("a")}, + {Value::Int64(2), Value::String("b")}}); + AddOwnedTable(key_value_table); key_value_table_ = key_value_table; - SimpleTable* multiple_columns_table = new SimpleTable("MultipleColumns", {{"int_a", types_->get_int64()}, {"string_a", types_->get_string()}, @@ -824,12 +844,10 @@ void SampleCatalog::LoadTables() { AddOwnedTable(update_to_default_table); SimpleTable* ab_table = new SimpleTable( - "abTable", - {{"a", types_->get_int64()}, {"b", types_->get_string()}}); + "abTable", {{"a", types_->get_int64()}, {"b", types_->get_string()}}); AddOwnedTable(ab_table); SimpleTable* bc_table = new SimpleTable( - "bcTable", - {{"b", types_->get_int64()}, {"c", types_->get_string()}}); + "bcTable", {{"b", types_->get_int64()}, {"c", types_->get_string()}}); AddOwnedTable(bc_table); SimpleTable* key_value_table_read_time_ignored = @@ -1611,11 +1629,9 @@ void SampleCatalog::LoadProtoTables() { catalog_->AddOwnedTable( new SimpleTable("TestExtraPBValueTable", proto_TestExtraPB_)); - catalog_->AddOwnedTable( - new SimpleTable("TestAbPBValueTable", proto_abPB_)); + catalog_->AddOwnedTable(new SimpleTable("TestAbPBValueTable", proto_abPB_)); - catalog_->AddOwnedTable( - new SimpleTable("TestBcPBValueTable", proto_bcPB_)); + catalog_->AddOwnedTable(new SimpleTable("TestBcPBValueTable", proto_bcPB_)); catalog_->AddOwnedTable(new SimpleTable("TestBcPBValueProtoTable", {{"value", proto_bcPB_}, @@ -1682,7 +1698,10 @@ void 
SampleCatalog::LoadProtoTables() { {"TimestampArray", timestamp_array_type_}, {"ProtoArray", proto_array_type_}, {"StructArray", struct_array_type_}, - {"JsonArray", json_array_type_}})); + {"JsonArray", json_array_type_}, + {"NumericArray", numeric_array_type_}, + {"BigNumericArray", bignumeric_array_type_}, + {"IntervalArray", interval_array_type_}})); const EnumType* enum_TestEnum = GetEnumType(zetasql_test__::TestEnum_descriptor()); @@ -1813,9 +1832,8 @@ void SampleCatalog::LoadNestedCatalogs() { {types_->get_int64(), {types_->get_int64()}, /*context_id=*/-1}); std::vector function_name_path = {"nested_catalog", "nested_function"}; - Function* function = - new Function(function_name_path, "sample_functions", - Function::SCALAR, {signature}); + Function* function = new Function(function_name_path, "sample_functions", + Function::SCALAR, {signature}); nested_catalog->AddOwnedFunction(function); // A scalar function with argument alias support in the nested catalog. @@ -1841,8 +1859,8 @@ void SampleCatalog::LoadNestedCatalogs() { // nested_catalog.nested_nested_catalog.nested_function() -> SimpleCatalog* nested_nested_catalog = nested_catalog->MakeOwnedSimpleCatalog("nested_nested_catalog"); - function_name_path = - {"nested_catalog", "nested_nested_catalog", "nested_function"}; + function_name_path = {"nested_catalog", "nested_nested_catalog", + "nested_function"}; function = new Function(function_name_path, "sample_functions", Function::SCALAR, {signature}); nested_nested_catalog->AddOwnedFunction(function); @@ -1918,17 +1936,17 @@ void SampleCatalog::LoadNestedCatalogs() { SimpleCatalog* name_conflict_catalog = catalog_->MakeOwnedSimpleCatalog("name_conflict_table"); std::unique_ptr constant; - ZETASQL_CHECK_OK(SimpleConstant::Create( - {"name_conflict_table", "name_conflict_field"}, Value::Bool(false), - &constant)); + ZETASQL_CHECK_OK( + SimpleConstant::Create({"name_conflict_table", "name_conflict_field"}, + Value::Bool(false), &constant)); 
name_conflict_catalog->AddOwnedConstant(constant.release()); // Add for testing named constants in catalogs. SimpleCatalog* nested_catalog_with_constant = catalog_->MakeOwnedSimpleCatalog("nested_catalog_with_constant"); - ZETASQL_CHECK_OK(SimpleConstant::Create( - {"nested_catalog_with_constant", "KnownConstant"}, Value::Bool(false), - &constant)); + ZETASQL_CHECK_OK( + SimpleConstant::Create({"nested_catalog_with_constant", "KnownConstant"}, + Value::Bool(false), &constant)); nested_catalog_with_constant->AddOwnedConstant(constant.release()); // Add for testing conflicts with named @@ -1939,9 +1957,8 @@ void SampleCatalog::LoadNestedCatalogs() { {"nested_catalog_with_catalog", "TestConstantBool"}, Value::Bool(false), &constant)); nested_catalog_with_catalog->AddOwnedConstant(constant.release()); - ZETASQL_CHECK_OK(SimpleConstant::Create( - {"nested_catalog_with_catalog", "c"}, Value::Double(-9999.999), - &constant)); + ZETASQL_CHECK_OK(SimpleConstant::Create({"nested_catalog_with_catalog", "c"}, + Value::Double(-9999.999), &constant)); nested_catalog_with_catalog->AddOwnedConstant(constant.release()); SimpleCatalog* nested_catalog_catalog = nested_catalog_with_catalog->MakeOwnedSimpleCatalog( @@ -1956,15 +1973,13 @@ void SampleCatalog::LoadNestedCatalogs() { nested_catalog_catalog->AddOwnedConstant(constant.release()); // Add a constant to . - ZETASQL_CHECK_OK(SimpleConstant::Create( - {"nested_catalog", "TestConstantBool"}, Value::Bool(false), - &constant)); + ZETASQL_CHECK_OK(SimpleConstant::Create({"nested_catalog", "TestConstantBool"}, + Value::Bool(false), &constant)); nested_catalog->AddOwnedConstant(constant.release()); // Add another constant to that conflicts with a procedure. 
- ZETASQL_CHECK_OK(SimpleConstant::Create( - {"nested_catalog", "nested_procedure"}, Value::Int64(2345), - &constant)); + ZETASQL_CHECK_OK(SimpleConstant::Create({"nested_catalog", "nested_procedure"}, + Value::Int64(2345), &constant)); nested_catalog->AddOwnedConstant(constant.release()); // Add a constant to which requires backticks. @@ -2098,44 +2113,44 @@ const Function* SampleCatalog::AddFunction( void SampleCatalog::LoadFunctions() { // Add a function to illustrate how repeated/optional arguments are resolved. - Function* function = new Function("test_function", "sample_functions", - Function::SCALAR); + Function* function = + new Function("test_function", "sample_functions", Function::SCALAR); function->AddSignature( {types_->get_int64(), - {{types_->get_int64(), FunctionArgumentType::REQUIRED}, - {types_->get_int64(), FunctionArgumentType::REPEATED}, - {types_->get_int64(), FunctionArgumentType::REPEATED}, - {types_->get_int64(), FunctionArgumentType::REQUIRED}, - {types_->get_int64(), FunctionArgumentType::OPTIONAL}}, - /*context_id=*/-1}); + {{types_->get_int64(), FunctionArgumentType::REQUIRED}, + {types_->get_int64(), FunctionArgumentType::REPEATED}, + {types_->get_int64(), FunctionArgumentType::REPEATED}, + {types_->get_int64(), FunctionArgumentType::REQUIRED}, + {types_->get_int64(), FunctionArgumentType::OPTIONAL}}, + /*context_id=*/-1}); catalog_->AddOwnedFunction(function); - function = new Function( - "volatile_function", "sample_functions", Function::SCALAR, - {{types_->get_int64(), - {{types_->get_int64(), FunctionArgumentType::REQUIRED}}, - /*context_id=*/-1}}, - FunctionOptions().set_volatility(FunctionEnums::VOLATILE)); + function = + new Function("volatile_function", "sample_functions", Function::SCALAR, + {{types_->get_int64(), + {{types_->get_int64(), FunctionArgumentType::REQUIRED}}, + /*context_id=*/-1}}, + FunctionOptions().set_volatility(FunctionEnums::VOLATILE)); catalog_->AddOwnedFunction(function); - function = new Function( - 
"stable_function", "sample_functions", Function::SCALAR, - {{types_->get_int64(), - {{types_->get_int64(), FunctionArgumentType::REQUIRED}}, - /*context_id=*/-1}}, - FunctionOptions().set_volatility(FunctionEnums::STABLE)); + function = + new Function("stable_function", "sample_functions", Function::SCALAR, + {{types_->get_int64(), + {{types_->get_int64(), FunctionArgumentType::REQUIRED}}, + /*context_id=*/-1}}, + FunctionOptions().set_volatility(FunctionEnums::STABLE)); catalog_->AddOwnedFunction(function); // Add a function that takes a specific proto as an argument. - function = new Function("fn_on_KitchenSinkPB", "sample_functions", - Function::SCALAR); + function = + new Function("fn_on_KitchenSinkPB", "sample_functions", Function::SCALAR); function->AddSignature( {types_->get_bool(), {proto_KitchenSinkPB_}, /*context_id=*/-1}); catalog_->AddOwnedFunction(function); // Add a function that takes a specific enum as an argument. - function = new Function("fn_on_TestEnum", "sample_functions", - Function::SCALAR); + function = + new Function("fn_on_TestEnum", "sample_functions", Function::SCALAR); function->AddSignature( {types_->get_bool(), {enum_TestEnum_}, /*context_id=*/-1}); catalog_->AddOwnedFunction(function); @@ -2161,22 +2176,22 @@ void SampleCatalog::LoadFunctions() { catalog_->AddOwnedFunction(function); // Add a function that takes any type enum. - function = new Function("fn_on_any_enum", "sample_functions", - Function::SCALAR); + function = + new Function("fn_on_any_enum", "sample_functions", Function::SCALAR); function->AddSignature( {types_->get_bool(), {ARG_ENUM_ANY}, /*context_id=*/-1}); catalog_->AddOwnedFunction(function); // Add a function that takes any type proto. 
- function = new Function("fn_on_any_proto", "sample_functions", - Function::SCALAR); + function = + new Function("fn_on_any_proto", "sample_functions", Function::SCALAR); function->AddSignature( {types_->get_bool(), {ARG_PROTO_ANY}, /*context_id=*/-1}); catalog_->AddOwnedFunction(function); // Add a function that takes any type struct. - function = new Function("fn_on_any_struct", "sample_functions", - Function::SCALAR); + function = + new Function("fn_on_any_struct", "sample_functions", Function::SCALAR); function->AddSignature( {types_->get_bool(), {ARG_STRUCT_ANY}, /*context_id=*/-1}); catalog_->AddOwnedFunction(function); @@ -2394,10 +2409,9 @@ void SampleCatalog::LoadFunctions() { .Build()}); // Adds an aggregate function that takes no argument but supports order by. - function = new Function( - "sort_count", "sample_functions", Function::AGGREGATE, - {{types_->get_int64(), {}, /*context_id=*/-1}}, - FunctionOptions().set_supports_order_by(true)); + function = new Function("sort_count", "sample_functions", Function::AGGREGATE, + {{types_->get_int64(), {}, /*context_id=*/-1}}, + FunctionOptions().set_supports_order_by(true)); catalog_->AddOwnedFunction(function); // Adds an aggregate function that takes multiple arguments and supports @@ -2471,27 +2485,25 @@ void SampleCatalog::LoadFunctions2() { /*window_framing_support_in=*/false)); catalog_->AddOwnedFunction(function); - function = new Function( - "afn_no_order_no_frame", "sample_functions", Function::ANALYTIC, - function_signatures, - FunctionOptions(FunctionOptions::ORDER_UNSUPPORTED, - /*window_framing_support_in=*/false)); + function = new Function("afn_no_order_no_frame", "sample_functions", + Function::ANALYTIC, function_signatures, + FunctionOptions(FunctionOptions::ORDER_UNSUPPORTED, + /*window_framing_support_in=*/false)); catalog_->AddOwnedFunction(function); - function = new Function( - "afn_agg", "sample_functions", Function::AGGREGATE, function_signatures, - 
FunctionOptions(FunctionOptions::ORDER_OPTIONAL, - /*window_framing_support_in=*/true)); + function = new Function("afn_agg", "sample_functions", Function::AGGREGATE, + function_signatures, + FunctionOptions(FunctionOptions::ORDER_OPTIONAL, + /*window_framing_support_in=*/true)); catalog_->AddOwnedFunction(function); - function = new Function( - "afn_null_handling", "sample_functions", Function::AGGREGATE, - function_signatures, - FunctionOptions(FunctionOptions::ORDER_OPTIONAL, - /*window_framing_support_in=*/false) - .set_supports_order_by(true) - .set_supports_limit(true) - .set_supports_null_handling_modifier(true)); + function = new Function("afn_null_handling", "sample_functions", + Function::AGGREGATE, function_signatures, + FunctionOptions(FunctionOptions::ORDER_OPTIONAL, + /*window_framing_support_in=*/false) + .set_supports_order_by(true) + .set_supports_limit(true) + .set_supports_null_handling_modifier(true)); catalog_->AddOwnedFunction(function); // NULL_OF_TYPE(string) -> (a NULL of type matching the named simple type). @@ -2691,20 +2703,20 @@ void SampleCatalog::LoadFunctions2() { // specified positionally. function = new Function("fn_named_args_error_if_positional_first_arg", "sample_functions", mode); - function->AddSignature({types_->get_bool(), - {named_required_format_arg_error_if_positional, - named_required_date_arg}, - /*context_id=*/-1}); + function->AddSignature( + {types_->get_bool(), + {named_required_format_arg_error_if_positional, named_required_date_arg}, + /*context_id=*/-1}); catalog_->AddOwnedFunction(function); // Add a function with two named arguments where the second may not be // specified positionally. 
function = new Function("fn_named_args_error_if_positional_second_arg", "sample_functions", mode); - function->AddSignature({types_->get_bool(), - {named_required_format_arg, - named_required_date_arg_error_if_positional}, - /*context_id=*/-1}); + function->AddSignature( + {types_->get_bool(), + {named_required_format_arg, named_required_date_arg_error_if_positional}, + /*context_id=*/-1}); catalog_->AddOwnedFunction(function); // Add a function with two named arguments, one required and one optional, @@ -2742,8 +2754,8 @@ void SampleCatalog::LoadFunctions2() { // Add a function with two signatures, one using regular arguments and one // using named arguments that cannot be specified positionally. - function = new Function("fn_regular_and_named_signatures", - "sample_functions", mode); + function = + new Function("fn_regular_and_named_signatures", "sample_functions", mode); function->AddSignature( {types_->get_bool(), {{types_->get_string(), FunctionArgumentType::REQUIRED}, @@ -2768,12 +2780,11 @@ void SampleCatalog::LoadFunctions2() { /*context_id=*/-1}); catalog_->AddOwnedFunction(function); - // A FunctionSignatureArgumentConstraintsCallback that checks for NULL // arguments. auto sanity_check_nonnull_arg_constraints = [](const FunctionSignature& signature, - const std::vector<InputArgumentType>& arguments) -> std::string { + absl::Span<const InputArgumentType> arguments) -> std::string { ABSL_CHECK(signature.IsConcrete()); ABSL_CHECK_EQ(signature.NumConcreteArguments(), arguments.size()); for (int i = 0; i < arguments.size(); ++i) { @@ -2796,25 +2807,24 @@ // INT64 arguments to be nonnegative if they are literals.
auto post_resolution_arg_constraints = [](const FunctionSignature& signature, - const std::vector<InputArgumentType>& arguments, + absl::Span<const InputArgumentType> arguments, const LanguageOptions& language_options) -> absl::Status { - for (int i = 0; i < arguments.size(); ++i) { - ABSL_CHECK( - arguments[i].type()->Equals(signature.ConcreteArgumentType(i))); - if (!arguments[i].type()->IsInt64() || !arguments[i].is_literal()) { - continue; - } - if (arguments[i].literal_value()->int64_value() < 0) { - return MakeSqlError() - << "Argument " - << (signature.ConcreteArgument(i).has_argument_name() - ? signature.ConcreteArgument(i).argument_name() - : std::to_string(i+1)) - << " must not be negative"; - } - } - return absl::OkStatus(); - }; + for (int i = 0; i < arguments.size(); ++i) { + ABSL_CHECK(arguments[i].type()->Equals(signature.ConcreteArgumentType(i))); + if (!arguments[i].type()->IsInt64() || !arguments[i].is_literal()) { + continue; + } + if (arguments[i].literal_value()->int64_value() < 0) { + return MakeSqlError() + << "Argument " + << (signature.ConcreteArgument(i).has_argument_name() + ? signature.ConcreteArgument(i).argument_name() + : std::to_string(i + 1)) + << " must not be negative"; + } + } + return absl::OkStatus(); + }; // Add a function with an argument constraint that verifies the concrete // arguments in signature matches the input argument list, and rejects @@ -2859,10 +2869,9 @@ catalog_->AddOwnedFunction(function); // Adds a templated function that generates its result type via the callback.
- function = new Function( - "fn_result_type_from_arg", "sample_functions", mode, - FunctionOptions().set_compute_result_type_callback( - &ComputeResultTypeFromStringArgumentValue)); + function = new Function("fn_result_type_from_arg", "sample_functions", mode, + FunctionOptions().set_compute_result_type_callback( + &ComputeResultTypeFromStringArgumentValue)); function->AddSignature( {{types_->get_string()}, {{types_->get_string(), @@ -3532,11 +3541,11 @@ void SampleCatalog::LoadFunctionsWithDefaultArguments() { /*arguments=*/ { {types_->get_string(), - FunctionArgumentTypeOptions() - .set_cardinality(FunctionArgumentType::REQUIRED)}, + FunctionArgumentTypeOptions().set_cardinality( + FunctionArgumentType::REQUIRED)}, {types_->get_string(), - FunctionArgumentTypeOptions() - .set_cardinality(FunctionArgumentType::OPTIONAL)}, + FunctionArgumentTypeOptions().set_cardinality( + FunctionArgumentType::OPTIONAL)}, {types_->get_string(), FunctionArgumentTypeOptions() .set_cardinality(FunctionArgumentType::OPTIONAL) @@ -3678,15 +3687,14 @@ void SampleCatalog::LoadFunctionsWithDefaultArguments() { /*result_type=*/ARG_TYPE_RELATION, /*arguments=*/ { - {ARG_TYPE_RELATION, - FunctionArgumentTypeOptions() - .set_cardinality(FunctionArgumentType::REQUIRED)}, + {ARG_TYPE_RELATION, FunctionArgumentTypeOptions().set_cardinality( + FunctionArgumentType::REQUIRED)}, {types_->get_bool(), - FunctionArgumentTypeOptions() - .set_cardinality(FunctionArgumentType::REQUIRED)}, + FunctionArgumentTypeOptions().set_cardinality( + FunctionArgumentType::REQUIRED)}, {types_->get_string(), - FunctionArgumentTypeOptions() - .set_cardinality(FunctionArgumentType::OPTIONAL)}, + FunctionArgumentTypeOptions().set_cardinality( + FunctionArgumentType::OPTIONAL)}, {types_->get_float(), FunctionArgumentTypeOptions() .set_cardinality(FunctionArgumentType::OPTIONAL) @@ -3704,15 +3712,13 @@ void SampleCatalog::LoadFunctionsWithDefaultArguments() { /*result_type=*/ARG_TYPE_RELATION, /*arguments=*/ { - 
{ARG_TYPE_RELATION, - FunctionArgumentTypeOptions() - .set_cardinality(FunctionArgumentType::REQUIRED)}, + {ARG_TYPE_RELATION, FunctionArgumentTypeOptions().set_cardinality( + FunctionArgumentType::REQUIRED)}, {types_->get_bool(), - FunctionArgumentTypeOptions() - .set_cardinality(FunctionArgumentType::REQUIRED)}, - {ARG_TYPE_ANY_1, - FunctionArgumentTypeOptions() - .set_cardinality(FunctionArgumentType::OPTIONAL)}, + FunctionArgumentTypeOptions().set_cardinality( + FunctionArgumentType::REQUIRED)}, + {ARG_TYPE_ANY_1, FunctionArgumentTypeOptions().set_cardinality( + FunctionArgumentType::OPTIONAL)}, {ARG_TYPE_ANY_2, FunctionArgumentTypeOptions() .set_cardinality(FunctionArgumentType::OPTIONAL) @@ -4100,6 +4106,45 @@ void SampleCatalog::LoadTemplatedSQLUDFs() { /*argument_names=*/{"x"}, ParseResumeLocation::FromString("999999999999999"))); + catalog_->AddOwnedFunction(std::make_unique<TemplatedSQLFunction>( + std::vector<std::string>{"udf_any_and_string_args_return_string_arg"}, + FunctionSignature(FunctionArgumentType(ARG_TYPE_ARBITRARY, + FunctionArgumentType::REQUIRED), + {FunctionArgumentType(ARG_TYPE_ARBITRARY, + FunctionArgumentType::REQUIRED), + FunctionArgumentType(types::StringType(), + FunctionArgumentType::REQUIRED)}, + context_id++), + /*argument_names=*/std::vector<std::string>{"a", "x"}, + ParseResumeLocation::FromString("x"))); + + catalog_->AddOwnedFunction(std::make_unique<TemplatedSQLFunction>( + std::vector<std::string>{"udf_any_and_double_args_return_any"}, + FunctionSignature(FunctionArgumentType(ARG_TYPE_ARBITRARY, + FunctionArgumentType::REQUIRED), + {FunctionArgumentType(ARG_TYPE_ARBITRARY, + FunctionArgumentType::REQUIRED), + FunctionArgumentType(types::DoubleType(), + FunctionArgumentType::REQUIRED)}, + context_id++), + /*argument_names=*/std::vector<std::string>{"a", "x"}, + ParseResumeLocation::FromString("IF(x < 0, 'a', 'b')"))); + + const ArrayType* double_array_type = nullptr; + ZETASQL_CHECK_OK( + type_factory()->MakeArrayType(types::DoubleType(), &double_array_type)); + catalog_->AddOwnedFunction(std::make_unique<TemplatedSQLFunction>(
std::vector<std::string>{"udf_any_and_double_array_args_return_any"}, + FunctionSignature(FunctionArgumentType(ARG_TYPE_ARBITRARY, + FunctionArgumentType::REQUIRED), + {FunctionArgumentType(ARG_TYPE_ARBITRARY, + FunctionArgumentType::REQUIRED), + FunctionArgumentType(double_array_type, + FunctionArgumentType::REQUIRED)}, + context_id++), + /*argument_names=*/std::vector<std::string>{"a", "x"}, + ParseResumeLocation::FromString("IF(x[SAFE_OFFSET(0)] < 0, 'a', 'b')"))); + // Add a SQL UDA with a valid templated SQL body that refers to an aggregate // argument only. FunctionArgumentType int64_aggregate_arg_type(types::Int64Type()); @@ -4158,6 +4203,51 @@ ParseResumeLocation::FromString("sum(x order by x)"), Function::AGGREGATE)); + FunctionArgumentTypeOptions required_non_agg_options = + FunctionArgumentTypeOptions(FunctionArgumentType::REQUIRED) + .set_is_not_aggregate(true); + catalog_->AddOwnedFunction(std::make_unique<TemplatedSQLFunction>( + std::vector<std::string>{"uda_any_and_string_args_return_string"}, + FunctionSignature( + FunctionArgumentType(ARG_TYPE_ARBITRARY, + FunctionArgumentType::REQUIRED), + {FunctionArgumentType(ARG_TYPE_ARBITRARY, + FunctionArgumentType::REQUIRED), + FunctionArgumentType(types::StringType(), required_non_agg_options)}, + context_id++), + /*argument_names=*/std::vector<std::string>{"a", "x"}, + ParseResumeLocation::FromString("IF(LOGICAL_OR(a), x, x || '_suffix')"), + Function::AGGREGATE)); + + catalog_->AddOwnedFunction(std::make_unique<TemplatedSQLFunction>( + std::vector<std::string>{"uda_any_and_double_args_return_any"}, + FunctionSignature( + FunctionArgumentType(ARG_TYPE_ARBITRARY, + FunctionArgumentType::REQUIRED), + {FunctionArgumentType(ARG_TYPE_ARBITRARY, required_non_agg_options), + FunctionArgumentType(types::DoubleType(), + FunctionArgumentType::REQUIRED)}, + context_id++), + /*argument_names=*/std::vector<std::string>{"a", "x"}, + ParseResumeLocation::FromString("STRING_AGG(IF(x < 0, 'a', 'b'))"), + Function::AGGREGATE)); + + ZETASQL_CHECK_OK(
type_factory()->MakeArrayType(types::DoubleType(), &double_array_type)); + catalog_->AddOwnedFunction(std::make_unique<TemplatedSQLFunction>( + std::vector<std::string>{"uda_any_and_double_array_args_return_any"}, + FunctionSignature( + FunctionArgumentType(ARG_TYPE_ARBITRARY, + FunctionArgumentType::REQUIRED), + {FunctionArgumentType(ARG_TYPE_ARBITRARY, + FunctionArgumentType::REQUIRED), + FunctionArgumentType(double_array_type, required_non_agg_options)}, + context_id++), + /*argument_names=*/std::vector<std::string>{"a", "x"}, + ParseResumeLocation::FromString( + "IF(x[SAFE_OFFSET(0)] < 0, MAX(a), MIN(a))"), + Function::AGGREGATE)); + // This function template cannot be invoked because the UDA does not have // type information for the `GROUP_ROWS()` TVF. We added it here to reproduce // unhelpful error messages. @@ -4331,8 +4421,7 @@ static std::vector GetOutputColumnsForAllTypes( {kTypeUInt64, kColumnNameUInt64, types->get_uint64()}}; } -static std::vector GetTVFArgumentsForAllTypes( - TypeFactory* types) { +static std::vector GetTVFArgumentsForAllTypes(TypeFactory* types) { return {{kTypeBool, types->get_bool()}, {kTypeBytes, types->get_bytes()}, {kTypeDate, types->get_date()}, @@ -4847,8 +4936,7 @@ void SampleCatalog::LoadTableValuedFunctions2() { TVFRelation({TVFRelation::Column( kMyEnum, GetEnumType(zetasql_test__::TestEnum_descriptor()))}), - /*extra_relation_input_columns_allowed=*/true - )}, + /*extra_relation_input_columns_allowed=*/true)}, context_id++), output_schema_two_types)); @@ -5146,6 +5234,518 @@ void SampleCatalog::LoadTableValuedFunctions2() { } } // NOLINT(readability/fn_size) +// Tests handling of optional relation arguments. +// If the relation is present, it doubles the value of each row. +// If the input relation is absent, returns a single zero. +// +// It has one optional value table argument with a single int64_t column. +// The output schema is also an int64_t value table.
+class TvfOptionalRelation : public FixedOutputSchemaTVF { + class TvfOptionalRelationIterator : public EvaluatorTableIterator { + public: + explicit TvfOptionalRelationIterator( + std::unique_ptr<EvaluatorTableIterator> input) + : input_(std::move(input)) {} + + bool NextRow() override { + if (!input_) { + if (rows_returned_ > 0) { + return false; + } + value_ = values::Int64(0); + ++rows_returned_; + return true; + } + + if (!input_->NextRow()) { + return false; + } + + value_ = values::Int64(input_->GetValue(0).int64_value() * 2); + ++rows_returned_; + return true; + } + + int NumColumns() const override { return 1; } + std::string GetColumnName(int i) const override { + ABSL_DCHECK_EQ(i, 0); + return ""; + } + const Type* GetColumnType(int i) const override { + ABSL_DCHECK_EQ(i, 0); + return types::Int64Type(); + } + const Value& GetValue(int i) const override { + ABSL_DCHECK_EQ(i, 0); + return value_; + } + absl::Status Status() const override { + return input_ ? input_->Status() : absl::OkStatus(); + } + absl::Status Cancel() override { return input_->Cancel(); } + + private: + int64_t rows_returned_ = 0; + std::unique_ptr<EvaluatorTableIterator> input_; + Value value_; + }; + + public: + explicit TvfOptionalRelation() + : FixedOutputSchemaTVF( + {R"(tvf_optional_relation)"}, + FunctionSignature( + FunctionArgumentType::RelationWithSchema( + TVFRelation::ValueTable(types::Int64Type()), + /*extra_relation_input_columns_allowed=*/false), + {FunctionArgumentType( + ARG_TYPE_RELATION, + FunctionArgumentTypeOptions( + TVFRelation::ValueTable(types::Int64Type()), + /*extra_relation_input_columns_allowed=*/true) + .set_cardinality(FunctionArgumentType::OPTIONAL))}, + /*context_id=*/-1), + TVFRelation::ValueTable(types::Int64Type())) {} + + absl::StatusOr<std::unique_ptr<EvaluatorTableIterator>> CreateEvaluator( + std::vector<TvfEvaluatorArg> input_arguments, + const std::vector<TVFSchemaColumn>& output_columns, + const FunctionSignature* function_call_signature) const override { + ZETASQL_RET_CHECK_LE(input_arguments.size(), 1); + + std::unique_ptr<EvaluatorTableIterator> input; + if (input_arguments.size()
== 1) { + ZETASQL_RET_CHECK(input_arguments[0].relation); + input = std::move(input_arguments[0].relation); + ZETASQL_RET_CHECK_EQ(input->NumColumns(), 1); + ZETASQL_RET_CHECK_EQ(input->GetColumnType(0), types::Int64Type()); + } + + ZETASQL_RET_CHECK_EQ(output_columns.size(), 1); + return std::make_unique<TvfOptionalRelationIterator>(std::move(input)); + } +}; + +// Tests handling of optional scalar and named arguments. +// +// Calculates and emits the value of y = x*a + b, `steps` number of times, +// incrementing x by `dx` each time. +class TvfOptionalArguments : public FixedOutputSchemaTVF { + class Evaluator : public EvaluatorTableIterator { + public: + Evaluator(double x, int64_t a, int64_t b, double dx, int64_t steps) + : x_(x), a_(a), b_(b), dx_(dx), steps_(steps) {} + + bool NextRow() override { + if (current_step_ >= steps_) { + return false; + } + + value_ = values::Double(x_ * a_ + b_); + + ++current_step_; + x_ += dx_; + return true; + } + int NumColumns() const override { return 1; } + std::string GetColumnName(int i) const override { + ABSL_DCHECK_EQ(i, 0); + return "y"; + } + const Type* GetColumnType(int i) const override { + ABSL_DCHECK_EQ(i, 0); + return types::DoubleType(); + } + const Value& GetValue(int i) const override { + ABSL_DCHECK_EQ(i, 0); + return value_; + } + absl::Status Status() const override { return absl::OkStatus(); } + absl::Status Cancel() override { return absl::OkStatus(); } + + private: + double x_; + int64_t a_; + int64_t b_; + double dx_; + int64_t steps_; + int64_t current_step_ = 0; + Value value_; + }; + + public: + explicit TvfOptionalArguments() + : FixedOutputSchemaTVF( + {R"(tvf_optional_arguments)"}, + FunctionSignature( + FunctionArgumentType::RelationWithSchema( + TVFRelation({{"value", types::Int64Type()}}), + /*extra_relation_input_columns_allowed=*/false), + { + // Starting x value. + FunctionArgumentType(types::DoubleType(), + FunctionArgumentType::OPTIONAL), + // A constant.
+ FunctionArgumentType( + types::Int64Type(), + FunctionArgumentTypeOptions().set_cardinality( + FunctionArgumentType::OPTIONAL)), + // B constant. + FunctionArgumentType( + types::Int64Type(), + FunctionArgumentTypeOptions().set_cardinality( + FunctionArgumentType::OPTIONAL)), + // X increment. + FunctionArgumentType( + types::DoubleType(), + FunctionArgumentTypeOptions() + .set_argument_name( + "dx", FunctionEnums::POSITIONAL_OR_NAMED) + .set_cardinality(FunctionArgumentType::OPTIONAL)), + // Number of steps. + FunctionArgumentType( + types::Int64Type(), + FunctionArgumentTypeOptions() + .set_argument_name( + "steps", FunctionEnums::POSITIONAL_OR_NAMED) + .set_cardinality(FunctionArgumentType::OPTIONAL)), + }, + /*context_id=*/-1), + TVFRelation(TVFRelation({{"y", types::DoubleType()}}))) {} + + absl::StatusOr<std::unique_ptr<EvaluatorTableIterator>> CreateEvaluator( + std::vector<TvfEvaluatorArg> input_arguments, + const std::vector<TVFSchemaColumn>& output_columns, + const FunctionSignature* function_call_signature) const override { + ZETASQL_RET_CHECK_LE(input_arguments.size(), 5); + + double x = 1; + if (input_arguments.size() >= 1) { + ZETASQL_RET_CHECK(input_arguments[0].value); + ZETASQL_RET_CHECK(input_arguments[0].value->type()->IsDouble()); + if (!input_arguments[0].value->is_null()) { + x = input_arguments[0].value->double_value(); + } + } + + int64_t a = 2; + if (input_arguments.size() >= 2) { + ZETASQL_RET_CHECK(input_arguments[1].value); + ZETASQL_RET_CHECK(input_arguments[1].value->type()->IsInt64()); + if (!input_arguments[1].value->is_null()) { + a = input_arguments[1].value->int64_value(); + } + } + + int64_t b = 3; + if (input_arguments.size() >= 3) { + ZETASQL_RET_CHECK(input_arguments[2].value); + ZETASQL_RET_CHECK(input_arguments[2].value->type()->IsInt64()); + if (!input_arguments[2].value->is_null()) { + b = input_arguments[2].value->int64_value(); + } + } + + double dx = 1; + if (input_arguments.size() >= 4) { + ZETASQL_RET_CHECK(input_arguments[3].value); +
ZETASQL_RET_CHECK(input_arguments[3].value->type()->IsDouble()); + if (!input_arguments[3].value->is_null()) { + dx = input_arguments[3].value->double_value(); + } + } + + int64_t steps = 1; + if (input_arguments.size() >= 5) { + ZETASQL_RET_CHECK(input_arguments[4].value); + ZETASQL_RET_CHECK(input_arguments[4].value->type()->IsInt64()); + if (!input_arguments[4].value->is_null()) { + steps = input_arguments[4].value->int64_value(); + } + } + + return std::make_unique<Evaluator>(x, a, b, dx, + steps); + } +}; + +// Tests handling of repeated arguments. +// Takes pairs of string and int arguments and produces a table with a row for +// each pair. +class TvfRepeatedArguments : public FixedOutputSchemaTVF { + class TvfRepeatedArgumentsIterator : public EvaluatorTableIterator { + public: + explicit TvfRepeatedArgumentsIterator(std::vector<TvfEvaluatorArg> args) + : args_(std::move(args)) {} + + bool NextRow() override { + if (current_ + 1 >= args_.size()) { + return false; + } + + if (!args_[current_].value || + !args_[current_].value->type()->IsString()) { + status_ = absl::InternalError("Bad key"); + return false; + } + + if (!args_[current_ + 1].value || + !args_[current_ + 1].value->type()->IsInt64()) { + status_ = absl::InternalError("Bad value"); + return false; + } + + key_ = *args_[current_].value; + value_ = *args_[current_ + 1].value; + current_ += 2; + return true; + } + + int NumColumns() const override { return 2; } + std::string GetColumnName(int i) const override { + ABSL_DCHECK_GE(i, 0); + ABSL_DCHECK_LT(i, NumColumns()); + return i == 0 ? "key" : "value"; + } + const Type* GetColumnType(int i) const override { + ABSL_DCHECK_GE(i, 0); + ABSL_DCHECK_LT(i, NumColumns()); + return i == 0 ? types::StringType() : types::Int64Type(); + } + const Value& GetValue(int i) const override { + ABSL_DCHECK_GE(i, 0); + ABSL_DCHECK_LT(i, NumColumns()); + return i == 0 ?
key_ : value_; + } + absl::Status Status() const override { return status_; } + absl::Status Cancel() override { return absl::OkStatus(); } + + private: + std::vector<TvfEvaluatorArg> args_; + int64_t current_ = 0; + Value key_; + Value value_; + absl::Status status_; + }; + + public: + explicit TvfRepeatedArguments() + : FixedOutputSchemaTVF( + {R"(tvf_repeated_arguments)"}, + FunctionSignature( + FunctionArgumentType::RelationWithSchema( + TVFRelation({{"key", types::StringType()}, + {"value", types::Int64Type()}}), + /*extra_relation_input_columns_allowed=*/false), + { + FunctionArgumentType(types::StringType(), + FunctionArgumentType::REPEATED), + FunctionArgumentType(types::Int64Type(), + FunctionArgumentType::REPEATED), + }, + /*context_id=*/-1), + TVFRelation({{"key", types::StringType()}, + {"value", types::Int64Type()}})) {} + + absl::StatusOr<std::unique_ptr<EvaluatorTableIterator>> CreateEvaluator( + std::vector<TvfEvaluatorArg> input_arguments, + const std::vector<TVFSchemaColumn>& output_columns, + const FunctionSignature* function_call_signature) const override { + ZETASQL_RET_CHECK(input_arguments.size() % 2 == 0); + return std::make_unique<TvfRepeatedArgumentsIterator>( + std::move(input_arguments)); + } +}; + +// Tests forwarding input schema in a TVF. +// +// This function will pass through values from input columns to matching output +// columns. For INT64 columns it will additionally add the value of the integer +// argument. +// +// It has one relation argument and one integer argument. The output +// schema is set to be the same as the input schema of the relation argument.
+class TvfIncrementBy : public ForwardInputSchemaToOutputSchemaTVF { + class TvfIncrementByIterator : public EvaluatorTableIterator { + public: + explicit TvfIncrementByIterator( + std::unique_ptr<EvaluatorTableIterator> input, int64_t value_arg, + std::vector<TVFSchemaColumn> output_columns) + : input_(std::move(input)), + value_arg_(value_arg), + output_columns_(std::move(output_columns)), + values_(output_columns_.size()) {} + + bool NextRow() override { + if (!input_->NextRow()) { + status_ = input_->Status(); + return false; + } + + for (int o = 0; o < output_columns_.size(); ++o) { + std::string output_column_name = GetColumnName(o); + const Value* value = nullptr; + for (int i = 0; i < input_->NumColumns(); ++i) { + if (input_->GetColumnName(i) == output_column_name) { + value = &input_->GetValue(i); + break; + } + } + + if (value == nullptr) { + status_ = ::zetasql_base::InternalErrorBuilder() + << "Could not find input column for " << output_column_name; + return false; + } + + values_[o] = value->type()->IsInt64() + ?
values::Int64(value->ToInt64() + value_arg_) + : *value; + } + + return true; + } + + int NumColumns() const override { + return static_cast<int>(output_columns_.size()); + } + std::string GetColumnName(int i) const override { + ABSL_DCHECK_LT(i, output_columns_.size()); + return output_columns_[i].name; + } + const Type* GetColumnType(int i) const override { + ABSL_DCHECK_LT(i, output_columns_.size()); + return output_columns_[i].type; + } + const Value& GetValue(int i) const override { + ABSL_DCHECK_LT(i, values_.size()); + return values_[i]; + } + absl::Status Status() const override { return status_; } + absl::Status Cancel() override { return input_->Cancel(); } + + private: + std::unique_ptr<EvaluatorTableIterator> input_; + int64_t value_arg_; + absl::Status status_; + std::vector<TVFSchemaColumn> output_columns_; + std::vector<Value> values_; + }; + + public: + explicit TvfIncrementBy() + : ForwardInputSchemaToOutputSchemaTVF( + {R"(tvf_increment_by)"}, + FunctionSignature( + ARG_TYPE_RELATION, + {FunctionArgumentType::AnyRelation(), + FunctionArgumentType( + types::Int64Type(), + FunctionArgumentTypeOptions() + .set_cardinality(FunctionArgumentType::OPTIONAL) + .set_default(values::Int64(1)))}, + /*context_id=*/-1)) {} + + absl::StatusOr<std::unique_ptr<EvaluatorTableIterator>> CreateEvaluator( + std::vector<TvfEvaluatorArg> input_arguments, + const std::vector<TVFSchemaColumn>& output_columns, + const FunctionSignature* function_call_signature) const override { + ZETASQL_RET_CHECK_EQ(input_arguments.size(), 2); + ZETASQL_RET_CHECK(input_arguments[0].relation); + ZETASQL_RET_CHECK(input_arguments[1].value); + ZETASQL_RET_CHECK_EQ(input_arguments[1].value->type_kind(), TypeKind::TYPE_INT64); + return std::make_unique<TvfIncrementByIterator>( + std::move(input_arguments[0].relation), + input_arguments[1].value->ToInt64(), output_columns); + } +}; + +// This function takes two integer values and provides both sum and difference. +// +// Has a fixed input and output schema.
+class TvfSumAndDiff : public FixedOutputSchemaTVF { + class TvfSumAndDiffIterator : public EvaluatorTableIterator { + public: + explicit TvfSumAndDiffIterator( + std::unique_ptr<EvaluatorTableIterator> input) + : input_(std::move(input)) { + output_columns_["sum"] = values::Int64(0); + output_columns_["diff"] = values::Int64(0); + } + + bool NextRow() override { + if (!input_->NextRow()) { + return false; + } + int64_t a = input_->GetValue(0).int64_value(); + int64_t b = input_->GetValue(1).int64_value(); + output_columns_["sum"] = values::Int64(a + b); + output_columns_["diff"] = values::Int64(a - b); + return true; + } + + int NumColumns() const override { + return static_cast<int>(output_columns_.size()); + } + std::string GetColumnName(int i) const override { + ABSL_DCHECK_LT(i, output_columns_.size()); + auto iter = output_columns_.cbegin(); + std::advance(iter, i); + return iter->first; + } + const Type* GetColumnType(int i) const override { + return GetValue(i).type(); + } + const Value& GetValue(int i) const override { + ABSL_DCHECK_LT(i, output_columns_.size()); + auto iter = output_columns_.cbegin(); + std::advance(iter, i); + return iter->second; + } + absl::Status Status() const override { return input_->Status(); } + absl::Status Cancel() override { return input_->Cancel(); } + + private: + std::unique_ptr<EvaluatorTableIterator> input_; + absl::btree_map<std::string, Value> output_columns_; + }; + + public: + TvfSumAndDiff() + : FixedOutputSchemaTVF( + {R"(tvf_sum_diff)"}, + FunctionSignature( + FunctionArgumentType::RelationWithSchema( + TVFRelation({{"sum", types::Int64Type()}, + {"diff", types::Int64Type()}}), + /*extra_relation_input_columns_allowed=*/false), + {FunctionArgumentType::RelationWithSchema( + TVFRelation( + {{"a", types::Int64Type()}, {"b", types::Int64Type()}}), + /*extra_relation_input_columns_allowed=*/false)}, + /*context_id=*/-1), + TVFRelation( + {{"sum", types::Int64Type()}, {"diff", types::Int64Type()}})) {} + + absl::StatusOr<std::unique_ptr<EvaluatorTableIterator>> CreateEvaluator( + std::vector<TvfEvaluatorArg> input_arguments, + const std::vector<TVFSchemaColumn>&
output_columns, + const FunctionSignature* function_call_signature) const override { + ZETASQL_RET_CHECK_EQ(input_arguments.size(), 1); + ZETASQL_RET_CHECK(input_arguments[0].relation); + return std::make_unique<TvfSumAndDiffIterator>( + std::move(input_arguments[0].relation)); + } +}; + +void SampleCatalog::LoadTableValuedFunctionsWithEvaluators() { + catalog_->AddOwnedTableValuedFunction(new TvfOptionalRelation()); + catalog_->AddOwnedTableValuedFunction(new TvfOptionalArguments()); + catalog_->AddOwnedTableValuedFunction(new TvfRepeatedArguments()); + catalog_->AddOwnedTableValuedFunction(new TvfIncrementBy()); + catalog_->AddOwnedTableValuedFunction(new TvfSumAndDiff()); +} + void SampleCatalog::LoadFunctionsWithStructArgs() { const std::vector kOutputColumnsAllTypes = GetOutputColumnsForAllTypes(types_); @@ -5156,9 +5756,9 @@ ZETASQL_CHECK_OK(types_->MakeArrayType(types_->get_string(), &array_string_type)); const Type* struct_type1 = nullptr; - ZETASQL_CHECK_OK(types_->MakeStructType({{"field1", array_string_type}, - {"field2", array_string_type}}, - &struct_type1)); + ZETASQL_CHECK_OK(types_->MakeStructType( + {{"field1", array_string_type}, {"field2", array_string_type}}, + &struct_type1)); const Type* struct_type2 = nullptr; ZETASQL_CHECK_OK(types_->MakeStructType({{"field1", array_string_type}, {"field2", array_string_type}, @@ -6490,8 +7090,8 @@ void SampleCatalog::LoadTemplatedSQLTableValuedFunctions() { // b/259000660: Add a templated SQL TVF whose code has a braced proto // constructor.
- auto templated_braced_ctor_tvf = std::make_unique<TemplatedSQLTVF>( - std::vector<std::string>{"templated_braced_ctor_tvf"}, + auto templated_proto_braced_ctor_tvf = std::make_unique<TemplatedSQLTVF>( + std::vector<std::string>{"templated_proto_braced_ctor_tvf"}, FunctionSignature(ARG_TYPE_RELATION, {FunctionArgumentType(ARG_TYPE_RELATION)}, context_id++), @@ -6499,7 +7099,19 @@ ParseResumeLocation::FromString(R"sql( SELECT NEW zetasql_test__.TestExtraPB {int32_val1 : v} AS dice_roll FROM T)sql")); - catalog_->AddOwnedTableValuedFunction(std::move(templated_braced_ctor_tvf)); + catalog_->AddOwnedTableValuedFunction( + std::move(templated_proto_braced_ctor_tvf)); + auto templated_struct_braced_ctor_tvf = std::make_unique<TemplatedSQLTVF>( + std::vector<std::string>{"templated_struct_braced_ctor_tvf"}, + FunctionSignature(ARG_TYPE_RELATION, + {FunctionArgumentType(ARG_TYPE_RELATION)}, + context_id++), + /*arg_name_list=*/std::vector<std::string>{"T"}, + ParseResumeLocation::FromString(R"sql( + SELECT STRUCT {int32_val1 : v} AS dice_roll + FROM T)sql")); + catalog_->AddOwnedTableValuedFunction( + std::move(templated_struct_braced_ctor_tvf)); } void SampleCatalog::LoadTableValuedFunctionsWithAnonymizationUid() { @@ -6615,28 +7227,28 @@ void SampleCatalog::LoadProcedures() { // Add a procedure that takes a specific enum as an argument. const EnumType* enum_TestEnum = GetEnumType(zetasql_test__::TestEnum_descriptor()); - procedure = new Procedure( - {"proc_on_TestEnum"}, - {types_->get_bool(), {enum_TestEnum}, /*context_id=*/-1}); + procedure = + new Procedure({"proc_on_TestEnum"}, + {types_->get_bool(), {enum_TestEnum}, /*context_id=*/-1}); catalog_->AddOwnedProcedure(procedure); // Add a procedure to illustrate how repeated/optional arguments are resolved.
- procedure = new Procedure( - {"proc_on_req_opt_rep"}, - {types_->get_int64(), - {{types_->get_int64(), FunctionArgumentType::REQUIRED}, - {types_->get_int64(), FunctionArgumentType::REPEATED}, - {types_->get_int64(), FunctionArgumentType::REPEATED}, - {types_->get_int64(), FunctionArgumentType::REQUIRED}, - {types_->get_int64(), FunctionArgumentType::OPTIONAL}}, - /*context_id=*/-1}); + procedure = + new Procedure({"proc_on_req_opt_rep"}, + {types_->get_int64(), + {{types_->get_int64(), FunctionArgumentType::REQUIRED}, + {types_->get_int64(), FunctionArgumentType::REPEATED}, + {types_->get_int64(), FunctionArgumentType::REPEATED}, + {types_->get_int64(), FunctionArgumentType::REQUIRED}, + {types_->get_int64(), FunctionArgumentType::OPTIONAL}}, + /*context_id=*/-1}); catalog_->AddOwnedProcedure(procedure); // Add a procedure with templated arguments. - procedure = new Procedure( - {"proc_on_any_any"}, - {types_->get_int64(), - {ARG_TYPE_ANY_1, ARG_TYPE_ANY_1}, /*context_id=*/-1}); + procedure = + new Procedure({"proc_on_any_any"}, {types_->get_int64(), + {ARG_TYPE_ANY_1, ARG_TYPE_ANY_1}, + /*context_id=*/-1}); catalog_->AddOwnedProcedure(procedure); // Add a procedure with templated arguments of arbitrary type. @@ -6648,18 +7260,16 @@ void SampleCatalog::LoadProcedures() { // Add a procedure with one repeated argument. procedure = new Procedure( - {"proc_on_rep"}, - {types_->get_int64(), - {{types_->get_int64(), FunctionArgumentType::REPEATED}}, - /*context_id=*/-1}); + {"proc_on_rep"}, {types_->get_int64(), + {{types_->get_int64(), FunctionArgumentType::REPEATED}}, + /*context_id=*/-1}); catalog_->AddOwnedProcedure(procedure); // Add a procedure with one optional argument. 
procedure = new Procedure( - {"proc_on_opt"}, - {types_->get_int64(), - {{types_->get_int64(), FunctionArgumentType::OPTIONAL}}, - /*context_id=*/-1}); + {"proc_on_opt"}, {types_->get_int64(), + {{types_->get_int64(), FunctionArgumentType::OPTIONAL}}, + /*context_id=*/-1}); catalog_->AddOwnedProcedure(procedure); // These sample procedures are named 'proc_on_' with one argument of @@ -6780,8 +7390,8 @@ void SampleCatalog::LoadConstants() { // Load a constant that conflicts with a zero-argument procedure. // The multi-argument case is handled in the nested catalog. std::unique_ptr<SimpleConstant> constant; - ZETASQL_CHECK_OK(SimpleConstant::Create({"proc_no_args"}, Value::Bool(true), - &constant)); + ZETASQL_CHECK_OK( + SimpleConstant::Create({"proc_no_args"}, Value::Bool(true), &constant)); catalog_->AddOwnedConstant(constant.release()); // Load a constant that conflicts with a catalog. @@ -7360,6 +7970,15 @@ void SampleCatalog::LoadAggregateSqlFunctions( );)sql", language_options); + AddSqlDefinedFunctionFromCreate( + R"sql( + CREATE AGGREGATE FUNCTION ExprOutsideSumExpressionOfAggregateArgs( + agg_arg INT64 + ) AS ( + 1 + SUM(agg_arg + agg_arg) + );)sql", + language_options); + AddSqlDefinedFunctionFromCreate( R"sql( CREATE AGGREGATE FUNCTION ExprOutsideAndInsideSum( diff --git a/zetasql/testdata/sample_catalog.h index dd152c4a6..10a4f273d 100644 --- a/zetasql/testdata/sample_catalog.h +++ b/zetasql/testdata/sample_catalog.h @@ -21,19 +21,21 @@ #include #include #include -#include #include -#include "google/protobuf/descriptor.h" -#include "google/protobuf/descriptor_database.h" #include "zetasql/public/analyzer_output.h" +#include "zetasql/public/builtin_function_options.h" #include "zetasql/public/function.h" +#include "zetasql/public/function_signature.h" #include "zetasql/public/language_options.h" #include "zetasql/public/simple_catalog.h" #include "zetasql/public/type.h" +#include "zetasql/resolved_ast/resolved_ast.h" #include
"absl/container/node_hash_map.h" #include "absl/status/statusor.h" #include "absl/strings/string_view.h" +#include "google/protobuf/descriptor.h" +#include "google/protobuf/descriptor_database.h" namespace zetasql { @@ -58,9 +60,8 @@ class SampleCatalog { // and this SampleCatalog does not take ownership of it. If 'type_factory' // is not specified then a locally owned TypeFactory is created and // used instead. - explicit SampleCatalog( - const ZetaSQLBuiltinFunctionOptions& builtin_function_options, - TypeFactory* type_factory = nullptr); + explicit SampleCatalog(const BuiltinFunctionOptions& builtin_function_options, + TypeFactory* type_factory = nullptr); SampleCatalog(const SampleCatalog&) = delete; SampleCatalog& operator=(const SampleCatalog&) = delete; @@ -118,6 +119,7 @@ class SampleCatalog { // split it up in order to avoid lint warnings. void LoadTableValuedFunctions1(); void LoadTableValuedFunctions2(); + void LoadTableValuedFunctionsWithEvaluators(); void LoadTVFWithExtraColumns(); void LoadConnectionTableValuedFunctions(); void LoadDescriptorTableValuedFunctions(); @@ -209,6 +211,9 @@ class SampleCatalog { const ArrayType* proto_array_type_; const ArrayType* struct_array_type_; const ArrayType* json_array_type_; + const ArrayType* numeric_array_type_; + const ArrayType* bignumeric_array_type_; + const ArrayType* interval_array_type_; const EnumType* enum_TestEnum_; const EnumType* enum_AnotherTestEnum_; diff --git a/zetasql/testing/BUILD b/zetasql/testing/BUILD index 12b75432e..edbfc4bfc 100644 --- a/zetasql/testing/BUILD +++ b/zetasql/testing/BUILD @@ -22,9 +22,8 @@ cc_library( srcs = ["sql_types_test.cc"], hdrs = ["sql_types_test.h"], deps = [ - "//zetasql/base", + "//zetasql/base:check", "//zetasql/base:map_util", - "//zetasql/base:status", "//zetasql/base/testing:zetasql_gtest_main", "//zetasql/public:civil_time", "//zetasql/public:coercer", @@ -35,11 +34,10 @@ cc_library( "//zetasql/public:type", "//zetasql/public:type_cc_proto", 
         "//zetasql/public:value",
+        "//zetasql/public/types",
         "//zetasql/testdata:test_schema_cc_proto",
         "@com_google_absl//absl/container:flat_hash_map",
-        "@com_google_absl//absl/memory",
         "@com_google_absl//absl/strings:cord",
-        "@com_google_absl//absl/time",
         "@com_google_protobuf//:protobuf",
     ],
 )
@@ -52,11 +50,11 @@ cc_library(
     deps = [
         "//zetasql/base:map_util",
         "//zetasql/base:status",
-        "//zetasql/base:strings",
         "//zetasql/public:catalog",
         "//zetasql/public:function",
         "//zetasql/public:simple_catalog",
         "//zetasql/public:type",
+        "//zetasql/public/types",
         "@com_google_absl//absl/container:node_hash_map",
         "@com_google_absl//absl/status",
         "@com_google_absl//absl/strings",
@@ -69,10 +67,11 @@ cc_test(
     srcs = ["test_catalog_test.cc"],
     deps = [
         ":test_catalog",
-        "//zetasql/base:status",
         "//zetasql/base/testing:status_matchers",
         "//zetasql/base/testing:zetasql_gtest_main",
+        "//zetasql/public:catalog",
         "//zetasql/public:function",
+        "//zetasql/public:simple_catalog",
         "//zetasql/public:type",
         "@com_google_absl//absl/status",
     ],
@@ -87,7 +86,7 @@ cc_library(
         "using_test_value.cc",
     ],
     deps = [
-        "//zetasql/base",
+        "//zetasql/base:check",
         "//zetasql/base:status",
         "//zetasql/common:float_margin",
         "//zetasql/common:internal_value",
@@ -120,8 +119,12 @@ cc_library(
         "//zetasql/public:type",
         "//zetasql/public:type_cc_proto",
         "//zetasql/public:value",
+        "//zetasql/public/types",
+        "@com_google_absl//absl/base:core_headers",
         "@com_google_absl//absl/status",
+        "@com_google_absl//absl/status:statusor",
         "@com_google_absl//absl/strings",
+        "@com_google_absl//absl/types:span",
     ],
 )
@@ -131,8 +134,7 @@ cc_library(
     srcs = ["type_util.cc"],
     hdrs = ["type_util.h"],
     deps = [
-        "//zetasql/base",
-        "//zetasql/base:status",
+        "//zetasql/base:check",
         "//zetasql/public:type",
         "//zetasql/public:type_cc_proto",
         "//zetasql/public/types",
diff --git a/zetasql/testing/sql_types_test.cc b/zetasql/testing/sql_types_test.cc
index 3d8995c9b..adbf755a1 100644
--- a/zetasql/testing/sql_types_test.cc
+++ b/zetasql/testing/sql_types_test.cc
@@ -19,17 +19,23 @@
 #include
 #include
 
-#include "zetasql/base/logging.h"
-#include "google/protobuf/descriptor.h"
+#include "zetasql/public/coercer.h"
+#include "zetasql/public/input_argument_type.h"
 #include "zetasql/public/language_options.h"
 #include "zetasql/public/options.pb.h"
 #include "zetasql/public/type.pb.h"
+#include "zetasql/public/types/array_type.h"
+#include "zetasql/public/types/enum_type.h"
+#include "zetasql/public/types/proto_type.h"
+#include "zetasql/public/types/struct_type.h"
+#include "zetasql/public/types/type.h"
+#include "zetasql/public/types/type_factory.h"
+#include "zetasql/public/value.h"
 #include "zetasql/testdata/test_schema.pb.h"
-#include "absl/memory/memory.h"
+#include "zetasql/base/check.h"
 #include "absl/strings/cord.h"
-#include "absl/time/time.h"
+#include "google/protobuf/descriptor.h"
 #include "zetasql/base/map_util.h"
-#include "zetasql/base/status.h"
 
 namespace zetasql {
diff --git a/zetasql/testing/test_catalog.cc b/zetasql/testing/test_catalog.cc
index 2658258ba..2fff2ecf0 100644
--- a/zetasql/testing/test_catalog.cc
+++ b/zetasql/testing/test_catalog.cc
@@ -19,8 +19,13 @@
 #include
 #include
 
-#include "zetasql/base/case.h"
+#include "zetasql/public/catalog.h"
+#include "zetasql/public/function.h"
+#include "zetasql/public/function_signature.h"
+#include "zetasql/public/simple_catalog.h"
+#include "zetasql/public/types/type.h"
 #include "absl/status/status.h"
+#include "absl/strings/ascii.h"
 #include "absl/strings/string_view.h"
 #include "zetasql/base/map_util.h"
 #include "zetasql/base/status_macros.h"
diff --git a/zetasql/testing/test_catalog.h b/zetasql/testing/test_catalog.h
index 57452c49e..7c87e6621 100644
--- a/zetasql/testing/test_catalog.h
+++ b/zetasql/testing/test_catalog.h
@@ -28,6 +28,7 @@
 #include "zetasql/public/catalog.h"
 #include "zetasql/public/function.h"
+#include "zetasql/public/function_signature.h"
 #include "zetasql/public/simple_catalog.h"
 #include "zetasql/public/type.h"
 #include "absl/container/node_hash_map.h"
diff --git a/zetasql/testing/test_catalog_test.cc b/zetasql/testing/test_catalog_test.cc
index ec1a7f5bd..f699836aa 100644
--- a/zetasql/testing/test_catalog_test.cc
+++ b/zetasql/testing/test_catalog_test.cc
@@ -17,12 +17,13 @@
 #include "zetasql/testing/test_catalog.h"
 
 #include "zetasql/base/testing/status_matchers.h"
+#include "zetasql/public/catalog.h"
 #include "zetasql/public/function.h"
+#include "zetasql/public/simple_catalog.h"
 #include "zetasql/public/type.h"
 #include "gmock/gmock.h"
 #include "gtest/gtest.h"
 #include "absl/status/status.h"
-#include "zetasql/base/status.h"
 
 namespace zetasql {
diff --git a/zetasql/testing/test_function.cc b/zetasql/testing/test_function.cc
index 7a3afa704..4b9857178 100644
--- a/zetasql/testing/test_function.cc
+++ b/zetasql/testing/test_function.cc
@@ -24,15 +24,18 @@
 #include
 
 #include "zetasql/base/logging.h"
+#include "zetasql/common/float_margin.h"
+#include "zetasql/public/options.pb.h"
 #include "zetasql/public/type.pb.h"
+#include "zetasql/public/types/type.h"
 #include "zetasql/public/value.h"
 #include "zetasql/testing/test_value.h"
 #include "zetasql/base/check.h"
 #include "absl/status/status.h"
 #include "absl/strings/str_join.h"
 #include "absl/strings/string_view.h"
+#include "absl/types/span.h"
 #include "zetasql/base/map_util.h"
-#include "zetasql/base/status.h"
 
 namespace zetasql {
 
@@ -42,7 +45,7 @@ QueryParamsWithResult::QueryParamsWithResult(
     : params_(ValueConstructor::ToValues(arguments)), result_(result, status) {}
 
 QueryParamsWithResult::QueryParamsWithResult(
-    const std::vector& arguments,
+    absl::Span arguments,
     const ValueConstructor& result, FloatMargin float_margin_arg,
     absl::Status status)
     : params_(ValueConstructor::ToValues(arguments)),
@@ -146,7 +149,7 @@ QueryParamsWithResult& QueryParamsWithResult::AddProhibitedFeatures(
 }
 
 std::vector InvertResults(
-    const std::vector& tests) {
+    absl::Span tests) {
   std::vector new_tests;
   new_tests.reserve(tests.size());
   for (const QueryParamsWithResult& test : tests) {
diff --git a/zetasql/testing/test_function.h b/zetasql/testing/test_function.h
index ef9274d99..883c9bb3c 100644
--- a/zetasql/testing/test_function.h
+++ b/zetasql/testing/test_function.h
@@ -33,8 +33,11 @@
 #include "zetasql/public/options.pb.h"
 #include "zetasql/public/type.h"
 #include "zetasql/public/value.h"
+#include "absl/base/attributes.h"
 #include "absl/status/status.h"
+#include "absl/status/statusor.h"
 #include "absl/strings/string_view.h"
+#include "absl/types/span.h"
 #include "zetasql/base/status.h"
 
 namespace zetasql {
 
@@ -78,7 +81,7 @@ class QueryParamsWithResult {
                         const ValueConstructor& result,
                         absl::Status status = absl::OkStatus());
 
-  QueryParamsWithResult(const std::vector& arguments,
+  QueryParamsWithResult(absl::Span arguments,
                         const ValueConstructor& result,
                         FloatMargin float_margin_arg,
                         absl::Status status = absl::OkStatus());
 
@@ -175,7 +178,7 @@ class QueryParamsWithResult {
 // Return a vector of test cases with boolean results inverted, as in
 // CopyWithInvertedResult above.
 std::vector InvertResults(
-    const std::vector& tests);
+    absl::Span tests);
 
 struct FunctionTestCall {
   std::string function_name;
diff --git a/zetasql/testing/test_value.cc b/zetasql/testing/test_value.cc
index 49f443d50..300bfe9df 100644
--- a/zetasql/testing/test_value.cc
+++ b/zetasql/testing/test_value.cc
@@ -20,12 +20,15 @@
 #include
 #include
 
-#include "zetasql/base/logging.h"
 #include "zetasql/common/float_margin.h"
 #include "zetasql/public/type.h"
+#include "zetasql/public/types/type.h"
 #include "zetasql/public/types/type_factory.h"
+#include "zetasql/public/types/value_equality_check_options.h"
+#include "zetasql/base/check.h"
 #include "absl/strings/string_view.h"
-#include "zetasql/base/status_macros.h"
+#include "absl/types/span.h"
+#include "google/protobuf/descriptor.h"
 
 namespace zetasql {
 
@@ -90,6 +93,27 @@ Value Range(ValueConstructor start, ValueConstructor end) {
   return *range_value;
 }
 
+Value Map(
+    absl::Span> elements,
+    TypeFactory* type_factory) {
+  ABSL_CHECK(!elements.empty());
+  std::vector> elements_list;
+  elements_list.reserve(elements.size());
+
+  const Type* key_type = elements[0].first.get().type();
+  const Type* value_type = elements[0].second.get().type();
+  for (const auto& [key, value] : elements) {
+    elements_list.push_back(std::make_pair(key.get(), value.get()));
+  }
+
+  auto map_type = MakeMapType(key_type, value_type, type_factory);
+  ZETASQL_CHECK_OK(map_type.status());
+
+  auto map = Value::MakeMap(*map_type, std::move(elements_list));
+  ZETASQL_CHECK_OK(map.status());
+  return *map;
+}
+
 const ArrayType* MakeArrayType(const Type* element_type,
                                TypeFactory* type_factory) {
   const ArrayType* array_type;
@@ -134,6 +158,13 @@ const RangeType* MakeRangeType(const Type* element_type,
   return range_type;
 }
 
+absl::StatusOr MakeMapType(const Type* key_type,
+                           const Type* value_type,
+                           TypeFactory* type_factory) {
+  type_factory = type_factory != nullptr ? type_factory : static_type_factory();
+  return type_factory->MakeMapType(key_type, value_type);
+}
+
 bool AlmostEqualsValue(const Value& x, const Value& y, std::string* reason) {
   return InternalValue::Equals(
       x, y,
diff --git a/zetasql/testing/test_value.h b/zetasql/testing/test_value.h
index 07630afa5..f7a0f080d 100644
--- a/zetasql/testing/test_value.h
+++ b/zetasql/testing/test_value.h
@@ -30,11 +30,15 @@
 #include "zetasql/compliance/test_driver.h"
 #include "zetasql/public/numeric_value.h"
 #include "zetasql/public/type.h"
+#include "zetasql/public/types/value_equality_check_options.h"
 #include "zetasql/public/value.h"
 #include "gmock/gmock.h"
+#include "zetasql/base/check.h"
 #include "absl/status/statusor.h"
 #include "absl/strings/match.h"
+#include "absl/strings/string_view.h"
 #include "absl/types/span.h"
+#include "google/protobuf/descriptor.h"
 #include "zetasql/base/status.h"
 
 namespace zetasql {
 
@@ -263,6 +267,11 @@ Value StructArray(absl::Span names,
 // Creates a range with values 'start' and 'end'.
 Value Range(ValueConstructor start, ValueConstructor end);
 
+// Creates a map with key/value pairs from 'elements'.
+Value Map(
+    absl::Span> elements,
+    TypeFactory* type_factory = nullptr);
+
 // If type_factory is not provided the function will use the default static type
 // factory (see: static_type_factory())
 const ArrayType* MakeArrayType(const Type* element_type,
@@ -289,6 +298,12 @@ const EnumType* MakeEnumType(const google::protobuf::EnumDescriptor* descriptor,
 const RangeType* MakeRangeType(const Type* element_type,
                                TypeFactory* type_factory = nullptr);
 
+// If type_factory is not provided the function will use the default static type
+// factory (see: static_type_factory())
+absl::StatusOr MakeMapType(const Type* key_type,
+                           const Type* value_type,
+                           TypeFactory* type_factory = nullptr);
+
 // Matches x against y respecting array orderedness and using the default
 // floating point error margin. If the reason parameter is nullptr then no
 // mismatch diagnostic string will be populated.
diff --git a/zetasql/testing/type_util.cc b/zetasql/testing/type_util.cc
index cbae19b5f..cc09e71f2 100644
--- a/zetasql/testing/type_util.cc
+++ b/zetasql/testing/type_util.cc
@@ -19,20 +19,19 @@
 #include
 #include
 
-#include "zetasql/base/logging.h"
 #include "google/protobuf/timestamp.pb.h"
 #include "google/protobuf/wrappers.pb.h"
 #include "google/type/date.pb.h"
 #include "google/type/latlng.pb.h"
 #include "google/type/timeofday.pb.h"
 #include "google/protobuf/descriptor.pb.h"
-#include "google/protobuf/descriptor.h"
 #include "zetasql/public/type.h"
 #include "zetasql/public/type.pb.h"
 #include "zetasql/public/types/type_factory.h"
 #include "zetasql/testdata/test_proto3.pb.h"
 #include "zetasql/testdata/test_schema.pb.h"
-#include "zetasql/base/status.h"
+#include "zetasql/base/check.h"
+#include "google/protobuf/descriptor.h"
 
 namespace zetasql {
 namespace testing {
@@ -96,10 +95,11 @@ std::vector ZetaSqlComplexTestTypes(
 }
 
 std::vector ZetaSqlTestProtoFilepaths() {
-  // `rounding_mode` and `array_find_mode` fix `Enum not found`
+  // `rounding_mode`, `array_find_mode`, `array_zip_mode` fix `Enum not found`
   // error in RQG / RSG: b/293474126.
   return {"zetasql/public/functions/rounding_mode.proto",
           "zetasql/public/functions/array_find_mode.proto",
+          "zetasql/public/functions/array_zip_mode.proto",
           "zetasql/testdata/test_schema.proto",
           "zetasql/testdata/test_proto3.proto",
           "google/protobuf/timestamp.proto",
@@ -154,11 +154,12 @@ std::vector ZetaSqlRandomTestProtoNames() {
 }
 
 std::vector ZetaSqlTestEnumNames() {
-  // `RoundingMode` and `ArrayFindMode` fix `Enum not found`
+  // `RoundingMode`, `ArrayFindMode`, `ArrayZipMode` fix `Enum not found`
   // error in RQG / RSG: b/293474126.
   return {"zetasql_test__.TestEnum",
           "zetasql_test__.AnotherTestEnum",
           "zetasql.functions.RoundingMode",
-          "zetasql.functions.ArrayFindEnums.ArrayFindMode"};
+          "zetasql.functions.ArrayFindEnums.ArrayFindMode",
+          "zetasql.functions.ArrayZipEnums.ArrayZipMode"};
 }
 
 }  // namespace testing
diff --git a/zetasql/testing/using_test_value.cc b/zetasql/testing/using_test_value.cc
index 11bee487c..fabcb807e 100644
--- a/zetasql/testing/using_test_value.cc
+++ b/zetasql/testing/using_test_value.cc
@@ -29,8 +29,10 @@ using zetasql::test_values::Array;
 using zetasql::test_values::kIgnoresOrder;
 using zetasql::test_values::kPreservesOrder;
 using zetasql::test_values::MakeArrayType;
+using zetasql::test_values::MakeMapType;
 using zetasql::test_values::MakeRangeType;
 using zetasql::test_values::MakeStructType;
+using zetasql::test_values::Map;
 using zetasql::test_values::OrderPreservationKind;
 using zetasql::test_values::Range;
 using zetasql::test_values::Struct;
diff --git a/zetasql/tools/execute_query/BUILD b/zetasql/tools/execute_query/BUILD
index 8c3f588cb..4350d0805 100644
--- a/zetasql/tools/execute_query/BUILD
+++ b/zetasql/tools/execute_query/BUILD
@@ -60,6 +60,7 @@ cc_test(
         "//zetasql/base/testing:zetasql_gtest_main",
         "//zetasql/public:analyzer_options",
         "//zetasql/public:catalog",
+        "//zetasql/public:options_cc_proto",
         "//zetasql/public/types",
         "//zetasql/resolved_ast",
         "//zetasql/testdata:test_schema_cc_proto",
@@ -132,8 +133,10 @@ cc_library(
     hdrs = ["execute_query_prompt.h"],
     deps = [
         ":execute_query_cc_proto",
+        ":execute_query_tool",
         "//zetasql/base:status",
         "//zetasql/common:status_payload_utils",
+        "//zetasql/public:language_options",
         "//zetasql/public:parse_helpers",
         "//zetasql/public:parse_resume_location",
         "@com_google_absl//absl/functional:bind_front",
@@ -154,10 +157,12 @@ cc_test(
         ":execute_query_cc_proto",
         ":execute_query_prompt",
         ":execute_query_prompt_testutils",
+        ":execute_query_tool",
         "//zetasql/base/testing:status_matchers",
         "//zetasql/base/testing:zetasql_gtest_main",
         "//zetasql/common/testing:proto_matchers",
         "//zetasql/common/testing:status_payload_matchers",
+        "//zetasql/public:options_cc_proto",
         "@com_google_absl//absl/status",
         "@com_google_absl//absl/strings",
         "@com_google_absl//absl/strings:str_format",
@@ -341,11 +346,16 @@ cc_library(
         ":simple_proto_evaluator_table_iterator",
         ":string_error_collector",
         "//zetasql/base",
+        "//zetasql/base:check",
         "//zetasql/base:file_util",
         "//zetasql/base:map_util",
         "//zetasql/base:ret_check",
         "//zetasql/base:status",
         "//zetasql/common:options_utils",
+        "//zetasql/parser:parse_tree",
+        "//zetasql/parser:parse_tree_serializer",
+        "//zetasql/parser/macros:macro_expander",
+        "//zetasql/parser/macros:standalone_macro_expansion",
         "//zetasql/public:analyzer",
         "//zetasql/public:analyzer_options",
         "//zetasql/public:analyzer_output",
@@ -353,6 +363,7 @@ cc_library(
         "//zetasql/public:evaluator",
         "//zetasql/public:evaluator_table_iterator",
         "//zetasql/public:simple_catalog",
+        "//zetasql/public:simple_catalog_util",
         "//zetasql/public:type",
         "//zetasql/public:type_cc_proto",
         "//zetasql/public:value",
@@ -367,6 +378,7 @@ cc_library(
         "@com_google_absl//absl/status:statusor",
         "@com_google_absl//absl/strings",
         "@com_google_absl//absl/strings:cord",
+        "@com_google_absl//absl/strings:str_format",
         "@com_google_absl//absl/types:optional",
         "@com_google_absl//absl/types:span",
         "@com_google_protobuf//:protobuf",
diff --git a/zetasql/tools/execute_query/execute_query.cc b/zetasql/tools/execute_query/execute_query.cc
index 3d45c9fb1..9177ad0fb 100644
--- a/zetasql/tools/execute_query/execute_query.cc
+++ b/zetasql/tools/execute_query/execute_query.cc
@@ -104,7 +104,7 @@ absl::Status RunTool(const std::vector& args) {
   const std::string sql = absl::StrJoin(args, " ");
 
-  ExecuteQuerySingleInput prompt{sql};
+  ExecuteQuerySingleInput prompt(sql, config);
 
   return ExecuteQueryLoop(prompt, config, *writer,
                           &ExecuteQueryLoopPrintErrorHandler);
@@ -124,13 +124,13 @@ int main(int argc, char* argv[]) {
   }
 
   if (absl::GetFlag(FLAGS_interactive) != args.empty()) {
-    ABSL_LOG(QFATAL) << kUsage;
+    ABSL_LOG(QFATAL) << "\n" << kUsage << "Pass --help for a full list of flags.\n";
   }
 
   if (const absl::Status status = zetasql::RunTool(args); status.ok()) {
     return 0;
   } else {
-    std::cerr << status.message() << std::endl;
+    std::cerr << status.message() << '\n';
     return 1;
   }
 }
diff --git a/zetasql/tools/execute_query/execute_query_loop.cc b/zetasql/tools/execute_query/execute_query_loop.cc
index 28f9d5cdf..818fc7d89 100644
--- a/zetasql/tools/execute_query/execute_query_loop.cc
+++ b/zetasql/tools/execute_query/execute_query_loop.cc
@@ -55,10 +55,10 @@ absl::Status ExecuteQueryLoopPrintErrorHandler(absl::Status status) {
     std::cerr << zetasql::FormatErrorLocation(
                      location, context.text(), ERROR_MESSAGE_MULTI_LINE_WITH_CARET)
-              << std::endl;
+              << '\n';
   } else {
     // We can produce a nice error message at least...
-    std::cerr << FormatErrorLocation(location) << std::endl;
+    std::cerr << FormatErrorLocation(location) << '\n';
   }
 
   return status;
 }
diff --git a/zetasql/tools/execute_query/execute_query_loop_test.cc b/zetasql/tools/execute_query/execute_query_loop_test.cc
index 71ed0fdac..f6c6d4ab0 100644
--- a/zetasql/tools/execute_query/execute_query_loop_test.cc
+++ b/zetasql/tools/execute_query/execute_query_loop_test.cc
@@ -61,8 +61,8 @@ class StaticResultPrompt : public ExecuteQueryPrompt {
 }  // namespace
 
 TEST(ExecuteQueryLoopTest, SelectOne) {
-  ExecuteQuerySingleInput prompt{"SELECT 1"};
   ExecuteQueryConfig config;
+  ExecuteQuerySingleInput prompt{"SELECT 1", config};
 
   std::ostringstream output;
   ExecuteQueryStreamWriter writer{output};
diff --git a/zetasql/tools/execute_query/execute_query_prompt.cc b/zetasql/tools/execute_query/execute_query_prompt.cc
index 33e768655..b7f490d00 100644
--- a/zetasql/tools/execute_query/execute_query_prompt.cc
+++ b/zetasql/tools/execute_query/execute_query_prompt.cc
@@ -24,9 +24,11 @@
 #include
 #include "zetasql/common/status_payload_utils.h"
+#include "zetasql/public/language_options.h"
 #include "zetasql/public/parse_resume_location.h"
 #include "zetasql/public/parse_tokens.h"
 #include "zetasql/tools/execute_query/execute_query.pb.h"
+#include "zetasql/tools/execute_query/execute_query_tool.h"
 #include "absl/functional/bind_front.h"
 #include "absl/status/status.h"
 #include "absl/status/statusor.h"
@@ -55,11 +57,12 @@ bool IsUnclosedTripleQuotedLiteralError(absl::Status status) {
 // Pluck next statement terminated by a semicolon from input string. nullopt is
 // returned if more input is necessary, e.g. because a statement is incomplete.
 absl::StatusOr> NextStatement(
-    absl::string_view input) {
+    absl::string_view input, const LanguageOptions& language_options) {
   ABSL_DCHECK(!input.empty());
 
   ParseTokenOptions options;
   options.stop_at_end_of_statement = true;
+  options.language_options = language_options;
 
   ParseResumeLocation resume_loc{ParseResumeLocation::FromStringView(input)};
 
@@ -106,9 +109,10 @@ absl::Status ExecuteQueryCompletionRequest::Validate() const {
 }
 
 ExecuteQueryStatementPrompt::ExecuteQueryStatementPrompt(
+    const ExecuteQueryConfig& config,
     std::function>(bool)>
         read_next_func)
-    : read_next_func_{read_next_func} {
+    : config_(config), read_next_func_{read_next_func} {
   ABSL_CHECK(read_next_func_);
 }
 
@@ -206,7 +210,7 @@ void ExecuteQueryStatementPrompt::ReadInput(bool continuation) {
 void ExecuteQueryStatementPrompt::ProcessBuffer() {
   while (!buf_.empty()) {
     absl::StatusOr> stmt =
-        NextStatement(buf_.Flatten());
+        NextStatement(buf_.Flatten(), config_.analyzer_options().language());
 
     if (!stmt.ok()) {
       absl::Status status = std::move(stmt).status();
@@ -258,9 +262,12 @@ void ExecuteQueryStatementPrompt::ProcessBuffer() {
   }
 }
 
-ExecuteQuerySingleInput::ExecuteQuerySingleInput(absl::string_view query)
-    : ExecuteQueryStatementPrompt{absl::bind_front(
-          &ExecuteQuerySingleInput::ReadNext, this)},
+ExecuteQuerySingleInput::ExecuteQuerySingleInput(
+    absl::string_view query, const ExecuteQueryConfig& config)
+    : ExecuteQueryStatementPrompt{config,
+                                  absl::bind_front(
+                                      &ExecuteQuerySingleInput::ReadNext,
+                                      this)},
       query_{query} {}
 
 std::optional ExecuteQuerySingleInput::ReadNext(
diff --git a/zetasql/tools/execute_query/execute_query_prompt.h b/zetasql/tools/execute_query/execute_query_prompt.h
index 5e329b1b0..0b742e850 100644
--- a/zetasql/tools/execute_query/execute_query_prompt.h
+++ b/zetasql/tools/execute_query/execute_query_prompt.h
@@ -24,6 +24,7 @@
 #include
 #include
 
+#include "zetasql/tools/execute_query/execute_query_tool.h"
 #include "gtest/gtest_prod.h"
 #include "absl/status/statusor.h"
 #include "absl/strings/cord.h"
@@ -82,6 +83,7 @@ class ExecuteQueryStatementPrompt : public ExecuteQueryPrompt {
   // caller may log the error and proceed as if nothing happened, therefore
   // handling SQL syntax issues gracefully.
   explicit ExecuteQueryStatementPrompt(
+      const ExecuteQueryConfig& config,
       std::function<
           absl::StatusOr>(bool continuation)>
           read_next_func);
@@ -115,6 +117,7 @@ class ExecuteQueryStatementPrompt : public ExecuteQueryPrompt {
   FRIEND_TEST(ExecuteQueryStatementPrompt, LargeInput);
 
   size_t max_length_ = kMaxLength;
+  const ExecuteQueryConfig& config_;
   const std::function>(bool)> read_next_func_;
   const std::function
@@ -130,7 +133,8 @@ class ExecuteQueryStatementPrompt : public ExecuteQueryPrompt {
 
 class ExecuteQuerySingleInput : public ExecuteQueryStatementPrompt {
  public:
-  explicit ExecuteQuerySingleInput(absl::string_view query);
+  ExecuteQuerySingleInput(absl::string_view query,
+                          const ExecuteQueryConfig& config);
 
   ExecuteQuerySingleInput(const ExecuteQuerySingleInput&) = delete;
   ExecuteQuerySingleInput& operator=(const ExecuteQuerySingleInput&) = delete;
diff --git a/zetasql/tools/execute_query/execute_query_prompt_test.cc b/zetasql/tools/execute_query/execute_query_prompt_test.cc
index 6a5898d54..6083cf0a1 100644
--- a/zetasql/tools/execute_query/execute_query_prompt_test.cc
+++ b/zetasql/tools/execute_query/execute_query_prompt_test.cc
@@ -25,8 +25,10 @@
 #include "zetasql/common/testing/proto_matchers.h"
 #include "zetasql/base/testing/status_matchers.h"
 #include "zetasql/common/testing/status_payload_matchers.h"
+#include "zetasql/public/options.pb.h"
 #include "zetasql/tools/execute_query/execute_query.pb.h"
 #include "zetasql/tools/execute_query/execute_query_prompt_testutils.h"
+#include "zetasql/tools/execute_query/execute_query_tool.h"
 #include "gmock/gmock.h"
 #include "gtest/gtest.h"
 #include "absl/status/status.h"
@@ -86,7 +88,8 @@ struct StmtPromptInput final {
 // given return values or parser errors. All inputs, return values and parser
 // errors must be consumed.
 void TestStmtPrompt(absl::Span inputs,
-                    absl::Span> want) {
+                    absl::Span> want,
+                    const ExecuteQueryConfig* config = nullptr) {
   std::unique_ptr prompt;
 
   auto cur_input = inputs.cbegin();
@@ -101,7 +104,11 @@ void TestStmtPrompt(absl::Span inputs,
     return (cur_input++)->ret;
   };
 
-  prompt = std::make_unique(readfunc);
+  ExecuteQueryConfig default_config;
+  if (config == nullptr) {
+    config = &default_config;
+  }
+  prompt = std::make_unique(*config, readfunc);
 
   for (const auto& matcher : want) {
     EXPECT_THAT(prompt->Read(), matcher);
@@ -127,6 +134,29 @@ TEST(ExecuteQueryStatementPrompt, EmptyInput) {
       });
 }
 
+TEST(ExecuteQueryStatementPrompt, UsesOptionsFromConfig) {
+  TestStmtPrompt(
+      {
+          {.ret = "$m;"},
+      },
+      {
+          StatusIs(absl::StatusCode::kInvalidArgument,
+                   HasSubstr("Unexpected macro")),
+      });
+
+  ExecuteQueryConfig config;
+  config.mutable_analyzer_options().mutable_language()->EnableLanguageFeature(
+      FEATURE_V_1_4_SQL_MACROS);
+  TestStmtPrompt(
+      {
+          {.ret = "$m;"},
+      },
+      {
+          IsOkAndHolds("$m;"),
+      },
+      &config);
+}
+
 TEST(ExecuteQueryStatementPrompt, SingleLine) {
   TestStmtPrompt(
       {
@@ -440,10 +470,12 @@ TEST(ExecuteQueryStatementPrompt, LargeInput) {
   const std::string large(32, 'A');
 
   unsigned int count = 0;
-  ExecuteQueryStatementPrompt prompt{[&large, &count](bool continuation) {
-    EXPECT_EQ(continuation, ++count > 1);
-    return large;
-  }};
+  ExecuteQueryConfig config;
+  ExecuteQueryStatementPrompt prompt{config,
+                                     [&large, &count](bool continuation) {
+                                       EXPECT_EQ(continuation, ++count > 1);
+                                       return large;
+                                     }};
 
   EXPECT_EQ(prompt.max_length_, ExecuteQueryStatementPrompt::kMaxLength);
 
@@ -461,13 +493,15 @@ TEST(ExecuteQueryStatementPrompt, LargeInput) {
 }
 
 TEST(ExecuteQuerySingleInputTest, ReadEmptyString) {
-  ExecuteQuerySingleInput prompt{""};
+  ExecuteQueryConfig config;
+  ExecuteQuerySingleInput prompt{"", config};
 
   EXPECT_THAT(prompt.Read(), IsOkAndHolds(std::nullopt));
 }
 
 TEST(ExecuteQuerySingleInputTest, ReadMultiLine) {
-  ExecuteQuerySingleInput prompt{"test\nline; SELECT 100;"};
+  ExecuteQueryConfig config;
+  ExecuteQuerySingleInput prompt{"test\nline; SELECT 100;", config};
 
   EXPECT_THAT(prompt.Read(), IsOkAndHolds("test\nline;"));
   EXPECT_THAT(prompt.Read(), IsOkAndHolds("SELECT 100;"));
@@ -475,7 +509,8 @@ TEST(ExecuteQuerySingleInputTest, ReadMultiLine) {
 }
 
 TEST(ExecuteQuerySingleInputTest, UnexpectedEnd) {
-  ExecuteQuerySingleInput prompt{"SELECT 99;\nSELECT"};
+  ExecuteQueryConfig config;
+  ExecuteQuerySingleInput prompt{"SELECT 99;\nSELECT", config};
 
   EXPECT_THAT(prompt.Read(), IsOkAndHolds("SELECT 99;"));
   EXPECT_THAT(prompt.Read(), IsOkAndHolds("SELECT"));
diff --git a/zetasql/tools/execute_query/execute_query_tool.cc b/zetasql/tools/execute_query/execute_query_tool.cc
index 6ccd7c5c9..c018c0168 100644
--- a/zetasql/tools/execute_query/execute_query_tool.cc
+++ b/zetasql/tools/execute_query/execute_query_tool.cc
@@ -16,6 +16,7 @@
 #include "zetasql/tools/execute_query/execute_query_tool.h"
 
+#include
 #include
 #include
 #include
@@ -25,30 +26,41 @@
 #include
 #include
 
-#include "google/protobuf/descriptor.h"
-#include "google/protobuf/descriptor_database.h"
 #include "zetasql/common/options_utils.h"
+#include "zetasql/parser/macros/macro_expander.h"
+#include "zetasql/parser/macros/standalone_macro_expansion.h"
+#include "zetasql/parser/parse_tree.h"
+#include "zetasql/parser/parser.h"
 #include "zetasql/public/analyzer.h"
 #include "zetasql/public/analyzer_output.h"
 #include "zetasql/public/catalog.h"
 #include "zetasql/public/evaluator.h"
 #include "zetasql/public/evaluator_table_iterator.h"
 #include "zetasql/public/simple_catalog.h"
+#include "zetasql/public/simple_catalog_util.h"
 #include "zetasql/public/type.h"
 #include "zetasql/public/types/proto_type.h"
 #include "zetasql/resolved_ast/resolved_ast.h"
+#include "zetasql/resolved_ast/resolved_node.h"
 #include "zetasql/resolved_ast/resolved_node_kind.pb.h"
 #include "zetasql/resolved_ast/sql_builder.h"
 #include "zetasql/tools/execute_query/execute_query_proto_writer.h"
 #include "zetasql/tools/execute_query/execute_query_writer.h"
 #include "absl/flags/flag.h"
+#include "zetasql/base/check.h"
 #include "absl/memory/memory.h"
 #include "absl/status/status.h"
 #include "absl/status/statusor.h"
 #include "absl/strings/ascii.h"
+#include "absl/strings/match.h"
 #include "absl/strings/str_cat.h"
+#include "absl/strings/str_format.h"
 #include "absl/strings/str_split.h"
 #include "absl/strings/string_view.h"
+#include "absl/strings/strip.h"
+#include "google/protobuf/descriptor.h"
+#include "google/protobuf/descriptor_database.h"
+#include "google/protobuf/message.h"
 #include "zetasql/base/ret_check.h"
 #include "zetasql/base/status_macros.h"
@@ -151,6 +163,7 @@ namespace zetasql {
 namespace {
 using ToolMode = ExecuteQueryConfig::ToolMode;
 using SqlMode = ExecuteQueryConfig::SqlMode;
+using ExpansionOutput = parser::macros::ExpansionOutput;
 }  // namespace
 
 absl::Status SetToolModeFromFlags(ExecuteQueryConfig& config) {
@@ -443,6 +456,47 @@ void ExecuteQueryConfig::SetOwnedDescriptorDatabase(
 
 absl::Status ExecuteQuery(absl::string_view sql, ExecuteQueryConfig& config,
                           ExecuteQueryWriter& writer) {
+  bool enable_macros =
+      config.sql_mode() == SqlMode::kQuery &&
+      config.analyzer_options().language().LanguageFeatureEnabled(
+          FEATURE_V_1_4_SQL_MACROS);
+ std::string expanded_sql; // Defined outside of if() to stay in scope. + + if (enable_macros) { + ZETASQL_ASSIGN_OR_RETURN(ExpansionOutput expansion_output, + parser::macros::MacroExpander::ExpandMacros( + "", sql, config.macro_catalog(), + config.analyzer_options().language())); + expanded_sql = parser::macros::TokensToString( + expansion_output.expanded_tokens, /*force_single_whitespace=*/true); + for (const absl::Status& warning : expansion_output.warnings) { + ZETASQL_RETURN_IF_ERROR( + writer.unparsed(absl::StrCat("Warning: ", warning.message()))); + } + ZETASQL_RETURN_IF_ERROR(writer.unparsed("Expanded SQL:")); + ZETASQL_RETURN_IF_ERROR(writer.unparsed(expanded_sql)); + sql = expanded_sql; + + // TODO: Hack: we did not implement + // ResolvedDefineMacroStatement yet, so for now we short-circuit into + // updating the macro catalog. + if (config.tool_mode() == ToolMode::kExecute) { + std::unique_ptr parser_output; + absl::Status parse_stmt = ParseStatement( + sql, ParserOptions(config.analyzer_options().language()), + &parser_output); + if (parse_stmt.ok() && + parser_output->statement()->Is()) { + auto define_macro_statement = + parser_output->statement()->GetAsOrNull(); + std::string macro_name = define_macro_statement->name()->GetAsString(); + config.mutable_macro_catalog().insert_or_assign( + macro_name, define_macro_statement->body()->image()); + return writer.unparsed( + absl::StrFormat("Macro registered: %s", macro_name)); + } + } + } if (config.tool_mode() == ToolMode::kParse || config.tool_mode() == ToolMode::kUnparse) { std::unique_ptr parser_output; @@ -450,14 +504,12 @@ absl::Status ExecuteQuery(absl::string_view sql, ExecuteQueryConfig& config, const ASTNode* root = nullptr; if (config.sql_mode() == SqlMode::kQuery) { - parser_options.set_language_options( - &config.analyzer_options().language()); + parser_options.set_language_options(config.analyzer_options().language()); ZETASQL_RETURN_IF_ERROR(ParseStatement(sql, parser_options, 
&parser_output));
       root = parser_output->statement();
     } else if (config.sql_mode() == SqlMode::kExpression) {
-      parser_options.set_language_options(
-          &config.analyzer_options().language());
+      parser_options.set_language_options(config.analyzer_options().language());
       ZETASQL_RETURN_IF_ERROR(ParseExpression(sql, parser_options, &parser_output));
       root = parser_output->expression();
     } else {
@@ -509,6 +561,16 @@ absl::Status ExecuteQuery(absl::string_view sql, ExecuteQueryConfig& config,
     return writer.unanalyze(builder.sql());
   }
 
+  if (resolved_node->node_kind() == RESOLVED_CREATE_FUNCTION_STMT) {
+    std::unique_ptr<const AnalyzerOutput> function_artifacts;
+    ZETASQL_RET_CHECK_OK(AddFunctionFromCreateFunction(
+        sql, config.analyzer_options(),
+        /*allow_persistent_function=*/true, /*function_options=*/std::nullopt,
+        function_artifacts, config.mutable_catalog()));
+    config.AddFunctionArtifacts(std::move(function_artifacts));
+    return writer.unparsed("Function registered.");
+  }
+
   if (config.sql_mode() == SqlMode::kQuery) {
     ZETASQL_RET_CHECK_EQ(resolved_node->node_kind(), RESOLVED_QUERY_STMT);
diff --git a/zetasql/tools/execute_query/execute_query_tool.h b/zetasql/tools/execute_query/execute_query_tool.h
index 937c4d879..bd70fcf5b 100644
--- a/zetasql/tools/execute_query/execute_query_tool.h
+++ b/zetasql/tools/execute_query/execute_query_tool.h
@@ -23,9 +23,11 @@
 #include
 #include
 #include
+#include
 
 #include "google/protobuf/descriptor_database.h"
 #include "zetasql/common/options_utils.h"
+#include "zetasql/parser/macros/macro_expander.h"
 #include "zetasql/public/analyzer_options.h"
 #include "zetasql/public/evaluator.h"
 #include "zetasql/public/simple_catalog.h"
@@ -131,6 +133,18 @@ class ExecuteQueryConfig {
     return descriptor_pool_;
   }
 
+  const parser::macros::MacroCatalog& macro_catalog() const {
+    return macro_catalog_;
+  }
+  parser::macros::MacroCatalog& mutable_macro_catalog() {
+    return macro_catalog_;
+  }
+
+  void AddFunctionArtifacts(
+      std::unique_ptr<const AnalyzerOutput> function_artifact) {
+
    function_artifacts_.push_back(std::move(function_artifact));
+  }
+
 private:
  ExamineResolvedASTCallback examine_resolved_ast_callback_ = nullptr;
  ToolMode tool_mode_ = ToolMode::kExecute;
@@ -142,6 +156,8 @@ class ExecuteQueryConfig {
  const google::protobuf::DescriptorPool* descriptor_pool_ = nullptr;
  std::unique_ptr<const google::protobuf::DescriptorPool> owned_descriptor_pool_;
  std::unique_ptr<google::protobuf::DescriptorDatabase> descriptor_db_;
+  parser::macros::MacroCatalog macro_catalog_;
+  std::vector<std::unique_ptr<const AnalyzerOutput>> function_artifacts_;
};
 
 absl::Status SetToolModeFromFlags(ExecuteQueryConfig& config);
diff --git a/zetasql/tools/execute_query/execute_query_tool_test.cc b/zetasql/tools/execute_query/execute_query_tool_test.cc
index b820267a4..7954949f5 100644
--- a/zetasql/tools/execute_query/execute_query_tool_test.cc
+++ b/zetasql/tools/execute_query/execute_query_tool_test.cc
@@ -29,6 +29,7 @@
 #include "zetasql/base/testing/status_matchers.h"
 #include "zetasql/public/analyzer_options.h"
 #include "zetasql/public/catalog.h"
+#include "zetasql/public/options.pb.h"
 #include "zetasql/public/types/type_factory.h"
 #include "zetasql/resolved_ast/resolved_ast.h"
 #include "zetasql/testdata/test_schema.pb.h"
@@ -593,6 +594,49 @@ TEST(ExecuteQuery, ExecuteQuery) {
 )");
 }
 
+TEST(ExecuteQuery, ExecuteQueryWithMacroExpansion) {
+  ExecuteQueryConfig config;
+  config.set_tool_mode(ToolMode::kExecute);
+  config.mutable_analyzer_options().mutable_language()->EnableLanguageFeature(
+      FEATURE_V_1_4_SQL_MACROS);
+  std::ostringstream output;
+  EXPECT_THAT(ExecuteQuery("define macro", config, output),
+              StatusIs(absl::StatusCode::kInvalidArgument,
+                       HasSubstr("Syntax error: Unexpected end of statement")));
+  EXPECT_EQ(output.str(), R"(Expanded SQL:
+define macro
+)");
+
+  output.str("");
+  ZETASQL_EXPECT_OK(ExecuteQuery("define macro repeat $1, $1, $2, $2", config, output));
+  EXPECT_EQ(output.str(), R"(Expanded SQL:
+define macro repeat $1 , $1 , $2 , $2
+Macro registered: repeat
+)");
+
+  output.str("");
+  EXPECT_THAT(
+      ExecuteQuery("select $absent", config, output),
+
StatusIs(absl::StatusCode::kInvalidArgument,
+               HasSubstr("Syntax error: Unexpected \"$absent\" [at 1:8]")));
+  EXPECT_EQ(output.str(), R"(Warning: Macro 'absent' not found.
+Expanded SQL:
+select $absent
+)");
+
+  output.str("");
+  ZETASQL_EXPECT_OK(ExecuteQuery("select $repeat(1, (2))", config, output));
+  EXPECT_EQ(output.str(), R"(Expanded SQL:
+select 1 , 1 , ( 2 ) , ( 2 )
++---+---+---+---+
+|   |   |   |   |
++---+---+---+---+
+| 1 | 1 | 2 | 2 |
++---+---+---+---+
+
+)");
+}
+
 TEST(ExecuteQuery, ParseExpression) {
  ExecuteQueryConfig config;
  config.set_tool_mode(ToolMode::kParse);
diff --git a/zetasql/tools/execute_query/execute_query_writer.cc b/zetasql/tools/execute_query/execute_query_writer.cc
index edf827616..7c3b839cd 100644
--- a/zetasql/tools/execute_query/execute_query_writer.cc
+++ b/zetasql/tools/execute_query/execute_query_writer.cc
@@ -80,7 +80,7 @@ absl::Status PrintResults(std::unique_ptr<EvaluatorTableIterator> iter,
   out << ToPrettyOutputStyle(result,
                              /*is_value_table=*/false, column_names)
-      << std::endl;
+      << '\n';
 
   return absl::OkStatus();
 }
@@ -89,13 +89,13 @@ ExecuteQueryStreamWriter::ExecuteQueryStreamWriter(std::ostream& out)
     : stream_{out} {}
 
 absl::Status ExecuteQueryStreamWriter::resolved(const ResolvedNode& ast) {
-  stream_ << ast.DebugString() << std::endl;
+  stream_ << ast.DebugString() << '\n';
   return absl::OkStatus();
 }
 
 absl::Status ExecuteQueryStreamWriter::explained(const ResolvedNode& ast,
                                                  absl::string_view explain) {
-  stream_ << explain << std::endl;
+  stream_ << explain << '\n';
   return absl::OkStatus();
 }
diff --git a/zetasql/tools/execute_query/execute_query_writer.h b/zetasql/tools/execute_query/execute_query_writer.h
index 99f7ccdda..a91b98332 100644
--- a/zetasql/tools/execute_query/execute_query_writer.h
+++ b/zetasql/tools/execute_query/execute_query_writer.h
@@ -90,7 +90,7 @@ class ExecuteQueryStreamWriter : public ExecuteQueryWriter {
 protected:
  absl::Status WriteOperationString(absl::string_view operation_name,
                                    absl::string_view str) override {
-
stream_ << str << std::endl; + stream_ << str << '\n'; return absl::OkStatus(); } diff --git a/zetasql/tools/formatter/internal/BUILD b/zetasql/tools/formatter/internal/BUILD index c61ee7573..b80f4173a 100644 --- a/zetasql/tools/formatter/internal/BUILD +++ b/zetasql/tools/formatter/internal/BUILD @@ -31,6 +31,7 @@ cc_library( "//zetasql/base:flat_set", "//zetasql/base:status", "//zetasql/public:builtin_function", + "//zetasql/public:builtin_function_options", "//zetasql/public:formatter_options", "//zetasql/public:function_headers", "//zetasql/public:language_options", @@ -41,6 +42,7 @@ cc_library( "//zetasql/public:value", "//zetasql/public/types", "@com_google_absl//absl/algorithm:container", + "@com_google_absl//absl/container:flat_hash_map", "@com_google_absl//absl/container:flat_hash_set", "@com_google_absl//absl/log", "@com_google_absl//absl/log:die_if_null", diff --git a/zetasql/tools/formatter/internal/chunk.cc b/zetasql/tools/formatter/internal/chunk.cc index 18284c32d..37ad5bc23 100644 --- a/zetasql/tools/formatter/internal/chunk.cc +++ b/zetasql/tools/formatter/internal/chunk.cc @@ -24,10 +24,12 @@ #include #include #include +#include #include #include #include "zetasql/public/builtin_function.h" +#include "zetasql/public/builtin_function_options.h" #include "zetasql/public/formatter_options.h" #include "zetasql/public/function.h" #include "zetasql/public/language_options.h" @@ -35,13 +37,16 @@ #include "zetasql/public/parse_location.h" #include "zetasql/public/parse_tokens.h" #include "zetasql/public/type.pb.h" +#include "zetasql/public/types/type.h" #include "zetasql/public/types/type_factory.h" #include "zetasql/public/value.h" #include "zetasql/tools/formatter/internal/fusible_tokens.h" #include "zetasql/tools/formatter/internal/token.h" #include "absl/algorithm/container.h" +#include "absl/container/flat_hash_map.h" #include "absl/container/flat_hash_set.h" #include "zetasql/base/check.h" +#include "absl/log/die_if_null.h" #include "absl/log/log.h" 
#include "absl/memory/memory.h"
 #include "absl/status/statusor.h"
@@ -1059,15 +1064,18 @@ bool IsPartOfSameChunk(const Chunk& chunk, const std::vector<Token*>& tokens,
    return true;
  }
 
-  // For module declaration statements, we basically ignore the line
-  // length limit, so everything between an "MODULE" and ";" is a single chunk.
-  if (chunk.IsModuleDeclaration() && previous != ";") {
-    return true;
-  }
-
-  // For import statements, we basically ignore the line length limit, so
-  // everything between an "IMPORT" and ";" is a single chunk.
-  if (chunk.IsImport() && previous != ";") {
+  // For module declaration statements and imports, we basically ignore the
+  // line length limit, so everything before the ";" is a single chunk.
+  if (chunk.IsModuleDeclaration() || chunk.IsImport()) {
+    if (previous == ";") {
+      return false;
+    }
+    // If the user is missing a semicolon, try to detect that the next token
+    // is the beginning of a new statement.
+    if (SpaceBetweenTokensInInput(*previous_token, *current_token) &&
+        IsTopLevelClauseKeyword(*current_token)) {
+      return false;
+    }
    return true;
  }
 
@@ -1204,7 +1212,7 @@ bool IsPartOfSameChunk(const Chunk& chunk, const std::vector<Token*>& tokens,
  // AS is another multi-purpose keyword. It can be used to rename, change type
  // or define a function.
if (previous == "AS") {
-    return current != "SELECT" && current != "WITH";
+    return !IsTopLevelClauseKeyword(*current_token);
  }
 
  if (IsDateOrTimeLiteral(previous, *current_token)) {
@@ -2499,6 +2507,31 @@ absl::StatusOr<std::vector<Chunk>> ChunksFromTokens(
  return chunks;
 }
 
+ChunkBlock::ChunkBlock(ChunkBlockFactory* block_factory)
+    : block_factory_(ABSL_DIE_IF_NULL(block_factory)),
+      parent_(nullptr),
+      chunk_(nullptr) {}
+
+ChunkBlock::ChunkBlock(ChunkBlockFactory* block_factory, ChunkBlock* parent,
+                       int offset)
+    : block_factory_(ABSL_DIE_IF_NULL(block_factory)),
+      parent_(ABSL_DIE_IF_NULL(parent)),
+      chunk_(nullptr) {
+  if (offset == 0) {
+    offset = block_factory_->IndentationSpaces();
+  }
+  level_ = parent->IsTopLevel() ? 0 : parent->Level() + offset;
+}
+
+ChunkBlock::ChunkBlock(ChunkBlockFactory* block_factory, ChunkBlock* parent,
+                       class Chunk* chunk)
+    : block_factory_(ABSL_DIE_IF_NULL(block_factory)),
+      parent_(ABSL_DIE_IF_NULL(parent)),
+      chunk_(ABSL_DIE_IF_NULL(chunk)) {
+  level_ = parent->IsTopLevel() ? 0 : parent->Level();
+  chunk->SetChunkBlock(this);
+}
+
 class Chunk* ChunkBlock::FirstChunkUnder() const {
  if (IsLeaf()) {
    return Chunk();
  }
@@ -2551,14 +2584,14 @@ void ChunkBlock::AddSameLevelCousinChunk(class Chunk* chunk) {
  }
 }
 
-void ChunkBlock::AddIndentedChunk(class Chunk* chunk) {
+void ChunkBlock::AddIndentedChunk(class Chunk* chunk, int offset) {
  if (IsTopLevel()) {
    // Top is a bit special, because it never contains chunks directly and
    // doesn't have a parent. The operation is the same, but we don't perform it
    // on the parent, but rather the node itself.
-    AddChildBlockWithGrandchildChunk(chunk);
+    AddChildBlockWithGrandchildChunk(chunk, offset);
  } else {
-    parent_->AddChildBlockWithGrandchildChunk(chunk);
+    parent_->AddChildBlockWithGrandchildChunk(chunk, offset);
  }
 }
 
@@ -2589,8 +2622,9 @@ void ChunkBlock::AddChildChunk(class Chunk* chunk) {
  }
 }
 
-void ChunkBlock::AddChildBlockWithGrandchildChunk(class Chunk* chunk) {
-  ChunkBlock* new_block = block_factory_->NewChunkBlock(this);
+void ChunkBlock::AddChildBlockWithGrandchildChunk(class Chunk* chunk,
+                                                  int offset) {
+  ChunkBlock* new_block = block_factory_->NewChunkBlock(this, offset);
  children_.push_back(new_block);
  new_block->AddChildChunk(chunk);
 }
@@ -2631,16 +2665,22 @@ void ChunkBlock::AdoptChildBlock(ChunkBlock* block) {
  block->parent_ = this;
 
  // Fix the levels of all children. Avoid recursion for stack protection.
-  std::queue<ChunkBlock*> blocks;
-  blocks.push(block);
+  int level_offset = Level() - block->Level();
+  if (!block->IsLeaf()) {
+    level_offset += block_factory_->IndentationSpaces();
+  }
+  if (level_offset != 0) {
+    std::queue<ChunkBlock*> blocks;
+    blocks.push(block);
 
-  while (!blocks.empty()) {
-    ChunkBlock* b = blocks.front();
-    blocks.pop();
+    while (!blocks.empty()) {
+      ChunkBlock* b = blocks.front();
+      blocks.pop();
 
-    b->level_ = b->Parent()->Level() + 1;
-    for (auto it = b->children().begin(); it != b->children().end(); it++) {
-      blocks.push(*it);
+      b->level_ = b->level_ + level_offset;
+      for (auto it = b->children().begin(); it != b->children().end(); it++) {
+        blocks.push(*it);
+      }
    }
  }
 
@@ -2650,7 +2690,7 @@ void ChunkBlock::AdoptChildBlock(ChunkBlock* block) {
 ChunkBlock* ChunkBlock::GroupAndIndentChildrenUnderNewBlock(
    const Children::reverse_iterator& start,
    const Children::reverse_iterator& end) {
-  ChunkBlock* new_block = block_factory_->NewChunkBlock();
+  ChunkBlock* new_block = block_factory_->NewChunkBlock(this);
 
  // We have to use .base() here and go from end to start to preserve the order
  // of the nodes.
@@ -2658,11 +2698,9 @@ ChunkBlock* ChunkBlock::GroupAndIndentChildrenUnderNewBlock( i != children_.end() && i != start.base(); ++i) { new_block->AdoptChildBlock(*i); } - children_.erase(end.base(), start.base()); - this->AdoptChildBlock(new_block); - + children_.push_back(new_block); return new_block; } @@ -2741,7 +2779,8 @@ std::ostream& operator<<(std::ostream& os, const ChunkBlock* const block) { return os << block->DebugString(); } -ChunkBlockFactory::ChunkBlockFactory() { +ChunkBlockFactory::ChunkBlockFactory(int indentation_spaces) + : indentation_spaces_(indentation_spaces) { // Constructor creates a root block to guarantee it is always available. NewChunkBlock(); } @@ -2752,9 +2791,9 @@ ChunkBlock* ChunkBlockFactory::NewChunkBlock() { return blocks_.back().get(); } -ChunkBlock* ChunkBlockFactory::NewChunkBlock(ChunkBlock* parent) { +ChunkBlock* ChunkBlockFactory::NewChunkBlock(ChunkBlock* parent, int offset) { // Using `new` to access a non-public constructor. - blocks_.emplace_back(absl::WrapUnique(new ChunkBlock(this, parent))); + blocks_.emplace_back(absl::WrapUnique(new ChunkBlock(this, parent, offset))); return blocks_.back().get(); } diff --git a/zetasql/tools/formatter/internal/chunk.h b/zetasql/tools/formatter/internal/chunk.h index 2002fe28e..09b30b6cb 100644 --- a/zetasql/tools/formatter/internal/chunk.h +++ b/zetasql/tools/formatter/internal/chunk.h @@ -27,7 +27,6 @@ #include "zetasql/public/formatter_options.h" #include "zetasql/public/parse_location.h" #include "zetasql/tools/formatter/internal/token.h" -#include "absl/log/die_if_null.h" #include "absl/status/statusor.h" #include "absl/strings/string_view.h" @@ -110,7 +109,7 @@ class Chunk { // (see table in parse_token.h). absl::string_view NonCommentKeyword(int i) const; - // Syntatic sugar to fetch i'th keyword or token from a chunk. See above. + // Syntactic sugar to fetch i'th keyword or token from a chunk. See above. 
absl::string_view FirstKeyword() const;
 absl::string_view SecondKeyword() const;
 absl::string_view LastKeyword() const;
@@ -298,7 +297,7 @@ class Chunk {
  // Returns true if the space is needed between two tokens. Can be used to
  // decide whether a space is needed for tokens within the same chunk, or
  // between two chunks. In the latter case, the function should be called on a
-  // previous chunk with pointing to it's last token and
+  // previous chunk with pointing to its last token and
  // pointing to the first token of the next chunk.
  //
  // If the layout element is not a newline then the number of spaces we need to
@@ -354,13 +353,14 @@ class Chunk {
 // Generates and owns chunk block objects (see ChunkBlock class description).
 class ChunkBlockFactory {
 public:
-  ChunkBlockFactory();
+  explicit ChunkBlockFactory(int indentation_spaces);
 
  // Factory functions to create new blocks. The returned block must be added
  // manually to any other block's children otherwise it will be lost in the
-  // output.
+  // output. If offset is provided, it overwrites the default indentation
+  // relative to the parent block.
  ChunkBlock* NewChunkBlock();
-  ChunkBlock* NewChunkBlock(ChunkBlock* parent);
+  ChunkBlock* NewChunkBlock(ChunkBlock* parent, int offset = 0);
  ChunkBlock* NewChunkBlock(ChunkBlock* parent, Chunk* chunk);
 
  // Returns the top of the ChunkBlock tree created so far.
@@ -371,7 +371,12 @@ class ChunkBlockFactory {
  // tree. Used in testing.
  void Reset();
 
+  // Returns the default number of spaces to indent a block relative to its
+  // parent.
+  int IndentationSpaces() const { return indentation_spaces_; }
+
 private:
+  const int indentation_spaces_;
  std::vector<std::unique_ptr<ChunkBlock>> blocks_;
};
 
@@ -430,13 +435,10 @@ class ChunkBlock {
  // the line that starts with this block.
  bool BreakCloseToLineLength() const { return break_close_to_line_length_; }
 
-  // Returns the level of this block.
For leaf blocks, this is equal to the
-  // indentation the Chunk would have, if all possible newlines were used.
-  //
-  // For practical purposes, this is not equal to the height of the tree formed
-  // by the chunk blocks, because the root starts at -1 and we subtract the
-  // leaf, since neither of the two add indentation.
-  int Level() const { return IsLeaf() ? level_ - 1 : level_; }
+  // Returns the indentation level of this block. For leaf blocks, this is equal
+  // to the number of indentation spaces the Chunk would have, if all possible
+  // newlines were used.
+  int Level() const { return level_; }
 
  // Adds a new leaf chunk block for <chunk> that is at the same level as this
  // chunk block. Effectively, this means that the parent of this block will add
@@ -478,7 +480,7 @@ class ChunkBlock {
  //           +--new block
  //              |
  //              +--new leaf block for <chunk>
-  void AddIndentedChunk(class Chunk* chunk);
+  void AddIndentedChunk(class Chunk* chunk, int offset = 0);
 
  // Adds a new leaf chunk block for <chunk> that is at the same level as this
  // block and is immediately before this block. This is used to add comments
@@ -565,10 +567,10 @@ class ChunkBlock {
    explicit Children(ChildrenT* children) : children_ptr_(children) {}
 
    // Returns true if the block doesn't contain any child blocks.
-    const bool empty() const { return children_ptr_->empty(); }
+    bool empty() const { return children_ptr_->empty(); }
 
    // Returns the number of child blocks this block contains.
-    const std::size_t size() const { return children_ptr_->size(); }
+    std::size_t size() const { return children_ptr_->size(); }
 
    iterator begin() { return children_ptr_->begin(); }
    iterator end() { return children_ptr_->end(); }
@@ -622,34 +624,23 @@ class ChunkBlock {
 private:
  // Creates a root chunk block. Level is -1 because the root is before anything
  // else.
-  explicit ChunkBlock(ChunkBlockFactory* block_factory)
-      : block_factory_(ABSL_DIE_IF_NULL(block_factory)),
-        parent_(nullptr),
-        chunk_(nullptr) {}
+  explicit ChunkBlock(ChunkBlockFactory* block_factory);
 
  // Creates an intermediate (not top or leaf) chunk block with the given
-  // <parent>.
-  explicit ChunkBlock(ChunkBlockFactory* block_factory, ChunkBlock* parent)
-      : block_factory_(ABSL_DIE_IF_NULL(block_factory)),
-        parent_(ABSL_DIE_IF_NULL(parent)),
-        chunk_(nullptr) {
-    level_ = parent->Level() + 1;
-  }
+  // <parent>. Optional `offset` overrides the default indentation level
+  // relative to the parent block.
+  explicit ChunkBlock(ChunkBlockFactory* block_factory, ChunkBlock* parent,
+                      int offset = 0);
 
  // Creates a leaf chunk block with the given <parent> and <chunk>.
+  // Leaf blocks always have the same level as their parent.
  ChunkBlock(ChunkBlockFactory* block_factory, ChunkBlock* parent,
-             class Chunk* chunk)
-      : block_factory_(ABSL_DIE_IF_NULL(block_factory)),
-        parent_(ABSL_DIE_IF_NULL(parent)),
-        chunk_(ABSL_DIE_IF_NULL(chunk)) {
-    level_ = parent->Level() + 1;
-    chunk->SetChunkBlock(this);
-  }
+             class Chunk* chunk);
 
  // Adds a new chunk block that is a direct child of this chunk block, and then
  // adds a new grandchild block that is a leaf block for <chunk> right under
  // the newly created block.
-  void AddChildBlockWithGrandchildChunk(class Chunk* chunk);
+  void AddChildBlockWithGrandchildChunk(class Chunk* chunk, int offset = 0);
 
  // Inserts a child leaf block for <chunk> right before the given .
// diff --git a/zetasql/tools/formatter/internal/chunk_grouping_strategy.cc b/zetasql/tools/formatter/internal/chunk_grouping_strategy.cc index c2132ee11..c7e8c31b2 100644 --- a/zetasql/tools/formatter/internal/chunk_grouping_strategy.cc +++ b/zetasql/tools/formatter/internal/chunk_grouping_strategy.cc @@ -17,7 +17,7 @@ #include "zetasql/tools/formatter/internal/chunk_grouping_strategy.h" #include -#include +#include #include #include "zetasql/tools/formatter/internal/chunk.h" @@ -247,7 +247,7 @@ void AddBlockForAsKeyword(ChunkBlock* const chunk_block, Chunk* as_chunk) { Token::Type::CAST_AS); return; } - // AS is within parantheses, stop searching for the special cases. + // AS is within parentheses, stop searching for the special cases. if (i != t->children().rbegin()) { expression_block = *(i - 1); } @@ -925,6 +925,8 @@ absl::Status ComputeChunkBlocksForChunks(ChunkBlockFactory* block_factory, AddBlockForClosingBracket(&previous_chunk, &chunk); } else if (chunk.IsCreateOrExportIndentedClauseChunk()) { AddBlockForCreateOrExportIndentedClause(&previous_chunk, &chunk); + } else if (chunk.IsTopLevelClauseChunk()) { + AddBlockForTopLevelClause(chunk_block, &chunk); } else if (chunk.IsJoin()) { AddBlockForJoinClause(chunk_block, &chunk); } else if (chunk.FirstKeyword() == "ON" || @@ -935,8 +937,6 @@ absl::Status ComputeChunkBlocksForChunks(ChunkBlockFactory* block_factory, // Should be before handling top level clause keywords, since some ddl // keywords may be top level in other contexts. 
      AddBlockForDdlChunk(chunk_block, &chunk);
-    } else if (chunk.IsTopLevelClauseChunk()) {
-      AddBlockForTopLevelClause(chunk_block, &chunk);
    } else if (chunk.FirstKeyword() == ".") {
      if (previous_chunk.StartsWithChainableOperator() ||
          previous_chunk.FirstKeyword() == "=") {
diff --git a/zetasql/tools/formatter/internal/fusible_tokens.cc b/zetasql/tools/formatter/internal/fusible_tokens.cc
index ad6150228..1beb2785a 100644
--- a/zetasql/tools/formatter/internal/fusible_tokens.cc
+++ b/zetasql/tools/formatter/internal/fusible_tokens.cc
@@ -17,9 +17,9 @@
 #include "zetasql/tools/formatter/internal/fusible_tokens.h"
 
 #include
-#include
 #include
 #include
+#include
 #include
 
 #include "zetasql/tools/formatter/internal/token.h"
diff --git a/zetasql/tools/formatter/internal/layout.cc b/zetasql/tools/formatter/internal/layout.cc
index 304381a6f..0e8c9f39e 100644
--- a/zetasql/tools/formatter/internal/layout.cc
+++ b/zetasql/tools/formatter/internal/layout.cc
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -257,12 +258,11 @@ std::string StmtLayout::PrintableString() const {
                                        chunks_[i])) {
        result.append(options_.NewLineType());
      }
-      spaces =
-          (chunks_[i].ChunkBlock()->Level() * options_.IndentationSpaces()) %
-          // There is no sense to make indent > than the line length limit.
-          // Instead we start from the beginning as if an editor wrapped the
-          // line at the line length.
-          options_.LineLengthLimit();
+      spaces = chunks_[i].ChunkBlock()->Level() %
+               // There is no sense in making the indent larger than the line
+               // length limit. Instead we start from the beginning as if an
+               // editor wrapped the line at the line length.
+               options_.LineLengthLimit();
      result.append(std::string(spaces, ' '));
    } else if (chunks_[i].StartsWithSpace()) {
      result.append(" ");
@@ -334,8 +334,9 @@ bool StmtLayout::IsMultilineStatement() const {
 }
 
 bool StmtLayout::RequiresBlankLineBefore() const {
-  // Starts with a comment that had a blank line before.
- if (ChunkAt(0).IsCommentOnly() && ChunkAt(0).GetLineBreaksBefore() > 1) { + // If the user already had a blank line before the statement, preserve it, + // except for IMPORT statements. + if (FirstKeyword() != "IMPORT" && ChunkAt(0).GetLineBreaksBefore() > 1) { return true; } // Already has a blank line before first non-comment line. @@ -346,8 +347,10 @@ bool StmtLayout::RequiresBlankLineBefore() const { if (FirstKeyword() == "SELECT") { return true; } - // Always require a blank line before a multiline statement. - if (IsMultilineStatement()) { + // Require a blank line before a multiline statement, except for SET + // statements. (This allows for compact configuration; note that SET bodies + // are indented by 2.) + if (IsMultilineStatement() && FirstKeyword() != "SET") { return true; } @@ -369,7 +372,7 @@ bool StmtLayout::ForbidsAddingBlankLineBefore() const { } bool StmtLayout::RequiresBlankLineAfter() const { - return IsMultilineStatement(); + return IsMultilineStatement() && FirstKeyword() != "SET"; } StmtLayout::Line StmtLayout::NewLine(int start, int end) const { @@ -409,8 +412,7 @@ absl::Status StmtLayout::ValidateLine(int* start, int* end) const { int StmtLayout::PrintableLineLength(const Line& line) const { int i = line.start; - int line_length = - chunks_[i].ChunkBlock()->Level() * options_.IndentationSpaces(); + int line_length = chunks_[i].ChunkBlock()->Level(); line_length += chunks_[i].PrintableLength(i + 1 == line.end); while (++i < line.end) { @@ -432,8 +434,7 @@ int StmtLayout::FirstChunkEndingAfterColumn(const Line& line, // formats 36MB (200k lines) of SQL in 26 seconds on a standard workstation, // so optimization is not needed yet. 
int i = line.start;
-  int line_length =
-      chunks_[i].ChunkBlock()->Level() * options_.IndentationSpaces();
+  int line_length = chunks_[i].ChunkBlock()->Level();
  line_length += chunks_[i].PrintableLength(i + 1 == line.end);
 
  while (line_length <= column && ++i < line.end) {
@@ -606,9 +607,8 @@ void StmtLayout::BreakOnMandatoryLineBreaks() {
        if (i > line.start) {
          breakpoints.insert(i);
        }
-        if (i + 1 < line.end && chunk.SecondKeyword() == "(") {
-          // Only for top-level `AS` keywords, always add a line break after
-          // `AS (`.
+        if (i + 1 < line.end) {
+          // Add line break after top-level `AS` keywords as well.
          breakpoints.insert(i + 1);
        }
      } else if ((chunk.IsStartOfCreateStatement() ||
@@ -725,6 +725,9 @@ void StmtLayout::BreakOnMandatoryLineBreaks() {
      }
    }
 
+    if (!breakpoints.empty() && *breakpoints.begin() == line.start) {
+      breakpoints.erase(breakpoints.begin());
+    }
    if (breakpoints.empty()) {
      // Line is not broken, add it to the result as is.
      new_lines.insert(line);
@@ -1027,8 +1030,7 @@ void StmtLayout::PruneLineBreaks() {
        if (first_chunk.StartsWithSpace()) {
          ++prev_chunk_length;
        }
-        if (prev_chunk_length == options_.IndentationSpaces() *
-                                     (curr_line_level - prev_line_level)) {
+        if (prev_chunk_length == (curr_line_level - prev_line_level)) {
          try_merging_line = true;
        }
      }
@@ -1374,8 +1376,7 @@ absl::btree_set<int> StmtLayout::BreakpointsCloseToLineLength(
    const Line& line, const int level) const {
  absl::btree_set<int> breakpoints;
 
-  int length =
-      ChunkAt(line.start).ChunkBlock()->Level() * Options().IndentationSpaces();
+  int length = ChunkAt(line.start).ChunkBlock()->Level();
  length += ChunkAt(line.start).PrintableLength(line.LengthInChunks() == 1);
 
  int last_before_length = -1;
@@ -1411,8 +1412,7 @@ absl::btree_set<int> StmtLayout::BreakpointsCloseToLineLength(
    if (length > Options().LineLengthLimit()) {
      if (last_before_length != -1) {
        breakpoints.insert(last_before_length);
-        length = l * Options().IndentationSpaces();
-        length += ChunkAt(i).PrintableLength(i + 1 == line.end);
+
length = l + ChunkAt(i).PrintableLength(i + 1 == line.end);
        last_before_length = -1;
      }
    }
@@ -1802,7 +1802,7 @@ bool StmtLayout::ShouldBreakCloseToLineLength(const Line& line,
  const int start_level = ChunkAt(line.start).ChunkBlock()->Level();
  const int break_point_level = ChunkAt(break_point).ChunkBlock()->Level();
  if (!(break_point_level == start_level ||
-        break_point_level == start_level + 1)) {
+        break_point_level == start_level + options_.IndentationSpaces())) {
    return false;
  }
diff --git a/zetasql/tools/formatter/internal/parsed_file.cc b/zetasql/tools/formatter/internal/parsed_file.cc
index d30145ed2..de78cb385 100644
--- a/zetasql/tools/formatter/internal/parsed_file.cc
+++ b/zetasql/tools/formatter/internal/parsed_file.cc
@@ -92,7 +92,7 @@ absl::StatusOr<std::unique_ptr<ParserOutput>> ParseTokenizedStmt(
  LanguageOptions language_options;
  language_options.EnableMaximumLanguageFeaturesForDevelopment();
  ParserOptions parser_options;
-  parser_options.set_language_options(&language_options);
+  parser_options.set_language_options(language_options);
  ZETASQL_RETURN_IF_ERROR(ParseNextScriptStatement(&location, parser_options,
                                           &parser_output, &unused));
@@ -131,8 +131,11 @@ absl::Status UnparsedRegion::Accept(ParsedFileVisitor* visitor) const {
 }
 
 TokenizedStmt::TokenizedStmt(absl::string_view sql, int start_offset,
-                             int end_offset, std::vector<Token> tokens)
-    : FilePart(sql, start_offset, end_offset), tokens_(std::move(tokens)) {
+                             int end_offset, std::vector<Token> tokens,
+                             const FormatterOptions& options)
+    : FilePart(sql, start_offset, end_offset),
+      tokens_(std::move(tokens)),
+      block_factory_(options.IndentationSpaces()) {
  tokens_view_ = TokensView::FromTokens(&tokens_);
 }
 
@@ -146,7 +149,7 @@ absl::StatusOr<std::unique_ptr<TokenizedStmt>> TokenizedStmt::ParseFrom(
      TokenizeNextStatement(parse_location, options.IsAllowedInvalidTokens()));
  auto parsed_sql = std::make_unique<TokenizedStmt>(
      parse_location->input(), start_offset, parse_location->byte_position(),
-      std::move(tokens));
+      std::move(tokens), options);
  ZETASQL_RETURN_IF_ERROR(
parsed_sql->BuildChunksAndBlocks(location_translator, options));
  return parsed_sql;
@@ -171,9 +174,7 @@ absl::Status TokenizedStmt::Accept(ParsedFileVisitor* visitor) const {
  return visitor->VisitTokenizedStmt(*this);
 }
 
-bool TokenizedStmt::ShouldFormat() const {
-  return true;
-}
+bool TokenizedStmt::ShouldFormat() const { return true; }
 
 std::string ParsedStmt::DebugString() const {
  return parser_output_->statement()->DebugString();
diff --git a/zetasql/tools/formatter/internal/parsed_file.h b/zetasql/tools/formatter/internal/parsed_file.h
index c2e3f8266..47818d67a 100644
--- a/zetasql/tools/formatter/internal/parsed_file.h
+++ b/zetasql/tools/formatter/internal/parsed_file.h
@@ -93,7 +93,7 @@ class TokenizedStmt : public FilePart {
  // Creates a non-initialized TokenizedStmt object. Use the factory function
  // above instead.
  TokenizedStmt(absl::string_view sql, int start_offset, int end_offset,
-                std::vector<Token> tokens);
+                std::vector<Token> tokens, const FormatterOptions& options);
 
  // Disable copy (and move) semantics.
  TokenizedStmt(const TokenizedStmt&) = delete;
@@ -127,7 +127,7 @@ class TokenizedStmt : public FilePart {
  // Groups tokens into chunks and chunks into block tree.
  absl::Status BuildChunksAndBlocks(
      const ParseLocationTranslator& location_translator,
-      const FormatterOptions& otions);
+      const FormatterOptions& options);
 
  std::vector<Token> tokens_;
  TokensView tokens_view_;
diff --git a/zetasql/tools/formatter/internal/token.cc b/zetasql/tools/formatter/internal/token.cc
index 946a7c42d..045bf20aa 100644
--- a/zetasql/tools/formatter/internal/token.cc
+++ b/zetasql/tools/formatter/internal/token.cc
@@ -1168,6 +1168,13 @@ TokenGroupingState MaybeMoveParseTokenIntoTokens(
    }
 
    case GroupingType::kLegacySetStmt: {
+      // If the set statement operand is multiline (the grouping state end
+      // position points at the end of the line, and hence falls in the middle
+      // of the operand token), update the grouping state end position to
+      // include the entire token.
+ if (TokenStartOffset(parse_token) < grouping_state.end_position && + TokenEndOffset(parse_token) > grouping_state.end_position) { + grouping_state.end_position = TokenEndOffset(parse_token); + } // Look for the end of set statement operand. if (parse_token.IsEndOfInput() || TokenStartOffset(parse_token) >= grouping_state.end_position) {